
Functional coding haplotypes and machine-learning feature ...

Functional coding haplotypes and machine-learning feature elimination identifies predictors of Methotrexate Response in Rheumatoid Arthritis patients

Ashley Lim,a,† Lee Jin Lim,a,† Brandon Ooi,a Ee Tzun Koh,b Justina Wei Lynn Tan,b TTSH RA Study Group,b Samuel S. Chong,c Chiea Chuen Khor,d Lisa Tucker-Kellogg,e Khai Pang Leong,b,f,#,** and Caroline G. Lee,a,g,h,i,*,#

a Dept of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
b Department of Rheumatology, Allergy and Immunology, Tan Tock Seng Hospital, Singapore
c Dept of Pediatrics, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
d Division of Human Genetics, Genome Institute of Singapore, Singapore
e Centre for Computational Biology, and Cancer and Stem Cell Biology, Duke-NUS Medical School, Singapore
f Clinical Research & Innovation Office, Tan Tock Seng Hospital, Singapore
g Div of Cellular & Molecular Research, Humphrey Oei Institute of Cancer Research, National Cancer Centre Singapore, Singapore
h Duke-NUS Medical School, Singapore
i NUS Graduate School, National University of Singapore.

... response in rheumatoid arthritis (RA) patients. Methods: Exome sequencing data from 349 RA patients were analysed, and the patients were split into a training set and an unseen test set. Inferred pfcHaps were combined with 30 non-genetic features and subjected to ML recursive feature elimination with cross-validation using the training set.
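The Methods sentence above describes recursive feature elimination with cross-validation on a training set, followed by evaluation on an unseen test set. The following Python sketch walks through that workflow under stated assumptions. The data are synthetic binary features. A nearest-centroid classifier stands in for the study's models, which this excerpt does not name. The fold layout, feature counts and all function names are illustrative and are not the authors' pipeline.

```python
import random

def centroids(X, y, feats):
    """Fit a nearest-centroid model: per-class mean of every kept feature."""
    model = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        model[label] = {f: sum(r[f] for r in rows) / len(rows) for f in feats}
    return model

def accuracy(model, X, y, feats):
    """Share of samples whose nearest centroid matches the true label."""
    hits = 0
    for x, t in zip(X, y):
        dist = {c: sum((x[f] - m[f]) ** 2 for f in feats) for c, m in model.items()}
        hits += min(dist, key=dist.get) == t
    return hits / len(X)

def rfe_path(X, y, n_features):
    """Drop the weakest feature one at a time; return every subset visited."""
    feats = list(range(n_features))
    path = [list(feats)]
    while len(feats) > 1:
        model = centroids(X, y, feats)
        # Assumed importance measure: gap between the two class centroids.
        weakest = min(feats, key=lambda f: abs(model[1][f] - model[0][f]))
        feats.remove(weakest)
        path.append(list(feats))
    return path

def rfecv(X, y, n_features, k=5):
    """Pick the subset size with the best mean validation accuracy over k folds."""
    scores = {size: [] for size in range(1, n_features + 1)}
    for fold in range(k):
        # Samples come in (class 0, class 1) pairs, so folding by pair
        # keeps both classes present in every fold.
        train = [i for i in range(len(X)) if (i // 2) % k != fold]
        valid = [i for i in range(len(X)) if (i // 2) % k == fold]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        Xva, yva = [X[i] for i in valid], [y[i] for i in valid]
        for feats in rfe_path(Xtr, ytr, n_features):
            fitted = centroids(Xtr, ytr, feats)
            scores[len(feats)].append(accuracy(fitted, Xva, yva, feats))
    mean_scores = {size: sum(v) / len(v) for size, v in scores.items()}
    # Highest mean score wins; ties go to the smaller subset.
    best_size = max(mean_scores, key=lambda s: (mean_scores[s], -s))
    selected = next(f for f in rfe_path(X, y, n_features) if len(f) == best_size)
    return selected, mean_scores

# Synthetic stand-in data: features 0 and 1 carry signal, the rest are noise.
rng = random.Random(0)
N_FEATURES, INFORMATIVE = 8, (0, 1)
X, y = [], []
for i in range(100):
    label = i % 2
    row = []
    for f in range(N_FEATURES):
        p = (0.9 if label else 0.1) if f in INFORMATIVE else 0.5
        row.append(1 if rng.random() < p else 0)
    X.append(row)
    y.append(label)

# Hold out an unseen test set; feature selection only ever sees the training set.
X_train, y_train = X[:80], y[:80]
X_test, y_test = X[80:], y[80:]

selected, mean_scores = rfecv(X_train, y_train, N_FEATURES)
test_acc = accuracy(centroids(X_train, y_train, selected), X_test, y_test, selected)
print("selected features:", selected)
print("held-out accuracy:", round(test_acc, 2))
```

The point of the design is that elimination and subset-size selection both happen inside the training data, so the held-out accuracy is not used to choose features.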


Transcription of Functional coding haplotypes and machine-learning feature ...


Summary

Background: Major challenges in large-scale genetic association studies include not only the identification of causative single nucleotide polymorphisms (SNPs), but also accounting for SNP-SNP interactions. This study thus proposes a novel feature engineering approach integrating potentially functional coding haplotypes (pfcHap) with machine-learning (ML) feature selection to identify biologically meaningful, possibly causative genetic factors that take into consideration potential SNP-SNP interactions within the pfcHap, to best predict methotrexate (MTX) response in rheumatoid arthritis (RA) patients.

Methods: Exome sequencing data from 349 RA patients were analysed, and the patients were split into a training set and an unseen test set. Inferred pfcHaps were combined with 30 non-genetic features and subjected to ML recursive feature elimination with cross-validation using the training set. The predictive capacity and robustness of the selected features were assessed using six popular machine learning models through training-set cross-validation and evaluated in an unseen test set.

Findings: 100 features (95 pfcHaps, 5 non-genetic factors) were identified to have good predictive performance (AUC: [...]; Sensitivity: [...]; Specificity: [...]) across all six ML models in an unseen test dataset for the prediction of MTX response in RA patients.

Interpretation: [...] of the predictive pfcHap SNPs were predicted to be potentially functional, and some of the genes in which the pfcHaps reside were identified to be associated with previously reported MTX/RA pathways.

Funding: [...] Ministry of Health's National Medical Research Council (NMRC) [NMRC/CBRG/0095/2015; CG12 Aug17; CGAug16M012; NMRC/CG/017/2013]; National Cancer Center Research Fund and block funding Duke-NUS Medical School; Singapore Ministry of Education Academic Research Fund Tier 2 grant MOE2019-T2-1-138.

* Address correspondence to: Caroline GL. Lee, Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, c/o MD7, Level 2, 8 Medical Drive, Singapore 117597. Tel: (+65) 64368353.
** Leong Khai Pang, MBBS, Department of Rheumatology, Allergy and Immunology, Tan Tock Seng Hospital, 11 Jln Tan Tock Seng, Singapore 308433. Tel: (+65) [...]
† Lee Jin Lim and Ashley Lim contributed equally to this work.
# Equal contribution to this work.

eBioMedicine 2022;75:103800. Published online 10 January 2022. Vol 75, January 2022.
Copyright 2022 The Authors. Published by Elsevier [...] This is an open access article under the CC BY-NC-ND license.

Keywords: Rheumatoid arthritis; Methotrexate; Genetic polymorphism; Machine learning; Feature selection; Haplotypes

Introduction

Of the diverse factors influencing drug response, elucidating the genetic basis that underlies differences in drug response may facilitate the development of tools to predict drug response even before the drug is administered.

Although there is growing interest in pharmacogenomics, the focus has thus far been primarily on the identification of common gene variants with large effect size on known pharmacokinetic, pharmacodynamic and/or immuno-pharmacogenomic candidate genes.3 However, polymorphisms in other genes or rare gene variants, either alone or in combination, may also modulate drug response but have yet to be [...] the advent of high-throughput genomic tools, it is now possible to explore the association of other genes with drug response using GWAS (Genome-Wide Association Study), exome sequencing or even WGS (Whole Genome Sequencing).3 Current genomic approaches, which mainly focus on tag-SNPs in GWAS, represent only a very small proportion of all potentially functional SNPs (pfSNPs) in the human genome with the likelihood to be causative. While WGS would be the ideal approach to examine all SNPs including pfSNPs, exome sequencing is a cost-effective way to facilitate the interrogation of all pfSNPs in the most informative functional region of the genome, namely, the coding region. Thus far, traditional statistical methods have been the primary tool to associate genetic and other features with drug response and/or disease susceptibility.

However, these statistical approaches have some limitations.6 These include the requirement for large sample sizes, which can potentially be mitigated by machine learning (ML) methods, which are suited to high-dimensionality complex problems, including the consideration of non-linear interactions [...],8 Nonetheless, a major limitation of biomedical datasets, even for ML, is that their high dimensionality is often coupled with limited labelled sample size, which poses a challenge for learning models to predict individualized response to drugs. ML-based dimensionality reduction and feature selection strategies can help reduce the feature space and select the most informative features that can accurately predict an outcome. Nonetheless, to achieve acceptable accuracy in pharmacogenomics, careful data pre-processing and feature handcrafting with strong domain knowledge9 is [...], we introduce a novel, biologically meaningful feature pre-processing/engineering strategy focused on haplotypes of SNPs in the coding regions of genes with the potential to be functional (pfcHap).

By integrating pfcHap together with ML feature elimination and selection, the strategy identifies a signature of potentially causative genetic and non-genetic factors that can robustly predict response to the methotrexate (MTX) drug in rheumatoid arthritis (RA) patients across diverse ML [...]

Research in Context

Evidence before this study
Methotrexate (MTX) is a first-line medication for rheumatoid arthritis (RA) patients despite a low monotherapy response of only 25 to 45 percent. Identification of non-responders is necessary to help mitigate disease progression. However, evaluation of treatment response often takes three to six months following MTX administration. Current research on MTX response in RA is primarily focused on genetic variation in specific genes involved in MTX-related pathways and RA susceptibility, with few studies interrogating the entire genome using genome-wide association studies (GWAS). Nonetheless, several challenges are associated with current GWAS, including high data dimensionality and the identification of causative [...]

Added value of this study
In this study, we employed a cost-effective exome sequencing approach to interrogate haplotypes of single nucleotide polymorphisms (SNPs) in the coding region of genes, which is deemed one of the most informative functional regions of the genome. To reduce the high dimensionality of the data, and capitalize on the property that SNPs within a functional unit (coding region) interact to modulate structure/function of the target protein, we inferred SNP haplotypes in the coding region and employed machine-learning (ML) to identify potentially functional coding haplotypes (pfcHaps) that best predict MTX response in RA patients. Notably, the predictive pfcHap SNPs and genes were predicted to be functional and associated with previously reported MTX/RA pathways, respectively, highlighting the promise of this [...]

Implications of all the available evidence
Taken together, we envision that the best predictors identified will be effective in aiding decision-making for the treatment of RA patients after further validation in larger multi-institutional studies. Furthermore, we believe that our analysis pipeline for handling and interpretation of genetic data will also be applicable in other contexts beyond MTX response in RA [...]

[...] was proposed to be particularly appropriate for personalized therapy because of the costly therapy due to prolonged disease duration, low response to the conventional therapy, trial-and-error nature of therapy prescription, and the risk of serious drug-induced [...] disability brought on by comorbidities such as coronary artery disease and hyperlipidaemia11 in combination with poorly controlled RA adds to the healthcare costs and further strains the health care [...] is a chronic inflammatory disease involving primarily the joints, with an average prevalence of 5 per 1000 people12 that varies across different [...] imposes a huge socioeconomic burden on both the patient and society as it commonly affects middle-aged adults at their economic [...]22 Inadequate treatment of RA leads to irreversible joint damage resulting in potential disabilities that affect the patient's quality of life and work productivity, and even premature [...]25 Hence, timely and appropriate control of this condition is critical to minimize the morbidity and [...]

[...] the disease-modifying antirheumatic drugs (DMARDs) in RA, MTX is the anchor agent and the recommended first-line choice for the majority of RA [...],27 Approximately 25 to 40 percent of patients improve with MTX monotherapy, which is further increased to 50 percent for patients receiving combination therapy with [...] with inadequate response to MTX monotherapy are offered alternative biologic and targeted synthetic [...] current state-of-the-art management of RA is still primarily based on trial-and-error, with recommendations from the European League Against Rheumatism (EULAR) being to assess the effectiveness of MTX therapy between three to six months of administering the drug and re-evaluating the treatment approach of poor responders [...] suggests that rheumatologists only know the effectiveness of MTX after the patient is already on the drug for 3-6 months. Earlier identification of poor responders to MTX, preferably even before drug administration, will enable prompt initiation of alternative treatment, which could help mitigate disease progression.

To date, the study of MTX response in RA has been focused on the genetic variability in specific genes, often involving those in MTX-related pathways or RA susceptibility, or on genome-wide association studies (GWAS) interrogating individual [...] A recent review summarised that 125 SNPs from 34 genes involved with MTX metabolism, transport or RA progression/pathogenesis were previously evaluated for associations with MTX [...], some of these studies have reported contradictory results, including conflicting reports of associations of polymorphism rs1045642 (3435C>T) in the ATP-binding cassette B1 (ABCB1) transporter gene with MTX efficacy in two separate Japanese cohort [...],33

While there is recent increasing interest in employing ML for electronic diagnosis, prediction of disease progression and drug response of RA patients,34 these methods remain in their infancy, with few studies exploring predictive models to evaluate MTX drug [...] mainly focus on specific subsets of SNPs with non-genetic factors35,36 or other molecular signatures (transcription/epigenetic-based signatures).37 Most of the methotrexate predictive models employed simple machine learning models such as logistic regression and mainly focused on electronic medical records, or juvenile RA. Thus far, more complex ML models with careful domain-based, biologically meaningful feature handcrafting have yet to be applied comprehensively to improve predictive performance of response to [...] haplotype-based studies for other diseases mainly examined haplotypes of SNPs within specific window sizes38 or employed to account for familial correlation for association between rare haplotypes and complex [...], we report a novel, cost-effective approach through exome sequencing to interrogate haplotypes of SNPs in the coding regions of genes (pfcHap) of the entire human genome, since coding regions, which are translated into proteins, represent one of the most functional regions of genes.
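As a rough illustration of how coding-region haplotypes could be turned into ML features, the sketch below counts the copies of each haplotype per gene per patient. Everything in it is assumed. The gene labels, allele strings and patient IDs are hypothetical, and phasing is taken as already done. The 0/1/2 copy-count encoding is one common choice and not necessarily the encoding used in the study.

```python
from collections import Counter

# Hypothetical phased data: patient -> gene -> (haplotype on copy 1, copy 2).
# Each haplotype string lists the alleles at that gene's coding SNPs, in order.
phased = {
    "P1": {"GENE_A": ("CT", "CC"), "GENE_B": ("G", "G")},
    "P2": {"GENE_A": ("CC", "CC"), "GENE_B": ("G", "A")},
    "P3": {"GENE_A": ("TT", "CT"), "GENE_B": ("A", "A")},
}

def haplotype_features(phased):
    """Encode each (gene, haplotype) as a column holding its copy count (0, 1 or 2)."""
    columns = sorted(
        {(gene, hap)
         for genes in phased.values()
         for gene, pair in genes.items()
         for hap in pair}
    )
    table = {}
    for patient, genes in phased.items():
        counts = Counter(
            (gene, hap) for gene, pair in genes.items() for hap in pair
        )
        # Counter returns 0 for haplotypes the patient does not carry.
        table[patient] = [counts[col] for col in columns]
    return columns, table

columns, table = haplotype_features(phased)
for patient, row in table.items():
    print(patient, row)
```

Treating the whole haplotype as one feature, instead of one column per SNP, keeps co-occurring alleles within a gene together. That is the property the text appeals to when it says SNPs within a coding region may interact.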

