
GENOMIC SELECTION

Genomic BLUP Decoded: A Look into the Black Box of Genomic Prediction

David Habier,*,†,1 Rohan L. Fernando,* and Dorian J. Garrick*

*Department of Animal Science and Center for Integrated Animal Genomics, Iowa State University, Ames, Iowa 50011, and †DuPont Pioneer, Johnston, Iowa 50131

ABSTRACT Genomic best linear unbiased prediction (BLUP) is a statistical method that uses relationships between individuals calculated from single-nucleotide polymorphisms (SNPs) to capture relationships at quantitative trait loci (QTL). We show that genomic BLUP exploits not only linkage disequilibrium (LD) and additive-genetic relationships, but also cosegregation to capture relationships at QTL. Simulations were used to study the contributions of those types of information to accuracy of genomic estimated breeding values (GEBVs), their persistence over generations without retraining, and their effect on the correlation of GEBVs within families.



We show that accuracy of GEBVs based on additive-genetic relationships can decline with increasing training data size and speculate that modeling polygenic effects via pedigree relationships jointly with genomic breeding values using Bayesian methods may prevent that decline. Cosegregation information from half sibs contributes little to accuracy of GEBVs in current dairy cattle breeding schemes, but from full sibs it contributes considerably to accuracy within family in corn breeding. Cosegregation information also declines with increasing training data size, and its persistence over generations is lower than that of LD, suggesting the need to model LD and cosegregation explicitly. The correlation between GEBVs within families depends largely on additive-genetic relationship information, which is determined by the effective number of SNPs and training data size.

As genomic BLUP cannot capture short-range LD information well, we recommend Bayesian methods with t-distributed priors.

Genomic best linear unbiased prediction (BLUP) is a statistical method that has been used to predict height in humans (Yang et al.) and breeding values for selection in animal and plant breeding (VanRaden 2008). It uses a so-called genomic relationship matrix that describes genetic relationships between individuals calculated from genotypes at single-nucleotide polymorphisms (SNPs). In genomic selection applications (Meuwissen et al.), those individuals comprise both training individuals that are phenotyped for a quantitative trait and genotyped at SNPs and selection candidates that are genotyped. Genomic BLUP differs from the traditional pedigree BLUP (Henderson 1975) in the replacement of the pedigree relationship matrix with a genomic relationship matrix. Coefficients of the pedigree relationship matrix describe additive-genetic relationships (Malécot 1948) between individuals at quantitative trait loci (QTL) conditional on pedigree information, but it is not obvious to what extent the genomic relationship matrix explains genetic covariances between individuals at QTL.

Despite this, several authors called the genomic relationship matrix the actual (Hill and Weir 2011) or realized relationship matrix (Goddard 2009; Hayes et al.; Lee et al.) as it describes identity by descent at SNPs (Hayes et al.), assuming an ancient founder population. However, these terms are misleading because only genetic relationships at QTL matter in quantitative-genetic theory.

To understand better how genomic relationships capture relationships at QTL, we propose to apply concepts of pedigree analyses that define founders in a recent past generation. Based on these concepts, we show that coefficients of the genomic relationship matrix do not explain genetic covariances between individuals at QTL unless either there is linkage disequilibrium (LD) between QTL and SNPs measured in founders or selection candidates are related by pedigree to the training individuals.

The latter results in cosegregation of alleles at QTL and SNPs that are linked, also known as linkage information, and in additive-genetic relationships at QTL captured by SNPs. These three types of information affect differently the persistence of accuracy of genomic estimated breeding values (GEBVs) from BLUP over generations (Habier et al.), realized selection intensities, and inbreeding. The contributions of these parameters to accuracy of GEBVs depending on training data size, extent of LD, and mating design have not been demonstrated; a better understanding will allow us to optimize statistical models, training data, and […] observed in the training data were first believed to be the only source of information until Habier et al. (2007) and Gianola et al. (2009) demonstrated that SNP genotypes also capture pedigree relationships. Habier et al. (2007) partitioned the observed accuracy of GEBVs into a part due to LD in the training data and a remainder due to pedigree relationships. Accuracy due to LD is the component of accuracy that persists over generations without retraining and provides the accuracy for individuals that are unrelated to the training individuals. Compared to Bayesian methods with t-distributed priors (Meuwissen et al.), accuracy due to LD tends to be lower with genomic BLUP (Habier et al., 2010a, 2011).

Goddard (2009) presented formulas for calculating the accuracy due to LD, but derivations assume that the markers completely capture the variability at the QTL. Nevertheless, that accuracy was calculated as a function of the effective number of chromosomal segments, which was estimated only from effective population size and genome length.

Copyright 2013 by the Genetics Society of America. Received December 13, 2012; accepted for publication April 21, 2013. Supporting information is available online. Data deposited in the Dryad Repository. 1Corresponding author: DuPont Pioneer, 8305 NW 62nd Ave., PO Box 7060, Johnston, IA 50131-7060. Genetics, Vol. 194, 597-607, July 2013.

Real data analyses have shown that accuracy due to LD varies for quantitative traits with similar heritability (Habier et al., 2011), and thus different genetic architectures cannot be described by those and similar formulas (Daetwyler et al., 2010). Also, modeling pedigree relationships between training individuals and selection candidates is not straightforward if only LD parameters are used to explain accuracy.

Cosegregation is traditionally exploited in linkage analyses. The advantage of cosegregation information is the ability to explain both rare allelic variants and structural variations if they segregate within families. Several authors (Goddard 2009; Hayes et al.; Habier et al.; Goddard et al.) assumed it is utilized in genomic BLUP, but that has never been formally proven in the presence of LD and pedigree relationships or quantified.

A statistical method that explicitly models both LD and cosegregation was proposed for genomic selection (Calus et al.), but it did not outperform a Bayesian method similar to BayesA (Meuwissen et al.). The question remains, how much cosegregation is captured implicitly by genomic BLUP compared to methods that model LD and cosegregation explicitly (e.g., Meuwissen et al.; Fernando 2003; Pérez-Enciso 2003; Legarra and Fernando 2009)?

This article has two objectives: (1) to present concepts that allow us to disentangle LD, cosegregation, and additive-genetic relationships and (2) to study the contributions of these parameters to accuracy of GEBVs depending on SNP density, training data size, and extent of LD. Dairy cattle and corn breeding scenarios were simulated to evaluate accuracy of GEBVs both within and across families obtained by different types of information, discrepancy between accuracy of GEBVs due to additive-genetic relationships and accuracy of traditional pedigree-based selection indexes, persistence of accuracy due to LD and due to cosegregation from one generation to the next without retraining, and the effect of each type of information on the correlation of GEBVs within families.

Accuracies within families for the case of linkage equilibrium between QTL and SNPs were used to demonstrate unambiguously that genomic BLUP captures cosegregation, as there are no additive-genetic relationships within family. In addition, formulas for the covariance between true and estimated breeding values were derived for a simplified scenario to prove that all three sources of information are utilized by genomic BLUP.

Genetic model

Trait phenotypes of training individuals are simulated by the assumed true genetic model

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{W}\mathbf{a} + \mathbf{e} \tag{1}$$

(Goddard 2009; Hayes et al.; Goddard et al.), where $\mathbf{y}$, $\mathbf{a}$, and $\mathbf{e}$ are vectors containing trait phenotypes, additive QTL effects, and residual effects, respectively; $\mu$ is the overall mean; and $\mathbf{W}$ is a matrix of genotype scores at biallelic QTL. Each score is coded as the number of one of the two alleles at a locus adjusted by twice the frequency of the counted allele in founders.
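The genetic model in Equation 1 can be illustrated with a minimal simulation sketch. It assumes NumPy, allele counts coded 0/1/2, and normally distributed QTL and residual effects; the normality, the function name `simulate_phenotypes`, and all numeric values are assumptions made for illustration and are not taken from the article.

```python
import numpy as np

def simulate_phenotypes(qtl_counts, founder_freq, mu, sigma2_a, sigma2_e, rng):
    """Simulate y = 1*mu + W a + e (hypothetical helper, illustration only).

    qtl_counts   : (n, n_qtl) array of allele counts (0, 1, or 2) at biallelic QTL
    founder_freq : (n_qtl,) frequency of the counted allele in founders
    """
    # Genotype score: allele count adjusted by twice the founder allele frequency.
    W = qtl_counts - 2.0 * founder_freq
    n, n_qtl = W.shape
    # Assumption: effects are drawn from normal distributions with mean zero.
    a = rng.normal(0.0, np.sqrt(sigma2_a), size=n_qtl)  # additive QTL effects
    e = rng.normal(0.0, np.sqrt(sigma2_e), size=n)      # residual effects
    g = W @ a                                           # true breeding values
    y = mu + g + e                                      # trait phenotypes
    return y, g, W

# Toy data: 200 individuals, 50 QTL, arbitrary founder frequencies.
rng = np.random.default_rng(1)
p = rng.uniform(0.1, 0.9, size=50)
counts = rng.binomial(2, p, size=(200, 50)).astype(float)
y, g, W = simulate_phenotypes(counts, p, mu=10.0, sigma2_a=0.02, sigma2_e=1.0, rng=rng)
```

Drawing genotypes independently per locus, as in the toy data, ignores pedigree structure and LD, so the example only shows the coding and the additive model, not the breeding scenarios studied in the article.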

Both QTL and residual effects are treated as random with mean zero and with variance-covariance matrices $\mathbf{I}\sigma_a^2$ and $\mathbf{I}\sigma_e^2$, respectively. The aim of the following statistical analysis is to use $\mathbf{y}$ for estimating the true breeding value of an individual $i$ given by $g_i = \mathbf{w}_i'\mathbf{a}$, where $\mathbf{w}_i'$ contains QTL genotype scores.

Statistical model

Phenotypes generated by the genetic model are used in

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{g} + \mathbf{e}, \tag{2}$$

where $\mathbf{g}$ and $\mathbf{e}$ are vectors containing breeding values and residual effects, respectively. Breeding values in $\mathbf{g}$ are random with mean zero and variance-covariance matrix $\mathbf{G}\sigma_b^2$, where $\mathbf{G} = \mathbf{Z}\mathbf{Z}'$, $\mathbf{Z}$ is a matrix of genotype scores at $K$ SNPs (VanRaden 2008), $\sigma_b^2 = \sigma_A^2 \big/ \left[2\sum_{k=1}^{K} p_k(1-p_k)\right]$ (Habier et al.), $\sigma_A^2 = \sigma_a^2 \sum_{q=1}^{N_{qtl}} 2p_q(1-p_q)$ is the additive-genetic variance (Gianola et al., Equation 18), $\sigma_a^2$ is the variance of additive QTL effects with mean zero, $p_q$ is the allele frequency at QTL $q$ in founders, and $p_k$ is the allele frequency at SNP $k$ in founders.
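As a rough illustration of the statistical model in Equation 2, the sketch below builds $\mathbf{G} = \mathbf{Z}\mathbf{Z}'$ from centered SNP scores, computes $\sigma_b^2$ from $\sigma_A^2$ and the founder SNP frequencies, and obtains breeding value predictions with the textbook BLUP formula under known variance components. The solving strategy (generalized least squares for the mean, then $\hat{\mathbf{g}} = \mathbf{G}\sigma_b^2\mathbf{V}^{-1}(\mathbf{y} - \mathbf{1}\hat{\mu})$), the function name `gblup`, and the toy inputs are assumptions of this example; the text above does not state how the authors solved the model.

```python
import numpy as np

def gblup(y, snp_counts, founder_freq, sigma2_A, sigma2_e):
    """Genomic BLUP with known variance components (hypothetical helper).

    snp_counts   : (n, K) allele counts (0, 1, or 2) at K SNPs
    founder_freq : (K,) SNP allele frequencies in founders
    """
    n = len(y)
    # Centered genotype scores, same coding as for QTL in the genetic model.
    Z = snp_counts - 2.0 * founder_freq
    G = Z @ Z.T
    # sigma2_b = sigma2_A / (2 * sum_k p_k (1 - p_k))
    sigma2_b = sigma2_A / (2.0 * np.sum(founder_freq * (1.0 - founder_freq)))
    # Phenotypic covariance implied by y = 1*mu + g + e.
    V = G * sigma2_b + np.eye(n) * sigma2_e
    ones = np.ones(n)
    # Generalized least-squares estimate of the overall mean.
    mu_hat = (ones @ np.linalg.solve(V, y)) / (ones @ np.linalg.solve(V, ones))
    # BLUP of breeding values: Cov(g, y) V^{-1} (y - 1*mu_hat).
    g_hat = sigma2_b * G @ np.linalg.solve(V, y - mu_hat)
    return mu_hat, g_hat, G, sigma2_b

# Toy usage with arbitrary values (not from the article).
rng = np.random.default_rng(7)
p_snp = rng.uniform(0.1, 0.9, size=300)
snps = rng.binomial(2, p_snp, size=(100, 300)).astype(float)
y_toy = 10.0 + rng.normal(0.0, 1.0, size=100)
mu_hat, g_hat, G, sigma2_b = gblup(y_toy, snps, p_snp, sigma2_A=0.5, sigma2_e=1.0)
```

In this sketch, selection candidates without phenotypes would need the covariance between their breeding values and the training phenotypes in place of the full $\mathbf{G}\sigma_b^2$; that extension is left out to keep the example short.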

