
Ethnicity Estimate 2021 White Paper

Jeffrey Adrion, Nathan Berkowitz, Keith Noto, Alisa Sedghifar, Barry Starr, David Turissini, Yong Wang, Aaron Wolf (in alphabetical order)

Summary:
The AncestryDNA science team has developed a fast, sophisticated, and accurate method for estimating the historical origins of customers' DNA going back several hundred to over 1,000 years. Our newest approach increases the number of possible regions that a customer might be assigned (from 70 to 77) and improves the accuracy of both the regions assigned and the percentage assigned to each region.




We have added seven net new regions and improved the composition of our reference panel, resulting in more accurate estimates overall. Given the cutting-edge nature of this type of science, we will continue to refine and improve our approach.

The basic idea behind ethnicity estimation is to compare a customer's DNA to the DNA of people with long family histories in a particular region or group, what we call the reference panel, and to look for segments of DNA that are most similar. If, for example, a section of a customer's DNA looks most similar to DNA in the reference panel from people from Norway, that section of the customer's DNA is said to be from Norway, and so on.

The end result is a portrait of a customer's DNA made up of percentages of the 77 regions contained in the reference panel.

This is a short version of how AncestryDNA determines a customer's ethnicity estimate. The rest of the white paper will delve more deeply into:
1. How the reference panel samples are chosen, their makeup, and how the panel is validated
2. How the algorithm that determines a customer's genetic ethnicity works and how it is validated

1. Introduction

Genetic ethnicity estimates that determine which populations in a reference panel are most similar to someone's DNA are a major component of the DNA Story provided by AncestryDNA. As its name suggests, DNA Story provides customers with insights into their past by analyzing their DNA.

AncestryDNA has employed a team of highly trained scientists with backgrounds in population genetics, statistics, machine learning, and computational biology to develop a fast, sophisticated, and accurate method for estimating genetic ethnicity for our customers.

In this document, we describe the approach we use to estimate customers' genetic ethnicity. We will discuss the development of the reference panel we compare each customer sample against, the inference method we apply to estimate genetic ethnicity, and finally the extensive testing regimen we employ to assess the quality of our estimates.

Admixed: Having ancestry from multiple populations.
Allele: A variant in the DNA sequence. For example, a SNP (defined below) could have two alleles.
Centimorgan (cM): A unit of genetic length in the genome. Two genomic positions that are a centimorgan apart have a 1% chance during each meiosis (the cell division that creates egg cells or sperm) of experiencing a recombination event between them.
Chromosome: A large, inherited piece of DNA. Humans typically have 23 pairs of chromosomes, with one copy of each pair inherited from each parent.
Genome: All of someone's genetic information; the DNA on all chromosomes.
Genotype: A general term for observed genetic variation, either for a single site or the whole genome.
Haplotype: A stretch of DNA along a chromosome.
Hidden Markov model (HMM): A statistical model for determining a series of hidden states based on a set of observations.
Locus: A location in the genome. It could be a single site or a larger stretch of DNA.
Microarray: A DNA microarray is a way to analyze hundreds of thousands of DNA markers all at once.
Nucleotide: DNA is composed of strings of molecules called nucleotides (also called bases). There are four different types, usually represented by their initials: A, C, G, T.
Population: A group of people.
Phasing: The assignment of DNA to contiguous segments corresponding to the DNA inherited from Mom or Dad. This is done with an algorithm.
Recombination: Before chromosomes are passed down from parent to child, each pair of chromosomes usually exchanges long segments, which are then reattached, in a process called recombination.
Single nucleotide polymorphism (SNP): A single position (nucleotide) in the genome where different variants (alleles) are seen in different people.

2. Reference Panel

Calculating an Ethnicity Estimate

Two chromosomes from the same geographic region or the same population will share more DNA with one another than will two chromosomes from different regions or groups.

So two pieces of DNA with a historical connection to Portugal will have more DNA in common than will a piece of DNA from Korea and a piece of DNA from Portugal. This is the basic premise behind the ethnicity estimate AncestryDNA provides to its customers.

To create the ethnicity estimate, we compare a customer's DNA to a panel of DNA from people with known origins (referred to as the reference panel) and look to see which parts of the customer's DNA are similar to those from people represented in groups in the reference panel. If, for example, a section of a customer's DNA is most similar to the reference panel samples from Senegal, then we identify that section of the customer's DNA as coming from Senegal.

The accuracy of our ethnicity estimate depends on the quality of our reference panel.
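The comparison described above can be illustrated with a toy sketch. It is not AncestryDNA's inference method: it uses a simple per-segment nearest-match rule, and the panel contents, segment strings, and the names `REFERENCE_PANEL`, `similarity`, `assign_segment`, and `estimate` are all hypothetical, chosen only to show how per-segment labels turn into percentages.

```python
from collections import Counter

# Toy reference panel (made-up data): each population maps to a few reference
# segments, written as strings of alleles at the same five SNP positions.
REFERENCE_PANEL = {
    "Senegal": ["AACGT", "AACGA"],
    "Norway": ["TTCGA", "TTGGA"],
    "Korea": ["ATGCA", "ATGCT"],
}


def similarity(a, b):
    """Fraction of positions at which two equal-length segments match."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def assign_segment(segment, panel):
    """Return the population whose best-matching reference segment is most similar."""
    return max(
        panel,
        key=lambda pop: max(similarity(segment, ref) for ref in panel[pop]),
    )


def estimate(segments, panel):
    """Label every segment, then report the percentage of segments per population."""
    labels = Counter(assign_segment(seg, panel) for seg in segments)
    return {pop: 100 * count / len(segments) for pop, count in labels.items()}


# Four customer segments (also made up) at the same five SNP positions.
customer_segments = ["AACGT", "TTCGA", "AACGA", "TTGGA"]
result = estimate(customer_segments, REFERENCE_PANEL)
print(result)
```

With this toy panel, half of the segments are labeled Senegal and half Norway. A segment can only ever receive a label that exists in the panel, so the output is only as good as the reference samples behind it.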

Because of this, AncestryDNA has invested a significant amount of effort in developing the best possible reference panel.

Figure: Reference Panel Refinement. The ethnicity estimation reference panel refinement cycle. In step 1 we select candidate reference samples from published data, the AncestryDNA customer list, and the AncestryDNA proprietary reference collection. For AncestryDNA samples we rely on pedigree data to select those with deep ancestry from a single population. In step 2 we filter out pieces of DNA shared between closely related samples from the candidate list. In step 3 we use principal component analysis (PCA) to remove samples that show a disagreement in pedigree and genetic origin.

We also use PCA to guide the identification of population groups. In step 4 the panel is performance tested using numerous metrics and compared to the previous release. The final result is a high-quality, well-tested reference panel. The entire procedure is cyclic, and AncestryDNA will continue to make improvements to the panel with the goal of providing the most accurate ethnicity estimation possible.

The rest of section 2 describes the steps taken to develop our current reference panel, including sample selection, quality control, and testing. The ethnicity update that we describe here not only updates the reference panel from our 2020 version but also increases the number of global regions from 70 to 77.

Who should be included in the reference panel?

Identifying the best candidates for the reference panel is key to providing the most accurate ethnicity estimate possible from a customer's DNA sample. Under perfect circumstances, we would construct our reference panel using DNA samples from people who lived hundreds of years ago. Unfortunately, it is not yet possible to reliably sample historical populations in this way. Instead, we must rely on DNA samples collected from people alive today and focus on those who can trace their ancestry to a single geographic location or population.

When asked to trace familial origins, most people can only reliably go back one to five generations, making it difficult to find individuals with knowledge about more distant ancestry.
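As a rough illustration of selecting candidates who can trace their ancestry to a single location, the sketch below filters a made-up candidate list by pedigree. The `Candidate` record, the `single_origin` function, and the three-generation threshold are assumptions made for this example; the white paper does not specify the rule or data format AncestryDNA uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Candidate:
    sample_id: str
    # Birth regions of documented ancestors, keyed by generation
    # (1 = parents, 2 = grandparents, 3 = great-grandparents, ...).
    ancestor_regions: Dict[int, List[str]]


def single_origin(candidate: Candidate, min_generations: int = 3) -> Optional[str]:
    """Return the region if every ancestor in generations 1..min_generations
    is documented and all come from the same region; otherwise return None."""
    regions = set()
    for gen in range(1, min_generations + 1):
        ancestors = candidate.ancestor_regions.get(gen, [])
        if len(ancestors) < 2 ** gen:  # pedigree is incomplete at this depth
            return None
        regions.update(ancestors)
    return regions.pop() if len(regions) == 1 else None


# Made-up candidates: complete single-region pedigree, mixed pedigree,
# and a pedigree that stops after two generations.
candidates = [
    Candidate("s1", {1: ["Portugal"] * 2, 2: ["Portugal"] * 4, 3: ["Portugal"] * 8}),
    Candidate(
        "s2",
        {
            1: ["Portugal", "Korea"],
            2: ["Portugal"] * 2 + ["Korea"] * 2,
            3: ["Portugal"] * 4 + ["Korea"] * 4,
        },
    ),
    Candidate("s3", {1: ["Norway"] * 2, 2: ["Norway"] * 4}),
]

selected = {}
for c in candidates:
    region = single_origin(c)
    if region is not None:
        selected[c.sample_id] = region
print(selected)
```

Only `s1` passes: `s2` has ancestors from two regions, and `s3` lacks a documented third generation. Raising `min_generations` makes the filter stricter, but fewer people can document ancestry that far back, which is the difficulty the text describes.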

