
Lecture 10: Multiple Testing - UW Genome Sciences


Why Multiple Testing Matters: Genomics = Lots of Data = Lots of Hypothesis Tests. A typical microarray experiment might result in performing 10,000 separate hypothesis tests.



Transcription of Lecture 10: Multiple Testing - UW Genome Sciences

Lecture 10: Multiple Testing

Goals:
- Define the Multiple Testing problem and related concepts
- Methods for addressing Multiple Testing (FWER and FDR)
- Correcting for Multiple Testing in R

Type I and II Errors

Decision vs. actual situation (truth):
- Do not reject H0 when H0 is true: correct decision (probability 1 - α)
- Do not reject H0 when H0 is false: Type II error (probability β)
- Reject H0 when H0 is true: Type I error (probability α)
- Reject H0 when H0 is false: correct decision (probability 1 - β)

α = P(Type I Error), β = P(Type II Error)

Why Multiple Testing Matters

Genomics = Lots of Data = Lots of Hypothesis Tests. A typical microarray experiment might result in performing 10,000 separate hypothesis tests. If we use a standard p-value cut-off of 0.05, we'd expect 500 genes to be deemed significant by chance.

Why Multiple Testing Matters

In general, if we perform m hypothesis tests, what is the probability of at least 1 false positive?

P(Making an error) = α
P(Not making an error) = 1 - α
P(Not making an error in m tests) = (1 - α)^m
P(Making at least 1 error in m tests) = 1 - (1 - α)^m

Probability of At Least 1 False Positive
[figure slide; not captured in this transcription]

Counting Errors

Assume we are testing H1, H2, ..., Hm
- m0 = # of true null hypotheses
- R = # of rejected hypotheses
- V = # of Type I errors [false positives]

                   Called Significant   Not Called Significant   Total
True Null          V                    U                        m0
True Alternative   S                    T                        m - m0
Total              R                    m - R                    m
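As a quick aside (not part of the original slides), the probability calculation above is easy to check in R; the α of 0.05 and the vector of test counts are arbitrary example values.

  # Probability of at least one false positive among m independent tests at level alpha
  alpha <- 0.05
  m <- c(1, 10, 100, 1000, 10000)
  round(1 - (1 - alpha)^m, 4)
  # approaches 1 rapidly: with 100 tests it is already above 0.99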

What Does Correcting for Multiple Testing Mean?

- When people say "adjusting p-values for the number of hypothesis tests performed", what they mean is controlling the Type I error rate
- This is a very active area of statistics: many different methods have been described
- Although these varied approaches have the same goal, they go about it in fundamentally different ways

Different Approaches To Control Type I Errors

- Per-comparison error rate (PCER): the expected value of the number of Type I errors over the number of hypotheses, PCER = E(V)/m
- Per-family error rate (PFER): the expected number of Type I errors, PFER = E(V)

- Family-wise error rate (FWER): the probability of at least one Type I error, FWER = P(V ≥ 1)
- False discovery rate (FDR): the expected proportion of Type I errors among the rejected hypotheses, FDR = E(V/R | R > 0) P(R > 0)
- Positive false discovery rate (pFDR): the rate that discoveries are false, pFDR = E(V/R | R > 0)

Digression: p-values

- Implicit in all Multiple Testing procedures is the assumption that the distribution of p-values is "correct"
- This assumption often is not valid for genomics data, where p-values are obtained by asymptotic theory
- Thus, resampling (permutation) methods are often used to calculate p-values:
  1. Set up the problem: think carefully about the null and alternative hypotheses
  2. Choose a test statistic
  3. Calculate the test statistic for the original labeling of the observations
  4. Permute the labels and recalculate the test statistic
     - Do all permutations: Exact Test
     - Randomly selected subset of permutations: Monte Carlo p-values

  5. Calculate the Monte Carlo p-value by comparing where the observed test statistic value lies in the permuted distribution of test statistics (an R sketch of this procedure follows the example below)

Example: What to Permute?

Gene expression matrix of m genes measured in 4 cases and 4 controls:

Gene   Case 1   Case 2   Case 3   Case 4   Control 1   Control 2   Control 3   Control 4
1      X11      X12      X13      X14      X15         X16         X17         X18
2      X21      X22      X23      X24      X25         X26         X27         X28
3      X31      X32      X33      X34      X35         X36         X37         X38
4      X41      X42      X43      X44      X45         X46         X47         X48
...
m      Xm1      Xm2      Xm3      Xm4      Xm5         Xm6         Xm7         Xm8
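To make the permutation recipe concrete, here is a minimal R sketch for a single gene with 4 cases and 4 controls; the simulated expression values, the use of a t-statistic, and the number of random permutations are illustrative assumptions rather than anything specified in the slides.

  set.seed(1)
  expr   <- c(rnorm(4, mean = 1), rnorm(4, mean = 0))    # one gene: 4 cases, 4 controls (simulated)
  labels <- factor(c(rep("case", 4), rep("control", 4)))

  # Test statistic for the original labeling
  obs_stat <- t.test(expr ~ labels)$statistic

  # Permute the case/control labels and recalculate the statistic
  # (only choose(8, 4) = 70 distinct relabelings exist here, so an exact test is also feasible)
  B <- 1000
  perm_stats <- replicate(B, t.test(expr ~ sample(labels))$statistic)

  # Monte Carlo p-value: where the observed statistic falls in the permuted distribution
  mean(abs(perm_stats) >= abs(obs_stat))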

Approaches To Multiple Testing: FWER

Many procedures have been developed to control the Family-Wise Error Rate (the probability of at least one Type I error), P(V ≥ 1). There are two general types of FWER corrections:
1. Single step: equivalent adjustments made to each p-value
2. Sequential: adaptive adjustments made to each p-value

Single Step Approach: Bonferroni

- A very simple method for ensuring that the overall Type I error rate of α is maintained when performing m independent hypothesis tests
- Rejects any hypothesis with p-value ≤ α/m; equivalently, the adjusted p-values are p_j(adj) = min(m * p_j, 1)
- For example, if we want an experiment-wide Type I error rate of 0.05 when we perform 10,000 hypothesis tests, we'd need a p-value of 0.05/10000 = 5 x 10^-6 to declare significance

Philosophical Objections to Bonferroni Corrections

- "Bonferroni adjustments are, at best, unnecessary and, at worst, deleterious to sound statistical inference" (Perneger, 1998)
- Counter-intuitive: the interpretation of a finding depends on the number of other tests performed
- The general null hypothesis (that all the null hypotheses are true) is rarely of interest
- High probability of Type II errors, i.e. of not rejecting the general null hypothesis when important effects exist
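A quick R illustration of the Bonferroni adjustment described above (not from the slides): p.adjust() in base R implements it, and the simulated p-values and the 0.05 cut-off are example values.

  set.seed(42)
  pvals <- runif(10000)                                # 10,000 simulated null p-values

  # Bonferroni-adjusted p-values: min(m * p_j, 1)
  p_bonf <- p.adjust(pvals, method = "bonferroni")
  all.equal(p_bonf, pmin(length(pvals) * pvals, 1))    # same as the manual formula

  # Rejections at an experiment-wide alpha of 0.05 (equivalent to p < 0.05 / 10000)
  sum(p_bonf < 0.05)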

FWER: Sequential Adjustments

- The simplest sequential method is Holm's Method
- Order the unadjusted p-values such that p_1 ≤ p_2 ≤ ... ≤ p_m
- For control of the FWER at level α, the step-down Holm adjusted p-values are p_j(adj) = min((m - j + 1) * p_j, 1)
- The point here is that we don't multiply every p_j by the same factor m. For example, when m = 10000: p_1(adj) = 10000 * p_1, p_2(adj) = 9999 * p_2, ..., p_m(adj) = 1 * p_m
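Holm's step-down method is also available through base R's p.adjust(); the simulated mixture of p-values below is only an illustration.

  set.seed(7)
  pvals <- c(runif(9900), rbeta(100, 1, 50))     # mostly null, plus some p-values skewed toward 0

  p_holm <- p.adjust(pvals, method = "holm")     # step-down: sorted p_j scaled by (m - j + 1)
  p_bonf <- p.adjust(pvals, method = "bonferroni")

  # Holm is never more conservative than Bonferroni
  c(holm = sum(p_holm < 0.05), bonferroni = sum(p_bonf < 0.05))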

Who Cares About Not Making ANY Type I Errors?

- FWER is appropriate when you want to guard against ANY false positives
- However, in many cases (particularly in genomics) we can live with a certain number of false positives
- In these cases, the more relevant quantity to control is the false discovery rate (FDR)

False Discovery Rate

                   Called Significant   Not Called Significant   Total
True Null          V                    U                        m0
True Alternative   S                    T                        m - m0
Total              R                    m - R                    m

V = # of Type I errors [false positives]

The false discovery rate (FDR) is designed to control the proportion of false positives among the set of rejected hypotheses (R).

FDR vs FPR

FDR = V / R
FPR = V / m0

Benjamini and Hochberg FDR

To control the FDR at level α:
1. Order the unadjusted p-values: p_1 ≤ p_2 ≤ ... ≤ p_m
2. Find the test with the highest rank, j, for which the p-value, p_j, is less than or equal to (j/m) * α
3. Declare the tests of rank 1, 2, ..., j as significant

B&H FDR worked example
[table not fully captured in the transcription; its columns are Rank (j), P-value, (j/m) * α, and whether H0 is rejected, controlling the FDR at level α]

Storey's positive FDR (pFDR)

BH:     FDR  = E(V/R | R > 0) P(R > 0)
Storey: pFDR = E(V/R | R > 0)

- Since P(R > 0) is ~1 in most genomics experiments, FDR and pFDR are very similar
- Omitting P(R > 0) facilitated the development of a measure of significance in terms of the FDR for each hypothesis
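A brief R sketch of the Benjamini-Hochberg procedure; method = "BH" in base R's p.adjust() corresponds to it, and the simulated p-values and α = 0.05 are example choices.

  set.seed(3)
  pvals <- c(runif(9500), rbeta(500, 1, 200))    # simulated: mostly null tests
  m     <- length(pvals)
  alpha <- 0.05

  # BH-adjusted p-values: rejecting those <= alpha controls the FDR at alpha
  p_bh <- p.adjust(pvals, method = "BH")
  sum(p_bh <= alpha)

  # The equivalent step-up rule from the slides: largest rank j with p_(j) <= (j/m) * alpha
  p_sorted <- sort(pvals)
  max(which(p_sorted <= (seq_len(m) / m) * alpha), 0)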

What's a q-value?

- The q-value is defined as the minimum FDR that can be attained when calling that feature significant (i.e., the expected proportion of false positives incurred when calling that feature significant)
- The estimated q-value is a function of the p-value for that test and the distribution of the entire set of p-values from the family of tests being considered (Storey and Tibshirani, 2003)
- Thus, in an array study testing for differential expression, if gene X has a q-value of q, it means that a proportion q of the genes that show p-values at least as small as gene X's are false positives

Estimating The Proportion of Truly Null Tests

- Under the null hypothesis, p-values are expected to be uniformly distributed between 0 and 1
- Under the alternative hypothesis, p-values are skewed towards 0
- The combined distribution is a mixture of p-values from the null and the alternative

- For p-values greater than some cut-off λ, we can assume they mostly represent observations from the null
- The proportion of truly null tests, π0, can then be estimated as:

  π0(λ) estimate = #{p_i > λ; i = 1, 2, ..., m} / (m * (1 - λ))

- 1 - π0 is the proportion of truly alternative tests (very useful!)
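A minimal R sketch of this π0 estimate; the choice λ = 0.5, the simulated p-values, and the scaling of BH-adjusted p-values into rough q-values are illustrative assumptions (the Bioconductor qvalue package implements the refined Storey-Tibshirani approach).

  set.seed(11)
  m <- 10000
  pvals <- c(runif(9000), rbeta(1000, 1, 100))   # simulated: 90% truly null, 10% alternative

  # pi0 estimate: p-values above lambda are assumed to come mostly from the null
  lambda  <- 0.5
  pi0_hat <- sum(pvals > lambda) / (m * (1 - lambda))
  pi0_hat                                        # should land near the true proportion of 0.9

  # Rough q-values: BH-adjusted p-values scaled by the pi0 estimate
  q_rough <- pi0_hat * p.adjust(pvals, method = "BH")
  sum(q_rough <= 0.05)                           # features called significant at an FDR of ~5%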

