Transcription of q-value - CBCB
q-value
Tiffany Chao, Beth Johnson, Steven Lee

Hypothesis testing
- Test for each gene. Null hypothesis: no differential expression.
- Two kinds of errors:
  - Type I error (false positive): say that a gene is differentially expressed when it actually isn't; wrongly reject a true null hypothesis.
  - Type II error (false negative): say that a gene isn't differentially expressed when it actually is; fail to reject a false null hypothesis.

Thinking about p-values
- Probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming the null hypothesis is true.
- Minimum false positive rate at which an observed statistic can be called significant.
- If the null hypothesis is simple, then a null p-value is uniformly distributed.

Multiple comparison problem
Even if we have useful approximations for our p-values,
we still face the multiple comparison problem.
- When performing many independent tests, p-values no longer have the same interpretation.
- Not only in genomics!
  - "Statistical Comparisons of Classifiers over Multiple Datasets", Demsar, JMLR 2006.
  - "Permutation Tests for Studying Classifier Performance", Ojala, JMLR 2010.
  - "On Comparing Classifiers: Pitfalls to avoid and a recommended approach", Salzberg, 1997, Data Mining and Knowledge Discovery.

Multiple hypothesis testing

                    Called significant    Called not significant    Total
Null true           $F$                   $m_0 - F$                 $m_0$
Alternative true    $T$                   $m_1 - T$                 $m_1$
Total               $S$                   $m - S$                   $m$

Suppose we care about p-values.

Error rates (more on this later)
- Per comparison error rate (PCER): $E[F] / m$
- Per family error rate (PFER): $E[F]$
- Family-wise error rate (FWER): $\Pr(F \ge 1)$
- False discovery rate (FDR)*: $E[F/S]$ (and set $F/S = 0$ when $S = 0$) $= E[F/S \mid S > 0] \Pr(S > 0)$
- Positive false discovery rate (pFDR)*: $E[F/S \mid S > 0]$

MHT error controlling procedure
- Suppose you test m hypotheses and get m p-values: $p_1, p_2, p_3, \ldots, p_m$.
- A multiple hypothesis test error controlling procedure is a function $T(\mathbf{p}; \alpha)$ such that rejecting all nulls with $p_i \le T(\mathbf{p}; \alpha)$ implies that Error $\le \alpha$.
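To make the error rates concrete, here is a minimal simulation sketch of the case where every null hypothesis is true, so each p-value is Uniform(0, 1). The values of m, alpha, the number of repeated experiments, and the random seed are illustrative assumptions, not taken from the slides.

```python
import random

def simulate_null_pvalues(m, rng):
    """Draw m p-values under the null; each is Uniform(0, 1)."""
    return [rng.random() for _ in range(m)]

def count_false_positives(pvalues, alpha):
    """F: number of (null) p-values called significant at per-test level alpha."""
    return sum(p <= alpha for p in pvalues)

rng = random.Random(0)                    # fixed seed, for repeatability only
m, alpha, n_experiments = 1000, 0.05, 200  # assumed values for illustration

false_positive_counts = [
    count_false_positives(simulate_null_pvalues(m, rng), alpha)
    for _ in range(n_experiments)
]

pfer = sum(false_positive_counts) / n_experiments                  # estimates E[F]
pcer = pfer / m                                                    # estimates E[F] / m
fwer = sum(f >= 1 for f in false_positive_counts) / n_experiments  # estimates Pr(F >= 1)

print(f"PFER ~ {pfer:.1f}, PCER ~ {pcer:.3f}, FWER ~ {fwer:.2f}")
```

With an uncorrected per-test cutoff, the per-comparison rate stays near alpha, but the expected number of false positives grows with m and at least one false positive becomes almost certain.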
Error is a population quantity (not random).

Weak and strong control
- Weak: $T(\mathbf{p}; \alpha)$ is such that Error $\le \alpha$ only when $m_0 = m$.
- Strong: $T(\mathbf{p}; \alpha)$ is such that Error $\le \alpha$ for any value of $m_0$.
  - Note that $m_0$ is not an argument of $T(\mathbf{p}; \alpha)$!
- Bonferroni correction (reject nulls with $p_i \le \alpha / m$) provides strong control of the FWER, but is too restrictive.

Why FDR and q-value?
To help us interpret these values, two pieces of information would be useful:
- An estimate of the overall proportion of features that are truly alternative (even if they cannot be precisely identified).
- A measure of significance that can be associated with each feature, so that thresholding the numbers at a particular value has an easy interpretation.
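The trade-off behind "strong control, but too restrictive" can be sketched with a small simulation. The Bonferroni cutoff alpha / m is the standard definition; everything else here is an assumption for illustration: the mix of 900 null and 100 alternative features, the seed, and in particular the model for alternative p-values (a uniform draw raised to the 10th power, which simply piles them up near 0).

```python
import random

def bonferroni_threshold(alpha, m):
    """Per-test cutoff alpha / m; rejecting p_i <= cutoff keeps FWER <= alpha."""
    return alpha / m

rng = random.Random(1)
m, m0, alpha, n_experiments = 1000, 900, 0.05, 300  # assumed values
cutoff = bonferroni_threshold(alpha, m)

experiments_with_false_positive = 0
true_pos_bonferroni = 0
true_pos_uncorrected = 0
for _ in range(n_experiments):
    null_p = [rng.random() for _ in range(m0)]
    # Hypothetical alternative model: U**10 concentrates p-values near 0.
    alt_p = [rng.random() ** 10 for _ in range(m - m0)]
    experiments_with_false_positive += any(p <= cutoff for p in null_p)
    true_pos_bonferroni += sum(p <= cutoff for p in alt_p)
    true_pos_uncorrected += sum(p <= alpha for p in alt_p)

n_alternatives = n_experiments * (m - m0)
fwer = experiments_with_false_positive / n_experiments
power_bonferroni = true_pos_bonferroni / n_alternatives
power_uncorrected = true_pos_uncorrected / n_alternatives

print(f"FWER with Bonferroni ~ {fwer:.3f}")
print(f"fraction of true alternatives found: "
      f"Bonferroni {power_bonferroni:.2f} vs uncorrected {power_uncorrected:.2f}")
```

Under these assumptions the family-wise error rate stays below alpha, while a noticeably smaller share of the truly alternative features is called significant than with the uncorrected cutoff.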
We would like an error measure that provides a balance between:
- Number of false positive features (F).
- Number of true positive features (T).

FDR
- The false discovery rate is the expected value of the proportion of false positive features among all those called significant*: $\mathrm{FDR} = E[F / S]$
- *There is some possibility that $S = 0$, so some adjustment has to be made to the definition of FDR.

Estimating FDR
- Therefore, the FDR depends on what threshold (t) we are using to determine significance: $\mathrm{FDR}(t) = E[F(t) / S(t)]$
- Because we are considering many features (m is very large), we can approximate $\mathrm{FDR}(t) \approx E[F(t)] / E[S(t)]$
We now need to approximate $E[S(t)]$ and $E[F(t)]$.
- To illustrate how FDR is determined: for m genes we have m p-values, denoted $p_1, p_2, \ldots, p_m$.
- Define $F(t) = \#\{\text{null } p_i \le t\}$ and $S(t) = \#\{p_i \le t\}$.
- $S(t)$ can be counted directly for a given t.
- Approximating $F(t)$ is more difficult because we do not know how many values called significant were truly null.
- Assuming null p-values are uniformly distributed, $\Pr(\text{null } p \le t) = t$, so $E[F(t)] = m_0 \cdot t$ (# of null features × probability of a null feature being called significant).
- We do not know the true value of $m_0$ (# of null features), so we must estimate it.
- Equivalently, we can estimate the proportion of features that are truly null, denoted by $\pi_0 = m_0 / m$.
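A short simulation sketch shows why counting S(t) is easy while F(t) has to be approximated. In a simulation we know which p-values are null, so F(t) can be compared with m0 · t directly. The sizes m0 and m1, the threshold t, the seed, and the alternative p-value model (uniform draw to the 10th power) are assumptions for illustration.

```python
import random

def count_significant(pvalues, t):
    """Number of p-values at or below the threshold t."""
    return sum(p <= t for p in pvalues)

rng = random.Random(2)
m0, m1, t = 8000, 2000, 0.05  # assumed values
null_p = [rng.random() for _ in range(m0)]
alt_p = [rng.random() ** 10 for _ in range(m1)]  # hypothetical alternative model

s_t = count_significant(null_p + alt_p, t)  # S(t): observable in a real study
f_t = count_significant(null_p, t)          # F(t): only known because we simulated
expected_f_t = m0 * t                       # approximation of E[F(t)]

print(f"S(t) = {s_t}, F(t) = {f_t}, m0 * t = {expected_f_t:.0f}")
```

In a real study only `s_t` is available, which is why m0 (or equivalently π0) has to be estimated.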
Assuming a uniform distribution for null p-values, we can estimate this quantity using a histogram.

Estimating $\pi_0$
- Find where the p-values look like a uniform distribution and set $\lambda$.
- $\hat{\pi}_0(\lambda) = \#\{p_i > \lambda\} / (m (1 - \lambda))$
- Note $\pi_0$ does not depend on t.
- Can also fit a cubic function to the $\hat{\pi}_0(\lambda)$ vs $\lambda$ data to determine $\hat{\pi}_0(1)$ (because most of the p-values near 1 would be expected to be null).

FDR
- Estimate for the false discovery rate: $\widehat{\mathrm{FDR}}(t) = \hat{\pi}_0 \, m \, t / S(t)$

[Figure: Graphical Interpretation]

q-value definition
- For a given feature, the q-value is the expected FDR incurred if it is called significant (every other feature with $p_j \le p_i$ is also called significant).
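The π0 and FDR estimates can be sketched in a few lines. This assumes the usual forms of the estimators: the share of p-values above a tuning value λ divided by (1 - λ), and π0 · m · t divided by the number of p-values at or below t. The function names, λ = 0.5, the simulated data, and the seed are illustrative assumptions.

```python
import random

def estimate_pi0(pvalues, lam):
    """Share of p-values above lam, rescaled by the width (1 - lam) of that region."""
    m = len(pvalues)
    return sum(p > lam for p in pvalues) / (m * (1.0 - lam))

def estimate_fdr(pvalues, t, pi0):
    """Estimated FDR at threshold t: pi0 * m * t / S(t)."""
    m = len(pvalues)
    s_t = sum(p <= t for p in pvalues)
    if s_t == 0:
        return 0.0  # mirrors the convention F/S = 0 when S = 0
    return pi0 * m * t / s_t

rng = random.Random(3)
m0, m1 = 8000, 2000  # assumed: 80% of features are truly null
pvalues = ([rng.random() for _ in range(m0)]
           + [rng.random() ** 10 for _ in range(m1)])  # hypothetical alternative model

pi0_hat = estimate_pi0(pvalues, lam=0.5)
fdr_hat = estimate_fdr(pvalues, t=0.01, pi0=pi0_hat)

print(f"pi0 estimate: {pi0_hat:.3f} (true value in this simulation: {m0 / (m0 + m1):.1f})")
print(f"estimated FDR at t = 0.01: {fdr_hat:.3f}")
```

In this setup the π0 estimate tends to land slightly above the true proportion, because a few alternative p-values also fall above λ; that errs on the conservative side.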
In practical terms: a q-value threshold is the "proportion of significant features that turn out to be false leads".

q-value
- A measure of each feature's significance.
- The p-value is in terms of the false positive rate, whereas the q-value is in terms of the FDR.
  - This takes into account that thousands of features are simultaneously being tested (via FDR).
  - Uses a better model of where the significant features are likely to be.

p vs q
Example: m = 10000.
- p-values:
  - A cutoff at .01 assumes that you likely found about 100 false positives.
  - A cutoff of .0001 assumes that you only found 1 false positive, but at what cost?
- q-values:
  - Set the q-value cutoff at .05, and expect that only about 5% of the significant genes found are false positives.

Algorithm for Determining q-Values
- Compute the test statistic (p-value) for each of the m genes.
- Estimate $\pi_0$:
  - Using a histogram: find the region where the p-values are uniform and set $\lambda$; count the p-values $> \lambda$ and compare with $(1 - \lambda) m$, the number of values expected above $\lambda$ if every feature were null.
  - Using a cubic spline.
- For each p-value, calculate the FDR for each threshold $t \ge p$.
  - Only choose t values for each unique p in the gene set.
- Choose the minimum FDR as the q-value.

[Plot; axis label: q-value (cutoff)]
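The steps of the algorithm can be sketched as follows. This is a minimal illustration, not a reference implementation: it uses a single fixed λ = 0.5 for the π0 estimate instead of the histogram inspection or cubic spline, and the function names, simulated data, and seed are assumptions.

```python
import random

def estimate_pi0(pvalues, lam=0.5):
    """pi0 estimate from the share of p-values above lam, capped at 1."""
    m = len(pvalues)
    return min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))

def qvalues(pvalues, lam=0.5):
    """q-value of each p-value: minimum estimated FDR over thresholds t >= p."""
    m = len(pvalues)
    pi0 = estimate_pi0(pvalues, lam)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0  # starting at 1 also caps q-values at 1
    # Walk from the largest p-value down, so running_min is the minimum
    # estimated FDR over all thresholds t >= the current p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        # For the last of any tied p-values, rank equals S(t) at t = p.
        fdr_at_p = pi0 * m * pvalues[i] / rank
        running_min = min(running_min, fdr_at_p)
        q[i] = running_min
    return q

rng = random.Random(4)
m0, m1 = 8000, 2000  # assumed: the first m0 features are truly null
pvalues = ([rng.random() for _ in range(m0)]
           + [rng.random() ** 10 for _ in range(m1)])  # hypothetical alternative model

q = qvalues(pvalues)
significant = [i for i in range(m0 + m1) if q[i] <= 0.05]
false_positives = sum(i < m0 for i in significant)
false_discovery_proportion = false_positives / max(len(significant), 1)

print(f"{len(significant)} features with q <= 0.05, "
      f"of which {false_positives} are truly null")
```

Because the simulation records which features are null, the realized proportion of false positives among the features with q ≤ 0.05 can be compared with the 5% target.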
q-value accuracy
- Assumes that the dependence between features will generally be weak.
  - Genes are actually dependent in pathways, which can be modeled as blocks.
- If so, when m is large:
  - Calling all features with $q \le \alpha$ significant implies that the FDR $\le \alpha$.
  - The estimated q-value of each feature is greater than or equal to its true q-value.
    - Being conservative is desirable.

q-value summary
- A standard measure of significance that can be universally interpreted between studies.
- Better than using just p-values, which require an arbitrary selection of $\alpha$, for example one selected so that the expected number of false positives is < 1.