
Likelihood Ratio Tests



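Math 541: Statistical Theory II
Likelihood Ratio Tests
Instructor: Songfeng Zheng

A very popular form of hypothesis test is the likelihood ratio test, which is a generalization of the optimal test for simple null and alternative hypotheses that was developed by Neyman and Pearson (we skipped the Neyman-Pearson lemma because we are short of time). The likelihood ratio test is based on the likelihood function $f_n(X_1, \cdots, X_n \mid \theta)$, and the intuition that the likelihood function tends to be highest near the true value of $\theta$. Indeed, this is also the foundation for maximum likelihood estimation. We will start from a very simple example.

The Simplest Case: Simple Hypotheses

Let us first consider simple hypotheses, in which both the null hypothesis and the alternative hypothesis consist of a single value of the parameter. Suppose $X_1, \cdots, X_n$ is a random sample of size $n$ from an exponential distribution

$$f(x \mid \theta) = \frac{1}{\theta} e^{-x/\theta}, \quad x > 0.$$

Consider the following simple hypothesis testing problem:

$$H_0: \theta = \theta_0 \quad \text{vs.} \quad H_a: \theta = \theta_1,$$

where $\theta_1 < \theta_0$. Suppose the significance level is $\alpha$.

If we assume $H_0$ were correct, then the likelihood function is

$$f_n(X_1, \cdots, X_n \mid \theta_0) = \prod_{i=1}^{n} \frac{1}{\theta_0} e^{-X_i/\theta_0} = \theta_0^{-n} \exp\left\{-\sum X_i / \theta_0\right\}.$$

Similarly, if $H_a$ were correct, the likelihood function is

$$f_n(X_1, \cdots, X_n \mid \theta_1) = \theta_1^{-n} \exp\left\{-\sum X_i / \theta_1\right\}.$$

We define the likelihood ratio as follows:

$$LR = \frac{f_n(X_1, \cdots, X_n \mid \theta_0)}{f_n(X_1, \cdots, X_n \mid \theta_1)} = \frac{\theta_0^{-n} \exp\{-\sum X_i / \theta_0\}}{\theta_1^{-n} \exp\{-\sum X_i / \theta_1\}} = \left(\frac{\theta_0}{\theta_1}\right)^{-n} \exp\left\{\left(\frac{1}{\theta_1} - \frac{1}{\theta_0}\right) \sum X_i\right\}.$$

Intuitively, if the evidence (data) supports $H_a$, then the likelihood function $f_n(X_1, \cdots, X_n \mid \theta_1)$ should be large, and therefore the likelihood ratio is small. Thus, we reject the null hypothesis if the likelihood ratio is small, i.e. $LR \le k$, where $k$ is a constant such that $P(LR \le k) = \alpha$ under the null hypothesis ($\theta = \theta_0$).

To find what kind of test results from this criterion, we expand the condition:

$$\begin{aligned}
\alpha = P(LR \le k) &= P\left(\left(\frac{\theta_0}{\theta_1}\right)^{-n} \exp\left\{\left(\frac{1}{\theta_1} - \frac{1}{\theta_0}\right) \sum X_i\right\} \le k\right) \\
&= P\left(\exp\left\{\left(\frac{1}{\theta_1} - \frac{1}{\theta_0}\right) \sum X_i\right\} \le \left(\frac{\theta_0}{\theta_1}\right)^{n} k\right) \\
&= P\left(\left(\frac{1}{\theta_1} - \frac{1}{\theta_0}\right) \sum X_i \le \log\left[\left(\frac{\theta_0}{\theta_1}\right)^{n} k\right]\right) \\
&= P\left(\sum X_i \le \frac{\log k + n \log \theta_0 - n \log \theta_1}{\frac{1}{\theta_1} - \frac{1}{\theta_0}}\right) \\
&= P\left(\frac{2}{\theta_0} \sum X_i \le \frac{2}{\theta_0} \cdot \frac{\log k + n \log \theta_0 - n \log \theta_1}{\frac{1}{\theta_1} - \frac{1}{\theta_0}}\right) \\
&= P\left(V \le \frac{2}{\theta_0} \cdot \frac{\log k + n \log \theta_0 - n \log \theta_1}{\frac{1}{\theta_1} - \frac{1}{\theta_0}}\right),
\end{aligned}$$

where $V = \frac{2}{\theta_0} \sum X_i$. Because $\theta_1 < \theta_0$, the coefficient $\frac{1}{\theta_1} - \frac{1}{\theta_0}$ is positive, so dividing by it preserves the direction of the inequality.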
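The algebra above can be checked numerically. The following is a minimal sketch, not part of the original notes: the sample values, the parameter values and the constant `k` are made up for illustration, and the function names are hypothetical. It compares the likelihood ratio computed directly from the exponential densities with the closed form, and confirms that $LR \le k$ is the same event as $\sum X_i$ falling below the threshold derived above.

```python
import math

def likelihood(xs, theta):
    """Joint density of an i.i.d. exponential sample with mean theta."""
    return math.prod(math.exp(-x / theta) / theta for x in xs)

def lr_direct(xs, theta0, theta1):
    """Likelihood ratio computed straight from the two joint densities."""
    return likelihood(xs, theta0) / likelihood(xs, theta1)

def lr_closed_form(xs, theta0, theta1):
    """Closed form: (theta0/theta1)^(-n) * exp{(1/theta1 - 1/theta0) * sum(xs)}."""
    n, s = len(xs), sum(xs)
    return (theta0 / theta1) ** (-n) * math.exp((1 / theta1 - 1 / theta0) * s)

def sum_threshold(k, n, theta0, theta1):
    """Value c with LR <= k exactly when sum(xs) <= c (requires theta1 < theta0)."""
    numerator = math.log(k) + n * math.log(theta0) - n * math.log(theta1)
    return numerator / (1 / theta1 - 1 / theta0)

# Illustrative (made-up) data and constants, not taken from the notes.
sample = [0.3, 1.2, 0.7, 2.5, 0.4]
theta0, theta1, k = 2.0, 1.0, 0.5

lr_a = lr_direct(sample, theta0, theta1)
lr_b = lr_closed_form(sample, theta0, theta1)
cutoff = sum_threshold(k, len(sample), theta0, theta1)

print("direct LR      :", lr_a)
print("closed-form LR :", lr_b)
print("reject via LR  :", lr_b <= k)
print("reject via sum :", sum(sample) <= cutoff)
```

Both rejection decisions should agree for any sample, since the two conditions are algebraically equivalent when $\theta_1 < \theta_0$.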

From the property of the exponential distribution, we know that under the null hypothesis $\frac{2}{\theta_0} X_i$ follows a $\chi^2_2$ distribution; consequently, $V$ follows a chi-square distribution with $2n$ degrees of freedom. Thus, by looking at the chi-square table, we can find the value of the chi-square statistic with $2n$ degrees of freedom such that the probability that $V$ is less than that number is $\alpha$; that is, solve for $c$ such that $P(V \le c) = \alpha$. Once you find the value of $c$, you can solve for $k$ and define the test in terms of the likelihood ratio.

For example, suppose that $H_0: \theta = 2$ and $H_a: \theta = 1$, and we want to do the test at a significance level $\alpha =$ [...] with a random sample of size $n = 5$ from an exponential distribution. We can look at the chi-square table under 10 degrees of freedom to find that [...] is the value under which there is [...] area. Using this, we can obtain $P\left(\frac{2}{2} \sum X_i \le [\ldots]\right) =$ [...]. This implies that we should reject the null hypothesis if $\sum X_i \le$ [...] in this example.

To find a rejection criterion directly in terms of the likelihood function, we can solve for $k$ by

$$\frac{2}{\theta_0} \cdot \frac{\log k + n \log \theta_0 - n \log \theta_1}{\frac{1}{\theta_1} - \frac{1}{\theta_0}} = [\ldots],$$

and the solution is $k =$ [...]. So going back to the original likelihood ratio, we reject the null hypothesis if

$$\left(\frac{\theta_0}{\theta_1}\right)^{-n} \exp\left\{\left(\frac{1}{\theta_1} - \frac{1}{\theta_0}\right) \sum X_i\right\} = \left(\frac{2}{1}\right)^{-5} \exp\left\{\left(\frac{1}{1} - \frac{1}{2}\right) \sum X_i\right\} \le [\ldots].$$

General Likelihood Ratio Test

Likelihood ratio tests are useful to test a composite null hypothesis against a composite alternative hypothesis. Suppose that the null hypothesis specifies that $\theta$ (may be a vector) lies in a particular set of possible values, say $\Theta_0$, i.e. $H_0: \theta \in \Theta_0$; the alternative hypothesis specifies that $\theta$ lies in another set of possible values $\Theta_a$, which does not overlap $\Theta_0$, i.e. $H_a: \theta \in \Theta_a$.
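Before moving on, the exponential example above ($\theta_0 = 2$, $\theta_1 = 1$, $n = 5$) can be worked through numerically. The sketch below is not from the original notes. It assumes a significance level of $\alpha = 0.05$ purely for illustration, since the numeric values did not survive in this transcription. It uses only the Python standard library: for an even number of degrees of freedom $2n$, the chi-square CDF has the closed form $1 - e^{-c/2} \sum_{j=0}^{n-1} (c/2)^j / j!$, and the quantile is found by bisection. The function names are hypothetical.

```python
import math

def chi2_cdf_even_df(c, n):
    """P(V <= c) for V ~ chi-square with 2n degrees of freedom (n a positive integer)."""
    if c <= 0:
        return 0.0
    half = c / 2.0
    term, total = 1.0, 1.0           # j = 0 term of the finite sum
    for j in range(1, n):
        term *= half / j             # (c/2)^j / j!
        total += term
    return 1.0 - math.exp(-half) * total

def chi2_quantile_even_df(alpha, n, tol=1e-10):
    """Solve P(V <= c) = alpha for c by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even_df(hi, n) < alpha:   # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_cdf_even_df(mid, n) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

theta0, theta1, n = 2.0, 1.0, 5
alpha = 0.05                          # ASSUMED level, not taken from the notes

c = chi2_quantile_even_df(alpha, n)   # cutoff for V = (2/theta0) * sum(X_i)
sum_cutoff = theta0 * c / 2.0         # equivalent cutoff for sum(X_i)

# Invert the relation between c and k from the derivation above.
log_k = (c * theta0 / 2.0) * (1 / theta1 - 1 / theta0) \
        - n * math.log(theta0) + n * math.log(theta1)
k = math.exp(log_k)

print("c          =", round(c, 4))
print("sum cutoff =", round(sum_cutoff, 4))
print("k          =", round(k, 4))
```

Under the assumed level, `c` should agree with the 0.05 lower-tail entry of a chi-square table at 10 degrees of freedom (about 3.94). A different $\alpha$ changes `c`, the cutoff for $\sum X_i$ and `k` together.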

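Let $\Theta = \Theta_0 \cup \Theta_a$. Either or both of the hypotheses $H_0$ and $H_a$ can be composite.

Let $L(\hat{\Theta}_0)$ be the maximum (actually the supremum) of the likelihood function for all $\theta \in \Theta_0$. That is, $L(\hat{\Theta}_0) = \max_{\theta \in \Theta_0} L(\theta)$. $L(\hat{\Theta}_0)$ represents the best explanation for the observed data for all $\theta \in \Theta_0$. Similarly, $L(\hat{\Theta}) = \max_{\theta \in \Theta} L(\theta)$ represents the best explanation for the observed data for all $\theta \in \Theta = \Theta_0 \cup \Theta_a$. If $L(\hat{\Theta}_0) = L(\hat{\Theta})$, then a best explanation for the observed data can be found inside $\Theta_0$ and we should not reject the null hypothesis $H_0: \theta \in \Theta_0$. However, if $L(\hat{\Theta}_0) < L(\hat{\Theta})$, then the best explanation for the observed data could be found inside $\Theta_a$, and we should consider rejecting $H_0$ in favor of $H_a$. A likelihood ratio test is based on the ratio $L(\hat{\Theta}_0) / L(\hat{\Theta})$.

Define the likelihood ratio statistic $\Lambda$ by

$$\Lambda = \frac{L(\hat{\Theta}_0)}{L(\hat{\Theta})} = \frac{\max_{\theta \in \Theta_0} L(\theta)}{\max_{\theta \in \Theta} L(\theta)}.$$

A likelihood ratio test of $H_0: \theta \in \Theta_0$ vs. $H_a: \theta \in \Theta_a$ employs $\Lambda$ as a test statistic, and the rejection region is determined by $\Lambda \le k$. Note that $0 \le \Lambda \le 1$. A value of $\Lambda$ close to zero indicates that the likelihood of the sample is much smaller under $H_0$ than it is under $H_a$; therefore the data suggest favoring $H_a$ over $H_0$. The actual value of $k$ is chosen so that $\alpha$ achieves the desired value.

Many of the previously introduced testing procedures can be reformulated as likelihood ratio tests, such as the example below.

Example 1: Testing hypotheses about the mean of a normal distribution with unknown variance. Suppose that $X = (X_1, \cdots, X_n)$ is a random sample from a normal distribution with unknown mean $\mu$ and unknown variance $\sigma^2$.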

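We wish to test the hypotheses

$$H_0: \mu = \mu_0 \quad \text{vs.} \quad H_a: \mu \ne \mu_0.$$

Solution: In this example, the parameter is $\theta = (\mu, \sigma^2)$. Notice that $\Theta_0$ is the set $\{(\mu_0, \sigma^2) : \sigma^2 > 0\}$, and $\Theta_a = \{(\mu, \sigma^2) : \mu \ne \mu_0, \sigma^2 > 0\}$, and hence that $\Theta = \Theta_0 \cup \Theta_a = \{(\mu, \sigma^2) : -\infty < \mu < \infty, \sigma^2 > 0\}$. The value of the constant $\sigma^2$ is completely unspecified. We must now find $L(\hat{\Theta}_0)$ and $L(\hat{\Theta})$.

For the normal distribution, we have

$$L(\theta) = L(\mu, \sigma^2) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \exp\left[-\sum_{i=1}^{n} \frac{(X_i - \mu)^2}{2\sigma^2}\right].$$

Restricting $\mu$ to $\Theta_0$ implies that $\mu = \mu_0$, and we can find $L(\hat{\Theta}_0)$ if we can determine the value of $\sigma^2$ that maximizes $L(\mu, \sigma^2)$ subject to the constraint that $\mu = \mu_0$. It is easy to see that when $\mu = \mu_0$, the value of $\sigma^2$ that maximizes $L(\mu_0, \sigma^2)$ is

$$\hat{\sigma}_0^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu_0)^2.$$

Thus, $L(\hat{\Theta}_0)$ can be obtained by replacing $\mu$ with $\mu_0$ and $\sigma^2$ with $\hat{\sigma}_0^2$ in $L(\mu, \sigma^2)$, which yields

$$L(\hat{\Theta}_0) = \left(\frac{1}{\sqrt{2\pi}\,\hat{\sigma}_0}\right)^{n} \exp\left[-\sum_{i=1}^{n} \frac{(X_i - \mu_0)^2}{2\hat{\sigma}_0^2}\right] = \left(\frac{1}{\sqrt{2\pi}\,\hat{\sigma}_0}\right)^{n} e^{-n/2}.$$

We now turn to finding $L(\hat{\Theta})$. Let $(\hat{\mu}, \hat{\sigma}^2)$ be the point in the set $\Theta$ which maximizes the likelihood function $L(\mu, \sigma^2)$. By the method of maximum likelihood estimation, we have

$$\hat{\mu} = \bar{X} \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \hat{\mu})^2.$$

Then $L(\hat{\Theta})$ is obtained by replacing $\mu$ with $\hat{\mu}$ and $\sigma^2$ with $\hat{\sigma}^2$, which gives

$$L(\hat{\Theta}) = \left(\frac{1}{\sqrt{2\pi}\,\hat{\sigma}}\right)^{n} \exp\left[-\sum_{i=1}^{n} \frac{(X_i - \hat{\mu})^2}{2\hat{\sigma}^2}\right] = \left(\frac{1}{\sqrt{2\pi}\,\hat{\sigma}}\right)^{n} e^{-n/2}.$$

Therefore, the likelihood ratio is calculated as

$$\Lambda = \frac{L(\hat{\Theta}_0)}{L(\hat{\Theta})} = \frac{\left(\frac{1}{\sqrt{2\pi}\,\hat{\sigma}_0}\right)^{n} e^{-n/2}}{\left(\frac{1}{\sqrt{2\pi}\,\hat{\sigma}}\right)^{n} e^{-n/2}} = \left(\frac{\hat{\sigma}^2}{\hat{\sigma}_0^2}\right)^{n/2} = \left[\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \mu_0)^2}\right]^{n/2}.$$

Notice that $0 < \Lambda \le 1$ because $\Theta_0 \subset \Theta$; thus when $\Lambda < k$ we would reject $H_0$, where $k < 1$ is a constant.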
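The closed form for $\Lambda$ can be verified on a small data set. The sketch below is illustrative only: the sample and $\mu_0$ are made up and the function names are hypothetical. It computes $\Lambda$ in three ways: by plugging the two maximizers into the normal log-likelihood, by the variance-ratio formula above, and from the usual $t$ statistic via $\Lambda = \left(1 + t^2/(n-1)\right)^{-n/2}$, an identity that follows from the decomposition $\sum (X_i - \mu_0)^2 = \sum (X_i - \bar{X})^2 + n(\bar{X} - \mu_0)^2$.

```python
import math

def normal_loglik(xs, mu, sigma2):
    """Log-likelihood of an i.i.d. N(mu, sigma2) sample."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)

def lambda_direct(xs, mu0):
    """Lambda = L(max over Theta_0) / L(max over Theta), via the log-likelihood."""
    n = len(xs)
    xbar = sum(xs) / n
    sigma2_0 = sum((x - mu0) ** 2 for x in xs) / n      # MLE of sigma^2 with mu = mu0
    sigma2_hat = sum((x - xbar) ** 2 for x in xs) / n   # unrestricted MLE of sigma^2
    return math.exp(normal_loglik(xs, mu0, sigma2_0) - normal_loglik(xs, xbar, sigma2_hat))

def lambda_variance_ratio(xs, mu0):
    """Closed form (sigma2_hat / sigma2_0)^(n/2)."""
    n = len(xs)
    xbar = sum(xs) / n
    num = sum((x - xbar) ** 2 for x in xs)
    den = sum((x - mu0) ** 2 for x in xs)
    return (num / den) ** (n / 2)

def lambda_from_t(xs, mu0):
    """Lambda expressed through the one-sample t statistic."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)     # sample variance S^2
    t = math.sqrt(n) * (xbar - mu0) / math.sqrt(s2)
    return (1 + t * t / (n - 1)) ** (-n / 2)

# Made-up data and null value, for illustration only.
sample = [4.9, 5.6, 5.1, 6.2, 5.8, 4.7, 5.4]
mu0 = 5.0

lam_direct = lambda_direct(sample, mu0)
lam_ratio = lambda_variance_ratio(sample, mu0)
lam_t = lambda_from_t(sample, mu0)
print(lam_direct, lam_ratio, lam_t)
```

The three values should coincide up to floating-point rounding. Because $\Lambda$ is a decreasing function of $t^2$, a small $\Lambda$ corresponds to a large $|t|$.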

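Because

$$\sum_{i=1}^{n} (X_i - \mu_0)^2 = \sum_{i=1}^{n} \left[(X_i - \bar{X}) + (\bar{X} - \mu_0)\right]^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 + n(\bar{X} - \mu_0)^2,$$

the rejection region, $\Lambda < k$, is equivalent to

$$\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \mu_0)^2} < k^{2/n} = k',$$

$$\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2 + n(\bar{X} - \mu_0)^2} < k',$$

$$\frac{1}{1 + \dfrac{n(\bar{X} - \mu_0)^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}} < k'.$$

This inequality is equivalent to

$$\frac{n(\bar{X} - \mu_0)^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} > \frac{1}{k'} - 1 = k'',$$

$$\frac{n(\bar{X} - \mu_0)^2}{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2} > (n-1)k''.$$

By defining

$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2,$$

the above rejection region is equivalent to

$$\left|\frac{\sqrt{n}(\bar{X} - \mu_0)}{S}\right| > \sqrt{(n-1)k''}.$$

We can recognize that $\sqrt{n}(\bar{X} - \mu_0)/S$ is the $t$ statistic employed in previous sections, and the decision rule is exactly the same as before. Consequently, in this situation, the likelihood ratio test is equivalent to the $t$ test. For two-sided tests, we can also verify that the likelihood ratio test is equivalent to the $t$ test.

Example 2: Suppose $X_1, \cdots, X_n$ is a random sample from a normal distribution $N(\mu, \sigma^2)$ where both $\mu$ and $\sigma$ are unknown. We wish to test the hypotheses

$$H_0: \sigma^2 = \sigma_0^2 \quad \text{vs.} \quad H_a: \sigma^2 \ne \sigma_0^2$$

at the level $\alpha$. Show that the likelihood ratio test is equivalent to the $\chi^2$ test.

Solution: The parameter is $\theta = (\mu, \sigma^2)$.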

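Notice that $\Theta_0$ is the set $\{(\mu, \sigma_0^2) : -\infty < \mu < \infty\}$, and $\Theta_a = \{(\mu, \sigma^2) : -\infty < \mu < \infty, \sigma^2 \ne \sigma_0^2\}$, and hence that $\Theta = \Theta_0 \cup \Theta_a = \{(\mu, \sigma^2) : -\infty < \mu < \infty, \sigma^2 > 0\}$. We must now find $L(\hat{\Theta}_0)$ and $L(\hat{\Theta})$.

For the normal distribution, we have

$$L(\theta) = L(\mu, \sigma^2) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \exp\left[-\sum_{i=1}^{n} \frac{(X_i - \mu)^2}{2\sigma^2}\right].$$

In the subset $\Theta_0$, we have $\sigma^2 = \sigma_0^2$, and we can find $L(\hat{\Theta}_0)$ if we can determine the value of $\mu$ that maximizes $L(\mu, \sigma^2)$ subject to the constraint that $\sigma^2 = \sigma_0^2$. It is easy to see that the value of $\mu$ that maximizes $L(\mu, \sigma_0^2)$ is $\hat{\mu}_0 = \bar{X}$. Thus, $L(\hat{\Theta}_0)$ can be obtained by replacing $\mu$ with $\hat{\mu}_0$ and $\sigma^2$ with $\sigma_0^2$ in $L(\mu, \sigma^2)$, which yields

$$L(\hat{\Theta}_0) = \left(\frac{1}{\sqrt{2\pi}\,\sigma_0}\right)^{n} \exp\left[-\sum_{i=1}^{n} \frac{(X_i - \hat{\mu}_0)^2}{2\sigma_0^2}\right].$$

Next, we find $L(\hat{\Theta})$. Let $(\hat{\mu}, \hat{\sigma}^2)$ be the point in the set $\Theta$ which maximizes the likelihood function $L(\mu, \sigma^2)$. By the method of maximum likelihood estimation, we have

$$\hat{\mu} = \bar{X} \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \hat{\mu})^2.$$

Then $L(\hat{\Theta})$ is obtained by replacing $\mu$ with $\hat{\mu}$ and $\sigma^2$ with $\hat{\sigma}^2$, which gives

$$L(\hat{\Theta}) = \left(\frac{1}{\sqrt{2\pi}\,\hat{\sigma}}\right)^{n} \exp\left[-\sum_{i=1}^{n} \frac{(X_i - \hat{\mu})^2}{2\hat{\sigma}^2}\right] = \left(\frac{1}{\sqrt{2\pi}\,\hat{\sigma}}\right)^{n} e^{-n/2}.$$

Therefore, the likelihood ratio is calculated as

$$\Lambda = \frac{L(\hat{\Theta}_0)}{L(\hat{\Theta})} = \frac{\left(\frac{1}{\sqrt{2\pi}\,\sigma_0}\right)^{n} \exp\left[-\sum_{i=1}^{n} \frac{(X_i - \hat{\mu}_0)^2}{2\sigma_0^2}\right]}{\left(\frac{1}{\sqrt{2\pi}\,\hat{\sigma}}\right)^{n} e^{-n/2}} = e^{n/2} \left(\frac{\hat{\sigma}^2}{\sigma_0^2}\right)^{n/2} \exp\left[-\frac{n}{2} \cdot \frac{\hat{\sigma}^2}{\sigma_0^2}\right].$$

Notice that $0 < \Lambda \le 1$ because $\Theta_0 \subset \Theta$; thus when $\Lambda < k$ we would reject $H_0$, where $k < 1$ is a constant.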
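As a rough check of this formula, the sketch below (made-up data and a made-up $\sigma_0^2$, hypothetical function names) computes $\Lambda$ directly from the two maximized log-likelihoods and compares it with the closed form written as a function of the ratio $r = \hat{\sigma}^2 / \sigma_0^2$. The closed form is evaluated as $\exp\{\tfrac{n}{2}(1 + \log r - r)\}$, which is the same expression rearranged for numerical stability.

```python
import math

def normal_loglik(xs, mu, sigma2):
    """Log-likelihood of an i.i.d. N(mu, sigma2) sample."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)

def lambda_direct(xs, sigma0_sq):
    """Lambda from the maximized likelihoods under Theta_0 and under Theta."""
    n = len(xs)
    xbar = sum(xs) / n                                  # maximizer of mu in both cases
    sigma2_hat = sum((x - xbar) ** 2 for x in xs) / n   # unrestricted MLE of sigma^2
    return math.exp(normal_loglik(xs, xbar, sigma0_sq) - normal_loglik(xs, xbar, sigma2_hat))

def lambda_closed_form(r, n):
    """e^{n/2} * r^{n/2} * exp(-n r / 2), with r = sigma2_hat / sigma0_sq."""
    return math.exp(0.5 * n * (1.0 + math.log(r) - r))

# Made-up data and null variance, for illustration only.
sample = [4.9, 5.6, 5.1, 6.2, 5.8, 4.7, 5.4]
sigma0_sq = 0.5

n = len(sample)
xbar = sum(sample) / n
r = (sum((x - xbar) ** 2 for x in sample) / n) / sigma0_sq

lam_direct = lambda_direct(sample, sigma0_sq)
lam_closed = lambda_closed_form(r, n)
print(lam_direct, lam_closed)

# Lambda as a function of r: equal to 1 at r = 1 and smaller on both sides.
for ratio in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(ratio, lambda_closed_form(ratio, n))
```

The loop at the end shows that $\Lambda$ drops below 1 both when $\hat{\sigma}^2$ is much smaller and when it is much larger than $\sigma_0^2$, so a rejection region of the form $\Lambda < k$ involves both tails of the ratio.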

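The rejection region, $\Lambda < k$, is equivalent to

$$\left(\frac{\hat{\sigma}^2}{\sigma_0^2}\right)^{n/2} \exp\left[-\frac{n}{2} \cdot \frac{\hat{\sigma}^2}{\sigma_0^2}\right] < k e^{-n/2} = k'.$$

Viewing the left-hand side as a function of $\hat{\sigma}^2 / \sigma_0^2$, the above inequality holds if $\hat{\sigma}^2 / \sigma_0^2$ is too big or too small, i.e.

$$\frac{\hat{\sigma}^2}{\sigma_0^2} < a \quad \text{or} \quad \frac{\hat{\sigma}^2}{\sigma_0^2} > b.$$

This inequality is equivalent to

$$\frac{n\hat{\sigma}^2}{\sigma_0^2} < na \quad \text{or} \quad \frac{n\hat{\sigma}^2}{\sigma_0^2} > nb.$$

We can recognize that $n\hat{\sigma}^2 / \sigma_0^2$ is the $\chi^2$ statistic employed in previous sections, and the decision rule is exactly the same as before. Consequently, in this situation, the likelihood ratio test is equivalent to the $\chi^2$ test.

The likelihood ratio statistic is a function of the sample $X_1, \cdots, X_n$, and we can prove that it only depends on the sample through a sufficient statistic. Formally, suppose $X_1, \cdots, X_n$ is a random sample from the distribution $f(x \mid \theta)$, where $\theta \in \Theta$ is the unknown parameter (vector). Furthermore, assume that $T(X)$ is a sufficient statistic; then by the factorization theorem the joint distribution of $X_1, \cdots, X_n$ can be decomposed as

$$f(x \mid \theta) = u(x)\, v[T(x), \theta],$$

which is also the likelihood function. Let us assume we want to test the hypotheses

$$H_0: \theta \in \Theta_0 \quad \text{vs.} \quad H_a: \theta \in \Theta_a,$$

where $\Theta_0$ and $\Theta_a$ are disjoint subsets of the parameter space $\Theta$, and $\Theta_0 \cup \Theta_a = \Theta$. Using the likelihood ratio test, we first need to find the maximal points in $\Theta_0$ and $\Theta$. In $\Theta_0$, let

$$\hat{\theta}_0 = \arg\max_{\theta \in \Theta_0} f(X \mid \theta) = \arg\max_{\theta \in \Theta_0} u(X)\, v[T(X), \theta] = \arg\max_{\theta \in \Theta_0} v[T(X), \theta];$$

then clearly $\hat{\theta}_0$ depends on the data $X_1, \cdots, X_n$ only through the sufficient statistic $T(X)$, and let us denote this relation as $\hat{\theta}_0 = g(T(X))$. Similarly, in the set $\Theta$, we have

$$\hat{\theta} = \arg\max_{\theta \in \Theta} f(X \mid \theta) = \arg\max_{\theta \in \Theta} u(X)\, v[T(X), \theta] = \arg\max_{\theta \in \Theta} v[T(X), \theta] = h(T(X)).$$

Therefore, the likelihood ratio statistic $\Lambda$ can be calculated as

$$\Lambda = \frac{L(\hat{\Theta}_0)}{L(\hat{\Theta})} = \frac{u(X)\, v[T(X), \hat{\theta}_0]}{u(X)\, v[T(X), \hat{\theta}]} = \frac{v[T(X), g(T(X))]}{v[T(X), h(T(X))]},$$

which depends on the sufficient statistic only. For example, in Example 1, the final likelihood ratio test depends on $\bar{X}$ and $S$, and we know that $\bar{X}$ is a sufficient statistic for $\mu$, and $S$ is a sufficient statistic for $\sigma$.

Here, we see the importance of sufficient statistics once more.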
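A small illustration of this point, using the exponential model from the first section, where $\sum X_i$ is a sufficient statistic for $\theta$. The sketch below is not from the original notes; the two samples and the parameter values are made up. The samples differ observation by observation but share the same sum, and the likelihood ratio computed from the full joint densities comes out the same for both.

```python
import math

def exp_likelihood(xs, theta):
    """Joint density of an i.i.d. exponential sample with mean theta."""
    return math.prod(math.exp(-x / theta) / theta for x in xs)

def likelihood_ratio(xs, theta0, theta1):
    """Ratio of the joint densities under theta0 and theta1."""
    return exp_likelihood(xs, theta0) / exp_likelihood(xs, theta1)

# Two made-up samples with different observations but the same sum (6.0).
sample_a = [1.0, 2.0, 3.0]
sample_b = [0.5, 0.5, 5.0]
theta0, theta1 = 2.0, 1.0   # illustrative values

lr_a = likelihood_ratio(sample_a, theta0, theta1)
lr_b = likelihood_ratio(sample_b, theta0, theta1)

same_statistic = math.isclose(sum(sample_a), sum(sample_b))
same_ratio = math.isclose(lr_a, lr_b, rel_tol=1e-9)
print(same_statistic, same_ratio)
```

Changing one sample so that its sum differs would generally change the ratio, which is the sense in which the test depends on the data only through $T(X) = \sum X_i$ in this model.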

Previously, we saw that the MLE and Bayesian estimators are functions of sufficient statistics, and that in an exponential family the efficient estimator is a linear function of sufficient statistics.

It can be verified that the $t$ test and $F$ test used for two-sample hypothesis testing problems can also be reformulated as likelihood ratio tests. Unfortunately, the likelihood ratio method does not always produce a test statistic with a known probability distribution. If the sample size is large, however, we can obtain an approximation to the distribution of $\Lambda$ if some reasonable regularity conditions are satisfied by the underlying population distribution(s). These are general conditions that hold for most (but not all) of the distributions that we have considered. The regularity conditions mainly involve the existence of derivatives, with respect to the parameters, of the likelihood function. Another key condition is that the region over which the likelihood function is positive cannot depend on unknown parameter values.

