
Math 541: Statistical Theory II
Maximum Likelihood Estimation
Lecturer: Songfeng Zheng

1 Maximum Likelihood Estimation

Maximum likelihood is a relatively simple method of constructing an estimator for an unknown parameter $\theta$. It was introduced by R. A. Fisher, a great English mathematical statistician, in 1912. Maximum likelihood estimation (MLE) can be applied in most problems, it has a strong intuitive appeal, and it often yields a reasonable estimator of $\theta$. Furthermore, if the sample is large, the method will yield an excellent estimator of $\theta$. For these reasons, the method of maximum likelihood is probably the most widely used method of estimation in statistics.

Suppose that the random variables $X_1, \ldots, X_n$ form a random sample from a distribution $f(x|\theta)$; if $X$ is a continuous random variable, $f(x|\theta)$ is the pdf, and if $X$ is a discrete random variable, $f(x|\theta)$ is the point mass function. We use the notation $f(x|\theta)$ to indicate that the distribution also depends on a parameter $\theta$, where $\theta$ could be a real-valued unknown parameter or a vector of parameters.

For every observed random sample $x_1, \ldots, x_n$, we define

$$ f(x_1, \ldots, x_n|\theta) = f(x_1|\theta) \cdots f(x_n|\theta) \qquad (1) $$

If $f(x|\theta)$ is a pdf, $f(x_1, \ldots, x_n|\theta)$ is the joint density function; if $f(x|\theta)$ is a pmf, $f(x_1, \ldots, x_n|\theta)$ is the joint probability. We call $f(x_1, \ldots, x_n|\theta)$ the likelihood function. As we can see, the likelihood function depends on the unknown parameter $\theta$, and it is always denoted $L(\theta)$.

Suppose, for the moment, that the observed random sample $x_1, \ldots, x_n$ came from a discrete distribution. If an estimate of $\theta$ must be selected, we would certainly not consider any value of $\theta$ for which it would have been impossible to obtain the data $x_1, \ldots, x_n$ that were actually observed. Furthermore, suppose that the probability $f(x_1, \ldots, x_n|\theta)$ of obtaining the actual observed data is very high when $\theta$ has a particular value, say $\theta = \theta_0$, and is very small for every other value of $\theta$. Then we would naturally estimate the value of $\theta$ to be $\theta_0$. When the sample comes from a continuous distribution, it would again be natural to try to find a value of $\theta$ for which the probability density $f(x_1, \ldots, x_n|\theta)$ is large, and to use this value as an estimate of $\theta$.

For any given observed data $x_1, \ldots, x_n$, we are led by this reasoning to consider a value of $\theta$ for which the likelihood function $L(\theta)$ is a maximum, and to use this value as an estimate of $\theta$.

The meaning of maximum likelihood is as follows. We choose the parameter that makes the likelihood of having obtained the data at hand maximum. With discrete distributions, the likelihood is the same as the probability; we choose the parameter of the density that maximizes the probability of the data coming from it.

If we have no actual data, maximizing the likelihood function gives us a function of the $n$ random variables $X_1, \ldots, X_n$, which we shall call the maximum likelihood estimator. When there are actual data, the estimator takes a particular numerical value, which will be the maximum likelihood estimate.

The method of maximum likelihood requires us to maximize the likelihood function $L(\theta)$ with respect to the unknown parameter $\theta$. From Eqn. 1, $L(\theta)$ is defined as a product of $n$ terms, which is not easy to maximize.
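
To see the difficulty concretely, consider the following Python sketch (assuming NumPy is available; the Bernoulli sample and parameter value are invented for illustration). The raw product of $n$ densities underflows double-precision arithmetic long before $n$ gets large, while the sum of log densities remains finite:

```python
import numpy as np

# Hypothetical Bernoulli(theta) sample; each factor of the likelihood is
# below 1, so the product of n of them underflows double precision.
rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=2000)   # invented sample, theta_true = 0.3

theta = 0.3
L = np.prod(np.where(x == 1, theta, 1 - theta))                    # product of n terms
logL = np.sum(np.where(x == 1, np.log(theta), np.log(1 - theta)))  # sum of n terms

print(L)     # 0.0 -- underflow, useless for optimization
print(logL)  # roughly -1.2e3, finite and easy to work with
```

This numerical fragility, in addition to the algebraic convenience, is a reason to work with the logarithm of the likelihood, as we do next.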

Maximizing $L(\theta)$ is equivalent to maximizing $\log L(\theta)$ because $\log$ is a monotonically increasing function. We define $\log L(\theta)$ as the log likelihood function and denote it by $l(\theta)$:

$$ l(\theta) = \log L(\theta) = \log \prod_{i=1}^{n} f(X_i|\theta) = \sum_{i=1}^{n} \log f(X_i|\theta) $$

Maximizing $l(\theta)$ with respect to $\theta$ will give us the MLE.

2 Examples

Example 1: Suppose that $X$ is a discrete random variable with the following probability mass function, where $0 \le \theta \le 1$ is a parameter:

  $x$          0             1            2                  3
  $P(X=x)$     $2\theta/3$   $\theta/3$   $2(1-\theta)/3$    $(1-\theta)/3$

The following 10 independent observations were taken from such a distribution: $(3,0,2,1,3,2,1,0,2,1)$. What is the maximum likelihood estimate of $\theta$?

Solution: Since the sample is $(3,0,2,1,3,2,1,0,2,1)$, the likelihood is

$$ L(\theta) = P(X=3)P(X=0)P(X=2)P(X=1)P(X=3)P(X=2)P(X=1)P(X=0)P(X=2)P(X=1) \qquad (2) $$

Substituting from the probability distribution given above, we have

$$ L(\theta) = \prod_{i=1}^{n} P(X_i|\theta) = \left(\frac{2\theta}{3}\right)^2 \left(\frac{\theta}{3}\right)^3 \left(\frac{2(1-\theta)}{3}\right)^3 \left(\frac{1-\theta}{3}\right)^2 $$

Clearly, the likelihood function $L(\theta)$ is not easy to maximize. Let us look at the log likelihood function instead:

$$ l(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log P(X_i|\theta) $$
$$ = 2\left(\log\frac{2}{3} + \log\theta\right) + 3\left(\log\frac{1}{3} + \log\theta\right) + 3\left(\log\frac{2}{3} + \log(1-\theta)\right) + 2\left(\log\frac{1}{3} + \log(1-\theta)\right) $$
$$ = C + 5\log\theta + 5\log(1-\theta), $$

where $C$ is a constant that does not depend on $\theta$. It can be seen that the log likelihood function is easier to maximize than the likelihood function. Setting the derivative of $l(\theta)$ with respect to $\theta$ to zero,

$$ \frac{dl(\theta)}{d\theta} = \frac{5}{\theta} - \frac{5}{1-\theta} = 0, $$

the solution gives us the MLE, $\hat{\theta} = 0.5$. Recall that the method of moments estimate is $\hat{\theta} = 5/12$, which is different from the MLE.
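
As a sanity check, one can also maximize $l(\theta)$ numerically. The following sketch (assuming SciPy is available; the helper name neg_log_lik is ours) recovers the same answer:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Numerical check of Example 1: minimize the negative log likelihood over
# theta in (0, 1) and compare with the closed-form answer theta_hat = 0.5.
data = [3, 0, 2, 1, 3, 2, 1, 0, 2, 1]

def neg_log_lik(t):
    pmf = {0: 2*t/3, 1: t/3, 2: 2*(1 - t)/3, 3: (1 - t)/3}  # table above
    return -sum(np.log(pmf[x]) for x in data)

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x)  # approximately 0.5, agreeing with the calculus derivation
```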

Example 2: Suppose $X_1, X_2, \ldots, X_n$ are i.i.d. random variables with density function

$$ f(x|\sigma) = \frac{1}{2\sigma} \exp\left(-\frac{|x|}{\sigma}\right). $$

Please find the maximum likelihood estimate of $\sigma$.

Solution: The log-likelihood function is

$$ l(\sigma) = \sum_{i=1}^{n} \left[-\log 2 - \log\sigma - \frac{|X_i|}{\sigma}\right]. $$

Setting the derivative with respect to $\sigma$ to zero,

$$ l'(\sigma) = \sum_{i=1}^{n} \left[-\frac{1}{\sigma} + \frac{|X_i|}{\sigma^2}\right] = -\frac{n}{\sigma} + \frac{\sum_{i=1}^{n} |X_i|}{\sigma^2} = 0, $$

and this gives us the MLE for $\sigma$:

$$ \hat{\sigma} = \frac{\sum_{i=1}^{n} |X_i|}{n}. $$

Again this is different from the method of moments estimate, which is $\hat{\sigma} = \sqrt{\sum_{i=1}^{n} X_i^2 / (2n)}$.
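
A quick numerical check of this result on simulated data (assuming NumPy; the sample size and true $\sigma$ below are invented for illustration):

```python
import numpy as np

# Numerical check of Example 2: the closed form sigma_hat = (1/n) sum |X_i|
# should be where a grid search on the log likelihood peaks.
rng = np.random.default_rng(1)
x = rng.laplace(loc=0.0, scale=2.0, size=10_000)  # invented data, true sigma = 2

def log_lik(s):
    return np.sum(-np.log(2*s) - np.abs(x)/s)

grid = np.linspace(0.5, 4.0, 2001)
best = grid[np.argmax([log_lik(s) for s in grid])]
print(best)               # grid maximizer of l(sigma)
print(np.abs(x).mean())   # closed-form MLE; the two agree to grid precision
```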

Example 3: Use the method of maximum likelihood to estimate the parameters $\mu$ and $\sigma$ of the normal density

$$ f(x|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}, $$

based on a random sample $X_1, \ldots, X_n$.

Solution: In this example, we have two unknown parameters, $\mu$ and $\sigma$, so the parameter $\theta = (\mu, \sigma)$ is a vector. We first write out the log likelihood function:

$$ l(\mu, \sigma) = \sum_{i=1}^{n} \left[-\log\sigma - \frac{1}{2}\log 2\pi - \frac{1}{2\sigma^2}(X_i - \mu)^2\right] = -n\log\sigma - \frac{n}{2}\log 2\pi - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (X_i - \mu)^2. $$

Setting the partial derivatives to zero, we have

$$ \frac{\partial}{\partial\mu}\, l(\mu, \sigma) = \frac{1}{\sigma^2}\sum_{i=1}^{n} (X_i - \mu) = 0 $$
$$ \frac{\partial}{\partial\sigma}\, l(\mu, \sigma) = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n} (X_i - \mu)^2 = 0. $$

Solving these equations gives the MLE for $\mu$ and $\sigma$:

$$ \hat{\mu} = \bar{X} \qquad \text{and} \qquad \hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2}. $$

This time the MLE is the same as the result of the method of moments. From these examples, we can see that the maximum likelihood result may or may not be the same as the result of the method of moments.

Example 4: The Pareto distribution has been used in economics as a model for a density function with a slowly decaying tail:

$$ f(x|x_0, \theta) = \theta x_0^{\theta} x^{-\theta-1}, \qquad x \ge x_0, \; \theta > 1. $$

Assume that $x_0 > 0$ is given and that $X_1, X_2, \ldots, X_n$ is an i.i.d. sample. Find the MLE of $\theta$.

Solution: The log-likelihood function is

$$ l(\theta) = \sum_{i=1}^{n} \log f(X_i|\theta) = \sum_{i=1}^{n} \left(\log\theta + \theta\log x_0 - (\theta+1)\log X_i\right) = n\log\theta + n\theta\log x_0 - (\theta+1)\sum_{i=1}^{n} \log X_i. $$

Setting the derivative with respect to $\theta$ to zero,

$$ \frac{dl(\theta)}{d\theta} = \frac{n}{\theta} + n\log x_0 - \sum_{i=1}^{n} \log X_i = 0. $$

Solving the equation yields the MLE of $\theta$:

$$ \hat{\theta}_{MLE} = \frac{1}{\overline{\log X} - \log x_0}, \qquad \text{where } \overline{\log X} = \frac{1}{n}\sum_{i=1}^{n} \log X_i. $$
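
The following sketch (assuming NumPy; $x_0$, $\theta$, and the sample size are invented) simulates Pareto data by inverse-CDF sampling and confirms that the estimator above is close to the true $\theta$ for large $n$:

```python
import numpy as np

# Numerical check of Example 4 on simulated Pareto data (values invented).
# If U ~ Uniform(0,1), then x0 * U**(-1/theta) has the Pareto density above.
rng = np.random.default_rng(2)
x0, theta_true, n = 1.5, 3.0, 100_000
x = x0 * rng.uniform(size=n) ** (-1.0 / theta_true)

theta_hat = 1.0 / (np.log(x).mean() - np.log(x0))
print(theta_hat)  # close to theta_true = 3.0 for large n
```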

Example 5: Suppose that $X_1, \ldots, X_n$ form a random sample from a uniform distribution on the interval $(0, \theta)$, where the value of the parameter $\theta > 0$ is unknown. Please find the MLE of $\theta$.

Solution: The pdf of each observation has the following form:

$$ f(x|\theta) = \begin{cases} 1/\theta, & \text{for } 0 \le x \le \theta \\ 0, & \text{otherwise} \end{cases} \qquad (3) $$

Therefore, the likelihood function has the form

$$ L(\theta) = \begin{cases} 1/\theta^n, & \text{for } 0 \le x_i \le \theta \; (i = 1, \ldots, n) \\ 0, & \text{otherwise.} \end{cases} $$

It can be seen that the MLE of $\theta$ must be a value of $\theta$ for which $\theta \ge x_i$ for $i = 1, \ldots, n$ and which maximizes $1/\theta^n$ among all such values. Since $1/\theta^n$ is a decreasing function of $\theta$, the estimate will be the smallest possible value of $\theta$ such that $\theta \ge x_i$ for $i = 1, \ldots, n$. This value is $\theta = \max(x_1, \ldots, x_n)$, so it follows that the MLE of $\theta$ is $\hat{\theta} = \max(X_1, \ldots, X_n)$.

It should be remarked that in this example, the MLE $\hat{\theta}$ does not seem to be a suitable estimator of $\theta$. We know that $\max(X_1, \ldots, X_n) < \theta$ with probability 1, and therefore $\hat{\theta}$ surely underestimates the value of $\theta$.
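
A small simulation (assuming NumPy; $\theta$, $n$, and the number of replications are invented) makes the underestimation visible:

```python
import numpy as np

# Simulation of the remark above: theta_hat = max(X_1, ..., X_n) falls below
# theta in every replication, with mean approximately theta * n / (n + 1).
rng = np.random.default_rng(3)
theta, n, reps = 5.0, 20, 100_000
theta_hat = rng.uniform(0, theta, size=(reps, n)).max(axis=1)

print((theta_hat < theta).mean())  # 1.0: always an underestimate
print(theta_hat.mean())            # about 5.0 * 20/21 = 4.76
```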

Example 6: Suppose again that $X_1, \ldots, X_n$ form a random sample from a uniform distribution on the interval $(0, \theta)$, where the value of the parameter $\theta > 0$ is unknown. However, suppose now we write the density function as

$$ f(x|\theta) = \begin{cases} 1/\theta, & \text{for } 0 < x < \theta \\ 0, & \text{otherwise} \end{cases} \qquad (4) $$

We will prove that in this case, the MLE for $\theta$ does not exist.

Solution: The only difference between Eqn. 3 and Eqn. 4 is that the value of the pdf at the two endpoints 0 and $\theta$ has been changed by replacing the weak inequalities in Eqn. 3 with strict inequalities in Eqn. 4. Either equation could be used as the pdf of the uniform distribution. However, if Eqn. 4 is used as the pdf, then an MLE of $\theta$ will be a value of $\theta$ for which $\theta > x_i$ for $i = 1, \ldots, n$ and which maximizes $1/\theta^n$ among all such values. It should be noted that the possible values of $\theta$ no longer include the value $\theta = \max(x_1, \ldots, x_n)$, since $\theta$ must be strictly greater than each observed value $x_i$ for $i = 1, \ldots, n$. Since $\theta$ can be chosen arbitrarily close to the value $\max(x_1, \ldots, x_n)$ but cannot be chosen equal to this value, it follows that the MLE of $\theta$ does not exist in this case.

Examples 5 and 6 illustrate one shortcoming of the concept of an MLE. We know that it is irrelevant whether the pdf of the uniform distribution is chosen to be equal to $1/\theta$ over the open interval $0 < x < \theta$ or over the closed interval $0 \le x \le \theta$. Now, however, we see that the existence of an MLE depends on this typically irrelevant and unimportant choice.

This difficulty is easily avoided in Example 5 by using the pdf given by Eqn. 3 rather than that given by Eqn. 4. In many other problems as well, in which there is a difficulty of this type regarding the existence of an MLE, the difficulty can be avoided simply by choosing one particular appropriate version of the pdf to represent the given distribution.

Example 7: Suppose that $X_1, \ldots, X_n$ form a random sample from a uniform distribution on the interval $(\theta, \theta+1)$, where the value of the parameter $\theta$ is unknown ($-\infty < \theta < \infty$). Clearly, the density function is

$$ f(x|\theta) = \begin{cases} 1, & \text{for } \theta \le x \le \theta + 1 \\ 0, & \text{otherwise.} \end{cases} $$

We will see that the MLE for $\theta$ is not unique.

Solution: In this example, the likelihood function is

$$ L(\theta) = \begin{cases} 1, & \text{for } \theta \le x_i \le \theta + 1 \; (i = 1, \ldots, n) \\ 0, & \text{otherwise.} \end{cases} $$

The condition that $\theta \le x_i$ for $i = 1, \ldots, n$ is equivalent to the condition that $\theta \le \min(x_1, \ldots, x_n)$. Similarly, the condition that $x_i \le \theta + 1$ for $i = 1, \ldots, n$ is equivalent to the condition that $\theta \ge \max(x_1, \ldots, x_n) - 1$. Therefore, we can rewrite the likelihood function as

$$ L(\theta) = \begin{cases} 1, & \text{for } \max(x_1, \ldots, x_n) - 1 \le \theta \le \min(x_1, \ldots, x_n) \\ 0, & \text{otherwise.} \end{cases} $$

Thus, we can select any value in the interval $[\max(x_1, \ldots, x_n) - 1, \min(x_1, \ldots, x_n)]$ as the MLE for $\theta$; the MLE is not uniquely specified in this example.
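
The non-uniqueness is easy to see numerically as well. In the following sketch (assuming NumPy; $\theta$ and the sample size are invented), the likelihood equals 1 at every point of the interval:

```python
import numpy as np

# Illustration of Example 7: every theta in [max(x) - 1, min(x)] satisfies
# theta <= x_i <= theta + 1 for all i, so L(theta) = 1 on the whole interval
# and the maximizer is not unique.
rng = np.random.default_rng(4)
x = rng.uniform(2.0, 3.0, size=10)   # sample from Uniform(theta, theta+1), theta = 2
lo, hi = x.max() - 1, x.min()

for t in np.linspace(lo, hi, 5):
    assert np.all((t <= x) & (x <= t + 1))  # likelihood equals 1 at each t
print(lo, hi)  # the whole interval of maximizers
```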

3 Exercises

Exercise 1: Let $X_1, \ldots, X_n$ be an i.i.d. sample from a Poisson distribution with parameter $\lambda$, i.e.,

$$ P(X = x|\lambda) = \frac{\lambda^x e^{-\lambda}}{x!}. $$

Please find the MLE of the parameter $\lambda$.

Exercise 2: Let $X_1, \ldots, X_n$ be an i.i.d. sample from an exponential distribution with density function

$$ f(x|\beta) = \frac{1}{\beta}\, e^{-x/\beta}, \qquad 0 \le x < \infty. $$

Please find the MLE of the parameter $\beta$.

Exercise 3: The Gamma distribution has density function

$$ f(x|\alpha, \lambda) = \frac{1}{\Gamma(\alpha)}\, \lambda^{\alpha} x^{\alpha-1} e^{-\lambda x}, \qquad 0 \le x < \infty. $$

Suppose the parameter $\alpha$ is known; please find the MLE of $\lambda$ based on an i.i.d. sample $X_1, \ldots, X_n$.

Exercise 4: Suppose that $X_1, \ldots, X_n$ form a random sample from a distribution for which the pdf $f(x|\theta)$ is as follows:

$$ f(x|\theta) = \begin{cases} \theta x^{\theta-1}, & \text{for } 0 < x < 1 \\ 0, & \text{otherwise.} \end{cases} $$

Also suppose that the value of $\theta$ is unknown ($\theta > 0$). Find the MLE of $\theta$.

Exercise 5: Suppose that $X_1, \ldots, X_n$ form a random sample from a distribution for which the pdf $f(x|\theta)$ is as follows:

$$ f(x|\theta) = \frac{1}{2}\, e^{-|x-\theta|}, \qquad -\infty < x < \infty. $$

Also suppose that the value of $\theta$ is unknown ($-\infty < \theta < \infty$). Find the MLE of $\theta$.
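
For checking answers to these exercises numerically, a generic helper along the following lines may be useful (a sketch assuming SciPy; the function numeric_mle and the Poisson test values are invented for illustration):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

def numeric_mle(log_pdf, data, bounds):
    """Maximize sum_i log f(x_i | theta) over a bounded interval for theta."""
    res = minimize_scalar(lambda t: -np.sum(log_pdf(data, t)),
                          bounds=bounds, method="bounded")
    return res.x

# e.g., Exercise 1: log P(X=x|lambda) = x log(lambda) - lambda - log(x!)
rng = np.random.default_rng(5)
x = rng.poisson(3.5, size=5_000)   # invented test data
lam_hat = numeric_mle(lambda x, lam: x*np.log(lam) - lam - gammaln(x + 1),
                      x, bounds=(1e-6, 50.0))
print(lam_hat)  # compare with your closed-form answer evaluated on x
```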

