
Reading 10b: Maximum Likelihood Estimates


Maximum Likelihood Estimates
Class 10, Jeremy Orloff and Jonathan Bloom

1 Learning Goals

1. Be able to define the likelihood function for a parametric model given data.

2. Be able to compute the maximum likelihood estimate of unknown parameter(s).

2 Introduction

Suppose we know we have data consisting of values x1, ..., xn drawn from an exponential distribution. The question remains: which exponential distribution?!

We have casually referred to the exponential distribution or the binomial distribution or the normal distribution. In fact the exponential distribution exp(λ) is not a single distribution but rather a one-parameter family of distributions.

Each value of λ defines a different distribution in the family, with pdf f(x) = λe^(−λx) on [0, ∞). Similarly, a binomial distribution bin(n, p) is determined by the two parameters n and p, and a normal distribution N(μ, σ²) is determined by the two parameters μ and σ² (or equivalently, μ and σ). Parameterized families of distributions are often called parametric distributions or parametric models.

We are often faced with the situation of having random data which we know (or believe) is drawn from a parametric model, whose parameters we do not know. For example, in an election between two candidates, polling data constitutes draws from a Bernoulli(p) distribution with unknown parameter p.
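To see the "one-parameter family" point concretely, here is a tiny Python sketch (our own illustration; the function name and sample values are not from the reading) that evaluates the exponential pdf at a fixed point for a few values of λ. Each λ gives a genuinely different density:

```python
import math

def exp_pdf(x, lam):
    """pdf of the exponential distribution exp(lambda): lam * e^(-lam*x) on [0, inf)."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

# Each value of lambda picks out a different member of the family.
for lam in [0.5, 1.0, 2.0]:
    print(f"lambda = {lam}: f(1) = {exp_pdf(1.0, lam):.4f}")
```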

In this case we would like to use the data to estimate the value of the parameter p, as the latter predicts the result of the election. Similarly, assuming gestational length follows a normal distribution, we would like to use the data of the gestational lengths from a random sample of pregnancies to draw inferences about the values of the parameters μ and σ.

Our focus so far has been on computing the probability of data arising from a parametric model with known parameters. Statistical inference flips this on its head: we will estimate the probability of parameters given a parametric model and observed data drawn from it. In the coming weeks we will see how parameter values are naturally viewed as hypotheses, so we are in fact estimating the probability of various hypotheses given the data.

3 Maximum Likelihood Estimates

There are many methods for estimating unknown parameters from data.

We will first consider the maximum likelihood estimate (MLE), which answers the question:

For which parameter value does the observed data have the biggest probability?

The MLE is an example of a point estimate because it gives a single value for the unknown parameter (later our estimates will involve intervals and probabilities). Two advantages of the MLE are that it is often easy to compute and that it agrees with our intuition in simple examples. We will explain the MLE through a series of examples.

Example 1. A coin is flipped 100 times. Given that there were 55 heads, find the maximum likelihood estimate for the probability p of heads on a single toss.

answer: Before actually solving the problem, let's establish some notation and terms. We can think of counting the number of heads in 100 tosses as an experiment.

For a given value of p, the probability of getting 55 heads in this experiment is the binomial probability

P(55 heads) = (100 choose 55) p^55 (1 − p)^45.

The probability of getting 55 heads depends on the value of p, so let's include p in the notation by using the notation of conditional probability:

P(55 heads | p) = (100 choose 55) p^55 (1 − p)^45.

You should read P(55 heads | p) as: "the probability of 55 heads given p", or more precisely as "the probability of 55 heads given that the probability of heads on a single toss is p". Here are some standard terms we will use as we do statistics.

Experiment: Flip the coin 100 times and count the number of heads.

Data: The data is the result of the experiment. In this case it is "55 heads".

Parameter(s) of interest: We are interested in the value of the unknown parameter p.

Likelihood, or likelihood function: this is P(data | p). Note it is a function of both the data and the parameter p. In this case the likelihood is

P(55 heads | p) = (100 choose 55) p^55 (1 − p)^45.

Note: The likelihood P(data | p) changes as the parameter of interest p changes. Look carefully at the definition. One typical source of confusion is to mistake the likelihood P(data | p) for P(p | data). We know from our earlier work with Bayes' theorem that P(data | p) and P(p | data) are usually very different.

Definition: Given data the maximum likelihood estimate (MLE) for the parameter p is the value of p that maximizes the likelihood P(data | p). That is, the MLE is the value of p for which the data is most likely.
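The definition can be sanity-checked numerically before doing any calculus. The sketch below (our own, not from the reading; it assumes scipy is available) evaluates the likelihood P(55 heads | p) on a grid of candidate p values and picks the maximizer, which lands at p = 0.55, matching the calculus that follows:

```python
from scipy.stats import binom

# Likelihood of the observed data (55 heads in 100 tosses) as a function of p.
def likelihood(p):
    return binom.pmf(55, 100, p)

# Brute-force grid search over candidate parameter values in (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
p_mle = max(grid, key=likelihood)
print(p_mle)  # 0.55
```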

answer: For the problem at hand, we saw above that the likelihood is

P(55 heads | p) = (100 choose 55) p^55 (1 − p)^45.

We'll use the notation p̂ for the MLE. We use calculus to find it by taking the derivative of the likelihood function and setting it to 0:

d/dp P(data | p) = (100 choose 55) (55 p^54 (1 − p)^45 − 45 p^55 (1 − p)^44) = 0.

Solving this for p we get

55 p^54 (1 − p)^45 = 45 p^55 (1 − p)^44
55 (1 − p) = 45 p
55 = 100 p

so the MLE is p̂ = 0.55.

Notes: 1. The MLE for p turned out to be exactly the fraction of heads we saw in our data. 2. The MLE is computed from the data.
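As a cross-check (our own sketch, assuming sympy is available), the derivative computation can be done symbolically. The constant binomial coefficient is dropped since it does not affect where the maximum occurs:

```python
import sympy as sp

p = sp.symbols('p')
L = p**55 * (1 - p)**45            # likelihood, constant C(100, 55) omitted
roots = sp.solve(sp.diff(L, p), p)
print(roots)  # critical points 0, 11/20, 1; the interior one, 11/20 = 0.55, is the maximum
```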

That is, it is a statistic. 3. Officially you should check that the critical point is indeed a maximum. You can do this with the second derivative test.

4 Log likelihood

It is often easier to work with the natural log of the likelihood function. For short this is simply called the log likelihood. Since ln(x) is an increasing function, the maxima of the likelihood and log likelihood coincide.

Example 2. Redo the previous example using log likelihood.

answer: We had the likelihood P(55 heads | p) = (100 choose 55) p^55 (1 − p)^45. Therefore the log likelihood is

ln(P(55 heads | p)) = ln((100 choose 55)) + 55 ln(p) + 45 ln(1 − p).

Maximizing likelihood is the same as maximizing log likelihood.
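Here is a small numerical confirmation (our own sketch using scipy, not part of the reading) that maximizing the log likelihood recovers the same p̂. The additive constant ln((100 choose 55)) is dropped because it does not move the maximizer:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Negative log likelihood; minimizing it maximizes the log likelihood.
def neg_log_lik(p):
    return -(55 * np.log(p) + 45 * np.log(1 - p))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method='bounded')
print(res.x)  # approximately 0.55
```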

We check that calculus gives us the same answer as before:

d/dp (log likelihood) = d/dp [ln((100 choose 55)) + 55 ln(p) + 45 ln(1 − p)] = 55/p − 45/(1 − p) = 0
⇒ 55 (1 − p) = 45 p
⇒ p̂ = 0.55.

5 Maximum Likelihood for continuous distributions

For continuous distributions, we use the probability density function to define the likelihood. We show this in a few examples. In the next section we explain how this is analogous to what we did in the discrete case.

Example 3. Light bulbs
Suppose that the lifetime of Badger brand light bulbs is modeled by an exponential distribution with (unknown) parameter λ.

We test 5 bulbs and find they have lifetimes of 2, 3, 1, 3, and 4 years, respectively. What is the MLE for λ?

answer: We need to be careful with our notation. With five different values it is best to use subscripts. Let Xi be the lifetime of the i-th bulb and let xi be the value Xi takes. Then each Xi has pdf f_Xi(xi) = λe^(−λxi). We assume the lifetimes of the bulbs are independent, so the joint pdf is the product of the individual densities:

f(x1, x2, x3, x4, x5 | λ) = (λe^(−λx1))(λe^(−λx2))(λe^(−λx3))(λe^(−λx4))(λe^(−λx5)) = λ^5 e^(−λ(x1+x2+x3+x4+x5)).

Note that we write this as a conditional density, since it depends on λ.
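The reading goes on to maximize this density analytically; as a purely numerical illustration (our own sketch, with the grid range and step size chosen arbitrarily), we can evaluate the joint density over a range of λ values for the observed lifetimes and locate its maximum:

```python
import numpy as np

lifetimes = np.array([2, 3, 1, 3, 4])  # observed bulb lifetimes in years

# Joint density f(x1, ..., x5 | lambda) = lambda^5 * exp(-lambda * sum(x_i)).
def joint_pdf(lam):
    return lam**5 * np.exp(-lam * lifetimes.sum())

lams = np.linspace(0.01, 2.0, 2000)
lam_hat = lams[np.argmax(joint_pdf(lams))]
print(lam_hat)  # approximately 0.385, i.e. 5/13 = n / sum of lifetimes
```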

