
Introduction to Likelihood Statistics


1. The Likelihood function.
2. Use of the Likelihood function to model data.
3. Comparison to standard frequentist and Bayesian statistics.

Edward L. Robinson*
Department of Astronomy and McDonald Observatory, University of Texas at Austin

*Look for: "Data Analysis for Scientists and Engineers", Princeton University Press, Sept 2016.

The Likelihood Function

Let a probability distribution function for \xi have m+1 parameters a_j:

    f(\xi, a_0, a_1, \ldots, a_m) = f(\xi, \vec{a}).

The joint probability distribution for n samples of \xi is

    f(\xi_1, \xi_2, \ldots, \xi_n, a_0, a_1, \ldots, a_m) = f(\vec{\xi}, \vec{a}).

Now make measurements. For each variable \xi_i there is a measured value x_i. To obtain the likelihood function L(\vec{x}, \vec{a}), replace each variable \xi_i with the numerical value of the corresponding data point x_i:

    L(\vec{x}, \vec{a}) \equiv f(\vec{x}, \vec{a}) = f(x_1, x_2, \ldots, x_n, \vec{a}).

In the likelihood function the \vec{x} are known and fixed, while the \vec{a} are the variables.

Simple Example

Suppose the probability distribution for the data is

    f(\xi, a) = a^2 \xi e^{-a\xi}.

Measure a single data point. It turns out to be x = 2. The likelihood function is

    L(x = 2, a) = 2 a^2 e^{-2a}.

Somewhat More Realistic Example

Suppose we have n independent data points, each drawn from the same probability distribution

    f(\xi, a) = a^2 \xi e^{-a\xi}.

The joint probability distribution is

    f(\vec{\xi}, a) = f(\xi_1, a)\, f(\xi_2, a) \cdots f(\xi_n, a) = \prod_{i=1}^{n} a^2 \xi_i e^{-a\xi_i}.

The measured data points are the x_i. The resulting likelihood function is

    L(\vec{x}, a) = \prod_{i=1}^{n} a^2 x_i e^{-a x_i}.

What Is the Likelihood Function? - 1

The likelihood function is not a probability distribution. It does not transform like a probability distribution, and its normalization is not defined. How do we deal with this?

Traditional approach: use the likelihood ratio. To compare the likelihood of two possible sets of parameters \vec{a}_1 and \vec{a}_2, construct the likelihood ratio:

    LR = \frac{L(\vec{x}, \vec{a}_1)}{L(\vec{x}, \vec{a}_2)} = \frac{f(\vec{x}, \vec{a}_1)}{f(\vec{x}, \vec{a}_2)}.

This is the ratio of the probabilities that the data \vec{x} would be produced by parameter values \vec{a}_1 and \vec{a}_2.
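To make this concrete, here is a minimal Python sketch of the single-measurement example and the likelihood ratio; the function name, the trial values of a, and the pair (a_1, a_2) are my choices for illustration:

```python
import numpy as np

def likelihood(x, a):
    # Joint likelihood for i.i.d. draws from f(xi, a) = a^2 * xi * exp(-a * xi)
    x = np.atleast_1d(x)
    return np.prod(a**2 * x * np.exp(-a * x))

# Single measurement x = 2: L(a) = 2 a^2 exp(-2a)
for a in (0.5, 1.0, 1.5, 2.0):
    print(f"a = {a:.1f}   L = {likelihood(2.0, a):.4f}")

# Likelihood ratio comparing two candidate parameter values
a1, a2 = 1.0, 2.0
LR = likelihood(2.0, a1) / likelihood(2.0, a2)
print(f"LR = {LR:.3f}")   # > 1, so the data favor a1 over a2
```

For x = 2 the likelihood peaks at a = 1, so LR ≈ 1.85 and the data favor a_1 = 1 over a_2 = 2.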

What Is the Likelihood Function? - 2

Compare all parameter values to a single set of fiducial parameter values \vec{a}_0. The likelihood ratio becomes

    LR = \frac{L(\vec{x}, \vec{a})}{L(\vec{x}, \vec{a}_0)} \propto L(\vec{x}, \vec{a}).

This likelihood ratio, and therefore the likelihood function itself, is proportional to the probability that the observed data \vec{x} would be produced by parameter values \vec{a}.

What Is the Likelihood Function? - 3

An increasingly common and highly attractive approach (although it is unclear that everyone knows what they are doing): treat the likelihood function like a probability distribution! The quantity

    L(\vec{x}, \vec{a})\, d\vec{a} = L(\vec{x}, \vec{a})\, da_0\, da_1\, da_2 \cdots da_m

does transform like a probability. This usage is entirely consistent with the standard definition of probability density. To normalize L(\vec{x}, \vec{a}), define a multiplicative factor A such that

    1 = A \int L(\vec{x}, \vec{a})\, d\vec{a}.

Then A\, L(\vec{x}, \vec{a}) is normalized (but normalization is almost never needed).
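The normalization recipe is easy to check numerically for the single-measurement example, where the integral of 2a²e^{-2a} over a from 0 to infinity is 1/2 and hence A = 2. A short sketch (the quadrature call is the only machinery):

```python
import numpy as np
from scipy.integrate import quad

# Single-measurement likelihood from the earlier example
L = lambda a: 2.0 * a**2 * np.exp(-2.0 * a)

integral, _ = quad(L, 0.0, np.inf)   # = 0.5
A = 1.0 / integral                   # = 2.0
print(f"integral = {integral:.4f}, A = {A:.4f}")
```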

Likelihood Statistics

The likelihood function contains information about the new data. (I am in the camp that says it contains all the new information.) One can extract information from L(\vec{x}, \vec{a}) in the same way one extracts information from an (un-normalized) probability distribution:

- calculate the mean, median, and mode of the parameters;
- plot the likelihood and its marginal distributions;
- calculate variances and confidence intervals;
- use it as a basis for \chi^2 minimization!

But beware: one can usually get away with thinking of the likelihood function as the probability distribution for the parameters \vec{a}, but this is not really correct. It is the probability that a specific set of parameters would yield the observed data.
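As an illustration of the first and third bullets, a sketch that treats the normalized single-measurement likelihood A·L(a) = 4a²e^{-2a} as a density in a and reads summary statistics off a grid; the grid resolution and the central 68% interval convention are my choices:

```python
import numpy as np

a = np.linspace(0.0, 10.0, 100001)
p = 4.0 * a**2 * np.exp(-2.0 * a)   # normalized likelihood: a Gamma(3, 2) density
da = a[1] - a[0]

mean = np.sum(a * p) * da           # -> 1.5
mode = a[np.argmax(p)]              # -> 1.0
cdf = np.cumsum(p) * da
lo, hi = a[np.searchsorted(cdf, 0.16)], a[np.searchsorted(cdf, 0.84)]
print(f"mean = {mean:.3f}, mode = {mode:.3f}, 68% interval = [{lo:.2f}, {hi:.2f}]")
```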

The Maximum Likelihood Principle

The maximum likelihood principle is one way to extract information from the likelihood function. It says, in effect, "Use the modal values of the parameters."

The Maximum Likelihood Principle: Given data points \vec{x} drawn from a joint probability distribution whose functional form is known to be f(\vec{\xi}, \vec{a}), the best estimates of the parameters \vec{a} are those that maximize the likelihood function L(\vec{x}, \vec{a}) = f(\vec{x}, \vec{a}).

Find the maximum by setting

    \left. \frac{\partial L}{\partial a_j} \right|_{\hat{a}} = 0.

These m+1 equations are the likelihood equations.

Maximum Likelihood Estimation

Recall our previous example: n independent data points x_i drawn from f(\xi, a) = a^2 \xi e^{-a\xi}. The likelihood function is

    L(\vec{x}, a) = \prod_{i=1}^{n} a^2 x_i e^{-a x_i}.

The best estimate of a occurs at the maximum of L(\vec{x}, a), which occurs at

    \left. \frac{\partial L(\vec{x}, a)}{\partial a} \right|_{\hat{a}} = 0 \;\Rightarrow\; \text{(a bit of algebra)} \;\Rightarrow\; \hat{a} = \frac{2n}{\sum_i x_i}.

If n = 4 and \vec{x} = (2, 4, 5, 5), then \hat{a} = 1/2.

The Log-Likelihood Function

For computational convenience, one often prefers to deal with the log of the likelihood function in maximum likelihood calculations. This is okay because the maxima of the likelihood and of its log occur at the same values of the parameters. The log-likelihood is defined to be

    \ell(\vec{x}, \vec{a}) = \ln L(\vec{x}, \vec{a})

and the likelihood equations become

    \left. \frac{\partial \ell}{\partial a_j} \right|_{\hat{a}} = 0.
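A quick numerical cross-check of the worked example (the optimizer and its bounds are my choices): maximizing the log-likelihood for \vec{x} = (2, 4, 5, 5) should reproduce \hat{a} = 2n/\sum x_i = 1/2.

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([2.0, 4.0, 5.0, 5.0])

def neg_loglike(a):
    # log L = 2n ln(a) + sum(ln xi) - a sum(xi); drop the a-independent term
    return -(2 * len(x) * np.log(a) - a * np.sum(x))

res = minimize_scalar(neg_loglike, bounds=(1e-6, 10.0), method="bounded")
print(f"numerical: {res.x:.4f}, analytic 2n/sum(x): {2 * len(x) / np.sum(x):.4f}")
# both give a-hat = 0.5
```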

A Fully Realistic Example - 1

We have n independent measurements (x_i, \sigma_i) drawn from the Gaussians

    f_i(\xi_i, \sigma_i, a) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[ -\frac{1}{2} \frac{(\xi_i - a)^2}{\sigma_i^2} \right].

Thus the measurements all have the same mean value but different noise. The noise is described by the width of the Gaussians, a different width for each point. The joint probability distribution for the data points is

    f(\vec{\xi}, \vec{\sigma}, a) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[ -\frac{1}{2} \frac{(\xi_i - a)^2}{\sigma_i^2} \right],

and the joint likelihood function for all the measurements is

    L(\vec{x}, \vec{\sigma}, a) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[ -\frac{1}{2} \frac{(x_i - a)^2}{\sigma_i^2} \right].

A Fully Realistic Example - 2

The log-likelihood function is

    \ell(\vec{x}, \vec{\sigma}, a) = \sum_{i=1}^{n} \ln \frac{1}{\sqrt{2\pi}\,\sigma_i} - \frac{1}{2} \sum_{i=1}^{n} \frac{(x_i - a)^2}{\sigma_i^2}.

From elementary probability theory we know that the mean value of the Gaussian is a. Let us estimate the mean. The likelihood equation for a is

    0 = \left. \frac{\partial \ell}{\partial a} \right|_{\hat{a}} = \frac{\partial}{\partial a} \left[ -\frac{1}{2} \sum_{i=1}^{n} \frac{(x_i - a)^2}{\sigma_i^2} \right].

After some algebra we arrive at

    \hat{a} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}, \qquad w_i = 1/\sigma_i^2.

This is the same as the weighted average in frequentist statistics!
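A minimal sketch of the weighted-mean estimate; the measured values and widths below are invented purely for illustration:

```python
import numpy as np

x     = np.array([10.2,  9.6, 10.5,  9.9])   # measurements of the same quantity
sigma = np.array([ 0.3,  0.5,  0.8,  0.2])   # per-point Gaussian widths

w = 1.0 / sigma**2                  # weights w_i = 1 / sigma_i^2
a_hat = np.sum(w * x) / np.sum(w)   # maximum likelihood estimate of the mean
print(f"weighted mean a-hat = {a_hat:.3f}")
```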

Fit a Model to Data - 1

[Figure: data points with error bars in the (x, y) plane.]

We have independent data points (x_i, y_i, \sigma_i), i = 1, \ldots, n. The y_i have errors \sigma_i. We know the y_i are drawn from Gaussian distributions

    f(y) \propto \exp\left[ -\frac{(y - \bar{y})^2}{2\sigma^2} \right].

To represent the data, both \bar{y} and \sigma must depend on x. The values of \sigma_i are given explicitly for each x_i. The values of \bar{y} are given as a function of x: \bar{y} = \bar{y}(x).

Fit a Model to Data - 2

The functional form of \bar{y}(x) can be as complicated as necessary. To keep the example simple, fit a straight line to the data:

    \bar{y} = a_0 + a_1 x.

The individual y_i are now each drawn from the Gaussian distributions

    f(y_i) \propto \exp\left[ -\frac{(y_i - a_0 - a_1 x_i)^2}{2\sigma_i^2} \right].

Because the data points are independent of each other, the joint probability distribution is the product of the individual distributions. The likelihood function is, therefore,

    L(\vec{x}, \vec{y}, \vec{\sigma}, a_0, a_1) \propto \prod_{i=1}^{n} \exp\left[ -\frac{(y_i - a_0 - a_1 x_i)^2}{2\sigma_i^2} \right].

Fit a Model to Data - 3

The log-likelihood function is

    \ell(\vec{x}, \vec{y}, \vec{\sigma}, a_0, a_1) = -\frac{1}{2} \sum_{i=1}^{n} w_i (y_i - a_0 - a_1 x_i)^2 + c,

where w_i = 1/\sigma_i^2 and c is a constant. The two likelihood equations are

    \left. \frac{\partial \ell}{\partial a_0} \right|_{\hat{\vec{a}}} = 0 \quad \text{and} \quad \left. \frac{\partial \ell}{\partial a_1} \right|_{\hat{\vec{a}}} = 0,

which lead to the following equations (in matrix form) for a_0 and a_1:

    \begin{pmatrix} \sum_i w_i & \sum_i w_i x_i \\ \sum_i w_i x_i & \sum_i w_i x_i^2 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \end{pmatrix} = \begin{pmatrix} \sum_i w_i y_i \\ \sum_i w_i x_i y_i \end{pmatrix}.

These are identical to the normal equations for a weighted least squares (or \chi^2 minimization) solution for a_0 and a_1.
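The normal equations translate directly into a few lines of code; the data below are invented for illustration (roughly y = 1 + 2x with noise):

```python
import numpy as np

xi    = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
yi    = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
sigma = np.array([0.2, 0.3, 0.2, 0.4, 0.3])
w = 1.0 / sigma**2

# Normal equations in matrix form, exactly as derived above
M = np.array([[np.sum(w),      np.sum(w * xi)],
              [np.sum(w * xi), np.sum(w * xi**2)]])
b = np.array([np.sum(w * yi), np.sum(w * xi * yi)])

a0, a1 = np.linalg.solve(M, b)
print(f"intercept a0 = {a0:.3f}, slope a1 = {a1:.3f}")
```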

Comparison to Standard Frequentist Statistics

- The probability distributions from which the data points are drawn must be known to apply likelihood statistics, but not for many standard frequentist techniques.
- If the data have Gaussian distributions, likelihood statistics reduces to ordinary frequentist statistics.
- Likelihood statistics provides a solid foundation for treating data with non-Gaussian distributions (e.g., the Poisson distribution in some astronomical applications).
- If treated as probability distributions, likelihood functions can be analyzed with all the tools developed to analyze the posterior distributions of Bayesian statistics (e.g., marginal distributions and MCMC sampling).

Comparison to Bayesian Statistics

The salient feature of Bayesian statistics: combine new data with existing knowledge using Bayes' equation:

    (\text{Posterior Probability}) \propto (\text{Likelihood}) \times (\text{Prior Probability}).

Mathematically, likelihood statistics is essentially Bayesian statistics without a prior probability function:

    Likelihood function \Leftrightarrow Posterior distribution
    Likelihood ratio \Leftrightarrow Bayes factor

- It is not Bayesian statistics with a flat or uninformative prior:
  - Flatness is not an invariant concept.
  - The prior must know about the likelihood function to be truly uninformative.
- Likelihood statistics defines probability as a frequency, not as a Bayesian state of knowledge or state of belief.
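The relationship can be seen numerically. Using the single-measurement likelihood L(a) ∝ a²e^{-2a} from the earlier example together with an assumed Gamma(shape = 3, rate = 3) prior (my choice, purely for illustration), the posterior mode falls between the prior and likelihood modes, while likelihood statistics alone would report the likelihood mode:

```python
import numpy as np

a = np.linspace(1e-6, 8.0, 100001)
da = a[1] - a[0]

like  = a**2 * np.exp(-2.0 * a)   # likelihood from the single measurement x = 2
prior = a**2 * np.exp(-3.0 * a)   # assumed Gamma(3, 3) prior, mode at 2/3

post = like * prior               # posterior ∝ likelihood × prior
post /= np.sum(post) * da         # normalize numerically

print(f"likelihood mode: {a[np.argmax(like)]:.2f}")   # 1.00
print(f"prior mode:      {a[np.argmax(prior)]:.2f}")  # 0.67
print(f"posterior mode:  {a[np.argmax(post)]:.2f}")   # 0.80
```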

