
Fisher Information and Cramér-Rao Bound


Math 541: Statistical Theory II
Instructor: Songfeng Zheng

In parameter estimation problems, we obtain information about the parameter from a sample of data coming from the underlying probability distribution. A natural question is: how much information can a sample of data provide about the unknown parameter? This section introduces such a measure of information. We will also see that this information measure can be used to find bounds on the variance of estimators, to approximate the sampling distribution of an estimator obtained from a large sample, and further to obtain an approximate confidence interval in the case of a large sample.

In this section, we consider a random variable $X$ for which the pdf or pmf is $f(x|\theta)$, where $\theta$ is an unknown parameter and $\theta \in \Theta$, with $\Theta$ the parameter space.

1 Fisher Information

Motivation: Intuitively, if an event has small probability, then the occurrence of this event brings us much information. For a random variable $X \sim f(x|\theta)$, if $\theta$ were the true value of the parameter, the likelihood function should take a big value, or equivalently, the derivative of the log-likelihood function should be close to zero; this is the basic principle of maximum likelihood estimation. We define $l(x|\theta) = \log f(x|\theta)$ as the log-likelihood function, and
$$l'(x|\theta) = \frac{\partial}{\partial\theta}\log f(x|\theta) = \frac{f'(x|\theta)}{f(x|\theta)},$$
where $f'(x|\theta)$ is the derivative of $f(x|\theta)$ with respect to $\theta$. Similarly, we denote the second order derivative of $f(x|\theta)$ with respect to $\theta$ as $f''(x|\theta)$.

According to the above analysis, if $l'(X|\theta)$ is close to zero, then the outcome is as expected, and the random variable does not provide much information about $\theta$; on the other hand, if $|l'(X|\theta)|$ or $[l'(X|\theta)]^2$ is large, the random variable provides much information about $\theta$. Thus, we can use $[l'(X|\theta)]^2$ to measure the amount of information provided by $X$. However, since $X$ is a random variable, we should consider the average case. Thus, we introduce the following definition: the Fisher information (for $\theta$) contained in the random variable $X$ is defined as
$$I(\theta) = E_\theta\{[l'(X|\theta)]^2\} = \int [l'(x|\theta)]^2 f(x|\theta)\,dx. \qquad (1)$$
We assume that we can exchange the order of differentiation and integration; then
$$\int f'(x|\theta)\,dx = \frac{\partial}{\partial\theta}\int f(x|\theta)\,dx = 0.$$
Similarly,
$$\int f''(x|\theta)\,dx = \frac{\partial^2}{\partial\theta^2}\int f(x|\theta)\,dx = 0.$$
It is easy to see that
$$E_\theta[l'(X|\theta)] = \int l'(x|\theta) f(x|\theta)\,dx = \int \frac{f'(x|\theta)}{f(x|\theta)} f(x|\theta)\,dx = \int f'(x|\theta)\,dx = 0.$$
Therefore, the definition of Fisher information (1) can be rewritten as
$$I(\theta) = \mathrm{Var}_\theta[l'(X|\theta)]. \qquad (2)$$
Also, notice that
$$l''(x|\theta) = \left[\frac{f'(x|\theta)}{f(x|\theta)}\right]' = \frac{f''(x|\theta)}{f(x|\theta)} - \frac{[f'(x|\theta)]^2}{[f(x|\theta)]^2} = \frac{f''(x|\theta)}{f(x|\theta)} - [l'(x|\theta)]^2.$$
Therefore,
$$E_\theta[l''(X|\theta)] = \int \left\{\frac{f''(x|\theta)}{f(x|\theta)} - [l'(x|\theta)]^2\right\} f(x|\theta)\,dx = \int f''(x|\theta)\,dx - E_\theta\{[l'(X|\theta)]^2\} = -I(\theta).$$

Finally, we have another formula to calculate Fisher information:
$$I(\theta) = -E_\theta[l''(X|\theta)] = -\int \left[\frac{\partial^2}{\partial\theta^2}\log f(x|\theta)\right] f(x|\theta)\,dx. \qquad (3)$$
To summarize, we have three methods to calculate Fisher information: equations (1), (2), and (3). In many problems, using (3) is the most convenient.

Example 1: Suppose the random variable $X$ has a Bernoulli distribution for which the parameter $\theta$ is unknown ($0 < \theta < 1$). We shall determine the Fisher information $I(\theta)$ in $X$. The point mass function of $X$ is
$$f(x|\theta) = \theta^x (1-\theta)^{1-x} \quad \text{for } x = 1 \text{ or } x = 0.$$
Therefore
$$l(x|\theta) = \log f(x|\theta) = x\log\theta + (1-x)\log(1-\theta)$$
and
$$l'(x|\theta) = \frac{x}{\theta} - \frac{1-x}{1-\theta}, \qquad l''(x|\theta) = -\frac{x}{\theta^2} - \frac{1-x}{(1-\theta)^2}.$$
Since $E(X) = \theta$, the Fisher information is
$$I(\theta) = -E[l''(X|\theta)] = \frac{E(X)}{\theta^2} + \frac{1 - E(X)}{(1-\theta)^2} = \frac{1}{\theta} + \frac{1}{1-\theta} = \frac{1}{\theta(1-\theta)}.$$
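The three formulas (1), (2), and (3) can be checked numerically against the closed form of Example 1. The sketch below is an illustration added to this transcription (it is not part of the original notes); the choice $\theta = 0.3$ and the sample size are arbitrary.

```python
import numpy as np

# Monte Carlo check of the three Fisher information formulas for Bernoulli(theta).
# For x in {0, 1}:
#   l'(x|theta)  = x/theta - (1-x)/(1-theta)
#   l''(x|theta) = -x/theta**2 - (1-x)/(1-theta)**2
rng = np.random.default_rng(0)
theta = 0.3                                       # arbitrary true parameter
x = rng.binomial(1, theta, size=1_000_000)

score = x / theta - (1 - x) / (1 - theta)          # l'(x|theta)
second = -x / theta**2 - (1 - x) / (1 - theta)**2  # l''(x|theta)

print("E[(l')^2]          :", np.mean(score**2))   # formula (1)
print("Var[l']            :", np.var(score))       # formula (2)
print("-E[l'']            :", -np.mean(second))    # formula (3)
print("1/(theta(1-theta)) :", 1 / (theta * (1 - theta)))
```

All three Monte Carlo estimates should agree with $1/[\theta(1-\theta)] \approx 4.76$ up to simulation error.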

Example 2: Suppose that $X \sim N(\mu, \sigma^2)$, where $\mu$ is unknown but the value of $\sigma^2$ is given. Find the Fisher information $I(\mu)$ in $X$. For $-\infty < x < \infty$, we have
$$l(x|\mu) = \log f(x|\mu) = -\frac{1}{2}\log(2\pi\sigma^2) - \frac{(x-\mu)^2}{2\sigma^2}.$$
Hence,
$$l'(x|\mu) = \frac{x-\mu}{\sigma^2}, \qquad l''(x|\mu) = -\frac{1}{\sigma^2}.$$
It follows that the Fisher information is
$$I(\mu) = -E[l''(X|\mu)] = \frac{1}{\sigma^2}.$$
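Similarly, the result of Example 2 can be checked with formula (2), $I(\mu) = \mathrm{Var}_\mu[l'(X|\mu)]$. The sketch below is added here as an illustration only; the values $\mu = 1$ and $\sigma = 2$ are arbitrary.

```python
import numpy as np

# Check I(mu) = Var[l'(X|mu)] = 1/sigma^2 for X ~ N(mu, sigma^2), sigma known.
rng = np.random.default_rng(1)
mu, sigma = 1.0, 2.0                    # arbitrary parameter values
x = rng.normal(mu, sigma, size=1_000_000)

score = (x - mu) / sigma**2             # l'(x|mu)
print("Var[l']   :", np.var(score))     # approximately 0.25
print("1/sigma^2 :", 1 / sigma**2)
```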

If we make a transformation of the parameter, we will obtain a different expression for the Fisher information under the new parameterization. More specifically, let $X$ be a random variable for which the pdf or pmf is $f(x|\theta)$, where the value of the parameter $\theta$ is unknown but must lie in a space $\Theta$. Let $I_0(\theta)$ denote the Fisher information in $X$. Suppose now that the parameter $\theta$ is replaced by a new parameter $\mu$, where $\theta = \phi(\mu)$ and $\phi$ is a differentiable function. Let $I_1(\mu)$ denote the Fisher information in $X$ when the parameter is regarded as $\mu$. We will have
$$I_1(\mu) = [\phi'(\mu)]^2 I_0[\phi(\mu)].$$
Proof: Let $g(x|\mu)$ be the pdf or pmf of $X$ when $\mu$ is regarded as the parameter. Then $g(x|\mu) = f[x|\phi(\mu)]$. Therefore,
$$\log g(x|\mu) = \log f[x|\phi(\mu)] = l[x|\phi(\mu)],$$
and
$$\frac{\partial}{\partial\mu}\log g(x|\mu) = l'[x|\phi(\mu)]\,\phi'(\mu).$$
It follows that
$$I_1(\mu) = E\left\{\left[\frac{\partial}{\partial\mu}\log g(X|\mu)\right]^2\right\} = [\phi'(\mu)]^2 E\big(\{l'[X|\phi(\mu)]\}^2\big) = [\phi'(\mu)]^2 I_0[\phi(\mu)].$$
This will be verified in the exercise problems.
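As a numerical illustration of this identity (added here, not part of the original notes), take the Bernoulli model of Example 1 together with the logistic reparameterization $\theta = \phi(\mu) = e^\mu/(1+e^\mu)$, a hypothetical choice made only for this sketch. Then $\phi'(\mu) = \theta(1-\theta)$, so the identity predicts $I_1(\mu) = [\theta(1-\theta)]^2 \cdot \frac{1}{\theta(1-\theta)} = \theta(1-\theta)$, which the Monte Carlo estimate below reproduces.

```python
import numpy as np

# Illustrate I_1(mu) = [phi'(mu)]^2 * I_0(phi(mu)) for Bernoulli data under the
# (hypothetical) logistic reparameterization theta = phi(mu) = exp(mu)/(1+exp(mu)).
rng = np.random.default_rng(2)
mu = 0.5                                   # arbitrary value of the new parameter
theta = np.exp(mu) / (1 + np.exp(mu))      # phi(mu)
dphi = theta * (1 - theta)                 # phi'(mu) for the logistic map

x = rng.binomial(1, theta, size=1_000_000)
score_theta = x / theta - (1 - x) / (1 - theta)   # l'(x|theta)
score_mu = score_theta * dphi                     # chain rule: d/dmu log g(x|mu)

print("Monte Carlo I_1(mu)       :", np.var(score_mu))
print("[phi'(mu)]^2 * I_0(theta) :", dphi**2 / (theta * (1 - theta)))
# Both are approximately theta * (1 - theta).
```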

Suppose that we have a random sample $X_1, \ldots, X_n$ coming from a distribution for which the pdf or pmf is $f(x|\theta)$, where the value of the parameter $\theta$ is unknown. Let us now calculate the amount of information the random sample $X_1, \ldots, X_n$ provides for $\theta$. Let us denote the joint pdf of $X_1, \ldots, X_n$ as
$$f_n(x|\theta) = \prod_{i=1}^n f(x_i|\theta);$$
then
$$l_n(x|\theta) = \log f_n(x|\theta) = \sum_{i=1}^n \log f(x_i|\theta) = \sum_{i=1}^n l(x_i|\theta)$$
and
$$l_n'(x|\theta) = \frac{f_n'(x|\theta)}{f_n(x|\theta)}. \qquad (4)$$
We define the Fisher information $I_n(\theta)$ in the random sample $X_1, \ldots, X_n$ as
$$I_n(\theta) = E_\theta\{[l_n'(X|\theta)]^2\} = \int \cdots \int [l_n'(x|\theta)]^2 f_n(x|\theta)\,dx_1 \cdots dx_n,$$
which is an $n$-dimensional integral. We further assume that we can exchange the order of differentiation and integration; then we have
$$\int f_n'(x|\theta)\,dx = \frac{\partial}{\partial\theta}\int f_n(x|\theta)\,dx = 0$$
and
$$\int f_n''(x|\theta)\,dx = \frac{\partial^2}{\partial\theta^2}\int f_n(x|\theta)\,dx = 0.$$
It is easy to see that
$$E_\theta[l_n'(X|\theta)] = \int l_n'(x|\theta) f_n(x|\theta)\,dx = \int \frac{f_n'(x|\theta)}{f_n(x|\theta)} f_n(x|\theta)\,dx = \int f_n'(x|\theta)\,dx = 0. \qquad (5)$$
Therefore, the definition of Fisher information for the sample $X_1, \ldots, X_n$ can be rewritten as
$$I_n(\theta) = \mathrm{Var}_\theta[l_n'(X|\theta)].$$

It is similar to prove that the Fisher information can also be calculated as
$$I_n(\theta) = -E_\theta[l_n''(X|\theta)].$$
From the definition of $l_n(x|\theta)$, it follows that
$$l_n''(x|\theta) = \sum_{i=1}^n l''(x_i|\theta).$$
Therefore, the Fisher information is
$$I_n(\theta) = -E_\theta[l_n''(X|\theta)] = -E_\theta\left[\sum_{i=1}^n l''(X_i|\theta)\right] = -\sum_{i=1}^n E_\theta[l''(X_i|\theta)] = nI(\theta).$$
In other words, the Fisher information in a random sample of size $n$ is simply $n$ times the Fisher information in a single observation.

Example 3: Suppose $X_1, \ldots, X_n$ form a random sample from a Bernoulli distribution for which the parameter $\theta$ is unknown ($0 < \theta < 1$). Then the Fisher information $I_n(\theta)$ in this sample is
$$I_n(\theta) = nI(\theta) = \frac{n}{\theta(1-\theta)}.$$
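Numerically, the additivity of Fisher information shows up as the variance of the sample score growing linearly in $n$. The following sketch (added for this transcription; the values of $\theta$, $n$, and the replication count are arbitrary) checks $\mathrm{Var}_\theta[l_n'(X|\theta)] \approx n/[\theta(1-\theta)]$ for a Bernoulli sample.

```python
import numpy as np

# Check I_n(theta) = n * I(theta) for a Bernoulli(theta) sample of size n.
rng = np.random.default_rng(3)
theta, n, reps = 0.3, 50, 100_000                 # arbitrary settings
x = rng.binomial(1, theta, size=(reps, n))

# Sample score l'_n(x|theta) = sum_i l'(x_i|theta)
sample_score = np.sum(x / theta - (1 - x) / (1 - theta), axis=1)

print("Var[l'_n]    :", np.var(sample_score))
print("n * I(theta) :", n / (theta * (1 - theta)))
```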

Example 4: Let $X_1, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$, where $\mu$ is unknown but the value of $\sigma^2$ is given. Then the Fisher information $I_n(\mu)$ in this sample is
$$I_n(\mu) = nI(\mu) = \frac{n}{\sigma^2}.$$

2 Cramér-Rao Lower Bound and Asymptotic Distribution of Maximum Likelihood Estimators

Suppose that we have a random sample $X_1, \ldots, X_n$ coming from a distribution for which the pdf or pmf is $f(x|\theta)$, where the value of the parameter $\theta$ is unknown. We will show how to use Fisher information to determine a lower bound for the variance of an estimator of the parameter $\theta$.

Let $\hat\theta = r(X_1, \ldots, X_n) = r(X)$ be an arbitrary estimator of $\theta$. Assume $E_\theta(\hat\theta) = m(\theta)$ and that the variance of $\hat\theta$ is finite. Let us consider the random variable $l_n'(X|\theta)$ defined in (4); it was shown in (5) that $E_\theta[l_n'(X|\theta)] = 0$.

Therefore, the covariance between $\hat\theta$ and $l_n'(X|\theta)$ is
$$\begin{aligned}
\mathrm{Cov}_\theta[\hat\theta, l_n'(X|\theta)] &= E_\theta\{[\hat\theta - E_\theta(\hat\theta)][l_n'(X|\theta) - E_\theta(l_n'(X|\theta))]\} = E_\theta\{[r(X) - m(\theta)]\, l_n'(X|\theta)\} \\
&= E_\theta[r(X)\, l_n'(X|\theta)] - m(\theta) E_\theta[l_n'(X|\theta)] = E_\theta[r(X)\, l_n'(X|\theta)] \\
&= \int \cdots \int r(x)\, l_n'(x|\theta)\, f_n(x|\theta)\,dx_1 \cdots dx_n \\
&= \int \cdots \int r(x)\, f_n'(x|\theta)\,dx_1 \cdots dx_n \qquad \text{(use Equation 4)} \\
&= \frac{\partial}{\partial\theta} \int \cdots \int r(x)\, f_n(x|\theta)\,dx_1 \cdots dx_n \\
&= \frac{\partial}{\partial\theta} E_\theta[\hat\theta] = m'(\theta). \qquad (6)
\end{aligned}$$
By the Cauchy-Schwarz inequality and the definition of $I_n(\theta)$,
$$\{\mathrm{Cov}_\theta[\hat\theta, l_n'(X|\theta)]\}^2 \le \mathrm{Var}_\theta[\hat\theta]\,\mathrm{Var}_\theta[l_n'(X|\theta)] = \mathrm{Var}_\theta[\hat\theta]\, I_n(\theta),$$
i.e.,
$$[m'(\theta)]^2 \le \mathrm{Var}_\theta[\hat\theta]\, I_n(\theta) = nI(\theta)\,\mathrm{Var}_\theta[\hat\theta].$$
Finally, we get the lower bound for the variance of an arbitrary estimator $\hat\theta$ as
$$\mathrm{Var}_\theta[\hat\theta] \ge \frac{[m'(\theta)]^2}{nI(\theta)}. \qquad (7)$$
The inequality (7) is called the information inequality, and is also known as the Cramér-Rao inequality in honor of the Swedish statistician H. Cramér.
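When the estimator is unbiased, $m(\theta) = \theta$ and (7) becomes $\mathrm{Var}_\theta[\hat\theta] \ge 1/[nI(\theta)]$. As an illustration (a sketch added for this transcription with arbitrary settings, not part of the original notes), the simulation below compares the variance of the sample mean of Bernoulli data, which is an unbiased estimator of $\theta$, with the bound $1/[nI(\theta)] = \theta(1-\theta)/n$; in this model the sample mean attains the bound.

```python
import numpy as np

# Compare Var[theta_hat] with the Cramer-Rao bound 1/(n*I(theta)) for the
# sample mean of Bernoulli(theta) data (an unbiased estimator of theta).
rng = np.random.default_rng(4)
theta, n, reps = 0.3, 50, 100_000          # arbitrary settings
x = rng.binomial(1, theta, size=(reps, n))

theta_hat = x.mean(axis=1)                 # unbiased: E[theta_hat] = theta

print("Var[theta_hat]  :", np.var(theta_hat))
print("CR bound 1/(nI) :", theta * (1 - theta) / n)  # I(theta) = 1/(theta(1-theta))
```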

