Transcription of 1 Fisher Information - Florida State University
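Fisher Information
April 6, 2016
Debdeep Pati

1 Fisher Information

Assume $X \sim f(x \mid \theta)$ (pdf or pmf) with $\theta \in \Theta \subset \mathbb{R}$. Define
$$I_X(\theta) = E_\theta\left[\left(\frac{\partial}{\partial\theta}\log f(X \mid \theta)\right)^2\right],$$
where $\frac{\partial}{\partial\theta}\log f(X \mid \theta)$ is the derivative of the log-likelihood function evaluated at the true value $\theta$.

Fisher information is meaningful for families of distributions which are regular:
1. Fixed support: $\{x : f(x \mid \theta) > 0\}$ is the same for all $\theta$.
2. $\frac{\partial}{\partial\theta}\log f(x \mid \theta)$ must exist and be finite for all $x$ and $\theta$.
3. If $E_\theta|W(X)| < \infty$ for all $\theta$, then
$$\left(\frac{\partial}{\partial\theta}\right)^k E_\theta W(X) = \left(\frac{\partial}{\partial\theta}\right)^k \int W(x) f(x \mid \theta)\,dx = \int W(x) \left(\frac{\partial}{\partial\theta}\right)^k f(x \mid \theta)\,dx.$$

Regular families
One parameter exponential families.
Cauchy location or scale family:
$$f(x \mid \theta) = \frac{1}{\pi\,(1 + (x - \theta)^2)}, \qquad f(x \mid \theta) = \frac{1}{\pi\theta\,(1 + (x/\theta)^2)},$$
and lots more.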
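(Most families of distributions used in applications are regular.)

Non-regular families
Uniform$(0, \theta)$
Uniform$(\theta, \theta + 1)$

Facts about Fisher Information
Assume a regular family.

1. $E_\theta\left(\frac{\partial}{\partial\theta}\log f(X \mid \theta)\right) = 0$. Here $\frac{\partial}{\partial\theta}\log f(X \mid \theta)$ is called the score function $S(\theta)$.
Proof:
$$E_\theta\left(\frac{\partial}{\partial\theta}\log f(X \mid \theta)\right) = \int \left(\frac{\partial}{\partial\theta}\log f(x \mid \theta)\right) f(x \mid \theta)\,dx = \int \frac{\frac{\partial}{\partial\theta} f(x \mid \theta)}{f(x \mid \theta)}\, f(x \mid \theta)\,dx = \int \frac{\partial}{\partial\theta} f(x \mid \theta)\,dx = \frac{\partial}{\partial\theta}\int f(x \mid \theta)\,dx = 0,$$
since $\int f(x \mid \theta)\,dx = 1$ for all $\theta$.

2. $I_X(\theta) = \mathrm{Var}_\theta\left(\frac{\partial}{\partial\theta}\log f(X \mid \theta)\right)$.
Proof: Since $E_\theta\left(\frac{\partial}{\partial\theta}\log f(X \mid \theta)\right) = 0$,
$$\mathrm{Var}_\theta\left(\frac{\partial}{\partial\theta}\log f(X \mid \theta)\right) = E_\theta\left(\frac{\partial}{\partial\theta}\log f(X \mid \theta)\right)^2 = I_X(\theta).$$

3. If $X = (X_1, X_2, \ldots, X_n)$ and $X_1, X_2, \ldots, X_n$ are independent random variables, then $I_X(\theta) = I_{X_1}(\theta) + I_{X_2}(\theta) + \cdots + I_{X_n}(\theta)$.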
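Proof: Note that
$$f(x \mid \theta) = \prod_{i=1}^n f_i(x_i \mid \theta),$$
where $f_i(\cdot \mid \theta)$ is the pdf (pmf) of $X_i$. Observe that
$$\frac{\partial}{\partial\theta}\log f(X \mid \theta) = \sum_{i=1}^n \frac{\partial}{\partial\theta}\log f_i(X_i \mid \theta)$$
and the random variables in the sum are independent. Thus
$$\mathrm{Var}_\theta\left[\frac{\partial}{\partial\theta}\log f(X \mid \theta)\right] = \sum_{i=1}^n \mathrm{Var}_\theta\left[\frac{\partial}{\partial\theta}\log f_i(X_i \mid \theta)\right],$$
so that $I_X(\theta) = \sum_{i=1}^n I_{X_i}(\theta)$ by Fact 2.

4. If $X_1, X_2, \ldots, X_n$ are i.i.d. and $X = (X_1, X_2, \ldots, X_n)$, then $I_{X_i}(\theta) = I_{X_1}(\theta)$ for all $i$, so that $I_X(\theta) = n I_{X_1}(\theta)$.

5. An alternate formula for Fisher information is
$$I_X(\theta) = E_\theta\left(-\frac{\partial^2}{\partial\theta^2}\log f(X \mid \theta)\right).$$
Proof: Abbreviate $\int f(x \mid \theta)\,dx$ as $\int f$, etc. Since $1 = \int f$, applying $\frac{\partial}{\partial\theta}$ to both sides,
$$0 = \frac{\partial}{\partial\theta}\int f = \int \frac{\partial f}{\partial\theta} = \int \frac{\partial f/\partial\theta}{f}\, f = \int \left(\frac{\partial}{\partial\theta}\log f\right) f.$$
Applying $\frac{\partial}{\partial\theta}$ again,
$$0 = \frac{\partial}{\partial\theta}\int \left(\frac{\partial}{\partial\theta}\log f\right) f = \int \frac{\partial}{\partial\theta}\left[\left(\frac{\partial}{\partial\theta}\log f\right) f\right] = \int \left(\frac{\partial^2}{\partial\theta^2}\log f\right) f + \int \left(\frac{\partial}{\partial\theta}\log f\right) \frac{\partial f}{\partial\theta}.$$
Noting that
$$\frac{\partial f}{\partial\theta} = \frac{\partial f/\partial\theta}{f}\, f = \left(\frac{\partial}{\partial\theta}\log f\right) f,$$
this becomes
$$0 = \int \left(\frac{\partial^2}{\partial\theta^2}\log f\right) f + \int \left(\frac{\partial}{\partial\theta}\log f\right)^2 f,$$
or
$$0 = E_\theta\left(\frac{\partial^2}{\partial\theta^2}\log f(X \mid \theta)\right) + I_X(\theta).$$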
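Facts 1, 2 and 5 can be checked exactly on a family with finite support, where every expectation is a finite sum. The sketch below uses a single Bernoulli($p$) observation; that family, the value $p = 0.3$, and all function names are illustrative assumptions and do not come from the notes.

```python
# Exact check of Facts 1, 2 and 5 for one Bernoulli(p) observation.
# The Bernoulli family and p = 0.3 are illustrative choices, not from the notes.

def bernoulli_pmf(x, p):
    """f(x | p) for x in {0, 1}."""
    return p if x == 1 else 1.0 - p

def score(x, p):
    """d/dp log f(x | p), where log f = x log p + (1 - x) log(1 - p)."""
    return x / p - (1 - x) / (1.0 - p)

def neg_second_derivative(x, p):
    """-d^2/dp^2 log f(x | p)."""
    return x / p ** 2 + (1 - x) / (1.0 - p) ** 2

def expectation(g, p):
    """E_p g(X, p), computed as a finite sum over the support {0, 1}."""
    return sum(g(x, p) * bernoulli_pmf(x, p) for x in (0, 1))

def squared_score(x, p):
    return score(x, p) ** 2

p = 0.3
mean_score = expectation(score, p)               # Fact 1: should be 0
info_def = expectation(squared_score, p)         # definition of I_X(p)
info_var = info_def - mean_score ** 2            # Fact 2: variance of the score
info_alt = expectation(neg_second_derivative, p) # Fact 5: alternate formula
closed_form = 1.0 / (p * (1.0 - p))              # known value for Bernoulli(p)

print(mean_score, info_def, info_var, info_alt, closed_form)
```

All three information values should agree with $1/(p(1-p))$ up to floating-point rounding.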
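Example: Fisher Information for a Poisson sample.
Observe $\tilde{X} = (X_1, \ldots, X_n)$ iid Poisson($\lambda$). Find $I_{\tilde{X}}(\lambda)$. We know $I_{\tilde{X}}(\lambda) = n I_{X_1}(\lambda)$. We shall calculate $I_{X_1}(\lambda)$ in three ways. Let $X = X_1$.

Preliminaries:
$$f(x \mid \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}$$
$$\log f(x \mid \lambda) = x\log\lambda - \lambda - \log x!$$
$$\frac{\partial}{\partial\lambda}\log f(x \mid \lambda) = \frac{x}{\lambda} - 1$$
$$-\frac{\partial^2}{\partial\lambda^2}\log f(x \mid \lambda) = \frac{x}{\lambda^2}$$

Method #1: Observe that
$$I_X(\lambda) = E_\lambda\left[\left(\frac{\partial}{\partial\lambda}\log f(X \mid \lambda)\right)^2\right] = E_\lambda\left[\left(\frac{X}{\lambda} - 1\right)^2\right] = \mathrm{Var}_\lambda\left(\frac{X}{\lambda}\right) \quad \left(\text{since } E\left(\frac{X}{\lambda}\right) = \frac{EX}{\lambda} = 1\right)$$
$$= \frac{\mathrm{Var}(X)}{\lambda^2} = \frac{\lambda}{\lambda^2} = \frac{1}{\lambda}.$$

Method #2: Observe that
$$I_X(\lambda) = \mathrm{Var}_\lambda\left(\frac{\partial}{\partial\lambda}\log f(X \mid \lambda)\right) = \mathrm{Var}\left(\frac{X}{\lambda} - 1\right) = \mathrm{Var}\left(\frac{X}{\lambda}\right) = \frac{1}{\lambda} \quad \text{(as in Method #1)}.$$

Method #3: Observe that
$$I_X(\lambda) = E_\lambda\left(-\frac{\partial^2}{\partial\lambda^2}\log f(X \mid \lambda)\right) = E_\lambda\left(\frac{X}{\lambda^2}\right) = \frac{\lambda}{\lambda^2} = \frac{1}{\lambda}.$$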
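The three methods can also be compared numerically. The following is a minimal sketch that evaluates each expectation by summing over the Poisson pmf; the value $\lambda = 2.5$, the truncation point of the sum, and the function names are assumptions made for illustration.

```python
import math

def poisson_pmf(x, lam):
    """Poisson pmf, computed on the log scale to avoid overflow in x!."""
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))

def expectation(g, lam, upper=200):
    """E g(X) for X ~ Poisson(lam); the sum is truncated at `upper`
    (an assumption that is harmless for small lam, where the tail is negligible)."""
    return sum(g(x) * poisson_pmf(x, lam) for x in range(upper))

lam = 2.5  # illustrative value

def score(x):
    # d/d(lam) log f(x | lam) = x / lam - 1
    return x / lam - 1.0

def squared_score(x):
    return score(x) ** 2

def neg_second_derivative(x):
    # -d^2/d(lam)^2 log f(x | lam) = x / lam^2
    return x / lam ** 2

method1 = expectation(squared_score, lam)                  # E[score^2]
method2 = method1 - expectation(score, lam) ** 2           # Var(score)
method3 = expectation(neg_second_derivative, lam)          # E[-second derivative]

print(method1, method2, method3, 1.0 / lam)
```

Each of the three values should match $1/\lambda$ up to rounding and truncation error.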
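Thus $I_{\tilde{X}}(\lambda) = n I_{X_1}(\lambda) = n/\lambda$.

Example: Fisher Information for Cauchy location family.
Suppose $X_1, X_2, \ldots, X_n$ are iid with pdf
$$f(x \mid \theta) = \frac{1}{\pi\,(1 + (x - \theta)^2)}.$$
Let $\tilde{X} = (X_1, \ldots, X_n)$, $X \sim f(x \mid \theta)$. Find $I_{\tilde{X}}(\theta)$. Note that $I_{\tilde{X}}(\theta) = n I_{X_1}(\theta) = n I_X(\theta)$. Now
$$\frac{\partial}{\partial\theta}\log f(x \mid \theta) = \frac{\partial f/\partial\theta}{f} = \frac{\dfrac{-1}{\pi\,(1 + (x - \theta)^2)^2}\cdot 2(x - \theta)(-1)}{\dfrac{1}{\pi\,(1 + (x - \theta)^2)}} = \frac{2(x - \theta)}{1 + (x - \theta)^2}.$$
Now
$$I_X(\theta) = E\left[\left(\frac{\partial}{\partial\theta}\log f(X \mid \theta)\right)^2\right] = E\left(\frac{2(X - \theta)}{1 + (X - \theta)^2}\right)^2 = \int_{-\infty}^{\infty} \left(\frac{2(x - \theta)}{1 + (x - \theta)^2}\right)^2 \frac{1}{\pi\,(1 + (x - \theta)^2)}\,dx = \frac{4}{\pi}\int_{-\infty}^{\infty} \frac{(x - \theta)^2}{(1 + (x - \theta)^2)^3}\,dx.$$
Letting $u = x - \theta$, $du = dx$,
$$I_X(\theta) = \frac{4}{\pi}\int_{-\infty}^{\infty} \frac{u^2}{(1 + u^2)^3}\,du = \frac{8}{\pi}\int_0^{\infty} \frac{u^2}{(1 + u^2)^3}\,du.$$
Substituting $x = 1/(1 + u^2)$, $u = (1/x - 1)^{1/2}$, $du = \frac{1}{2}(1/x - 1)^{-1/2}(-1/x^2)\,dx$,
$$I_X(\theta) = \frac{8}{\pi}\int_0^{\infty} \frac{u^2}{(1 + u^2)^3}\,du = \frac{8}{\pi}\int_0^{\infty} \frac{u^2}{1 + u^2}\left(\frac{1}{1 + u^2}\right)^2 du = \frac{8}{\pi}\int_0^1 (1 - x)\,x^2\,\frac{1}{2}\left(\frac{1}{x} - 1\right)^{-1/2}\frac{1}{x^2}\,dx$$
$$= \frac{4}{\pi}\int_0^1 x^{1/2}(1 - x)^{1/2}\,dx = \frac{4}{\pi}\int_0^1 x^{3/2 - 1}(1 - x)^{3/2 - 1}\,dx \quad \text{(Beta integral)}$$
$$= \frac{4}{\pi}\,\frac{\Gamma(3/2)\,\Gamma(3/2)}{\Gamma(3/2 + 3/2)} = \frac{4}{\pi}\,\frac{(\sqrt{\pi}/2)^2}{2!}$$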
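$$= \frac{4}{\pi}\cdot\frac{\pi}{8} = \frac{1}{2}.$$

2 Uses of Fisher Information
- Asymptotic distribution of MLE's
- Cramér-Rao Inequality (Information inequality)

Asymptotic distribution of MLE's
i.i.d. case: If $f(x \mid \theta)$ is a regular one-parameter family of pdf's (or pmf's) and $\hat\theta_n = \hat\theta_n(X_n)$ is the MLE based on $X_n = (X_1, \ldots, X_n)$, where $n$ is large and $X_1, \ldots, X_n$ are iid from $f(x \mid \theta)$, then approximately
$$\hat\theta_n \sim N\left(\theta, \frac{1}{n I(\theta)}\right),$$
where $I(\theta) \equiv I_{X_1}(\theta)$ and $\theta$ is the true value. Note that $n I(\theta) = I_{X_n}(\theta)$. More formally,
$$\frac{\hat\theta_n - \theta}{\sqrt{\dfrac{1}{n I(\theta)}}} = \sqrt{n I(\theta)}\,(\hat\theta_n - \theta) \xrightarrow{d} N(0, 1)$$
as $n \to \infty$.

More general case: (Assuming various regularity conditions.) If $f(\tilde{x} \mid \theta)$ is a one-parameter family of joint pdf's (or joint pmf's) for data $X_n = (X_1,$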
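$\ldots, X_n)$ where $n$ is large (think of a large dataset arising from a regression or time series model) and $\hat\theta_n = \hat\theta_n(X_n)$ is the MLE, then
$$\hat\theta_n \sim N\left(\theta, \frac{1}{I_{X_n}(\theta)}\right),$$
where $\theta$ is the true value.

Estimation of the Fisher Information
If $\theta$ is unknown, then so is $I_X(\theta)$. Two estimates $\hat{I}$ of the Fisher information $I_X(\theta)$ are
$$\hat{I}_1 = I_X(\hat\theta), \qquad \hat{I}_2 = -\frac{\partial^2}{\partial\theta^2}\log f(X \mid \theta)\Big|_{\theta = \hat\theta},$$
where $\hat\theta$ is the MLE of $\theta$ based on the data $X$. $\hat{I}_1$ is the obvious plug-in estimator. It can be difficult to compute if $I_X(\theta)$ does not have a known closed form. The estimator $\hat{I}_2$ is suggested by the formula
$$I_X(\theta) = E\left(-\frac{\partial^2}{\partial\theta^2}\log f(X \mid \theta)\right).$$
It is often easy to compute, and is required in many Newton-Raphson style algorithms for finding the MLE (so that it is already available without extra computation).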
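The two estimates can be compared on simulated data from the Cauchy location family of the earlier example, where $I_{X_n}(\theta) = n/2$ does not depend on $\theta$. The sketch below is only illustrative: the sample size, seed, true location, and function names are assumptions, and the root of the score is found by Fisher scoring started at the sample median (a swapped-in choice, not the Newton-Raphson iteration mentioned above), which locates a local maximizer near the median rather than guaranteeing the global one.

```python
import math
import random
import statistics

def score(theta, xs):
    """d/dtheta of the Cauchy location log-likelihood."""
    return sum(2.0 * (x - theta) / (1.0 + (x - theta) ** 2) for x in xs)

def observed_information(theta, xs):
    """-d^2/dtheta^2 of the Cauchy location log-likelihood."""
    return sum(
        2.0 * (1.0 - (x - theta) ** 2) / (1.0 + (x - theta) ** 2) ** 2
        for x in xs
    )

def cauchy_mle(xs, iterations=50):
    """Fisher scoring: theta <- theta + score / (n / 2), started at the median.
    This is an illustrative root finder, not the method used in the notes."""
    theta = statistics.median(xs)
    expected_info = len(xs) / 2.0
    for _ in range(iterations):
        theta += score(theta, xs) / expected_info
    return theta

# Simulated data (hypothetical settings): inverse-cdf sampling of Cauchy(theta_true).
rng = random.Random(0)
n, theta_true = 2000, 1.0
xs = [theta_true + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

theta_hat = cauchy_mle(xs)
expected_hat = n / 2.0                              # I_1: plug-in, constant in theta here
observed_hat = observed_information(theta_hat, xs)  # I_2: observed information at the MLE

print(theta_hat, expected_hat / n, observed_hat / n)
```

After dividing by $n$, both estimates should be close to $I(\theta) = 1/2$ for a sample of this size.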
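The two estimates $\hat{I}_1$ and $\hat{I}_2$ are often referred to as the expected and observed Fisher information, respectively. As $n \to \infty$, both estimators are consistent (after normalization) for $I_{X_n}(\theta)$ under various regularity conditions. For example, in the iid case, $\hat{I}_1/n$, $\hat{I}_2/n$, and $I_{X_n}(\theta)/n$ all converge to $I(\theta) \equiv I_{X_1}(\theta)$.

Approximate Confidence Intervals for $\theta$
Choose $0 < \alpha < 1$ (say, $\alpha = \ldots$). Let $z_\alpha$ be such that
$$P(-z_\alpha < Z < z_\alpha) = 1 - \alpha,$$
where $Z \sim N(0, 1)$. When $n$ is large, we have approximately
$$\sqrt{I_X(\theta)}\,(\hat\theta - \theta) \sim N(0, 1),$$
so that
$$P\left\{-z_\alpha < \sqrt{I_X(\theta)}\,(\hat\theta - \theta) < z_\alpha\right\} \approx 1 - \alpha,$$
or equivalently,
$$P\left\{\hat\theta - z_\alpha\sqrt{\frac{1}{I_X(\theta)}} < \theta < \hat\theta + z_\alpha\sqrt{\frac{1}{I_X(\theta)}}\right\} \approx 1 - \alpha.$$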
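As an illustration of the interval above, here is a minimal sketch for an iid Poisson sample, with the unknown information replaced by the plug-in estimate $\hat{I}_1 = n/\hat\lambda$ from the Poisson example. The data values, the default $\alpha = 0.05$, and the function name are made up for the example, and a sample of ten counts is far too small for the approximation to be trusted in practice.

```python
import math
from statistics import NormalDist

def poisson_wald_interval(xs, alpha=0.05):
    """Approximate 1 - alpha interval: lam_hat +/- z * sqrt(1 / I_hat).
    Assumes at least one nonzero count, so that lam_hat > 0."""
    n = len(xs)
    lam_hat = sum(xs) / n            # MLE of the Poisson mean is the sample mean
    info_hat = n / lam_hat           # plug-in estimate of I(lam) = n / lam
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # P(-z < Z < z) = 1 - alpha
    half_width = z * math.sqrt(1.0 / info_hat)
    return lam_hat - half_width, lam_hat + half_width

counts = [3, 1, 4, 2, 2, 5, 0, 3, 2, 3]  # hypothetical data, not from the notes
low, high = poisson_wald_interval(counts)
print(low, high)
```

For these counts $\hat\lambda = 2.5$ and $\hat{I}_1 = 4$, so the interval is $2.5 \pm z/2$ with $z$ the upper 2.5% point of the standard normal.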
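This approximation continues to hold when $I_X(\theta)$ is replaced by an estimate $\hat{I}$ (either $\hat{I}_1$ or $\hat{I}_2$):
$$P\left\{\hat\theta - z_\alpha\sqrt{\frac{1}{\hat{I}}} < \theta < \hat\theta + z_\alpha\sqrt{\frac{1}{\hat{I}}}\right\} \approx 1 - \alpha.$$
Thus
$$\left(\hat\theta - z_\alpha\sqrt{\frac{1}{\hat{I}}},\ \hat\theta + z_\alpha\sqrt{\frac{1}{\hat{I}}}\right)$$
is an approximate $1 - \alpha$ confidence interval for $\theta$. (Here $\hat\theta$ is the MLE and $\hat{I}$ is an estimate of the Fisher information.)

3 Cramer-Rao Inequality

Let $\tilde{X} \sim P_\theta$. If $f(\tilde{x} \mid \theta)$ is a regular one-parameter family, $E_\theta W(\tilde{X}) = \tau(\theta)$ for all $\theta$, and $\tau(\theta)$ is differentiable, then
$$\mathrm{Var}_\theta(W(\tilde{X})) \ge \frac{\{\tau'(\theta)\}^2}{I_{\tilde{X}}(\theta)}.$$

Proof. Facts:
A. $[\mathrm{Cov}(X, Y)]^2 \le (\mathrm{Var}\,X)(\mathrm{Var}\,Y)$. This is a special case of the Cauchy-Schwarz inequality. It is better known to statisticians as $\rho^2 \le 1$, where
$$\rho = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$
is the correlation.
B. $\mathrm{Cov}(X, Y) = EXY$ if either $EX = 0$ or $EY = 0$. This follows from the formula $\mathrm{Cov}(X, Y) = EXY - (EX)(EY)$.

Since $E_\theta \frac{\partial}{\partial\theta}\log f(\tilde{X} \mid \theta) = 0$, from B we have
$$\mathrm{Cov}_\theta\left(W(\tilde{X}), \frac{\partial}{\partial\theta}\log f(\tilde{X} \mid \theta)\right) = E_\theta\left[W(\tilde{X})\,\frac{\partial}{\partial\theta}\log f(\tilde{X} \mid \theta)\right] = \int W(\tilde{x})\left(\frac{\partial}{\partial\theta}\log f(\tilde{x} \mid \theta)\right) f(\tilde{x} \mid \theta)\,d\tilde{x}$$
$$= \int W(\tilde{x})\,\frac{\partial}{\partial\theta} f(\tilde{x} \mid \theta)\,d\tilde{x} = \frac{\partial}{\partial\theta}\int W(\tilde{x})\, f(\tilde{x} \mid \theta)\,d\tilde{x} \quad \text{(since } f(\tilde{x} \mid \theta) \text{ is a regular family)}$$
$$= \frac{\partial}{\partial\theta} E_\theta W(\tilde{X}) = \tau'(\theta).$$
Since, from A, we have
$$\left[\mathrm{Cov}_\theta\left(W(\tilde{X}), \frac{\partial}{\partial\theta}\log f(\tilde{X} \mid \theta)\right)\right]^2 \le \mathrm{Var}_\theta W(\tilde{X})\ \mathrm{Var}_\theta\left(\frac{\partial}{\partial\theta}\log f(\tilde{X} \mid \theta)\right),$$
it follows that
$$[\tau'(\theta)]^2 \le \mathrm{Var}_\theta W(\tilde{X})\ I_{\tilde{X}}(\theta).$$
(Remark ... in A.)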
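The inequality can be illustrated with a single Poisson($\lambda$) observation, for which $I_X(\lambda) = 1/\lambda$. The sketch below compares the bound with the exact variance of two statistics: $W(X) = X$, unbiased for $\tau(\lambda) = \lambda$, and $W(X) = 1\{X = 0\}$, unbiased for $\tau(\lambda) = e^{-\lambda}$. Both statistics, the $\lambda$ values, and the function names are illustrative assumptions rather than examples from the notes.

```python
import math

def poisson_information(lam):
    """Fisher information of one Poisson(lam) observation: 1 / lam."""
    return 1.0 / lam

def cramer_rao_bound(tau_prime, information):
    """Lower bound {tau'(theta)}^2 / I(theta) on Var W(X)."""
    return tau_prime ** 2 / information

results = []
for lam in (0.5, 1.0, 2.5):  # illustrative parameter values
    info = poisson_information(lam)

    # W(X) = X: tau(lam) = lam, tau'(lam) = 1, Var W = lam.
    var_x = lam
    bound_x = cramer_rao_bound(1.0, info)

    # W(X) = 1{X = 0}: tau(lam) = exp(-lam), tau'(lam) = -exp(-lam),
    # Var W = exp(-lam) * (1 - exp(-lam)) since W is a Bernoulli indicator.
    p0 = math.exp(-lam)
    var_ind = p0 * (1.0 - p0)
    bound_ind = cramer_rao_bound(-p0, info)

    results.append((lam, var_x, bound_x, var_ind, bound_ind))

for lam, var_x, bound_x, var_ind, bound_ind in results:
    print(lam, var_x, bound_x, var_ind, bound_ind)
```

The first statistic attains the bound, while the indicator has variance strictly above it, because $e^{\lambda} - 1 > \lambda$ for $\lambda > 0$.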