Transcription of Taylor Approximation and the Delta Method
Alex Papanicolaou
April 28, 2009

Motivating Example: Estimating the odds

Suppose we observe $X_1, \ldots, X_n$, independent Bernoulli($p$) random variables. Typically, we are interested in $p$, but there is also interest in the parameter $\frac{p}{1-p}$, which is known as the odds. For example, if the outcomes of a medical treatment occur with $p = 2/3$, then the odds of getting better are 2 : 1. Furthermore, if there is another treatment with success probability $r$, we might also be interested in the odds ratio $\frac{p}{1-p} \big/ \frac{r}{1-r}$, which gives the relative odds of one treatment over another.

If we wished to estimate $p$, we would typically estimate this quantity with the observed success probability $\hat{p} = \sum_i X_i / n$.
To estimate the odds, it then seems perfectly natural to use $\frac{\hat{p}}{1-\hat{p}}$ as an estimate for $\frac{p}{1-p}$. But whereas we know the variance of our estimator $\hat{p}$ is $p(1-p)/n$ (check this by computing $\operatorname{Var}(\hat{p})$), what is the variance of $\frac{\hat{p}}{1-\hat{p}}$? Or, how can we approximate its sampling distribution?

The Delta Method gives a technique for doing this and is based on using a Taylor series approximation.

The Taylor Series

Definition: If a function $g(x)$ has derivatives of order $r$, that is, $g^{(r)}(x) = \frac{d^r}{dx^r} g(x)$ exists, then for any constant $a$, the Taylor polynomial of order $r$ about $a$ is

$$
T_r(x) = \sum_{k=0}^{r} \frac{g^{(k)}(a)}{k!} (x - a)^k .
$$

While the Taylor polynomial was introduced as far back as beginning calculus, the major theorem from Taylor is that the remainder from the approximation, namely $g(x) - T_r(x)$, tends to 0 faster than the highest-order term in $T_r(x)$.

Theorem: If $g^{(r)}(a) = \frac{d^r}{dx^r} g(x) \big|_{x=a}$ exists, then

$$
\lim_{x \to a} \frac{g(x) - T_r(x)}{(x - a)^r} = 0 .
$$

Note: The material here is almost word for word from pp. 240-245 of Statistical Inference by George Casella and Roger L. Berger, and credit is really due to them.

For the purposes of the Delta Method, we will only be considering $r = 1$. Furthermore, we will not be concerned with the remainder term since (1) we are interested in approximations, and (2) we will have a nice convergence result that says, from a probabilistic point of view, the remainder vanishes.

Applying the Taylor Theorem

Let's now put the first-order Taylor polynomial to use from a statistical point of view. Let $T_1, \ldots, T_k$ be random variables with means $\theta_1, \ldots, \theta_k$, and define $T = (T_1, \ldots, T_k)$ and $\theta = (\theta_1, \ldots, \theta_k)$. Suppose there is a differentiable function $g(T)$ for which we want an estimate of variance (say, an estimator of some parameter; in our motivating example, $T = \hat{p}$ and $g(p) = \frac{p}{1-p}$). Define the partial derivatives as

$$
g_i'(\theta) = \frac{\partial}{\partial t_i} g(t) \Big|_{t_1 = \theta_1, \ldots, t_k = \theta_k},
$$

where $t = (t_1, \ldots, t_k)$ is any point in $k$-dimensional coordinates. The first-order Taylor series expansion of $g$ about $\theta$ (this actually comes from the multivariate version of the Taylor series, which shall be addressed later) is

$$
g(t) = g(\theta) + \sum_{i=1}^{k} g_i'(\theta)(t_i - \theta_i) + \text{Remainder}.
$$

So far, we have done nothing special.
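As a quick numerical check of the first-order expansion in the one-variable case, the following sketch uses the odds function $g(p) = p/(1-p)$ from the motivating example and expands about $a = 2/3$. The function names are illustrative choices, not part of the original notes. It computes the remainder divided by $(t - a)$, which Taylor's theorem with $r = 1$ says should shrink toward 0 as $t$ approaches $a$.

```python
def odds(p):
    """Odds function g(p) = p / (1 - p), defined for 0 <= p < 1."""
    return p / (1.0 - p)


def odds_derivative(p):
    """Derivative g'(p) = 1 / (1 - p)^2."""
    return 1.0 / (1.0 - p) ** 2


def first_order_taylor(t, a):
    """First-order Taylor polynomial of the odds function about a."""
    return odds(a) + odds_derivative(a) * (t - a)


def remainder_ratio(t, a):
    """Remainder g(t) - T_1(t), divided by (t - a)."""
    return (odds(t) - first_order_taylor(t, a)) / (t - a)


a = 2.0 / 3.0
# The ratios should decrease toward 0 as the step h shrinks.
ratios = [abs(remainder_ratio(a + h, a)) for h in (0.1, 0.01, 0.001)]
for h, ratio in zip((0.1, 0.01, 0.001), ratios):
    print(f"h = {h}: |remainder / h| = {ratio:.6f}")
```

For this particular function the ratio works out algebraically to $27h / (1 - 3h)$, so it falls roughly in proportion to $h$.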
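Now, let's turn this into a statistical approximation by bringing in $T$ and dropping the remainder. This gives

$$
g(T) \approx g(\theta) + \sum_{i=1}^{k} g_i'(\theta)(T_i - \theta_i). \tag{1}
$$

Continuing, let's take expectations on both sides (noticing that everything but the $T_i$ terms on the right-hand side is non-random) to get

$$
E\, g(T) \approx g(\theta) + \sum_{i=1}^{k} g_i'(\theta)\, E(T_i - \theta_i) = g(\theta). \tag{2}
$$

We can also approximate the variance of $g(T)$ by

$$
\begin{aligned}
\operatorname{Var} g(T) &\approx E\big[(g(T) - g(\theta))^2\big] && \text{from Eq. (2)} \\
&\approx E\Big[\Big(\sum_{i=1}^{k} g_i'(\theta)(T_i - \theta_i)\Big)^2\Big] && \text{from Eq. (1)} \\
&= \sum_{i=1}^{k} g_i'(\theta)^2 \operatorname{Var} T_i + 2 \sum_{i > j} g_i'(\theta)\, g_j'(\theta) \operatorname{Cov}(T_i, T_j), && (3)
\end{aligned}
$$

where the last equality comes from expanding the square and using the definitions of variance and covariance.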
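Eq. (3) translates directly into a few lines of code. The sketch below is one possible rendering, assuming the partial derivatives $g_i'(\theta)$ and the covariance matrix of $T$ are already available as plain Python lists; the function name and the numbers in the usage example are hypothetical and chosen only for illustration.

```python
def delta_method_variance(gradient, covariance):
    """Approximate Var g(T) via Eq. (3).

    gradient[i]      : partial derivative g_i'(theta)
    covariance[i][j] : Cov(T_i, T_j), with Var T_i on the diagonal
    """
    k = len(gradient)
    # First sum in Eq. (3): squared derivatives times variances.
    variance_terms = sum(gradient[i] ** 2 * covariance[i][i] for i in range(k))
    # Second sum in Eq. (3): pairs with i > j, counted once and doubled below.
    covariance_terms = sum(
        gradient[i] * gradient[j] * covariance[i][j]
        for i in range(k)
        for j in range(i)
    )
    return variance_terms + 2.0 * covariance_terms


# Hypothetical example: g(t1, t2) = t1 / t2 at theta = (2, 4),
# so the gradient is (1 / theta2, -theta1 / theta2^2) = (0.25, -0.125).
example_gradient = [0.25, -0.125]
example_covariance = [[1.0, 0.5],
                      [0.5, 2.0]]
example_variance = delta_method_variance(example_gradient, example_covariance)
print(example_variance)
```

With a single variable ($k = 1$) the double sum is empty and the function reduces to $g'(\theta)^2 \operatorname{Var} T$.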
Notice that we have approximated the variance of our estimator $g(T)$ using only the variances and covariances of the $T_i$, which, if the problem is set up well, are not terribly difficult to compute or estimate. Let's now put this to use.

Continuation: Estimating the Odds

Recall that we wanted to gather some properties about $\frac{\hat{p}}{1-\hat{p}}$ as an estimate of $\frac{p}{1-p}$, where $p$ is a binomial success probability. Using the notation described in the previous section, we take $g(p) = \frac{p}{1-p}$ so that $g'(p) = \frac{1}{(1-p)^2}$ (this is a univariate case, so $k = 1$ and thus there is only one derivative) and

$$
\operatorname{Var}\Big(\frac{\hat{p}}{1-\hat{p}}\Big) \approx g'(p)^2 \operatorname{Var}(\hat{p}) = \Big(\frac{1}{(1-p)^2}\Big)^2 \frac{p(1-p)}{n} = \frac{p}{n(1-p)^3},
$$

giving us an approximation for the variance of our estimator.
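One way to see how well the approximation $\frac{p}{n(1-p)^3}$ works is to compare it with a simulated variance. The sketch below is a small Monte Carlo check, not part of the original notes; the sample size, number of replications, and seed are arbitrary assumptions, and replications with $\hat{p} = 1$ (where the odds are undefined) are skipped.

```python
import random
import statistics


def delta_variance_of_odds(p, n):
    """Delta Method approximation p / (n (1 - p)^3) for Var(p_hat / (1 - p_hat))."""
    return p / (n * (1.0 - p) ** 3)


def simulate_odds_variance(p, n, replications, seed=0):
    """Sample variance of the estimated odds over repeated Bernoulli samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hat = successes / n
        if p_hat < 1.0:  # the odds are undefined when every trial succeeds
            estimates.append(p_hat / (1.0 - p_hat))
    return statistics.variance(estimates)


p, n = 2.0 / 3.0, 400
approximate = delta_variance_of_odds(p, n)
simulated = simulate_odds_variance(p, n, replications=2000)
print(f"Delta Method: {approximate:.5f}, simulated: {simulated:.5f}")
```

The two numbers should be close for moderately large $n$; the agreement typically worsens for small $n$ or for $p$ near 1, where the first-order approximation is less accurate.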
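Example: Approximate Mean and Variance

Suppose $X$ is a random variable with $EX = \mu \neq 0$. If we want to estimate a function $g(\mu)$, a first-order approximation like before would give us

$$
g(X) \approx g(\mu) + g'(\mu)(X - \mu).
$$

Thus, if we use $g(X)$ as an estimator of $g(\mu)$, we can say that approximately

$$
E\, g(X) \approx g(\mu), \qquad \operatorname{Var} g(X) \approx g'(\mu)^2 \operatorname{Var} X.
$$

What is the purpose of this? Well, suppose we take $g(\mu) = 1/\mu$ with $\mu$ unknown. If we estimate $1/\mu$ with $1/X$, then we can say

$$
E\Big(\frac{1}{X}\Big) \approx \frac{1}{\mu}, \qquad \operatorname{Var}\Big(\frac{1}{X}\Big) \approx \Big(\frac{1}{\mu}\Big)^4 \operatorname{Var} X.
$$

As we have seen, we can use these Taylor series approximations to estimate the mean and variance of estimators.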
As mentioned earlier, we can generalize this into a convergence result akin to the Central Limit Theorem. This result is known as the Delta Method.

The Delta Method

Slutsky's Theorem

Before we address the main result, we first state a useful result, named after Eugene Slutsky.

Theorem: (Slutsky's Theorem) If $W_n \to W$ in distribution and $Z_n \to c$ in probability, where $c$ is a non-random constant, then

$$
W_n Z_n \to cW \text{ in distribution}, \qquad W_n + Z_n \to W + c \text{ in distribution}.
$$

The proof is omitted.

Delta Method: A Generalized CLT

Theorem: Let $Y_n$ be a sequence of random variables that satisfies $\sqrt{n}(Y_n - \theta) \to N(0, \sigma^2)$ in distribution.
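For a given function $g$ and a specific value of $\theta$, suppose that $g'(\theta)$ exists and is not 0. Then

$$
\sqrt{n}\,\big(g(Y_n) - g(\theta)\big) \to N\big(0, \sigma^2 g'(\theta)^2\big) \text{ in distribution.}
$$

Proof: The Taylor expansion of $g(Y_n)$ around $Y_n = \theta$ is

$$
g(Y_n) = g(\theta) + g'(\theta)(Y_n - \theta) + \text{Remainder},
$$

where the remainder $\to 0$ as $Y_n \to \theta$. From the assumption that $Y_n$ satisfies the standard CLT, we have $Y_n \to \theta$ in probability, so it follows that the remainder $\to 0$ in probability as well. Rearranging terms and multiplying by $\sqrt{n}$, we have

$$
\sqrt{n}\,\big(g(Y_n) - g(\theta)\big) = g'(\theta)\sqrt{n}\,(Y_n - \theta) + \sqrt{n} \cdot \text{Remainder}.
$$

The scaled remainder still tends to 0 in probability, because the remainder is of smaller order than $Y_n - \theta$ while $\sqrt{n}(Y_n - \theta)$ converges in distribution. Applying Slutsky's Theorem with $W_n = g'(\theta)\sqrt{n}(Y_n - \theta)$ and $Z_n$ as the scaled remainder, we have the right-hand side converging to $N(0, \sigma^2 g'(\theta)^2)$, and thus the desired result follows.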
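The theorem can be illustrated by simulation. The following sketch makes several assumptions that are not in the original notes: $Y_n$ is taken to be the mean of $n$ exponential draws with mean $\theta = 2$ (so $\sigma = \theta$), the function is $g(y) = y^2$, and the sample size, replication count, and seed are arbitrary. It compares the simulated standard deviation of $\sqrt{n}(g(Y_n) - g(\theta))$ with the limiting value $\sigma |g'(\theta)|$.

```python
import math
import random
import statistics


def scaled_errors(g, theta, n, replications, seed=0):
    """Draw replications of sqrt(n) * (g(Y_n) - g(theta)).

    Y_n is the mean of n exponential draws with mean theta, so the
    ordinary CLT gives sqrt(n) * (Y_n - theta) -> N(0, theta^2).
    """
    rng = random.Random(seed)
    root_n = math.sqrt(n)
    draws = []
    for _ in range(replications):
        y_n = sum(rng.expovariate(1.0 / theta) for _ in range(n)) / n
        draws.append(root_n * (g(y_n) - g(theta)))
    return draws


theta = 2.0
sigma = theta  # an exponential with mean theta has standard deviation theta
errors = scaled_errors(lambda y: y ** 2, theta, n=200, replications=2000)
simulated_sd = statistics.stdev(errors)
delta_sd = sigma * abs(2.0 * theta)  # sigma * |g'(theta)| with g'(y) = 2y
print(f"simulated sd: {simulated_sd:.3f}, Delta Method sd: {delta_sd:.3f}")
```

A histogram of `errors` would give a visual check of approximate normality; comparing standard deviations is only a partial check of the limiting distribution.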
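Continuation: Approximate Mean and Variance

Before, we considered the case of just estimating $g(\mu)$ with $g(X)$. Suppose now we have taken a random sample from a population, $X_1, \ldots, X_n$, and computed the sample mean $\bar{X}$. For $\mu \neq 0$, from the Delta Method we have

$$
\sqrt{n}\,\Big(\frac{1}{\bar{X}} - \frac{1}{\mu}\Big) \to N\Big(0, \Big(\frac{1}{\mu}\Big)^4 \operatorname{Var} X_1\Big) \text{ in distribution.}
$$

This is pretty good! But what if we don't know the variance of $X_1$? Furthermore, we're trying to estimate $1/\mu$, and the variance on the right-hand side requires knowledge of $\mu$. This actually poses no major problem since we shall just estimate everything to get the approximate variance

$$
\widehat{\operatorname{Var}}\Big(\frac{1}{\bar{X}}\Big) \approx \Big(\frac{1}{\bar{X}}\Big)^4 S^2,
$$

where $S^2$ is an estimate of the variance of $X_1$, say the sample variance. (This estimates the variance on the $\sqrt{n}$ scale of the limit above; the variance of $1/\bar{X}$ itself is approximately this quantity divided by $n$.)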
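The plug-in estimate can be computed in a few lines. This is a minimal sketch, assuming the data arrive as a plain list of numbers; the function name and the sample values are hypothetical. It returns both the estimate of the limiting variance, $(1/\bar{X})^4 S^2$, and that quantity divided by $n$ as an approximate variance for $1/\bar{X}$ itself.

```python
import statistics


def plug_in_variance_of_reciprocal_mean(sample):
    """Plug-in Delta Method variance estimates for 1 / x_bar.

    Returns (limiting_variance, variance_of_estimate), where
    limiting_variance = (1 / x_bar)^4 * S^2 and
    variance_of_estimate = limiting_variance / n.
    """
    n = len(sample)
    x_bar = statistics.mean(sample)
    s_squared = statistics.variance(sample)  # sample variance, divisor n - 1
    limiting_variance = (1.0 / x_bar) ** 4 * s_squared
    return limiting_variance, limiting_variance / n


# Hypothetical measurements, used only to exercise the function.
measurements = [1.8, 2.1, 2.4, 1.9, 2.3, 2.0, 1.7, 2.2]
limiting, per_sample = plug_in_variance_of_reciprocal_mean(measurements)
print(f"limiting variance: {limiting:.6f}, Var(1/x_bar): {per_sample:.6f}")
```

The estimate is unreliable when $\bar{X}$ is close to 0, in line with the theorem's requirement that $\mu \neq 0$.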