
Asymptotic Relative Efficiency in Estimation


Robert Serfling*
University of Texas at Dallas
October 2009
Prepared for the forthcoming INTERNATIONAL ENCYCLOPEDIA OF STATISTICAL SCIENCES, to be published by Springer.




Asymptotic relative efficiency of two estimators

For statistical estimation problems, it is typical and even desirable that several reasonable estimators can arise for consideration. For example, the mean and median parameters of a symmetric distribution coincide, and so the sample mean and the sample median become competing estimators of the point of symmetry. Which is preferred? By what criteria shall we make a choice? One natural and time-honored approach is simply to compare the sample sizes at which two competing estimators meet a given standard of performance. This depends upon the chosen measure of performance and upon the particular population distribution. To make the discussion of sample mean versus sample median more precise, consider a distribution function F with density function f symmetric about an unknown point θ to be estimated.

For {X₁, …, Xₙ} a sample from F, put X̄ₙ = n⁻¹ Σⁿᵢ₌₁ Xᵢ and Medₙ = median{X₁, …, Xₙ}. Each of X̄ₙ and Medₙ is a consistent estimator of θ, in the sense of convergence in probability to θ as the sample size n → ∞. To choose between these estimators we need to use further information about their performance. In this regard, one key aspect is efficiency, which answers: How spread out about θ is the sampling distribution of the estimator? The smaller the variance in its sampling distribution, the more efficient is the estimator. Here we consider large-sample sampling distributions. For X̄ₙ, the classical central limit theorem tells us: if F has finite variance σ²_F, then the sampling distribution of X̄ₙ is approximately N(θ, σ²_F/n), i.e., Normal with mean θ and variance σ²_F/n. For Medₙ, a similar classical result [11] tells us: if the density f is continuous and positive at θ, then the sampling distribution of Medₙ is approximately N(θ, 1/(4[f(θ)]²n)).

*Department of Mathematical Sciences, University of Texas at Dallas, Richardson, Texas 75083-0688, USA. serfling. Support by NSF Grant DMS-0805786 and NSA Grant H98230-08-1-0106 is gratefully acknowledged.
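These two large-sample approximations are easy to check numerically. The following Python sketch (not part of the original article; the sample size, replication count, and seed are arbitrary choices) estimates the Monte Carlo variance of the sample mean and the sample median for N(0, 1) data and compares them with σ²_F/n = 1/n and 1/(4[f(θ)]²n) = π/(2n):

```python
import math
import random
import statistics

random.seed(1)

def sampling_variance(estimator, n, reps=2000):
    # Monte Carlo variance of an estimator over repeated N(0, 1) samples of size n
    values = [estimator([random.gauss(0.0, 1.0) for _ in range(n)])
              for _ in range(reps)]
    return statistics.pvariance(values)

n = 100
var_mean = sampling_variance(statistics.mean, n)
var_med = sampling_variance(statistics.median, n)

# Theory: Var(mean) ~ 1/n = 0.0100; Var(median) ~ pi/(2n) = 0.0157 for N(0, 1)
print(var_mean, 1.0 / n)
print(var_med, math.pi / (2 * n))
```

Both empirical variances should match their theoretical counterparts to within Monte Carlo error, with the median visibly more spread out than the mean.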

On this basis, we consider X̄ₙ and Medₙ to perform equivalently at respective sample sizes n₁ and n₂ if

σ²_F/n₁ = 1/(4[f(θ)]²n₂).

Keeping in mind that these sampling distributions are only approximations assuming that n₁ and n₂ are large, we define the asymptotic relative efficiency (ARE) of Med to X̄ as the large-sample limit of the ratio n₁/n₂, i.e.,

ARE(Med, X̄, F) = 4[f(θ)]²σ²_F.  (1)

Definition in the general case

For any parameter η of a distribution F, and for estimators η̂⁽¹⁾ and η̂⁽²⁾ which are approximately N(η, V₁(F)/n) and N(η, V₂(F)/n), respectively, the ARE of η̂⁽²⁾ to η̂⁽¹⁾ is given by

ARE(η̂⁽²⁾, η̂⁽¹⁾, F) = V₁(F)/V₂(F).  (2)

Interpretation: if η̂⁽²⁾ is used with a sample of size n, the number of observations needed for η̂⁽¹⁾ to perform equivalently is ARE(η̂⁽²⁾, η̂⁽¹⁾, F) × n.

Extension to the case of a multidimensional parameter: for a parameter η taking values in Rᵏ, and two estimators η̂⁽ⁱ⁾ which are k-variate Normal with mean η and nonsingular covariance matrices Σᵢ(F)/n, i = 1, 2, we use (see [11])

ARE(η̂⁽²⁾, η̂⁽¹⁾, F) = (|Σ₁(F)|/|Σ₂(F)|)^(1/k),  (3)

the ratio of generalized variances (determinants of the covariance matrices), raised to the power 1/k.

Connection with the maximum likelihood estimator

Let F have density f(x|θ) parameterized by θ ∈ R and satisfying some differentiability conditions with respect to θ.

Suppose also that I(F) = E_θ{[∂/∂θ log f(x|θ)]²} (the Fisher information) is positive and finite. Then [5] it follows that (i) the maximum likelihood estimator θ̂⁽ᴹᴸ⁾ of θ is approximately N(θ, 1/(I(F)n)), and (ii) for a wide class of estimators θ̂ that are approximately N(θ, V(θ, F)/n), a lower bound to V(θ, F) is 1/I(F). In this situation, (2) yields

ARE(θ̂, θ̂⁽ᴹᴸ⁾, F) = 1/(I(F)V(θ, F)) ≤ 1,  (4)

making θ̂⁽ᴹᴸ⁾ (asymptotically) the most efficient among the given class of estimators θ̂. We note, however, as will be discussed later, that (4) does not necessarily make θ̂⁽ᴹᴸ⁾ the estimator of choice, when certain other considerations are taken into account.

Detailed discussion of estimation of point of symmetry

Let us now discuss in detail the example treated above, with F a distribution with density f symmetric about an unknown point θ and {X₁, …, Xₙ} a sample from F. For estimation of θ, we will consider not only X̄ₙ and Medₙ but also a third important estimator.

Mean versus median

Let us now formally compare X̄ₙ and Medₙ and see how the ARE differs with the choice of F. Using (1) with F = N(θ, σ²_F), it is seen that ARE(Med, X̄, N(θ, σ²_F)) = 2/π ≈ 0.64. That is, for sampling from a Normal distribution, the sample mean performs as efficiently as the sample median using only 64% as many observations.
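As a small numerical illustration (not from the original article; the value of σ used is arbitrary), formula (1) can be evaluated directly at a Normal model, recovering 2/π regardless of the choice of σ:

```python
import math

def are_median_vs_mean(f_at_theta, var_F):
    # Formula (1): ARE(Med, X-bar, F) = 4 [f(theta)]^2 * sigma_F^2
    return 4.0 * f_at_theta ** 2 * var_F

sigma = 3.0  # arbitrary; the resulting ARE does not depend on it
f_theta = 1.0 / (sigma * math.sqrt(2.0 * math.pi))  # N(theta, sigma^2) density at theta
are = are_median_vs_mean(f_theta, sigma ** 2)
print(are)  # 2/pi = 0.6366...
```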

(Since θ and σ_F are location and scale parameters of F, and since the estimators X̄ₙ and Medₙ are location and scale equivariant, their ARE does not depend upon these parameters.) The superiority of X̄ₙ here is no surprise since it is the MLE of θ in the model N(θ, σ²_F). As noted above, asymptotic relative efficiencies pertain to large-sample comparisons and need not reliably indicate small-sample performance. In particular, for F Normal, the exact relative efficiency of Med to X̄ for sample size n = 5 is a very high 95%, although this decreases quickly, to 80% for n = 10, to 70% for n = 20, and to 64% in the limit. For sampling from a double exponential (or Laplace) distribution with density f(x) = λe^(−λ|x−θ|)/2, −∞ < x < ∞ (and thus variance 2/λ²), the above result favoring X̄ₙ over Medₙ is reversed: (1) yields ARE(Med, X̄, Laplace) = 2, so that the sample mean requires 200% as many observations to perform equivalently to the sample median. Again, this is no surprise because for this model the MLE of θ is Medₙ.

A compromise: the Hodges–Lehmann location estimator

We see from the above that the ARE depends dramatically upon the shape of the density f and thus must be used cautiously as a benchmark.
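A quick simulation (not in the original article; the sample size, replication count, and seed are arbitrary) makes the Laplace reversal concrete: the median's mean squared error is roughly half the mean's, consistent with ARE(Med, X̄, Laplace) = 2:

```python
import random
import statistics

random.seed(7)

def laplace(theta, lam):
    # The difference of two independent Exp(lam) draws has the Laplace
    # density (lam/2) e^{-lam |x - theta|} centered at theta
    return theta + random.expovariate(lam) - random.expovariate(lam)

def mse(estimator, n, reps=3000):
    # Monte Carlo mean squared error about the true center theta = 0
    return statistics.fmean(
        estimator([laplace(0.0, 1.0) for _ in range(n)]) ** 2
        for _ in range(reps))

n = 60
mse_mean = mse(statistics.mean, n)
mse_med = mse(statistics.median, n)
print(mse_mean / mse_med)  # approaches 2 as n grows
```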

For Normal versus Laplace, X̄ₙ is either greatly superior or greatly inferior to Medₙ. This is a rather unsatisfactory situation, since in practice we might not be quite sure whether F is Normal or Laplace or some other type. A very interesting solution to this dilemma is given by an estimator that has excellent overall performance, the so-called Hodges–Lehmann location estimator [2]:

HLₙ = Median{(Xᵢ + Xⱼ)/2},

the median of all pairwise averages of the sample observations. (Some authors include the cases i = j, some not.) We have [3] that HLₙ is asymptotically N(θ, 1/(12[∫f²(x)dx]²n)), which yields that ARE(HL, X̄, N(θ, σ²_F)) = 3/π ≈ 0.955 and ARE(HL, X̄, Laplace) = 1.5. Also, for the Logistic distribution with density f(x) = σ⁻¹e^((x−θ)/σ)/[1 + e^((x−θ)/σ)]², −∞ < x < ∞, for which HLₙ is the MLE of θ and thus optimal, we have ARE(HL, X̄, Logistic) = π²/9 ≈ 1.097 (see [4]). Further, for 𝓕 the class of all distributions symmetric about θ and having finite variance, we have inf_{F∈𝓕} ARE(HL, X̄, F) = 108/125 = 0.864 (see [3]).
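The definition of HLₙ translates directly into code. Below is a minimal O(n²) Python sketch (not part of the article; the sample values are arbitrary) using the i < j convention; note how a single wild observation barely moves the estimate:

```python
import statistics
from itertools import combinations

def hodges_lehmann(xs):
    # Median of all pairwise averages (Xi + Xj)/2 over i < j; O(n^2) steps.
    # Much faster O(n log n) algorithms exist (see [6]).
    return statistics.median((a + b) / 2.0 for a, b in combinations(xs, 2))

print(hodges_lehmann([1.0, 2.0, 3.0, 4.0, 100.0]))  # 3.25: the outlier has little effect
```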

The estimator HLₙ is highly competitive with X̄ at Normal distributions, can be infinitely more efficient at some other symmetric distributions F, and is never much less efficient at any distribution F in 𝓕. The computation of HLₙ appears at first glance to require O(n²) steps, but a much more efficient O(n log n) algorithm is available (see [6]).

Efficiency versus robustness trade-off

Although the asymptotically most efficient estimator is given by the MLE, the particular MLE depends upon the shape of F and can be drastically inefficient when the actual F departs even a little bit from the nominal F. For example, if the assumed F is N(θ, 1) but the actual model differs by a small amount of contamination ε, i.e., F = (1 − ε)N(θ, 1) + εN(θ, σ²), then

ARE(Med, X̄, F) = (2/π)(1 − ε + ε/σ)²(1 − ε + εσ²),

which equals 2/π in the ideal case ε = 0 but otherwise → ∞ as σ → ∞. A small perturbation of the assumed model thus can destroy the superiority of the MLE. One way around this issue is to take a nonparametric approach and seek an estimator with ARE satisfying a favorable lower bound.
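The contamination formula above is easy to tabulate. This sketch (not from the original article; the particular ε and σ values are arbitrary) shows how quickly the superiority of X̄ₙ evaporates: a 5% contamination already favors the median, and the ARE grows without bound in σ:

```python
import math

def are_med_vs_mean_contaminated(eps, sigma):
    # ARE(Med, X-bar, F) for F = (1 - eps) N(theta, 1) + eps N(theta, sigma^2)
    return (2.0 / math.pi) * (1.0 - eps + eps / sigma) ** 2 \
        * (1.0 - eps + eps * sigma ** 2)

print(are_med_vs_mean_contaminated(0.0, 10.0))    # ideal case: 2/pi = 0.6366...
print(are_med_vs_mean_contaminated(0.05, 10.0))   # already above 1: median wins
print(are_med_vs_mean_contaminated(0.05, 100.0))  # grows without bound as sigma -> inf
```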

Above we saw how the estimator HLₙ meets this criterion. Another criterion by which to evaluate and compare estimators is robustness. Here let us use the finite-sample breakdown point (BP): the minimal fraction of sample points which may be taken to a limit L (e.g., ±∞) without the estimator also being taken to L. A robust estimator remains stable and effective when in fact the sample is only partly from the nominal distribution F and contains some non-F observations which might be relatively extreme outliers. A single observation taken to ∞ (with n fixed) takes X̄ₙ with it, so X̄ₙ has BP = 0. Its optimality at Normal distributions comes at the price of a complete sacrifice of robustness. In comparison, Medₙ has the extremely favorable BP = 1/2, but at the price of a considerable loss of efficiency at Normal models. On the other hand, the estimator HLₙ appeals broadly, possessing both quite high ARE over a wide class of F and relatively high BP = 1 − 2^(−1/2) ≈ 0.29. As another example, consider the problem of estimation of scale.
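The breakdown behavior of X̄ₙ versus Medₙ can be seen directly (an illustrative sketch, not from the article; the data values are arbitrary):

```python
import statistics

data = [0.3, -1.1, 0.8, 0.2, -0.5, 1.4, -0.9, 0.1, 0.6, -0.2]
corrupted = data[:-1] + [1e9]  # one observation taken toward infinity

print(statistics.mean(data), statistics.mean(corrupted))      # mean is dragged away: BP = 0
print(statistics.median(data), statistics.median(corrupted))  # median barely moves: BP = 1/2
```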

Two classical scale estimators are the sample standard deviation sₙ and the sample MAD (median absolute deviation about the median) MADₙ. They estimate scale in different ways but can be regarded as competitors in the problem of estimation of σ in the model F = N(μ, σ²), as follows. With both μ and σ unknown, the estimator sₙ is (essentially) the MLE of σ and is asymptotically most efficient. Also, for this F, the population MAD is equal to Φ⁻¹(3/4)σ, so that the estimator σ̂ₙ = MADₙ/Φ⁻¹(3/4) ≈ 1.4826 MADₙ competes with sₙ for estimation of σ. (Here Φ denotes the standard normal distribution function, and, for any F, F⁻¹(p) denotes the pth quantile, inf{x : F(x) ≥ p}, for 0 < p < 1.) To compare with respect to robustness, we note that since a single observation taken to ∞ (with n fixed) takes sₙ with it, sₙ has BP = 0. On the other hand, MADₙ and thus σ̂ₙ have BP = 1/2, like Medₙ. However, ARE(σ̂ₙ, sₙ, N(μ, σ²)) = 0.37, even worse than the ARE of Medₙ relative to X̄ₙ. Clearly desired is a more balanced trade-off between efficiency and robustness than provided by either of sₙ and σ̂ₙ.
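A minimal Python sketch of the rescaled-MAD estimator σ̂ₙ (not from the article; the simulated model N(5, 2²), sample size, and seed are arbitrary choices):

```python
import random
import statistics

random.seed(3)

def mad_scale(xs):
    # MAD_n rescaled by 1/Phi^{-1}(3/4) so that it estimates sigma under normality
    med = statistics.median(xs)
    mad = statistics.median(abs(x - med) for x in xs)
    return mad / statistics.NormalDist().inv_cdf(0.75)

xs = [random.gauss(5.0, 2.0) for _ in range(20000)]
print(mad_scale(xs))         # close to the true sigma = 2
print(statistics.stdev(xs))  # likewise close to 2
```

Both estimates land near the true σ = 2; they differ in how they respond to outliers, not in what they target under normality.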

Alternative scale estimators having the same BP as σ̂ₙ but much higher ARE relative to sₙ are developed in [10]. Also, further competitors offering a range of trade-offs between BP and ARE are developed in [12]. In general, efficiency and robustness trade off against each other. Thus ARE should be considered in conjunction with robustness, choosing the balance appropriate to the particular application context. This theme is prominent in the many examples treated in [14].

A few additional aspects of ARE

Connections with confidence intervals: in view of the asymptotic normal distribution underlying the above formulation of ARE in estimation, we may also characterize the ARE given by (2) as the limiting ratio of sample sizes at which the lengths of the associated confidence intervals at approximate level 100(1 − α)%,

η̂⁽ⁱ⁾ ± Φ⁻¹(1 − α/2)√(Vᵢ(F)/nᵢ), i = 1, 2,

converge to 0 at the same rate, when holding fixed the coverage probability 1 − α.
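The confidence-interval characterization can be illustrated numerically (a sketch, not from the article; the sample sizes and the mean-versus-median example are illustrative choices): taking n₁ = ARE × n₂ with asymptotic variances V₁ = 1 (mean) and V₂ = π/2 (median) at the standard Normal model yields intervals of essentially equal length:

```python
import math
import statistics

z = statistics.NormalDist().inv_cdf(0.975)  # ~1.96 for 95% intervals

def ci_length(V, n):
    # Length of the interval: estimate +/- z * sqrt(V/n)
    return 2.0 * z * math.sqrt(V / n)

V1, V2 = 1.0, math.pi / 2.0  # asymptotic variances of mean and median at N(theta, 1)
n2 = 1000
n1 = round(n2 * V1 / V2)     # ARE * n2 observations for the first estimator
print(ci_length(V1, n1), ci_length(V2, n2))  # essentially equal lengths
```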

