
Mathematical Statistics - ETH Zürich




Mathematical Statistics
Sara van de Geer
September 2010

Contents

1  Introduction
   Some notation and model assumptions
   Estimation
   Comparison of estimators: risk functions
   Comparison of estimators: sensitivity
   Confidence intervals
   Equivalence confidence sets and tests
   Intermezzo: quantile functions
   How to construct tests and confidence sets
   An illustration: the two-sample problem
   Assuming normality
   A nonparametric test
   Comparison of Student's test and Wilcoxon's test
   How to construct estimators
   Plug-in estimators
   The method of moments
   Likelihood methods

2  Decision theory
   Decisions and their risk
   Admissibility
   Minimaxity
   Bayes decisions
   Intermezzo: conditional distributions
   Bayes methods
   Discussion of Bayesian approach (to be written)
   Integrating parameters out (to be written)

   Intermezzo: some distribution theory
   The multinomial distribution
   The Poisson distribution
   The distribution of the maximum of two random variables
   Sufficiency
   Rao-Blackwell
   Factorization Theorem of Neyman
   Exponential families
   Canonical form of an exponential family
   Minimal sufficiency

3  Unbiased estimators
   What is an unbiased estimator?
   UMVU estimators
   Complete statistics
   The Cramer-Rao lower bound
   Higher-dimensional extensions
   Uniformly most powerful tests
   An example
   UMP tests and exponential families
   Unbiased tests
   Conditional tests

4  Equivariant statistics
   Equivariance in the location model
   Equivariance in the location-scale model (to be written)

5  Proving admissibility and minimaxity
   Minimaxity
   Admissibility
   Inadmissibility in higher-dimensional settings (to be written)

6  Asymptotic theory
   Types of convergence
   Stochastic order symbols
   Some implications of convergence
   Consistency and asymptotic normality
   Asymptotic linearity
   The δ-technique
   M-estimators
   Consistency of M-estimators
   Asymptotic normality of M-estimators
   Plug-in estimators
   Consistency of plug-in estimators
   Asymptotic normality of plug-in estimators
   Asymptotic relative efficiency
   Asymptotic Cramer-Rao lower bound
   Le Cam's 3rd Lemma
   Asymptotic confidence intervals and tests
   Maximum likelihood
   Likelihood ratio tests
   Complexity regularization (to be written)

7  Literature

These notes in English will closely follow Mathematische Statistik by H.R. Künsch (2005), but are as yet incomplete. Mathematische Statistik can be used as supplementary reading material in German. Mathematical rigor and clarity often bite each other. At some places, not all subtleties are fully presented. A snake will indicate this.

1 Introduction

Statistics is about the mathematical modeling of observable phenomena, using stochastic models, and about analyzing data: estimating parameters of the model and testing hypotheses.

In these notes, we study various estimation and testing procedures. We consider their theoretical properties and we investigate various notions of optimality.

1.1 Some notation and model assumptions

The data consist of measurements (observations) $x_1, \ldots, x_n$, which are regarded as realizations of random variables $X_1, \ldots, X_n$. In most of the notes, the $X_i$ are real-valued: $X_i \in \mathbb{R}$ (for $i = 1, \ldots, n$), although we will also consider some extensions to vector-valued observations.

Example (speed of light). Fizeau and Foucault developed methods for estimating the speed of light (1849, 1850), which were later improved by Newcomb and Michelson. The main idea is to pass light from a rapidly rotating mirror to a fixed mirror and back to the rotating mirror. An estimate of the velocity of light is obtained, taking into account the speed of the rotating mirror, the distance travelled, and the displacement of the light as it returns to the rotating mirror. The data are Newcomb's measurements of the passage time it took light to travel from his lab, to a mirror on the Washington Monument, and back to his lab.

The measurements were made on 3 consecutive days. The first measurement was $0.000024828$ seconds $= 24828$ nanoseconds; the dataset records the deviations from 24800 nanoseconds.

[Figure: dot plots of the deviations from 24800 nanoseconds, for day 1, day 2 and day 3 separately, and all measurements in one plot.]

One may estimate the speed of light using the mean, or the median, or Huber's estimate (see below). This gives the following results (for the 3 days separately, and for the three days combined):

[Table: mean, median and Huber estimate for each of the three days and for the three days combined.]

The question which estimate is the best one is one of the topics of these notes.

The collection of observations will be denoted by $X = \{X_1, \ldots, X_n\}$. The distribution of $X$, denoted by $\mathbb{P}$, is generally unknown.

A statistical model is a collection of assumptions about this unknown distribution.

We will usually assume that the observations $X_1, \ldots, X_n$ are independent and identically distributed (i.i.d.). Or, to formulate it differently, $X_1, \ldots, X_n$ are i.i.d. copies from some population random variable, which we denote by $X$. The common distribution, that is, the distribution of $X$, is denoted by $P$. For $X \in \mathbb{R}$, the distribution function of $X$ is written as
$$F(\cdot) = P(X \le \cdot).$$
Recall that the distribution function $F$ determines the distribution $P$ (and vice versa).

Further model assumptions then concern the modeling of $P$. We write such a model as $P \in \mathcal{P}$, where $\mathcal{P}$ is a given collection of probability measures, the so-called model class. The following example will serve to illustrate the concepts that are to follow.

Example (location model). Let $X$ be real-valued. The location model is
$$\mathcal{P} := \{ P_{\mu, F_0}(X \le \cdot) := F_0(\cdot - \mu) :\ \mu \in \mathbb{R},\ F_0 \in \mathcal{F}_0 \}, \tag{1.1}$$
where $\mathcal{F}_0$ is a given collection of distribution functions. Assuming the expectation exists, we center the distributions in $\mathcal{F}_0$ to have mean zero.

Then $P_{\mu, F_0}$ has mean $\mu$. We call $\mu$ a location parameter. Often, only $\mu$ is the parameter of interest, and $F_0$ is a so-called nuisance parameter.

The class $\mathcal{F}_0$ is for example modeled as the class of all symmetric distributions, that is,
$$\mathcal{F}_0 := \{ F_0 : F_0(x) = 1 - F_0(-x)\ \forall x \}. \tag{1.2}$$
This is an infinite-dimensional collection: it is not parametrized by a finite-dimensional parameter. We then call $\mathcal{F}_0$ an infinite-dimensional parameter. A finite-dimensional model is for example
$$\mathcal{F}_0 := \{ \Phi(\cdot / \sigma) :\ \sigma > 0 \}, \tag{1.3}$$
where $\Phi$ is the standard normal distribution function. Thus, the location model is
$$X_i = \mu + \epsilon_i, \quad i = 1, \ldots, n,$$
with $\epsilon_1, \ldots, \epsilon_n$ i.i.d. and, under model (1.2), symmetrically but otherwise unknown distributed and, under model (1.3), $N(0, \sigma^2)$-distributed with unknown variance $\sigma^2$.

1.2 Estimation

A parameter is an aspect of the unknown distribution. An estimator $T$ is some given function $T(X)$ of the observations $X$. The estimator is constructed to estimate some unknown parameter, $\gamma$ say. In the location model example above, one may consider the following estimators of $\mu$:

• The average
$$\hat\mu_1 := \frac{1}{n} \sum_{i=1}^n X_i.$$
Note that $\hat\mu_1$ minimizes the squared loss $\sum_{i=1}^n (X_i - \mu)^2$. It can be shown that $\hat\mu_1$ is a good estimator if the model (1.3) holds.

When (1.3) is not true, in particular when there are outliers (large, "wrong" observations) (Ausreisser), then one has to apply a more robust estimator.

• The (sample) median is
$$\hat\mu_2 := \begin{cases} X_{((n+1)/2)} & \text{when } n \text{ is odd}, \\ \{ X_{(n/2)} + X_{(n/2+1)} \}/2 & \text{when } n \text{ is even}, \end{cases}$$
where $X_{(1)} \le \cdots \le X_{(n)}$ are the order statistics. Note that $\hat\mu_2$ is a minimizer of the absolute loss $\sum_{i=1}^n |X_i - \mu|$.

• The Huber estimator is
$$\hat\mu_3 := \arg\min_{\mu} \sum_{i=1}^n \rho(X_i - \mu), \tag{1.4}$$
where
$$\rho(x) = \begin{cases} x^2 & \text{if } |x| \le k, \\ k(2|x| - k) & \text{if } |x| > k, \end{cases}$$
with $k > 0$ some given threshold.

• We finally mention the $\alpha$-trimmed mean, defined, for some $0 < \alpha < 1$, as
$$\hat\mu_4 := \frac{1}{n - 2[n\alpha]} \sum_{i=[n\alpha]+1}^{n-[n\alpha]} X_{(i)}.$$

Note. To avoid misunderstanding, we note that in (1.4), $\mu$ is used as the variable over which one minimizes, whereas in (1.1), $\mu$ is a parameter. These are actually distinct concepts, but it is a general convention to abuse notation and employ the same symbol. When further developing the theory (see Chapter 6) we shall often introduce a new symbol for the variable, e.g., (1.4) is then written as
$$\hat\mu_3 := \arg\min_{c} \sum_{i=1}^n \rho(X_i - c).$$
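To make the four estimators concrete, here is a minimal Python sketch. It is an illustration added here, not code from the notes; the threshold $k = 1.5$, the trimming fraction $\alpha = 0.1$ and the simulated sample are arbitrary choices, and the Huber estimate is found by direct numerical minimization of the criterion in (1.4).

```python
import numpy as np
from scipy.optimize import minimize_scalar

def rho(x, k=1.5):
    # Huber's loss: quadratic for |x| <= k, linear beyond, cf. (1.4)
    return np.where(np.abs(x) <= k, x ** 2, k * (2 * np.abs(x) - k))

def mu1(x):
    # the average: minimizer of the squared loss sum (X_i - mu)^2
    return np.mean(x)

def mu2(x):
    # the sample median: minimizer of the absolute loss sum |X_i - mu|
    return np.median(x)

def mu3(x, k=1.5):
    # the Huber estimator: arg min_c sum_i rho(X_i - c)
    return minimize_scalar(lambda c: rho(x - c, k).sum()).x

def mu4(x, alpha=0.1):
    # the alpha-trimmed mean: average of the middle order statistics
    xs = np.sort(x)
    m = int(np.floor(alpha * len(xs)))  # m = [n * alpha]
    return xs[m:len(xs) - m].mean()

# Sample of size 50 centered at mu = 10, with three gross outliers.
rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=1.0, size=50)
x[:3] += 25.0

for est in (mu1, mu2, mu3, mu4):
    print(est.__name__, round(float(est(x)), 3))
```

On such a contaminated sample the average is pulled towards the outliers, while the median, the Huber estimator and the trimmed mean stay close to 10, which is the robustness phenomenon described above.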

An example of a nonparametric estimator is the empirical distribution function
$$\hat F_n(\cdot) := \frac{1}{n} \#\{ X_i \le \cdot,\ 1 \le i \le n \}.$$
This is an estimator of the theoretical distribution function $F(\cdot) := P(X \le \cdot)$.

Any reasonable estimator is constructed according to the so-called plug-in principle (Einsetzprinzip). That is, the parameter of interest $\gamma$ is written as $\gamma = Q(F)$, with $Q$ some given map. The empirical distribution function $\hat F_n$ is then "plugged in", to obtain the estimator $T := Q(\hat F_n)$. (We note however that problems can arise, e.g. $Q(\hat F_n)$ may not be well-defined ...).

Examples are the above estimators $\hat\mu_1, \ldots, \hat\mu_4$ of the location parameter $\mu$. We define the maps
$$Q_1(F) := \int x \, dF(x)$$
(the mean, or point of gravity, of $F$), and
$$Q_2(F) := F^{-1}(1/2)$$
(the median of $F$), and
$$Q_3(F) := \arg\min_{c} \int \rho(\cdot - c) \, dF,$$
and finally
$$Q_4(F) := \frac{1}{1 - 2\alpha} \int_{F^{-1}(\alpha)}^{F^{-1}(1-\alpha)} x \, dF(x).$$
Then $\hat\mu_k$ corresponds to $Q_k(\hat F_n)$, $k = 1, \ldots, 4$. If the model (1.1) is correct, $\hat\mu_1, \ldots, \hat\mu_4$ are all estimators of $\mu$.
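The empirical distribution function and the plug-in principle are easy to implement directly. The following Python sketch (again an added illustration, not code from the notes) builds $\hat F_n$ and its generalized inverse $\hat F_n^{-1}(u) := \inf\{ t : \hat F_n(t) \ge u \}$, and checks that the plug-in median $Q_2(\hat F_n) = \hat F_n^{-1}(1/2)$ agrees with the sample median when $n$ is odd.

```python
import numpy as np

def ecdf(x):
    # Empirical distribution function: t -> (1/n) #{X_i <= t}
    xs = np.sort(x)
    return lambda t: np.searchsorted(xs, t, side="right") / len(xs)

def ecdf_inv(x, u):
    # Generalized inverse: smallest order statistic X_(i) with i/n >= u
    xs = np.sort(x)
    i = max(int(np.ceil(u * len(xs))) - 1, 0)  # 0-based index
    return xs[i]

rng = np.random.default_rng(1)
x = rng.standard_normal(11)      # odd sample size

F_hat = ecdf(x)
print(F_hat(0.0))                # fraction of observations <= 0
print(ecdf_inv(x, 0.5))          # plug-in median Q_2(F_hat_n)
print(np.median(x))              # agrees for odd n
```

For even $n$ the generalized inverse returns $X_{(n/2)}$ rather than the average of the two middle order statistics used in the definition of $\hat\mu_2$; this is one of the small ambiguities the plug-in principle glosses over.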

If the model is incorrect, each $Q_k(\hat F_n)$ is still an estimator of $Q_k(F)$ (assuming the latter exists), but the $Q_k(F)$ may all be different aspects of $F$.

1.3 Comparison of estimators: risk functions

A risk function $R(\cdot, \cdot)$ measures the loss due to the error of an estimator. The risk depends on the unknown distribution, e.g., in the location model, on $\mu$ and/or $F_0$. Examples are
$$R(\mu, F_0, \hat\mu) := \begin{cases} \mathbb{E}_{\mu, F_0} |\hat\mu - \mu|^p \\ \mathbb{P}_{\mu, F_0}(|\hat\mu - \mu| > a) \\ \ \vdots \end{cases}$$
Here $p \ge 1$ and $a > 0$ are chosen by the statistician.

If $\hat\mu$ is an equivariant estimator, the above risks no longer depend on $\mu$. An estimator $\hat\mu := \hat\mu(X_1, \ldots, X_n)$ is called equivariant if
$$\hat\mu(X_1 + c, \ldots, X_n + c) = \hat\mu(X_1, \ldots, X_n) + c \quad \forall\, c.$$
Then, writing $\mathbb{P}_{F_0} := \mathbb{P}_{0, F_0}$ (and likewise for the expectation $\mathbb{E}_{F_0}$), we have for all $t$
$$\mathbb{P}_{\mu, F_0}(\hat\mu - \mu \le t) = \mathbb{P}_{F_0}(\hat\mu \le t),$$
that is, the distribution of $\hat\mu - \mu$ does not depend on $\mu$. Indeed, writing $X_i = \mu + \epsilon_i$ with $\epsilon_i \sim F_0$, equivariance gives $\hat\mu(X_1, \ldots, X_n) - \mu = \hat\mu(\epsilon_1, \ldots, \epsilon_n)$. We then write
$$R(\mu, F_0, \hat\mu) := R(F_0, \hat\mu) := \begin{cases} \mathbb{E}_{F_0} |\hat\mu|^p \\ \mathbb{P}_{F_0}(|\hat\mu| > a) \\ \ \vdots \end{cases}$$

1.4 Comparison of estimators: sensitivity

We can compare estimators with respect to their sensitivity to large errors in the data.
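Both the risk comparison and the sensitivity question can be explored by simulation. The sketch below (an added illustration, not from the notes; the sample size, replication count and error laws are arbitrary choices) uses the equivariance of the mean and the median to approximate their quadratic risk $R(F_0, \hat\mu) = \mathbb{E}_{F_0} \hat\mu^2$ at $\mu = 0$, once for standard normal errors and once for heavy-tailed $t(3)$ errors.

```python
import numpy as np

rng = np.random.default_rng(2)

def mc_risk(estimator, draw_errors, n=25, reps=20000):
    # Monte Carlo approximation of E_{F0} |mu_hat|^2; by equivariance
    # the risk does not depend on mu, so the data are just the errors.
    est = np.array([estimator(draw_errors(n)) for _ in range(reps)])
    return float(np.mean(est ** 2))

error_models = {
    "normal": lambda n: rng.standard_normal(n),   # F0 = N(0, 1)
    "t(3)": lambda n: rng.standard_t(3, size=n),  # F0 heavy-tailed
}

for name, draw in error_models.items():
    print(name,
          "risk(mean) =", round(mc_risk(np.mean, draw), 4),
          "risk(median) =", round(mc_risk(np.median, draw), 4))
```

Under normal errors the mean has the smaller risk; under $t(3)$ errors the ranking flips, because a few large errors inflate the mean much more than the median. Which estimator is "best" thus depends on the unknown $F_0$, which is the theme of this and the following section.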

