
DENSITY ESTIMATION FOR STATISTICS AND DATA ANALYSIS

Published in Monographs on Statistics and Applied Probability, London: Chapman and Hall, 1986.

B. W. Silverman
School of Mathematics, University of Bath, UK

Table of Contents

INTRODUCTION
  What is density estimation?
  Density estimates in the exploration and presentation of data
  Further reading

SURVEY OF EXISTING METHODS
  Introduction
  Histograms
  The naive estimator
  The kernel estimator
  The nearest neighbour method
  The variable kernel method
  Orthogonal series estimators
  Maximum penalized likelihood estimators
  General weight function estimators
  Bounded domains and directional data
  Discussion and bibliography



1. INTRODUCTION

1.1. What is density estimation?

The probability density function is a fundamental concept in statistics. Consider any random quantity X that has probability density function f. Specifying the function f gives a natural description of the distribution of X, and allows probabilities associated with X to be found from the relation

$$P(a < X < b) = \int_a^b f(x)\,dx \quad \text{for all } a < b.$$

Suppose, now, that we have a set of observed data points assumed to be a sample from an unknown probability density function. Density estimation, as discussed in this book, is the construction of an estimate of the density function from the observed data. The two main aims of the book are to explain how to estimate a density from a given data set and to explore how density estimates can be used, both in their own right and as an ingredient of other statistical procedures.

One approach to density estimation is parametric. Assume that the data are drawn from one of a known parametric family of distributions, for example the normal distribution with mean $\mu$ and variance $\sigma^2$. The density f underlying the data could then be estimated by finding estimates of $\mu$ and $\sigma^2$ from the data and substituting these estimates into the formula for the normal density. In this book we shall not be considering parametric estimates of this kind; the approach will be more nonparametric in that less rigid assumptions will be made about the distribution of the observed data.
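As a rough illustration of the parametric approach just described, the following Python sketch estimates the mean and variance from a sample, substitutes them into the normal density formula, and then uses the fitted density to evaluate P(a < X < b), the integral of the density from a to b. The function names (`fit_normal`, `normal_density`, `normal_probability`) and the sample values are made up for illustration and do not come from the book; the unbiased sample variance is one reasonable choice of estimator, not necessarily the one the author has in mind.

```python
import math
import statistics


def fit_normal(sample):
    """Estimate the mean and variance of a normal model from the data."""
    mu_hat = statistics.mean(sample)
    # Unbiased sample variance (divides by n - 1); an assumed choice of estimator.
    var_hat = statistics.variance(sample, mu_hat)
    return mu_hat, var_hat


def normal_density(x, mu, var):
    """Normal density with mean mu and variance var, evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def normal_probability(a, b, mu, var):
    """P(a < X < b) under the normal model, i.e. the integral of the density over (a, b)."""
    sd = math.sqrt(var)

    def cdf(t):
        # Normal distribution function expressed through the error function.
        return 0.5 * (1.0 + math.erf((t - mu) / (sd * math.sqrt(2.0))))

    return cdf(b) - cdf(a)


# Hypothetical observations, used only to exercise the functions above.
sample = [4.1, 5.3, 4.8, 6.0, 5.5, 4.9, 5.2, 5.8]
mu_hat, var_hat = fit_normal(sample)

# Parametric density estimate at one point, and the probability of an interval.
density_at_5 = normal_density(5.0, mu_hat, var_hat)
prob_4_to_6 = normal_probability(4.0, 6.0, mu_hat, var_hat)
```

The whole estimate is determined by two numbers, which is exactly the rigidity that the nonparametric methods of the book are meant to relax.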

Although it will be assumed that the distribution has a probability density f, the data will be allowed to speak for themselves in determining the estimate of f more than would be the case if f were constrained to fall in a given parametric family.

Density estimates of the kind discussed in this book were first proposed by Fix and Hodges (1951) as a way of freeing discriminant analysis from rigid distributional assumptions. Since then, density estimation and related ideas have been used in a variety of contexts, some of which, including discriminant analysis, will be discussed in the final chapter of this book.

The earlier chapters are mostly concerned with the question of how density estimates are constructed. In order to give a rapid feel for the idea and scope of density estimation, one of the most important applications, to the exploration and presentation of data, will be introduced in the next section and elaborated further by additional examples throughout the book. It must be stressed, however, that these valuable exploratory purposes are by no means the only setting in which density estimates can be used.

1.2. Density estimates in the exploration and presentation of data

A very natural use of density estimates is in the informal investigation of the properties of a given set of data.

Density estimates can give valuable indication of such features as skewness and multimodality in the data. In some cases they will yield conclusions that may then be regarded as self-evidently true, while in others all they will do is to point the way to further analysis and/or data collection.

An example is given in Fig. 1.1. The curves shown in this figure were constructed by Emery and Carpenter (1974) in the course of a study of sudden infant death syndrome (also called `cot death' or `crib death'). The curve A is constructed from a particular observation, the degranulated mast cell count, made on each of 95 infants who died suddenly and apparently unaccountably, while the cases used to construct curve B were a control sample of 76 infants who died of known causes that would not affect the degranulated mast cell count.

The investigators concluded tentatively from the density estimates that the density underlying the sudden infant death cases might be a mixture of the control density with a smaller proportion of a contaminating density of higher mean. Thus it appeared that in a minority (perhaps a quarter to a third) of the sudden deaths, the degranulated mast cell count was exceptionally high. In this example the conclusions could only be regarded as a cue for further clinical investigation.

Fig. 1.1 Density estimates constructed from transformed and corrected degranulated mast cell counts observed in a cot death study. (A, Unexpected deaths; B, Hospital deaths.) After Emery and Carpenter (1974) with the permission of the Canadian Foundation for the Study of Infant Deaths. This version reproduced from Silverman (1981a) with the permission of John Wiley & Sons Ltd.

Another example is given in Fig. 1.2. The data from which this figure was constructed were collected in an engineering experiment described by Bowyer (1980). The height of a steel surface above an arbitrary level was observed at about 15 000 points. The figure gives a density estimate constructed from the observed heights. It is clear from the figure that the distribution of height is skew and has a long lower tail.

The tails of the distribution are particularly important to the engineer, because the upper tail represents the part of the surface which might come into contact with other surfaces, while the lower tail represents hollows where fatigue cracks can start and also where lubricant might gather. The non-normality of the density in Fig. 1.2 casts doubt on the Gaussian models typically used to model these surfaces, since these models would lead to a normal distribution of height. Models which allow a skew distribution of height would be more appropriate, and one such class of models was suggested for this data set by Adler and Firman (1981).

Fig. 1.2 Density estimate constructed from observations of the height of a steel surface. After Silverman (1980) with the permission of Academic Press, Inc. This version reproduced from Silverman (1981a) with the permission of John Wiley & Sons Ltd.

A third example is given in Fig. 1.3. The data used to construct this curve are a standard directional data set and consist of the directions in which each of 76 turtles was observed to swim when released. It is clear that most of the turtles show a preference for swimming approximately in the 60° direction, while a small proportion prefer exactly the opposite direction.

