
Chapter 4 Parameter Estimation - Division of Social …

Thus far we have concerned ourselves primarily with probability theory: what events may occur with what probabilities, given a model family and choices for the parameters. This is useful only in the case where we know the precise model family and parameter values for the situation of interest. But this is the exception, not the rule, for both scientific inquiry and human learning and inference. Most of the time, we are in the situation of processing data whose generative source we are uncertain about. In Chapter 2 we briefly covered elementary density estimation, using relative-frequency estimation, histograms, and kernel density estimation.





In this chapter we delve more deeply into the theory of probability density estimation, focusing on inference within parametric families of probability distributions (see discussion in Section ). We start with some important properties of estimators, then turn to basic frequentist parameter estimation (maximum-likelihood estimation and corrections for bias), and finally basic Bayesian parameter estimation.

Introduction

Consider the situation of the first exposure of a native speaker of American English to an English variety with which she has no experience (e.g., Singaporean English), and the problem of inferring the probability of use of active versus passive voice in this variety with a simple transitive verb such as hit:

(1) The ball hit the window.

(Active)

(2) The window was hit by the ball. (Passive)

There is ample evidence that this probability is contingent on a number of features of the utterance and discourse context (e.g., Weiner and Labov, 1983), and in Chapter 6 we cover how to construct such richer models, but for the moment we simplify the problem by assuming that active/passive variation can be modeled with a binomial distribution (Section ) with parameter π characterizing the probability that a given potentially transitive clause eligible for passivization will in fact be realized as a passive. The question faced by the native American English speaker is thus: what inferences should we make about π on the basis of limited exposure to the new variety?

This is the problem of parameter estimation, and it is a central part of statistical inference. There are many different techniques for parameter estimation; any given technique is called an estimator, which is applied to a set of data to construct an estimate. Let us briefly consider two simple estimators for our example.

Estimator 1. Suppose that our American English speaker has been exposed to n transitive sentences of the variety, and m of them have been realized in the passive voice in eligible clauses. A natural estimate of the binomial parameter π would be m/n. Because m/n is the relative frequency of the passive voice, this is known as the relative frequency estimate (RFE; see Section ).
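In code, Estimator 1 is little more than a division; the sketch below (in Python, with a hypothetical helper name chosen for this illustration) packages m/n with a basic sanity check:

```python
# Estimator 1 (relative frequency estimation): given n observed transitive
# clauses of which m were realized as passives, estimate pi as m/n.
# The function name is a made-up illustration, not from the text.

def relative_frequency_estimate(m: int, n: int) -> float:
    """Return the relative-frequency estimate m/n of the passive probability."""
    if n <= 0:
        raise ValueError("need at least one observed clause")
    if not 0 <= m <= n:
        raise ValueError("passive count m must lie between 0 and n")
    return m / n

# e.g., 3 passives out of 10 eligible clauses:
print(relative_frequency_estimate(3, 10))  # 0.3
```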

In addition to being intuitive, we will see in Section that the RFE can be derived from deep and general principles of optimality in estimation. However, the RFE also has weaknesses. For instance, it makes no use of the speaker's knowledge of her native English variety. In addition, when n is small, the RFE is unreliable: imagine, for example, trying to estimate π from only two or three sentences from the new variety.

Estimator 2. Our speaker presumably knows the probability of a passive in American English; call this probability q. An extremely simple estimate of π would be to ignore all new evidence and set π̂ = q, regardless of how much data she has on the new variety.

Although this option may not be as intuitive as Estimator 1, it has certain advantages: it is extremely reliable and, if the new variety is not too different from American English, reasonably accurate as well. On the other hand, once the speaker has had considerable exposure to the new variety, this approach will almost certainly be inferior to relative frequency estimation. (See Exercise to be included with this chapter.)

In light of this example, Section describes how to assess the quality of an estimator in conceptually intuitive yet mathematically precise terms. In Section , we cover frequentist approaches to parameter estimation, which involve procedures for constructing point estimates of parameters.
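The trade-off between the two estimators can be seen in a small simulation. In the sketch below the true passive rate of the new variety and the American English rate q are made-up values chosen purely for illustration:

```python
import random

random.seed(0)

TRUE_PI = 0.25   # hypothetical passive rate in the new variety (made up)
Q = 0.15         # hypothetical passive rate q in American English (made up)

def rfe(data):
    """Estimator 1: relative frequency of passives among observed clauses."""
    return sum(data) / len(data)

def prior_only(data):
    """Estimator 2: ignore the new data entirely and always answer q."""
    return Q

# With little exposure, Estimator 1 can be wildly off while Estimator 2 is
# a stable (if somewhat inaccurate) guess; with lots of exposure, Estimator 1
# homes in on the true rate while Estimator 2 never improves.
for n in (3, 30, 3000):
    data = [1 if random.random() < TRUE_PI else 0 for _ in range(n)]
    print(f"n={n}: RFE={rfe(data):.3f}, prior-only={prior_only(data):.2f}")
```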

In particular we focus on maximum-likelihood estimation and close variants, which for multinomial data turns out to be equivalent to Estimator 1. In Section , we cover Bayesian approaches to parameter estimation, which involve placing probability distributions over the range of possible parameter values. The Bayesian estimation technique we will cover can be thought of as intermediate between Estimators 1 and 2.

Desirable properties for estimators

In this section we briefly cover three key properties of any estimator, and discuss the desirability of these properties.

[Footnote: In this probability we implicitly conditionalize on the use of a transitive verb that is eligible for passivization, excluding intransitives and also unpassivizable verbs.]

Roger Levy, Probabilistic Models in the Study of Language (draft, November 6, 2012)

Consistency

An estimator is consistent if the estimate it constructs is guaranteed to converge to the true parameter value as the quantity of data to which it is applied increases. Figure shows that Estimator 1 in our example is consistent: as the sample size increases, the probability that the relative-frequency estimate falls into a narrow band around the true parameter grows asymptotically toward 1 (this behavior can also be proved rigorously; see Section ). Estimator 2, on the other hand, is not consistent (so long as the American English parameter q differs from π), because it ignores the data completely. Consistency is nearly always a desirable property for a statistical estimator.

Bias

If we view the collection (or sampling) of data from which to estimate a population parameter as a stochastic process, then the parameter estimate π̂ resulting from applying a pre-determined estimator to the resulting data can be viewed as a continuous random variable (Section ).

As with any random variable, we can take its expectation. In general, it is intuitively desirable that the expected value of the estimate be equal (or at least close) to the true parameter value, but this will not always be the case. The bias of an estimator is defined as the deviation of the expectation from the true value: E[π̂] − π. All else being equal, the smaller the bias in an estimator, the more preferable. An estimator for which the bias is zero, that is, for which E[π̂] = π, is called unbiased.

Is Estimator 1 in our passive-voice example biased? The relative-frequency estimate is m/n, so E[π̂] = E[m/n]. Since n is fixed, we can move it outside of the expectation (see linearity of the expectation in Section ) to get E[π̂] = (1/n) E[m]. But m is just the number of passive-voice utterances heard, and since m is binomially distributed, E[m] = πn.

This means that E[π̂] = (1/n) πn = π, so Estimator 1 is unbiased. Estimator 2, on the other hand, has bias q − π.

Variance (and efficiency)

Suppose that our speaker has decided to use Estimator 1 to estimate the probability π of a passive, and has been exposed to n utterances. The intuition is extremely strong that she should use all n utterances to form her relative-frequency estimate of π, rather than, say, using only the first n/2. But why is this the case? Regardless of how many utterances she uses with Estimator 1, her estimate will be unbiased (think about this carefully if you are not immediately convinced).
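The "use all the data" intuition can be checked by simulation: the relative-frequency estimate is unbiased whether it is computed from all n utterances or from only the first n/2, but the half-sample estimate fluctuates more around the true value. A minimal sketch, with an arbitrary true π and sample size chosen for illustration:

```python
import random

random.seed(1)

TRUE_PI = 0.25     # arbitrary true passive probability for the simulation
N = 40             # utterances each simulated speaker hears
REPS = 20_000      # number of simulated speakers

full, half = [], []
for _ in range(REPS):
    data = [1 if random.random() < TRUE_PI else 0 for _ in range(N)]
    full.append(sum(data) / N)                   # RFE from all n utterances
    half.append(sum(data[:N // 2]) / (N // 2))   # RFE from only the first n/2

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

# Both estimators are unbiased (sample means near TRUE_PI), but the
# half-sample estimate has roughly twice the variance of the full-sample one.
print(mean(full), mean(half))
print(variance(full), variance(half))
```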

