Parameter Estimation - ML vs. MAP - fu-berlin.de

Transcription of Parameter Estimation - ML vs. MAP - fu-berlin.de

Parameter Estimation: ML vs. MAP. Peter N Robinson, December 14, 2012.

Estimating parameters from data: In many situations in bioinformatics, we want to estimate optimal parameters from data. In the examples we have seen in the lectures on variant calling, these parameters might be the error rate for reads, the proportion of a certain genotype, the proportion of non-reference bases, etc.

However, the "hello world" example for this sort of thing is the coin toss, so we will start with that.

Coin toss: Let's say we have two coins that are each tossed 10 times.
Coin 1: H,T,T,H,H,H,T,H,T,T
Coin 2: T,T,T,H,T,T,T,H,T,T
Intuitively, we might guess that coin 1 is a fair coin, i.e., P(X = H) = 0.5, and that coin 2 is biased, i.e., P(X = H) ≠ 0.5.

Discrete random variable: Let us begin to formalize this. We model the coin toss process as follows. The outcome of a single coin toss is a random variable X that can take on values in a set X = {x_1, x_2, ..., x_n}.

In our example, of course, n = 2, and the values are x_1 = 0 (tails) and x_2 = 1 (heads). We then have a probability mass function p: X → [0,1]; the law of total probability states that ∑_{x ∈ X} p(x) = 1. This is a Bernoulli distribution with parameter θ:

p(X = 1; θ) = θ    (1)

Probability of a sequence of events: In general, for a sequence of two events X_1 and X_2, the joint probability is

P(X_1, X_2) = p(X_2 | X_1) p(X_1)    (2)

Since we assume that the sequence is i.i.d. (independently and identically distributed), by definition p(X_2 | X_1) = p(X_2). Thus, for a sequence of n events (coin tosses), we have

p(x_1, x_2, ..., x_n; θ) = ∏_{i=1}^{n} p(x_i; θ)    (3)
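
The Bernoulli pmf in (1) and the i.i.d. product rule in (3) are easy to check numerically. Below is a minimal Python sketch; the function names bernoulli_pmf and sequence_prob are illustrative only, not from the slides.

```python
# Minimal numerical check of the Bernoulli pmf (1) and the i.i.d. product rule (3).

def bernoulli_pmf(x, theta):
    """p(x; theta) = theta if x == 1 (heads), 1 - theta if x == 0 (tails)."""
    return theta if x == 1 else 1.0 - theta

def sequence_prob(xs, theta):
    """p(x_1, ..., x_n; theta) = product of the individual pmfs, assuming i.i.d. tosses."""
    prob = 1.0
    for x in xs:
        prob *= bernoulli_pmf(x, theta)
    return prob

# The pmf sums to 1 over the outcome set {0, 1} (law of total probability).
assert abs(bernoulli_pmf(0, 0.3) + bernoulli_pmf(1, 0.3) - 1.0) < 1e-12

# Coin 1 from the slides, encoded as 1 = heads, 0 = tails.
coin1 = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0]
print(sequence_prob(coin1, 0.5))   # (0.5)^10 ≈ 0.000977
```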

Using (3), if the probability of heads is 30%, then the probability of the sequence observed for coin 2 can be calculated as

p(T,T,T,H,T,T,T,H,T,T; θ) = θ^2 (1 − θ)^8 = (3/10)^2 (7/10)^8    (4)

Probability of a sequence of events: Thus far, we have considered p(x; θ) as a function of x, parametrized by θ. If we view p(x; θ) as a function of θ, then it is called the likelihood function. Maximum likelihood estimation basically chooses a value of θ that maximizes the likelihood function given the observed data.
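
Equation (4) can be verified numerically, and evaluating the same sequence probability at a few values of θ already shows the "likelihood as a function of θ" idea: some parameter values explain coin 2's data much better than others. A small sketch (the particular θ values are chosen for illustration only):

```python
# Probability of coin 2's sequence, T,T,T,H,T,T,T,H,T,T, as a function of theta.
# Encoding: 1 = heads, 0 = tails.
coin2 = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]

def sequence_prob(xs, theta):
    prob = 1.0
    for x in xs:
        prob *= theta if x == 1 else 1.0 - theta
    return prob

# Check equation (4): at theta = 0.3 the product equals theta^2 * (1 - theta)^8.
direct = sequence_prob(coin2, 0.3)
closed_form = 0.3 ** 2 * 0.7 ** 8
assert abs(direct - closed_form) < 1e-12
print(direct)   # ≈ 0.00519

# Viewed as a function of theta (the likelihood), theta = 0.2 explains
# the observed data better than theta = 0.3 or theta = 0.5.
for theta in (0.2, 0.3, 0.5):
    print(theta, sequence_prob(coin2, theta))
```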

Maximum likelihood for Bernoulli: The likelihood for a sequence of Bernoulli random variables X = [x_1, x_2, ..., x_n] with x_i ∈ {0,1} is then

p(X; θ) = ∏_{i=1}^{n} p(x_i; θ) = ∏_{i=1}^{n} θ^{x_i} (1 − θ)^{1 − x_i}    (5)

We usually maximize the log likelihood function rather than the original function: it is often easier to take the derivative; the log function is monotonically increasing, so the maximum (argmax) is the same; and it avoids the numerical problems involved with multiplying lots of small numbers.

Log likelihood: Thus, instead of maximizing

p(X; θ) = ∏_{i=1}^{n} θ^{x_i} (1 − θ)^{1 − x_i}    (6)

we maximize

log p(X; θ) = log ∏_{i=1}^{n} θ^{x_i} (1 − θ)^{1 − x_i}
            = ∑_{i=1}^{n} log [ θ^{x_i} (1 − θ)^{1 − x_i} ]
            = ∑_{i=1}^{n} [ log θ^{x_i} + log (1 − θ)^{1 − x_i} ]
            = ∑_{i=1}^{n} [ x_i log θ + (1 − x_i) log(1 − θ) ]

Note that one often denotes the log likelihood function with the symbol L = log p(X; θ).
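
The numerical argument for working with log p(X; θ) is easy to demonstrate: the raw product in (5)/(6) underflows to 0.0 for long sequences, while the sum of logs remains a perfectly usable number. A small sketch, with coin 2's tosses simply repeated to mimic a long i.i.d. sequence (the repetition count is arbitrary):

```python
import math

# Coin 2's tosses repeated many times, to mimic a long i.i.d. sequence.
xs = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0] * 200    # 2000 tosses
theta = 0.2

# Raw likelihood (equation 6): the product of many numbers < 1 underflows to 0.0.
likelihood = 1.0
for x in xs:
    likelihood *= theta ** x * (1.0 - theta) ** (1 - x)
print(likelihood)        # 0.0 due to floating-point underflow

# Log likelihood: sum of x_i*log(theta) + (1 - x_i)*log(1 - theta) stays finite.
log_likelihood = sum(x * math.log(theta) + (1 - x) * math.log(1.0 - theta) for x in xs)
print(log_likelihood)    # a large negative, but representable, number
```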

A function f defined on a subset of the real numbers with real values is called monotonic (also monotonically increasing, increasing, or non-decreasing) if for all x and y such that x ≤ y one has f(x) ≤ f(y). Thus, the monotonicity of the log function guarantees that

argmax_θ p(X; θ) = argmax_θ log p(X; θ)    (7)

ML estimate: The ML estimate of the parameter θ is then

argmax_θ ∑_{i=1}^{n} [ x_i log θ + (1 − x_i) log(1 − θ) ]    (8)

We can calculate the argmax by setting the first derivative equal to zero and solving for θ.
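
Equation (7) can also be confirmed by brute force: a grid search that maximizes p(X; θ) and one that maximizes log p(X; θ) pick out the same θ. A minimal sketch (the grid resolution is arbitrary):

```python
import math

# Coin 2 from the slides: 1 = heads, 0 = tails.
xs = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]

def likelihood(theta):
    return math.prod(theta ** x * (1.0 - theta) ** (1 - x) for x in xs)

def log_likelihood(theta):
    return sum(x * math.log(theta) + (1 - x) * math.log(1.0 - theta) for x in xs)

# Grid over the open interval (0, 1); endpoints excluded to keep log() finite.
grid = [i / 1000 for i in range(1, 1000)]
argmax_lik = max(grid, key=likelihood)
argmax_loglik = max(grid, key=log_likelihood)
print(argmax_lik, argmax_loglik)   # both 0.2: the argmax is unchanged by taking the log
```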

Thus

∂/∂θ log p(X; θ) = ∂/∂θ ∑_{i=1}^{n} [ x_i log θ + (1 − x_i) log(1 − θ) ]
                 = ∑_{i=1}^{n} x_i ∂/∂θ log θ + ∑_{i=1}^{n} (1 − x_i) ∂/∂θ log(1 − θ)
                 = (1/θ) ∑_{i=1}^{n} x_i − (1/(1 − θ)) ∑_{i=1}^{n} (1 − x_i)

ML estimate: and finally, to find the maximum we set ∂/∂θ log p(X; θ) = 0:

0 = (1/θ) ∑_{i=1}^{n} x_i − (1/(1 − θ)) ∑_{i=1}^{n} (1 − x_i)
(1 − θ)/θ = ∑_{i=1}^{n} (1 − x_i) / ∑_{i=1}^{n} x_i
1/θ − 1 = ∑_{i=1}^{n} 1 / ∑_{i=1}^{n} x_i − 1
1/θ = n / ∑_{i=1}^{n} x_i
θ_ML = (1/n) ∑_{i=1}^{n} x_i

Reassuringly, the maximum likelihood estimate is just the proportion of flips that came out heads.

Problems with ML estimation: Does it really make sense that H,T,H,T gives θ_ML = 0.5, H,T,T,T gives θ_ML = 0.25, and T,T,T,T gives θ_ML = 0? ML estimation does not incorporate any prior knowledge and does not generate an estimate of the certainty of its results.
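
The closed-form result θ_ML = (1/n) ∑ x_i is simply the fraction of heads, which can be checked directly; the same one-liner also reproduces the troubling behaviour just noted for short, extreme sequences. A minimal sketch:

```python
def theta_ml(tosses):
    """ML estimate for a Bernoulli: the proportion of 1s (heads) in the data."""
    return sum(tosses) / len(tosses)

coin1 = [1, 0, 0, 1, 1, 1, 0, 1, 0, 0]   # H,T,T,H,H,H,T,H,T,T
coin2 = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]   # T,T,T,H,T,T,T,H,T,T

print(theta_ml(coin1))            # 0.5  -> looks like a fair coin
print(theta_ml(coin2))            # 0.2  -> looks biased towards tails

# The problematic cases: four tosses are enough to drive the estimate to an extreme,
# and ML says nothing about how certain (or uncertain) these numbers are.
print(theta_ml([1, 0, 1, 0]))     # 0.5
print(theta_ml([1, 0, 0, 0]))     # 0.25
print(theta_ml([0, 0, 0, 0]))     # 0.0
```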

Maximum a posteriori estimation: Bayesian approaches try to reflect our belief about θ. In this case, we will consider θ to be a random variable.

p(θ | X) = p(X | θ) p(θ) / p(X)    (9)

Thus, Bayes' law converts our prior belief about the parameter θ (before seeing data) into a posterior probability, p(θ | X), by using the likelihood function p(X | θ). The maximum a posteriori (MAP) estimate is defined as

θ_MAP = argmax_θ p(θ | X)    (10)

Note that because p(X) does not depend on θ, we have

θ_MAP = argmax_θ p(θ | X) = argmax_θ p(X | θ) p(θ) / p(X) = argmax_θ p(X | θ) p(θ)

This is essentially the basic idea of the MAP equation used by SNVMix for variant calling.
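
Because p(X) drops out, θ_MAP can be found by maximizing the product p(X | θ) p(θ), for example with the same grid search as before. The sketch below uses a simple symmetric prior p(θ) ∝ θ(1 − θ) purely as a placeholder for "we believe the coin is roughly fair"; the Beta prior actually used for this purpose is introduced further down.

```python
import math

# Coin 2 from the slides: 1 = heads, 0 = tails.
xs = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]

def likelihood(theta):
    return math.prod(theta ** x * (1.0 - theta) ** (1 - x) for x in xs)

def prior(theta):
    # Placeholder prior favouring theta near 0.5. An unnormalized prior is fine,
    # because constant factors do not change the argmax (just like dropping p(X)).
    return theta * (1.0 - theta)

grid = [i / 1000 for i in range(1, 1000)]
theta_ml = max(grid, key=likelihood)                             # argmax p(X | theta)
theta_map = max(grid, key=lambda t: likelihood(t) * prior(t))    # argmax p(X | theta) p(theta)
print(theta_ml, theta_map)   # 0.2 vs 0.25: the prior pulls the estimate towards 0.5
```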

MAP estimation: what does it buy us? To take a simple example of a situation in which MAP estimation might produce better results than ML estimation, let us consider a statistician who wants to predict the outcome of the next election in the USA. The statistician is able to gather data on party preferences by asking people he meets at the Wall Street Golf Club¹ which party they plan on voting for in the next election. The statistician asks 100 people, seven of whom answer "Democrats". This can be modeled as a series of Bernoullis, just like the coin tosses. In this case, the maximum likelihood estimate of the proportion of voters in the USA who will vote Democratic is θ_ML = 7/100 = 0.07.

¹ a notorious haven of ultraconservative Republicans
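
For the survey, the ML calculation is exactly the coin-toss calculation: each answer is a Bernoulli outcome, with "Democrat" coded as 1. A minimal sketch of the numbers in the example:

```python
# 100 answers, 7 of them "Democrat" (coded 1), 93 others (coded 0).
answers = [1] * 7 + [0] * 93

theta_ml = sum(answers) / len(answers)
print(theta_ml)   # 0.07 -- the ML estimate uses nothing beyond this (biased) sample
```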

Somehow, the estimate of θ_ML = 0.07 doesn't seem quite right given our previous experience that about half of the electorate votes Democratic, and half votes Republican. But how should the statistician incorporate this prior knowledge into his prediction for the next election? The MAP estimation procedure allows us to inject our prior beliefs about parameter values into the new estimate.

Beta distribution (background): The Beta distribution is appropriate to express prior belief about a Bernoulli distribution. The Beta distribution is a family of continuous distributions defined on [0,1] and parametrized by two positive shape parameters, α and β:

p(θ) = (1 / B(α, β)) θ^(α − 1) (1 − θ)^(β − 1)

where θ ∈ [0,1] and B(α, β) = Γ(α) Γ(β) / Γ(α + β), with Γ the Gamma function (an extension of the factorial).
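
A minimal sketch of how such a Beta prior could be combined with the survey data via the MAP rule θ_MAP = argmax_θ p(X | θ) p(θ), using a grid search. The choice Beta(50, 50) is only an assumed way of encoding "roughly half the electorate votes for each party", not a value from the slides; the log of likelihood × prior is maximized to avoid the underflow discussed earlier.

```python
import math

def log_beta_pdf(theta, a, b):
    """log of the Beta(a, b) density; B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)."""
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (a - 1) * math.log(theta) + (b - 1) * math.log(1.0 - theta) - log_norm

def log_likelihood(theta, heads, n):
    """Bernoulli log likelihood for `heads` ones out of `n` observations."""
    return heads * math.log(theta) + (n - heads) * math.log(1.0 - theta)

heads, n = 7, 100     # 7 of 100 answered "Democrat"
a, b = 50.0, 50.0     # assumed prior: belief centred on theta = 0.5

grid = [i / 1000 for i in range(1, 1000)]
theta_ml = max(grid, key=lambda t: log_likelihood(t, heads, n))
theta_map = max(grid, key=lambda t: log_likelihood(t, heads, n) + log_beta_pdf(t, a, b))
print(theta_ml)    # 0.07
print(theta_map)   # ≈ 0.28: pulled from 0.07 towards the prior belief of 0.5
```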

