Transcription of Lecture 6. Bayesian estimation
The parameter as a random variable

So far we have seen the frequentist approach to statistical inference, in which inferential statements about $\theta$ are interpreted in terms of repeated sampling. In contrast, the Bayesian approach treats $\theta$ as a random variable taking values in $\Theta$.

The investigator's information and beliefs about the possible values for $\theta$, before any observation of data, are summarised by a prior distribution $\pi(\theta)$. When data $X = x$ are observed, the extra information about $\theta$ is combined with the prior to obtain the posterior distribution $\pi(\theta \mid x)$ for $\theta$ given $X = x$.

There has been a long-running argument between proponents of these different approaches to statistical inference. Recently things have settled down, and Bayesian methods are seen to be appropriate in huge numbers of applications where one seeks to assess a probability about a state of the world.
Examples are spam filters, text and speech recognition, machine learning, bioinformatics, health economics and (some) clinical trials.

Prior and posterior distributions

By Bayes' theorem,

$$\pi(\theta \mid x) = \frac{f_X(x \mid \theta)\,\pi(\theta)}{f_X(x)},$$

where $f_X(x) = \int f_X(x \mid \theta)\,\pi(\theta)\,d\theta$ for continuous $\theta$, and $f_X(x) = \sum_i f_X(x \mid \theta_i)\,\pi(\theta_i)$ in the discrete case. Thus

$$\pi(\theta \mid x) \propto f_X(x \mid \theta)\,\pi(\theta), \qquad (1)$$

that is, posterior $\propto$ likelihood $\times$ prior, where the constant of proportionality is chosen to make the total mass of the posterior distribution equal to one. In practice we use (1) and often we can recognise the family for $\pi(\theta \mid x)$.
It should be clear that the data enter through the likelihood, and so the inference is automatically based on any sufficient statistic.

Inference about a discrete parameter

Suppose I have 3 coins in my pocket:
1. biased 3:1 in favour of tails,
2. a fair coin,
3. biased 3:1 in favour of heads.

I randomly select one coin and flip it once, observing a head. What is the probability that I have chosen coin 3?

Let $X = 1$ denote the event that I observe a head, and $X = 0$ a tail. Let $\theta$ denote the probability of a head, so $\theta \in \{1/4, 1/2, 3/4\}$.

Prior: $p(\theta = 1/4) = p(\theta = 1/2) = p(\theta = 3/4) = 1/3$.

Probability mass function: $p(x \mid \theta) = \theta^{x}(1 - \theta)^{1 - x}$.

Coin  Prior      Likelihood     Un-normalised posterior  Normalised posterior
      p(theta)   p(x=1|theta)   p(x=1|theta) p(theta)    p(x=1|theta) p(theta) / p(x)
1     1/3        1/4            1/12                     1/6
2     1/3        1/2            1/6                      1/3
3     1/3        3/4            1/4                      1/2
Sum   1                         1/2                      1

The normalising constant can be calculated as $p(x) = \sum_i p(x \mid \theta_i)\,p(\theta_i)$, which here equals 1/2.

So observing a head on a single toss of the coin means that there is now a 50% probability that the chance of heads is 3/4, and only a 1/6 (about 16.7%) probability that the chance of heads is 1/4.

Bayesian inference - how did it all start?

In 1763, Reverend Thomas Bayes of Tunbridge Wells posed the following problem. In modern language: given $r \sim \text{Binomial}(\theta, n)$, what is $P(\theta_1 < \theta < \theta_2 \mid r, n)$?
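The three-coin calculation above can be checked with a few lines of Python. This is only a sketch using the standard library; the function name `discrete_posterior` and its arguments are hypothetical, chosen for illustration, and exact fractions are used so that no rounding enters the table.

```python
from fractions import Fraction


def discrete_posterior(thetas, prior, x):
    """Posterior over a finite set of theta values after one Bernoulli draw x.

    Hypothetical helper: applies posterior = likelihood * prior / p(x).
    """
    # Likelihood p(x | theta) = theta^x * (1 - theta)^(1 - x) for x in {0, 1}.
    likelihood = [t if x == 1 else 1 - t for t in thetas]
    unnormalised = [lik * p for lik, p in zip(likelihood, prior)]
    p_x = sum(unnormalised)  # normalising constant p(x)
    return [u / p_x for u in unnormalised], p_x


# Coins biased 3:1 for tails, fair, and 3:1 for heads; each chosen with prob 1/3.
thetas = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
prior = [Fraction(1, 3)] * 3

posterior, p_x = discrete_posterior(thetas, prior, x=1)
print("p(x = 1) =", p_x)
for coin, post in enumerate(posterior, start=1):
    print(f"coin {coin}: posterior probability {post}")
```

Calling the same function with `x=0` gives the posterior after observing a tail, which by symmetry reverses the order of the three probabilities.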
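Example. Suppose we are interested in the true mortality risk $\theta$ in a hospital $H$ which is about to try a new operation. On average in the country around 10% of people die, but mortality rates in different hospitals vary from around 3% to around 20%. Hospital $H$ has no deaths in its first 10 operations. What should we believe about $\theta$?

Let $X_i = 1$ if the $i$th patient dies in $H$ (zero otherwise), $i = 1, \dots, n$. Then

$$f_X(x \mid \theta) = \theta^{\sum x_i}(1 - \theta)^{n - \sum x_i}.$$

Suppose a priori that $\theta \sim \text{Beta}(a, b)$ for some known $a > 0$, $b > 0$, so that

$$\pi(\theta) \propto \theta^{a - 1}(1 - \theta)^{b - 1}, \quad 0 < \theta < 1.$$

Then the posterior is

$$\pi(\theta \mid x) \propto f_X(x \mid \theta)\,\pi(\theta) \propto \theta^{\sum x_i + a - 1}(1 - \theta)^{n - \sum x_i + b - 1}, \quad 0 < \theta < 1.$$

We recognise this as $\text{Beta}(\sum x_i + a,\; n - \sum x_i + b)$, and so

$$\pi(\theta \mid x) = \frac{\theta^{\sum x_i + a - 1}(1 - \theta)^{n - \sum x_i + b - 1}}{B(\sum x_i + a,\; n - \sum x_i + b)} \quad \text{for } 0 < \theta < 1.$$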
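The Beta update derived above amounts to adding the number of deaths to the first parameter and the number of survivors to the second. A minimal Python sketch follows; the helper names `beta_bernoulli_update` and `beta_mean` are hypothetical, and the prior values $a = 3$, $b = 27$ are the ones the lecture goes on to adopt for this example.

```python
from fractions import Fraction


def beta_bernoulli_update(a, b, xs):
    """Return the Beta posterior parameters after observing 0/1 outcomes xs.

    Hypothetical helper implementing Beta(sum(x) + a, n - sum(x) + b).
    """
    events = sum(xs)
    n = len(xs)
    return a + events, b + (n - events)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution, a / (a + b), as an exact fraction."""
    return Fraction(a, a + b)


# Hospital H: no deaths in the first 10 operations.
xs = [0] * 10
a_post, b_post = beta_bernoulli_update(3, 27, xs)

print("prior mean     =", beta_mean(3, 27))
print("posterior      = Beta(%d, %d)" % (a_post, b_post))
print("posterior mean =", beta_mean(a_post, b_post))
```

Because the update only needs the count of events and the number of trials, the posterior depends on the data solely through $\sum x_i$ and $n$.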
In practice, we need to find a Beta prior distribution that matches our information from other hospitals. It turns out that a $\text{Beta}(a = 3, b = 27)$ prior distribution has mean 0.1 and $P(0.03 < \theta < 0.20) \approx 0.9$.

The data are $\sum x_i = 0$, $n = 10$. So the posterior is $\text{Beta}(\sum x_i + a,\; n - \sum x_i + b) = \text{Beta}(3, 37)$. This has mean $3/40 = 0.075$.

NB: even though nobody has died so far, the mle $\hat{\theta} = \sum x_i / n = 0$ (i.e. it is impossible that anyone will ever die) does not seem plausible.

    install.packages("LearnBayes")
    library(LearnBayes)
    prior = c(a = 3, b = 27)   # Beta prior parameters
    data = c(s = 0, f = 10)    # s events (deaths) and f non-events (survivors)
    triplot(prior, data)       # plots prior, likelihood and posterior

Conjugacy

For this problem, a beta prior leads to a beta posterior. We say that the beta family is a conjugate family of prior distributions for Bernoulli samples.

Suppose that $a = b = 1$, so that $\pi(\theta) = 1$, $0 < \theta < 1$: the uniform distribution (called the "principle of insufficient reason" by Laplace, 1774). Then the posterior is $\text{Beta}(\sum x_i + 1,\; n - \sum x_i + 1)$, with

mean: $\dfrac{\sum x_i + 1}{n + 2}$
mode: $\dfrac{\sum x_i}{n}$
variance: $\dfrac{(\sum x_i + 1)(n - \sum x_i + 1)}{(n + 2)^2 (n + 3)}$

Notice that the mode of the posterior is the mle. The posterior mean estimator, $\dfrac{\sum X_i + 1}{n + 2}$, is discussed in Lecture 2, where we showed that this estimator had smaller mse than the mle for non-extreme values of $\theta$.
This estimator is known as Laplace's estimator.

The posterior variance is bounded above by $1/(4(n + 3))$, and this is smaller than the prior variance (1/12 for the uniform prior), and is smaller for larger $n$.

Again, note that the posterior automatically depends on the data through the sufficient statistic.

Bayesian approach to point estimation

Let $L(\theta, a)$ be the loss incurred in estimating the value of a parameter to be $a$ when the true value is $\theta$.

Common loss functions are quadratic loss $L(\theta, a) = (\theta - a)^2$ and absolute error loss $L(\theta, a) = |\theta - a|$, but we can have others.

When our estimate is $a$, the expected posterior loss is

$$h(a) = \int L(\theta, a)\,\pi(\theta \mid x)\,d\theta.$$
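To make the definition of $h(a)$ concrete, the sketch below evaluates the expected posterior loss for a discrete posterior, where the integral becomes a sum. It reuses the three-coin example, whose posterior after one head works out to 1/6, 1/3 and 1/2 on $\theta = 1/4, 1/2, 3/4$. The function names are hypothetical and only the Python standard library is assumed.

```python
from fractions import Fraction


def quadratic_loss(theta, a):
    """L(theta, a) = (theta - a)^2."""
    return (theta - a) ** 2


def absolute_loss(theta, a):
    """L(theta, a) = |theta - a|."""
    return abs(theta - a)


def expected_posterior_loss(a, thetas, posterior, loss):
    """h(a) = sum over theta of L(theta, a) * posterior(theta) (discrete case)."""
    return sum(loss(t, a) * p for t, p in zip(thetas, posterior))


# Posterior from the three-coin example after observing one head.
thetas = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
posterior = [Fraction(1, 6), Fraction(1, 3), Fraction(1, 2)]

for a in (Fraction(1, 2), Fraction(3, 4)):
    h_quad = expected_posterior_loss(a, thetas, posterior, quadratic_loss)
    h_abs = expected_posterior_loss(a, thetas, posterior, absolute_loss)
    print(f"a = {a}: quadratic h(a) = {h_quad}, absolute h(a) = {h_abs}")
```

Comparing $h(a)$ across candidate values of $a$ in this way is a direct, if crude, way to see which estimate a given loss function favours.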
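The Bayes estimator $\hat{\theta}$ minimises the expected posterior loss.

For quadratic loss,

$$h(a) = \int (a - \theta)^2\,\pi(\theta \mid x)\,d\theta,$$

and $h'(a) = 0$ if

$$a \int \pi(\theta \mid x)\,d\theta = \int \theta\,\pi(\theta \mid x)\,d\theta.$$

So $\hat{\theta} = \int \theta\,\pi(\theta \mid x)\,d\theta$, the posterior mean, minimises $h(a)$.

For absolute error loss,

$$h(a) = \int |\theta - a|\,\pi(\theta \mid x)\,d\theta = \int_{-\infty}^{a} (a - \theta)\,\pi(\theta \mid x)\,d\theta + \int_{a}^{\infty} (\theta - a)\,\pi(\theta \mid x)\,d\theta$$

$$= a \int_{-\infty}^{a} \pi(\theta \mid x)\,d\theta - \int_{-\infty}^{a} \theta\,\pi(\theta \mid x)\,d\theta + \int_{a}^{\infty} \theta\,\pi(\theta \mid x)\,d\theta - a \int_{a}^{\infty} \pi(\theta \mid x)\,d\theta.$$

Now $h'(a) = 0$ if

$$\int_{-\infty}^{a} \pi(\theta \mid x)\,d\theta = \int_{a}^{\infty} \pi(\theta \mid x)\,d\theta.$$

This occurs when each side is 1/2 (since the two integrals must sum to 1), so $\hat{\theta}$ is the posterior median.

Example. Suppose that $X_1, \dots, X_n$ are iid $N(\mu, 1)$, and that a priori $\mu \sim N(0, \tau^{-2})$ for known $\tau^{-2}$. The posterior is given by

$$\pi(\mu \mid x) \propto f_X(x \mid \mu)\,\pi(\mu) \propto \exp\left[-\tfrac{1}{2} \sum (x_i - \mu)^2\right] \exp\left[-\frac{\mu^2 \tau^2}{2}\right] \propto \exp\left[-\tfrac{1}{2}(n + \tau^2)\left\{\mu - \frac{\sum x_i}{n + \tau^2}\right\}^2\right] \quad \text{(check)}.$$

So the posterior distribution of $\mu$ given $x$ is a Normal distribution with mean $\sum x_i / (n + \tau^2)$ and variance $1/(n + \tau^2)$.

The normal density is symmetric, and so the posterior mean and the posterior median have the same value $\sum x_i / (n + \tau^2)$. This is the optimal Bayes estimate of $\mu$ under both quadratic and absolute error loss.

Example. Suppose that $X_1, \dots, X_n$ are iid Poisson($\lambda$) rv's and that $\lambda$ has an exponential distribution with mean 1, so that $\pi(\lambda) = e^{-\lambda}$, $\lambda > 0$. The posterior distribution is given by

$$\pi(\lambda \mid x) \propto e^{-n\lambda} \lambda^{\sum x_i} e^{-\lambda} = \lambda^{\sum x_i} e^{-(n + 1)\lambda}, \quad \lambda > 0,$$

i.e. $\text{Gamma}(\sum x_i + 1,\; n + 1)$.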
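The two conjugate updates in these examples reduce to simple arithmetic on $\sum x_i$ and $n$. The Python sketch below illustrates this under stated assumptions: the function names and the data values are hypothetical, the Gamma distribution is taken in its shape and rate parametrisation (density proportional to $\lambda^{\text{shape} - 1} e^{-\text{rate}\,\lambda}$), and the Gamma mean shape/rate is a standard fact rather than something derived in the text above.

```python
from fractions import Fraction


def normal_posterior(xs, tau_sq):
    """X_i ~ N(mu, 1) with prior mu ~ N(0, 1/tau_sq).

    Returns the posterior (mean, variance) of mu:
    mean = sum(x) / (n + tau_sq), variance = 1 / (n + tau_sq).
    """
    precision = len(xs) + tau_sq  # posterior precision n + tau^2
    return sum(xs) / precision, 1 / precision


def poisson_posterior(xs):
    """X_i ~ Poisson(lam) with an exponential(mean 1) prior on lam.

    Returns the Gamma posterior as (shape, rate) = (sum(x) + 1, n + 1).
    """
    return sum(xs) + 1, len(xs) + 1


# Hypothetical data, chosen only to illustrate the arithmetic.
normal_data = [Fraction(1), Fraction(2), Fraction(3)]
mu_mean, mu_var = normal_posterior(normal_data, tau_sq=Fraction(1))
print("Normal example: posterior mean", mu_mean, "variance", mu_var)

poisson_data = [2, 0, 3, 1]
shape, rate = poisson_posterior(poisson_data)
lam_mean = Fraction(shape, rate)  # mean of Gamma(shape, rate)
print(f"Poisson example: posterior Gamma({shape}, {rate}), mean {lam_mean}")
```

In the Normal case the posterior mean shrinks the sample total towards the prior mean of zero, and the shrinkage weakens as $n$ grows relative to $\tau^2$.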