
Lecture 20 | Bayesian analysis


STATS 200: Introduction to Statistical Inference, Autumn 2016

Our treatment of parameter estimation thus far has assumed that $\theta$ is an unknown but non-random quantity: it is some fixed parameter describing the true distribution of data, and our goal was to determine this parameter. This is called the frequentist paradigm of statistical inference. In this and the next lecture, we will describe an alternative Bayesian paradigm, in which $\theta$ itself is modeled as a random variable. The Bayesian paradigm naturally incorporates our prior belief about the unknown parameter $\theta$, and updates this belief based on the observed data.

Prior and posterior distributions

Recall that if $X, Y$ are two random variables having joint PDF or PMF $f_{X,Y}(x,y)$, then the marginal distribution of $X$ is given by the PDF
$$f_X(x) = \int f_{X,Y}(x,y)\,dy$$
in the continuous case and by the PMF
$$f_X(x) = \sum_y f_{X,Y}(x,y)$$
in the discrete case; this describes the probability distribution of $X$ alone. The conditional distribution of $Y$ given $X = x$ is defined by the PDF or PMF
$$f_{Y|X}(y|x) = \frac{f_{X,Y}(x,y)}{f_X(x)},$$
and represents the probability distribution of $Y$ if it is known that $X = x$.

(This is a PDF or PMF as a function of $y$, for any fixed $x$.) Defining similarly the marginal distribution $f_Y(y)$ of $Y$ and the conditional distribution $f_{X|Y}(x|y)$ of $X$ given $Y = y$, the joint PDF $f_{X,Y}(x,y)$ factors in two ways as
$$f_{X,Y}(x,y) = f_{Y|X}(y|x)\,f_X(x) = f_{X|Y}(x|y)\,f_Y(y).$$

In Bayesian analysis, before data is observed, the unknown parameter $\theta$ is modeled as a random variable $\Theta$ having a probability distribution $f_\Theta(\theta)$, called the prior distribution. This represents our prior belief about the value of this parameter. Conditional on $\Theta = \theta$, the observed data $X$ is assumed to have distribution $f_{X|\Theta}(x|\theta)$, where $f_{X|\Theta}(x|\theta)$ defines a parametric model with parameter $\theta$, as in our previous lectures. (For notational simplicity, we consider here a single data value $X$, but this extends naturally to the case where $X = (X_1, \ldots, X_n)$ is a data vector and $f_{X|\Theta}(x|\theta)$ is the joint distribution of $X$ given $\theta$.) The joint distribution of $\Theta$ and $X$ is then the product
$$f_{X,\Theta}(x,\theta) = f_{X|\Theta}(x|\theta)\,f_\Theta(\theta),$$
and the marginal distribution of $X$ (in the continuous case) is
$$f_X(x) = \int f_{X,\Theta}(x,\theta)\,d\theta = \int f_{X|\Theta}(x|\theta)\,f_\Theta(\theta)\,d\theta.$$
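As a quick sanity check of these definitions, the following Python sketch (not from the original notes; the joint table is invented purely for illustration) builds a small discrete joint PMF, computes its marginals and conditionals, and verifies both factorizations:

    import numpy as np

    # A made-up 2x3 joint PMF table: rows index x, columns index y.
    f_XY = np.array([[0.10, 0.20, 0.10],
                     [0.30, 0.15, 0.15]])

    f_X = f_XY.sum(axis=1)             # marginal of X: sum over y
    f_Y = f_XY.sum(axis=0)             # marginal of Y: sum over x

    f_Y_given_X = f_XY / f_X[:, None]  # f_{Y|X}(y|x) = f_{X,Y}(x,y) / f_X(x)
    f_X_given_Y = f_XY / f_Y[None, :]  # f_{X|Y}(x|y) = f_{X,Y}(x,y) / f_Y(y)

    # Both factorizations recover the joint PMF.
    assert np.allclose(f_Y_given_X * f_X[:, None], f_XY)
    assert np.allclose(f_X_given_Y * f_Y[None, :], f_XY)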

The conditional distribution of $\Theta$ given $X = x$ is
$$f_{\Theta|X}(\theta|x) = \frac{f_{X,\Theta}(x,\theta)}{f_X(x)} = \frac{f_{X|\Theta}(x|\theta)\,f_\Theta(\theta)}{\int f_{X|\Theta}(x|\theta')\,f_\Theta(\theta')\,d\theta'}. \qquad (*)$$
This is called the posterior distribution of $\Theta$: it represents our knowledge about the parameter after having observed the data $X$. We often summarize the preceding equation simply as
$$\underbrace{f_{\Theta|X}(\theta|x)}_{\text{posterior density}} \;\propto\; \underbrace{f_{X|\Theta}(x|\theta)}_{\text{likelihood}}\;\underbrace{f_\Theta(\theta)}_{\text{prior density}} \qquad (**)$$
where the symbol $\propto$ hides the proportionality factor $f_X(x) = \int f_{X|\Theta}(x|\theta)\,f_\Theta(\theta)\,d\theta$, which does not depend on $\theta$.

Example. Let $P \in (0,1)$ be the probability of heads for a biased coin, and let $X_1, \ldots, X_n$ be the outcomes of $n$ tosses of this coin. If we do not have any prior information about $P$, we might choose for its prior distribution the Uniform$(0,1)$ distribution, having PDF $f_P(p) = 1$ for all $p \in (0,1)$. Given $P = p$, we model $X_1, \ldots, X_n$ as IID Bernoulli$(p)$. Then the joint distribution of $P, X_1, \ldots, X_n$ is given by
$$f_{X,P}(x_1,\ldots,x_n,p) = f_{X|P}(x_1,\ldots,x_n|p)\,f_P(p) = \prod_{i=1}^n p^{x_i}(1-p)^{1-x_i} \cdot 1 = p^s(1-p)^{n-s},$$
where $s = x_1 + \ldots + x_n$. The marginal distribution of $X_1, \ldots, X_n$ is obtained by integrating $f_{X,P}(x_1,\ldots,x_n,p)$ over $p$:

$$f_X(x_1,\ldots,x_n) = \int_0^1 p^s(1-p)^{n-s}\,dp = B(s+1,\,n-s+1),$$
where $B(x,y)$ is the Beta function
$$B(x,y) = \frac{\Gamma(x)\,\Gamma(y)}{\Gamma(x+y)}.$$
Hence the posterior distribution of $P$ given $X_1 = x_1, \ldots, X_n = x_n$ has PDF
$$f_{P|X}(p|x_1,\ldots,x_n) = \frac{f_{X,P}(x_1,\ldots,x_n,p)}{f_X(x_1,\ldots,x_n)} = \frac{1}{B(s+1,\,n-s+1)}\,p^s(1-p)^{n-s}.$$
This is the PDF of the Beta$(s+1,\,n-s+1)$ distribution, so the posterior distribution of $P$ given $X_1 = x_1, \ldots, X_n = x_n$ is Beta$(s+1,\,n-s+1)$, where $s = x_1 + \ldots + x_n$. (In general, the Beta$(\alpha,\beta)$ distribution is a continuous distribution on $(0,1)$ with PDF $f(x) = \frac{1}{B(\alpha,\beta)}\,x^{\alpha-1}(1-x)^{\beta-1}$.)

We computed explicitly the marginal distribution $f_X(x_1,\ldots,x_n)$ above, but this was not necessary to arrive at the answer. Indeed, equation $(**)$ gives
$$f_{P|X}(p|x_1,\ldots,x_n) \propto f_{X|P}(x_1,\ldots,x_n|p)\,f_P(p) = p^s(1-p)^{n-s}.$$
This tells us that the PDF of the posterior distribution of $P$ is proportional to $p^s(1-p)^{n-s}$ as a function of $p$. It must therefore be the PDF of the Beta$(s+1,\,n-s+1)$ distribution, and the proportionality constant must be whatever constant is required to make this PDF integrate to 1 over $p \in (0,1)$. We will repeatedly use this trick to simplify our calculations of posterior distributions.
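To make this example concrete, here is a minimal Python sketch (not part of the notes; the data are simulated, and the seed, sample size, and "true" $p$ are arbitrary) that forms the Beta$(s+1,\,n-s+1)$ posterior from simulated coin tosses:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    p_true = 0.7                              # "true" p, used only to simulate data
    x = rng.binomial(1, p_true, size=20)      # n = 20 coin tosses
    n, s = len(x), int(x.sum())

    # Posterior of P under the Uniform(0,1) prior: Beta(s+1, n-s+1).
    posterior = stats.beta(s + 1, n - s + 1)
    print(posterior.mean())                   # posterior mean (s+1)/(n+2)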

Example. Suppose now we have a prior belief that $P$ is close to $1/2$. There are various prior distributions that we can choose to encode this belief; it will turn out to be mathematically convenient to use the prior distribution Beta$(\alpha,\alpha)$, which has mean $1/2$ and variance $1/(8\alpha+4)$. The constant $\alpha$ may be chosen depending on how confident we are, a priori, that $P$ is near $1/2$: choosing $\alpha = 1$ reduces to the Uniform$(0,1)$ prior of the previous example, whereas choosing $\alpha > 1$ yields a prior distribution more concentrated around $1/2$. The prior distribution Beta$(\alpha,\alpha)$ has PDF
$$f_P(p) = \frac{1}{B(\alpha,\alpha)}\,p^{\alpha-1}(1-p)^{\alpha-1}.$$
Then, applying equation $(**)$, the posterior distribution of $P$ given $X_1 = x_1, \ldots, X_n = x_n$ has PDF
$$f_{P|X}(p|x_1,\ldots,x_n) \propto f_{X|P}(x_1,\ldots,x_n|p)\,f_P(p) \propto p^s(1-p)^{n-s}\cdot p^{\alpha-1}(1-p)^{\alpha-1} = p^{s+\alpha-1}(1-p)^{n-s+\alpha-1},$$
where $s = x_1 + \ldots + x_n$ as before, and where the symbol $\propto$ hides any proportionality constants that do not depend on $p$. This is proportional to the PDF of the distribution Beta$(s+\alpha,\,n-s+\alpha)$, so this Beta distribution is the posterior distribution of $P$.
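The conjugate update is a one-line computation. Below is a hedged sketch (the counts $s$, $n$ and the values of $\alpha$ tried are arbitrary) showing how larger $\alpha$ pulls the posterior mean toward $1/2$:

    from scipy import stats

    def beta_posterior(s, n, alpha):
        """Posterior of P under a Beta(alpha, alpha) prior, given s heads in n tosses."""
        return stats.beta(s + alpha, n - s + alpha)

    s, n = 14, 20
    for alpha in (1, 5, 50):                  # alpha = 1 recovers the uniform prior
        post = beta_posterior(s, n, alpha)
        print(alpha, post.mean())             # mean (s+alpha)/(n+2*alpha) -> 1/2 as alpha grows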

In the previous example, the parametric form of the prior was (cleverly) chosen so that the posterior would be of the same form: they were both Beta distributions. This type of prior is called a conjugate prior for $P$ in the Bernoulli model. Use of a conjugate prior is mostly for mathematical and computational convenience: in principle, any prior $f_P(p)$ on $(0,1)$ may be used. The resulting posterior distribution may not be a simple named distribution with a closed-form PDF, but the PDF may be computed numerically from equation $(*)$ by numerically evaluating the integral in the denominator of this equation.

Example. Let $\lambda \in (0,\infty)$ be the parameter of the Poisson model $X_1, \ldots, X_n$ IID Poisson$(\lambda)$. As a prior distribution for $\lambda$, let us take the Gamma distribution Gamma$(\alpha,\beta)$. The prior and likelihood are given by
$$f_\Lambda(\lambda) = \frac{\beta^\alpha}{\Gamma(\alpha)}\,\lambda^{\alpha-1}e^{-\beta\lambda}, \qquad f_{X|\Lambda}(x_1,\ldots,x_n|\lambda) = \prod_{i=1}^n \frac{\lambda^{x_i}e^{-\lambda}}{x_i!}.$$
Dropping proportionality constants that do not depend on $\lambda$, the posterior distribution of $\Lambda$ given $X_1 = x_1, \ldots, X_n = x_n$ is then
$$f_{\Lambda|X}(\lambda|x_1,\ldots,x_n) \propto f_{X|\Lambda}(x_1,\ldots,x_n|\lambda)\,f_\Lambda(\lambda) \propto \left(\prod_{i=1}^n \lambda^{x_i}e^{-\lambda}\right)\lambda^{\alpha-1}e^{-\beta\lambda} = \lambda^{s+\alpha-1}e^{-(n+\beta)\lambda},$$
where $s = x_1 + \ldots + x_n$. This is proportional to the PDF of the Gamma$(s+\alpha,\,n+\beta)$ distribution, so the posterior distribution of $\Lambda$ must be Gamma$(s+\alpha,\,n+\beta)$.
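As a check on this calculation, the following sketch (the hyperparameters and data are made up for illustration) evaluates the posterior numerically on a grid, following the remark above about equation $(*)$, and compares it to the closed-form Gamma$(s+\alpha,\,n+\beta)$ answer:

    import numpy as np
    from scipy import stats

    alpha, beta = 2.0, 1.0                    # Gamma(alpha, beta) prior, rate parameterization
    x = np.array([3, 1, 4, 2, 2])             # made-up Poisson data
    n, s = len(x), x.sum()

    lam = np.linspace(1e-3, 10, 2000)         # grid over the parameter lambda

    # Numerator of equation (*): likelihood times prior, evaluated on the grid.
    prior = stats.gamma(alpha, scale=1/beta).pdf(lam)
    likelihood = np.prod(stats.poisson(lam[:, None]).pmf(x), axis=1)
    numerator = likelihood * prior

    # Normalize numerically (the integral in the denominator of equation (*)).
    posterior_grid = numerator / np.trapz(numerator, lam)

    exact = stats.gamma(s + alpha, scale=1/(n + beta)).pdf(lam)
    print(np.max(np.abs(posterior_grid - exact)))   # close to 0: matches Gamma(s+alpha, n+beta)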

As the prior and posterior are both Gamma distributions, the Gamma distribution is a conjugate prior for $\lambda$ in the Poisson model.

Point estimates and credible intervals

To the Bayesian statistician, the posterior distribution is the complete answer to the question: what is the value of $\theta$? In many applications, though, we would still like to have a single estimate $\hat\theta$, as well as an interval describing our uncertainty about $\theta$. The posterior mean and posterior mode are the mean and mode of the posterior distribution of $\theta$; both of these are commonly used as a Bayesian estimate $\hat\theta$ for $\theta$. A $100(1-\alpha)\%$ Bayesian credible interval is an interval $I$ such that the posterior probability $P[\theta \in I \mid X] = 1-\alpha$; it is the Bayesian analogue of a frequentist confidence interval. One common choice for $I$ is simply the interval $[\hat\theta^{(\alpha/2)},\,\hat\theta^{(1-\alpha/2)}]$, where $\hat\theta^{(\alpha/2)}$ and $\hat\theta^{(1-\alpha/2)}$ are the $\alpha/2$ and $1-\alpha/2$ quantiles of the posterior distribution of $\theta$. Note that the interpretation of a Bayesian credible interval is different from the interpretation of a frequentist confidence interval: in the Bayesian framework, the parameter $\theta$ is modeled as random, and $1-\alpha$ is the probability that this random parameter belongs to an interval that is fixed conditional on the observed data.

In the coin-tossing example with the Beta$(\alpha,\alpha)$ prior, the posterior distribution of $P$ is Beta$(s+\alpha,\,n-s+\alpha)$.

The posterior mean is then $(s+\alpha)/(n+2\alpha)$, and the posterior mode is $(s+\alpha-1)/(n+2\alpha-2)$. Both of these may be taken as a point estimate $\hat p$ for $p$. The interval from the $0.05$ quantile to the $0.95$ quantile of the Beta$(s+\alpha,\,n-s+\alpha)$ distribution forms a 90% Bayesian credible interval for $p$.

In the Poisson example, the posterior distribution of $\lambda$ is Gamma$(s+\alpha,\,n+\beta)$. The posterior mean and mode are then $(s+\alpha)/(n+\beta)$ and $(s+\alpha-1)/(n+\beta)$, and either may be used as a point estimate for $\lambda$. The interval from the $0.05$ quantile to the $0.95$ quantile of the Gamma$(s+\alpha,\,n+\beta)$ distribution forms a 90% Bayesian credible interval for $\lambda$.
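Both credible intervals can be read off from the posterior quantile function. A minimal sketch (with arbitrary counts and hyperparameters) using scipy's ppf:

    from scipy import stats

    s, n = 14, 20
    alpha, beta = 2.0, 1.0                                   # arbitrary hyperparameters

    beta_post = stats.beta(s + alpha, n - s + alpha)         # coin-tossing posterior
    gamma_post = stats.gamma(s + alpha, scale=1/(n + beta))  # Poisson posterior

    print(beta_post.ppf([0.05, 0.95]))        # 90% credible interval for p
    print(gamma_post.ppf([0.05, 0.95]))       # 90% credible interval for lambda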

