
Chapter 6 Importance sampling


6.1 The basics

To motivate our discussion, consider the following situation. We want to use Monte Carlo to compute $\mu = E[X]$. There is an event $E$ such that $P(E)$ is small but $X$ is small outside of $E$. When we run the usual Monte Carlo algorithm, the vast majority of our samples of $X$ will be outside $E$. But outside of $E$, $X$ is close to zero. Only rarely will we get a sample in $E$ where $X$ is not small.

Most of the time we think of our problem as trying to compute the mean of some random variable $X$. For importance sampling we need a little more structure. We assume that the random variable we want to compute the mean of is of the form $f(\vec X)$ where $\vec X$ is a random vector.

We will assume that the joint distribution of $\vec X$ is absolutely continuous and let $p(\vec x)$ be the density. (Everything we will do also works for the case where the random vector $\vec X$ is discrete.) So we focus on computing
\[
E f(\vec X) = \int f(\vec x)\, p(\vec x)\, d\vec x \tag{6.1}
\]
Sometimes people restrict the region of integration to some subset $D$ of $\mathbb{R}^d$. (Owen does this.) We can (and will) instead just take $p(x) = 0$ outside of $D$ and take the region of integration to be $\mathbb{R}^d$.

The idea of importance sampling is to rewrite the mean as follows. Let $q(x)$ be another probability density on $\mathbb{R}^d$ such that $q(x) = 0$ implies $f(x)p(x) = 0$. Then
\[
\mu = \int f(x)\, p(x)\, dx = \int \frac{f(x)\, p(x)}{q(x)}\, q(x)\, dx \tag{6.2}
\]
We can write the last expression as
\[
E_q\!\left[ \frac{f(\vec X)\, p(\vec X)}{q(\vec X)} \right] \tag{6.3}
\]
where $E_q$ is the expectation for a probability measure for which the distribution of $\vec X$ is $q(x)$ rather than $p(x)$.

The density $p(x)$ is called the nominal or target distribution, $q(x)$ the importance or proposal distribution, and $p(x)/q(x)$ the likelihood ratio. Note that we assumed that $f(x)p(x) = 0$ whenever $q(x) = 0$. Note that we do not have to have $p(x) = 0$ for all $x$ where $q(x) = 0$.

The importance sampling algorithm is then as follows. Generate samples $\vec X_1, \ldots, \vec X_n$ according to the distribution $q(x)$. Then the estimator for $\mu$ is
\[
\hat\mu_q = \frac{1}{n} \sum_{i=1}^n \frac{f(\vec X_i)\, p(\vec X_i)}{q(\vec X_i)} \tag{6.4}
\]
Of course this is doable only if $f(x)p(x)/q(x)$ is computable.

Theorem 1. $\hat\mu_q$ is an unbiased estimator of $\mu$, i.e., $E_q \hat\mu_q = \mu$. Its variance is $\sigma_q^2/n$ where
\[
\sigma_q^2 = \int \frac{f(x)^2\, p(x)^2}{q(x)}\, dx - \mu^2
= \int \left( \frac{f(x)\, p(x)}{q(x)} - \mu \right)^2 q(x)\, dx \tag{6.5}
\]
We can think of this importance sampling Monte Carlo algorithm as just ordinary Monte Carlo applied to $E_q[f(\vec X)p(\vec X)/q(\vec X)]$.
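As a concrete aid, here is a minimal sketch of the algorithm in Python (our own illustration, not from the notes). The function name and the assumption that $f$, $p$, $q$ are vectorized NumPy callables are ours; `sample_q` is assumed to draw samples from the importance density $q$.

```python
import numpy as np

def importance_sampling(f, p, q, sample_q, n):
    """Estimate mu = E_p[f(X)] using samples X_1, ..., X_n drawn from q.

    f, p, q  : vectorized callables (integrand, nominal density, proposal density)
    sample_q : function drawing n samples from the density q
    Returns the estimate of (6.4) and its estimated standard error.
    """
    x = sample_q(n)
    w = f(x) * p(x) / q(x)                    # weighted integrand f(X_i) p(X_i) / q(X_i)
    mu_hat = w.mean()                         # the estimator (6.4)
    sigma2_hat = np.mean((w - mu_hat) ** 2)   # sample variance of the weighted integrand
    return mu_hat, np.sqrt(sigma2_hat / n)
```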

Since this is just ordinary Monte Carlo on the weighted integrand, a natural estimator for the variance is
\[
\hat\sigma_q^2 = \frac{1}{n} \sum_{i=1}^n \left( \frac{f(\vec X_i)\, p(\vec X_i)}{q(\vec X_i)} - \hat\mu_q \right)^2 \tag{6.6}
\]
What is the optimal choice of the importance distribution $q(x)$? Looking at the theorem we see that if we let $q(x) = f(x)p(x)/\mu$, then the variance will be zero. This is a legitimate probability density if $f(x) \ge 0$. Of course we cannot really do this since it would require knowing $\mu$. But this gives us a strategy: we would like to find a density $q(x)$ which is close to being proportional to $f(x)p(x)$.
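To see the zero-variance phenomenon concretely, here is a toy check (our own example, not from the notes): take $f(x) = x$ with $p$ uniform on $[0,1]$, so $\mu = 1/2$ and the optimal density is $q(x) = f(x)p(x)/\mu = 2x$, which inversion sampling gives as $X = \sqrt{U}$. Every weighted sample $f(X)p(X)/q(X) = X/(2X) = 1/2$ equals $\mu$ exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.random(10_000)
x = np.sqrt(u)             # X = sqrt(U) has density q(x) = 2x on [0,1]
w = x * 1.0 / (2.0 * x)    # f(X) p(X) / q(X) = 1/2 for every sample
print(w.mean(), w.std())   # prints 0.5 and 0.0: zero variance
```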

What if $f(x)$ is not positive? Then we will show that the variance is minimized by taking $q(x)$ to be proportional to $|f(x)|p(x)$.

Theorem 2. Let $q^*(x) = |f(x)|p(x)/c$ where $c$ is the constant that makes this a probability density. Then for any probability density $q(x)$ we have $\sigma_{q^*} \le \sigma_q$.

Proof: Note that $c = \int |f(x)|\, p(x)\, dx$.
\begin{align*}
\sigma_{q^*}^2 &= \int \frac{f(x)^2\, p(x)^2}{q^*(x)}\, dx - \mu^2 \tag{6.7}\\
&= c \int |f(x)|\, p(x)\, dx - \mu^2 \tag{6.8}\\
&= \left( \int |f(x)|\, p(x)\, dx \right)^2 - \mu^2 \tag{6.9}\\
&= \left( \int \frac{|f(x)|\, p(x)}{q(x)}\, q(x)\, dx \right)^2 - \mu^2 \tag{6.10}\\
&\le \int \frac{f(x)^2\, p(x)^2}{q(x)^2}\, q(x)\, dx - \mu^2 \tag{6.11}\\
&= \int \frac{f(x)^2\, p(x)^2}{q(x)}\, dx - \mu^2 \tag{6.12}\\
&= \sigma_q^2 \tag{6.13}
\end{align*}
where we have used the Cauchy-Schwarz inequality with respect to the probability measure $q(x)\,dx$. (One factor is the function 1.)

Since we do not know $\int f(x)\,p(x)\,dx$, we probably do not know $\int |f(x)|\,p(x)\,dx$ either. So the optimal sampling density given in the theorem is not realizable. But again, it gives us a strategy.

We want a sampling density which is approximately proportional to $|f(x)|\,p(x)$.

Big warning: Even if the original $f(\vec X)$ has finite variance, there is no guarantee that $\sigma_q$ will be finite. Discuss heavy tails and light tails.

How the sampling distribution should be chosen depends very much on the particular problem. Nonetheless there are some general ideas which we illustrate with some examples.

If the function $f(x)$ is unbounded then ordinary Monte Carlo may have a large variance, possibly even infinite. We may be able to use importance sampling to turn a problem with an unbounded random variable into a problem with a bounded random variable.

Example: We want to compute the integral
\[
I = \int_0^1 x^{-\alpha}\, e^{-x}\, dx \tag{6.14}
\]
where $0 < \alpha < 1$.

So the integral is finite, but the integrand is unbounded. We take $f(x) = x^{-\alpha} e^{-x}$ and the nominal distribution is the uniform distribution on $[0,1]$. Note that $f$ will have infinite variance if $\alpha \ge 1/2$.

We take the sampling distribution to be
\[
q(x) = (1-\alpha)\, x^{-\alpha} \tag{6.15}
\]
on $[0,1]$. This can be sampled using inversion. We have
\[
\frac{f(x)\, p(x)}{q(x)} = \frac{e^{-x}}{1-\alpha} \tag{6.16}
\]
So we do a Monte Carlo simulation of $E_q[e^{-X}/(1-\alpha)]$ where $X$ has distribution $q$. Note that $e^{-X}/(1-\alpha)$ is a bounded random variable.
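A short Python sketch of this example (our own illustration), comparing plain Monte Carlo against importance sampling, using the inversion formula $X = U^{1/(1-\alpha)}$ for the density $q$ (the CDF of $q$ is $x^{1-\alpha}$):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, n = 0.8, 100_000
u = rng.random(n)

# Plain MC: f(U) = U^{-alpha} e^{-U} with U uniform; infinite variance since alpha >= 1/2.
plain = u ** (-alpha) * np.exp(-u)

# IS: X = U^{1/(1-alpha)} has density q(x) = (1-alpha) x^{-alpha} on [0,1];
# the weighted integrand e^{-X} / (1-alpha) is bounded by 1/(1-alpha) = 5.
x = u ** (1.0 / (1.0 - alpha))
weighted = np.exp(-x) / (1.0 - alpha)

print("plain MC:", plain.mean(), "+/-", plain.std() / np.sqrt(n))
print("IS      :", weighted.mean(), "+/-", weighted.std() / np.sqrt(n))
```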

The second general idea we illustrate involves rare-event simulation. This refers to the situation where you want to compute the probability of an event when that probability is very small.

Example: Let $Z$ have a standard normal distribution. We want to compute $P(Z \ge 4)$. We could do this by a Monte Carlo simulation. We generate a bunch of samples of $Z$ and count how many satisfy $Z \ge 4$. The problem is that there won't be very many (probably zero). If $p = P(Z \ge 4)$, then the variance of $1_{Z \ge 4}$ is $p(1-p) \approx p$. So the error with $n$ samples is of order $\sqrt{p/n}$. This is small, but it will be small compared to $p$ only if $n$ is huge.

The nominal distribution is
\[
p(x) = \frac{1}{\sqrt{2\pi}} \exp\!\left( -\tfrac{1}{2} x^2 \right) \tag{6.17}
\]
We take the sampling distribution to be
\[
q(x) =
\begin{cases}
e^{-(x-4)}, & \text{if } x \ge 4,\\
0, & \text{if } x < 4.
\end{cases} \tag{6.18}
\]
The sampling distribution is an exponential distribution shifted to the right by 4. In other words, if $Y$ has an exponential distribution with mean 1, then $Y + 4$ has the distribution $q$.

The probability we want to compute is
\begin{align*}
p &= \int 1_{x \ge 4}\, p(x)\, dx \tag{6.19}\\
&= \int 1_{x \ge 4}\, \frac{p(x)}{q(x)}\, q(x)\, dx \tag{6.20}
\end{align*}
The likelihood ratio is
\[
w(x) = \frac{p(x)}{q(x)} = \frac{1}{\sqrt{2\pi}} \exp\!\left( -\tfrac{1}{2} x^2 + x - 4 \right) \tag{6.21}
\]
On $[4, \infty)$ this function is decreasing. So its maximum is at $4$, where its value is $\exp(-8)/\sqrt{2\pi}$, which is really small. The variance is no bigger than the second moment, which is bounded by this number squared, i.e., by $\exp(-16)/2\pi$. Compare this with the variance of ordinary Monte Carlo, which we saw was of the order of $p$, which is on the order of $\exp(-8)$. So the decrease in the variance is huge.
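Here is a minimal sketch of this simulation in Python (our own illustration). Since every sample from $q$ lies in $[4, \infty)$, the indicator is always 1 and the estimator is just the average of the likelihood ratio $w$:

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.exponential(scale=1.0, size=n) + 4.0   # X = Y + 4 with Y ~ Exp(1), so X ~ q

# Likelihood ratio w(x) = p(x)/q(x) on [4, infinity); the indicator 1_{x >= 4} is always 1.
w = np.exp(-0.5 * x**2 + x - 4.0) / math.sqrt(2.0 * math.pi)

exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))  # exact P(Z >= 4), for comparison
print("IS estimate:", w.mean(), "+/-", w.std() / np.sqrt(n))
print("exact      :", exact)
```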

We return to the network example, following Kroese's review article. Let $U_1, U_2, \ldots, U_5$ be independent and uniform on $[0,1]$. Let $T_i$ be $U_i$ multiplied by the appropriate constant to give the desired distribution for the times $T_i$. We want to estimate the mean of $f(U_1, \ldots, U_5)$ where $f$ is the minimum time. The nominal density is $p(u) = 1$ on $[0,1]^5$. For our sampling density we take
\[
g(u) = \prod_{i=1}^5 \nu_i\, u_i^{\nu_i - 1} \tag{6.22}
\]
where the $\nu_i$ are parameters. (This is a special case of the beta distribution.) Note that $\nu_i = 1$ gives the nominal distribution $p$. There is no obvious choice for the $\nu_i$. Kroese finds that with $\nu = (\;,\;,\;,\;,\;)$ the variance is reduced by roughly a factor of ...

We have discussed importance sampling in the setting where we want to estimate $E[f(\vec X)]$ and $\vec X$ is jointly absolutely continuous.
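Before moving on, here is a hedged sketch of the network-example machinery in Python (our own illustration): we sample from $g$ by inversion ($u_i = V_i^{1/\nu_i}$ for uniform $V_i$, since the CDF of $\nu u^{\nu - 1}$ is $u^\nu$) and weight by $p/g$. The function `f` passed in is a stand-in for the network's minimum-time function from the earlier example, and the $\nu$ values below are placeholders, since the notes do not reproduce Kroese's tuned values here.

```python
import numpy as np

rng = np.random.default_rng(0)

def is_network(f, nu, n):
    """Estimate E[f(U_1,...,U_5)] with U uniform on [0,1]^5, sampling from
    g(u) = prod_i nu_i u_i^(nu_i - 1).  nu_i = 1 recovers plain Monte Carlo."""
    v = rng.random((n, len(nu)))
    u = v ** (1.0 / nu)                               # inversion: CDF of g_i is u^nu_i
    w = 1.0 / np.prod(nu * u ** (nu - 1.0), axis=1)   # likelihood ratio p(u)/g(u), p = 1
    vals = f(u) * w
    return vals.mean(), vals.std() / np.sqrt(n)

# Hypothetical usage: swap in the network's minimum-time function for the dummy f
# and tuned parameters for the placeholder nu.
nu = np.array([1.3, 1.1, 1.1, 1.3, 1.1])
est, se = is_network(lambda u: u.min(axis=1), nu, 100_000)
print(est, "+/-", se)
```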

