Transcription of Tutorial on quasi-Monte Carlo methods
1 Tutorial onquasi- monte Carlo methodsJosef DickSchool of Mathematics and Statistics, UNSW, Sydney, MCMC, MC, QMCR oughly speaking: Markov chain monte Carlo and quasi - monte Carlo are fordifferent types of problems; If you have a problem where monte Carlo does not work, thenchances are quasi - monte Carlo will not work as well; If monte Carlo works, but you want a faster method try(randomized) quasi - monte Carlo (some tweaking might benecessary). quasi - monte Carlo is an "experimental design" approach toMonte Carlo simulation;In this talk we shall discuss how quasi - monte Carlo can be faster thanMonte Carlo under certain ,REPRODUCING KERNELHILBERTSPACES AND WORST-CASE ERRORQUASI-MONTECARLO POINT SETSRANDOMIZATIONSWEIGHTED FUNCTION SPACES ANDTRACTABILITYThe task is to approximate an integralIs(f) = [0,1]sf(z)dzfor some integrandfby some quadrature ruleQN,s(f) =1NN 1 n=0f(xn)at some sample pointsx0,..,xN 1 [0,1]s. [0,1]sf(x)dx 1NN 1 n=0f(xn)In other words:Area under curve = Volume of cube average function value.
2 Ifx0,..,xN 1 [0,1]sare chosen randomly monte Carlo Ifx0,..,xN 1 [0,1]schosen deterministically quasi -MonteCarloSmooth integrands Integrandf: [0,1] R; say continuously differentiable; We want to study the integration error: 10f(x)dx 1NN 1 n=0f(xn). Representation:f(x) =f(1) 1xf (t)dt=f(1) 101[0,t](x)f (t)dt,where1[0,t](x) ={1 ifx [0,t],0 error Substitute: 10f(x)dx= 10(f(1) 101[0,t](x)f (t)dt)dx=f(1) 10 101[0,t](x)f (t)dxdt,1NN 1 n=0f(xn)=1NN 1 n=0(f(1) 101[0,t](xn)f (t)dt)=f(1) 101NN 1 n=01[0,t](xn)f (t)dt. Integration error: 10f(x)dx 1NN 1 n=0f(xn) = 10(1NN 1 n=01[0,t](xn) 101[0,t](x)dx)f (t) discrepancyIntegration error: 10f(x)dx 1NN 1 n=0f(xn) = 10(1NN 1 n=01[0,t](xn) t)f (t) {x0,..,xN 1} [0,1]. Define thelocal discrepancyby P(t) =1NN 1 n=01[0,t](xn) t,t [0,1].Then 10f(x)dx 1NN 1 n=0f(xn) = 10 P(t)f (t) inequality 10f(x)dx 1NN 1 n=0f(xn) = 10 P(t)f (t)dt ( 10| P(t)|pdt)1/p( 10|f (t)|qdt)1/q= P Lp f Lq,1p+1q=1,where g Lp=( |g|p)1 of discrepancyLetP={x0.}}
3 ,xN 1} [0,1]. Recall the definition of the localdiscrepancy: P(t) =1NN 1 n=01[0,t](xn) t,t [0,1].Local discrepancy measures difference between uniform distributionand emperical distribution of quadrature pointsP={x0,..,xN 1}.This is the Kolmogorov-Smirnov test for the difference between theempirical distribution of{x0,..,xN 1}and the uniform spaceRepresentationf(x) =f(1) 1xf (t) inner product: f,g =f(1)g(1) + 10f (t)g (t) norm f = |f(1)|2+ 10|f (t)| space:H={f: [0,1] R:fabsolutely continuous and f < }.Worst-case errorTheworst-case erroris defined bye(H,P) =supf H, f 1 10f(x)dx 1NN 1 n=0f(xn) .Reproducing kernel Hilbert spaceRecall:f(y) =f(1) 1+ 10f (x)( 1[0,x](y))dxand f,g =f(1)g(1) + 10f (x)g (x) :Find set of functionsgy Hfor eachy [0,1]such that f,gy =f(y)for allf : gy(1) =1 for ally [0,1]; g y(x) = 1[0,x](y) ={ 1 ify x,0 otherwise; Makegcontinuous such thatg H;Reproducing kernel Hilbert spaceIt follows thatgy(x) =1+min{1 x,1 y}.
4 The functionK(x,y) :=gy(x) =1+min{1 x,1 y},x,y [0,1]is calledreproducing spaceHis called a reproducing kernel Hilbert space (withreproducing kernelK).Numerical integration in reproducing kernel HilbertspacesFunction representation: 10f(z)dz= 10 f,K( ,z) dz= f, 10K( ,z)dz 1NN 1 n=0f(xn)=1NN 1 n=0 f,K( ,xn) = f,1NN 1 n=0K( ,xn) Integration error 10f(z)dz 1NN 1 n=0f(xn) = f, 10K( ,z)dz 1NN 1 n=0K( ,xn) = f,h ,whereh(x) = 10K(x,z)dz 1NN 1 n=0K(x,xn).Worst-case error in reproducing kernel Hilbert spacesThuse(H,P) =supf H, f 1 10f(z)dz 1NN 1 n=0f(xn) =supf H, f =1| f,h |=supf H,f6=0 f f ,h = h,h h = h ,since the supremum is attained when choosingf=h h error in reproducing kernel Hilbert spacese2(H,P) = 10 10K(x,y)dxdy 2NN 1 n=0 10K(x,xn)dx+1N2N 1 n,m=0K(xn,xm)Numerical integration in higher dimensions Tensor product spaceHs=H H. Reproducing kernelK(x,y) =s i=1[1+min{1 xi,1 yi}],wherex= (x1,..,xs),y= (y1.)
5 ,ys) [0,1]s. Functionsf Hshave partial mixed derivatives up to order 1 ineach variable |u|f xu L2([0,1]s),whereu {1,..,s},xu= (xi)i uand|u|denotes the cardinalityofuand where |u|f xu(xu,1) =0 for allu {1,..,s}.Worst-case errorAgaine2(H,P) = [0,1]s [0,1]sK(x,y)dxdy 2NN 1 n=0 [0,1]sK(x,xn)dx+1N2N 1 n,m=0K(xn,xm)ande2(H,P) = [0,1]s P(x) sf x(x)dx,where P(x) =1NN 1 n=01[0,x](xn) s i= in higher dimensionsPoint setP={x0,..,xN 1} [0,1]s,t= (t1,..,ts) [0,1]s. Local discrepancy: P(t) =1NN 1 n=01[0,t](xn) s i=1ti,where[0,t] = si=1[0,ti].Koksma-Hlawka inequalityLetf: [0,1]s Rwith f =( [0,1]s s xf(x) qdx)1/q,and where |u|f xu(xu,1) =0 for allu {1,..,s}.Then [0,1]sf(x)dx 1NN 1 n=0f(xn) f P Lp,where1p+1q=1. Construct pointsP={x0,..,xN 1} [0,1]swith smalldiscrepancyLp(P) = P is often useful to consider different reproducing kernels yieldingdifferent worst-case errors and discrepancies. We will see furtherexamples of low-discrepancy sequences Lattice rules(Bilyk, Brauchart, Cools, D.
6 , Hellekalek, Hickernell,Hlawka, Joe, Keller, Korobov, Kritzer, Kuo, Larcher, L Ecuyer,Lemieux, Leobacher, Niederreiter, Nuyens, Pillichshammer,Sinescu, Sloan, Temlyakov, Wang, Wo zniakowski, ..) Digital nets and sequences(Baldeaux, Bierbrauer, Brauchart,Chen, D., Edel, Faure, Hellekalek, Hofer, Keller, Kritzer, Kuo,Larcher, Leobacher, Niederreiter, Owen, zbudak,Pillichshammer, Pirsic, Schmid, Skriganov, Sobol , Wang, Xing,Yue, ..) Hammersley-Halton sequences(Atanassov, De Clerck, Faure,Halton, Hammersley, Kritzer, Larcher, Lemieux, Pillichshammer,Pirsic, White, ..) Kronecker sequences(Beck, Hellekalek, Larcher, Niederreiter,Schoissengeier, ..)Lattice rulesLetN N, letg= (g1,..,gs) {1,..,N 1} the quadrature points asxn={ngN},forn=0,..,N 1,where{z}=z bzcforz R+ lattice rulesLattice rule withN=55 points and generating vectorg= (1,34).Fibonacci lattice rulesLattice rule withN=89 points and generating vectorg= (1,55).In comparison: Random point setRandom set of 64 points generated by Matlab using rulesHow to find generating vectorg?
7 Reproducing kernel:K(x,y) =s i=1(1+2 B ({xi yi})),where{xi yi}= (xi yi) bxi yicis the fractional part ofxi kernel Hilbert space of Fourier series:f(x) = h Zs f(h)e2 ih error:e2 (g)= 1+1NN 1 n=0s i=1(1+2 B ({ngiN})),whereB is the Bernoulli polynomial of order . For instanceB2(x) =x2 x+1 construction (Korobov,Sloan-Reztsov,Sloan-Kuo-Joe, Nuyens-Cools) Setg1=1. Ford=2,..,sassume that we have foundg2,..,gd 1. Thenfindgd {1,..,N 1}which minimizese (g1,..,gd 1,g)as afunction ofg, {1,..,N 1}e2 (g1,..,gd 1,g).Using fast Fourier transform (Nuyens, Cools), a good generatingvectorg {1,..,N 1}scan be found inO(sNlogN) sequenceRadical inverse function in baseb: Letn N0have basebexpansionn=n0+n1b+ +na 1ba 1. Set b(n) =n0b+n1b2+ +na 1ba [0,1].Hammersley-Halton sequence: LetP={p1,p2,..,}be the set of prime numbers in increasingorder, ,p2=3,p3=5,p4=7,.. Define quadrature pointsx0,x1,..byxn= ( p1(n), p2(n),.., ps(n))forn=0,1,2,..Hammersley-Halton sequenceHammerlsey-Halton point set with 64 nets Choose prime numberband finite fieldZb={0,1.}
8 ,b 1}oforderb. ChooseC1,..,Cs Zm mb. Letn=n0+n1b+ +nm 1bm 1and set~n= (n0,..,nm 1)> Zmb. Let~yn,i=Ci~nfor 1 i s,0 n<bm. For~yn,i= (yn,i,1,..,yn,i,m)> Zmbletxn,i=yn,i,1b+ +yn,i,mbm. Setxn= (xn,1,..,xn,s)for 0 n<bm.(t,m,s)-net propertyLetm,s 1 andb 2 be integers. A point setP={x0,..,xbm 1}is called a(t,m,s)-net in baseb, if for all integersd1,..,ds 0 withd1+ +ds=m tthe number of points in the elementary intervalss i=1[aibdi,ai+1bdi)where 0 ai<bdi, {x0,..,xbm 1} [0,1]sis a(t,m,s)-net, then localdiscrepancy function P(a1bd1,..,asbds)=0for all 0 ai<bdi,di 0,d1+ +ds=m point setThe first 64 points of a Sobol sequence which form a(0,6,2)-net inbase point setThe first 64 points of a Niederreiter-Xing point setThe first 1024 points of a Niederreiter-Xing sequenceLet 1,.., s Rbe linearly independent ({n 1},..,{n s})forn=0,1,2,..For instance, one can choose i= piwherepiis theith sequenceThe first 64 points of a Kronecker bounds It is known that for all the constructions above one hasLp(P) Cs(logN)c(s)N,for allN 1,1 p ,whereCs>0 andc(s) s.]
9 Lower bound: For all point setsPconsisting ofNpoints we haveLp(P) C s(logN)(s 1)/2 Nfor allN 1,1<p .In comparison: random point setE(L2(P)) =O( log logNN).For 1<p< , the exact order of convergence is known to be oforder(logN)(s 1) constructions of such points were achieved by Chen &Skriganov forp=2 and Skriganov for 1<p< .Great open problem: What is the correct order of convergence ofminP [0,1]s|P|=NL (P)?Randomized quasi - monte CarloIntroduce random element to deterministic point several advantages: yields an unbiased estimator; there is a statistical error estimate; better rate of convergence of the random-case error compared toworst-case error; performs similar to monte Carlo forL2functions but has betterrate of convergence for smooth functions;Randomly shifted lattice rulesLattice rules can be randomized using a random shift: Lattice point set:{ng/N},n=0,1,..,N 1. Shifted lattice rule: choose [0,1]suniformly distributed; thenthe shifted lattice point set is given by{ng/N+ },n=0,1.
10 ,N point setShifted lattice point setOwen s scrambling LetZb={0,1,..,b 1}and letx=x1b+x2b2+ ,x1,x2,.. Zb. Randomly choose permutations , x1, x1,x2,..:Zb Zb. Lety1= (x1)y2= x1(x2)y3= x1,x2(x3).. Sety=y1b+y2b2+ .Scrambled Sobol point setThe first 1024 points of a Sobol sequence (left) and a scrambledSobol sequence (right).Scrambled net varianceLete(R) = [0,1]sf(x)dx 1 Nbm 1 n=0f(yn).Then E(e(R)) =0 Var(e(R)) =O((logN)sN2 +1)for integrands with smoothness 0 on the dimensionAlthough the convergence rate is better for quasi - monte Carlo , thereis a stronger dependence on the dimension: monte Carlo : error=O(N 1/2); quasi - monte Carlo : error=O((logN)s 1N);Notice thatg(N) :=N 1(logN)s 1is an increasing function forN es ifsis large, says 30, thenN 1(logN)29increases forN integration error inHwith reproducing kernelK(x,y) =s i=1min{1 xi,1 yi}it is known thate2(H,P) (1 N(89)s)e2(H, ). Error can only decrease ifN>constant (98) function spaces (Sloan and Wo zniakowski)Study weighted function spaces: Introduce 1, 2.