Example: bachelor of science

On the Markov Chain Central Limit Theorem - Statistics

On the Markov Chain Central Limit TheoremGalin L. JonesSchool of StatisticsUniversity of MinnesotaMinneapolis, MN, goal of this paper is to describe conditions which guarantee a Central Limit Theorem forfunctionals of general state space Markov chains. This is done with a view towards Markovchain Monte Carlo settings and hence the focus is on the connections between drift and mixingconditions and their implications. In particular, we consider three commonly cited Central limittheorems and discuss their relationship to classical results for mixing processes. Several motivat-ing examples are given which range from toy one-dimensionalsettings to complicated settingsencountered in Markov Chain Monte IntroductionLetX={Xi:i= 0,1,2,..}be a Harris ergodic Markov Chain on a general spaceXwithinvariant probability distribution having supportX. Letfbe a Borel function and define fn:=n 1 ni=1f(Xi) and E f:= Xf(x) (dx).

On the Markov Chain Central Limit Theorem Galin L. Jones School of Statistics University of Minnesota Minneapolis, MN, USA galin@stat.umn.edu Abstract The goal of this paper is to describe conditions which guarantee a central limit theorem for functionals of general state space Markov chains. This is done with a view towards Markov

Tags:

  Chain, Central, Limits, Theorem, Markov, Markov chain, The markov chain central limit theorem

Information

Domain:

Source:

Link to this page:

Please notify us if you found a problem with this document:

Other abuse

Transcription of On the Markov Chain Central Limit Theorem - Statistics

1 On the Markov Chain Central Limit TheoremGalin L. JonesSchool of StatisticsUniversity of MinnesotaMinneapolis, MN, goal of this paper is to describe conditions which guarantee a Central Limit Theorem forfunctionals of general state space Markov chains. This is done with a view towards Markovchain Monte Carlo settings and hence the focus is on the connections between drift and mixingconditions and their implications. In particular, we consider three commonly cited Central limittheorems and discuss their relationship to classical results for mixing processes. Several motivat-ing examples are given which range from toy one-dimensionalsettings to complicated settingsencountered in Markov Chain Monte IntroductionLetX={Xi:i= 0,1,2,..}be a Harris ergodic Markov Chain on a general spaceXwithinvariant probability distribution having supportX. Letfbe a Borel function and define fn:=n 1 ni=1f(Xi) and E f:= Xf(x) (dx).

2 When E |f|< the ergodic Theorem guarantees that fn E fwith probability 1 asn . The main goal here is to describe conditions onXandfunder which a Central Limit Theorem (CLT) holds for fn; that is, n( fn E f)d N(0, 2f)(1)asn where 2f:= var {f(X0)}+ 2 i=1cov {f(X0),f(Xi)}< . Although all of the resultspresented in this paper hold in general, the primary motivation is found in Markov Chain MonteCarlo (MCMC) settings where the existence of a CLT is an extremely important practical is high dimensional or known only up to a normalizing constant but the value of E fisrequired. IfXcan be simulated then fnis a natural estimate of E f. The existence of a CLTthen allows one to estimate 2fin order to decide if fnis a good estimate of E f. (Estimation of 2fis challenging and requires specialized techniques that will not be considered further here; seeJones et al. (2004) and Geyer (1992) for an introduction.) Thus the existence of a CLT is crucialto sensible implementation of MCMC; see Jones and Hobert (2001) for more on this point of following simple example illustrates one of the situations common in MCMC a simple hard-shell (also known as hard-core) model.

3 SupposeX={1,..,n1} {1,..,n2} Z2. Aproperconfiguration onXconsists of coloring each point either black or whitein such a way that no two adjacent points are white. LetXdenote the set of all proper configurationsonX,NX(n1,n2) be the total number of proper configurations and be the uniform distributiononXso that each proper configuration is equally likely. Supposeour goal is to calculate the typicalnumber of white points in a proper configuration; that is, ifW(x) is the number of white points inx Xthen we want the value ofE W= x Xw(x)NX(n1,n2).Ifn1andn2are even moderately large then we will have to resort to an approximation to E the following Markov Chain onX. Fixp (0,1) and setX0=x0wherex0 Xisan arbitrary proper configuration. Randomly choose a point (x,y) Xand independently drawU Uniform(0,1). Ifu pand all of the adjacent points are black then color (x,y) white leaving allother points alone. Otherwise, color (x,y) black and leave all other points alone.

4 Call the resultingconfigurationX1. Continuing in this fashion yields a Harris ergodic Markov Chain {X0,X1,X2,..}having as its invariant distribution. It is now a simple matter to estimate E Wwith wn. Also,sinceXis finite (albeit potentially large) it is well known thatXwill converge exponentially fast to which implies that a CLT holds for the publication of the influential book by Meyn andTweedie (1993) the use of driftand minorization conditions has become a popular method forestablishing the existence of a without this constructive methodology it is difficultto envision how one would deal withcomplicated situations encountered in MCMC. In turn, this has led much of the recent work ongeneral state space Markov chains to focus on the implications of drift and minorization. Anotheroutcome of this approach is that classical results in mixingprocesses have been somewhat example, Nummelin (2002) and Roberts and Rosenthal (2004) recently provided nice reviews ofMarkov Chain theory and its connection to MCMC.

5 In particular, both articles contain a review ofCLTs for Markov chains but neither contains any substantivediscussion of the results from mixingprocesses. On the other hand, work on mixing processes rarely discusses their applicability tothe important Markov Chain setting outside of the occasional discrete state space example. Forexample, Bradley (1999) provided a recommended review of CLTs for mixing processes but madeno mention of their connections with Markov chains. Also, Robert (1995) gave a brief discussion ofthe implication of mixing conditions for Markov Chain CLTs but failed to connect them to the useof drift conditions. Thus one of the main goals of this article is to consider the connections betweendrift and minorization and mixing conditions and their implications for the CLT for general statespace Markov Markov Chains and ExamplesLetP(x,dy) be a Markov transition kernel on a general space (X,B(X)) and write the associateddiscrete time Markov Chain asX={Xi:i= 0,1,2.}

6 }. Forn N:={1,2,3,..}, letPn(x,dy)denote then-step Markov transition kernel corresponding toP. Then fori N,x Xand ameasurable setA,Pn(x,A) = Pr (Xn+i A|Xi=x). Letf:X Rbe a Borel function and definePf(x) := f(y)P(x,dy) and f(x) :=Pf(x) f(x). Always,Xwill be assumed to be Harrisergodic, that is, aperiodic, -irreducible and positive Harris recurrent; for definitions see Meyn andTweedie (1993) or Nummelin (1984). These assumptions are more than enough to guarantee astrong form of convergence: for every initial probability measure ( ) onB(X)kPn( , ) ( )k 0 asn (2)wherePn( ,A) := XPn(x,A) (dx) andk kis the total variation norm. Throughout we will beconcerned with the rate of this convergence. LetM(x) be a nonnegative function and (n) be anonnegative decreasing function onZ+such thatkPn(x, ) ( )k M(x) (n).(3)WhenXisgeometrically ergodic(3) holds with (n) =tnfor somet < ergodicitymeansMis bounded and (n) =tnfor somet < ergodicity of order mwherem 0corresponds to (n) =n (3) directly may be difficult whenXis a general space.

7 However, some constructivemethods are given in the following brief discussion; the interested reader should consult Jarner andRoberts (2002) and Meyn and Tweedie (1993) for a more complete introduction to these conditionholds on a setCif there exists a probability measureQonB(X), apositive integern0and an >0 such thatPn0(x,A) Q(A) x C , A B(X).(4)In this case,Cis said to besmall. If (4) holds withC=XthenXis uniformly ergodic and, as iswell-known,kPn(x, ) ( )k (1 ) n/n0 .Uniformly ergodic Markov chains are rarely encountered in MCMC unlessXis finite or ergodicity may be established via the following drift condition: Suppose that for afunctionV:X [1, ) there exist constantsd >0,b < such that V(x) dV(x) +bI(x C)x X(5)whereCis a small set andIis the usual indicator ergodicity may be established via a slightly different drift condition: Suppose thatfor a functionV:X [1, ) there exist constantsd >0,b < and 0 <1 such that V(x) d[V(x)] +bI(x C)x X(6)whereCis a small set.]]

8 Jarner and Roberts (2002) show that (6) implies thatXis polynomiallyergodic of degree /(1 ). Douc et al. (2004) have recently generalized this drift condition to othersubgeometric (slower than geometric) rates of of the drift conditions (5) or (6) imply that in (3) we can takeM(x) V(x).Moreover, Theorem in Meyn and Tweedie (1993) shows that if (5) holds then E V < .Since geometric ergodicity is equivalent to (5) (Meyn and Tweedie, 1993, Chapter 16) we concludethat geometrically (and uniformly) ergodic Markov chains satisfy (3) with E M < . On the otherhand, the polynomial drift (6) only seems to imply that E V < where <1. Thus, when (6)holds, to ensure that E M < we will have to show that E V < .Beyond establishing a rate of convergence, drift conditions also immediately imply the existenceof a CLT for certain a Harris ergodic Markov Chain onXhaving stationary distribution .Supposef:X Rand assume that one of the following conditions hold:1.

9 The drift condition(5)holds andf2(x) V(x)for allx The drift condition(6)holds and|f(x)| V(x) + 1for allx Xwhere1 1issuch that E V2 < .Then 2f [0, )and if 2f>0then for any initial distribution n( fn E f)d N(0, 2f)asn . first part of the Theorem is from Meyn and Tweedie (1993, Theorem ) whilethe second part is due to Jarner and Roberts (2002, Theorem ). and Meyn (2003) investigate the rate of convergence in the CLT when thedrift condition (5) has been a substantial amount of effort devoted to establishing drift and minorizationconditions in MCMC settings. For example, Hobert and Geyer (1998), Jones and Hobert (2004),Marchev and Hobert (2004), Robert (1995), Roberts and Polson (1994), Rosenthal (1995, 1996) andTierney (1994) examined Gibbs samplers while Christensen et al. (2001), Douc et al. (2004), Fortand Moulines (2000), Fort and Moulines (2003), Geyer (1999), Jarner and Hansen (2000), Jarnerand Roberts (2002), Meyn and Tweedie (1994), and Mengersen and Tweedie (1996) considered4 Metropolis-Hastings-Green (MHG) algorithms.]

10 Also, Mira and Tierney (2002) and Roberts andRosenthal (1999) worked with slice the next section three simple examples are presented in order to give the reader a tasteof using these results in specific models and to demonstrate the application of Theorem 1. Moresubstantial examples will be considered in Section ExamplesExample1 continued. SinceXis finite it is easy to see that (4) holds withC=Xand hence theMarkov Chain described in Example 1 is uniformly ergodic. Ofcourse, ifn1andn2are reasonablylarge may be too small to be onX=Zsuch that ifx 1 and 0< <1 thenP(x,x+ 1) =P( x, x 1) = , P(x,0) =P( x,0) = 1 ,P(0,1) =P(0, 1) = Chain is Harris ergodic and has stationary distribution given by (0) = (1 )/(2 ) and forx 1 (x) = ( x) = (0) x Appendix A the drift condition (5) is verified withV(x) =a|x|fora >1 satisfyinga <1 and(a 1)a+ 1 <0 andC={0}. Hence a CLT holds for fniff2(x) a|x| x and Roberts (2002) and Tuomimem and Tweedie (1994) consider the followingexample and establish a polynomial rate of convergence.


Related search queries