
2.1 Markov Chains - Georgia Institute of Technology




University of Chicago
Autumn 2003
CS37101-1 Markov Chain Monte Carlo Methods
Lecture 2: October 7, 2003
Markov Chains, Coupling, Stationary Distribution
Eric Vigoda

2.1 Markov Chains

In this lecture, we will introduce Markov chains and show a potential algorithmic use of Markov chains for sampling from complex distributions. For a finite state space Ω, we say a sequence of random variables (X_t) on Ω is a Markov chain if the sequence is Markovian in the following sense: for all t and all x_0, ..., x_t, y ∈ Ω, we require

Pr(X_{t+1} = y | X_0 = x_0, X_1 = x_1, ..., X_t = x_t) = Pr(X_{t+1} = y | X_t = x_t).

We consider transitions which are independent of the time, known as time-homogeneous, and denote the transition matrix as P(x, y) = Pr(X_{t+1} = y | X_t = x).
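As a quick illustration of these definitions (not part of the original notes), a time-homogeneous chain can be simulated directly from its transition matrix; the 3-state matrix below is a made-up example:

```python
import random

# Hypothetical transition matrix on the state space {0, 1, 2}; each row
# P(x, .) sums to 1 and gives the distribution of X_{t+1} given X_t = x.
P = [
    [0.5, 0.25, 0.25],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
]

def step(x, rng=random):
    """Sample X_{t+1} from the row P(x, .)."""
    u, acc = rng.random(), 0.0
    for y, p in enumerate(P[x]):
        acc += p
        if u < acc:
            return y
    return len(P) - 1  # guard against floating-point round-off

def run(x0, t, rng=random):
    """Follow the chain for t steps starting from X_0 = x0."""
    x = x0
    for _ in range(t):
        x = step(x, rng)
    return x
```

Time-homogeneity appears here as the fact that `step` consults the same matrix `P` at every step t.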

The t-step distribution is defined in the natural way:

P^t(x, y) = P(x, y) for t = 1, and P^t(x, y) = ∑_z P(x, z) P^{t-1}(z, y) for t > 1.

We will study the class of ergodic Markov chains, which have a unique stationary (i.e., limiting) distribution and thus will be useful from an algorithmic perspective. We say a distribution π is a stationary distribution if it is invariant with respect to the transition matrix, i.e., for all y ∈ Ω,

π(y) = ∑_x π(x) P(x, y).

A Markov chain is called ergodic if there exists t such that for all x, y ∈ Ω, P^t(x, y) > 0. For finite Markov chains, the following pair of conditions is equivalent to ergodicity:

Irreducible: for all x, y ∈ Ω, there exists t = t(x, y) such that P^t(x, y) > 0;
Aperiodic: for all x ∈ Ω, gcd{t : P^t(x, x) > 0} = 1.

Ergodic Markov chains are useful algorithmic tools in that, regardless of their initial state, they eventually reach a unique stationary distribution.
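The recursive definition of P^t translates directly into repeated matrix multiplication, and ergodicity can be witnessed at a particular t by inspecting the entries of P^t. A sketch (the two-state matrix is a made-up example, not from the notes):

```python
def t_step(P, t):
    """Compute P^t via P^t(x, y) = sum_z P(x, z) * P^{t-1}(z, y), for t >= 1."""
    n = len(P)
    Pt = [row[:] for row in P]  # P^1
    for _ in range(t - 1):
        Pt = [[sum(Pt[x][z] * P[z][y] for z in range(n)) for y in range(n)]
              for x in range(n)]
    return Pt

def positive_at(P, t):
    """Witness for ergodicity: True if P^t(x, y) > 0 for all x, y."""
    return all(p > 0 for row in t_step(P, t) for p in row)

# Example: state 0 always moves to 1; state 1 is lazy.
P = [[0.0, 1.0],
     [0.5, 0.5]]
```

Here `positive_at(P, 1)` is False but `positive_at(P, 2)` is True, so this chain is ergodic; iterating further, every row of P^t approaches the stationary distribution (1/3, 2/3), previewing the theorem below.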

We can aim to design (approximate) samplers by designing Markov chains with appropriate stationary distributions. The following theorem, originally proved by Doeblin [2], details the essential property of ergodic Markov chains.

Theorem: For a finite ergodic Markov chain, there exists a unique stationary distribution π such that for all x, y ∈ Ω,

lim_{t→∞} P^t(x, y) = π(y).

Before proving the theorem, let us make a few remarks about its algorithmic implications. In general, it is difficult to determine this unique stationary distribution. However, there is a large class of chains where it is trivial to verify the stationary distribution. A Markov chain is called reversible if there exists a distribution π such that for all x, y ∈ Ω,

π(x) P(x, y) = π(y) P(y, x).
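Both stationarity and the detailed-balance (reversibility) condition are finite computations that can be checked directly. A sketch of both checks; the chain below, a lazy random walk on a three-vertex path, is an assumed example:

```python
def is_stationary(P, pi, tol=1e-12):
    """Check pi(y) = sum_x pi(x) P(x, y) for every y."""
    n = len(P)
    return all(abs(pi[y] - sum(pi[x] * P[x][y] for x in range(n))) <= tol
               for y in range(n))

def is_reversible(P, pi, tol=1e-12):
    """Check detailed balance: pi(x) P(x, y) = pi(y) P(y, x) for all x, y."""
    n = len(P)
    return all(abs(pi[x] * P[x][y] - pi[y] * P[y][x]) <= tol
               for x in range(n) for y in range(n))

# Lazy random walk on the path 0 - 1 - 2.
P = [[0.5, 0.25, 0.25],
     [0.25, 0.5, 0.25],
     [0.25, 0.25, 0.5]]
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]  # proportional to vertex degrees
```

Detailed balance yields stationarity in one line: summing π(x)P(x, y) = π(y)P(y, x) over x gives ∑_x π(x)P(x, y) = π(y) ∑_x P(y, x) = π(y).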

It is easy to check that such a π is a stationary distribution. When P is symmetric, it follows that the stationary distribution is uniform over Ω. The following example highlights the potential usefulness of reversible Markov chains. For a graph G = (V, E), let Ω denote the set of matchings of G. We define a Markov chain on Ω whose transitions X_t → X_{t+1} are as follows. From X_t ∈ Ω:

Choose an edge e uniformly at random from E.
Let X' = X_t ∪ {e} if e ∉ X_t, and X' = X_t \ {e} if e ∈ X_t.
If X' ∈ Ω, then set X_{t+1} = X' with probability 1/2; otherwise set X_{t+1} = X_t.

Note that the Markov chain is aperiodic (since P(M, M) ≥ 1/2 for all M ∈ Ω) and irreducible (via the empty set) with symmetric transition probabilities.
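One step of this matchings chain is short to implement; a sketch (the edge list `E`, a path on four vertices, is a made-up example):

```python
import random

def is_matching(edges):
    """A set of edges is a matching if no vertex appears in two of them."""
    seen = set()
    for u, v in edges:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True

def matching_step(M, E, rng=random):
    """One transition X_t -> X_{t+1} of the chain on matchings of (V, E)."""
    e = rng.choice(E)                    # edge chosen uniformly from E
    X = M - {e} if e in M else M | {e}   # toggle e in the current matching
    if is_matching(X) and rng.random() < 0.5:
        return X                         # move to X' with probability 1/2
    return M                             # otherwise stay put

E = [(0, 1), (1, 2), (2, 3)]  # assumed example graph: a path on 4 vertices
```

The symmetry is visible in the code: a move from M to M ∪ {e} and the reverse move each occur with probability 1/(2|E|).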

Therefore, the unique stationary distribution is uniform over all matchings of G. If we can bound the asymptotic rate of convergence to the stationary distribution, then we have a simple algorithm to generate an approximately random matching: simply start at an arbitrary matching (e.g., the empty set) and follow the transitions of the Markov chain until we are sufficiently close to the stationary distribution. For now we focus on proving the theorem. In later lectures we explore techniques for bounding the convergence rate.

2.2 Coupling Technique

We will prove the theorem using the coupling technique. We begin by describing coupling for general distributions before considering its application to Markov chains.

For distributions μ, ν on a finite set Ω, a distribution ω on Ω × Ω is a coupling if:

for all x ∈ Ω, ∑_y ω(x, y) = μ(x), and
for all y ∈ Ω, ∑_x ω(x, y) = ν(y).

In other words, ω is a joint distribution whose marginal distributions are μ and ν. Coupling provides a convenient method to bound the variation distance between a pair of distributions. For distributions μ and ν on Ω, their variation distance is defined as

d_TV(μ, ν) = (1/2) ∑_z |μ(z) − ν(z)| = max_{S ⊆ Ω} (μ(S) − ν(S)).

It turns out there always exists an optimal coupling which exactly captures the variation distance. The following lemma is known as the coupling lemma, and was first detailed by Aldous [1].

Lemma: Let μ, ν be a pair of distributions on a finite set Ω.

(a) For a coupling ω of μ and ν, let (X, Y) ~ ω (i.e., (X, Y) is a random variable chosen from the distribution ω). Then

d_TV(μ, ν) ≤ Pr(X ≠ Y).

(b) There always exists a coupling ω where, for (X, Y) ~ ω,

d_TV(μ, ν) = Pr(X ≠ Y).

Proof of Lemma, part (a): Since ω is a valid coupling, for any z ∈ Ω, we know that ω(z, z) ≤ min{μ(z), ν(z)}. Summing over all z gives exactly the probability that X and Y are equal, i.e.,

Pr(X = Y) = ∑_z ω(z, z) ≤ ∑_z min{μ(z), ν(z)}.

Therefore,

Pr(X ≠ Y) ≥ 1 − ∑_z min{μ(z), ν(z)} = ∑_z (μ(z) − min{μ(z), ν(z)}) = ∑_{z : μ(z) ≥ ν(z)} (μ(z) − ν(z)) = max_{S ⊆ Ω} (μ(S) − ν(S)) = d_TV(μ, ν).

This completes the proof of part (a) of the lemma.

Proof of Lemma, part (b): For all z ∈ Ω, let ω(z, z) = min{μ(z), ν(z)}.

This ensures d_TV(μ, ν) = Pr(X ≠ Y). Now we need to complete the construction of ω in a valid way. This requires defining the off-diagonal terms in a way that guarantees ω has the correct marginal distributions. For all y, z ∈ Ω with y ≠ z, let

ω(y, z) = (μ(y) − ω(y, y))(ν(z) − ω(z, z)) / (1 − ∑_x ω(x, x)).

It is straightforward to verify that ω satisfies both marginal conditions, thus it is a valid coupling.

We will now consider couplings for Markov chains. Consider a pair of Markov chains (X_t) and (Y_t) on Ω with transition matrices P_X and P_Y, respectively. Typically in our applications, the Markov chains are identical: P_X = P_Y. The Markov chain (X′_t, Y′_t) on Ω × Ω is a (Markovian) coupling if

for all a, b, c ∈ Ω, Pr(X′_{t+1} = c | X′_t = a, Y′_t = b) = P_X(a, c), and
for all a, b, c ∈ Ω, Pr(Y′_{t+1} = c | X′_t = a, Y′_t = b) = P_Y(b, c).

In other words, if we simply observe the first coordinate, it behaves like P_X, and similarly the second coordinate acts according to P_Y. This is a more restrictive form of coupling than is necessary. In general, we might only have Pr(X′_{t+1} = c | X′_t = a) = P_X(a, c) and similarly for the second coordinate. Such a coupling is called a non-Markovian coupling, since the joint process is not itself a Markov chain on the product space. Most applications of coupling construct Markovian couplings; this simplifies the analysis.

For such a Markovian coupling (X′_t, Y′_t) of (X_t) and (Y_t) we then have

d_TV(P^t(X_0, ·), P^t(Y_0, ·)) ≤ Pr(X′_t ≠ Y′_t | X′_0 = X_0, Y′_0 = Y_0).

Choosing Y_0 from the stationary distribution π, we have that Y_t is distributed according to π for all t (since π is invariant), which implies

d_TV(P^t(X_0, ·), π) ≤ Pr(X′_t ≠ Y′_t | X′_0 = X_0, Y′_0 ~ π).

This shows how we can use coupling to bound the distance from stationarity.

2.3 Stationary Distribution

We are now prepared to prove the theorem.

Proof of Theorem: Create two copies of the Markov chain, denoted by (X_t) and (Y_t), where X_0 and Y_0 are arbitrary states of Ω. We will create a coupling for these chains in the following way. From (X_t, Y_t), choose X_{t+1} according to the transition matrix P. If Y_t = X_t, set Y_{t+1} = X_{t+1}; otherwise, choose Y_{t+1} according to P, independent of the choice of X_{t+1}.

By ergodicity, we know that there exists t* such that for all x, y ∈ Ω, P^{t*}(x, y) > 0. Therefore, for all X_0, Y_0 ∈ Ω, Pr(X_{t*} ≠ Y_{t*} | X_0, Y_0) < 1.
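The coupling used in this proof is easy to simulate, and watching the two copies coalesce makes the argument concrete. A sketch, not from the notes; the symmetric transition matrix is an assumed ergodic example:

```python
import random

def sample_row(row, rng):
    """Draw the next state from one row of the transition matrix."""
    u, acc = rng.random(), 0.0
    for y, p in enumerate(row):
        acc += p
        if u < acc:
            return y
    return len(row) - 1

def coalescence_time(P, x0, y0, rng, max_steps=100_000):
    """Run the coupling from the proof: while the chains differ, X and Y
    each move by P independently; once they meet they would move together,
    so we simply return the first t with X_t = Y_t."""
    x, y = x0, y0
    for t in range(max_steps):
        if x == y:
            return t
        x = sample_row(P[x], rng)
        y = sample_row(P[y], rng)  # independent move, since x != y here
    return max_steps

P = [[0.5, 0.25, 0.25],
     [0.25, 0.5, 0.25],
     [0.25, 0.25, 0.5]]  # assumed ergodic example chain
```

Once the copies meet they agree forever, so Pr(X_t ≠ Y_t) is non-increasing in t; by the coupling lemma, the probability that coalescence has not yet occurred upper-bounds d_TV(P^t(X_0, ·), π).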

