
Introduction to Hidden Markov Models - Harvard University


Alperen Degirmenci

This document contains derivations and algorithms for implementing Hidden Markov Models. The content presented here is a collection of my notes and personal insights from two seminal papers on HMMs by Rabiner in 1989 [2] and Ghahramani in 2001 [1], and also from Kevin Murphy's book [3]. This is an excerpt from my project report for the Machine Learning class taught in Fall 2014.

I. HIDDEN MARKOV MODELS (HMMS)

HMMs have been widely used in many applications, such as speech recognition, activity recognition from video, gene finding, and gesture tracking.

In this section, we will explain what HMMs are, how they are used for machine learning, their advantages and disadvantages, and how we implemented our own HMM.

A. Definition

A hidden Markov model is a tool for representing probability distributions over sequences of observations [1]. In this model, an observation X_t at time t is produced by a stochastic process, but the state Z_t of this process cannot be directly observed, i.e. it is hidden [2]. This hidden process is assumed to satisfy the Markov property, where state Z_t at time t depends only on the previous state, Z_{t-1} at time t-1. This is, in fact, called the first-order Markov model.

The nth-order Markov model depends on the n previous states. Fig. 1 shows a Bayesian network representing the first-order HMM, where the hidden states are shaded in gray. We should note that even though we talk about time to indicate that observations occur at discrete time steps, time could also refer to locations within a sequence [3].

The joint distribution of a sequence of states and observations for the first-order HMM can be written as

P(Z_{1:N}, X_{1:N}) = P(Z_1) P(X_1 | Z_1) \prod_{t=2}^{N} P(Z_t | Z_{t-1}) P(X_t | Z_t)    (1)

where the notation Z_{1:N} is used as a shorthand for Z_1, ..., Z_N.
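As a concrete illustration of Eq. 1, the short sketch below evaluates the joint probability of a given state/observation sequence under a first-order HMM. It is only a sketch under assumed conventions: A, B, and pi stand for the transition matrix, observation (emission) matrix, and initial state distribution introduced later in the document, the numerical values are made up, and states and observations are coded here as plain integers rather than the binary vectors defined below.

```python
import numpy as np

# Hypothetical parameters: K = 2 hidden states, Omega = 3 observation symbols.
pi = np.array([0.6, 0.4])              # P(Z_1)
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])            # A[i, j] = P(Z_t = j | Z_{t-1} = i)
B  = np.array([[0.5, 0.1],
               [0.4, 0.3],
               [0.1, 0.6]])            # B[k, j] = P(X_t = k | Z_t = j)

def joint_probability(z, x):
    """Eq. 1: P(Z_{1:N}, X_{1:N}) = P(Z_1) P(X_1|Z_1) * prod_{t=2}^{N} P(Z_t|Z_{t-1}) P(X_t|Z_t)."""
    p = pi[z[0]] * B[x[0], z[0]]
    for t in range(1, len(z)):
        p *= A[z[t - 1], z[t]] * B[x[t], z[t]]
    return p

# A made-up length-3 sequence of hidden states and observations (0-indexed).
print(joint_probability(z=[0, 0, 1], x=[0, 1, 2]))
```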

Notice that Eq. 1 can also be written as

P(X_{1:N}, Z_{1:N}) = P(Z_1) \prod_{t=2}^{N} P(Z_t | Z_{t-1}) \prod_{t=1}^{N} P(X_t | Z_t)    (2)

which is the same as the expression given in the lecture notes.

Fig. 1. Bayesian network representing a first-order HMM. The hidden states are shaded in gray.

(Footnote: The author is with the School of Engineering and Applied Sciences, Harvard University. This document is an excerpt from a project report for the MIT Machine Learning class taught in Fall 2014.)

There are five elements that characterize a hidden Markov model:

1) Number of states in the model, K: This is the number of states that the underlying hidden Markov process has. The states often have some relation to the phenomena being modeled.

For example, if an HMM is being used for gesture recognition, each state may be a different gesture, or a part of the gesture. States are represented as integers 1, ..., K. We will encode the state Z_t at time t as a K×1 vector of binary numbers, where the only non-zero element is the k-th element (i.e. Z_{tk} = 1), corresponding to state k at time t. While this may seem contrived, it will later on help us in our computations. (Note that [2] uses N instead of K.)

2) Number of distinct observations, Ω: Observations are represented as integers 1, ..., Ω. We will encode the observation X_t at time t as an Ω×1 vector of binary numbers, where the only non-zero element is the l-th element (i.e. X_{tl} = 1), corresponding to observation l at time t.
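A minimal sketch of this encoding (my own illustration, not code from the report; the sizes K = 3 and Ω = 4 and the hypothetical one_hot helper are arbitrary choices):

```python
import numpy as np

K, Omega = 3, 4   # example sizes: number of states, number of distinct observations

def one_hot(index, length):
    """Return a length x 1 binary column vector with a single 1 at `index` (0-based)."""
    v = np.zeros((length, 1))
    v[index, 0] = 1.0
    return v

z_t = one_hot(1, K)       # Z_t encodes state k = 2 (1-based) at time t
x_t = one_hot(3, Omega)   # X_t encodes observation l = 4 (1-based) at time t
print(z_t.ravel(), x_t.ravel())
```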

While this may seem contrived, it will later on help us in our computations. (Note that [2] uses M instead of Ω, and [1] uses D. We decided to use Ω since this agrees with the lecture notes.)

3) State transition model, A: Also called the state transition probability distribution [2] or the transition matrix [3], this is a K×K matrix whose elements A_ij describe the probability of transitioning from state Z_{t-1,i} to Z_{t,j} in one time step, where i, j ∈ {1, ..., K}. This can be written as

A_ij = P(Z_{t,j} = 1 | Z_{t-1,i} = 1).    (3)

Each row of A sums to 1, \sum_j A_ij = 1, and therefore it is called a stochastic matrix.
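The following sketch makes the row-stochastic property concrete and uses the rows of A to advance the hidden chain one step at a time. The 3-state matrix and the random seed are made-up illustration values, not parameters from the report.

```python
import numpy as np

# Hypothetical 3-state transition matrix; A[i, j] = P(Z_{t,j} = 1 | Z_{t-1,i} = 1).
A = np.array([[0.8, 0.1, 0.1],
              [0.2, 0.7, 0.1],
              [0.3, 0.3, 0.4]])

assert np.allclose(A.sum(axis=1), 1.0)   # each row sums to 1, so A is a stochastic matrix

rng = np.random.default_rng(0)
state = 0
for t in range(5):
    state = rng.choice(len(A), p=A[state])   # draw Z_t given Z_{t-1} from row `state`
    print(t + 1, state)
```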

If any state can reach any other state in a single step (fully connected), then A_ij > 0 for all i, j; otherwise A will have some zero-valued elements.

Fig. 2. A state transition diagram for (a) a 2-state, and (b) a 3-state ergodic Markov chain. For a chain to be ergodic, any state should be reachable from any other state in a finite amount of time.

Fig. 2 shows two state transition diagrams for a 2-state and 3-state first-order Markov chain. For these diagrams, the state transition models are

A^{(a)} = \begin{bmatrix} A_{11} & 1 - A_{11} \\ 1 - A_{22} & A_{22} \end{bmatrix}, \quad
A^{(b)} = \begin{bmatrix} A_{11} & A_{12} & 0 \\ A_{21} & A_{22} & A_{23} \\ 0 & A_{32} & A_{33} \end{bmatrix}.    (4)
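To make the ergodicity remark concrete, here is a small sketch (my own illustration, with made-up numbers) that fills in the zero pattern of A^{(b)} from Eq. 4 and checks the reachability condition stated in the Fig. 2 caption: every state should be reachable from every other state in a finite number of steps.

```python
import numpy as np

# Example matrix with the sparsity pattern of A^(b) in Eq. 4 (values are not from the report).
A_b = np.array([[0.6, 0.4, 0.0],
                [0.3, 0.4, 0.3],
                [0.0, 0.5, 0.5]])

def all_states_reachable(A):
    """True if every state can reach every other state in at most K-1 steps."""
    K = A.shape[0]
    # (I + A)^(K-1) has a positive (i, j) entry iff state j is reachable from state i.
    reach = np.linalg.matrix_power(np.eye(K) + A, K - 1)
    return bool(np.all(reach > 0))

print(all_states_reachable(A_b))   # True: the zeros in A_b do not disconnect any state
```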

The conditional probability can be written as

P(Z_t | Z_{t-1}) = \prod_{i=1}^{K} \prod_{j=1}^{K} A_{ij}^{Z_{t-1,i} Z_{t,j}}.    (5)

Taking the logarithm, we can write this as

\log P(Z_t | Z_{t-1}) = \sum_{i=1}^{K} \sum_{j=1}^{K} Z_{t-1,i} Z_{t,j} \log A_{ij}    (6)
                      = Z_{t-1}^\top \log(A) \, Z_t.    (7)

4) Observation model, B: Also called the emission probabilities, B is an Ω×K matrix whose elements B_kj describe the probability of making observation X_{t,k} given state Z_{t,j}. This can be written as

B_kj = P(X_t = k | Z_t = j).    (8)

The conditional probability can be written as

P(X_t | Z_t) = \prod_{j=1}^{K} \prod_{k=1}^{\Omega} B_{kj}^{Z_{t,j} X_{t,k}}.    (9)

Taking the logarithm, we can write this as

\log P(X_t | Z_t) = \sum_{j=1}^{K} \sum_{k=1}^{\Omega} Z_{t,j} X_{t,k} \log B_{kj}    (10)
                  = X_t^\top \log(B) \, Z_t.    (11)
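The payoff of the binary-vector encoding is that Eqs. 7 and 11 reduce to matrix products. The sketch below (my own check, using made-up parameter values and a hypothetical one_hot helper) verifies numerically that the bilinear forms pick out exactly log A_ij and log B_kj.

```python
import numpy as np

# Hypothetical parameters for K = 2 states and Omega = 3 observation symbols.
A = np.array([[0.7, 0.3],          # A[i, j] = P(Z_{t,j} = 1 | Z_{t-1,i} = 1), K x K
              [0.4, 0.6]])
B = np.array([[0.5, 0.1],          # B[k, j] = P(X_t = k | Z_t = j), Omega x K
              [0.4, 0.3],
              [0.1, 0.6]])

def one_hot(index, length):
    """1-D stand-in for the K x 1 (or Omega x 1) binary column vectors in the text."""
    v = np.zeros(length)
    v[index] = 1.0
    return v

i, j, k = 0, 1, 2                   # Z_{t-1} = state i, Z_t = state j, X_t = symbol k
z_prev, z_t, x_t = one_hot(i, 2), one_hot(j, 2), one_hot(k, 3)

# Eq. 7: log P(Z_t | Z_{t-1}) as a bilinear form in the one-hot vectors.
assert np.isclose(z_prev @ np.log(A) @ z_t, np.log(A[i, j]))
# Eq. 11: log P(X_t | Z_t) as a bilinear form in the one-hot vectors.
assert np.isclose(x_t @ np.log(B) @ z_t, np.log(B[k, j]))
print("bilinear forms match the matrix entries")
```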

5) Initial state distribution, π: This is a K×1 vector of probabilities π_i = P(Z_{1i} = 1). The conditional probability can be written as

P(Z_1 | \pi) = \prod_{i=1}^{K} \pi_i^{Z_{1i}}.    (12)

Given these five parameters presented above, an HMM can be completely specified. In the literature, this often gets abbreviated as

\lambda = (A, B, \pi).    (13)

B. Three Problems of Interest

In [2] Rabiner states that for the HMM to be useful in real-world applications, the following three problems must be solved:

Problem 1: Given observations X_1, ..., X_N and a model λ = (A, B, π), how do we efficiently compute P(X_{1:N} | λ), the probability of the observations given the model?

This is a part of the exact inference problem presented in the lecture notes, and can be solved using forward filtering.

Fig. 3. Factor graph for a slice of the HMM at time t.

Problem 2: Given observations X_1, ..., X_N and the model λ, how do we find the correct hidden state sequence Z_1, ..., Z_N that best explains the observations? This corresponds to finding the most probable sequence of hidden states from the lecture notes, and can be solved using the Viterbi algorithm.
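Since the excerpt ends with Problems 1 and 2, here is a compact sketch of both solutions under the conventions used above: A is the K×K row-stochastic transition matrix, B is the Ω×K emission matrix with B[k, j] = P(X_t = k | Z_t = j), π is the initial state distribution, and observations are 0-indexed integers. The parameter values and the observation sequence are made up for illustration; this is my own sketch of the standard forward and Viterbi recursions, not code from the report.

```python
import numpy as np

# Hypothetical model lambda = (A, B, pi) with K = 2 states, Omega = 3 symbols.
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])            # A[i, j] = P(Z_t = j | Z_{t-1} = i)
B  = np.array([[0.5, 0.1],
               [0.4, 0.3],
               [0.1, 0.6]])            # B[k, j] = P(X_t = k | Z_t = j)
pi = np.array([0.6, 0.4])              # P(Z_1)

x = [0, 1, 2, 1]                       # observed symbols X_1..X_N (0-indexed)

def forward(x, A, B, pi):
    """Problem 1: compute P(X_{1:N} | lambda) with the forward recursion."""
    alpha = pi * B[x[0]]               # alpha_1(j) = pi_j * P(X_1 | Z_1 = j)
    for t in range(1, len(x)):
        alpha = (alpha @ A) * B[x[t]]  # alpha_t(j) = sum_i alpha_{t-1}(i) A_ij * P(X_t | Z_t = j)
    return alpha.sum()                 # P(X_{1:N} | lambda)

def viterbi(x, A, B, pi):
    """Problem 2: most probable hidden state sequence, via the max-product recursion."""
    N, K = len(x), len(pi)
    delta = np.log(pi) + np.log(B[x[0]])          # best log-prob of ending in each state at t = 1
    backptr = np.zeros((N, K), dtype=int)
    for t in range(1, N):
        scores = delta[:, None] + np.log(A)       # scores[i, j]: come from state i, move to state j
        backptr[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + np.log(B[x[t]])
    path = [int(delta.argmax())]
    for t in range(N - 1, 0, -1):                 # trace the best path backwards
        path.append(int(backptr[t][path[-1]]))
    return path[::-1]

print("P(X | lambda) =", forward(x, A, B, pi))
print("most probable states:", viterbi(x, A, B, pi))
```

On longer sequences the forward recursion is usually run with per-step scaling or entirely in log space to avoid numerical underflow; the version above keeps the code as close as possible to the recursions themselves.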

