Essay on Hidden Markov ModelsSargur N SriharisriharicedarbuffaloeduMachine Learning

Submitted By atena90
Words: 3628
Pages: 15

Hidden Markov Models
Sargur N. Srihari srihari@cedar.buffalo.edu Machine Learning Course: http://www.cedar.buffalo.edu/~srihari/CSE574/index.html Machine Learning: CSE 574

0

HMM Topics
1.
2.
3.
4.
5.
6.
7.

What is an HMM?
State-space Representation
HMM Parameters
Generative View of HMM
Determining HMM Parameters Using EM
Forward-Backward or α−β algorithm
HMM Implementation Issues:
a)
b)
c)
d)
e)

Length of Sequence
Predictive Distribution
Sum-Product Algorithm
Scaling Factors
Viterbi Algorithm

Machine Learning: CSE 574

1

1. What is an HMM?



Ubiquitous tool for modeling time series data
Used in





Almost all speech recognition systems
Computational molecular biology



Group amino acid sequences into proteins

Handwritten word recognition




It is a tool for representing probability distributions over sequences of observations
HMM gets its name from two defining properties:



Example: z are phoneme sequences x are acoustic observations




Observation xt at time t was generated by some process whose state zt is hidden from the observer
Assumes that state at zt is dependent only on state zt-1 and independent of all prior states (First order)

Machine Learning: CSE 574

2

Graphical Model of HMM

• Has the graphical model shown below and latent variables are discrete

• Joint distribution has the form:
⎤ N
⎡ N p ( x1 ,..x N , z1 ,..z n ) = p ( z1 ) ⎢∏ p ( z n | z n −1 )⎥∏ p (xn | z n )
⎦ n =1
⎣ n=2
Machine Learning: CSE 574

3

HMM Viewed as Mixture






A single time slice corresponds to a mixture distribution with component densities p(x|z)



Each state of discrete variable z represents a different component

An extension of mixture model



Choice of mixture component depends on choice of mixture component for previous distribution

Latent variables are multinomial variables zn



That describe component responsible for generating xn

Can use one-of-K coding scheme

Machine Learning: CSE 574

4

2. State-Space Representation

• Probability distribution of zn


State k 1 2 . K of zn znk 0 1 . 0

depends on state of previous latent variable zn-1 through probability distribution p(zn|zn-1)
State
One-of K coding

of zn-1

• Since latent variables are Kdimensional binary vectors

Ajk=p(znk=1|zn-1,j=1)

ΣkAjk=1

A matrix • These are known as Transition


Probabilities zn-1 K(K-1) independent parameters
Machine Learning: CSE 574

j
1 2 . K zn-1,j 1 0 . 0 zn 1 2 …. K
1
2

K

Ajk
5

Transition Probabilities
Example with 3-state latent variable zn=1 zn=2 zn1=1 zn1=0 zn2=0 zn2=1 zn3=0 zn3=0

zn=3 zn1=0 zn2=0 zn3=1 zn-1=1 zn-1,1=1 zn-1,2=0 zn-1,3=0 A11

A12

A13

zn-1=2 zn-1,1=0 zn-1,2=1 zn-1,3=0 A21

A22

A23

zn-3= 3 zn-1,1=0 zn-1,2=0 zn-1,3=1 A31

A32

Machine Learning: CSE 574

A33

State Transition Diagram

A11+A12+A13=1

• Not a graphical model since nodes are not separate variables but states of a single variable 6
• Here K=3

Conditional Probabilities



Transition probabilities Ajk represent state-to-state probabilities for each variable
Conditional probabilities are variable-to-variable probabilities •

can be written in terms of transition probabilities as
K

K

p (z n | z n −1 , A) = ∏∏ A



k =1 j =1

Note that exponent zn-1,j zn,k is a product that evaluates to 0 or 1
Hence the overall product will evaluate to a single Ajk for each setting of values of zn and zn-1





z n−1, j z n ,k jk E.g., zn-1=2 and zn=3 will result in only zn-1,2=1 and zn,3=1. Thus p(zn=3|zn-1=2)=A23 A is a global HMM parameter
Machine Learning: CSE 574

7

Initial Variable Probabilities
• Initial latent node z1 is a special case without


parent node
Represented by vector of probabilities π with elements πk=p(z1k=1) so that
K

p ( z1 | π ) = ∏ π

z1,k k where Σ kπ k = 1

k =1

• Note that π is an HMM parameter

• representing probabilities of each state for

the

first variable
Machine Learning: CSE 574

8

Lattice or Trellis Diagram
• State transition diagram unfolded over time

•…