Machine Learning: CSE 574
Sargur N. Srihari (srihari@cedar.buffalo.edu)
Course page: http://www.cedar.buffalo.edu/~srihari/CSE574/index.html

HMM Topics

1. What is an HMM?
2. State-space Representation
3. HMM Parameters
4. Generative View of HMM
5. Determining HMM Parameters Using EM
6. Forward-Backward or α−β algorithm
7. HMM Implementation Issues:
   a) Length of Sequence
   b) Predictive Distribution
   c) Sum-Product Algorithm
   d) Scaling Factors
   e) Viterbi Algorithm

1. What is an HMM?

• Ubiquitous tool for modeling time-series data
• Used in:
  • almost all speech recognition systems
  • computational molecular biology, e.g., to group amino acid sequences into proteins
  • handwritten word recognition
• A tool for representing probability distributions over sequences of observations
• HMM gets its name from two defining properties (illustrated in the sketch below):
  • The observation xt at time t was generated by some process whose state zt is hidden from the observer
  • The state zt depends only on the previous state zt−1 and is independent of all earlier states (first order)
• Example: z is a sequence of phonemes, x the corresponding acoustic observations
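A minimal generative sketch of these two properties in Python/NumPy. The two-state parameters pi, A, and B below are hypothetical illustrations, not values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for a 2-state HMM with 3 observation symbols
pi = np.array([0.6, 0.4])          # initial state distribution
A = np.array([[0.7, 0.3],          # A[j, k] = p(z_t = k | z_{t-1} = j)
              [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1],     # B[k, m] = p(x_t = m | z_t = k)
              [0.1, 0.3, 0.6]])

def sample_hmm(T):
    """Generate a state sequence z (hidden) and observation sequence x."""
    z = [rng.choice(2, p=pi)]                  # z_1 ~ pi
    for t in range(1, T):
        # First-order property: z_t depends only on z_{t-1}
        z.append(rng.choice(2, p=A[z[-1]]))
    # Each x_t is generated by the hidden state z_t alone
    x = [rng.choice(3, p=B[zt]) for zt in z]
    return z, x

z, x = sample_hmm(10)   # the observer sees only x; z stays hidden
```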

Graphical Model of HMM

• Has the graphical model shown below; the latent variables are discrete

• Joint distribution has the form:

p(x1, …, xN, z1, …, zN) = p(z1) [ ∏n=2..N p(zn | zn−1) ] ∏n=1..N p(xn | zn)
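This factorization can be evaluated directly. A sketch assuming discrete observations, reusing the hypothetical pi, A, B from the earlier snippet (B, a discrete emission matrix, is an assumption; the slides leave p(xn | zn) generic):

```python
import numpy as np

def joint_prob(z, x, pi, A, B):
    """p(x_1..x_N, z_1..z_N) = p(z_1) * prod_{n=2}^N p(z_n | z_{n-1})
                                      * prod_{n=1}^N p(x_n | z_n).
    z, x are integer-coded state and observation sequences."""
    p = pi[z[0]]                    # p(z_1)
    for n in range(1, len(z)):
        p *= A[z[n - 1], z[n]]      # p(z_n | z_{n-1})
    for n in range(len(z)):
        p *= B[z[n], x[n]]          # p(x_n | z_n)
    return p
```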

HMM Viewed as Mixture

• A single time slice corresponds to a mixture distribution with component densities p(x|z)
  • Each state of the discrete variable z represents a different component
• An extension of the mixture model:
  • the choice of mixture component for each observation depends on the component chosen for the previous observation
• The latent variables are multinomial variables zn
  • that describe which component is responsible for generating xn
• Can use the one-of-K coding scheme

2. State-Space Representation

• The probability distribution of zn depends on the state of the previous latent variable zn−1 through the conditional distribution p(zn | zn−1)
• The latent variables are K-dimensional binary vectors in one-of-K coding, e.g.:

  State k of zn:    1  2  …  K        State j of zn−1:   1  2  …  K
  znk:              0  1  …  0        zn−1,j:            1  0  …  0

• The quantities Ajk = p(znk = 1 | zn−1,j = 1), with Σk Ajk = 1, are known as transition probabilities
• They form the K × K matrix A, whose rows are indexed by the state j of zn−1 and whose columns by the state k of zn
• A has K(K−1) independent parameters (see the sketch below)
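A brief sketch of one-of-K coding and indexing into A, with hypothetical K = 3 values not taken from the slide:

```python
import numpy as np

def one_of_K(k, K):
    """One-of-K (one-hot) vector with a 1 in position k (1-based, as on the slide)."""
    z = np.zeros(K, dtype=int)
    z[k - 1] = 1
    return z

K = 3
z_prev = one_of_K(1, K)   # z_{n-1} in state 1 -> [1, 0, 0]
z_curr = one_of_K(2, K)   # z_n in state 2     -> [0, 1, 0]

# A[j-1, k-1] stores A_jk = p(z_nk = 1 | z_{n-1,j} = 1)
A = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.1, 0.4, 0.5]])
print(A[0, 1])   # A_12: probability of moving from state 1 to state 2
```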

Transition Probabilities

Example with a 3-state latent variable (one-of-K codings in parentheses):

                     zn=1 (1,0,0)   zn=2 (0,1,0)   zn=3 (0,0,1)
  zn−1=1 (1,0,0)        A11            A12            A13
  zn−1=2 (0,1,0)        A21            A22            A23
  zn−1=3 (0,0,1)        A31            A32            A33

• Each row sums to one, e.g., A11 + A12 + A13 = 1 (checked numerically below)
• State transition diagram: not a graphical model, since the nodes are not separate variables but states of a single variable
• Here K = 3
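A numeric check of the row-sum constraint and the parameter count, using the same hypothetical 3 × 3 matrix (the Ajk values are made up for illustration):

```python
import numpy as np

# Hypothetical 3-state transition matrix; row j holds (A_j1, A_j2, A_j3)
A = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.1, 0.4, 0.5]])

# Each row sums to one, e.g. A11 + A12 + A13 = 1
assert np.allclose(A.sum(axis=1), 1.0)

# K(K-1) independent parameters: each row loses one degree of freedom
K = A.shape[0]
print(K * (K - 1))   # -> 6 for K = 3
```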

Conditional Probabilities

• Transition probabilities Ajk represent state-to-state probabilities for each variable
• Conditional probabilities are variable-to-variable probabilities
  • and can be written in terms of the transition probabilities as

  p(zn | zn−1, A) = ∏k=1..K ∏j=1..K Ajk^(zn−1,j zn,k)

• The exponent zn−1,j zn,k is a product that evaluates to 0 or 1
  • Hence the overall product evaluates to a single Ajk for each setting of zn and zn−1
  • E.g., zn−1=2 and zn=3 give zn−1,2=1 and zn,3=1, so p(zn=3 | zn−1=2) = A23 (verified in the sketch below)
• A is a global HMM parameter
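A direct evaluation of this product for one-of-K vectors, reusing the hypothetical A from above and confirming that it picks out the single entry A23:

```python
import numpy as np

def p_trans(z_curr, z_prev, A):
    """p(z_n | z_{n-1}, A) = prod_k prod_j A_jk^(z_{n-1,j} * z_{n,k})."""
    K = A.shape[0]
    p = 1.0
    for j in range(K):
        for k in range(K):
            p *= A[j, k] ** (z_prev[j] * z_curr[k])   # exponent is 0 or 1
    return p

A = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.1, 0.4, 0.5]])
z_prev = np.array([0, 1, 0])    # z_{n-1} = 2
z_curr = np.array([0, 0, 1])    # z_n = 3
assert p_trans(z_curr, z_prev, A) == A[1, 2]   # A23, with 0-based indexing
```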

Initial Variable Probabilities

• The initial latent node z1 is a special case: it has no parent node
• It is represented by a vector of probabilities π with elements πk = p(z1k = 1), so that

  p(z1 | π) = ∏k=1..K πk^(z1k),  where Σk πk = 1

• Note that π is an HMM parameter representing the probability of each state for the first variable
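The same one-of-K trick applies to the initial distribution; a sketch with hypothetical π values:

```python
import numpy as np

def p_initial(z1, pi):
    """p(z_1 | pi) = prod_k pi_k^(z_1k) for a one-of-K vector z_1."""
    return float(np.prod(pi ** z1))

pi = np.array([0.6, 0.3, 0.1])   # hypothetical; satisfies sum_k pi_k = 1
z1 = np.array([0, 1, 0])         # z_1 in state 2
assert p_initial(z1, pi) == pi[1]   # picks out pi_2
```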

Lattice or Trellis Diagram

• State transition diagram unfolded over time

•…