Solutions for homework 2
P.1 Let $p(x|\omega_i) \sim N(\mu_i, \Sigma)$ for $i = 1, 2$ in a two-category d-dimensional problem with the same covariance but arbitrary means and prior probabilities. Consider the squared Mahalanobis distance: $r_i^2 = (x - \mu_i)^t \Sigma^{-1} (x - \mu_i)$
(a) Show that the gradient of $r_i^2$ is given by
$\nabla r_i^2 = 2\Sigma^{-1}(x - \mu_i)$
(b) Show that at any position on a given line through $\mu_i$ the gradient $\nabla r_i^2$ points in the same direction. Must this direction be parallel to that line?
(e) True or False: For a two-category problem involving normal densities with arbitrary means and covariances, and $P(\omega_1) = P(\omega_2) = 1/2$, the Bayes decision boundary consists of the set of points of equal Mahalanobis distance from the respective sample means.
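(a) Differentiating the quadratic form with respect to $x$, $\nabla r_i^2$ can be written as:
$\nabla r_i^2 = \Sigma^{-1}(x - \mu_i) + ((x - \mu_i)^t \Sigma^{-1})^t$
$= \Sigma^{-1}(x - \mu_i) + (\Sigma^{-1})^t (x - \mu_i)$
Since the covariance matrix $\Sigma$ is symmetric, $(\Sigma^{-1})^t = \Sigma^{-1}$, and therefore
$\nabla r_i^2 = 2\Sigma^{-1}(x - \mu_i)$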
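The result in (a) can be sanity-checked numerically. The sketch below is illustrative only: the 2x2 covariance, the mean, and the test point are arbitrary values chosen for the check, not data from the homework, and the helper names are hypothetical. It compares the closed-form gradient with a central finite-difference estimate.

```python
def mat_vec(M, v):
    """Multiply a 2x2 matrix (list of rows) by a length-2 vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]


def inv2(M):
    """Invert a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mahalanobis_sq(x, mu, sigma_inv):
    """Squared Mahalanobis distance (x - mu)^t Sigma^{-1} (x - mu)."""
    diff = [x[0] - mu[0], x[1] - mu[1]]
    s = mat_vec(sigma_inv, diff)
    return diff[0] * s[0] + diff[1] * s[1]


def analytic_grad(x, mu, sigma_inv):
    """Closed-form gradient 2 Sigma^{-1} (x - mu)."""
    diff = [x[0] - mu[0], x[1] - mu[1]]
    return [2.0 * g for g in mat_vec(sigma_inv, diff)]


def numeric_grad(x, mu, sigma_inv, h=1e-6):
    """Central finite-difference estimate of the gradient."""
    grad = []
    for k in range(2):
        x_plus, x_minus = list(x), list(x)
        x_plus[k] += h
        x_minus[k] -= h
        grad.append((mahalanobis_sq(x_plus, mu, sigma_inv)
                     - mahalanobis_sq(x_minus, mu, sigma_inv)) / (2.0 * h))
    return grad


# Arbitrary illustrative values (assumptions, not from the homework).
sigma = [[2.0, 0.5], [0.5, 1.0]]  # symmetric positive definite
mu = [1.0, -1.0]
x = [0.3, 0.7]

sigma_inv = inv2(sigma)
g_analytic = analytic_grad(x, mu, sigma_inv)
g_numeric = numeric_grad(x, mu, sigma_inv)
print("analytic:", g_analytic)
print("numeric: ", g_numeric)
```

The two printed vectors should agree to within finite-difference rounding error.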
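(b) On a given line through $\mu_i$, for any point $x \neq \mu_i$ on this line,
$\nabla r_i^2 = 2\Sigma^{-1}(x - \mu_i) = 2\|x - \mu_i\| \, \Sigma^{-1} \frac{x - \mu_i}{\|x - \mu_i\|}$.
The unit vector $\frac{x - \mu_i}{\|x - \mu_i\|}$ is the direction of the line, so $\Sigma^{-1} \frac{x - \mu_i}{\|x - \mu_i\|}$ is a constant for any point on the line (up to a sign flip between the two sides of $\mu_i$). Thus for all points on this line, $\nabla r_i^2$ points in the same direction.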
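When $\Sigma = \sigma^2 I$ for any $\sigma > 0$, the gradient is $\frac{2}{\sigma^2}(x - \mu_i)$, so its direction is parallel to the line. For a general $\Sigma$ the direction need not be parallel to the line: it is parallel only when the direction of the line is an eigenvector of $\Sigma$.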
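A small numerical illustration of (b) follows. It is a sketch under assumed values: the covariance $\Sigma = \mathrm{diag}(2, 1)$, the mean at the origin, and the two line directions are chosen only for illustration, and the helper names are hypothetical. It checks that the normalized gradient is the same at several points on one side of $\mu_i$, that it is not parallel to a diagonal line, and that it is parallel to a line along a coordinate axis, which is an eigenvector of this diagonal $\Sigma$.

```python
import math


def mat_vec(M, v):
    """Multiply a 2x2 matrix (list of rows) by a length-2 vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]


def grad_direction(x, mu, sigma_inv):
    """Unit vector along the gradient 2 Sigma^{-1} (x - mu)."""
    diff = [x[0] - mu[0], x[1] - mu[1]]
    g = [2.0 * c for c in mat_vec(sigma_inv, diff)]
    norm = math.hypot(g[0], g[1])
    return [g[0] / norm, g[1] / norm]


def is_parallel(u, v, tol=1e-9):
    """Two 2-D vectors are parallel when their cross product vanishes."""
    return abs(u[0] * v[1] - u[1] * v[0]) < tol


# Assumed illustrative values: Sigma = diag(2, 1), so Sigma^{-1} = diag(0.5, 1).
sigma_inv = [[0.5, 0.0], [0.0, 1.0]]
mu = [0.0, 0.0]

# A line through mu along the diagonal, which is not an eigenvector of Sigma.
line_dir = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)]
points = [[mu[0] + t * line_dir[0], mu[1] + t * line_dir[1]]
          for t in (0.5, 1.0, 3.0)]
directions = [grad_direction(p, mu, sigma_inv) for p in points]

# Same gradient direction at every sampled point on this side of mu.
same_direction = all(
    is_parallel(directions[0], d)
    and directions[0][0] * d[0] + directions[0][1] * d[1] > 0
    for d in directions
)
# The shared direction is not parallel to the diagonal line itself.
parallel_to_line = is_parallel(directions[0], line_dir)

# A line along the first axis, an eigenvector of this diagonal Sigma.
axis_dir = [1.0, 0.0]
parallel_on_axis = is_parallel(grad_direction([2.0, 0.0], mu, sigma_inv), axis_dir)

print(same_direction, parallel_to_line, parallel_on_axis)
```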
(e) This statement is true when the two classes share the same covariance matrix $\Sigma$, as in the setup of this problem. When $P(\omega_1) = P(\omega_2) = 1/2$ and the covariance matrices are equal for the two classes, the Bayes discriminant function reduces to:
$g_i(x) = -\frac{1}{2}(x - \mu_i)^t \Sigma^{-1} (x - \mu_i) = -\frac{1}{2} r_i^2$
The decision boundary consists of the points with $g_1(x) = g_2(x) \Rightarrow r_1^2 = r_2^2$.
Thus the Bayes decision boundary consists of the set of points of equal Mahalanobis distance from the respective sample means. If the covariance matrices are not equal, the statement is false in general.
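To see why unequal covariances break the statement, it helps to write out the general Gaussian discriminant. This is the standard form for normal densities, stated here for reference rather than taken from the solution above; $\Sigma_i$ denotes the class-specific covariance and $r_i^2$ is computed with $\Sigma_i$.

```latex
g_i(x) = -\frac{1}{2}(x - \mu_i)^t \Sigma_i^{-1} (x - \mu_i)
         - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)

% With equal priors, g_1(x) = g_2(x) becomes
r_1^2 + \ln|\Sigma_1| = r_2^2 + \ln|\Sigma_2|
```

The boundary therefore coincides with $r_1^2 = r_2^2$ only when $|\Sigma_1| = |\Sigma_2|$, which holds in particular when the two covariance matrices are equal.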
P.2 Suppose we have three categories in two dimensions with the following distributions:
– $p(x|\omega_1) \sim N(0, I)$
– $p(x|\omega_2) \sim N([1, 1]^t, I)$
– $p(x|\omega_3) \sim \frac{1}{2} N([0.5, 0.5]^t, I) + \frac{1}{2} N([-0.5, 0.5]^t, I)$
with $P(\omega_i) = 1/3$, $i = 1, 2, 3$.
(a) By explicit calculation of posterior probabilities, classify the point $x = [0.3, 0.3]^t$ for minimum probability of error.
(b) Suppose that for a particular test point the first feature is missing. That is, classify $x = [*, 0.3]^t$.
(c) Suppose that for a particular test point the second feature is missing. That is, classify $x = [0.3, *]^t$.
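(a) $P(\omega_i|x) = \frac{p(x|\omega_i) P(\omega_i)}{p(x)}$. Since $P(\omega_1) = P(\omega_2) = P(\omega_3) = 1/3$, we just need to compare the likelihoods $p(x|\omega_i)$.
$p(x|\omega_1) = \frac{1}{2\pi} \exp\left(-\frac{1}{2} [0.3, 0.3][0.3, 0.3]^t\right) = 0.1455$
$p(x|\omega_2) = \frac{1}{2\pi} \exp\left(-\frac{1}{2} [-0.7, -0.7][-0.7, -0.7]^t\right) = 0.0975$
$p(x|\omega_3) = \frac{1}{4\pi} \left[ \exp\left(-\frac{1}{2} [-0.2, -0.2][-0.2, -0.2]^t\right) + \exp\left(-\frac{1}{2} [0.8, -0.2][0.8, -0.2]^t\right) \right] \approx 0.1331$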
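The likelihoods above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `gauss2d_identity` and the class labels `"w1"`, `"w2"`, `"w3"` are illustrative choices, not part of the original solution. It also computes the posteriors explicitly, as part (a) requests.

```python
import math


def gauss2d_identity(x, mean):
    """Density of the 2-D normal N(mean, I) evaluated at x."""
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return math.exp(-0.5 * sq_dist) / (2.0 * math.pi)


x = [0.3, 0.3]
prior = 1.0 / 3.0  # equal priors for all three classes

# Class-conditional likelihoods; class 3 is an equal-weight mixture.
likelihoods = {
    "w1": gauss2d_identity(x, [0.0, 0.0]),
    "w2": gauss2d_identity(x, [1.0, 1.0]),
    "w3": 0.5 * gauss2d_identity(x, [0.5, 0.5])
          + 0.5 * gauss2d_identity(x, [-0.5, 0.5]),
}

# Bayes rule: posterior = likelihood * prior / evidence.
evidence = sum(lik * prior for lik in likelihoods.values())
posteriors = {name: lik * prior / evidence for name, lik in likelihoods.items()}
best = max(posteriors, key=posteriors.get)

for name in likelihoods:
    print(f"{name}: likelihood={likelihoods[name]:.4f}, posterior={posteriors[name]:.4f}")
print("decision:", best)
```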
Since p(x|ω1 ) is the largest likelihood, x