3. The Multivariate Normal Distribution
Introduction

A generalization of the familiar bell-shaped normal density to several dimensions plays a fundamental role in multivariate analysis. While real data are never exactly multivariate normal, the normal density is often a useful approximation to the true population distribution because of a central limit effect.

One advantage of the multivariate normal distribution stems from the fact that it is mathematically tractable and nice results can be obtained.

To summarize, many real-world problems fall naturally within the framework of normal theory. The importance of the normal distribution rests on its dual role as both a population model for certain natural phenomena and an approximate sampling distribution for many statistics.

The Multivariate Normal Density and Its Properties

Recall that the univariate normal distribution, with mean $\mu$ and variance $\sigma^2$, has the probability density function

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-[(x-\mu)/\sigma]^2/2}, \qquad -\infty < x < \infty.$$
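As a quick numerical check of the univariate formula, the following minimal sketch (assuming NumPy and SciPy are available) evaluates $f(x)$ directly and compares it with `scipy.stats.norm.pdf`, which takes the standard deviation as `scale` rather than the variance. The function name `univariate_normal_pdf` and the parameter values are illustrative choices, not part of the notes.

```python
import numpy as np
from scipy.stats import norm


def univariate_normal_pdf(x, mu, sigma2):
    """Evaluate f(x) = exp(-((x - mu)/sigma)^2 / 2) / sqrt(2*pi*sigma^2)."""
    sigma = np.sqrt(sigma2)
    z = (x - mu) / sigma  # distance from the mean in standard-deviation units
    return np.exp(-z**2 / 2) / np.sqrt(2 * np.pi * sigma2)


# Hypothetical parameter values chosen for illustration only.
mu, sigma2 = 1.0, 4.0
xs = np.linspace(-5.0, 7.0, 13)
ours = univariate_normal_pdf(xs, mu, sigma2)
ref = norm.pdf(xs, loc=mu, scale=np.sqrt(sigma2))  # SciPy wants sigma, not sigma^2
print(np.allclose(ours, ref))  # True
```

Passing the variance where SciPy expects the standard deviation is a common slip, which is why the sketch converts explicitly.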
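The term in the exponent,

$$\left(\frac{x-\mu}{\sigma}\right)^2 = (x-\mu)(\sigma^2)^{-1}(x-\mu),$$

measures the squared distance from $x$ to $\mu$ in standard deviation units. This can be generalized for a $p \times 1$ vector $\mathbf{x}$ of observations on several variables as

$$(\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}).$$

The $p \times 1$ vector $\boldsymbol{\mu}$ represents the expected value of the random vector $\mathbf{X}$, and the $p \times p$ matrix $\boldsymbol{\Sigma}$ is the variance-covariance matrix of $\mathbf{X}$.

A $p$-dimensional normal density for the random vector $\mathbf{X}' = [X_1, X_2, \ldots, X_p]$ has the form

$$f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2}|\boldsymbol{\Sigma}|^{1/2}}\, e^{-(\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})/2},$$

where $-\infty < x_i < \infty$, $i = 1, 2, \ldots, p$. We shall denote this $p$-dimensional normal density by $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$.

Example (Bivariate normal density). Let us evaluate the $p = 2$ variate normal density in terms of the individual parameters $\mu_1 = E(X_1)$, $\mu_2 = E(X_2)$, $\sigma_{11} = \mathrm{Var}(X_1)$, $\sigma_{22} = \mathrm{Var}(X_2)$, and $\rho_{12} = \sigma_{12}/(\sqrt{\sigma_{11}}\sqrt{\sigma_{22}}) = \mathrm{Corr}(X_1, X_2)$.

Result. If $\boldsymbol{\Sigma}$ is positive definite, so that $\boldsymbol{\Sigma}^{-1}$ exists, then

$$\boldsymbol{\Sigma}\mathbf{e} = \lambda\mathbf{e} \quad\text{implies}\quad \boldsymbol{\Sigma}^{-1}\mathbf{e} = \left(\frac{1}{\lambda}\right)\mathbf{e},$$

so $(\lambda, \mathbf{e})$ is an eigenvalue-eigenvector pair for $\boldsymbol{\Sigma}$ corresponding to the pair $(1/\lambda, \mathbf{e})$ for $\boldsymbol{\Sigma}^{-1}$. Also, $\boldsymbol{\Sigma}^{-1}$ is positive definite.

Constant probability density contour $= \{$all $\mathbf{x}$ such that $(\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}) = c^2\}$.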
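The density formula, the bivariate example, and the eigenvalue result can all be checked numerically. The sketch below, assuming NumPy and SciPy and a hypothetical set of bivariate parameters ($\sigma_{11} = 2$, $\sigma_{22} = 1$, $\sigma_{12} = 0.6$), evaluates $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ from the formula, compares it with `scipy.stats.multivariate_normal`, writes the $p = 2$ case in terms of $\mu_i$, $\sigma_{ii}$ and $\rho_{12}$ (the standard bivariate form, stated here as an aid rather than quoted from the notes), and confirms that $\boldsymbol{\Sigma}^{-1}$ shares eigenvectors with $\boldsymbol{\Sigma}$ with reciprocal eigenvalues. The helper names `mvn_pdf` and `bivariate_pdf` are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal


def mvn_pdf(x, mu, Sigma):
    """Evaluate the N_p(mu, Sigma) density at x straight from the formula."""
    x, mu, Sigma = (np.asarray(a, dtype=float) for a in (x, mu, Sigma))
    p = mu.shape[0]
    diff = x - mu
    # Squared statistical distance (x - mu)' Sigma^{-1} (x - mu);
    # solve() avoids forming Sigma^{-1} explicitly.
    d2 = diff @ np.linalg.solve(Sigma, diff)
    norm_const = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-d2 / 2) / norm_const


def bivariate_pdf(x1, x2, mu1, mu2, s11, s22, rho):
    """Bivariate normal density in terms of mu_i, sigma_ii and rho_12."""
    z1 = (x1 - mu1) / np.sqrt(s11)
    z2 = (x2 - mu2) / np.sqrt(s22)
    q = (z1**2 + z2**2 - 2 * rho * z1 * z2) / (1 - rho**2)
    return np.exp(-q / 2) / (2 * np.pi * np.sqrt(s11 * s22 * (1 - rho**2)))


# Hypothetical parameters for illustration only.
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
x = np.array([1.5, 0.0])

print(np.isclose(mvn_pdf(x, mu, Sigma),
                 multivariate_normal(mean=mu, cov=Sigma).pdf(x)))  # True

rho12 = Sigma[0, 1] / np.sqrt(Sigma[0, 0] * Sigma[1, 1])
print(np.isclose(bivariate_pdf(*x, *mu, Sigma[0, 0], Sigma[1, 1], rho12),
                 mvn_pdf(x, mu, Sigma)))  # True

# Sigma e = lambda e implies Sigma^{-1} e = (1/lambda) e.
lam, E = np.linalg.eigh(Sigma)
Sigma_inv = np.linalg.inv(Sigma)
for lam_i, e_i in zip(lam, E.T):
    assert np.allclose(Sigma_inv @ e_i, e_i / lam_i)
```

Points with the same value of `d2` lie on one constant-density contour, since the density depends on $\mathbf{x}$ only through that quadratic form.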
The constant probability density contour is the surface of an ellipsoid centered at $\boldsymbol{\mu}$.

Contours of constant density for the $p$-dimensional normal distribution are ellipsoids defined by $\mathbf{x}$ such that

$$(\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}) = c^2.$$

These ellipsoids are centered at $\boldsymbol{\mu}$ and have axes $\pm c\sqrt{\lambda_i}\,\mathbf{e}_i$, where $\boldsymbol{\Sigma}\mathbf{e}_i = \lambda_i\mathbf{e}_i$ for $i = 1, 2, \ldots, p$.

Example (Contours of the bivariate normal density). Obtain the axes of constant probability density contours for a bivariate normal distribution when $\sigma_{11} = \sigma_{22}$.

The solid ellipsoid of $\mathbf{x}$ values satisfying

$$(\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}) \le \chi^2_p(\alpha)$$

has probability $1-\alpha$, where $\chi^2_p(\alpha)$ is the upper $(100\alpha)$th percentile of a chi-square distribution with $p$ degrees of freedom.

Additional Properties of the Multivariate Normal Distribution

The following are true for a random vector $\mathbf{X}$ having a multivariate normal distribution:

1. Linear combinations of the components of $\mathbf{X}$ are normally distributed.
2. All subsets of the components of $\mathbf{X}$ have a (multivariate) normal distribution.
3. Zero covariance implies that the corresponding components are independently distributed.
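4. The conditional distributions of the components are normal.

Result. If $\mathbf{X}$ is distributed as $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, then any linear combination of variables $\mathbf{a}'\mathbf{X} = a_1X_1 + a_2X_2 + \cdots + a_pX_p$ is distributed as $N(\mathbf{a}'\boldsymbol{\mu}, \mathbf{a}'\boldsymbol{\Sigma}\mathbf{a})$. Also, if $\mathbf{a}'\mathbf{X}$ is distributed as $N(\mathbf{a}'\boldsymbol{\mu}, \mathbf{a}'\boldsymbol{\Sigma}\mathbf{a})$ for every $\mathbf{a}$, then $\mathbf{X}$ must be $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$.

Example (The distribution of a linear combination of the components of a normal random vector). Consider the linear combination $\mathbf{a}'\mathbf{X}$ of a multivariate normal random vector determined by the choice $\mathbf{a}' = [1, 0, \ldots, 0]$.

Result. If $\mathbf{X}$ is distributed as $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, the $q$ linear combinations

$$\mathbf{A}\mathbf{X} = \begin{bmatrix} a_{11}X_1 + \cdots + a_{1p}X_p \\ a_{21}X_1 + \cdots + a_{2p}X_p \\ \vdots \\ a_{q1}X_1 + \cdots + a_{qp}X_p \end{bmatrix}, \qquad \mathbf{A} \text{ of dimension } q \times p,$$

are distributed as $N_q(\mathbf{A}\boldsymbol{\mu}, \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}')$. Also, $\mathbf{X} + \mathbf{d}$, where $\mathbf{d}$ is a $p \times 1$ vector of constants, is distributed as $N_p(\boldsymbol{\mu} + \mathbf{d}, \boldsymbol{\Sigma})$.

Example (The distribution of two linear combinations of the components of a normal random vector). For $\mathbf{X}$ distributed as $N_3(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, find the distribution of

$$\begin{bmatrix} X_1 - X_2 \\ X_2 - X_3 \end{bmatrix} = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \end{bmatrix}\begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix} = \mathbf{A}\mathbf{X}.$$

Result. All subsets of $\mathbf{X}$ are normally distributed.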
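The result $\mathbf{A}\mathbf{X} \sim N_q(\mathbf{A}\boldsymbol{\mu}, \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}')$ is pure matrix algebra on the parameters, so it is easy to apply mechanically. The sketch below assumes a hypothetical $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ for $N_3$ (the notes leave them symbolic); it handles the choice $\mathbf{a}' = [1, 0, 0]$, the two-difference example, and a selection matrix built from rows of the identity. The last case is one way to see why all subsets of $\mathbf{X}$ are normal: selecting components is itself a linear combination, and it returns the corresponding sub-vector of $\boldsymbol{\mu}$ and block of $\boldsymbol{\Sigma}$. The helper `linear_combination_params` is a hypothetical name.

```python
import numpy as np


def linear_combination_params(A, mu, Sigma):
    """Mean A mu and covariance A Sigma A' of AX when X ~ N_p(mu, Sigma)."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    return A @ mu, A @ Sigma @ A.T


# Hypothetical N_3 parameters (any symmetric positive definite Sigma works).
mu = np.array([2.0, -1.0, 0.5])
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, -0.8],
                  [0.5, -0.8, 2.0]])

# a' = [1, 0, 0] picks out X1, so a'X ~ N(mu_1, sigma_11).
m, v = linear_combination_params([1, 0, 0], mu, Sigma)
print(m, v)  # [2.] [[4.]]

# Two linear combinations: X1 - X2 and X2 - X3.
A = np.array([[1, -1, 0],
              [0, 1, -1]])
mean_AX, cov_AX = linear_combination_params(A, mu, Sigma)
# Var(X1 - X2) = sigma11 - 2 sigma12 + sigma22 is the (1,1) entry of A Sigma A'.
assert np.isclose(cov_AX[0, 0], Sigma[0, 0] - 2 * Sigma[0, 1] + Sigma[1, 1])

# Subsets: rows of the identity select components, here [X1, X3]'.
idx = [0, 2]
S = np.eye(3)[idx]
m_sub, c_sub = linear_combination_params(S, mu, Sigma)
assert np.allclose(m_sub, mu[idx])
assert np.allclose(c_sub, Sigma[np.ix_(idx, idx)])
```

The same helper covers the $N_5$ subset example by using rows 2 and 4 of a $5 \times 5$ identity.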
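If we respectively partition $\mathbf{X}$, its mean vector $\boldsymbol{\mu}$, and its covariance matrix $\boldsymbol{\Sigma}$ as

$$\mathbf{X} = \begin{bmatrix}\mathbf{X}_1\\ \mathbf{X}_2\end{bmatrix}, \qquad \boldsymbol{\mu} = \begin{bmatrix}\boldsymbol{\mu}_1\\ \boldsymbol{\mu}_2\end{bmatrix}, \qquad \boldsymbol{\Sigma} = \begin{bmatrix}\boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12}\\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22}\end{bmatrix},$$

where $\mathbf{X}_1$ and $\boldsymbol{\mu}_1$ are $q \times 1$, $\mathbf{X}_2$ and $\boldsymbol{\mu}_2$ are $(p-q) \times 1$, and $\boldsymbol{\Sigma}_{11}$, $\boldsymbol{\Sigma}_{12}$, $\boldsymbol{\Sigma}_{21}$, $\boldsymbol{\Sigma}_{22}$ are $q \times q$, $q \times (p-q)$, $(p-q) \times q$, and $(p-q) \times (p-q)$, respectively, then $\mathbf{X}_1$ is distributed as $N_q(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_{11})$.

Example (The distribution of a subset of a normal random vector). If $\mathbf{X}$ is distributed as $N_5(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, find the distribution of $[X_2, X_4]'$.

Result.
(a) If $\mathbf{X}_1$ and $\mathbf{X}_2$ are independent, then $\mathrm{Cov}(\mathbf{X}_1, \mathbf{X}_2) = \mathbf{0}$, a $q_1 \times q_2$ matrix of zeros, where $\mathbf{X}_1$ is a $q_1 \times 1$ random vector and $\mathbf{X}_2$ is a $q_2 \times 1$ random vector.
(b) If $\begin{bmatrix}\mathbf{X}_1\\ \mathbf{X}_2\end{bmatrix}$ is $N_{q_1+q_2}\left(\begin{bmatrix}\boldsymbol{\mu}_1\\ \boldsymbol{\mu}_2\end{bmatrix}, \begin{bmatrix}\boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12}\\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22}\end{bmatrix}\right)$, then $\mathbf{X}_1$ and $\mathbf{X}_2$ are independent if and only if $\boldsymbol{\Sigma}_{12} = \boldsymbol{\Sigma}_{21}' = \mathbf{0}$.
(c) If $\mathbf{X}_1$ and $\mathbf{X}_2$ are independent and are distributed as $N_{q_1}(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_{11})$ and $N_{q_2}(\boldsymbol{\mu}_2, \boldsymbol{\Sigma}_{22})$, respectively, then $\begin{bmatrix}\mathbf{X}_1\\ \mathbf{X}_2\end{bmatrix}$ has the multivariate normal distribution

$$N_{q_1+q_2}\left(\begin{bmatrix}\boldsymbol{\mu}_1\\ \boldsymbol{\mu}_2\end{bmatrix}, \begin{bmatrix}\boldsymbol{\Sigma}_{11} & \mathbf{0}\\ \mathbf{0} & \boldsymbol{\Sigma}_{22}\end{bmatrix}\right).$$

Example (The equivalence of zero covariance and independence for normal variables). Let $\mathbf{X}$ ($3 \times 1$) be $N_3(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ with

$$\boldsymbol{\Sigma} = \begin{bmatrix}4 & 1 & 0\\ 1 & 3 & 0\\ 0 & 0 & 2\end{bmatrix}.$$

Are $X_1$ and $X_2$ independent? What about $(X_1, X_2)$ and $X_3$?

Result. Let $\mathbf{X} = \begin{bmatrix}\mathbf{X}_1\\ \mathbf{X}_2\end{bmatrix}$ be distributed as $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ with $\boldsymbol{\mu} = \begin{bmatrix}\boldsymbol{\mu}_1\\ \boldsymbol{\mu}_2\end{bmatrix}$, $\boldsymbol{\Sigma} = \begin{bmatrix}\boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12}\\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22}\end{bmatrix}$, and $|\boldsymbol{\Sigma}_{22}| > 0$. Then the conditional distribution of $\mathbf{X}_1$, given that $\mathbf{X}_2 = \mathbf{x}_2$, is normal and has

$$\text{Mean} = \boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2)$$

and

$$\text{Covariance} = \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}.$$

Note that the covariance does not depend on the value $\mathbf{x}_2$ of the conditioning variable.

Example (The conditional density of a bivariate normal distribution). Obtain the conditional density of $X_1$, given that $X_2 = x_2$, for any bivariate normal distribution.

Result. Let $\mathbf{X}$ be distributed as $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ with $|\boldsymbol{\Sigma}| > 0$. Then
(a) $(\mathbf{X}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{X}-\boldsymbol{\mu})$ is distributed as $\chi^2_p$, where $\chi^2_p$ denotes the chi-square distribution with $p$ degrees of freedom.
(b) The $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ distribution assigns probability $1-\alpha$ to the solid ellipsoid $\{\mathbf{x} : (\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}) \le \chi^2_p(\alpha)\}$, where $\chi^2_p(\alpha)$ denotes the upper $(100\alpha)$th percentile of the $\chi^2_p$ distribution.

Result. Let $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$ be mutually independent with $\mathbf{X}_j$ distributed as $N_p(\boldsymbol{\mu}_j, \boldsymbol{\Sigma})$. (Note that each $\mathbf{X}_j$ has the same covariance matrix $\boldsymbol{\Sigma}$.)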
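Then

$$\mathbf{V}_1 = c_1\mathbf{X}_1 + c_2\mathbf{X}_2 + \cdots + c_n\mathbf{X}_n$$

is distributed as $N_p\left(\sum_{j=1}^{n} c_j\boldsymbol{\mu}_j,\ \left(\sum_{j=1}^{n} c_j^2\right)\boldsymbol{\Sigma}\right)$. Moreover, $\mathbf{V}_1$ and $\mathbf{V}_2 = b_1\mathbf{X}_1 + b_2\mathbf{X}_2 + \cdots + b_n\mathbf{X}_n$ are jointly multivariate normal with covariance matrix

$$\begin{bmatrix} \left(\sum_{j=1}^{n} c_j^2\right)\boldsymbol{\Sigma} & (\mathbf{b}'\mathbf{c})\boldsymbol{\Sigma} \\ (\mathbf{b}'\mathbf{c})\boldsymbol{\Sigma} & \left(\sum_{j=1}^{n} b_j^2\right)\boldsymbol{\Sigma} \end{bmatrix}.$$

Consequently, $\mathbf{V}_1$ and $\mathbf{V}_2$ are independent if $\mathbf{b}'\mathbf{c} = \sum_{j=1}^{n} c_jb_j = 0$.

Example (Linear combinations of random vectors). Let $\mathbf{X}_1$, $\mathbf{X}_2$, $\mathbf{X}_3$, and $\mathbf{X}_4$ be independent and identically distributed $3 \times 1$ random vectors with

$$\boldsymbol{\mu} = \begin{bmatrix}3\\ -1\\ 1\end{bmatrix} \quad\text{and}\quad \boldsymbol{\Sigma} = \begin{bmatrix}3 & -1 & 1\\ -1 & 1 & 0\\ 1 & 0 & 2\end{bmatrix}.$$

(a) Find the mean and variance of the linear combination $\mathbf{a}'\mathbf{X}_1$ of the three components of $\mathbf{X}_1$, where $\mathbf{a} = [a_1, a_2, a_3]'$.
(b) Consider two linear combinations of random vectors,

$$\tfrac{1}{2}\mathbf{X}_1 + \tfrac{1}{2}\mathbf{X}_2 + \tfrac{1}{2}\mathbf{X}_3 + \tfrac{1}{2}\mathbf{X}_4 \quad\text{and}\quad \mathbf{X}_1 + \mathbf{X}_2 + \mathbf{X}_3 - 3\mathbf{X}_4.$$

Find the mean vector and covariance matrix for each linear combination of vectors and also the covariance between them.

Sampling from a Multivariate Normal Distribution and Maximum Likelihood Estimation

The Multivariate Normal Likelihood

We need the joint density function of all $p \times 1$ observed random vectors $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$.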
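For a random sample, the $\mathbf{X}_j$ are mutually independent and each is $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, so the joint density of $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$ is

$$
\begin{aligned}
f(\mathbf{x}_1, \ldots, \mathbf{x}_n) &= \prod_{j=1}^{n}\left\{\frac{1}{(2\pi)^{p/2}|\boldsymbol{\Sigma}|^{1/2}}\, e^{-(\mathbf{x}_j-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}_j-\boldsymbol{\mu})/2}\right\} \\
&= \frac{1}{(2\pi)^{np/2}|\boldsymbol{\Sigma}|^{n/2}}\, e^{-\sum_{j=1}^{n}(\mathbf{x}_j-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}_j-\boldsymbol{\mu})/2} \\
&= \frac{1}{(2\pi)^{np/2}|\boldsymbol{\Sigma}|^{n/2}}\, e^{-\operatorname{tr}\left[\boldsymbol{\Sigma}^{-1}\left(\sum_{j=1}^{n}(\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})' + n(\bar{\mathbf{x}}-\boldsymbol{\mu})(\bar{\mathbf{x}}-\boldsymbol{\mu})'\right)\right]/2}.
\end{aligned}
$$

The last step writes each quadratic form as a trace, $\mathbf{y}'\boldsymbol{\Sigma}^{-1}\mathbf{y} = \operatorname{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{y}\mathbf{y}')$, and uses $\sum_{j=1}^{n}(\mathbf{x}_j-\boldsymbol{\mu})(\mathbf{x}_j-\boldsymbol{\mu})' = \sum_{j=1}^{n}(\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})' + n(\bar{\mathbf{x}}-\boldsymbol{\mu})(\bar{\mathbf{x}}-\boldsymbol{\mu})'$.

Likelihood

When the numerical values of the observations become available, they may be substituted for the $\mathbf{x}_j$ in the equation above. The resulting expression, now considered as a function of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ for the fixed set of observations $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$, is called the likelihood.

Maximum Likelihood Estimation

One meaning of "best" is to select the parameter values that maximize the joint density evaluated at the observations. This technique is called maximum likelihood estimation, and the maximizing parameter values are called maximum likelihood estimates.

Result. Let $\mathbf{A}$ be a $k \times k$ symmetric matrix and $\mathbf{x}$ be a $k \times 1$ vector. Then
(a) $\mathbf{x}'\mathbf{A}\mathbf{x} = \operatorname{tr}(\mathbf{x}'\mathbf{A}\mathbf{x}) = \operatorname{tr}(\mathbf{A}\mathbf{x}\mathbf{x}')$.
(b) $\operatorname{tr}(\mathbf{A}) = \sum_{i=1}^{k}\lambda_i$, where the $\lambda_i$ are the eigenvalues of $\mathbf{A}$.

Maximum Likelihood Estimates of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$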
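The likelihood written above can be explored numerically before the formal results. The sketch below, assuming NumPy and a simulated (seeded) sample whose size, mean, and covariance are illustrative assumptions, evaluates the log-likelihood both from the sum of quadratic forms and from the trace form and checks that they agree. It then confirms that the sample mean together with the divisor-$n$ covariance scores higher than nearby parameter values, and it checks the two trace identities. The function names `loglik_direct` and `loglik_trace` are hypothetical.

```python
import numpy as np


def loglik_direct(X, mu, Sigma):
    """Sum over rows x_j of log N_p(x_j; mu, Sigma), using the quadratic forms."""
    n, p = X.shape
    diffs = X - mu
    Sinv = np.linalg.inv(Sigma)
    # (x_j - mu)' Sigma^{-1} (x_j - mu) for every row j
    quad = np.einsum("ij,jk,ik->i", diffs, Sinv, diffs)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (n * p * np.log(2 * np.pi) + n * logdet + quad.sum())


def loglik_trace(X, mu, Sigma):
    """Same log-likelihood via tr[Sigma^{-1}(sum (x_j-xbar)(x_j-xbar)' + n(xbar-mu)(xbar-mu)')]."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    centered = X - xbar
    B = centered.T @ centered + n * np.outer(xbar - mu, xbar - mu)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (n * p * np.log(2 * np.pi) + n * logdet
                   + np.trace(np.linalg.solve(Sigma, B)))


# Hypothetical sample of n = 50 draws from a 3-variate normal.
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 1.0, 2.0],
                            [[2.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.5]],
                            size=50)
n = X.shape[0]
mu_hat = X.mean(axis=0)
Sigma_hat = np.cov(X, rowvar=False, bias=True)  # divisor n, i.e. ((n - 1)/n) S

assert np.isclose(loglik_direct(X, mu_hat, Sigma_hat),
                  loglik_trace(X, mu_hat, Sigma_hat))

# Moving mu or rescaling Sigma away from (mu_hat, Sigma_hat) lowers the log-likelihood.
best = loglik_direct(X, mu_hat, Sigma_hat)
for delta in (0.1, -0.1):
    assert loglik_direct(X, mu_hat + delta, Sigma_hat) < best
    assert loglik_direct(X, mu_hat, Sigma_hat * (1 + delta)) < best

# Trace identities for a symmetric A: x'Ax = tr(A x x') and tr(A) = sum of eigenvalues.
A, v = Sigma_hat, X[0]
assert np.isclose(v @ A @ v, np.trace(A @ np.outer(v, v)))
assert np.isclose(np.trace(A), np.linalg.eigvalsh(A).sum())
```

A few perturbations are only a spot check, not a proof of maximality; `slogdet` is used instead of `det` to keep the log-determinant numerically stable.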
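Result. Given a $p \times p$ symmetric positive definite matrix $\mathbf{B}$ and a scalar $b > 0$, it follows that

$$\frac{1}{|\boldsymbol{\Sigma}|^{b}}\, e^{-\operatorname{tr}(\boldsymbol{\Sigma}^{-1}\mathbf{B})/2} \le \frac{1}{|\mathbf{B}|^{b}}\,(2b)^{pb}\, e^{-bp}$$

for all positive definite $\boldsymbol{\Sigma}$ ($p \times p$), with equality holding only for $\boldsymbol{\Sigma} = (1/2b)\mathbf{B}$.

Result. Let $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$ be a random sample from a normal population with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$. Then

$$\hat{\boldsymbol{\mu}} = \bar{\mathbf{X}} \quad\text{and}\quad \hat{\boldsymbol{\Sigma}} = \frac{1}{n}\sum_{j=1}^{n}(\mathbf{X}_j-\bar{\mathbf{X}})(\mathbf{X}_j-\bar{\mathbf{X}})' = \frac{n-1}{n}\mathbf{S}$$

are the maximum likelihood estimators of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, respectively. Their observed values, $\bar{\mathbf{x}}$ and $(1/n)\sum_{j=1}^{n}(\mathbf{x}_j-\bar{\mathbf{x}})(\mathbf{x}_j-\bar{\mathbf{x}})'$, are called the maximum likelihood estimates of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$.

Invariance Property of Maximum Likelihood Estimators

Let $\hat{\boldsymbol{\theta}}$ be the maximum likelihood estimator of $\boldsymbol{\theta}$, and consider the parameter $h(\boldsymbol{\theta})$, which is a function of $\boldsymbol{\theta}$. Then the maximum likelihood estimate of $h(\boldsymbol{\theta})$ is given by $h(\hat{\boldsymbol{\theta}})$. For example:

1. The maximum likelihood estimator of $\boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$ is $\hat{\boldsymbol{\mu}}'\hat{\boldsymbol{\Sigma}}^{-1}\hat{\boldsymbol{\mu}}$, where $\hat{\boldsymbol{\mu}} = \bar{\mathbf{X}}$ and $\hat{\boldsymbol{\Sigma}} = \frac{n-1}{n}\mathbf{S}$ are the maximum likelihood estimators of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$, respectively.
2. The maximum likelihood estimator of $\sqrt{\sigma_{ii}}$ is $\sqrt{\hat{\sigma}_{ii}}$, where

$$\hat{\sigma}_{ii} = \frac{1}{n}\sum_{j=1}^{n}(X_{ij} - \bar{X}_i)^2$$

is the maximum likelihood estimator of $\sigma_{ii} = \mathrm{Var}(X_i)$.

Sufficient Statistics

Let $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$ be a random sample from a multivariate normal population with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$. Then

$$\bar{\mathbf{X}} \quad\text{and}\quad \mathbf{S} = \frac{1}{n-1}\sum_{j=1}^{n}(\mathbf{X}_j-\bar{\mathbf{X}})(\mathbf{X}_j-\bar{\mathbf{X}})'$$

are sufficient statistics.

The importance of sufficient statistics for normal populations is that all of the information about $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ in the data matrix $\mathbf{X}$ is contained in $\bar{\mathbf{X}}$ and $\mathbf{S}$, regardless of the sample size $n$. This generally is not true for nonnormal populations. Since many multivariate techniques begin with sample means and covariances, it is prudent to check on the adequacy of the multivariate normal assumption. If the data cannot be regarded as multivariate normal, techniques that depend solely on $\bar{\mathbf{X}}$ and $\mathbf{S}$ may be ignoring other useful sample information.

The Sampling Distribution of $\bar{\mathbf{X}}$ and $\mathbf{S}$

The univariate case ($p = 1$): $\bar{X}$ is normal with mean $\mu$ (the population mean) and variance

$$\frac{1}{n}\sigma^2 = \frac{\text{population variance}}{\text{sample size}}.$$

For the sample variance, recall that $(n-1)s^2 = \sum_{j=1}^{n}(X_j - \bar{X})^2$ is distributed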