
The Multivariate Gaussian Distribution - Stanford University


Chuong B. Do

October 10, 2008

A vector-valued random variable $X = [X_1 \cdots X_n]^T$ is said to have a multivariate normal (or Gaussian) distribution with mean $\mu \in \mathbb{R}^n$ and covariance matrix $\Sigma \in \mathbb{S}^n_{++}$ [1] if its probability density function [2] is given by

\[ p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right). \]

We write this as $X \sim \mathcal{N}(\mu, \Sigma)$. In these notes, we describe multivariate Gaussians and some of their basic properties.

1 Relationship to univariate Gaussians

Recall that the density function of a univariate normal (or Gaussian) distribution is given by

\[ p(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{1}{2\sigma^2} (x - \mu)^2 \right). \]

Here, the argument of the exponential function, $-\frac{1}{2\sigma^2}(x - \mu)^2$, is a quadratic function of the variable $x$. Furthermore, the parabola points downwards, as the coefficient of the quadratic term is negative. The coefficient in front, $\frac{1}{\sqrt{2\pi}\,\sigma}$, is a constant that does not depend on $x$; hence, we can think of it as simply a "normalization factor" used to ensure that

\[ \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} \exp\left( -\frac{1}{2\sigma^2} (x - \mu)^2 \right) dx = 1. \]

[1] Recall from the section notes on linear algebra that $\mathbb{S}^n_{++}$ is the space of symmetric positive definite $n \times n$ matrices, defined as $\mathbb{S}^n_{++} = \{ A \in \mathbb{R}^{n \times n} : A = A^T \text{ and } x^T A x > 0 \text{ for all } x \in \mathbb{R}^n \text{ such that } x \neq 0 \}$.
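To make the density formula concrete, here is a minimal NumPy sketch that evaluates $p(x; \mu, \Sigma)$ directly from the definition above and cross-checks it against SciPy's built-in density. The example mean, covariance, and query point are arbitrary illustrative values, not from the notes.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, sigma):
    """Evaluate p(x; mu, Sigma) directly from the formula above."""
    n = mu.shape[0]
    diff = x - mu
    norm_const = 1.0 / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(sigma)))
    return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff)

mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
x = np.array([1.0, -1.0])

print(gaussian_pdf(x, mu, sigma))
print(multivariate_normal(mean=mu, cov=sigma).pdf(x))   # same value
```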

[2] In these notes, we use the notation $p(\bullet)$ to denote density functions, instead of $f_X(\bullet)$ (as in the section notes on probability theory).

[Figure 1: The figure on the left shows a univariate Gaussian density for a single variable $X$. The figure on the right shows a multivariate Gaussian density over two variables $X_1$ and $X_2$.]

In the case of the multivariate Gaussian density, the argument of the exponential function, $-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)$, is a quadratic form in the vector variable $x$. Since $\Sigma$ is positive definite, and since the inverse of any positive definite matrix is also positive definite, then for any non-zero vector $z$, $z^T \Sigma^{-1} z > 0$. This implies that for any vector $x \neq \mu$,

\[ (x - \mu)^T \Sigma^{-1} (x - \mu) > 0 \]
\[ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) < 0. \]

Like in the univariate case, you can think of the argument of the exponential function as being a downward opening quadratic bowl. The coefficient in front (i.e., $\frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}$) has an even more complicated form than in the univariate case. However, it still does not depend on $x$, and hence it is again simply a normalization factor used to ensure that

\[ \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp\left( -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right) dx_1\, dx_2 \cdots dx_n = 1. \]

2 The covariance matrix

The concept of the covariance matrix is vital to understanding multivariate Gaussian distributions.
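The normalization claim is easy to check numerically in two dimensions. The sketch below approximates the double integral with a Riemann sum; the grid bounds and the example $\Sigma$ are arbitrary choices for illustration.

```python
import numpy as np

mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
sigma_inv = np.linalg.inv(sigma)
const = 1.0 / (2 * np.pi * np.sqrt(np.linalg.det(sigma)))

# Riemann sum over a grid wide enough to capture essentially all the mass.
ts = np.linspace(-8.0, 8.0, 801)
dx = ts[1] - ts[0]
X1, X2 = np.meshgrid(ts, ts)
diffs = np.stack([X1 - mu[0], X2 - mu[1]], axis=-1)          # shape (801, 801, 2)
quad = np.einsum('...i,ij,...j->...', diffs, sigma_inv, diffs)
total = (const * np.exp(-0.5 * quad)).sum() * dx * dx

print(total)   # ~ 1.0
```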

Recall that for a pair of random variables $X$ and $Y$, their covariance is defined as

\[ \mathrm{Cov}[X, Y] = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]. \]

When working with multiple variables, the covariance matrix provides a succinct way to summarize the covariances of all pairs of variables. In particular, the covariance matrix, which we usually denote as $\Sigma$, is the $n \times n$ matrix whose $(i, j)$th entry is $\mathrm{Cov}[X_i, X_j]$.

The following proposition (whose proof is provided in the Appendix) gives an alternative way to characterize the covariance matrix of a random vector $X$:

Proposition 1. For any random vector $X$ with mean $\mu$ and covariance matrix $\Sigma$,

\[ \Sigma = E[(X - \mu)(X - \mu)^T] = E[XX^T] - \mu\mu^T. \tag{1} \]

In the definition of multivariate Gaussians, we required that the covariance matrix $\Sigma$ be symmetric positive definite (i.e., $\Sigma \in \mathbb{S}^n_{++}$). Why does this restriction exist? As seen in the following proposition, the covariance matrix of any random vector must always be symmetric positive semidefinite:

Proposition 2. Suppose that $\Sigma$ is the covariance matrix corresponding to some random vector $X$. Then $\Sigma$ is symmetric positive semidefinite.
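Both characterizations in Equation (1) can be sanity-checked by Monte Carlo. In this sketch the sample size and the example $\mu$ and $\Sigma$ are arbitrary; the averaged outer products and the estimate of $E[XX^T] - \mu\mu^T$ should both approach $\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.6], [0.6, 1.0]])

X = rng.multivariate_normal(mu, sigma, size=200_000)
centered = np.einsum('ki,kj->ij', X - mu, X - mu) / len(X)   # E[(X-mu)(X-mu)^T]
uncentered = X.T @ X / len(X) - np.outer(mu, mu)             # E[X X^T] - mu mu^T

print(np.round(centered, 2))
print(np.round(uncentered, 2))   # both ~ Sigma
```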

Proof. The symmetry of $\Sigma$ follows immediately from its definition. Next, for any vector $z \in \mathbb{R}^n$, observe that

\[ z^T \Sigma z = \sum_{i=1}^{n} \sum_{j=1}^{n} (\Sigma_{ij} z_i z_j) \tag{2} \]
\[ = \sum_{i=1}^{n} \sum_{j=1}^{n} (\mathrm{Cov}[X_i, X_j] \cdot z_i z_j) \]
\[ = \sum_{i=1}^{n} \sum_{j=1}^{n} (E[(X_i - E[X_i])(X_j - E[X_j])] \cdot z_i z_j) \]
\[ = E\left[ \sum_{i=1}^{n} \sum_{j=1}^{n} (X_i - E[X_i])(X_j - E[X_j]) \cdot z_i z_j \right]. \tag{3} \]

Here, (2) follows from the formula for expanding a quadratic form (see section notes on linear algebra), and (3) follows by linearity of expectations (see probability notes).

To complete the proof, observe that the quantity inside the brackets is of the form $\sum_i \sum_j x_i x_j z_i z_j = (x^T z)^2 \geq 0$ (see problem set #1). Therefore, the quantity inside the expectation is always nonnegative, and hence the expectation itself must be nonnegative. We conclude that $z^T \Sigma z \geq 0$.

From the above proposition it follows that $\Sigma$ must be symmetric positive semidefinite in order for it to be a valid covariance matrix. However, in order for $\Sigma^{-1}$ to exist (as required in the definition of the multivariate Gaussian density), then $\Sigma$ must be invertible and hence full rank.
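The proposition applies to any random vector, not just Gaussians. As an illustration, the following sketch draws samples from an arbitrarily chosen Student-t distribution and checks that the resulting sample covariance matrix has nonnegative eigenvalues and a nonnegative quadratic form.

```python
import numpy as np

rng = np.random.default_rng(1)
# Samples from any distribution will do; a Student-t is chosen arbitrarily here.
data = rng.standard_t(df=5, size=(10_000, 4))
sigma = np.cov(data, rowvar=False)                # 4 x 4 sample covariance

print(np.linalg.eigvalsh(sigma))                  # all eigenvalues nonnegative
z = rng.normal(size=4)
print(z @ sigma @ z)                              # quadratic form z^T Sigma z >= 0
```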

Since any full rank symmetric positive semidefinite matrix is necessarily symmetric positive definite, it follows that $\Sigma$ must be symmetric positive definite.

3 The diagonal covariance matrix case

To get an intuition for what a multivariate Gaussian is, consider the simple case where $n = 2$, and where the covariance matrix $\Sigma$ is diagonal, i.e.,

\[ x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \qquad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} \qquad \Sigma = \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix} \]

In this case, the multivariate Gaussian density has the form,

\[ p(x; \mu, \Sigma) = \frac{1}{2\pi \left| \begin{smallmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{smallmatrix} \right|^{1/2}} \exp\left( -\frac{1}{2} \begin{bmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{bmatrix}^T \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix}^{-1} \begin{bmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{bmatrix} \right) \]
\[ = \frac{1}{2\pi (\sigma_1^2 \cdot \sigma_2^2 - 0 \cdot 0)^{1/2}} \exp\left( -\frac{1}{2} \begin{bmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{bmatrix}^T \begin{bmatrix} \frac{1}{\sigma_1^2} & 0 \\ 0 & \frac{1}{\sigma_2^2} \end{bmatrix} \begin{bmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{bmatrix} \right), \]

where we have relied on the explicit formula for the determinant of a $2 \times 2$ matrix [3], and the fact that the inverse of a diagonal matrix is simply found by taking the reciprocal of each diagonal entry. Continuing,

\[ p(x; \mu, \Sigma) = \frac{1}{2\pi\sigma_1\sigma_2} \exp\left( -\frac{1}{2} \begin{bmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{bmatrix}^T \begin{bmatrix} \frac{1}{\sigma_1^2}(x_1 - \mu_1) \\ \frac{1}{\sigma_2^2}(x_2 - \mu_2) \end{bmatrix} \right) \]
\[ = \frac{1}{2\pi\sigma_1\sigma_2} \exp\left( -\frac{1}{2\sigma_1^2}(x_1 - \mu_1)^2 - \frac{1}{2\sigma_2^2}(x_2 - \mu_2)^2 \right) \]
\[ = \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\left( -\frac{1}{2\sigma_1^2}(x_1 - \mu_1)^2 \right) \cdot \frac{1}{\sqrt{2\pi}\,\sigma_2} \exp\left( -\frac{1}{2\sigma_2^2}(x_2 - \mu_2)^2 \right). \]

The last equation we recognize to simply be the product of two independent Gaussian densities, one with mean $\mu_1$ and variance $\sigma_1^2$, and the other with mean $\mu_2$ and variance $\sigma_2^2$. More generally, one can show that an $n$-dimensional Gaussian with mean $\mu \in \mathbb{R}^n$ and diagonal covariance matrix $\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2)$ is the same as a collection of $n$ independent Gaussian random variables with mean $\mu_i$ and variance $\sigma_i^2$, respectively.

[3] Namely, $\left| \begin{smallmatrix} a & b \\ c & d \end{smallmatrix} \right| = ad - bc$.
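The factorization can be verified numerically at any point. In this sketch the means, standard deviations, and query point are arbitrary illustrative values; the joint density computed from the matrix form matches the product of the two univariate densities.

```python
import numpy as np

mu1, mu2 = 0.5, -1.0
s1, s2 = 1.5, 0.7                       # sigma_1, sigma_2 (standard deviations)

def uni(x, mu, s):
    """Univariate Gaussian density p(x; mu, sigma^2)."""
    return np.exp(-(x - mu) ** 2 / (2 * s ** 2)) / (np.sqrt(2 * np.pi) * s)

x = np.array([2.0, 0.3])
d = x - np.array([mu1, mu2])
sigma_inv = np.diag([1 / s1 ** 2, 1 / s2 ** 2])   # inverse of diag(s1^2, s2^2)
joint = np.exp(-0.5 * d @ sigma_inv @ d) / (2 * np.pi * s1 * s2)

print(joint)
print(uni(x[0], mu1, s1) * uni(x[1], mu2, s2))    # same value
```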

4 Isocontours

Another way to understand a multivariate Gaussian conceptually is to understand the shape of its isocontours. For a function $f : \mathbb{R}^2 \to \mathbb{R}$, an isocontour is a set of the form

\[ \{ x \in \mathbb{R}^2 : f(x) = c \} \]

for some $c \in \mathbb{R}$ [4].

[4] Isocontours are often also known as level curves. More generally, a level set of a function $f : \mathbb{R}^n \to \mathbb{R}$ is a set of the form $\{ x \in \mathbb{R}^n : f(x) = c \}$ for some $c \in \mathbb{R}$.

4.1 Shape of isocontours

What do the isocontours of a multivariate Gaussian look like? As before, let's consider the case where $n = 2$, and $\Sigma$ is diagonal, i.e.,

\[ x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \qquad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} \qquad \Sigma = \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix} \]

As we showed in the last section,

\[ p(x; \mu, \Sigma) = \frac{1}{2\pi\sigma_1\sigma_2} \exp\left( -\frac{1}{2\sigma_1^2}(x_1 - \mu_1)^2 - \frac{1}{2\sigma_2^2}(x_2 - \mu_2)^2 \right). \tag{4} \]

Now, let's consider the level set consisting of all points where $p(x; \mu, \Sigma) = c$ for some constant $c \in \mathbb{R}$. In particular, consider the set of all $x_1, x_2 \in \mathbb{R}$ such that

\[ c = \frac{1}{2\pi\sigma_1\sigma_2} \exp\left( -\frac{1}{2\sigma_1^2}(x_1 - \mu_1)^2 - \frac{1}{2\sigma_2^2}(x_2 - \mu_2)^2 \right) \]
\[ 2\pi c \sigma_1 \sigma_2 = \exp\left( -\frac{1}{2\sigma_1^2}(x_1 - \mu_1)^2 - \frac{1}{2\sigma_2^2}(x_2 - \mu_2)^2 \right) \]
\[ \log(2\pi c \sigma_1 \sigma_2) = -\frac{1}{2\sigma_1^2}(x_1 - \mu_1)^2 - \frac{1}{2\sigma_2^2}(x_2 - \mu_2)^2 \]
\[ \log\left( \frac{1}{2\pi c \sigma_1 \sigma_2} \right) = \frac{1}{2\sigma_1^2}(x_1 - \mu_1)^2 + \frac{1}{2\sigma_2^2}(x_2 - \mu_2)^2 \]
\[ 1 = \frac{(x_1 - \mu_1)^2}{2\sigma_1^2 \log\left( \frac{1}{2\pi c \sigma_1 \sigma_2} \right)} + \frac{(x_2 - \mu_2)^2}{2\sigma_2^2 \log\left( \frac{1}{2\pi c \sigma_1 \sigma_2} \right)}. \]

Defining

\[ r_1 = \sqrt{2\sigma_1^2 \log\left( \frac{1}{2\pi c \sigma_1 \sigma_2} \right)} \qquad r_2 = \sqrt{2\sigma_2^2 \log\left( \frac{1}{2\pi c \sigma_1 \sigma_2} \right)}, \]

it follows that

\[ 1 = \left( \frac{x_1 - \mu_1}{r_1} \right)^2 + \left( \frac{x_2 - \mu_2}{r_2} \right)^2. \tag{5} \]

Equation (5) should be familiar to you from high school analytic geometry: it is the equation of an axis-aligned ellipse, with center $(\mu_1, \mu_2)$, where the $x_1$ axis has length $2r_1$ and the $x_2$ axis has length $2r_2$!
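One way to convince yourself of Equation (5) is to check that the axis endpoints $(\mu_1 \pm r_1, \mu_2)$ and $(\mu_1, \mu_2 \pm r_2)$ land exactly on the chosen level set. The sketch below uses the standard deviations $\sigma_1 = 5$, $\sigma_2 = 3$ from Figure 2 and an arbitrary level $c$ below the peak.

```python
import numpy as np

mu1, mu2 = 3.0, 2.0
s1, s2 = 5.0, 3.0                       # Figure 2's axis-aligned example
peak = 1.0 / (2 * np.pi * s1 * s2)
c = 0.3 * peak                          # an arbitrary level below the peak

r1 = np.sqrt(2 * s1 ** 2 * np.log(1 / (2 * np.pi * c * s1 * s2)))
r2 = np.sqrt(2 * s2 ** 2 * np.log(1 / (2 * np.pi * c * s1 * s2)))

def p(x1, x2):
    """The diagonal-covariance density of Equation (4)."""
    return peak * np.exp(-(x1 - mu1) ** 2 / (2 * s1 ** 2)
                         - (x2 - mu2) ** 2 / (2 * s2 ** 2))

# Axis endpoints of the ellipse sit exactly on the level set p = c.
print(p(mu1 + r1, mu2), p(mu1, mu2 - r2), c)   # all three agree
```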

4.2 Length of axes

To get a better understanding of how the shape of the level curves varies as a function of the variances of the multivariate Gaussian distribution, suppose that we are interested in the values of $r_1$ and $r_2$ at which $c$ is equal to a fraction $1/e$ of the peak height of the Gaussian density.

[Figure 2: The figure on the left shows a heatmap indicating values of the density function for an axis-aligned multivariate Gaussian with mean $\mu = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$ and diagonal covariance matrix $\Sigma = \begin{bmatrix} 25 & 0 \\ 0 & 9 \end{bmatrix}$. Notice that the Gaussian is centered at $(3, 2)$, and that the isocontours are all elliptically shaped with major/minor axis lengths in a 5:3 ratio. The figure on the right shows a heatmap indicating values of the density function for a non axis-aligned multivariate Gaussian with mean $\mu = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$ and covariance matrix $\Sigma = \begin{bmatrix} 10 & 5 \\ 5 & 5 \end{bmatrix}$. Here, the ellipses are again centered at $(3, 2)$, but now the major and minor axes have been rotated via a linear transformation.]

First, observe that the maximum of Equation (4) occurs where $x_1 = \mu_1$ and $x_2 = \mu_2$. Substituting these values into Equation (4), we see that the peak height of the Gaussian density is $\frac{1}{2\pi\sigma_1\sigma_2}$.

Second, we substitute $c = \frac{1}{e} \left( \frac{1}{2\pi\sigma_1\sigma_2} \right)$ into the equations for $r_1$ and $r_2$ to obtain

\[ r_1 = \sqrt{2\sigma_1^2 \log\left( \frac{1}{2\pi\sigma_1\sigma_2 \cdot \frac{1}{e} \frac{1}{2\pi\sigma_1\sigma_2}} \right)} = \sigma_1 \sqrt{2} \]
\[ r_2 = \sqrt{2\sigma_2^2 \log\left( \frac{1}{2\pi\sigma_1\sigma_2 \cdot \frac{1}{e} \frac{1}{2\pi\sigma_1\sigma_2}} \right)} = \sigma_2 \sqrt{2}. \]

From this, it follows that the axis length needed to reach a fraction $1/e$ of the peak height of the Gaussian density in the $i$th dimension grows in proportion to the standard deviation $\sigma_i$. Intuitively, this again makes sense: the smaller the variance of some random variable $x_i$, the more tightly peaked the Gaussian distribution in that dimension, and hence the smaller the radius $r_i$.
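The algebra collapses because $2\pi\sigma_1\sigma_2 \cdot \frac{1}{e} \cdot \frac{1}{2\pi\sigma_1\sigma_2} = 1/e$, so the logarithm evaluates to 1. A short numerical confirmation, using the Figure 2 standard deviations:

```python
import numpy as np

s1, s2 = 5.0, 3.0                       # sigma_1, sigma_2 from Figure 2's left panel
peak = 1.0 / (2 * np.pi * s1 * s2)
c = peak / np.e                         # a fraction 1/e of the peak height

# Plugging this c into the definitions of r1 and r2 from Section 4.1:
r1 = np.sqrt(2 * s1 ** 2 * np.log(1 / (2 * np.pi * c * s1 * s2)))
r2 = np.sqrt(2 * s2 ** 2 * np.log(1 / (2 * np.pi * c * s1 * s2)))
print(r1, s1 * np.sqrt(2))              # both 7.07... = sigma_1 * sqrt(2)
print(r2, s2 * np.sqrt(2))              # both 4.24... = sigma_2 * sqrt(2)
```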

4.3 Non-diagonal case, higher dimensions

Clearly, the above derivations rely on the assumption that $\Sigma$ is a diagonal matrix. However, in the non-diagonal case, it turns out that the picture is not all that different. Instead of being an axis-aligned ellipse, the isocontours turn out to be simply rotated ellipses. Furthermore, in the $n$-dimensional case, the level sets form geometrical structures known as ellipsoids in $\mathbb{R}^n$.
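One standard way to make the rotation concrete, though it is not derived in these notes, is an eigendecomposition of $\Sigma$: the eigenvectors give the directions of the ellipse axes, and the square roots of the eigenvalues give the standard deviations along those axes. A sketch using the non-diagonal $\Sigma$ from Figure 2:

```python
import numpy as np

sigma = np.array([[10.0, 5.0], [5.0, 5.0]])   # non-diagonal Sigma from Figure 2
eigvals, eigvecs = np.linalg.eigh(sigma)

print(eigvecs)              # columns: directions of the (rotated) ellipse axes
print(np.sqrt(eigvals))     # standard deviations along those axes
```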

5 Linear transformation interpretation

In the last few sections, we focused primarily on providing an intuition for how multivariate Gaussians with diagonal covariance matrices behaved. In particular, we found that an $n$-dimensional multivariate Gaussian with diagonal covariance matrix could be viewed simply as a collection of $n$ independent Gaussian-distributed random variables with means and variances $\mu_i$ and $\sigma_i^2$, respectively. In this section, we dig a little deeper and provide a quantitative interpretation of multivariate Gaussians when the covariance matrix is not diagonal.

The key result of this section is the following theorem (see proof in the Appendix).

Theorem 1. Let $X \sim \mathcal{N}(\mu, \Sigma)$ for some $\mu \in \mathbb{R}^n$ and $\Sigma \in \mathbb{S}^n_{++}$. Then, there exists a matrix $B \in \mathbb{R}^{n \times n}$ such that if we define $Z = B^{-1}(X - \mu)$, then $Z \sim \mathcal{N}(0, I)$.

To understand the meaning of this theorem, note that if $Z \sim \mathcal{N}(0, I)$, then using the analysis from Section 3, $Z$ can be thought of as a collection of $n$ independent standard normal random variables (i.e., $Z_i \sim \mathcal{N}(0, 1)$). Furthermore, if $Z = B^{-1}(X - \mu)$ then $X = BZ + \mu$ follows from simple algebra.

Consequently, the theorem states that any random variable $X$ with a multivariate Gaussian distribution can be interpreted as the result of applying a linear transformation ($X = BZ + \mu$) to some collection of $n$ independent standard normal random variables ($Z$).

Appendix A.1

Proof. We prove the first of the two equalities in (1); the proof of the other equality is similar.

\[ \Sigma = \begin{bmatrix} \mathrm{Cov}[X_1, X_1] & \cdots & \mathrm{Cov}[X_1, X_n] \\ \vdots & \ddots & \vdots \\ \mathrm{Cov}[X_n, X_1] & \cdots & \mathrm{Cov}[X_n, X_n] \end{bmatrix} \]
\[ = \begin{bmatrix} E[(X_1 - \mu_1)^2] & \cdots & E[(X_1 - \mu_1)(X_n - \mu_n)] \\ \vdots & \ddots & \vdots \\ E[(X_n - \mu_n)(X_1 - \mu_1)] & \cdots & E[(X_n - \mu_n)^2] \end{bmatrix} \]
\[ = E \begin{bmatrix} (X_1 - \mu_1)^2 & \cdots & (X_1 - \mu_1)(X_n - \mu_n) \\ \vdots & \ddots & \vdots \\ (X_n - \mu_n)(X_1 - \mu_1) & \cdots & (X_n - \mu_n)^2 \end{bmatrix} \tag{6} \]
\[ = E \left[ \begin{bmatrix} X_1 - \mu_1 \\ \vdots \\ X_n - \mu_n \end{bmatrix} \begin{bmatrix} X_1 - \mu_1 & \cdots & X_n - \mu_n \end{bmatrix} \right] \tag{7} \]
\[ = E\left[ (X - \mu)(X - \mu)^T \right]. \]

Here, (6) follows from the fact that the expectation of a matrix is simply the matrix found by taking the componentwise expectation of each entry.
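Finally, returning to Theorem 1: the theorem only asserts that a suitable $B$ exists. One common concrete choice, assumed here purely for illustration, is the Cholesky factor $\Sigma = BB^T$, which gives $\mathrm{Cov}[Z] = B^{-1}\Sigma B^{-T} = I$:

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([3.0, 2.0])
sigma = np.array([[10.0, 5.0], [5.0, 5.0]])

B = np.linalg.cholesky(sigma)                 # Sigma = B B^T (lower triangular)
X = rng.multivariate_normal(mu, sigma, size=100_000)
Z = np.linalg.solve(B, (X - mu).T).T          # Z = B^{-1}(X - mu), row by row

print(np.round(Z.mean(axis=0), 2))            # ~ [0, 0]
print(np.round(np.cov(Z, rowvar=False), 2))   # ~ identity matrix
```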

