
More on Multivariate Gaussians
Chuong B. Do
November 21, 2008

Up to this point in class, you have seen multivariate Gaussians arise in a number of applications, such as the probabilistic interpretation of linear regression, Gaussian discriminant analysis, mixture of Gaussians clustering, and most recently, factor analysis. In these lecture notes, we attempt to demystify some of the fancier properties of multivariate Gaussians that were introduced in the recent factor analysis lecture. The goal of these notes is to give you some intuition into where these properties come from, so that you can use them with confidence on your homework (hint hint!) and beyond.

1 Definition

A vector-valued random variable $x \in \mathbb{R}^n$ is said to have a multivariate normal (or Gaussian) distribution with mean $\mu \in \mathbb{R}^n$ and covariance matrix $\Sigma \in S_{++}^n$ [Footnote 1: Recall from the section notes on linear algebra that $S_{++}^n$ is the space of symmetric positive definite $n \times n$ matrices, defined as $S_{++}^n = \{A \in \mathbb{R}^{n \times n} : A = A^T \text{ and } x^T A x > 0 \text{ for all } x \in \mathbb{R}^n \text{ such that } x \neq 0\}$.] if its probability density function is given by

$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right).$$

We write this as $x \sim \mathcal{N}(\mu, \Sigma)$.

2 Gaussian facts

Multivariate Gaussians turn out to be extremely handy in practice due to the following facts:

- Fact #1: If you know the mean $\mu$ and covariance matrix $\Sigma$ of a Gaussian random variable $x$, you can write down the probability density function for $x$ directly.
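
As a quick sanity check on Fact #1 and the density formula above, here is a minimal numerical sketch. It assumes NumPy and SciPy are available; the variable names and the particular $\mu$ and $\Sigma$ are our own toy choices, not part of the notes.

    import numpy as np
    from scipy.stats import multivariate_normal

    def gaussian_density(x, mu, sigma):
        # Evaluate p(x; mu, sigma) exactly as written in the definition above.
        n = mu.shape[0]
        diff = x - mu
        norm_const = 1.0 / ((2 * np.pi) ** (n / 2) * np.linalg.det(sigma) ** 0.5)
        return norm_const * np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff))

    mu = np.array([1.0, -2.0])
    sigma = np.array([[2.0, 0.5],
                      [0.5, 1.0]])  # symmetric positive definite, so sigma is in S^2_++
    x = np.array([0.3, -1.5])

    print(gaussian_density(x, mu, sigma))         # direct evaluation of the formula
    print(multivariate_normal(mu, sigma).pdf(x))  # SciPy's reference implementation

Using np.linalg.solve rather than explicitly forming $\Sigma^{-1}$ is a standard numerical nicety; both give the same quadratic form in the exponent.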

- Fact #2: The following Gaussian integrals have closed-form solutions:

$$\int_{x \in \mathbb{R}^n} p(x; \mu, \Sigma)\, dx = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} p(x; \mu, \Sigma)\, dx_1 \cdots dx_n = 1$$

$$\int_{x \in \mathbb{R}^n} x_i\, p(x; \mu, \Sigma)\, dx = \mu_i$$

$$\int_{x \in \mathbb{R}^n} (x_i - \mu_i)(x_j - \mu_j)\, p(x; \mu, \Sigma)\, dx = \Sigma_{ij}.$$

- Fact #3: Gaussians obey a number of closure properties:
  - The sum of independent Gaussian random variables is Gaussian.
  - The marginal of a joint Gaussian distribution is Gaussian.
  - The conditional of a joint Gaussian distribution is Gaussian.

At first glance, some of these facts, in particular facts #1 and #2, may seem either intuitively obvious or at least plausible. What is probably not so clear, however, is why these facts are so powerful. In this document, we'll provide some intuition for how these facts can be used when performing day-to-day manipulations dealing with multivariate Gaussian random variables.

3 Closure properties

In this section, we'll go through each of the closure properties described earlier, and we'll either prove the property using facts #1 and #2, or we'll at least give some type of intuition as to why the property is true.

The following is a quick roadmap of what we'll cover:

                                 sums   marginals   conditionals
    why is it Gaussian?          no     yes         yes
    resulting density function   yes    yes         yes
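
Fact #2 can also be checked empirically by Monte Carlo: sampling from $\mathcal{N}(\mu, \Sigma)$ and averaging approximates the second and third integrals above. A minimal sketch, again assuming NumPy and using our own toy parameters:

    import numpy as np

    rng = np.random.default_rng(0)
    mu = np.array([1.0, -2.0])
    sigma = np.array([[2.0, 0.5],
                      [0.5, 1.0]])

    # Monte Carlo analogues of the integrals in Fact #2: the sample mean
    # approaches mu, and the sample covariance approaches sigma.
    samples = rng.multivariate_normal(mu, sigma, size=200_000)
    print(samples.mean(axis=0))           # approximately mu
    print(np.cov(samples, rowvar=False))  # approximately sigma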

3.1 Sum of independent Gaussians is Gaussian

The formal statement of this rule is:

Suppose that $y \sim \mathcal{N}(\mu, \Sigma)$ and $z \sim \mathcal{N}(\mu', \Sigma')$ are independent Gaussian distributed random variables, where $\mu, \mu' \in \mathbb{R}^n$ and $\Sigma, \Sigma' \in S_{++}^n$. Then, their sum is also Gaussian:

$$y + z \sim \mathcal{N}(\mu + \mu', \Sigma + \Sigma').$$

Before we prove anything, here are some observations:

1. The first thing to point out is the importance of the independence assumption in the above rule. To see why this matters, suppose that $y \sim \mathcal{N}(\mu, \Sigma)$ for some mean vector $\mu$ and covariance matrix $\Sigma$, and suppose that $z = -y$. Clearly, $z$ also has a Gaussian distribution (in fact, $z \sim \mathcal{N}(-\mu, \Sigma)$), but $y + z$ is identically zero!

2. The second thing to point out is a point of confusion for many students: if we add together two Gaussian densities ("bumps" in multidimensional space), wouldn't we get back some bimodal (i.e., "two-humped") density? Here, the thing to realize is that the density of the random variable $y + z$ in this rule is NOT found by simply adding the densities of the individual random variables $y$ and $z$.
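
Both the rule and observation 1 are easy to see numerically. The sketch below (our own toy example, assuming NumPy) estimates the mean and covariance of $y + z$ from independent samples, and then shows the dependent counterexample $z = -y$:

    import numpy as np

    rng = np.random.default_rng(1)
    mu, mu_p = np.array([1.0, 0.0]), np.array([-1.0, 2.0])
    sigma = np.array([[1.0, 0.3],
                      [0.3, 2.0]])
    sigma_p = np.array([[0.5, 0.0],
                        [0.0, 0.5]])

    # Independent draws: the sum should be Gaussian with the stated parameters.
    y = rng.multivariate_normal(mu, sigma, size=200_000)
    z = rng.multivariate_normal(mu_p, sigma_p, size=200_000)
    print((y + z).mean(axis=0))         # approximately mu + mu_p
    print(np.cov(y + z, rowvar=False))  # approximately sigma + sigma_p

    # Observation 1: z = -y is also Gaussian, but y + z is identically zero,
    # so the rule fails without independence.
    print(np.abs(y + (-y)).max())       # exactly 0.0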

Rather, the density of $y + z$ will actually turn out to be a convolution of the densities for $y$ and $z$. [Footnote 2: For example, if $y$ and $z$ were univariate Gaussians (i.e., $y \sim \mathcal{N}(\mu, \sigma^2)$, $z \sim \mathcal{N}(\mu', \sigma'^2)$), then the convolution of their probability densities is given by

$$p(y + z; \mu, \mu', \sigma^2, \sigma'^2) = \int_{-\infty}^{\infty} p(w; \mu, \sigma^2)\, p(y + z - w; \mu', \sigma'^2)\, dw = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2\sigma^2}(w - \mu)^2\right) \cdot \frac{1}{\sqrt{2\pi}\,\sigma'} \exp\left(-\frac{1}{2\sigma'^2}(y + z - w - \mu')^2\right) dw.$$]

To show that the convolution of two Gaussian densities gives a Gaussian density, however, is beyond the scope of this document.

Instead, let's just use the observation that the convolution does give some type of Gaussian density, along with Fact #1, to figure out what the density $p(y + z \mid \mu, \Sigma)$ would be, if we were to actually compute the convolution. How can we do this? Recall that from Fact #1, a Gaussian distribution is fully specified by its mean vector and covariance matrix. If we can determine what these are, then we're done.

But this is easy! For the mean, we have

$$E[y_i + z_i] = E[y_i] + E[z_i] = \mu_i + \mu'_i$$

from linearity of expectations. Therefore, the mean of $y + z$ is simply $\mu + \mu'$. Also, the $(i, j)$th entry of the covariance matrix is given by

$$\begin{aligned}
E[(y_i + z_i)(y_j + z_j)] - E[y_i + z_i]E[y_j + z_j]
&= E[y_i y_j + z_i y_j + y_i z_j + z_i z_j] - (E[y_i] + E[z_i])(E[y_j] + E[z_j]) \\
&= E[y_i y_j] + E[z_i y_j] + E[y_i z_j] + E[z_i z_j] - E[y_i]E[y_j] - E[z_i]E[y_j] - E[y_i]E[z_j] - E[z_i]E[z_j] \\
&= (E[y_i y_j] - E[y_i]E[y_j]) + (E[z_i z_j] - E[z_i]E[z_j]) + (E[z_i y_j] - E[z_i]E[y_j]) + (E[y_i z_j] - E[y_i]E[z_j]).
\end{aligned}$$

Using the fact that $y$ and $z$ are independent, we have $E[z_i y_j] = E[z_i]E[y_j]$ and $E[y_i z_j] = E[y_i]E[z_j]$. Therefore, the last two terms drop out, and we are left with

$$E[(y_i + z_i)(y_j + z_j)] - E[y_i + z_i]E[y_j + z_j] = (E[y_i y_j] - E[y_i]E[y_j]) + (E[z_i z_j] - E[z_i]E[z_j]) = \Sigma_{ij} + \Sigma'_{ij}.$$

From this, we can conclude that the covariance matrix of $y + z$ is simply $\Sigma + \Sigma'$.

At this point, take a step back and think about what we have just done. Using some simple properties of expectations and independence, we have computed the mean and covariance matrix of $y + z$. Because of Fact #1, we can thus write down the density for $y + z$ immediately, without the need to perform a convolution! [Footnote 3: Of course, we needed to know that $y + z$ had a Gaussian distribution in the first place.]
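
For intuition, the univariate convolution in footnote 2 can also be evaluated numerically and compared against the density the rule predicts. A small sketch, assuming SciPy and a simple Riemann-sum quadrature of our own choosing:

    import numpy as np
    from scipy.stats import norm

    mu, var = 1.0, 0.5       # parameters of y
    mu_p, var_p = -0.5, 1.5  # parameters of z
    t = 2.0                  # the point y + z = t at which we evaluate the density

    # Riemann-sum approximation of the convolution integral from footnote 2.
    w = np.linspace(-20.0, 20.0, 8001)
    integrand = norm.pdf(w, mu, np.sqrt(var)) * norm.pdf(t - w, mu_p, np.sqrt(var_p))
    conv = integrand.sum() * (w[1] - w[0])

    print(conv)                                          # numerical convolution
    print(norm.pdf(t, mu + mu_p, np.sqrt(var + var_p)))  # N(mu+mu', var+var') prediction

A finer grid only shrinks the quadrature error; the two numbers agree because the convolution of Gaussian densities is itself Gaussian.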

3.2 Marginal of a joint Gaussian is Gaussian

The formal statement of this rule is:

Suppose that

$$\begin{bmatrix} x_A \\ x_B \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \mu_A \\ \mu_B \end{bmatrix}, \begin{bmatrix} \Sigma_{AA} & \Sigma_{AB} \\ \Sigma_{BA} & \Sigma_{BB} \end{bmatrix} \right),$$

where $x_A \in \mathbb{R}^m$, $x_B \in \mathbb{R}^n$, and the dimensions of the mean vectors and covariance matrix subblocks are chosen to match $x_A$ and $x_B$. Then, the marginal densities,

$$p(x_A) = \int_{x_B \in \mathbb{R}^n} p(x_A, x_B; \mu, \Sigma)\, dx_B$$

$$p(x_B) = \int_{x_A \in \mathbb{R}^m} p(x_A, x_B; \mu, \Sigma)\, dx_A,$$

are Gaussian:

$$x_A \sim \mathcal{N}(\mu_A, \Sigma_{AA})$$

$$x_B \sim \mathcal{N}(\mu_B, \Sigma_{BB}).$$

To justify this rule, let's just focus on the marginal distribution with respect to the variables $x_A$. [Footnote 4: In general, for a random vector $x$ which has a Gaussian distribution, we can always permute the entries of $x$, so long as we permute the entries of the mean vector and the rows/columns of the covariance matrix in the corresponding way. As a result, it suffices to look only at $x_A$, and the result for $x_B$ follows.]
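
Before working through the justification, here is a quick empirical illustration of the rule (our own toy example, assuming NumPy): marginalizing a joint Gaussian sample is just dropping the $x_B$ coordinates, and the resulting statistics match the $\mu_A$ and $\Sigma_{AA}$ subblocks.

    import numpy as np

    rng = np.random.default_rng(2)
    # Joint Gaussian over (x_A, x_B) with m = 2 and n = 1; x_A is the first block.
    mu = np.array([0.0, 1.0, -1.0])
    sigma = np.array([[2.0, 0.6, 0.2],
                      [0.6, 1.0, 0.3],
                      [0.2, 0.3, 1.5]])

    samples = rng.multivariate_normal(mu, sigma, size=200_000)
    x_a = samples[:, :2]              # marginalize empirically: just drop x_B
    print(x_a.mean(axis=0))           # approximately mu_A = mu[:2]
    print(np.cov(x_a, rowvar=False))  # approximately sigma_AA = sigma[:2, :2]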

First, note that computing the mean and covariance matrix for a marginal distribution is easy: simply take the corresponding subblocks from the mean and covariance matrix of the joint density. To make sure this is absolutely clear, let's look at the covariance between $x_{A,i}$ and $x_{A,j}$ (the $i$th component of $x_A$ and the $j$th component of $x_A$). Note that $x_{A,i}$ and $x_{A,j}$ are also the $i$th and $j$th components of $\begin{bmatrix} x_A \\ x_B \end{bmatrix}$ (since $x_A$ appears at the top of this vector). To find their covariance, we need to simply look at the $(i, j)$th element of the covariance matrix,

$$\begin{bmatrix} \Sigma_{AA} & \Sigma_{AB} \\ \Sigma_{BA} & \Sigma_{BB} \end{bmatrix}.$$

The $(i, j)$th element is found in the $\Sigma_{AA}$ subblock, and in fact, is precisely $\Sigma_{AA,ij}$. Using this argument for all $i, j \in \{1, \ldots, m\}$, we see that the covariance matrix for $x_A$ is simply $\Sigma_{AA}$. A similar argument can be used to find that the mean of $x_A$ is simply $\mu_A$. Thus, the above argument tells us that if we knew that the marginal distribution over $x_A$ is Gaussian, then we could immediately write down a density function for $x_A$ in terms of the appropriate submatrices of the mean and covariance matrices for the joint density!

The above argument, though simple, is somewhat unsatisfying: how can we actually be sure that $x_A$ has a multivariate Gaussian distribution? The argument for this is slightly long-winded, so rather than saving up the punchline, here's our plan of attack up front:

1. Write the integral form of the marginal density explicitly.
2. Rewrite the integral by partitioning the inverse covariance matrix.
3. Use a "completion-of-squares" argument to evaluate the integral over $x_B$.
4. Argue that the resulting density is Gaussian.

Let's see each of these steps in action.

3.2.1 The marginal density in integral form

Suppose that we wanted to compute the density function of $x_A$ directly.

Then, we would need to compute the integral,

$$p(x_A) = \int_{x_B \in \mathbb{R}^n} p(x_A, x_B; \mu, \Sigma)\, dx_B = \frac{1}{(2\pi)^{(m+n)/2} \left| \begin{matrix} \Sigma_{AA} & \Sigma_{AB} \\ \Sigma_{BA} & \Sigma_{BB} \end{matrix} \right|^{1/2}} \int_{x_B \in \mathbb{R}^n} \exp\left( -\frac{1}{2} \begin{bmatrix} x_A - \mu_A \\ x_B - \mu_B \end{bmatrix}^T \begin{bmatrix} \Sigma_{AA} & \Sigma_{AB} \\ \Sigma_{BA} & \Sigma_{BB} \end{bmatrix}^{-1} \begin{bmatrix} x_A - \mu_A \\ x_B - \mu_B \end{bmatrix} \right) dx_B.$$

3.2.2 Partitioning the inverse covariance matrix

To make any sort of progress, we'll need to write the matrix product in the exponent in a slightly different form. In particular, let us define the matrix $V \in \mathbb{R}^{(m+n) \times (m+n)}$ as

$$V = \begin{bmatrix} V_{AA} & V_{AB} \\ V_{BA} & V_{BB} \end{bmatrix} = \Sigma^{-1}.$$

Sometimes, $V$ is called the "precision" matrix. [Footnote 5: It might be tempting to think that

$$V = \begin{bmatrix} V_{AA} & V_{AB} \\ V_{BA} & V_{BB} \end{bmatrix} = \begin{bmatrix} \Sigma_{AA} & \Sigma_{AB} \\ \Sigma_{BA} & \Sigma_{BB} \end{bmatrix}^{-1} = \begin{bmatrix} \Sigma_{AA}^{-1} & \Sigma_{AB}^{-1} \\ \Sigma_{BA}^{-1} & \Sigma_{BB}^{-1} \end{bmatrix}.$$

However, the rightmost equality does not hold! We'll return to this issue in a later step; for now, though, it suffices to define $V$ as above without worrying about the actual contents of each submatrix.]

Using this definition of $V$, the integral expands to

$$p(x_A) = \frac{1}{Z} \int_{x_B \in \mathbb{R}^n} \exp\left( -\left[ \frac{1}{2}(x_A - \mu_A)^T V_{AA} (x_A - \mu_A) + \frac{1}{2}(x_A - \mu_A)^T V_{AB} (x_B - \mu_B) + \frac{1}{2}(x_B - \mu_B)^T V_{BA} (x_A - \mu_A) + \frac{1}{2}(x_B - \mu_B)^T V_{BB} (x_B - \mu_B) \right] \right) dx_B,$$

where $Z$ is some constant not depending on either $x_A$ or $x_B$ that we'll choose to ignore for the moment.
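
The warning in footnote 5 is easy to verify numerically. In the sketch below (assuming NumPy, reusing the toy covariance from the marginalization check above), the $(A, A)$ subblock of $\Sigma^{-1}$ differs from $\Sigma_{AA}^{-1}$:

    import numpy as np

    sigma = np.array([[2.0, 0.6, 0.2],
                      [0.6, 1.0, 0.3],
                      [0.2, 0.3, 1.5]])
    m = 2
    V = np.linalg.inv(sigma)  # the precision matrix

    # The subblock of the inverse ...
    print(V[:m, :m])
    # ... is NOT the inverse of the subblock, exactly as footnote 5 warns.
    print(np.linalg.inv(sigma[:m, :m]))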

If you haven't worked with partitioned matrices before, then the expansion above may seem a little magical to you. It is analogous to the idea that when defining a quadratic form based on some $2 \times 2$ matrix $A$, then

$$x^T A x = \sum_i \sum_j A_{ij} x_i x_j = x_1 A_{11} x_1 + x_1 A_{12} x_2 + x_2 A_{21} x_1 + x_2 A_{22} x_2.$$

Take some time to convince yourself that the matrix generalization above also holds.
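
If you'd like a concrete check of that generalization, the following sketch (our own, assuming NumPy) compares the direct quadratic form against the four-block expansion used in the integral above:

    import numpy as np

    rng = np.random.default_rng(3)
    m, n = 2, 3
    M = rng.standard_normal((m + n, m + n))
    V = M @ M.T + (m + n) * np.eye(m + n)  # a symmetric positive definite test matrix

    a = rng.standard_normal(m)             # plays the role of x_A - mu_A
    b = rng.standard_normal(n)             # plays the role of x_B - mu_B
    x = np.concatenate([a, b])

    V_AA, V_AB = V[:m, :m], V[:m, m:]
    V_BA, V_BB = V[m:, :m], V[m:, m:]

    direct = x @ V @ x
    blockwise = a @ V_AA @ a + a @ V_AB @ b + b @ V_BA @ a + b @ V_BB @ b
    print(direct, blockwise)               # equal up to floating-point error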

3.2.3 Integrating out $x_B$

To evaluate the integral, we'll somehow want to integrate out $x_B$. In general, however, Gaussian integrals are hard to compute by hand. Is there anything we can do to save time? There are, in fact, a number of Gaussian integrals for which the answer is already known (see Fact #2). The basic idea in this section, then, will be to transform the integral we had in the last section into a form where we can apply one of the results from Fact #2 in order to perform the required integration easily.

The key to this is a mathematical trick known as "completion of squares." Consider the quadratic function $\frac{1}{2} z^T A z + b^T z + c$, where $A$ is a symmetric, nonsingular matrix. Then, one can verify directly that

$$\frac{1}{2} z^T A z + b^T z + c = \frac{1}{2}\left(z + A^{-1} b\right)^T A \left(z + A^{-1} b\right) + c - \frac{1}{2} b^T A^{-1} b.$$

This is the multivariate generalization of the "completion of squares" argument used in single-variable algebra:

$$\frac{1}{2} a z^2 + b z + c = \frac{1}{2} a \left( z + \frac{b}{a} \right)^2 + c - \frac{b^2}{2a}.$$

To apply the completion of squares in our situation above, let

$$\begin{aligned}
z &= x_B - \mu_B \\
A &= V_{BB} \\
b &= V_{BA}(x_A - \mu_A) \\
c &= \frac{1}{2}(x_A - \mu_A)^T V_{AA} (x_A - \mu_A).
\end{aligned}$$

Then, it follows that the integral can be rewritten as

$$p(x_A) = \frac{1}{Z} \int_{x_B \in \mathbb{R}^n} \exp\left( -\left[ \frac{1}{2}\left(x_B - \mu_B + V_{BB}^{-1} V_{BA}(x_A - \mu_A)\right)^T V_{BB} \left(x_B - \mu_B + V_{BB}^{-1} V_{BA}(x_A - \mu_A)\right) + \frac{1}{2}(x_A - \mu_A)^T V_{AA}(x_A - \mu_A) - \frac{1}{2}(x_A - \mu_A)^T V_{AB} V_{BB}^{-1} V_{BA} (x_A - \mu_A) \right] \right) dx_B.$$

We can factor out the terms not including $x_B$ to obtain,

$$p(x_A) = \exp\left( -\frac{1}{2}(x_A - \mu_A)^T V_{AA}(x_A - \mu_A) + \frac{1}{2}(x_A - \mu_A)^T V_{AB} V_{BB}^{-1} V_{BA}(x_A - \mu_A) \right) \cdot \frac{1}{Z} \int_{x_B \in \mathbb{R}^n} \exp\left( -\frac{1}{2}\left(x_B - \mu_B + V_{BB}^{-1} V_{BA}(x_A - \mu_A)\right)^T V_{BB} \left(x_B - \mu_B + V_{BB}^{-1} V_{BA}(x_A - \mu_A)\right) \right) dx_B.$$

At this point, we can now apply Fact #2. In particular, we know that generically speaking, for a multivariate Gaussian distributed random variable $x$ with mean $\mu$ and covariance matrix $\Sigma$, the density function normalizes, i.e.,

$$\frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \int_{\mathbb{R}^n} \exp\left( -\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu) \right) dx = 1,$$

or equivalently,

$$\int_{\mathbb{R}^n} \exp\left( -\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu) \right) dx = (2\pi)^{n/2} |\Sigma|^{1/2}.$$

We use this fact to get rid of the remaining integral in our expression for $p(x_A)$:

$$p(x_A) = \frac{1}{Z} \cdot (2\pi)^{n/2} |V_{BB}|^{-1/2} \cdot \exp\left( -\frac{1}{2}(x_A - \mu_A)^T \left(V_{AA} - V_{AB} V_{BB}^{-1} V_{BA}\right)(x_A - \mu_A) \right).$$
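
To see where this expression is heading, here is a final numerical check (our own sketch, assuming NumPy, with the same toy covariance as before): the matrix $V_{AA} - V_{AB} V_{BB}^{-1} V_{BA}$ in the exponent coincides with $\Sigma_{AA}^{-1}$, consistent with the claim that $x_A \sim \mathcal{N}(\mu_A, \Sigma_{AA})$, and resolving the puzzle raised in footnote 5.

    import numpy as np

    sigma = np.array([[2.0, 0.6, 0.2],
                      [0.6, 1.0, 0.3],
                      [0.2, 0.3, 1.5]])
    m = 2
    V = np.linalg.inv(sigma)
    V_AA, V_AB = V[:m, :m], V[:m, m:]
    V_BA, V_BB = V[m:, :m], V[m:, m:]

    # The matrix appearing in the exponent of p(x_A) above ...
    schur = V_AA - V_AB @ np.linalg.inv(V_BB) @ V_BA
    print(schur)
    # ... equals the inverse of sigma_AA, so p(x_A) is exactly N(mu_A, sigma_AA).
    print(np.linalg.inv(sigma[:m, :m]))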

