Chapter 2 Multivariate Distributions

Introduction

Definition. An important multivariate location and dispersion model is a joint distribution with joint probability density function (pdf) $f(z \mid \mu, \Sigma)$ for a $p \times 1$ random vector $x$ that is completely specified by a $p \times 1$ population location vector $\mu$ and a $p \times p$ symmetric positive definite population dispersion matrix $\Sigma$. Thus $P(x \in A) = \int_A f(z)\,dz$ for suitable sets $A$.

Notation: Usually a vector $x$ will be a column vector, and a row vector $x^T$ will be the transpose of the vector $x$. However,
$$\int_A f(z)\,dz = \int \cdots \int_A f(z_1, \ldots, z_p)\,dz_1 \cdots dz_p.$$
The notation $f(z_1, \ldots, z_p)$ will be used to write out the components $z_i$ of a joint pdf $f(z)$, although in the formula for the pdf, e.g. $f(z) = c\exp(z^T z)$, $z$ is a column vector.

Definition. A $p \times 1$ random vector $x = (x_1, \ldots, x_p)^T = (X_1, \ldots, X_p)^T$ where $X_1, \ldots, X_p$ are $p$ random variables. A case or observation consists of the $p$ random variables measured for one person or thing.

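For multivariate location and dispersion the $i$th case is $x_i = (x_{i,1}, \ldots, x_{i,p})^T$. There are $n$ cases, and context will be used to determine whether $x$ is the random vector or the observed value of the random vector. Outliers are cases that lie far away from the bulk of the data, and they can ruin a classical analysis.

Assume that $x_1, \ldots, x_n$ are $n$ iid $p \times 1$ random vectors and that the joint pdf of $x_i$ is $f(z \mid \mu, \Sigma)$. Also assume that the data $x_i$ have been observed and stored in an $n \times p$ matrix
$$W = \begin{bmatrix} x_1^T \\ \vdots \\ x_n^T \end{bmatrix} = \begin{bmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,p} \end{bmatrix} = [\, v_1 \; v_2 \; \cdots \; v_p \,]$$
where the $i$th row of $W$ is the $i$th case $x_i^T$ and the $j$th column $v_j$ of $W$ corresponds to $n$ measurements of the $j$th random variable $X_j$ for $j = 1, \ldots, p$. Hence the $n$ rows of the data matrix $W$ correspond to the $n$ cases, while the $p$ columns correspond to measurements on the $p$ random variables $X_1, \ldots, X_p$. For example, the data may consist of $n$ visitors to a hospital where the $p = 2$ variables height and weight of each individual were measured.

Notation: In the theoretical sections of this text, $x_i$ will sometimes be a random vector and sometimes the observed data. Johnson and Wichern (1988, p. 7, 53) use $X$ to denote the $n \times p$ data matrix and an $n \times 1$ random vector, relying on the context to indicate whether $X$ is a random vector or data matrix. Software tends to use different notation. For example, R/Splus will use commands such as var(x) to compute the sample covariance matrix of the data. Hence x corresponds to $W$, x[,1] is the first column of x and x[4,] is the 4th row of x.

The Sample Mean and Sample Covariance Matrix

Definition. If the second moments exist, the population mean of a random $p \times 1$ vector $x = (X_1, \ldots, X_p)^T$ is
$$E(x) = \mu = (E(X_1), \ldots, E(X_p))^T,$$
and the $p \times p$ population covariance matrix
$$\mathrm{Cov}(x) = E[(x - E(x))(x - E(x))^T] = E[(x - E(x))x^T] = E(xx^T) - E(x)[E(x)]^T = ((\sigma_{i,j})) = \Sigma_x.$$
That is, the $ij$ entry of $\mathrm{Cov}(x)$ is $\mathrm{Cov}(X_i, X_j) = \sigma_{i,j} = E([X_i - E(X_i)][X_j - E(X_j)])$.

The $p \times p$ population correlation matrix $\mathrm{Cor}(x) = \rho_x = ((\rho_{ij}))$. That is, the $ij$ entry of $\mathrm{Cor}(x)$ is
$$\mathrm{Cor}(X_i, X_j) = \frac{\sigma_{i,j}}{\sigma_i \sigma_j} = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\,\sigma_{jj}}}.$$
Let the $p \times p$ population standard deviation matrix $\Delta = \mathrm{diag}(\sqrt{\sigma_{11}}, \ldots, \sqrt{\sigma_{pp}})$. Then
$$\Sigma_x = \Delta \rho_x \Delta \quad \text{and} \quad \rho_x = \Delta^{-1} \Sigma_x \Delta^{-1}.$$
Let the population standardized random variables
$$Z_i = \frac{X_i - E(X_i)}{\sqrt{\sigma_{ii}}}$$
for $i = 1, \ldots, p$. Then $\mathrm{Cor}(x) = \rho_x$ is the covariance matrix of $z = (Z_1, \ldots, Z_p)^T$.

Definition. Let random vectors $x$ be $p \times 1$ and $y$ be $q \times 1$. The population covariance matrix of $x$ with $y$ is the $p \times q$ matrix
$$\mathrm{Cov}(x, y) = E[(x - E(x))(y - E(y))^T] = E[(x - E(x))y^T] = E(xy^T) - E(x)[E(y)]^T = \Sigma_{x,y}$$
assuming the expected values exist.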
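The relations between the population covariance matrix, the correlation matrix, and the standard deviation matrix can be checked numerically. The following is a minimal sketch in plain Python, assuming a made-up $2 \times 2$ covariance matrix; the helper names (diag_sqrt, matmul, cov_to_cor) are hypothetical and not from the text.

```python
import math

def diag_sqrt(sigma):
    """Standard deviation matrix diag(sqrt(sigma_11), ..., sqrt(sigma_pp))."""
    p = len(sigma)
    return [[math.sqrt(sigma[i][i]) if i == j else 0.0 for j in range(p)]
            for i in range(p)]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def cov_to_cor(sigma):
    """Correlation matrix with ij entry sigma_ij / sqrt(sigma_ii * sigma_jj)."""
    p = len(sigma)
    return [[sigma[i][j] / math.sqrt(sigma[i][i] * sigma[j][j])
             for j in range(p)] for i in range(p)]

# Hypothetical covariance matrix: variances 4 and 9, covariance 3.
sigma = [[4.0, 3.0], [3.0, 9.0]]
rho = cov_to_cor(sigma)      # off-diagonal entry is 3 / (2 * 3)
delta = diag_sqrt(sigma)     # diag(2, 3)

# Multiplying back, Delta * rho * Delta, should recover sigma.
rebuilt = matmul(matmul(delta, rho), delta)
```

The entrywise division in cov_to_cor is the same operation as pre- and post-multiplying by the inverse of the standard deviation matrix, since that matrix is diagonal.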

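Note that the $q \times p$ matrix $\mathrm{Cov}(y, x) = \Sigma_{y,x} = \Sigma_{x,y}^T$, and $\mathrm{Cov}(x) = \mathrm{Cov}(x, x)$.

A $p \times 1$ random vector $x$ has an elliptically contoured distribution if $x$ has pdf
$$f(z) = k_p |\Sigma|^{-1/2}\, g[(z - \mu)^T \Sigma^{-1} (z - \mu)],$$
and we say $x$ has an elliptically contoured $EC_p(\mu, \Sigma, g)$ distribution. See Chapter 3. If second moments exist for this distribution, then $E(x) = \mu$ and $\mathrm{Cov}(x) = c_x \Sigma = \Sigma_x$ for some constant $c_x > 0$ where the $ij$ entry is $\mathrm{Cov}(X_i, X_j) = \sigma_{i,j}$.

Definition. Let $x_{1j}, \ldots, x_{nj}$ be measurements on the $j$th random variable $X_j$ corresponding to the $j$th column of the data matrix $W$. The $j$th sample mean is
$$\bar{x}_j = \frac{1}{n} \sum_{k=1}^n x_{kj}.$$
The sample covariance $S_{ij}$ estimates $\mathrm{Cov}(X_i, X_j) = \sigma_{ij}$, and
$$S_{ij} = \frac{1}{n-1} \sum_{k=1}^n (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j).$$
$S_{ii} = S_i^2$ is the sample variance that estimates the population variance $\sigma_{ii} = \sigma_i^2$. The sample correlation $r_{ij}$ estimates the population correlation $\mathrm{Cor}(X_i, X_j) = \rho_{ij}$, and
$$r_{ij} = \frac{S_{ij}}{S_i S_j} = \frac{S_{ij}}{\sqrt{S_{ii} S_{jj}}} = \frac{\sum_{k=1}^n (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)}{\sqrt{\sum_{k=1}^n (x_{ki} - \bar{x}_i)^2}\,\sqrt{\sum_{k=1}^n (x_{kj} - \bar{x}_j)^2}}.$$

Definition. The sample mean or sample mean vector
$$\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i = (\bar{x}_1, \ldots, \bar{x}_p)^T = \frac{1}{n} W^T 1$$
where $1$ is the $n \times 1$ vector of ones. The sample covariance matrix
$$S = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})(x_i - \bar{x})^T = ((S_{ij})).$$
That is, the $ij$ entry of $S$ is the sample covariance $S_{ij}$. The classical estimator of multivariate location and dispersion is $(\bar{x}, S)$.

It can be shown that
$$(n-1) S = \sum_{i=1}^n x_i x_i^T - n\,\bar{x}\bar{x}^T = W^T W - \frac{1}{n} W^T 1 1^T W.$$
Hence if the centering matrix $H = I - \frac{1}{n} 1 1^T$, then $(n-1) S = W^T H W$.

Definition. The sample correlation matrix $R = ((r_{ij}))$. That is, the $ij$ entry of $R$ is the sample correlation $r_{ij}$.

Let the standardized random variables
$$Z_i = \frac{x_i - \bar{x}_i}{\sqrt{S_{ii}}}$$
for $i = 1, \ldots, p$. Then $R$ is the sample covariance matrix of $z = (Z_1, \ldots, Z_p)^T$. The population and sample correlation are measures of the strength of a linear relationship between two random variables, satisfying $-1 \le \rho_{ij} \le 1$ and $-1 \le r_{ij} \le 1$.

Let the $p \times p$ sample standard deviation matrix $D = \mathrm{diag}(\sqrt{S_{11}}, \ldots, \sqrt{S_{pp}})$. Then
$$S = D R D \quad \text{and} \quad R = D^{-1} S D^{-1}.$$

Distances

Definition. Let $A$ be a positive definite symmetric matrix. Then the Mahalanobis distance of $x$ from the vector $\mu$ is
$$D_x(\mu, A) = \sqrt{(x - \mu)^T A^{-1} (x - \mu)}.$$
Typically $A$ is a dispersion matrix. The population squared Mahalanobis distance
$$D_x^2(\mu, \Sigma) = (x - \mu)^T \Sigma^{-1} (x - \mu).$$
Estimators of multivariate location and dispersion $(\hat{\mu}, \hat{\Sigma})$ are of interest. The sample squared Mahalanobis distance
$$D_x^2(\hat{\mu}, \hat{\Sigma}) = (x - \hat{\mu})^T \hat{\Sigma}^{-1} (x - \hat{\mu}).$$

Notation: Recall that a square symmetric $p \times p$ matrix $A$ has an eigenvalue $\lambda$ with corresponding eigenvector $x \ne 0$ if
$$A x = \lambda x.$$
The eigenvalues of $A$ are real since $A$ is symmetric. Note that if constant $c \ne 0$ and $x$ is an eigenvector of $A$, then $cx$ is an eigenvector of $A$. Let $e$ be an eigenvector of $A$ with unit length $\|e\| = \sqrt{e^T e} = 1$. Then $e$ and $-e$ are eigenvectors with unit length, and $A$ has $p$ eigenvalue eigenvector pairs $(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_p, e_p)$. Since $A$ is symmetric, the eigenvectors are chosen such that the $e_i$ are orthogonal: $e_i^T e_j = 0$ for $i \ne j$.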
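The classical estimator, the pair of sample mean vector and sample covariance matrix, and the resulting sample squared Mahalanobis distances can be illustrated on a small data matrix. This is a rough sketch in plain Python for $p = 2$ only, so that the inverse can be written out by hand; the data values and the function names are made up for illustration. In R the covariance step corresponds to the var(x) command mentioned earlier.

```python
def sample_mean(W):
    """Column means of the n x p data matrix W (rows are cases)."""
    n, p = len(W), len(W[0])
    return [sum(row[j] for row in W) / n for j in range(p)]

def sample_cov(W):
    """Sample covariance matrix S with divisor n - 1."""
    n, p = len(W), len(W[0])
    xbar = sample_mean(W)
    return [[sum((row[i] - xbar[i]) * (row[j] - xbar[j]) for row in W) / (n - 1)
             for j in range(p)] for i in range(p)]

def inv2(A):
    """Inverse of a 2 x 2 matrix (assumes a nonzero determinant)."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def sq_mahalanobis(x, center, disp):
    """Squared distance (x - center)^T disp^{-1} (x - center) for p = 2."""
    disp_inv = inv2(disp)
    w = [x[0] - center[0], x[1] - center[1]]
    return sum(w[i] * disp_inv[i][j] * w[j] for i in range(2) for j in range(2))

# Hypothetical data matrix: n = 4 cases, p = 2 variables.
W = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 4.0]]
xbar = sample_mean(W)
S = sample_cov(W)
d2 = [sq_mahalanobis(row, xbar, S) for row in W]
```

A useful sanity check is that the squared distances computed from $(\bar{x}, S)$ always sum to $(n-1)p$, because $\sum_i (x_i - \bar{x})^T S^{-1} (x_i - \bar{x}) = \mathrm{tr}(S^{-1}(n-1)S)$.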

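The symmetric matrix $A$ is positive definite iff all of its eigenvalues are positive, and positive semidefinite iff all of its eigenvalues are nonnegative. If $A$ is positive semidefinite, let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$. If $A$ is positive definite, then $\lambda_p > 0$.

Theorem. Let $A$ be a $p \times p$ symmetric matrix with eigenvector eigenvalue pairs $(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_p, e_p)$ where $e_i^T e_i = 1$ and $e_i^T e_j = 0$ for $i \ne j$, for $i = 1, \ldots, p$. Then the spectral decomposition of $A$ is
$$A = \sum_{i=1}^p \lambda_i e_i e_i^T = \lambda_1 e_1 e_1^T + \cdots + \lambda_p e_p e_p^T.$$
Using the same notation as Johnson and Wichern (1988, p. 50-51), let $P = [\, e_1 \; e_2 \; \cdots \; e_p \,]$ be the $p \times p$ orthogonal matrix with $i$th column $e_i$. Then $P P^T = P^T P = I$. Let $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_p)$ and let $\Lambda^{1/2} = \mathrm{diag}(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_p})$. If $A$ is a positive definite $p \times p$ symmetric matrix with spectral decomposition $A = \sum_{i=1}^p \lambda_i e_i e_i^T$, then $A = P \Lambda P^T$ and
$$A^{-1} = P \Lambda^{-1} P^T = \sum_{i=1}^p \frac{1}{\lambda_i} e_i e_i^T.$$

Theorem. Let $A$ be a positive definite $p \times p$ symmetric matrix with spectral decomposition $A = \sum_{i=1}^p \lambda_i e_i e_i^T$. The square root matrix $A^{1/2} = P \Lambda^{1/2} P^T$ is a positive definite symmetric matrix such that $A^{1/2} A^{1/2} = A$.

Points $x$ with the same distance $D_x(\mu, A^{-1})$ lie on a hyperellipsoid.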
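The spectral decomposition, the inverse, and the square root matrix can be made concrete for a symmetric $2 \times 2$ matrix, where the eigenvalues have a closed form. The sketch below is plain Python under that $p = 2$ assumption; the example matrix and the function names eigen_sym2 and rebuild are hypothetical, and a general-purpose eigenvalue routine would be used for larger $p$.

```python
import math

def eigen_sym2(A):
    """Eigenvalues (decreasing) and unit eigenvectors of a symmetric 2 x 2 matrix."""
    a, b, d = A[0][0], A[0][1], A[1][1]
    mid = (a + d) / 2.0
    rad = math.hypot((a - d) / 2.0, b)
    lams = [mid + rad, mid - rad]
    if b == 0.0:
        # Diagonal matrix: the coordinate axes are the eigenvectors.
        vecs = [[1.0, 0.0], [0.0, 1.0]] if a >= d else [[0.0, 1.0], [1.0, 0.0]]
    else:
        vecs = []
        for lam in lams:
            # (lam - d, b) solves (A - lam I) v = 0 when b != 0.
            v = [lam - d, b]
            norm = math.hypot(v[0], v[1])
            vecs.append([v[0] / norm, v[1] / norm])
    return lams, vecs

def rebuild(lams, vecs, f=lambda t: t):
    """Sum over i of f(lambda_i) * e_i e_i^T."""
    return [[sum(f(lam) * e[i] * e[j] for lam, e in zip(lams, vecs))
             for j in range(2)] for i in range(2)]

# Hypothetical positive definite matrix with eigenvalues 3 and 1.
A = [[2.0, 1.0], [1.0, 2.0]]
lams, vecs = eigen_sym2(A)
A_again = rebuild(lams, vecs)                   # spectral decomposition of A
A_inv = rebuild(lams, vecs, lambda t: 1.0 / t)  # uses eigenvalues 1 / lambda_i
A_half = rebuild(lams, vecs, math.sqrt)         # square root matrix
```

Passing a function of the eigenvalues to rebuild mirrors how the inverse and the square root matrix keep the eigenvectors of $A$ and change only the eigenvalues.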

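Let matrix $A$ have determinant $\det(A) = |A|$. Recall that $|A^{-1}| = 1/|A| = |A|^{-1}$. See Johnson and Wichern (1988, p. 49-50, 102-103) for the following theorem.

Theorem. Let $h > 0$ be a constant, and let $A$ be a positive definite $p \times p$ symmetric matrix with spectral decomposition $A = \sum_{i=1}^p \lambda_i e_i e_i^T$ where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p > 0$. Then
$$\{x : (x - \mu)^T A (x - \mu) \le h^2\} = \{x : D_x^2(\mu, A^{-1}) \le h^2\} = \{x : D_x(\mu, A^{-1}) \le h\}$$
defines a hyperellipsoid centered at $\mu$ with volume
$$\frac{2\pi^{p/2}}{p\,\Gamma(p/2)}\, |A|^{-1/2}\, h^p.$$
Let $\mu = 0$. Then the axes of the hyperellipsoid are given by the eigenvectors $e_i$ of $A$ with half length in the direction of $e_i$ equal to $h/\sqrt{\lambda_i}$ for $i = 1, \ldots, p$.

In the following theorem, the shape of the hyperellipsoid is determined by the eigenvectors and eigenvalues of $\Sigma$: $(\lambda_1, e_1), \ldots, (\lambda_p, e_p)$ where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p > 0$. Note that $\Sigma^{-1}$ has the same eigenvectors as $\Sigma$ but eigenvalues equal to $1/\lambda_i$, since $\Sigma e = \lambda e$ iff $\Sigma^{-1} \Sigma e = e = \lambda \Sigma^{-1} e$. Then divide both sides by $\lambda > 0$, which is possible since $\Sigma > 0$ and $\Sigma$ is symmetric.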
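The volume formula and the half lengths depend on $A$ only through its eigenvalues, since $|A|$ is their product. The following plain Python sketch takes the eigenvalues as given; the function names and the example values are illustrative assumptions.

```python
import math

def hyperellipsoid_volume(eigenvalues, h):
    """Volume of {x : (x - mu)^T A (x - mu) <= h^2} from the eigenvalues of A."""
    p = len(eigenvalues)
    det_A = math.prod(eigenvalues)  # |A| is the product of the eigenvalues
    unit_ball = 2.0 * math.pi ** (p / 2.0) / (p * math.gamma(p / 2.0))
    return unit_ball * det_A ** -0.5 * h ** p

def half_lengths(eigenvalues, h):
    """Half length h / sqrt(lambda_i) along each eigenvector e_i of A."""
    return [h / math.sqrt(lam) for lam in eigenvalues]

# Hypothetical case: p = 2, eigenvalues 4 and 1, h = 2.
axes = half_lengths([4.0, 1.0], 2.0)
area = hyperellipsoid_volume([4.0, 1.0], 2.0)
```

For $p = 2$ the result agrees with the familiar ellipse area, $\pi$ times the product of the two half lengths, and with all eigenvalues equal to 1 and $h = 1$ the formula reduces to the volume of the unit ball.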

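Let $w = x - \mu$. Then points at squared distance $w^T \Sigma^{-1} w = h^2$ from the origin lie on the hyperellipsoid centered at the origin whose axes are given by the eigenvectors of $\Sigma$, where the half length in the direction of $e_i$ is $h\sqrt{\lambda_i}$. Taking $A = \Sigma^{-1}$ or $A = S^{-1}$ in the preceding theorem gives the volume results for the following theorem.

Theorem. Let $\Sigma$ be a positive definite symmetric matrix, e.g. a dispersion matrix. Let $U = D_x^2 = D_x^2(\mu, \Sigma)$. The hyperellipsoid
$$\{x \mid D_x^2 \le h^2\} = \{x : (x - \mu)^T \Sigma^{-1} (x - \mu) \le h^2\},$$
where $h^2 = u_{1-\alpha}$ and $P(U \le u_{1-\alpha}) = 1 - \alpha$, is the highest density region covering $1 - \alpha$ of the mass for an elliptically contoured $EC_p(\mu, \Sigma, g)$ distribution (see the definition of the elliptically contoured distribution) if $g$ is continuous and decreasing.

Let $w = x - \mu$. Then points at squared distance $w^T S^{-1} w = h^2$ from the origin lie on the hyperellipsoid centered at the origin whose axes are given by the eigenvectors $e_i$ of $S$, where the half length in the direction of $e_i$ is $h\sqrt{\lambda_i}$ and $\lambda_i$ is the corresponding eigenvalue of $S$.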
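The half length claim can be verified in one line. As a sketch, assume $(\lambda_i, e_i)$ is an eigenvalue eigenvector pair of $\Sigma$ with $e_i^T e_i = 1$, and take the point at the end of the $i$th axis:

```latex
\begin{align*}
w &= h\sqrt{\lambda_i}\, e_i, \\
w^T \Sigma^{-1} w
  &= h^2 \lambda_i\, e_i^T \Sigma^{-1} e_i
   = h^2 \lambda_i \cdot \frac{1}{\lambda_i}\, e_i^T e_i
   = h^2 .
\end{align*}
```

The middle step uses the fact that $\Sigma^{-1}$ has the same eigenvectors as $\Sigma$ with eigenvalues $1/\lambda_i$. The same computation goes through with a positive definite $S$ in place of $\Sigma$.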
