A Tutorial on Multivariate Statistical Analysis

Craig A. Tracy
UC Davis
SAMSI, September 2006

ELEMENTARY STATISTICS

Collection of (real-valued) data from a sequence of experiments:

$$X_1, X_2, \ldots, X_n$$

Might make the assumption that the underlying law is $N(\mu, \sigma^2)$ with unknown mean $\mu$ and variance $\sigma^2$. Want to estimate $\mu$ and $\sigma^2$ from the data.

Sample Mean & Sample Variance:

$$\bar{X} = \frac{1}{n} \sum_j X_j, \qquad S = \frac{1}{n-1} \sum_j \left( X_j - \bar{X} \right)^2$$

Estimators are unbiased:

$$E(\bar{X}) = \mu, \qquad E(S) = \sigma^2$$

Theorem: If $X_1, X_2, \ldots$ are independent $N(\mu, \sigma^2)$ variables then $\bar{X}$ and $S$ are independent. We have that $\bar{X}$ is $N(\mu, \sigma^2/n)$ and $(n-1)S/\sigma^2$ is $\chi^2(n-1)$.

Recall $\chi^2(d)$ denotes the chi-squared distribution with $d$ degrees of freedom. Its density is

$$f_{\chi^2}(x) = \frac{1}{2^{d/2}\,\Gamma(d/2)}\, x^{d/2-1} e^{-x/2}, \qquad x \ge 0,$$

where

$$\Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\, dt, \qquad \Re(z) > 0.$$

GENERALIZATIONS

From the classic textbook of Anderson [1]:

Multivariate statistical analysis is concerned with data that consists of sets of measurements on a number of individuals or objects. The sample data may be heights and weights of some individuals drawn randomly from a population of school children in a given city, or the statistical treatment may be made on a collection of measurements, such as lengths and widths of petals and lengths and widths of sepals of iris plants taken from two species, or one may study the scores on batteries of mental tests administered to a number of students.

$p$ = # of sets of measurements on a given individual,
$n$ = # of observations = sample size

Remarks: In the above examples, one can assume that $p \ll n$ since typically many measurements will be taken.

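Today it is common for $p \gg 1$, so $n/p$ is no longer necessarily large.

Vehicle Sound Signature Recognition: Vehicle noise is a stochastic signal. The power spectrum is discretized to a vector of length $p = 1200$ with $n \approx 1200$ samples from the same kind of vehicle.

Astrophysics: The Sloan Digital Sky Survey typically has many observations (say of quasar spectrum) with the spectra of each quasar binned, resulting in a large $p$.

Financial data: S&P 500 stocks observed over monthly intervals for twenty years.

DATA MATRICES

The data are now $n$ independent column vectors of length $p$,

$$\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n,$$

from which we construct the $n \times p$ data matrix

$$X = \begin{pmatrix} \vec{x}_1^T \\ \vec{x}_2^T \\ \vdots \\ \vec{x}_n^T \end{pmatrix}.$$

The Gaussian assumption is that

$$\vec{x}_j \sim N_p(\mu, \Sigma).$$

Many applications assume the mean has already been subtracted out of the data, i.e., $\mu = 0$.

Multivariate Gaussian Distribution

If $x$ and $y$ are vectors, the matrix $x \otimes y$ is defined by

$$(x \otimes y)_{jk} = x_j y_k.$$

If $\mu = E(x)$ is the mean of the random vector $x$, then the covariance matrix of $x$ is the $p \times p$ matrix

$$\Sigma = E\left[ (x - \mu) \otimes (x - \mu) \right].$$

$\Sigma$ is a symmetric, non-negative definite matrix. If $\Sigma > 0$ (positive definite) and $X \sim N_p(\mu, \Sigma)$, then the density function of $X$ is

$$f_X(x) = (2\pi)^{-p/2} (\det \Sigma)^{-1/2} \exp\left[ -\tfrac{1}{2} \left( x - \mu,\ \Sigma^{-1}(x - \mu) \right) \right], \qquad x \in \mathbb{R}^p.$$

Sample mean:

$$\bar{x} = \frac{1}{n} \sum_j \vec{x}_j, \qquad E(\bar{x}) = \mu.$$

Sample covariance matrix:

$$S = \frac{1}{n-1} \sum_{j=1}^{n} (\vec{x}_j - \bar{x}) \otimes (\vec{x}_j - \bar{x}).$$

For $\mu = 0$ the sample covariance matrix can be written simply as $\frac{1}{n-1} X^T X$.

Some Notation: If $X$ is an $n \times p$ data matrix formed from the $n$ independent column vectors $\vec{x}_j$, $\operatorname{cov}(\vec{x}_j) = \Sigma$, we can form one column vector $\operatorname{vec}(X)$ of length $pn$:

$$\operatorname{vec}(X) = \begin{pmatrix} \vec{x}_1 \\ \vec{x}_2 \\ \vdots \\ \vec{x}_n \end{pmatrix}.$$

The covariance of $\operatorname{vec}(X)$ is the $np \times np$ matrix

$$I_n \otimes \Sigma = \begin{pmatrix} \Sigma & 0 & \cdots & 0 \\ 0 & \Sigma & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Sigma \end{pmatrix}.$$

In this case we say the data matrix $X$ constructed from $n$ independent $\vec{x}_j \sim N_p(\mu, \Sigma)$ has distribution

$$N_p(M, I_n \otimes \Sigma),$$

where $M = E(X) = \mathbf{1} \otimes \mu$ and $\mathbf{1}$ is the column vector of all 1's.

WISHART DISTRIBUTION

Definition: If $A = X^T X$ where the $n \times p$ matrix $X$ is $N_p(0, I_n \otimes \Sigma)$, $\Sigma > 0$, then $A$ is said to have the Wishart distribution with $n$ degrees of freedom and covariance matrix $\Sigma$. We will say $A$ is $W_p(n, \Sigma)$.

Remarks:

The Wishart distribution is the multivariate generalization of the chi-squared distribution.

$A \sim W_p(n, \Sigma)$ is positive definite with probability one if and only if $n \ge p$.

The sample covariance matrix, $S = \frac{1}{n-1} A$ with $A = \sum_j (\vec{x}_j - \bar{x}) \otimes (\vec{x}_j - \bar{x})$, is $W_p\left(n-1, \frac{1}{n-1}\Sigma\right)$.

WISHART DENSITY FUNCTION, $n \ge p$

Let $S_p$ denote the space of $p \times p$ positive definite (symmetric) matrices. If $A = (a_{jk}) \in S_p$, let

$$(dA) = \text{volume element of } A = \bigwedge_{j \le k} da_{jk}.$$

The multivariate gamma function is

$$\Gamma_p(a) = \int_{S_p} e^{-\operatorname{tr}(A)} (\det A)^{a - (p+1)/2}\, (dA), \qquad \Re(a) > (p-1)/2.$$

Theorem: If $A$ is $W_p(n, \Sigma)$ with $n \ge p$, then the density function of $A$ is

$$\frac{1}{2^{np/2}\, \Gamma_p(n/2)\, (\det \Sigma)^{n/2}}\; e^{-\frac{1}{2} \operatorname{tr}(\Sigma^{-1} A)} (\det A)^{(n-p-1)/2}.$$

Sketch of Proof:

The density function for $X$ is the multivariate Gaussian (including volume element $(dX)$)

$$(2\pi)^{-np/2} (\det \Sigma)^{-n/2}\, e^{-\frac{1}{2} \operatorname{tr}(\Sigma^{-1} X^T X)}\, (dX).$$

Recall the QR factorization [8]: Let $X$ denote an $n \times p$ matrix with $n \ge p$ and full column rank.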

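Then there exist a unique $n \times p$ matrix $Q$, $Q^T Q = I_p$, and a unique $p \times p$ upper triangular matrix $R$ with positive diagonal elements so that $X = QR$. Note $A = X^T X = R^T R$.

A Jacobian calculation [1, 13]: If $A = X^T X$, then

$$(dX) = 2^{-p} (\det A)^{(n-p-1)/2}\, (dA)\, (Q^T dQ),$$

where

$$(Q^T dQ) = \bigwedge_{j=1}^{p} \bigwedge_{k=j+1}^{n} q_k^T\, dq_j$$

and $Q = (q_1, \ldots, q_p)$ is the column representation of $Q$ (for $k > p$, the $q_k$ complete these columns to an orthonormal basis of $\mathbb{R}^n$).

Thus the joint distribution of $A$ and $Q$ is

$$(2\pi)^{-np/2} (\det \Sigma)^{-n/2}\, e^{-\frac{1}{2} \operatorname{tr}(\Sigma^{-1} A)}\, 2^{-p} (\det A)^{(n-p-1)/2}\, (dA)\, (Q^T dQ).$$

Now integrate over all $Q$. Use the fact that

$$\int_{V_{n,p}} (Q^T dQ) = \frac{2^p \pi^{np/2}}{\Gamma_p(n/2)},$$

where $V_{n,p}$ is the set of real $n \times p$ matrices $Q$ satisfying $Q^T Q = I_p$. (When $n = p$ this is the orthogonal group.)

Remarks regarding the Wishart density function:

Case $p = 2$ obtained by R. A. Fisher in 1915.

General $p$ by J. Wishart in 1928 by geometrical arguments.

Proof outlined above came later. (See [1, 13] for complete proof.)

When $Q$ is a $p \times p$ orthogonal matrix,

$$\frac{\Gamma_p(p/2)}{2^p \pi^{p^2/2}}\, (Q^T dQ)$$

is normalized Haar measure for the orthogonal group $O(p)$. We denote this Haar measure by $(dQ)$.

Siegel proved (see, e.g., [13])

$$\Gamma_p(a) = \pi^{p(p-1)/4} \prod_{j=1}^{p} \Gamma\left( a - \tfrac{1}{2}(j-1) \right).$$

EIGENVALUES OF A WISHART MATRIX

Theorem: If $A$ is $W_p(n, \Sigma)$ with $n \ge p$, the joint density function for the eigenvalues $\ell_1, \ldots, \ell_p$ of $A$ is

$$\frac{\pi^{p^2/2}\, 2^{-np/2} (\det \Sigma)^{-n/2}}{\Gamma_p(p/2)\, \Gamma_p(n/2)} \prod_{j=1}^{p} \ell_j^{(n-p-1)/2} \prod_{j<k} |\ell_j - \ell_k| \int_{O(p)} e^{-\frac{1}{2} \operatorname{tr}(\Sigma^{-1} Q L Q^T)}\, (dQ), \qquad (\ell_1 > \cdots > \ell_p),$$

where $L = \operatorname{diag}(\ell_1, \ldots, \ell_p)$ and $(dQ)$ is normalized Haar measure. Note that $\Delta(\ell) := \prod_{j<k} (\ell_j - \ell_k)$ is the Vandermonde.

Corollary: If $A$ is $W_p(n, I_p)$, then the integral over the orthogonal group in the previous theorem is

$$e^{-\frac{1}{2} \sum_j \ell_j}.$$

Proof: Recall that the Wishart density function (times the volume element) is

$$\frac{1}{2^{np/2}\, \Gamma_p(n/2)\, (\det \Sigma)^{n/2}}\; e^{-\frac{1}{2} \operatorname{tr}(\Sigma^{-1} A)} (\det A)^{(n-p-1)/2}\, (dA).$$

The idea is to diagonalize $A$ by an orthogonal transformation and then integrate over the orthogonal group, thereby giving the density function for the eigenvalues.

Let $\ell_1 > \cdots > \ell_p$ be the ordered eigenvalues of $A$, and write

$$A = Q L Q^T, \qquad L = \operatorname{diag}(\ell_1, \ldots, \ell_p), \qquad Q \in O(p).$$

The $j$th column of $Q$ is a normalized eigenvector of $A$. The transformation is not one-to-one since $Q = [\pm q_1, \ldots, \pm q_p]$ works for each fixed $A$. The transformation is made one-to-one by requiring that the first element of each $q_j$ is nonnegative. This restricts $Q$ (as $A$ varies) to a $2^{-p}$ part of $O(p)$. We compensate for this at the end.

We need an expression for the volume element $(dA)$ in terms of $Q$ and $L$.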
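The sign convention that makes the map from $A$ to $(L, Q)$ one-to-one can be checked numerically before turning to the volume element. The following is a minimal sketch in Python with NumPy, not part of the original slides; the function name `ordered_eigendecomposition`, the seed, and the sizes `n = 10`, `p = 4` are illustrative assumptions.

```python
import numpy as np

def ordered_eigendecomposition(A):
    """Return (ell, Q) with A = Q diag(ell) Q^T, ell in decreasing order,
    and the first entry of every column of Q nonnegative."""
    ell, Q = np.linalg.eigh(A)            # eigh returns ascending eigenvalues
    ell, Q = ell[::-1], Q[:, ::-1]        # reorder so that ell_1 > ... > ell_p
    signs = np.where(Q[0, :] < 0, -1.0, 1.0)
    Q = Q * signs                         # flip each column whose first entry is negative
    return ell, Q

# Hypothetical example: A = X^T X with X an n x p standard Gaussian matrix,
# i.e. a draw from W_p(n, I_p).
rng = np.random.default_rng(0)
n, p = 10, 4
X = rng.standard_normal((n, p))
A = X.T @ X

ell, Q = ordered_eigendecomposition(A)
reconstruction_error = np.max(np.abs(Q @ np.diag(ell) @ Q.T - A))
print("eigenvalues:", ell)
print("max |Q L Q^T - A|:", reconstruction_error)
```

Flipping the sign of any column of `Q` leaves `Q @ np.diag(ell) @ Q.T` unchanged, which is the $2^p$-fold ambiguity the convention removes.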

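First we compute the differential of $A$:

$$dA = dQ\, L\, Q^T + Q\, dL\, Q^T + Q\, L\, dQ^T$$

$$Q^T dA\, Q = Q^T dQ\, L + dL + L\, dQ^T Q = -dQ^T Q\, L + L\, dQ^T Q + dL = \left[ L, dQ^T Q \right] + dL$$

(We used that $Q^T Q = I$ implies $Q^T dQ = -dQ^T Q$.)

We now use the following fact (see, e.g., page 58 in [13]): If $X = B Y B^T$ where $X$ and $Y$ are $p \times p$ symmetric matrices and $B$ is a nonsingular $p \times p$ matrix, then $(dX) = (\det B)^{p+1} (dY)$. In our case $Q$ is orthogonal, so the volume element $(dA)$ equals the volume element $(Q^T dA\, Q)$.

The volume element is the exterior product of the diagonal elements of $Q^T dA\, Q$ times the exterior product of the elements above the diagonal. Since $L$ is diagonal, the commutator $\left[ L, dQ^T Q \right]$ has zero diagonal elements. Thus the exterior product of the diagonal elements of $Q^T dA\, Q$ is $\bigwedge_j d\ell_j$.

The exterior product of the elements coming from the commutator is

$$\prod_{j<k} (\ell_j - \ell_k) \bigwedge_{j<k} q_k^T\, dq_j,$$

and so

$$(dA) = \bigwedge_{j<k} q_k^T\, dq_j\ \Delta(\ell) \bigwedge_j d\ell_j = \frac{2^p \pi^{p^2/2}}{\Gamma_p(p/2)}\, (dQ)\, \Delta(\ell) \bigwedge_j d\ell_j.$$

The theorem now follows once we integrate over all of $O(p)$ and divide the result by $2^p$.

One is interested in limit laws as $n, p \to \infty$. For $\Sigma = I_p$, Johnstone [11] proved, using RMT methods, that for the centering and scaling constants

$$\mu_{np} = \left( \sqrt{n-1} + \sqrt{p} \right)^2, \qquad \sigma_{np} = \left( \sqrt{n-1} + \sqrt{p} \right) \left( \frac{1}{\sqrt{n-1}} + \frac{1}{\sqrt{p}} \right)^{1/3},$$

the quantity

$$\frac{\ell_1 - \mu_{np}}{\sigma_{np}}$$

converges in distribution as $n, p \to \infty$, $n/p \to \gamma < \infty$, to the GOE largest eigenvalue distribution [15].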
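Johnstone's centering and scaling can be explored by simulation. The sketch below is only an illustration under stated assumptions: the helper names `johnstone_constants` and `largest_wishart_eigenvalue`, the sizes `n = 200`, `p = 50`, and the number of trials are hypothetical choices, and a finite sample gives only a rough approximation to the limit law.

```python
import numpy as np

def johnstone_constants(n, p):
    """Centering mu_np and scaling sigma_np from the formulas above."""
    a = np.sqrt(n - 1) + np.sqrt(p)
    mu = a ** 2
    sigma = a * (1.0 / np.sqrt(n - 1) + 1.0 / np.sqrt(p)) ** (1.0 / 3.0)
    return mu, sigma

def largest_wishart_eigenvalue(n, p, rng):
    """Largest eigenvalue ell_1 of A = X^T X, with X an n x p N(0, 1) matrix,
    so that A is W_p(n, I_p)."""
    X = rng.standard_normal((n, p))
    return np.linalg.eigvalsh(X.T @ X)[-1]   # eigvalsh sorts ascending

rng = np.random.default_rng(1)
n, p, trials = 200, 50, 300                   # illustrative sizes only
mu, sigma = johnstone_constants(n, p)

scaled = np.array([
    (largest_wishart_eigenvalue(n, p, rng) - mu) / sigma
    for _ in range(trials)
])
print("sample mean of (ell_1 - mu_np) / sigma_np:", scaled.mean())
print("sample std  of (ell_1 - mu_np) / sigma_np:", scaled.std())
```

For comparison, the GOE largest eigenvalue (Tracy-Widom, $\beta = 1$) distribution has mean approximately $-1.21$ and standard deviation approximately $1.27$, so the printed values should fall in that neighborhood up to sampling error and finite-size effects.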

El Karoui [6] has extended the result to $\gamma \le \infty$. The case $p \gg n$ appears, for example, in microarray data.

Soshnikov [14] has lifted the Gaussian assumption under the additional restriction $n - p = O(p^{1/3})$.

For $\Sigma \ne I_p$, the difficulty in establishing limit theorems comes from the integral

$$\int_{O(p)} e^{-\frac{1}{2} \operatorname{tr}(\Sigma^{-1} Q L Q^T)}\, (dQ).$$

Using zonal polynomials, infinite series expansions have been derived for this integral, but these expansions are difficult to analyze. See Muirhead [13].

For complex Gaussian data matrices $X$, similar density formulas are known for the eigenvalues of $X^* X$. Limit theorems for $\Sigma \ne I_p$ are known since the analogous group integral, now over the unitary group, is known explicitly: the Harish-Chandra-Itzykson-Zuber (HCIZ) integral (see, e.g., [17]). See the work of Baik, Ben Arous and Péché [2, 3] and El Karoui [7].

PRINCIPAL COMPONENT ANALYSIS (PCA), H. Hotelling, 1933

Population Principal Components: Let $x$ be a $p \times 1$ random vector with $E(x) = \mu$ and $\operatorname{cov}(x) = \Sigma > 0$. Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p$ denote the eigenvalues of $\Sigma$ and $H$ an orthogonal matrix diagonalizing $\Sigma$:

$$H^T \Sigma H = \Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_p).$$

We write $H$ in column vector form

$$H = [h_1, \ldots, h_p]$$

so that $h_j$ is the $p \times 1$ eigenvector of $\Sigma$ corresponding to eigenvalue $\lambda_j$. Define the $p \times 1$ vector

$$u = H^T x = (u_1, \ldots, u_p)^T;$$

then

$$\operatorname{cov}(u) = E\left[ (H^T x - H^T \mu) \otimes (H^T x - H^T \mu) \right] = H^T E\left( (x - \mu) \otimes (x - \mu) \right) H = H^T \Sigma H = \Lambda.$$

Definition: $u_j$ is called the $j$th principal component of $x$. Note $\operatorname{var}(u_j) = \lambda_j$.

Interpretation: The claim is that $u_1$ is that linear combination of components of $x$ that has maximum variance.

Proof: For simplicity of notation, set $\mu = 0$. Let $b$ denote any $p \times 1$ vector with $b^T b = 1$, and form $b^T x$. Then

$$\operatorname{var}(b^T x) = E\left[ b^T x \cdot b^T x \right] = E\left[ b^T x\, (b^T x)^T \right] = b^T E(x x^T)\, b = b^T \Sigma b.$$

We want to maximize the right hand side subject to the constraint $b^T b = 1$. By the method of Lagrange multipliers we maximize

$$b^T \Sigma b - \lambda (b^T b - 1).$$

Since $\Sigma$ is symmetric, the vector of partial derivatives is

$$2 \Sigma b - 2 \lambda b.$$

Thus $b$ must be an eigenvector with eigenvalue $\lambda$. The largest variance corresponds to choosing the largest eigenvalue.

The general result is that $u_r$ has maximum variance of all normalized combinations uncorrelated with $u_1, \ldots, u_{r-1}$.

Sample principal components: Let $S$ denote the sample covariance matrix of the data matrix $X$ and let $Q = [q_1, \ldots, q_p]$ be a $p \times p$ orthogonal matrix diagonalizing $S$:

$$Q^T S Q = \operatorname{diag}(\ell_1, \ldots, \ell_p).$$

The $\ell_j$ are the sample variances that are estimates for $\lambda_j$. The vectors $q_j$ are sample estimates for the $h_j$. If $x$ is the random vector and $u = H^T x$ is the vector of principal components, then $\hat{u} = Q^T x$ is the vector of sample principal components.

SCREE PLOTS

In applications: How many of the $\ell_j$'s are significant?

CANONICAL CORRELATION ANALYSIS (CCA), H. Hotelling, 1936

Suppose a large data set is naturally decomposed into two parts. For example, $p \times 1$ random vectors $\vec{x}_1, \ldots, \vec{x}_n$ make up one set and $q \times 1$ random vectors $\vec{y}_1, \ldots, \vec{y}_m$ the other. We are interested in the correlations between these two data sets. For example, in medicine we might have $n$ measurements of age, height, and weight ($p = 3$) and $m$ measurements of systolic and diastolic blood pressures ($q = 2$). We are interested in what combination of the components of $x$ is most correlated with a combination of the components of $y$.

Population Canonical Correlations: Let $x$ and $y$ be two random vectors of size $p \times 1$ and $q \times 1$, respectively.

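We assume $E(x) = E(y) = 0$ and $p \le q$. Form the $(p+q) \times 1$ vector

$$\begin{pmatrix} x \\ y \end{pmatrix}$$

and its $(p+q) \times (p+q)$ covariance matrix

$$\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}.$$

Let

$$u := \alpha^T x \in \mathbb{R}, \qquad v := \gamma^T y \in \mathbb{R},$$

where $\alpha$ and $\gamma$ are vectors to be determined. We want to maximize the correlation

$$\operatorname{corr}(u, v) = \frac{\operatorname{cov}(u, v)}{\sqrt{\operatorname{var}(u)\, \operatorname{var}(v)}}.$$

The correlation does not change under scale transformations $u \to c u$, etc., so we can maximize this correlation subject to the constraints

$$E(u^2) = E(\alpha^T x \cdot \alpha^T x) = \alpha^T \Sigma_{11} \alpha = 1 \qquad (1)$$

$$E(v^2) = \gamma^T \Sigma_{22} \gamma = 1 \qquad (2)$$

Under these constraints

$$\operatorname{corr}(u, v) = E(\alpha^T x \cdot \gamma^T y) = \alpha^T \Sigma_{12} \gamma.$$

Let

$$\psi = \alpha^T \Sigma_{12} \gamma - \tfrac{1}{2} \rho \left( \alpha^T \Sigma_{11} \alpha - 1 \right) - \tfrac{1}{2} \nu \left( \gamma^T \Sigma_{22} \gamma - 1 \right),$$

where $\rho$ and $\nu$ are Lagrange multipliers. Set the vector of partial derivatives to zero:

$$\frac{\partial \psi}{\partial \alpha} = \Sigma_{12} \gamma - \rho\, \Sigma_{11} \alpha = 0 \qquad (3)$$

$$\frac{\partial \psi}{\partial \gamma} = \Sigma_{12}^T \alpha - \nu\, \Sigma_{22} \gamma = 0 \qquad (4)$$

If we left multiply (3) by $\alpha^T$ and (4) by $\gamma^T$ and use the normalization conditions (1) and (2), we conclude $\rho = \nu$. Thus (3) and (4) become

$$\begin{pmatrix} -\rho\, \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & -\rho\, \Sigma_{22} \end{pmatrix} \begin{pmatrix} \alpha \\ \gamma \end{pmatrix} = 0 \qquad (5)$$

with $\operatorname{corr}(u, v) = \rho$, and $\rho \ge 0$ is a solution to

$$\det \begin{pmatrix} -\rho\, \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & -\rho\, \Sigma_{22} \end{pmatrix} = 0.$$

This is a polynomial in $\rho$ of degree $(p+q)$. Let $\rho_1$ denote the maximum root and $\alpha_1$ and $\gamma_1$ the corresponding solutions to (5).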
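Both the sample principal components and the canonical correlation equations can be tried on synthetic data. The following is a minimal sketch, not the slides' own computation: the data-generating model, sizes, and seed are hypothetical, population covariance blocks are replaced by sample blocks `S11`, `S12`, `S22`, and equation (5) is solved as a generalized symmetric eigenproblem $K w = \rho B w$ by Cholesky whitening, which is one of several possible numerical routes.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 500, 3, 2                          # illustrative sizes, with q <= p here

# Hypothetical synthetic data: y depends linearly on part of x, plus noise.
X = rng.standard_normal((n, p)) @ np.diag([3.0, 2.0, 1.0])
Y = X[:, :q] @ np.array([[0.5, 0.1], [0.2, 0.4]]) + rng.standard_normal((n, q))

# Centre the samples so the zero-mean assumption holds empirically.
X = X - X.mean(axis=0)
Y = Y - Y.mean(axis=0)

# Sample covariance of the stacked vector (x; y), partitioned into blocks.
Z = np.hstack([X, Y])
S = Z.T @ Z / (n - 1)
S11, S12, S22 = S[:p, :p], S[:p, p:], S[p:, p:]

# Sample principal components of x: Q^T S11 Q = diag(ell_1, ..., ell_p).
ell, Q = np.linalg.eigh(S11)
ell, Q = ell[::-1], Q[:, ::-1]               # decreasing order
U_hat = X @ Q                                # row j holds (Q^T x_j)^T
cov_U = U_hat.T @ U_hat / (n - 1)            # equals Q^T S11 Q

# Canonical correlations: equation (5) rewritten as K w = rho * B w.
K = np.zeros((p + q, p + q))
K[:p, p:], K[p:, :p] = S12, S12.T
B = np.zeros((p + q, p + q))
B[:p, :p], B[p:, p:] = S11, S22
C = np.linalg.cholesky(B)                    # B = C C^T
C_inv = np.linalg.inv(C)
rho, V = np.linalg.eigh(C_inv @ K @ C_inv.T) # all p + q roots, ascending
rho1 = rho[-1]                               # maximum root
w = np.sqrt(2.0) * (C_inv.T @ V[:, -1])      # rescale so (1) and (2) hold
alpha1, gamma1 = w[:p], w[p:]

u, v = X @ alpha1, Y @ gamma1
print("sample variances ell_j:", ell)
print("roots of (5):", rho)
print("rho_1:", rho1, " corr(u, v):", np.corrcoef(u, v)[0, 1])
```

The factor $\sqrt{2}$ appears because a unit eigenvector of the whitened problem satisfies $\alpha^T S_{11} \alpha + \gamma^T S_{22} \gamma = 1$, and for $\rho \ne 0$ equations (3) and (4) force the two terms to be equal. The roots come in pairs $\pm\rho$ together with $|p - q|$ zeros, in line with the degree $(p+q)$ of the determinant.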
