Transcription of Multivariate Regression (Chapter 10)
Multivariate Regression (Chapter 10)
April 29, 2015

This week we'll cover multivariate regression and maybe a bit of canonical correlation. Today we'll mostly review univariate multiple regression.

In multivariate regression, there are typically multiple dependent variables as well as multiple independent or explanatory variables. A special case of this is when the explanatory variables are categorical and the dependent variables are continuous (particularly multivariate normal), in which case we have MANOVA. For multivariate regression, we allow the explanatory variables to be continuous. This approach generalizes multiple regression much as MANOVA generalizes ANOVA.

In regression, we think of the $y$ variables as random and the $x$ variables as fixed. For multivariate regression, we'll consider $x$ variables as either fixed or random. We'll start with them being treated as fixed.

First, we'll review multiple (univariate) regression. With this model and $q$ predictors, we have

$$
\begin{aligned}
y_1 &= \beta_0 + \sum_{j=1}^{q} \beta_j x_{1j} + \varepsilon_1 \\
y_2 &= \beta_0 + \sum_{j=1}^{q} \beta_j x_{2j} + \varepsilon_2 \\
&\;\;\vdots \\
y_n &= \beta_0 + \sum_{j=1}^{q} \beta_j x_{nj} + \varepsilon_n
\end{aligned}
$$

The standard assumptions for multiple regression are

$$E(\varepsilon_i) = 0, \qquad \mathrm{Var}(\varepsilon_i) = \sigma^2, \qquad \mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0 \text{ for } i \neq j.$$

Equivalently, you can write

$$E(\varepsilon) = 0, \qquad \mathrm{Cov}(\varepsilon) = \sigma^2 I.$$

Under the assumption that the $x$s are fixed, we have

$$E(y_i) = \beta_0 + \sum_{j=1}^{q} \beta_j x_{ij}, \qquad \mathrm{Var}(y_i) = \sigma^2, \qquad \mathrm{Cov}(y_i, y_j) = \mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0 \text{ for } i \neq j.$$

Equivalently,

$$E(y) = X\beta, \qquad \mathrm{Cov}(y) = \sigma^2 I.$$

The regression model using matrix notation is

$$y = X\beta + \varepsilon.$$

When I was an undergrad, my Calc III professor suggested that we get tattoos of $f = ma$, but for a statistician, the tattoo would be $y = X\beta + \varepsilon$.

Written out, the matrix form looks like this:

$$
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
=
\begin{pmatrix}
1 & x_{11} & \cdots & x_{1q} \\
1 & x_{21} & \cdots & x_{2q} \\
\vdots & \vdots & & \vdots \\
1 & x_{n1} & \cdots & x_{nq}
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_q \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}
$$

The matrix $X$ is called the design matrix. Recall that it has a column of 1s, which is necessary for the intercept term $\beta_0$. For estimation and hypothesis testing (for which variances are needed), you need $n > q + 1$.

The least squares approach for estimating $\beta$ is to minimize

$$SSE = \sum_{i=1}^{n} \hat\varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \hat y_i)^2 = \sum_{i=1}^{n} \left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_q x_{iq}\right)^2.$$

This problem can be solved with calculus or, with less effort, using matrix algebra. If you set $y$ equal to its expectation, $y = X\hat\beta$, and premultiply both sides by $X'$, you have $X'y = X'X\hat\beta$. Solving for $\hat\beta$, you get

$$\hat\beta = (X'X)^{-1} X'y.$$

The previous solution for estimating $\beta$ is the least squares solution regardless of the distribution of the error term.
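The least squares formula above can be sketched in a few lines of NumPy. This is a minimal illustration on simulated data, not code from the lecture: the function name `ols_fit`, the sample sizes, and the true coefficients are all assumptions made for the example. It solves the normal equations $X'X\hat\beta = X'y$ directly rather than forming $(X'X)^{-1}$ explicitly.

```python
import numpy as np

def ols_fit(x, y):
    """Least squares fit of y on the columns of x, with an intercept.

    Hypothetical helper for illustration only.
    """
    n = x.shape[0]
    # Design matrix: a column of 1s followed by the q predictor columns
    X = np.column_stack([np.ones(n), x])
    # Solve the normal equations X'X b = X'y
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    sse = float(resid @ resid)
    return X, beta_hat, sse

# Simulated data (assumed values): n = 50 observations, q = 3 predictors
rng = np.random.default_rng(0)
n, q = 50, 3
x = rng.normal(size=(n, q))
y = 1.0 + x @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=n)

X, beta_hat, sse = ols_fit(x, y)
print(beta_hat)  # q + 1 estimates: intercept first, then the slopes
```

Solving the linear system is generally preferred numerically to inverting $X'X$, although both correspond to the same formula.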
If the error terms are independent and identically distributed (i.i.d.) as $N(0, \sigma^2)$, then the least squares solution is also the maximum likelihood solution.

An unbiased estimator for $\sigma^2$ is

$$s^2 = \frac{SSE}{n-q-1} = \frac{1}{n-q-1}(y - X\hat\beta)'(y - X\hat\beta) = \frac{y'y - \hat\beta'X'y}{n-q-1}.$$

Another way of writing the model is to center the $x$s using their sample means,

$$\bar x_1 = \frac{1}{n}\sum_{i=1}^{n} x_{i1}, \quad \ldots, \quad \bar x_q = \frac{1}{n}\sum_{i=1}^{n} x_{iq}.$$

This approach is equivalent, and corresponds to the model

$$y_i = \alpha + \sum_{j=1}^{q} \beta_j (x_{ij} - \bar x_j) + \varepsilon_i,$$

so the $x$s are centered and the intercept term is changed; its estimate becomes $\hat\alpha = \bar y$. The term $\beta_1 = (\beta_1, \ldots, \beta_q)'$ is $q \times 1$ rather than $(q+1) \times 1$, so we have $\beta = (\beta_0, \beta_1')'$, where

$$\beta_0 = \alpha - \sum_{j=1}^{q} \beta_j \bar x_j.$$

To do hypothesis tests, the total sum of squares for $y$ is partitioned into SSE and SSR. This is done as follows:

$$
\begin{aligned}
y'y &= (y'y - \hat\beta'X'y) + \hat\beta'X'y \\
    &= SSE + \hat\beta'X'y \\
    &= SSE + (\hat\beta'X'y - n\bar y^2) + n\bar y^2 \\
    &= SSE + SSR + n\bar y^2,
\end{aligned}
$$

so that

$$y'y - n\bar y^2 = SSE + SSR.$$

A test for the nonintercept coefficients, $H_0: \beta_1 = 0$, is

$$F = \frac{SSR/q}{SSE/(n-q-1)},$$

which has an $F_{q,\,n-q-1}$ distribution under the null (and assuming normally distributed $y$ values).
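The sum-of-squares partition and the overall $F$ statistic above can be checked numerically. The following is a sketch on simulated data; the sample size, coefficients, and variable names are assumptions for illustration. It computes only the statistic, which would then be compared to an $F_{q,\,n-q-1}$ distribution.

```python
import numpy as np

# Simulated data (assumed values): n = 40 observations, q = 2 predictors
rng = np.random.default_rng(1)
n, q = 40, 2
x = rng.normal(size=(n, q))
y = 3.0 + x @ np.array([1.5, 0.0]) + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])           # design matrix with intercept
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # least squares estimate

ybar = y.mean()
fit_ss = float(beta_hat @ X.T @ y)             # b'X'y
sse = float(y @ y) - fit_ss                    # SSE = y'y - b'X'y
ssr = fit_ss - n * ybar**2                     # SSR = b'X'y - n*ybar^2
sst = float(y @ y) - n * ybar**2               # total SS corrected for the mean

s2 = sse / (n - q - 1)                         # unbiased estimate of sigma^2
F = (ssr / q) / s2                             # overall F statistic

print(sse + ssr, sst)  # the two totals agree up to rounding
print(F)
```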
You can also test whether a subset of coefficients is 0. To do this, let $\beta_d$ be the subset of interest so that the null is $H_0: \beta_d = 0$. Have the betas arranged so that

$$\beta = \begin{pmatrix} \beta_r \\ \beta_d \end{pmatrix}.$$

The reduced model is

$$y = X_r \beta_r + \varepsilon_r.$$

The idea is that the reduced model has only the variables whose coefficients are not set to 0 under the null.

The term $\beta_r$ is estimated by

$$\hat\beta_r = (X_r'X_r)^{-1} X_r'y.$$

The reduced model is tested against the full model using

$$F = \frac{(\hat\beta'X'y - \hat\beta_r'X_r'y)/h}{(y'y - \hat\beta'X'y)/(n-q-1)} = \frac{(SSR_f - SSR_r)/h}{SSE_f/(n-q-1)} = \frac{MSR}{MSE},$$

where the subscript $f$ refers to the full model and $h$ is the number of parameters in $\beta_d$. The test statistic is compared to an $F_{h,\,n-q-1}$ distribution.

A special case is testing individual predictor variables, in which case $h = 1$, but the formulas hold for this case as well. In this particular case (with numerator degrees of freedom equal to 1), the $F$ statistic is the square of the $t$ statistic.

$R^2$ gives the proportion of variance explained by the model, which is

$$R^2 = \frac{\text{regression sum of squares}}{\text{total sum of squares}} = \frac{\hat\beta'X'y - n\bar y^2}{y'y - n\bar y^2}.$$

For multivariate regression, we have $p$ variables for $y$, so that $Y = (y_{ij})$ is an $n \times p$ matrix.
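Before going further with the multivariate model, the full-versus-reduced comparison and $R^2$ from the univariate review can be sketched as follows. The data are simulated, and the helper `fit_ss`, the choice of dropping the last predictor ($h = 1$), and the coefficient values are assumptions for illustration. The $t$ statistic uses the standard formula $\hat\beta_j / \sqrt{s^2 [(X'X)^{-1}]_{jj}}$, which is not spelled out in these notes.

```python
import numpy as np

def fit_ss(X, y):
    """Return (beta_hat, b'X'y) for a design matrix X. Hypothetical helper."""
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    return beta_hat, float(beta_hat @ X.T @ y)

# Simulated data (assumed values): n = 80, q = 3, test the last h = 1 predictor
rng = np.random.default_rng(3)
n, q, h = 80, 3, 1
x = rng.normal(size=(n, q))
y = 0.5 + x @ np.array([1.0, -2.0, 0.4]) + rng.normal(size=n)

X_full = np.column_stack([np.ones(n), x])
X_red = X_full[:, : q + 1 - h]                 # reduced model drops the last h columns

beta_f, ss_f = fit_ss(X_full, y)
_, ss_r = fit_ss(X_red, y)

sse_f = float(y @ y) - ss_f                    # SSE of the full model
mse = sse_f / (n - q - 1)
F_partial = ((ss_f - ss_r) / h) / mse          # compare to F with (h, n-q-1) df

# With h = 1, F equals the squared t statistic of the dropped coefficient
cov_beta = mse * np.linalg.inv(X_full.T @ X_full)
t_last = beta_f[-1] / np.sqrt(cov_beta[-1, -1])

ybar = y.mean()
r2 = (ss_f - n * ybar**2) / (float(y @ y) - n * ybar**2)

print(F_partial, t_last**2)  # these two agree up to rounding
print(r2)
```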
The observation vectors are the rows $y_i'$ of $Y$, $i = 1, \ldots, n$. As usual, observation vectors are considered as column vectors even though they are written horizontally in the data file and even though they correspond to rows of $Y$.

The design matrix $X$ is as before, with a column of 1s and $q$ columns corresponding to $x$ variables. However, there is now a column of $q + 1$ coefficients (an intercept and $q$ slopes) for each of the $p$ response variables. The model now has

$$B = (\beta_1, \ldots, \beta_p) = (\beta_{ij}),$$

which is a $(q+1) \times p$ matrix. The model can be written as

$$Y = XB + \Xi.$$

The model for an individual column of $Y$ is equivalent to a univariate multiple regression model. (It so happens that $B$ is the capital of $\beta$ in Greek. However, $\Xi$ is not the capital of $\varepsilon$, so this choice of notation seems a bit inconsistent. However, $E$ is used for the error matrix, which is the matrix analogue of SSE.)

The assumptions of the model are

$$E(Y) = XB, \qquad E(\Xi) = O,$$

$$\mathrm{Cov}(y_i) = \Sigma \text{ for } i = 1, \ldots, n, \text{ where } y_i' \text{ is the } i\text{th row of } Y,$$

$$\mathrm{Cov}(y_i, y_j) = O \text{ for } i \neq j.$$

Note that $\mathrm{Cov}(y_i)$ is $p \times p$.

Similar to univariate multiple regression,

$$\hat B = (X'X)^{-1} X'Y,$$

so $y$ was replaced with $Y$ in the formula.

An estimator for the covariance matrix of $y_i$ is

$$S_e = \frac{E}{n-q-1} = \frac{(Y - X\hat B)'(Y - X\hat B)}{n-q-1}.$$

The matrix $B$ can be partitioned so that there is essentially a vector of intercept terms, one for each response variable, and a matrix $B_1$ of other coefficients.

You can also express the non-intercept part of $\hat B$ as

$$\hat B_1 = S_{xx}^{-1} S_{xy},$$

where we use an estimated covariance matrix of all variables (whether or not they are really random), $y_1, \ldots, y_p, x_1, \ldots, x_q$:

$$S = \begin{pmatrix} S_{yy} & S_{yx} \\ S_{xy} & S_{xx} \end{pmatrix}.$$

Here $S$ is $(p+q) \times (p+q)$.

We typically wish to test $H_0: B_1 = O$ against $H_A: B_1 \neq O$. The alternative only requires that $\beta_{ij} \neq 0$ for some $i \geq 1$ and some $j$. Similar to MANOVA, we define matrices $E$ and $H$.
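The estimators $\hat B$ and $S_e$, and the covariance-based expression for the slopes, can be illustrated numerically. This is a sketch on simulated data; the dimensions, the matrix `B_true`, and the variable names are assumptions for the example, not values from the lecture. It also checks that one column of $\hat B$ matches a separate univariate fit.

```python
import numpy as np

# Simulated data (assumed values): n = 60, q = 3 predictors, p = 2 responses
rng = np.random.default_rng(2)
n, q, p = 60, 3, 2
x = rng.normal(size=(n, q))
B_true = np.array([[ 1.0, -1.0],    # intercept row
                   [ 0.5,  0.0],
                   [ 0.0,  2.0],
                   [-1.0,  0.3]])   # (q+1) x p
X = np.column_stack([np.ones(n), x])
Y = X @ B_true + rng.normal(scale=0.5, size=(n, p))

# B_hat = (X'X)^{-1} X'Y, computed by solving the normal equations
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)      # (q+1) x p

# Error matrix E and the covariance estimator S_e = E / (n - q - 1)
resid = Y - X @ B_hat
E = resid.T @ resid                            # p x p
S_e = E / (n - q - 1)

# Slopes from the joint sample covariance matrix of (y1..yp, x1..xq)
S = np.cov(np.column_stack([Y, x]), rowvar=False)   # (p+q) x (p+q)
S_xx = S[p:, p:]
S_xy = S[p:, :p]
B1_cov = np.linalg.solve(S_xx, S_xy)           # q x p, should match B_hat[1:]

# Column 0 of B_hat equals the univariate fit of the first response alone
b_first = np.linalg.lstsq(X, Y[:, 0], rcond=None)[0]

print(np.allclose(B_hat[1:], B1_cov), np.allclose(B_hat[:, 0], b_first))
```

Because each column of $\hat B$ is a separate univariate fit, the multivariate model changes the inference (through $\Sigma$) rather than the point estimates.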
The total sum of squares can be partitioned into the two matrices $E$ and $H$:

$$Y'Y - n\bar y \bar y' = (Y'Y - \hat B'X'Y) + (\hat B'X'Y - n\bar y \bar y') = E + H.$$

Similar to MANOVA, the eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots$ of $E^{-1}H$ can be used to create test statistics for testing the null hypothesis.

Wilks' Lambda:
$$\Lambda = \frac{|E|}{|E + H|} = \prod_{i=1}^{\min(p,q)} \frac{1}{1 + \lambda_i}$$

Roy's greatest root:
$$\theta = \frac{\lambda_1}{1 + \lambda_1}$$

Pillai's test:
$$V = \sum_{i=1}^{\min(p,q)} \frac{\lambda_i}{1 + \lambda_i}$$

Lawley-Hotelling test:
$$U = \sum_{i=1}^{\min(p,q)} \lambda_i$$

If you don't want to use specialized tables of critical values in the book for these statistics, you can use the same $F$ approximations that we used for MANOVA for Wilks' Lambda, where $\Lambda$ is distributed as $\Lambda_{q,\,p,\,n-p-1}$, so that the degrees of freedom for the $F$ test are a function of $q$, $p$, and $n - p - 1$.

As in the univariate multiple regression case, you can test whether subsets of the $x$ variables have coefficients of 0.
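The four statistics for the overall test $H_0: B_1 = O$ can be computed from the eigenvalues of $E^{-1}H$ as in the sketch below. The function name `regression_test_stats`, the simulated data, and the dictionary keys are assumptions for illustration; the code returns the statistics only and does not compute critical values or $p$-values.

```python
import numpy as np

def regression_test_stats(x, Y):
    """Wilks, Roy, Pillai and Lawley-Hotelling statistics for H0: B1 = O.

    Hypothetical helper for illustration only.
    """
    n, q = x.shape
    p = Y.shape[1]
    X = np.column_stack([np.ones(n), x])
    B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    ybar = Y.mean(axis=0)

    fit = B_hat.T @ X.T @ Y                  # B'X'Y
    E = Y.T @ Y - fit                        # error matrix
    H = fit - n * np.outer(ybar, ybar)       # hypothesis matrix

    # Eigenvalues of E^{-1}H are real and nonnegative in theory;
    # drop tiny imaginary parts and keep the min(p, q) largest.
    lam = np.linalg.eigvals(np.linalg.solve(E, H)).real
    lam = np.sort(lam)[::-1][: min(p, q)]
    lam = np.clip(lam, 0.0, None)

    stats = {
        "wilks": float(np.prod(1.0 / (1.0 + lam))),
        "roy": float(lam[0] / (1.0 + lam[0])),
        "pillai": float(np.sum(lam / (1.0 + lam))),
        "lawley_hotelling": float(np.sum(lam)),
    }
    return stats, E, H

# Simulated data (assumed values): n = 60, q = 3 predictors, p = 2 responses
rng = np.random.default_rng(4)
n, q, p = 60, 3, 2
x = rng.normal(size=(n, q))
B1 = np.array([[0.5, 0.0], [0.0, 1.0], [-0.5, 0.2]])
Y = 1.0 + x @ B1 + rng.normal(size=(n, p))

stats, E, H = regression_test_stats(x, Y)
print(stats)
```

Small values of Wilks' Lambda and large values of the other three statistics are evidence against the null.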
For the subset test, there is a matrix in the null hypothesis, $H_0: B_d = O$. The $E$ and $H$ matrices are given by

$$E = Y'Y - \hat B'X'Y, \qquad H = \hat B'X'Y - \hat B_r'X_r'Y,$$

and the test statistics are computed from the eigenvalues of $E^{-1}H$ as before.

It is also possible to try to pick a subset of the $y$ variables if some of the $y$ variables are not well explained by the $x$ variables. This can also be done with stepwise procedures.

Canonical correlation analysis

Correlation between two variables measures the linear relationship between those two variables. In canonical correlation, we measure the linear relationship between two sets of variables. Typically, variables within each set will be related in some way, for example a set of student aptitudes or qualifications (high school GPA, SAT scores) and outcomes (college GPA, GRE scores), or variables on a child and similar variables on their parents.

If you only have one variable in one set, $y$, and $q$ variables in the other set, $x_1, \ldots, x_q$, then you can define

$$S = \begin{pmatrix} s_y^2 & s_{yx}' \\ s_{xy} & S_{xx} \end{pmatrix}, \qquad R = \begin{pmatrix} 1 & r_{yx}' \\ r_{xy} & R_{xx} \end{pmatrix},$$

where $r_{yx}$ is a vector with sample correlations between $y$ and $x_i$, $i = 1, \ldots, q$. The squared multiple correlation between $y$ and $x_1, \ldots, x_q$ is

$$R^2 = r_{yx}' R_{xx}^{-1} r_{xy}.$$

When there are multiple $y$ variables, we use

$$S = \begin{pmatrix} S_{yy} & S_{yx} \\ S_{xy} & S_{xx} \end{pmatrix}.$$

A measure of association is

$$R_M^2 = \frac{|S_{yx} S_{xx}^{-1} S_{xy}|}{|S_{yy}|} = |S_{yy}^{-1} S_{yx} S_{xx}^{-1} S_{xy}| = \prod_{i=1}^{\min(p,q)} r_i^2,$$

where the $r_i^2$ terms are the eigenvalues of $S_{yy}^{-1} S_{yx} S_{xx}^{-1} S_{xy}$. The values $r_i$, $i = 1, \ldots, \min(p,q)$, are called the canonical correlations.

The largest canonical correlation, $r_1$, is used as a measure of association of the two sets of variables. An interpretation of $r_1^2$ is that it is the maximum squared correlation between a linear combination of the $y$ variables and a linear combination of the $x$ variables.

For each canonical correlation, there is a set of associated linear combinations, so that there exist $a_i$ and $b_i$ such that

$$r_i = \mathrm{cor}(a_i'y, b_i'x).$$

There is some interesting discussion in the book about how the author thinks that canonical correlation is often misapplied. If you are ever asked to use canonical correlation, try looking this up!
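The canonical correlations defined above can be computed directly from the partitioned sample covariance matrix. The following is a minimal sketch on simulated data; the function name `canonical_correlations`, the dimensions, and the generating coefficients are assumptions for illustration. As a consistency check, with a single $y$ variable the one canonical correlation squared should equal the multiple $R^2$.

```python
import numpy as np

def canonical_correlations(y, x):
    """Canonical correlations between the columns of y and the columns of x.

    Hypothetical helper for illustration only.
    """
    p, q = y.shape[1], x.shape[1]
    S = np.cov(np.column_stack([y, x]), rowvar=False)   # (p+q) x (p+q)
    S_yy, S_yx = S[:p, :p], S[:p, p:]
    S_xy, S_xx = S[p:, :p], S[p:, p:]

    # Eigenvalues of S_yy^{-1} S_yx S_xx^{-1} S_xy are the squared correlations
    M = np.linalg.solve(S_yy, S_yx) @ np.linalg.solve(S_xx, S_xy)
    r2 = np.linalg.eigvals(M).real
    r2 = np.sort(r2)[::-1][: min(p, q)]
    return np.sqrt(np.clip(r2, 0.0, 1.0))

# Simulated data (assumed values): n = 100, q = 3 x variables, p = 2 y variables
rng = np.random.default_rng(5)
n, q, p = 100, 3, 2
x = rng.normal(size=(n, q))
A = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, -1.0]])
y = x @ A + rng.normal(size=(n, p))

r = canonical_correlations(y, x)
print(r)  # min(p, q) values in [0, 1], largest first

# Single-y case: the canonical correlation squared is the multiple R^2
r_single = canonical_correlations(y[:, :1], x)
print(r_single[0] ** 2)
```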