
Lecture 13: Simple Linear Regression in Matrix Format


36-401, Section B, Fall 2015
13 October 2015
11:55 Wednesday 14th October, 2015. See updates and corrections at ~cshalizi/mreg/

Contents
1 Least Squares in Matrix Form
  The Basic Matrices
  Mean Squared Error
  Minimizing the MSE
2 Fitted Values and Residuals
  Expectations and Covariances
3 Sampling Distribution of Estimators
4 Derivatives with Respect to Vectors
  Second Derivatives
  Maxima and Minima
5 Expectations and Variances with Vectors and Matrices
6 Further Reading

So far, we have not used any notions, or notation, that go beyond basic algebra and calculus (and probability).

This has forced us to do a fair amount of book-keeping, as it were by hand. This is just about tolerable for the simple linear model, with one predictor variable. It will get intolerable if we have multiple predictor variables. Fortunately, a little application of linear algebra will let us abstract away from a lot of the book-keeping details, and make multiple linear regression hardly more complicated than the simple version.

These notes will not remind you of how matrix algebra works. However, they will review some results about calculus with matrices, and about expectations and variances with vectors and matrices. Throughout, bold-faced letters will denote matrices, as $\mathbf{a}$ as opposed to $a$.

1 Least Squares in Matrix Form

Our data consists of $n$ paired observations of the predictor variable $X$ and the response variable $Y$, i.e., $(x_1, y_1), \ldots, (x_n, y_n)$.

We wish to fit the model
\[
Y = \beta_0 + \beta_1 X + \epsilon \qquad (1)
\]
where $E[\epsilon \mid X = x] = 0$, $\mathrm{Var}[\epsilon \mid X = x] = \sigma^2$, and $\epsilon$ is uncorrelated across measurements.[2]

The Basic Matrices

Group all of the observations of the response into a single column ($n \times 1$) matrix $\mathbf{y}$,
\[
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \qquad (2)
\]
Similarly, we group both the coefficients into a single vector (i.e., a $2 \times 1$ matrix)
\[
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} \qquad (3)
\]
We'd also like to group the observations of the predictor variable together, but we need something which looks a little unusual at first[1]:
\[
\mathbf{x} = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \qquad (4)
\]

[1] Historically, linear models with multiple predictors evolved before the use of matrix algebra for regression. You may imagine the resulting book-keeping.
[2] When I need to also assume that $\epsilon$ is Gaussian, and to strengthen "uncorrelated" to "independent", I'll say so.
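
To make the shapes concrete, here is a minimal numpy sketch (not part of the original notes; the toy data and variable names are made up) that builds $\mathbf{y}$, $\beta$, and $\mathbf{x}$ exactly as in equations (2)-(4):

    # Minimal sketch, assuming numpy; x_vals and y_vals are invented toy data.
    import numpy as np

    x_vals = np.array([1.0, 2.0, 3.0, 4.0])      # observations of the predictor X
    y_vals = np.array([2.1, 3.9, 6.2, 8.1])      # observations of the response Y

    y = y_vals.reshape(-1, 1)                    # the n x 1 response matrix y, eq. (2)
    beta = np.array([[0.5], [1.9]])              # a 2 x 1 coefficient vector, eq. (3)
    x = np.column_stack([np.ones_like(x_vals), x_vals])   # the n x 2 design matrix, eq. (4)

    print(x.shape, y.shape, beta.shape)          # (4, 2) (4, 1) (2, 1)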

This is an $n \times 2$ matrix, where the first column is always 1, and the second column contains the actual observations of $X$. We have this apparently redundant first column because of what it does for us when we multiply $\mathbf{x}$ by $\beta$:
\[
\mathbf{x}\beta = \begin{bmatrix} \beta_0 + \beta_1 x_1 \\ \beta_0 + \beta_1 x_2 \\ \vdots \\ \beta_0 + \beta_1 x_n \end{bmatrix} \qquad (5)
\]
That is, $\mathbf{x}\beta$ is the $n \times 1$ matrix which contains the point predictions. The matrix $\mathbf{x}$ is sometimes called the design matrix.

Mean Squared Error

At each data point, using the coefficients $\beta$ results in some error of prediction, so we have $n$ prediction errors. These form a vector:
\[
e(\beta) = \mathbf{y} - \mathbf{x}\beta \qquad (6)
\]
(You can check that this subtracts an $n \times 1$ matrix from an $n \times 1$ matrix.)

When we derived the least squares estimator, we used the mean squared error,
\[
MSE(\beta) = \frac{1}{n}\sum_{i=1}^{n} e_i^2(\beta) \qquad (7)
\]
How might we express this in terms of our matrices? I claim that the correct form is
\[
MSE(\beta) = \frac{1}{n} e^T e \qquad (8)
\]
To see this, look at what the matrix multiplication really involves:
\[
\begin{bmatrix} e_1 & e_2 & \dots & e_n \end{bmatrix} \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix} \qquad (9)
\]
This clearly equals $\sum_i e_i^2$, so the MSE has the claimed form.

Let us expand this a little for further use:
\[
MSE(\beta) = \frac{1}{n} e^T e \qquad (10)
\]
\[
= \frac{1}{n}(\mathbf{y} - \mathbf{x}\beta)^T(\mathbf{y} - \mathbf{x}\beta) \qquad (11)
\]
\[
= \frac{1}{n}(\mathbf{y}^T - \beta^T\mathbf{x}^T)(\mathbf{y} - \mathbf{x}\beta) \qquad (12)
\]
\[
= \frac{1}{n}(\mathbf{y}^T\mathbf{y} - \mathbf{y}^T\mathbf{x}\beta - \beta^T\mathbf{x}^T\mathbf{y} + \beta^T\mathbf{x}^T\mathbf{x}\beta) \qquad (13)
\]
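
As a quick sanity check on the matrix form of the MSE, the following sketch (not from the notes; toy data and coefficients are made up, numpy assumed) verifies that $\frac{1}{n}e^Te$ in equation (8) agrees with the componentwise definition in equation (7):

    # Hedged sketch: compare equation (8) with equation (7) on invented toy data.
    import numpy as np

    x_vals = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.9, 6.2, 8.1]).reshape(-1, 1)
    x = np.column_stack([np.ones_like(x_vals), x_vals])
    beta = np.array([[0.5], [1.9]])

    e = y - x @ beta                          # prediction errors, equation (6)
    n = len(y)
    mse_matrix = (e.T @ e).item() / n         # matrix form, equation (8)
    mse_sum = np.mean(e ** 2)                 # componentwise form, equation (7)
    assert np.isclose(mse_matrix, mse_sum)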

Notice that $(\mathbf{y}^T\mathbf{x}\beta)^T = \beta^T\mathbf{x}^T\mathbf{y}$. Further notice that this is a $1 \times 1$ matrix, so $\mathbf{y}^T\mathbf{x}\beta = \beta^T\mathbf{x}^T\mathbf{y}$. Thus
\[
MSE(\beta) = \frac{1}{n}(\mathbf{y}^T\mathbf{y} - 2\beta^T\mathbf{x}^T\mathbf{y} + \beta^T\mathbf{x}^T\mathbf{x}\beta) \qquad (14)
\]

Minimizing the MSE

First, we find the gradient of the MSE with respect to $\beta$:
\[
\nabla MSE(\beta) = \frac{1}{n}\left( \nabla\,\mathbf{y}^T\mathbf{y} - 2\nabla\,\beta^T\mathbf{x}^T\mathbf{y} + \nabla\,\beta^T\mathbf{x}^T\mathbf{x}\beta \right) \qquad (15)
\]
\[
= \frac{1}{n}\left( 0 - 2\mathbf{x}^T\mathbf{y} + 2\mathbf{x}^T\mathbf{x}\beta \right) \qquad (16)
\]
\[
= \frac{2}{n}\left( \mathbf{x}^T\mathbf{x}\beta - \mathbf{x}^T\mathbf{y} \right) \qquad (17)
\]
We now set this to zero at the optimum, $\hat\beta$:
\[
\mathbf{x}^T\mathbf{x}\hat\beta - \mathbf{x}^T\mathbf{y} = 0 \qquad (18)
\]
This equation, for the two-dimensional vector $\hat\beta$, corresponds to our pair of normal or estimating equations for $\hat\beta_0$ and $\hat\beta_1$. Thus, it, too, is called an estimating equation. Solving,
\[
\hat\beta = (\mathbf{x}^T\mathbf{x})^{-1}\mathbf{x}^T\mathbf{y} \qquad (19)
\]
That is, we've got one matrix equation which gives us both coefficient estimates.

If this is right, the equation we've got above should in fact reproduce the least-squares estimates we've already derived, which are of course
\[
\hat\beta_1 = \frac{c_{XY}}{s_X^2} = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2} \qquad (20)
\]
and
\[
\hat\beta_0 = \bar{y} - \hat\beta_1\bar{x} \qquad (21)
\]
Let's see if that's the case. As a first step, let's introduce normalizing factors of $1/n$ into both the matrix products:
\[
\hat\beta = (n^{-1}\mathbf{x}^T\mathbf{x})^{-1}(n^{-1}\mathbf{x}^T\mathbf{y}) \qquad (22)
\]
Now let's look at the two factors in parentheses separately, from right to left.
\[
\frac{1}{n}\mathbf{x}^T\mathbf{y} = \frac{1}{n}\begin{bmatrix} 1 & 1 & \dots & 1 \\ x_1 & x_2 & \dots & x_n \end{bmatrix}\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \qquad (23)
\]
\[
= \frac{1}{n}\begin{bmatrix} \sum_i y_i \\ \sum_i x_i y_i \end{bmatrix} \qquad (24)
\]
\[
= \begin{bmatrix} \bar{y} \\ \overline{xy} \end{bmatrix} \qquad (25)
\]
Similarly for the other factor:

\[
\frac{1}{n}\mathbf{x}^T\mathbf{x} = \frac{1}{n}\begin{bmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{bmatrix} \qquad (26)
\]
\[
= \begin{bmatrix} 1 & \bar{x} \\ \bar{x} & \overline{x^2} \end{bmatrix} \qquad (27)
\]
Now we need to take the inverse:
\[
\left( \frac{1}{n}\mathbf{x}^T\mathbf{x} \right)^{-1} = \frac{1}{\overline{x^2} - \bar{x}^2}\begin{bmatrix} \overline{x^2} & -\bar{x} \\ -\bar{x} & 1 \end{bmatrix} \qquad (28)
\]
\[
= \frac{1}{s_X^2}\begin{bmatrix} \overline{x^2} & -\bar{x} \\ -\bar{x} & 1 \end{bmatrix} \qquad (29)
\]
Let's multiply together the pieces.
\[
(\mathbf{x}^T\mathbf{x})^{-1}\mathbf{x}^T\mathbf{y} = \frac{1}{s_X^2}\begin{bmatrix} \overline{x^2} & -\bar{x} \\ -\bar{x} & 1 \end{bmatrix}\begin{bmatrix} \bar{y} \\ \overline{xy} \end{bmatrix} \qquad (30)
\]
\[
= \frac{1}{s_X^2}\begin{bmatrix} \overline{x^2}\,\bar{y} - \bar{x}\,\overline{xy} \\ \overline{xy} - \bar{x}\,\bar{y} \end{bmatrix} \qquad (31)
\]
\[
= \frac{1}{s_X^2}\begin{bmatrix} (s_X^2 + \bar{x}^2)\bar{y} - \bar{x}(c_{XY} + \bar{x}\,\bar{y}) \\ c_{XY} \end{bmatrix} \qquad (32)
\]
\[
= \frac{1}{s_X^2}\begin{bmatrix} s_X^2\,\bar{y} + \bar{x}^2\bar{y} - \bar{x}\,c_{XY} - \bar{x}^2\bar{y} \\ c_{XY} \end{bmatrix} \qquad (33)
\]
\[
= \begin{bmatrix} \bar{y} - \frac{c_{XY}}{s_X^2}\bar{x} \\ \frac{c_{XY}}{s_X^2} \end{bmatrix} \qquad (34)
\]
which is what it should be.

In words: $n^{-1}\mathbf{x}^T\mathbf{y}$ is keeping track of $\bar{y}$ and $\overline{xy}$, and $n^{-1}\mathbf{x}^T\mathbf{x}$ keeps track of $\bar{x}$ and $\overline{x^2}$. The matrix inversion and multiplication then handles all the book-keeping to put these pieces together to get the appropriate (sample) variances, covariance, and intercepts. We don't have to remember that any more; we can just remember the one matrix equation, and then trust the linear algebra to take care of the details.
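
Since the point is that we can trust the linear algebra, here is a small numerical check (a sketch with simulated data, not part of the original text) that equation (19) reproduces equations (20)-(21), using the $n$-divisor convention for $s_X^2$ and $c_{XY}$:

    # Hedged sketch: matrix estimate vs. the scalar formulas; data are simulated.
    import numpy as np

    rng = np.random.default_rng(0)
    x_vals = rng.normal(size=50)
    y_vals = 2.0 + 3.0 * x_vals + rng.normal(scale=0.5, size=50)

    x = np.column_stack([np.ones_like(x_vals), x_vals])
    y = y_vals.reshape(-1, 1)

    beta_hat = np.linalg.solve(x.T @ x, x.T @ y)        # equation (19), solved stably

    # Sample moments with the n-divisor convention used in the notes:
    s2_X = np.mean(x_vals ** 2) - np.mean(x_vals) ** 2
    c_XY = np.mean(x_vals * y_vals) - np.mean(x_vals) * np.mean(y_vals)
    b1 = c_XY / s2_X                                    # equation (20)
    b0 = np.mean(y_vals) - b1 * np.mean(x_vals)         # equation (21)

    assert np.allclose(beta_hat.ravel(), [b0, b1])      # the two routes agree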

2 Fitted Values and Residuals

Remember that when the coefficient vector is $\beta$, the point predictions for each data point are $\mathbf{x}\beta$. Thus the vector of fitted values, $\mathbf{m}(\mathbf{x})$, or $\mathbf{m}$ for short, is
\[
\mathbf{m} = \mathbf{x}\hat\beta \qquad (35)
\]
Using our equation for $\hat\beta$,
\[
\mathbf{m} = \mathbf{x}(\mathbf{x}^T\mathbf{x})^{-1}\mathbf{x}^T\mathbf{y} \qquad (36)
\]
Notice that the fitted values are linear in $\mathbf{y}$. The matrix
\[
\mathbf{H} \equiv \mathbf{x}(\mathbf{x}^T\mathbf{x})^{-1}\mathbf{x}^T \qquad (37)
\]
does not depend on $\mathbf{y}$ at all, but does control the fitted values:
\[
\mathbf{m} = \mathbf{H}\mathbf{y} \qquad (38)
\]
If we repeat our experiment (survey, observation ...) many times at the same $\mathbf{x}$, we get a different $\mathbf{y}$ every time. But $\mathbf{H}$ does not change. The properties of the fitted values are thus largely determined by the properties of $\mathbf{H}$. It thus deserves a name; it's usually called the hat matrix, for obvious reasons, or, if we want to sound more respectable, the influence matrix.
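
The hat matrix is easy to form and inspect directly. The following sketch (simulated data and settings of my own, numpy assumed) builds $\mathbf{H}$ from equation (37), confirms that $\mathbf{H}\mathbf{y}$ matches the fitted values of equation (38), and previews the symmetry and idempotency properties discussed next:

    # Sketch of the hat matrix on simulated data (all settings are illustrative).
    import numpy as np

    rng = np.random.default_rng(1)
    x_vals = rng.normal(size=20)
    y = (1.0 - 2.0 * x_vals + rng.normal(scale=0.3, size=20)).reshape(-1, 1)
    x = np.column_stack([np.ones_like(x_vals), x_vals])

    H = x @ np.linalg.inv(x.T @ x) @ x.T               # hat / influence matrix, eq. (37)
    beta_hat = np.linalg.solve(x.T @ x, x.T @ y)

    assert np.allclose(H @ y, x @ beta_hat)            # m = H y, equation (38)
    assert np.allclose(H, H.T)                         # symmetry (see below)
    assert np.allclose(H @ H, H)                       # idempotency (see below)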

Let's look at some of the properties of the hat matrix.

- Influence: Since $\mathbf{H}$ is not a function of $\mathbf{y}$, we can easily verify that $\partial m_i / \partial y_j = H_{ij}$. Thus, $H_{ij}$ is the rate at which the $i$th fitted value changes as we vary the $j$th observation, the "influence" that observation has on that fitted value.
- Symmetry: It's easy to see that $\mathbf{H}^T = \mathbf{H}$.
- Idempotency: A square matrix $\mathbf{a}$ is called idempotent[3] when $\mathbf{a}^2 = \mathbf{a}$ (and so $\mathbf{a}^k = \mathbf{a}$ for any higher power $k$). Again, by writing out the multiplication, $\mathbf{H}^2 = \mathbf{H}$, so it's idempotent.

[3] From the Latin idem, "same", and potens, "power".

Idempotency, Projection, Geometry

Idempotency seems like the most obscure of these properties, but it's actually one of the more important ones. Imagine $n$-dimensional vectors. If we project a vector $\mathbf{u}$ on to the line in the direction of the length-one vector $\mathbf{v}$, we get
\[
\mathbf{v}\mathbf{v}^T\mathbf{u} \qquad (39)
\]
(Check the dimensions: $\mathbf{u}$ and $\mathbf{v}$ are both $n \times 1$, so $\mathbf{v}^T$ is $1 \times n$, and $\mathbf{v}^T\mathbf{u}$ is $1 \times 1$.) If we group the first two terms together, like so,
\[
(\mathbf{v}\mathbf{v}^T)\mathbf{u} \qquad (40)
\]
where $\mathbf{v}\mathbf{v}^T$ is the $n \times n$ projection matrix or projection operator for that line. Since $\mathbf{v}$ is a unit vector, $\mathbf{v}^T\mathbf{v} = 1$, and
\[
(\mathbf{v}\mathbf{v}^T)(\mathbf{v}\mathbf{v}^T) = \mathbf{v}\mathbf{v}^T \qquad (41)
\]
so the projection operator for the line is idempotent.
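
A minimal numerical illustration of equations (39)-(41), with an arbitrary unit vector (everything here is illustrative, not from the notes; numpy assumed):

    # Sketch: the projection operator v v^T is idempotent.
    import numpy as np

    rng = np.random.default_rng(2)
    v = rng.normal(size=(5, 1))
    v = v / np.linalg.norm(v)               # make v a length-one vector
    P = v @ v.T                             # n x n projection matrix v v^T

    u = rng.normal(size=(5, 1))
    assert np.allclose(P @ P, P)            # (v v^T)(v v^T) = v v^T, equation (41)
    assert np.allclose(P @ (P @ u), P @ u)  # projecting twice = projecting once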

The geometric meaning of idempotency here is that once we've projected $\mathbf{u}$ on to the line, projecting its image on to the same line doesn't change anything.

Extending this same reasoning, for any linear subspace of the $n$-dimensional space, there is always some $n \times n$ matrix which projects vectors in arbitrary position down into the subspace, and this projection matrix is always idempotent. It is a bit more convoluted to prove that any idempotent matrix is the projection matrix for some subspace, but that's also true. We will see later how to read off the dimension of the subspace from the properties of its projection matrix.

Residuals

The vector of residuals, $e$, is just
\[
e \equiv \mathbf{y} - \mathbf{x}\hat\beta \qquad (42)
\]
Using the hat matrix,
\[
e = \mathbf{y} - \mathbf{H}\mathbf{y} = (\mathbf{I} - \mathbf{H})\mathbf{y} \qquad (43)
\]
Here are some properties of $\mathbf{I} - \mathbf{H}$, checked numerically in the sketch after this list:

- Influence: $\partial e_i / \partial y_j = (\mathbf{I} - \mathbf{H})_{ij}$.
- Symmetry: $(\mathbf{I} - \mathbf{H})^T = \mathbf{I} - \mathbf{H}$.
- Idempotency: $(\mathbf{I} - \mathbf{H})^2 = (\mathbf{I} - \mathbf{H})(\mathbf{I} - \mathbf{H}) = \mathbf{I} - \mathbf{H} - \mathbf{H} + \mathbf{H}^2$. But, since $\mathbf{H}$ is idempotent, $\mathbf{H}^2 = \mathbf{H}$, and thus $(\mathbf{I} - \mathbf{H})^2 = \mathbf{I} - \mathbf{H}$.
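
Here is the sketch referred to above: it forms $\mathbf{I} - \mathbf{H}$ from simulated data (all names and settings are made up for illustration) and checks equation (43) together with the listed properties:

    # Hedged sketch of the residual-maker matrix I - H on simulated data.
    import numpy as np

    rng = np.random.default_rng(3)
    x_vals = rng.normal(size=30)
    y = (0.5 + 1.5 * x_vals + rng.normal(scale=0.4, size=30)).reshape(-1, 1)
    x = np.column_stack([np.ones_like(x_vals), x_vals])

    n = len(y)
    H = x @ np.linalg.inv(x.T @ x) @ x.T
    M = np.eye(n) - H                               # I - H

    beta_hat = np.linalg.solve(x.T @ x, x.T @ y)
    e = y - x @ beta_hat                            # residuals, equation (42)

    assert np.allclose(e, M @ y)                    # e = (I - H) y, equation (43)
    assert np.allclose(M, M.T)                      # symmetry of I - H
    assert np.allclose(M @ M, M)                    # idempotency of I - H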

Thus,
\[
MSE(\hat\beta) = \frac{1}{n}\mathbf{y}^T(\mathbf{I} - \mathbf{H})^T(\mathbf{I} - \mathbf{H})\mathbf{y} \qquad (44)
\]
simplifies to
\[
MSE(\hat\beta) = \frac{1}{n}\mathbf{y}^T(\mathbf{I} - \mathbf{H})\mathbf{y} \qquad (45)
\]

Expectations and Covariances

We can of course consider the vector of random variables $\mathbf{Y}$. By our modeling assumptions,
\[
\mathbf{Y} = \mathbf{x}\beta + \epsilon \qquad (46)
\]
where $\epsilon$ is an $n \times 1$ matrix of random variables, with mean vector $\mathbf{0}$ and variance-covariance matrix $\sigma^2\mathbf{I}$. What can we deduce from this?

First, the expectation of the fitted values:
\[
E[\mathbf{H}\mathbf{Y}] = \mathbf{H}E[\mathbf{Y}] \qquad (47)
\]
\[
= \mathbf{H}\mathbf{x}\beta + \mathbf{H}E[\epsilon] \qquad (48)
\]
\[
= \mathbf{x}(\mathbf{x}^T\mathbf{x})^{-1}\mathbf{x}^T\mathbf{x}\beta + \mathbf{0} \qquad (49)
\]
\[
= \mathbf{x}\beta \qquad (50)
\]
which is as it should be, since the fitted values are unbiased.

Next, the variance-covariance of the fitted values:
\[
\mathrm{Var}[\mathbf{H}\mathbf{Y}] = \mathrm{Var}[\mathbf{H}(\mathbf{x}\beta + \epsilon)] \qquad (51)
\]
\[
= \mathrm{Var}[\mathbf{H}\epsilon] \qquad (52)
\]
\[
= \mathbf{H}\mathrm{Var}[\epsilon]\mathbf{H}^T \qquad (53)
\]
\[
= \sigma^2\mathbf{H}\mathbf{I}\mathbf{H} \qquad (54)
\]
\[
= \sigma^2\mathbf{H} \qquad (55)
\]
using, again, the symmetry and idempotency of $\mathbf{H}$.

Similarly, the expected residual vector is zero:
\[
E[e] = (\mathbf{I} - \mathbf{H})(\mathbf{x}\beta + E[\epsilon]) = \mathbf{x}\beta - \mathbf{x}\beta = \mathbf{0} \qquad (56)
\]
The variance-covariance matrix of the residuals ...
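
To close, a hedged Monte Carlo sketch of equations (47)-(55): holding $\mathbf{x}$ fixed and redrawing $\epsilon$ many times, the fitted values should average to $\mathbf{x}\beta$ and their sample variance-covariance should approach $\sigma^2\mathbf{H}$. The simulation settings below are made up for illustration and are not from the notes:

    # Monte Carlo sketch: repeat the "experiment" at the same x many times.
    import numpy as np

    rng = np.random.default_rng(4)
    n, sigma = 15, 0.7
    beta = np.array([[1.0], [2.0]])
    x_vals = rng.normal(size=n)
    x = np.column_stack([np.ones_like(x_vals), x_vals])
    H = x @ np.linalg.inv(x.T @ x) @ x.T

    fits = []
    for _ in range(10000):                            # redraw the noise, keep x fixed
        eps = rng.normal(scale=sigma, size=(n, 1))
        y = x @ beta + eps
        fits.append((H @ y).ravel())
    fits = np.array(fits)

    # Both discrepancies shrink toward zero as the number of replications grows:
    print(np.max(np.abs(fits.mean(axis=0) - (x @ beta).ravel())))    # E[HY] = x beta
    print(np.max(np.abs(np.cov(fits.T, bias=True) - sigma**2 * H)))  # Var[HY] = sigma^2 H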

