
Lecture 5 Hypothesis Testing in Multiple Linear Regression



Transcription of Lecture 5 Hypothesis Testing in Multiple Linear Regression

BIOST 515, January 20, 2004

Types of tests

- Overall test
- Test for addition of a single variable
- Test for addition of a group of variables

Overall test

$$y_i = \beta_0 + x_{i1}\beta_1 + \cdots + x_{ip}\beta_p + \epsilon_i$$

Does the entire set of independent variables contribute significantly to the prediction of $y$?

Test for addition of a single variable

Does the addition of one particular variable of interest add significantly to the prediction of $y$ achieved by the other independent variables already in the model?

$$y_i = \beta_0 + x_{i1}\beta_1 + \cdots + x_{ip}\beta_p + \epsilon_i$$

Test for addition of a group of variables

Does the addition of some group of independent variables of interest add significantly to the prediction of $y$ obtained through the other independent variables already in the model?

$$y_i = \beta_0 + x_{i1}\beta_1 + \cdots + x_{i,p-1}\beta_{p-1} + x_{ip}\beta_p + \epsilon_i$$

The ANOVA table

| Source of variation | Sums of squares | Degrees of freedom | Mean square | E[Mean square] |
|---|---|---|---|---|
| Regression | $SSR = \hat\beta'X'y - n\bar{y}^2$ | $p$ | $\frac{SSR}{p}$ | $\sigma^2 + \beta_R' X_C' X_C \beta_R / p$ |
| Error | $SSE = y'y - \hat\beta'X'y$ | $n-(p+1)$ | $\frac{SSE}{n-(p+1)}$ | $\sigma^2$ |
| Total | $SSTO = y'y - n\bar{y}^2$ | $n-1$ | | |

$X_C$ is the matrix of centered predictors:

$$X_C = \begin{pmatrix} x_{11}-\bar{x}_1 & x_{12}-\bar{x}_2 & \cdots & x_{1p}-\bar{x}_p \\ x_{21}-\bar{x}_1 & x_{22}-\bar{x}_2 & \cdots & x_{2p}-\bar{x}_p \\ \vdots & \vdots & & \vdots \\ x_{n1}-\bar{x}_1 & x_{n2}-\bar{x}_2 & \cdots & x_{np}-\bar{x}_p \end{pmatrix}$$

and $\beta_R = (\beta_1, \ldots, \beta_p)'$.

The ANOVA table for

$$y_i = \beta_0 + x_{i1}\beta_1 + x_{i2}\beta_2 + \cdots + x_{ip}\beta_p + \epsilon_i$$

is often provided in the output from statistical software as

| Source of variation | Sums of squares | Degrees of freedom |
|---|---|---|
| Regression: $x_1$ | $SSR(x_1)$ | 1 |
| $x_2 \mid x_1$ | $SSR(x_2 \mid x_1)$ | 1 |
| $\vdots$ | $\vdots$ | $\vdots$ |
| $x_p \mid x_{p-1}, x_{p-2}, \ldots, x_1$ | $SSR(x_p \mid x_{p-1}, \ldots, x_1)$ | 1 |
| Error | $SSE$ | $n-(p+1)$ |
| Total | $SSTO$ | $n-1$ |

where

$$SSR = SSR(x_1) + SSR(x_2 \mid x_1) + \cdots + SSR(x_p \mid x_{p-1}, x_{p-2}, \ldots, x_1)$$

and has $p$ degrees of freedom.

Overall test

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$$
$$H_1: \beta_j \neq 0 \text{ for at least one } j,\; j = 1, \ldots, p$$

Rejection of $H_0$ implies that at least one of the regressors $x_1, x_2, \ldots, x_p$ contributes significantly to the model. We will use a generalization of the F-test in simple linear regression to test this hypothesis.

Under the null hypothesis, $SSR/\sigma^2 \sim \chi^2_p$ and $SSE/\sigma^2 \sim \chi^2_{n-(p+1)}$ are independent. Therefore, we have

$$F_0 = \frac{SSR/p}{SSE/(n-p-1)} = \frac{MSR}{MSE} \sim F_{p,\,n-p-1}.$$

Note: as in simple linear regression, we are assuming that $\epsilon_i \sim N(0, \sigma^2)$ or relying on large-sample theory.

CHS example:

$$\mathrm{diabp}_i = \beta_0 + \mathrm{weight}_i\,\beta_1 + \mathrm{height}_i\,\beta_2 + \epsilon_i$$

> anova(lmwtht)
Analysis of Variance Table

Response: DIABP
           Df Sum Sq Mean Sq F value Pr(>F)
WEIGHT      1   1289    1289     ...    ... **
HEIGHT      1    120     120     ...    ...
Residuals 495  62426     126

(The F values and p-values were lost in transcription.)
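The overall F test from the lecture's anova(lmwtht) output can be reproduced numerically. This is a sketch in Python rather than R, using only the sums of squares and degrees of freedom that appear in the table above (SSR components 1289 and 120, SSE = 62426 on 495 df):

```python
# Overall F test for the CHS weight/height model, using the sums of
# squares reported in the lecture's ANOVA table.
from scipy import stats

ssr = 1289 + 120          # SSR(weight) + SSR(height | weight)
sse = 62426               # residual sum of squares
p = 2                     # regression degrees of freedom
df_error = 495            # n - (p + 1)

f0 = (ssr / p) / (sse / df_error)           # F0 = MSR / MSE
f_crit = stats.f.ppf(0.95, p, df_error)     # F_{2,495,.95}
reject = f0 > f_crit                        # reject H0: beta1 = beta2 = 0?
print(round(f0, 2), round(f_crit, 2), reject)
```

Here $F_0 \approx 5.59$ exceeds the critical value (approximately 3.01), matching the lecture's conclusion that at least one of $\beta_1$, $\beta_2$ is nonzero.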

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

$$F_0 = \frac{(1289 + 120)/2}{62426/495} = 5.59 > F_{2,495,.95} \approx 3.01$$

We reject the null hypothesis at $\alpha = .05$ and conclude that at least one of $\beta_1$ or $\beta_2$ is not equal to 0.

The overall F statistic is also available from the output of summary().

> summary(lmwtht)

Call:
lm(formula = DIABP ~ WEIGHT + HEIGHT, data = chs)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)      ...        ...     ...      ... **
WEIGHT           ...        ...     ...      ... *
HEIGHT           ...        ...     ...      ...

Residual standard error: ... on 495 degrees of freedom
Multiple R-Squared: ..., Adjusted R-squared: ...
F-statistic: ... on 2 and 495 DF, p-value: ...

(The numeric estimates were lost in transcription.)

Tests on individual regression coefficients

Once we have determined that at least one of the regressors is important, a natural next question might be: which one(s)? Important considerations:

- Is the increase in the regression sums of squares sufficient to warrant an additional predictor in the model?

- Additional predictors will increase the variance of $\hat{y}$; include only predictors that explain the response (note: we may not know this through hypothesis testing, as confounders may not test significant but would still be necessary in the regression model).
- Adding an unimportant predictor may increase the residual mean square, thereby reducing the usefulness of the model.

$$y_i = \beta_0 + x_{i1}\beta_1 + \cdots + x_{ij}\beta_j + \cdots + x_{ip}\beta_p + \epsilon_i$$

$$H_0: \beta_j = 0 \qquad H_1: \beta_j \neq 0$$

As in simple linear regression, under the null hypothesis

$$t_0 = \frac{\hat\beta_j}{se(\hat\beta_j)} \sim t_{n-p-1}.$$

We reject $H_0$ if $|t_0| > t_{n-p-1,\,1-\alpha/2}$.

This is a partial test because $\hat\beta_j$ depends on all of the other predictors $x_i$, $i \neq j$, that are in the model. Thus, this is a test of the contribution of $x_j$ given the other predictors in the model.

CHS example:

$$\mathrm{diabp}_i = \beta_0 + \mathrm{weight}_i\,\beta_1 + \mathrm{height}_i\,\beta_2 + \epsilon_i$$

$H_0: \beta_2 = 0$ vs $H_1: \beta_2 \neq 0$, given that weight is in the model.

From the ANOVA table, $\hat\sigma^2 = 126$; the entries of $(X'X)^{-1}$ and the computed $t_0$ were lost in transcription, but $|t_0| < t_{495,.975}$.
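The partial t-test above can be sketched end to end. This is an illustrative Python version (not the lecture's R code), with simulated stand-ins for the CHS weight/height variables, where the height coefficient is set to zero so the test "correctly" has nothing to find:

```python
# Partial t-test on one coefficient: t0 = beta_hat_j / se(beta_hat_j),
# with se from sqrt(MSE * diag((X'X)^-1)). Data are simulated; the
# weight/height names only mirror the lecture's example.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500
weight = rng.normal(70, 10, n)
height = rng.normal(170, 8, n)
y = 80 + 0.2 * weight + 0.0 * height + rng.normal(0, 10, n)  # true beta2 = 0

X = np.column_stack([np.ones(n), weight, height])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
df = n - X.shape[1]                       # n - (p + 1) = 497
mse = resid @ resid / df                  # estimate of sigma^2
se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))

t0 = beta_hat[2] / se[2]                  # test H0: beta_height = 0
t_crit = stats.t.ppf(0.975, df)           # t_{n-p-1, .975}
```

Rejecting when $|t_0| > t_{n-p-1,.975}$ is the two-sided $\alpha = .05$ rule from the slide; note the standard error for one coefficient depends on all columns of $X$, which is why this is a partial test.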

We therefore fail to reject the null hypothesis.

Tests for groups of predictors

Often it is of interest to determine whether a group of predictors contributes to predicting $y$ given that another predictor or group of predictors is in the model.

- In the CHS example, we may want to know if age, height and sex are important predictors given weight is in the model when predicting blood pressure.
- We may want to know if additional powers of some predictor are important in the model given the linear term is already in the model.
- Given a predictor of interest, are interactions with other confounders of interest as well?

Using sums of squares to test for groups of predictors

Determine the contribution of a predictor or group of predictors to SSR, given that the other regressors are in the model, using the regression model with $p$ predictors:

$$y = X\beta + \epsilon.$$

We would like to determine if some subset of $r < p$ predictors contributes significantly to the regression model. Partition the vector of regression coefficients as

$$\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix},$$

where $\beta_1$ is $(p+1-r) \times 1$ and $\beta_2$ is $r \times 1$. We want to test the hypothesis

$$H_0: \beta_2 = 0 \qquad H_1: \beta_2 \neq 0.$$

Rewrite the model as

$$y = X\beta + \epsilon = X_1\beta_1 + X_2\beta_2 + \epsilon, \tag{1}$$

where $X = [X_1 \mid X_2]$.

Equation (1) is the full model, with SSR expressed as

$$SSR(X) = \hat\beta'X'y \quad (p+1 \text{ degrees of freedom})$$

and

$$MSE = \frac{y'y - \hat\beta'X'y}{n-p-1}.$$

To find the contribution of the predictors in $X_2$, fit the model assuming $H_0$ is true. This reduced model is

$$y = X_1\beta_1 + \epsilon,$$

where $\hat\beta_1 = (X_1'X_1)^{-1}X_1'y$ and

$$SSR(X_1) = \hat\beta_1'X_1'y \quad (p+1-r \text{ degrees of freedom}).$$

The regression sum of squares due to $X_2$ when $X_1$ is already in the model is

$$SSR(X_2 \mid X_1) = SSR(X) - SSR(X_1)$$

with $r$ degrees of freedom. This is also known as the extra sum of squares due to $X_2$. $SSR(X_2 \mid X_1)$ is independent of $MSE$, and we can test $H_0: \beta_2 = 0$ with the statistic

$$F_0 = \frac{SSR(X_2 \mid X_1)/r}{MSE} \sim F_{r,\,n-p-1}.$$

CHS example, reduced model:

$$\mathrm{diabp}_i = \beta_0 + \mathrm{weight}_i\,\beta_1 + \mathrm{height}_i\,\beta_2 + \epsilon_i, \qquad H_0: \beta_2 = 0$$

(The ANOVA table values were lost in transcription; the residual degrees of freedom are 495.) $F_0 < F_{1,495,.95}$, so we fail to reject. This should look very similar to the t-test.

Full model:

$$\mathrm{diabp}_i = \beta_0 + \mathrm{weight}_i\,\beta_1 + \mathrm{height}_i\,\beta_2 + \mathrm{age}_i\,\beta_3 + \mathrm{gender}_i\,\beta_4 + \epsilon_i$$

> summary(lm(DIABP~WEIGHT+HEIGHT+AGE+GENDER, data=chs))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
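The extra-sum-of-squares F test just derived can be sketched directly from the formulas. This is an illustrative Python version on simulated data (the column names and coefficient values are assumptions, not the CHS data); it compares a full model $X = [X_1 \mid X_2]$ against the reduced model with $X_1$ only:

```python
# Extra sum of squares test: F0 = (SSR(X2|X1)/r) / MSE ~ F_{r, n-p-1}.
# Simulated data; x2 genuinely matters, so the test should reject.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)              # stays in the model (X1)
x2 = rng.normal(size=n)              # group under test (X2), r = 2
x3 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=n)

def ssr(X, y):
    """Uncorrected regression SS, beta_hat' X' y, as on the full-model slide."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b @ X.T @ y

X_full = np.column_stack([np.ones(n), x1, x2, x3])
X_red = np.column_stack([np.ones(n), x1])

p, r = 3, 2                               # predictors in full model; tested group
mse = (y @ y - ssr(X_full, y)) / (n - p - 1)
extra_ss = ssr(X_full, y) - ssr(X_red, y)  # SSR(X2 | X1), r df
f0 = (extra_ss / r) / mse
f_crit = stats.f.ppf(0.95, r, n - p - 1)
```

Because the simulated $x_2$ has a real effect, $F_0$ comfortably exceeds the critical value; setting its coefficient to zero in the simulation would typically give $F_0$ near 1.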

(Intercept)      ...        ...     ...      ... **
WEIGHT           ...        ...     ...      ... .
HEIGHT           ...        ...     ...      ... **
AGE              ...        ...     ...      ...
GENDER           ...        ...     ...      ...

Residual standard error: ... on 493 degrees of freedom
Multiple R-Squared: ..., Adjusted R-squared: ...
F-statistic: ... on 4 and 493 DF, p-value: ...

(The numeric estimates were lost in transcription.)

$$H_0: \beta_2 = \beta_3 = \beta_4 = 0 \qquad H_1: \beta_j \neq 0 \text{ for at least one } j = 2, 3, 4$$

From the ANOVA table (residual degrees of freedom = 493):

$$SSR(\text{intercept, weight, height, age, gender}) = 2571019 + 1289 + 120 + \cdots = 2573978$$
$$SSR(\text{intercept, weight}) = 2571019 + 1289 = 2572308$$
$$SSR(\text{height, age, gender} \mid \text{intercept, weight}) = 2573978 - 2572308 = 1670$$

Notice we can also get this from the ANOVA table above:

$$SSR(\text{height, age, gender} \mid \text{intercept, weight}) = 120 + \cdots = 1670$$

The observed F statistic is

$$F_0 = \frac{1670/3}{MSE} > F_{3,493,.95}$$

(the computed values were lost in transcription), and we reject the null hypothesis, concluding that at least one of $\beta_2$, $\beta_3$ or $\beta_4$ is not equal to 0.

This should look very similar to the overall F test if we considered the intercept to be a predictor and all the covariates to be the additional variables under consideration.

What if we had put the predictors in the model in a different order?

$$\mathrm{diabp}_i = \beta_0 + \mathrm{height}_i\,\beta_2 + \mathrm{age}_i\,\beta_3 + \mathrm{weight}_i\,\beta_1 + \mathrm{gender}_i\,\beta_4 + \epsilon_i$$

(The ANOVA table for this ordering, beginning with HEIGHT on 1 df and ending with 493 residual df, was lost in transcription.) Can we use this table to test $H_0: \beta_2 = \beta_3 = \beta_4 = 0$?

What if we had the ANOVA table for the reduced model (WEIGHT on 1 df, 496 residual df)? Note that

$$SSR = SSR(x_2) + SSR(x_3 \mid x_2) + SSR(x_1 \mid x_2, x_3) + SSR(x_4 \mid x_3, x_2, x_1)$$

and

$$SSR(x_2, x_3, x_4 \mid x_1) = SSR - SSR(x_1),$$

so $SSR(x_2, x_3, x_4 \mid x_1) = \cdots$ (values lost in transcription).

One other question we might be interested in asking: are there any significant interactions in the model?

lm(DIABP~WEIGHT*HEIGHT*AGE*GENDER, data=chs)

This fit produces a coefficient table and an ANOVA table containing the four main effects plus all two-way, three-way and four-way interactions (e.g. WEIGHT:HEIGHT, HEIGHT:GENDER, AGE:GENDER, WEIGHT:HEIGHT:AGE); the estimates were lost in transcription. We can simplify the ANOVA table to

               Df Sum Sq Mean Sq F value Pr(>F)
Main effects    4    ...
Interactions   11    ...
Residuals     ...

How do we fill in the rest of this table?
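The question of predictor order matters because the software's sequential sums of squares are conditional on whatever entered earlier. A small sketch (Python, simulated data with two correlated predictors standing in for the CHS variables) shows that the individual pieces change with order while the total regression SS does not:

```python
# Sequential (Type I) sums of squares depend on entry order when the
# predictors are correlated, but they always sum to the same total SSR.
import numpy as np

rng = np.random.default_rng(2)
n = 300
a = rng.normal(size=n)
b = 0.6 * a + rng.normal(size=n)      # correlated with a, so order matters
y = 1.0 + a + b + rng.normal(size=n)

def ssr(cols):
    """Corrected regression SS, beta_hat' X' y - n * ybar^2."""
    X = np.column_stack([np.ones(n)] + cols)
    bhat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return bhat @ X.T @ y - n * y.mean() ** 2

ss_a_first = ssr([a])                    # SSR(a)
ss_b_given_a = ssr([a, b]) - ssr([a])    # SSR(b | a)
ss_b_first = ssr([b])                    # SSR(b)
ss_a_given_b = ssr([a, b]) - ssr([b])    # SSR(a | b)

total_1 = ss_a_first + ss_b_given_a      # order a, then b
total_2 = ss_b_first + ss_a_given_b      # order b, then a: same total
```

$SSR(a)$ and $SSR(a \mid b)$ differ substantially here, which is exactly why the reordered CHS ANOVA table cannot be used directly to test $H_0: \beta_2 = \beta_3 = \beta_4 = 0$; only the appropriately conditioned sequential sums can be added up for that test.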

