
Chapter 5 Multiple correlation and multiple regression

The previous chapter considered how to determine the relationship between two variables and how to predict one from the other. The general solution was to consider the ratio of the covariance between two variables to the variance of the predictor variable (regression) or the ratio of the covariance to the square root of the product of the variances (correlation). This solution may be generalized to the problem of how to predict a single variable from the weighted linear sum of multiple variables (multiple regression) or to measure the strength of this relationship (multiple correlation). As part of the problem of finding the weights, the concepts of partial covariance and partial correlation will be introduced.


To do all of this will require finding the variance of a composite score, and the covariance of this composite with another score, which might itself be a composite.

Much of psychometric theory is merely an extension, an elaboration, or a generalization of these concepts. Almost all tests are composites of items or subtests. An understanding of how to decompose test variance into its component parts, and conversely, an understanding of how to analyze tests as composites of items, allows us to analyze the meaning of tests. But tests are not merely composites of items. Tests relate to other tests. A deep appreciation of the basic Pearson correlation coefficient facilitates an understanding of its generalization to multiple and partial correlation, to factor analysis, and to questions of validity.

5.1 The variance of composites

If $x_1$ and $x_2$ are vectors of N observations centered around their mean (that is, deviation scores), their variances are $V_{x_1} = \sum x_{i1}^2/(N-1)$ and $V_{x_2} = \sum x_{i2}^2/(N-1)$, or, in matrix terms, $V_{x_1} = x_1'x_1/(N-1)$ and $V_{x_2} = x_2'x_2/(N-1)$.

The variance of the composite made up of the sum of the corresponding scores, $x_1 + x_2$, is just

$$V_{(x_1+x_2)} = \frac{\sum (x_{i1}+x_{i2})^2}{N-1} = \frac{\sum x_{i1}^2 + \sum x_{i2}^2 + 2\sum x_{i1}x_{i2}}{N-1} = \frac{(x_1+x_2)'(x_1+x_2)}{N-1}.$$

Generalizing to the case of n xs, the composite matrix of these is just ${}_N X_n$, with N rows and n columns. The matrix of variances and covariances of the individual items of this composite is written as S, as it is a sample estimate of the population variance-covariance matrix, $\Sigma$. It is perhaps helpful to view S in terms of its elements, n of which are variances and $n^2 - n = n(n-1)$ of which are covariances:

$$S = \begin{pmatrix} v_{x_1} & c_{x_1x_2} & \cdots & c_{x_1x_n} \\ c_{x_1x_2} & v_{x_2} & \cdots & c_{x_2x_n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{x_1x_n} & c_{x_2x_n} & \cdots & v_{x_n} \end{pmatrix}$$

The diagonal of S, diag(S), is just the vector of individual variances. The trace of S is the sum of the diagonals and will be used a great deal when considering how to estimate reliability.
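As a minimal sketch of these definitions, the following R code builds S from deviation scores and checks that the variance of a composite equals the sum of the corresponding variances and covariances. The data are simulated, and the names `X`, `Xc`, `S`, and the seed are illustrative assumptions rather than anything from the chapter.

```r
# Simulated data (an assumption for illustration): N = 200 observations, n = 3 variables.
set.seed(42)
N <- 200
X <- matrix(rnorm(N * 3), nrow = N, ncol = 3)
X[, 2] <- X[, 2] + 0.5 * X[, 1]   # induce some covariance between columns 1 and 2

# Deviation scores: subtract each column mean but keep the original units.
Xc <- scale(X, center = TRUE, scale = FALSE)

# Sample variance-covariance matrix S = X'X / (N - 1), computed from deviation scores.
S <- crossprod(Xc) / (N - 1)

variances <- diag(S)        # the n individual variances
trace_S   <- sum(diag(S))   # the trace: the sum of the diagonal elements

# Variance of the two-variable composite x1 + x2, computed two ways.
v_direct   <- var(X[, 1] + X[, 2])
v_elements <- S[1, 1] + S[2, 2] + 2 * S[1, 2]

# For all n columns, the composite variance is the sum of every element of S.
v_all_direct   <- var(rowSums(X))
v_all_elements <- sum(S)
```

The hand-built `S` should agree with R's built-in `cov(X)`, and each pair of composite variances should match up to floating point error.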

It is convenient to represent the sum of all of the elements in the matrix, S, as the variance of the composite matrix:

$$V_X = \frac{\sum X'X}{N-1} = \frac{1'(X'X)1}{N-1},$$

where 1 is a column vector of n ones.

5.2 Multiple regression

The problem of the optimal linear prediction of $\hat{y}$ in terms of x may be generalized to the problem of linearly predicting $\hat{y}$ in terms of a composite variable X, where X is made up of individual variables $x_1, x_2, \ldots, x_n$. Just as the ratio of the covariance to the variance of the predictor is the optimal slope for predicting y from a single x, so it is possible to find a set of weights ($\beta$ weights in the standardized case, b weights in the unstandardized case) for each of the individual $x_i$.

Consider first the problem of two predictors, $x_1$ and $x_2$. We want to find the weights, $b_i$, that when multiplied by $x_1$ and $x_2$ maximize the covariances with y. That is, we want to solve the two simultaneous equations

$$\begin{cases} v_{x_1}b_1 + c_{x_1x_2}b_2 = c_{x_1y} \\ c_{x_1x_2}b_1 + v_{x_2}b_2 = c_{x_2y}. \end{cases}$$

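Or, in the standardized case, find the $\beta_i$:

$$\begin{cases} \beta_1 + r_{x_1x_2}\beta_2 = r_{x_1y} \\ r_{x_1x_2}\beta_1 + \beta_2 = r_{x_2y}. \end{cases}$$

We can directly solve these two equations by adding and subtracting terms to the two such that we end up with a solution to the first in terms of $\beta_1$ and to the second in terms of $\beta_2$:

$$\begin{cases} \beta_1 = r_{x_1y} - r_{x_1x_2}\beta_2 \\ \beta_2 = r_{x_2y} - r_{x_1x_2}\beta_1. \end{cases}$$

Substituting the second row of this pair into the first row, and vice versa, we find

$$\begin{cases} \beta_1 = r_{x_1y} - r_{x_1x_2}(r_{x_2y} - r_{x_1x_2}\beta_1) \\ \beta_2 = r_{x_2y} - r_{x_1x_2}(r_{x_1y} - r_{x_1x_2}\beta_2). \end{cases}$$

Collecting terms and rearranging,

$$\begin{cases} \beta_1 - r_{x_1x_2}^2\beta_1 = r_{x_1y} - r_{x_1x_2}r_{x_2y} \\ \beta_2 - r_{x_1x_2}^2\beta_2 = r_{x_2y} - r_{x_1x_2}r_{x_1y} \end{cases}$$

leads to

$$\begin{cases} \beta_1 = (r_{x_1y} - r_{x_1x_2}r_{x_2y})/(1 - r_{x_1x_2}^2) \\ \beta_2 = (r_{x_2y} - r_{x_1x_2}r_{x_1y})/(1 - r_{x_1x_2}^2). \end{cases}$$

Alternatively, the two standardized equations may be represented as the product of a vector of unknowns (the $\beta$s) and a matrix of coefficients of the predictors (the $r_{x_ix_j}$) and a matrix of coefficients for the criterion (the $r_{x_iy}$).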

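$$(\beta_1 \;\; \beta_2)\begin{pmatrix} r_{x_1x_1} & r_{x_1x_2} \\ r_{x_1x_2} & r_{x_2x_2} \end{pmatrix} = (r_{x_1y} \;\; r_{x_2y})$$

If we let $\beta = (\beta_1 \;\; \beta_2)$, $R = \begin{pmatrix} r_{x_1x_1} & r_{x_1x_2} \\ r_{x_1x_2} & r_{x_2x_2} \end{pmatrix}$, and $r_{xy} = (r_{x_1y} \;\; r_{x_2y})$, then

$$\beta R = r_{xy}$$

and we can solve by multiplying both sides by the inverse of R:

$$\beta = \beta R R^{-1} = r_{xy}R^{-1}.$$

Similarly, if $c_{xy}$ represents the covariances of the $x_i$ with y, then the b weights may be found by $b = c_{xy}S^{-1}$, and thus the predicted scores are

$$\hat{y} = \beta X' = r_{xy}R^{-1}X',$$

where X is the N by n matrix of standardized predictor scores.

The $\beta_i$ are the direct effects of the $x_i$ on y. The total effects of the $x_i$ on y are the correlations; the indirect effects reflect the product of the correlations between the predictor variables and the direct effects of each predictor variable.

Estimates of the b or $\beta$ vectors, with many diagnostic statistics of the quality of the regression, may be found using the lm function. When using categorical predictors, the linear model is also known as analysis of variance, which may be done using the anova and aov functions.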
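The three routes to the weights described above (the two-predictor closed form, the matrix solution, and lm) can be compared in a short R sketch. The simulated data, the variable names, and the generating coefficients are all assumptions made for illustration; only `cor`, `cov`, `solve`, `scale`, `lm`, and `coef` from base R are used.

```r
# Simulated data (an assumption for illustration): two correlated predictors and a criterion.
set.seed(1)
N  <- 500
x1 <- rnorm(N)
x2 <- 0.6 * x1 + rnorm(N)
y  <- 0.5 * x1 + 0.3 * x2 + rnorm(N)
dat <- data.frame(x1, x2, y)

r   <- cor(dat)
r12 <- r["x1", "x2"]
r1y <- r["x1", "y"]
r2y <- r["x2", "y"]

# Closed-form standardized weights for the two-predictor case.
beta_formula <- c(x1 = (r1y - r12 * r2y) / (1 - r12^2),
                  x2 = (r2y - r12 * r1y) / (1 - r12^2))

# Matrix solution beta = r_xy R^{-1}. Because R is symmetric, solve(R, r_xy)
# gives the same weights without forming the inverse explicitly.
R_xx <- r[c("x1", "x2"), c("x1", "x2")]
r_xy <- r[c("x1", "x2"), "y"]
beta_matrix <- solve(R_xx, r_xy)

# lm() applied to standardized scores yields the same beta weights.
z <- as.data.frame(scale(dat))
beta_lm <- coef(lm(y ~ x1 + x2, data = z))[c("x1", "x2")]

# Unstandardized weights: b = c_xy S^{-1}, compared with lm() on the raw scores.
S_xx <- cov(dat)[c("x1", "x2"), c("x1", "x2")]
c_xy <- cov(dat)[c("x1", "x2"), "y"]
b_matrix <- solve(S_xx, c_xy)
b_lm <- coef(lm(y ~ x1 + x2, data = dat))[c("x1", "x2")]
```

Using `solve(R_xx, r_xy)` rather than `solve(R_xx) %*% r_xy` is a small design choice: it is numerically more stable, which matters as the predictors become more highly correlated.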

When the outcome variables are dichotomous, logistic regression may be done using the generalized linear model function glm and a binomial error function. A complete discussion of the power of the generalized linear model is beyond any introductory text, and the interested reader is referred to, e.g., Cohen et al. (2003); Dalgaard (2002); Fox (2008); Judd and McClelland (1989); Venables and Ripley (2002).

Diagnostic tests of the regressions, including plots of the residuals versus estimated values, tests of the normality of the residuals, and identification of highly weighted subjects, are available as part of the graphics associated with the lm function.

5.2.1 Direct and indirect effects, suppression and other surprises

If the predictor set $x_i, x_j$ are uncorrelated, then each separate variable makes a unique contribution to the dependent variable, y, and $R^2$, the amount of variance accounted for in y, is the sum of the individual $r^2$.

In that case, even though each predictor accounted for only 10% of the variance of y, with just 10 predictors there would be no unexplained variance.

Unfortunately, most predictors are correlated, and the $\beta$s found are typically smaller than the original correlations. Since

$$R^2 = \sum \beta_i r_{x_iy} = \beta r_{xy}'$$

the $R^2$ will not increase as much as it would if the predictors were less correlated or not correlated at all.

An interesting case that occurs infrequently, but is important to consider, is the case of suppression. A suppressor may not correlate with the criterion variable, but, because it does correlate with the other predictor variables, removes variance from those other predictor variables (Nickerson, 2008; Paulhus et al., 2004). This has the effect of reducing the denominator, $1 - r_{x_1x_2}^2$, in the two-predictor solution for the $\beta$ weights and thus increasing the $\beta_i$ for the other variables. Consider the case of two predictors of stock broker success: self reported need for achievement and self reported anxiety (see the table for this example).
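To make the suppression effect concrete, here is a minimal R sketch. The three correlations are assumed values, chosen only so that the two-predictor $R^2$ comes out at the .12 mentioned in the text; they are not taken from the original table, which is not reproduced here.

```r
# Assumed (hypothetical) correlations for illustration; not the original table values.
r_nach_success <- 0.30    # need for achievement with success: modest
r_anx_success  <- 0.00    # anxiety with success: none at all
r_nach_anx     <- -0.50   # need for achievement with anxiety

R_xx <- matrix(c(1,          r_nach_anx,
                 r_nach_anx, 1),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("nach", "anx"), c("nach", "anx")))
r_xy <- c(nach = r_nach_success, anx = r_anx_success)

# Standardized weights from the matrix solution beta = r_xy R^{-1}.
beta <- solve(R_xx, r_xy)

# R^2 with need for achievement alone, and with anxiety added as a suppressor.
R2_nach_only <- r_nach_success^2
R2_both      <- sum(beta * r_xy)
```

Under these assumed values, anxiety receives a non-zero weight and the weight for need for achievement exceeds its zero-order correlation, even though anxiety is uncorrelated with the criterion.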

Although Need Achievement has a modest correlation with success, and Anxiety has none at all, adding Anxiety into the multiple regression increases the squared multiple correlation, $R^2$, to .12. An explanation for this particular effect might be that people who want to be stock brokers are more likely to say that they have high Need Achievement. Some of this variance is probably legitimate, but some might be due to a tendency to fake positive aspects. Low anxiety scores could reflect a tendency to fake positive by denying negative aspects. But those who are willing to report being anxious probably are anxious, and are telling the truth. Thus, adding anxiety into the regression removes some misrepresentation from the Need Achievement scores, and increases the multiple correlation.

5.2.2 Interactions and product terms: the need to center the data

In psychometric applications, the main use of regression is in predicting a single criterion variable in terms of the linear sums of a predictor set.

Sometimes, however, a more appropriate model is to consider that some of the variables have multiplicative effects (i.e., interact) such that the effect of x on y depends upon a third variable z. This can be examined by using the product terms of x and z. But to do so and to avoid problems of interpretation, it is first necessary to zero center the predictors so that the product terms are not correlated with the additive terms. The default values of the scale function will center as well as standardize the scores. To just center a variable, x, use scale(x, scale=FALSE). This will preserve the units of x. The scale function returns a matrix, but the lm function requires a data.frame as input. Thus, it is necessary to convert the output of scale back into a data.frame.

A detailed discussion of how to analyze and then plot data showing interactions between experimental variables and subject variables (e.g., manipulated positive affect and extraversion) or interactions of subject variables with each other (e.g., neuroticism and extraversion) [...]

1 Although the correlation values are enhanced to show the effect, the stockbroker suppression example was observed in high stakes employment testing.

Table: An example of suppression is found when predicting stockbroker success from self report measures of need for achievement and anxiety.
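The centering step described in the section on interactions can be sketched in R as follows. The data are simulated, and the variable names, means, and coefficients are assumptions for illustration; the sketch only shows that centering with `scale(x, scale = FALSE)` sharply reduces the correlation between a predictor and its product term, and that the result must be converted back to a data.frame before calling `lm`.

```r
# Simulated data (an assumption for illustration): predictors with non-zero means.
set.seed(7)
N <- 300
x <- rnorm(N, mean = 10, sd = 2)
z <- rnorm(N, mean = 5,  sd = 1)
y <- 1 + 0.4 * x + 0.3 * z + 0.2 * (x - 10) * (z - 5) + rnorm(N)
raw <- data.frame(x, z, y)

# scale() returns a matrix: center only (scale = FALSE keeps the original units),
# then convert back to a data.frame for lm().
centered <- as.data.frame(scale(raw[, c("x", "z")], center = TRUE, scale = FALSE))
centered$y <- raw$y

# Correlation of the additive term x with the product term x * z,
# before and after centering.
r_raw      <- cor(raw$x, raw$x * raw$z)
r_centered <- cor(centered$x, centered$x * centered$z)

# y ~ x * z expands to x + z + x:z, so the product term is included automatically.
fit <- lm(y ~ x * z, data = centered)
```

With uncentered scores the product term is strongly correlated with x simply because both share the large mean of z; after centering that correlation is close to zero, so the additive coefficients in `fit` can be read as effects at the mean of the other predictor.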

