
CHAPTER 13: INTRODUCTION TO MULTIPLE CORRELATION




Chapter 12 introduced you to the concept of partialling and how partialling can assist you in better interpreting the relationship between two primary variables. Another correlational technique that utilizes partialling in its derivation is called multiple correlation. The main purpose of multiple correlation, and also multiple regression, is to be able to predict some criterion variable better. Thus, while the focus in partial and semi-partial correlation was to better understand the relationship between variables, the focus of multiple correlation and regression is to better predict criterion variables. The data set below represents a fairly simple and common situation in which multiple correlation is used.

STUDENT   SATV   SATM   GPA
   1       570    755
   2       648    611
   3       571    650
   4       578    584
   5       550    680
   6       669    701
   7       630    605
   8       502    528
   9       641    764

  10       623    509

These data relate to the situation where we have information on the Scholastic Aptitude Test, both Verbal and Math sections, and want to use that information to predict college grades (GPA). Generally speaking, we will have the SATV and SATM data prior to when students are admitted to college, and will then collect the criterion or GPA data later. For information purposes, the descriptive statistics and correlations are given below.

DESCRIPTIVE STATISTICS
         N   MEAN   STDEV
SATV    10
SATM    10
GPA     10

CORRELATIONS
        SATV   SATM
SATM
GPA     .409   .340

What if we are interested in the best possible prediction of GPA?
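Descriptive statistics and a correlation matrix like those above can be reproduced with a short NumPy sketch. The SATV and SATM scores are the chapter's; the GPA values are hypothetical stand-ins (the originals did not survive in this copy), so the GPA statistics printed here will not match the chapter's numbers.

```python
import numpy as np

# SATV and SATM from the chapter's data set; GPA values are hypothetical.
satv = np.array([570, 648, 571, 578, 550, 669, 630, 502, 641, 623], float)
satm = np.array([755, 611, 650, 584, 680, 701, 605, 528, 764, 509], float)
gpa  = np.array([2.8, 3.1, 2.6, 2.7, 2.9, 3.4, 3.0, 2.2, 3.3, 2.9])

# Descriptive statistics; ddof=1 gives the sample STDEV, as Minitab reports.
for name, x in [("SATV", satv), ("SATM", satm), ("GPA", gpa)]:
    print(f"{name}: N={len(x)}  MEAN={x.mean():.1f}  STDEV={x.std(ddof=1):.1f}")

# Intercorrelations among the three primary variables
corr = np.corrcoef([satv, satm, gpa])
print(corr.round(3))
```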

If we use only one predictor variable, either SATV or SATM, the predictor variable that correlates most highly with GPA is SATV, with a correlation of .409. Thus, if you had to select one predictor variable, it would be SATV, since it correlates more highly with GPA than does SATM. As a measure of how predictable the GPA values are from SATV, we could simply use the correlation coefficient, or we could use the coefficient of determination, which is simply r squared. Remember that r squared represents the proportion of the criterion variance that is predictable. That value, or coefficient of determination, is as follows.

     2            2
    r          = .409  = .167
     (SATV)(GPA)

Approximately 17 percent of the GPA criterion variance is predictable based on using the information available to us by having students' SATV scores.
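The arithmetic is a one-liner: square the correlation to get the proportion of criterion variance accounted for.

```python
# Coefficient of determination for the best single predictor (SATV)
r = 0.409                    # correlation of SATV with GPA, from the text
r_squared = r ** 2
print(round(r_squared, 3))   # -> 0.167, i.e. about 17% of the GPA variance
```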

Actually, we might have done the regression analysis using SATV to predict GPA, and the results would have been as follows. This is taken from Minitab output; see page 108 for another example.

The regression equation is: PREDICTED GPA = + SATV

Predictor    Coef    Stdev    t-ratio    p
Constant
SATV

s =      R-sq =      R-sq(adj) =

Notice that the output from the regression analysis includes an r squared value (listed as R-sq), and that value is about 17 percent. In this regression model, based on a Pearson correlation, we find that about 17% of the criterion variance is predictable. But can we do better? Not with one of the two predictors. However, we see that the best single predictor of GPA in this case is SATV, accounting for approximately 17 percent of the criterion variance.
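A one-predictor fit like the Minitab output above can be sketched with `np.polyfit`. SATV is from the chapter's table; the GPA values are hypothetical stand-ins, so the fitted coefficients and R-sq will not match the chapter's output.

```python
import numpy as np

# Simple regression of GPA on SATV (one criterion, one predictor).
satv = np.array([570, 648, 571, 578, 550, 669, 630, 502, 641, 623], float)
gpa  = np.array([2.8, 3.1, 2.6, 2.7, 2.9, 3.4, 3.0, 2.2, 3.3, 2.9])  # hypothetical

slope, intercept = np.polyfit(satv, gpa, 1)   # degree-1 least-squares fit
pred = intercept + slope * satv
r_sq = 1 - ((gpa - pred) ** 2).sum() / ((gpa - gpa.mean()) ** 2).sum()

print(f"PREDICTED GPA = {intercept:.3f} + {slope:.5f} SATV    R-sq = {r_sq:.1%}")
```

For a one-predictor model, this R-sq is exactly the squared Pearson correlation between SATV and GPA, which is why the text can move between r and R-sq freely.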

The correlation we find between SATV and GPA is .409. But it is obvious that we have a second predictor (SATM) that, while not correlating as highly with GPA as SATV does, still has a positive correlation of .34 with GPA. In selecting SATV as the best single predictor, it is not that SATM has no predictability, but rather that SATV is somewhat better. Perhaps it would make sense to explore the possibility of combining the two predictors together in some way, to take into account the fact that both correlate with the criterion of GPA. Is it possible that by combining SATV and SATM together in some way we can improve on the prediction that is made when only selecting SATV in this case? By using both SATV and SATM added together in some fashion as a new variable, can we find a correlation with the criterion that is larger (than the larger of the two separate r values), or account for more criterion variance?

Remember, I mentioned when using Minitab to do the regression analysis that the basic format of the regression command is to indicate first which variable is the criterion, then indicate how many predictor variables there are, and then finally indicate which columns represent the predictor variables.

For a simple regression problem, there would be one criterion and one predictor variable. However, the regression command allows you to have more than one predictor. For example, if we wanted to use both SATV and SATM as the combined predictors to estimate GPA, we would need to modify the regression command as follows: c3 is the column for the GPA or criterion variable, there will be 2 predictors, and the predictors will be c1 and c2 (where the data are located) for the SATV and SATM scores. Also, you can again store the predicted and error values. See below.

MTB > regr c3 2 c1 c2 c10 (c4);    <---- Minitab command line
SUBC> resi c5.                     <---- Minitab subcommand line

The regression equation is: GPA = - + SATV + SATM

s =      R-sq =      R-sq(adj) =

As usual, the first thing that appears from the regression command is the regression equation.
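The same two-predictor fit that Minitab's regr command performs can be sketched as an ordinary least-squares problem in NumPy. SATV and SATM are from the chapter's table; the GPA vector is a hypothetical stand-in, so the fitted weights here will not reproduce the chapter's .00318 and .00139.

```python
import numpy as np

# Two-predictor regression: the analogue of "regr c3 2 c1 c2" in Minitab.
satv = np.array([570, 648, 571, 578, 550, 669, 630, 502, 641, 623], float)
satm = np.array([755, 611, 650, 584, 680, 701, 605, 528, 764, 509], float)
gpa  = np.array([2.8, 3.1, 2.6, 2.7, 2.9, 3.4, 3.0, 2.2, 3.3, 2.9])  # hypothetical

# Design matrix: a column of 1s (for the intercept) plus the two predictors.
X = np.column_stack([np.ones_like(satv), satv, satm])
coef, *_ = np.linalg.lstsq(X, gpa, rcond=None)
b0, b1, b2 = coef
print(f"GPA = {b0:.3f} + {b1:.5f} SATV + {b2:.5f} SATM")

pred = X @ coef        # predicted GPA values (Minitab's stored fits)
resid = gpa - pred     # error values (what the "resi" subcommand stores)
```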

In the simple regression case, there will be an intercept value and a slope value attached to the predictor variable. However, in this multiple regression case, the regression equation needs to have the second predictor variable included, and there will be a "weight" or "slope-like" value also attached to the second predictor variable, SATM in this instance. To make specific predictions using the equation, we would need to substitute both the SATV and SATM scores into the equation and then come up with the predicted GPA value. For example, for the first student, who obtained 570 on SATV and 755 on SATM, the predicted GPA value would be as follows.

PREDICTED GPA' = + .00318 (570) + .00139 (755) =

Comparing the predicted GPA with the real GPA, this prediction is in error; that is, the predicted value in this case underestimates the true value by .72 of a GPA unit.
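The substitution above is plain arithmetic. The two weights are from the text; the intercept did not survive in this copy, so `b0` below is a hypothetical placeholder, not the chapter's value.

```python
# Predicted GPA for student 1 (SATV = 570, SATM = 755).
b0 = 0.0                                  # hypothetical intercept (not the chapter's)
pred_gpa = b0 + 0.00318 * 570 + 0.00139 * 755
print(round(pred_gpa, 5))                 # weighted part alone is 2.86205
```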

Since the predicted GPA values and errors have been stored in columns, we can look at all the data. As is normally the case, some of the predictions are overestimates (case 3, for example) and some are underestimates (case 8, for example). In fact, if you added up the errors, you would find (within round-off error) that the sum would be 0, since the overestimates and underestimates will balance out over the full set of cases. Look at the data and statistics below.

STUDENT   SATV   SATM   GPA   PredGPA   ErrGPA
   1       570    755
   2       648    611
   3       571    650
   4       578    584
   5       550    680
   6       669    701
   7       630    605
   8       502    528
   9       641    764
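The balancing-out claim is a general property of least squares, not a quirk of these data: whenever the model includes an intercept, the residuals sum to zero within round-off. A quick demonstration on arbitrary (made-up) data:

```python
import numpy as np

# Errors from a least-squares fit with an intercept always sum to ~0:
# overestimates and underestimates cancel over the full set of cases.
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=(2, 10))
y = 1 + 2 * x1 - x2 + rng.normal(size=10)   # arbitrary criterion

X = np.column_stack([np.ones(10), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef
print(abs(resid.sum()) < 1e-10)             # -> True
```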

  10       623    509

CORRELATIONS
          SATV   SATM   GPA   PredGPA
SATM
GPA
PredGPA
ErrGPA

The first 3 correlations at the top of the matrix are the intercorrelations amongst the 3 primary variables: the two predictors and the criterion (these are underlined). Notice that the correlation of each predictor with the residuals is 0 (these are slanted); recall from the discussion of partial correlation that the predictor will be uncorrelated with the residual or error values. The most important correlation in the matrix is between GPA, or the actual criterion, and the predicted GPA: this correlation is .477 (this has been bolded).
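The zero correlation between each predictor and the residuals can be checked directly. SATV/SATM are the chapter's scores; the GPA values are hypothetical stand-ins, but the zero-correlation property holds regardless of the criterion values used.

```python
import numpy as np

# Each predictor is uncorrelated with the least-squares residuals.
satv = np.array([570, 648, 571, 578, 550, 669, 630, 502, 641, 623], float)
satm = np.array([755, 611, 650, 584, 680, 701, 605, 528, 764, 509], float)
gpa  = np.array([2.8, 3.1, 2.6, 2.7, 2.9, 3.4, 3.0, 2.2, 3.3, 2.9])  # hypothetical

X = np.column_stack([np.ones(10), satv, satm])
coef, *_ = np.linalg.lstsq(X, gpa, rcond=None)
resid = gpa - X @ coef

print(np.corrcoef(satv, resid)[0, 1])   # zero within round-off
print(np.corrcoef(satm, resid)[0, 1])   # zero within round-off
```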

When we combine the two predictors in some optimal way by using the weights in the regression equation (.00318 for SATV and .00139 for SATM), we find that this combination produces a correlation of .477 with the criterion. In the literature, THE CORRELATION BETWEEN THE ACTUAL CRITERION VARIABLE AND THE PREDICTED CRITERION VARIABLE (based on a weighted combination of two or more predictors) IS CALLED THE MULTIPLE CORRELATION. The symbolism is as follows:

R = r(GPA)(GPA') = multiple correlation

In simple regression, it is commonplace to use a "small" r to indicate correlation; but to differentiate it from the multiple predictor case, we use capital R for multiple correlation. The subscripts simply mean (in this case) that Y is the criterion variable that is being predicted by a best weighted combination of predictors 1 and 2.
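The definition above translates directly into code: R is nothing more than the ordinary Pearson correlation between the actual criterion and the predicted criterion. SATV/SATM are the chapter's scores; the GPA values are hypothetical stand-ins, so the printed R will not be the chapter's .477.

```python
import numpy as np

# Multiple correlation R = corr(actual criterion, predicted criterion).
satv = np.array([570, 648, 571, 578, 550, 669, 630, 502, 641, 623], float)
satm = np.array([755, 611, 650, 584, 680, 701, 605, 528, 764, 509], float)
gpa  = np.array([2.8, 3.1, 2.6, 2.7, 2.9, 3.4, 3.0, 2.2, 3.3, 2.9])  # hypothetical

X = np.column_stack([np.ones(10), satv, satm])
coef, *_ = np.linalg.lstsq(X, gpa, rcond=None)
pred_gpa = X @ coef                       # best weighted combination

R = np.corrcoef(gpa, pred_gpa)[0, 1]      # the multiple correlation
print(f"R = {R:.3f}   R-sq = {R**2:.3f}")
```

Note that R can never be smaller than the larger of the two separate r values, which is exactly why combining predictors can only help (or at worst match) the best single predictor.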

