
CORRELATION COEFFICIENT: ASSOCIATION BETWEEN TWO ...


TUTORIAL | SCOPE | JUNE 09

Dr Jenny Freeman and Dr Tracey Young use statistics to calculate the correlation coefficient: the association between two continuous variables

Many statistical analyses can be undertaken to examine the relationship between two continuous variables within a group of subjects. Two of the main purposes of such analyses are:

• To assess whether the two variables are associated. There is no distinction between the two variables and no causation is implied, simply association.
• To enable the value of one variable to be predicted from any known value of the other variable. One variable is regarded as a response to the other predictor (explanatory) variable and the value of the predictor variable is used to predict what the response would be.

For the first of these, the statistical method for assessing the association between two continuous variables is known as correlation, whilst the technique for the second, prediction of one continuous variable from another, is known as regression. Correlation and regression are often presented together and it is easy to get the impression that they are inseparable. In fact, they have distinct purposes and it is relatively rare that one is genuinely interested in performing both analyses on the same set of data. However, when preparing to analyse data using either technique it is always important to construct a scatter plot of the values of the two variables against each other. By drawing a scatter plot it is possible to see whether or not there is any visual evidence of a straight line or linear association between the two variables.

This tutorial will deal with correlation, and regression will be the subject of a later tutorial.

CORRELATION

The correlation coefficient is a measure of the degree of linear association between two continuous variables, that is, when plotted together, how close the scatter of points is to a straight line. No assumptions are made about whether the relationship between the two variables is causal, that is, whether one variable is influencing the value of the other variable; correlation simply measures the degree to which the two vary together. A positive correlation indicates that as the values of one variable increase the values of the other variable increase, whereas a negative correlation indicates that as the values of one variable increase the values of the other variable decrease.

The standard method (often ascribed to Pearson) leads to a statistic called r, Pearson's correlation coefficient. In essence r is a measure of the scatter of the points around an underlying linear trend: the closer the spread of points to a straight line the higher the value of the correlation coefficient; the greater the spread of points the smaller the correlation coefficient. Given a set of n pairs of observations (x1, y1), (x2, y2), ..., (xn, yn) the formula for the Pearson correlation coefficient r is given by:

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} $$

where $\bar{x}$ and $\bar{y}$ are the means of the x and y values.

Certain assumptions need to be met for a correlation coefficient to be valid, as outlined in Box 1. Both x and y must be continuous random variables (and Normally distributed if the hypothesis test is to be valid).

Pearson's correlation coefficient r can only take values between -1 and +1; a value of +1 indicates perfect positive association (figure 1), a value of -1 indicates perfect negative association (figure 2), and a value of 0 indicates no linear association (figure 3).

FIGURE 1. Perfect positive correlation (r = +1).
FIGURE 2. Perfect negative correlation (r = -1).
FIGURE 3. No linear association (r = 0).

The easiest way to check whether it is valid to calculate a correlation coefficient is to examine the scatterplot of the data. This plot should be produced as a matter of routine when correlation coefficients are calculated, as it will give a good indication of whether the relationship between the two variables is roughly linear and thus whether it is appropriate to calculate a correlation coefficient at all. In addition, as the correlation coefficient is highly sensitive to a few abnormal values, a scatterplot will show whether this is the case, as illustrated in figures 4 and 5.

FIGURE 4. The correlation for this plot is [value missing]. It is heavily influenced by the extreme cluster of four points away from the main body.
FIGURE 5. The correlation for this plot is close to 0. Whilst it is clear that the relationship is not linear and so a correlation is not appropriate, it is also clear that there is a strong n-shaped relationship between these two variables.

EXAMPLE

Consider the heights (cm) and weights (kg) of 10 elderly men:

(173, 65), (165, 57), (174, 77), (183, 89), (178, 93), (188, 73), (180, 83), (182, 86), (163, 70), (179, 82).

Plotting these data indicates that, unsurprisingly, there is a positive linear relationship between height and weight (figure 6). The shorter a person is the lower their weight and, conversely, the taller a person is the greater their weight. In order to examine whether there is an association between these two variables, the correlation coefficient can be calculated (table 1). In calculating the correlation coefficient, no assumptions are made about whether the relationship is causal, that is, whether one variable is influencing the value of the other variable.

FIGURE 6. Plot of weight against height for 10 elderly men.

Thus the Pearson correlation coefficient for these data is 0.63, indicating a positive association between height and weight. When calculating the correlation coefficient it is assumed that at least one of the variables is Normally distributed. If the data do not have a Normal distribution, a non-parametric correlation coefficient, Spearman's rho (rs), can be calculated. This is calculated in the same way as the Pearson correlation coefficient, except that the data are ordered by size and given ranks (from 1 to n, where n is the total sample size) and the correlation is calculated using the ranks rather than the actual values. For the data above the Spearman correlation coefficient is 0.59 (table 2).

The square of the correlation coefficient gives the proportion of the variance of one variable explained by the other. For the example above, the square of the correlation coefficient is 0.40, indicating that about 40 per cent of the variance of one variable is explained by the other.

HYPOTHESIS TESTING

The null hypothesis is that the correlation coefficient is zero. However, its significance level is influenced by the number of observations and so it is worth being cautious when comparing correlations based on different sized samples. Even a very small correlation can be statistically significant if the number of observations is large. For example, with 10 observations a correlation of [value missing] is significant at the 5 per cent level, whereas with 150 observations a correlation of [value missing] is significant at the 5 per cent level. Figure 7 illustrates this.

FIGURE 7A. Correlation of [value missing], P = [value missing], n = 10.
FIGURE 7B. Correlation of [value missing], P = [value missing], n = 150.

The statistical test is based on the test statistic t = r / se(r), which under the null hypothesis follows a Student's t distribution on n - 2 degrees of freedom, and the confidence interval is given by: [formula missing]

The standard error of r is:

$$ se(r) = \sqrt{\frac{1 - r^2}{n - 2}} $$

For the Pearson correlation coefficient above the standard error is 0.27, the t statistic is 2.30 and the P-value is 0.05.

WHEN NOT TO USE A CORRELATION COEFFICIENT

Whilst the correlation coefficient is a useful measure for summarising how two continuous variables are related, there are certain situations when it should not be calculated, as has already been alluded to above. As it measures the linear association between two variables, it should not be used when the relationship is non-linear. Where outliers are present in the data, care should be taken when interpreting its value. It should not be used when the values of one of the variables are fixed in advance, for example when measuring the responses to different doses of a drug. Causation should not be inferred from a correlation coefficient. There are many other criteria that need to be satisfied before causation can be concluded. Finally, just because two variables are correlated at a particular range of values, it should not be assumed that the same relationship holds for a different range.

SUMMARY

This tutorial has outlined how to construct the correlation coefficient between two continuous variables. However, correlation simply quantifies the degree of linear association (or not) between two variables. It is often more useful to describe the relationship between the two variables, or even predict a value of one variable for a given value of the other, and this is done using regression. If it is sensible to assume that one variable may be causing a response in the other then regression analysis should be used. If, on the other hand, there is doubt as to which variable is the causal one, it would be most sensible to use correlation to describe the relationship. Regression analysis will be covered in a subsequent tutorial.
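The significance test described under HYPOTHESIS TESTING can be sketched in Python as follows. This is a minimal illustration, not the authors' own calculation: the function names are hypothetical, the input r = 0.631 is assumed to be the height and weight correlation obtained from the Table 1 data, and the two critical t values are approximate 5 per cent two-sided values taken from standard t tables. The helper critical_r simply rearranges t = r / se(r) to show how the smallest significant correlation shrinks as the sample grows.

```python
from math import sqrt


def standard_error(r, n):
    """Standard error of r: sqrt((1 - r^2) / (n - 2))."""
    return sqrt((1 - r ** 2) / (n - 2))


def t_statistic(r, n):
    """Test statistic t = r / se(r), referred to Student's t on n - 2 df."""
    return r / standard_error(r, n)


def critical_r(n, t_crit):
    """Smallest |r| that reaches significance for sample size n, given the
    two-sided critical t value for n - 2 degrees of freedom.
    Obtained by solving t = r / se(r) for r."""
    return t_crit / sqrt(t_crit ** 2 + (n - 2))


# Assumed input: correlation of height and weight for the 10 men in Table 1.
r, n = 0.631, 10
se = standard_error(r, n)
t = t_statistic(r, n)
print(f"se(r) = {se:.2f}, t = {t:.2f} on {n - 2} degrees of freedom")

# Approximate critical t values (5 per cent, two-sided) from standard tables:
# about 2.306 for 8 df and about 1.976 for 148 df.
r_needed_n10 = critical_r(10, 2.306)
r_needed_n150 = critical_r(150, 1.976)
print(f"n = 10:  |r| must be at least about {r_needed_n10:.2f}")
print(f"n = 150: |r| must be at least about {r_needed_n150:.2f}")
```

The second part of the sketch makes the sample-size caution concrete: a correlation far too weak to be significant with 10 observations can be significant with 150, even though it describes a much looser relationship.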

TABLE 1. Calculation of Pearson's correlation coefficient (r).

Subject   Height x (cm)   Weight y (kg)
1         173             65
2         165             57
3         174             77
4         183             89
5         178             93
6         188             73
7         180             83
8         182             86
9         163             70
10        179             82
Total     1765            775

$\bar{x}$ = 1765 / 10 = 176.5 cm
$\bar{y}$ = 775 / 10 = 77.5 kg
$r = \sum (x - \bar{x})(y - \bar{y}) / \sqrt{\sum (x - \bar{x})^2 \times \sum (y - \bar{y})^2} = 505.5 / \sqrt{558.5 \times 1148.5} = 0.63$

TABLE 2. Calculation of Spearman's rank correlation coefficient (rs). The observed value is shown in brackets after each rank.

Subject   Rank (height)   Rank (weight)
1         3 (173)         2 (65)
2         2 (165)         1 (57)
3         4 (174)         5 (77)
4         9 (183)         9 (89)
5         5 (178)         10 (93)
6         10 (188)        4 (73)
7         7 (180)         7 (83)
8         8 (182)         8 (86)
9         1 (163)         3 (70)
10        6 (179)         6 (82)
Total     55              55

Mean of height ranks = 55 / 10 = 5.5
Mean of weight ranks = 55 / 10 = 5.5
$r_s = 48.5 / \sqrt{82.5 \times 82.5} = 0.59$

BOX 1: The assumptions underlying the validity of the hypothesis test associated with the correlation coefficient

1 The two variables are observed on a random sample of individuals.
2 The data for at least one of the variables should have a Normal distribution in the population.
3 For the calculation of a valid confidence interval for the correlation coefficient both variables should have a Normal distribution.
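The calculations summarised in Tables 1 and 2 can be reproduced with a short Python sketch. It uses only the standard library; the function names pearson_r, ranks and spearman_rs are hypothetical helpers introduced for illustration, and the ranking helper assumes there are no tied values, which holds for these data but not in general.

```python
from math import sqrt

# Heights (cm) and weights (kg) of the 10 elderly men, as listed in Table 1.
heights = [173, 165, 174, 183, 178, 188, 180, 182, 163, 179]
weights = [65, 57, 77, 89, 93, 73, 83, 86, 70, 82]


def pearson_r(x, y):
    """Pearson's r: sum of cross-products of deviations from the means,
    divided by the square root of the product of the two sums of
    squared deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank each value from 1 (smallest) to n (largest).
    Assumes no tied values; ties would need averaged ranks."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rs(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of the data."""
    return pearson_r(ranks(x), ranks(y))


r = pearson_r(heights, weights)
rs = spearman_rs(heights, weights)
print(f"Pearson r   = {r:.2f}")
print(f"Spearman rs = {rs:.2f}")
print(f"r squared   = {r * r:.2f}")
```

With the Table 1 values this gives r of about 0.63 and rs of about 0.59, and the ranks it produces match those listed in Table 2.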

