SCOPE | TUTORIAL | JUNE 09

CORRELATION COEFFICIENT: ASSOCIATION BETWEEN TWO CONTINUOUS VARIABLES

Dr Jenny Freeman and Dr Tracey Young use statistics to calculate the correlation coefficient: the association between two continuous variables

Many statistical analyses can be undertaken to examine the relationship between two continuous variables within a group of subjects. Two of the main purposes of such analyses are:

• To assess whether the two variables are associated. There is no distinction between the two variables and no causation is implied, simply association.
• To enable the value of one variable to be predicted from any known value of the other variable. One variable is regarded as a response to the other predictor (explanatory) variable and the value of the predictor variable is used to predict what the response would be.

For the first of these, the statistical method for assessing the association between two continuous variables is known as correlation, whilst the technique for the second, prediction of one continuous variable from another, is known as regression. Correlation and regression are often presented together and it is easy to get the impression that they are inseparable. In fact, they have distinct purposes and it is relatively rare that one is genuinely interested in performing both analyses on the same set of data. However, when preparing to analyse data using either technique it is always important to construct a scatter plot of the values of the two variables against each other. By drawing a scatter plot it is possible to see whether or not there is any visual evidence of a straight line or linear association between the two variables.

This tutorial will deal with correlation, and regression will be the subject of a later tutorial.

CORRELATION

The correlation coefficient is a measure of the degree of linear association between two continuous variables: when plotted together, how close to a straight line is the scatter of points. No assumptions are made about whether the relationship between the two variables is causal, whether one variable is influencing the value of the other variable; correlation simply measures the degree to which the two vary together. A positive correlation indicates that as the values of one variable increase the values of the other variable increase, whereas a negative correlation indicates that as the values of one variable increase the values of the other variable decrease.

The standard method (often ascribed to Pearson) leads to a statistic called r, Pearson's correlation coefficient. In essence r is a measure of the scatter of the points around an underlying linear trend: the closer the spread of points to a straight line the higher the value of the correlation coefficient; the greater the spread of points the smaller the correlation coefficient. Given a set of n pairs of observations (x1, y1), (x2, y2), ..., (xn, yn) the formula for the Pearson correlation coefficient r is given by:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

Certain assumptions need to be met for a correlation coefficient to be valid, as outlined in Box 1. Both x and y must be continuous random variables (and Normally distributed if the hypothesis test is to be valid).

Pearson's correlation coefficient r can only take values between -1 and +1; a value of +1 indicates perfect positive association (figure 1), a value of -1 indicates perfect negative association (figure 2), and a value of 0 indicates no linear association (figure 3).

FIGURE 1. Perfect positive correlation (r = +1).
FIGURE 2. Perfect negative correlation (r = -1).
FIGURE 3. No linear association (r = 0).

The easiest way to check whether it is valid to calculate a correlation coefficient is to examine the scatterplot of the data. This plot should be produced as a matter of routine when correlation coefficients are calculated, as it will give a good indication of whether the relationship between the two variables is roughly linear and thus whether it is appropriate to calculate a correlation coefficient at all. In addition, as the correlation coefficient is highly sensitive to a few abnormal values, a scatterplot will show whether this is the case, as illustrated in figures 4 and 5.

FIGURE 4. The correlation for this plot is … It is heavily influenced by the extreme cluster of four points away from the main body.
FIGURE 5. The correlation for this plot is close to 0. Whilst it is clear that the relationship is not linear and so a correlation is not appropriate, it is also clear that there is a strong n-shaped relationship between these two variables.

EXAMPLE

Consider the heights and weights of 10 elderly men:

(173, 65), (165, 57), (173, 77), (183, 89), (178, 93), (188, 73), (180, 83), (183, 86), (163, 70), (178, 83).

Plotting these data indicates that, unsurprisingly, there is a positive linear relationship between height and weight (figure 6). The shorter a person is the lower their weight and, conversely, the taller a person is the greater their weight. In order to examine whether there is an association between these two variables, the correlation coefficient can be calculated (table 1). In calculating the correlation coefficient, no assumptions are made about whether the relationship is causal, whether one variable is influencing the value of …

… example, with 10 observations a correlation of … is significant at the 5 per cent level, whereas with 150 observations a correlation of … is significant at the 5 per cent level. Figure 7 illustrates this.

The statistical test is based on the test statistic t = r / se(r), which under the null hypothesis follows a Student's t distribution on n - 2 degrees of freedom, and the confidence interval is given by: …

The standard error of r = …

For the Pearson correlation coefficient above the standard error is …, the t statistic is … and the P-value is …

WHEN NOT TO USE A CORRELATION COEFFICIENT

Whilst the correlation coefficient is a useful measure for summarising how two continuous variables are related, there are certain situations when it should not be calculated, as has already been alluded to above.
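The calculation for the height and weight example can be checked with a short script. The following is a minimal sketch in Python, not the authors' own working: it assumes the usual product-moment formula for r and the commonly used standard error sqrt((1 - r^2) / (n - 2)), which is the form consistent with a t statistic on n - 2 degrees of freedom. The function name pearson_r and all variable names are illustrative. The last three lines use made-up data to show the extreme cases: a perfect rising line, a perfect falling line, and an n-shaped pattern whose linear correlation is zero even though the two variables are clearly related.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation for paired observations."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products and squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# (height, weight) pairs for the 10 elderly men in the example
pairs = [(173, 65), (165, 57), (173, 77), (183, 89), (178, 93),
         (188, 73), (180, 83), (183, 86), (163, 70), (178, 83)]
heights = [h for h, _ in pairs]
weights = [w for _, w in pairs]

n = len(pairs)
r = pearson_r(heights, weights)
se_r = math.sqrt((1 - r ** 2) / (n - 2))  # assumed standard form of se(r)
t_stat = r / se_r                         # refer to Student's t on n - 2 df

print(f"n = {n}, r = {r:.2f}, se(r) = {se_r:.2f}, t = {t_stat:.2f}")

# Illustrative extreme cases (made-up data, not from the article)
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])               # perfect rising line
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])             # perfect falling line
r_arch = pearson_r([-2, -1, 0, 1, 2], [-4, -1, 0, -1, -4])  # n-shaped pattern
```

Run as written, this gives r of about 0.63 and a t statistic of about 2.28 on 8 degrees of freedom. A two-sided P-value would then come from the Student's t distribution, for example with scipy.stats.t.sf if SciPy is available.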