http://hughesbennett.co.uk/CorrelationAnalysis
©2012 Hughes Bennett Education
Updated 2012-03-05 14:00:23
Published using WikkaWiki

Correlation analysis using Pearson and Spearman coefficients


Correlation analysis examines whether two data variables change together in a consistent way. On its own, it does not tell you whether one variable can be used to predict the other, nor whether changes in one variable cause changes in the other.

For example, the number of employees and the annual revenue of companies may be correlated because the two variables tend to change together. The correlation alone, however, does not show that annual revenue is determined or caused by the number of employees.

Linear correlation is measured by calculating the Pearson correlation coefficient. This coefficient is symbolized by r for a sample of data values and by the Greek letter ρ for a population. It is common practice to simply refer to this as the correlation coefficient.
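For a sample of n paired values (x_i, y_i) with sample means x̄ and ȳ, the sample coefficient is usually written as shown below. The symbols n, x_i, y_i, x̄ and ȳ are introduced here for illustration.

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$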

The correlation coefficient varies between -1.00 and +1.00. An r value of +1 indicates a perfect positive linear correlation. In that case the values of both variables increase together and their points on a scatter plot lie on a straight line. An r value of -1 indicates a perfect negative linear correlation. In that case the values of one variable increase while the values of the other decrease, and the points again lie on a straight line. The closer r is to zero, the weaker the linear relationship between the variables. When r is not equal to 1 or -1, the points on the scatter plot do not all lie on a straight line.

An r value of zero indicates that there is no linear relationship between the two variables. The variables may still be related in a non-linear way. Note that the Pearson coefficient only detects linear relationships, and the usual tests of its significance assume that both variables are normally distributed.

Common examples of correlation coefficients

[Figure: scatter plots (a) to (g) showing examples of correlation coefficients]
Source: McKillup, S. (2005). Statistics explained: an introductory guide for life scientists. Cambridge, UK: Cambridge University Press.

In the diagram above, example (a) is a perfect positive linear correlation with r = 1, and (d) has a positive linear correlation with 0 < r < 1. Example (b) is a perfect negative linear correlation with r = -1, and (e) has a negative linear correlation with -1 < r < 0. Example (c) shows no correlation, with r = 0, and (g) has a non-linear relationship, which causes r to be close to zero.

The Pearson correlation coefficient is calculated in an R script by using the cor function.
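For readers without R, the same calculation is shown below as a minimal Python sketch. It assumes plain lists of numbers and uses only the standard library. The function sums the cross-products of deviations from the two means and divides by the square root of the product of the sums of squared deviations. The name pearson_r is a hypothetical helper, not an existing library function. The three small data sets are made up to mimic examples (a), (b) and (g) from the diagram.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient r (hypothetical helper, a sketch)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2 values)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data
x = [-3, -2, -1, 0, 1, 2, 3]
print(pearson_r(x, [2 * v + 1 for v in x]))  # perfect positive line, like (a): 1.0
print(pearson_r(x, [-0.5 * v for v in x]))   # perfect negative line, like (b): -1.0
print(pearson_r(x, [v ** 2 for v in x]))     # parabola, like (g): 0.0
```

The parabola shows why r = 0 only rules out a linear relationship. In that case y is completely determined by x, but the rising and falling halves cancel out.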

 

When the variables may not be normally distributed, or do not appear to have a linear relationship, Spearman's rank correlation is more commonly used. This coefficient is symbolized by r_s for a sample of data values and by the Greek letter ρ_s for a population. To calculate it, rank the values of each variable separately from lowest to highest. The Pearson correlation coefficient of the ranked values is Spearman's rank correlation. Values of 1 or -1 occur more often for r_s than for r. This is because r_s only requires the ranks of the two variables to agree or disagree perfectly, with one variable always increasing, or always decreasing, as the other increases. By contrast, r requires the points to lie exactly on a straight line.
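When there are no tied values, computing Pearson's r on the ranks reduces to the familiar shortcut below. Here d_i is the difference between the ranks of x_i and y_i, and n is the number of pairs. These symbols are introduced here for illustration. When ties are present, the rank-based Pearson calculation should be used instead.

$$
r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n\,(n^2 - 1)}
$$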

Common examples of Spearman's rank correlation

[Figure: scatter plots showing examples of Spearman's rank correlation]
Source: McKillup, S. (2005). Statistics explained: an introductory guide for life scientists. Cambridge, UK: Cambridge University Press.

Spearman's rank correlation coefficient is calculated in an R script by using the cor and rank functions.
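Below is a minimal Python sketch of the same two steps: rank each variable, then apply Pearson's formula to the ranks. It assumes plain lists of numbers. Tied values receive the average of the ranks they span, which is also the default behaviour of R's rank function. The helper names average_ranks, pearson_r and spearman_rs are hypothetical. The example uses y = x³, which always increases with x but not along a straight line, so r_s is exactly 1 while r is below 1.

```python
import math

def average_ranks(values):
    """Rank from lowest (rank 1) to highest; ties share their average rank (sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Sample Pearson correlation coefficient r (hypothetical helper)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]   # always increasing, but not a straight line
print(pearson_r(x, y))     # less than 1
print(spearman_rs(x, y))   # 1.0
print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```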

 
