Dr. Mark Gardener 



Statistics for Ecologists Using R and Excel (Edition 2)Data Collection, Exploration, Analysis and Presentationby: Mark GardenerAvailable soon from Pelagic Publishing Welcome to the support pages for Statistics for Ecologists. These pages provide information and support material for the book. You should be able to find an outline and table of contents as well as support datafiles and additional material. Support Index  Exercises Index  Outline & TOC  Data files 

Exercise 8.3.2 

Table of Contents 

Use Excel for Spearman Rank correlation. Exercise data: 
8.3.2 Use Excel for Spearman's Rank correlationThis exercise is concerned with correlation (Chapter 8) and in particular how you can use Excel to calculate Spearman's Rank correlation coefficient. IntroductionExcel has builtin functions that can calculate correlation, but only when data are normally distributed. The CORREL and PEARSON functions both calculate Pearson's Product Moment, a correlation coefficient. Once you have the correlation coefficient it is fairly easy to calculate the statistical significance. You compute a tvalue then use TINV to compute a critical value or TDIST to get an exact pvalue (see Section 8.3.1 in the book). This exercise shows how you can use the RANK.AVG fuction to rank the data, then use CORREL on the ranks to obtain a Spearman Rank coefficient. The data for this exercise are freshwater correlation.xlsx, which look like this:
These data represent the abundance of a freshwater invertebrate and water speed. 

Use CORREL to calculate regular correlation between two normally distributed variables. 
Pearson correlationTo get the regular Pearson correlation you can use the CORREL (or PEARSON) function. In the spreadsheet the data are in columns B and C. To get the correlation coefficient the function required is:
The result (0.614) is the "regular" parametric correlation coefficient, which is based on the normal distribution. The PEARSON function gives an identical result (Pearson's Product Moment). Regression is another term used for correlation with parametric (normally distributed or Gaussian) data. The correlation coefficient, r, between two normally distributed variables is calculated like so:
Once you have r you can determine a critical value or exact pvalue. Use Pearson's correlation when you have normally distributed data and when you expect there to be a linear relationship between the two variables. 

Use RANK.AVG to rank x and y variables to use in Spearman Rank correlation 
Spearman's Rank correlationSpearman's Rank correlation is used to determine the strenght of the relationship between two numerical variables. The data do not have to be normally distributed and the relationship does not have to be strictly linear either. The relationship should be one where the variables are generally headed in one direction though (so not Ushaped or the inverse). As the name suggests, the correlation is based on the ranks of the data. The x and y variables are ranked and the ranks of x are compared to the ranks of y like so:
In the formula D is the difference between ranks. There is no builtin function to work out Spearman Rank correlation but you can work out the ranks using RANK.AVG and calculate the terms in the formula easily enough. 

Use RANK.AVG to rank x and y variables to use in Spearman Rank correlation 
Rank the dataThe RANK.AVG function allows you to rank data (note the older RANK function is not the same). In statistical analysis you want the lowest value to have the smallest rank. Tied values should get a tied rank. To try this out using the example data do the following:
If you wanted to fill out the Spearman Rank formula "the long way" you would now have to subtract each rank in column E from its counterpart in column D. Then square the differences. However, there is a quicker way. 

Use CORREL on ranks to approximate Spearman Rank correlation coefficient. 
Compute RsNow you have ranks you can use the CORREL (or PEARSON) function on the ranks to get a close approximation of the Spearman Rank correlation coefficient. It is usually the same to at least 2 decimal places.
The correlation between the ranks is a close approximation to the Spearman Rank coefficient (0.773) computed the "long way". 

Use the correlation coefficient and number of data items to calculate a tvalue. Use the tvalue to determine statistical significance. 
Get a tvalueYou can compare your calculated Spearman Rank coefficient to a table of critical values (e.g. Table 8.5 in the book) or compute a tvalue (another approximation). To get a value for t you need the correlation coefficient and the number of pairs of values. These then go into the following formula:
The df in the formula is degrees of freedom and is the number of pairs of data minus 2. In the example datafile:
Once you have a value for t you are on your way to determining the statistical significance. 

Use TINV to get a critical value for t. Use TDIST to get an exact pvalue. 
Determine significanceNow you have a value for t you can:
The TINV function requires the significance level (usually 0.05) and the degrees of freedom. The TDIST function requires a tvalue, the degrees of freedom and a 2 (for a twotailed test, which is most often what you need).
You can compare your calculated value of t to the critical value and simply say that your correlation is significant at p < 0.05 or you can use the exact pvalue (of 0.009). This method of calculating Spearman Rank gives a good approximation of the correlation coefficient but the pvalues are a little conservative. This is the method that the base distribution of R uses in its cor.test() command when method="spearman". 

Top  
My Publications  My Publications See my personal pages at GardenersOwn 

Follow me... 


See also: 
KeywordsHere is a list of keywords: it is by no means complete! Ttest, Utest, KruskalWallis, Analysis of Variance, Spearman Rank, Correlation, Regression, Logistic Regression, Curved linear regression, histogram, scatter plot, bar chart, boxwhisker plot, pie chart, Mean, Median, Mode, Standard Deviation, Standard Error, Range, Max, Min, Interquartile Range, IQR 

Top  DataAnalytics Home  Contact  GardenersOwn Homepage 