Dr. Mark Gardener

Available soon from
Pelagic Publishing

# Statistics for Ecologists Using R and Excel (Edition 2)

### by: Mark Gardener

Available soon from Pelagic Publishing

Welcome to the support pages for Statistics for Ecologists. These pages provide information and support material for the book. You should be able to find an outline and table of contents as well as support datafiles and additional material.

## Exercise 8.3.2

Use Excel for Spearman Rank correlation.

Exercise data:
freshwater correlation.xlsx

Top

### 8.3.2 Use Excel for Spearman's Rank correlation

This exercise is concerned with correlation (Chapter 8) and in particular how you can use Excel to calculate Spearman's Rank correlation coefficient.

#### Introduction

Excel has built-in functions that can calculate correlation, but only when data are normally distributed. The CORREL and PEARSON functions both calculate Pearson's Product Moment, a correlation coefficient. Once you have the correlation coefficient it is fairly easy to calculate the statistical significance. You compute a t-value then use TINV to compute a critical value or TDIST to get an exact p-value (see Section 8.3.1 in the book).

This exercise shows how you can use the RANK.AVG fuction to rank the data, then use CORREL on the ranks to obtain a Spearman Rank coefficient.

The data for this exercise are freshwater correlation.xlsx, which look like this:

 Abund Speed 12 18 18 12 17 12 14 7 9 8 7 9 6 6 7 11 5 6 3 2

These data represent the abundance of a freshwater invertebrate and water speed.

Use CORREL to calculate regular correlation between two normally distributed variables.

Top

### Pearson correlation

To get the regular Pearson correlation you can use the CORREL (or PEARSON) function. In the spreadsheet the data are in columns B and C. To get the correlation coefficient the function required is:

=CORREL(B2:B11,C2:C11)

The result (0.614) is the "regular" parametric correlation coefficient, which is based on the normal distribution. The PEARSON function gives an identical result (Pearson's Product Moment). Regression is another term used for correlation with parametric (normally distributed or Gaussian) data. The correlation coefficient, r, between two normally distributed variables is calculated like so:

The formula for Pearson's Product Moment (basic correlation or regression coefficient).

Once you have r you can determine a critical value or exact p-value.

Use Pearson's correlation when you have normally distributed data and when you expect there to be a linear relationship between the two variables.

Use RANK.AVG to rank x and y variables to use in Spearman Rank correlation

Top

### Spearman's Rank correlation

Spearman's Rank correlation is used to determine the strenght of the relationship between two numerical variables. The data do not have to be normally distributed and the relationship does not have to be strictly linear either. The relationship should be one where the variables are generally headed in one direction though (so not U-shaped or the inverse).

As the name suggests, the correlation is based on the ranks of the data. The x and y variables are ranked and the ranks of x are compared to the ranks of y like so:

The formula to calculate Spearman'r Rank correlation.

In the formula D is the difference between ranks.

There is no built-in function to work out Spearman Rank correlation but you can work out the ranks using RANK.AVG and calculate the terms in the formula easily enough.

Use RANK.AVG to rank x and y variables to use in Spearman Rank correlation

Top

#### Rank the data

The RANK.AVG function allows you to rank data (note the older RANK function is not the same). In statistical analysis you want the lowest value to have the smallest rank. Tied values should get a tied rank.

To try this out using the example data do the following:

1. Open the freshwater correlation.xlsx spreadsheet. The variables are in columns B & C.
2. In cell D1 type a label, RA, for the heading of the Ranks of the Abund variable.
3. In E1 type a label, RS, for the ranks of the Speed variable.
4. Now in D2 type a formula to get the rank of the first datum (in cell B2): =RANK.AVG(B2,B\$2:B\$11,1). Note the \$ used to "fix" the rows references. The final 1 ensures the ranks are ascending order.
5. Copy the formula down the rest of the column to fill cells D2:D11.
6. Copy the cells in D2:D11 across to column E to get ranks in cells E2:E11.

If you wanted to fill out the Spearman Rank formula "the long way" you would now have to subtract each rank in column E from its counterpart in column D. Then square the differences.

However, there is a quicker way.

Use CORREL on ranks to approximate Spearman Rank correlation coefficient.

Top

#### Compute Rs

Now you have ranks you can use the CORREL (or PEARSON) function on the ranks to get a close approximation of the Spearman Rank correlation coefficient. It is usually the same to at least 2 decimal places.

1. In cell A13 type a label, Correl, for the correlation coefficient(s).
2. In B13 type a formula to work out Pearson's correlation coefficent for the original data: =CORREL(B2:B11,C2:C11). The result is 0.614 for the example data.
3. In D13 type a formula to work out the correlation between the ranks (i.e. columns D & E): =CORREL(D2:D11,E2:E11). The result is 0.771.

The correlation between the ranks is a close approximation to the Spearman Rank coefficient (0.773) computed the "long way".

Use the correlation coefficient and number of data items to calculate a t-value.

Use the t-value to determine statistical significance.

Top

#### Get a t-value

You can compare your calculated Spearman Rank coefficient to a table of critical values (e.g. Table 8.5 in the book) or compute a t-value (another approximation).

To get a value for t you need the correlation coefficient and the number of pairs of values. These then go into the following formula:

Calculating t from a correlation coefficient, df = n - 2

The df in the formula is degrees of freedom and is the number of pairs of data minus 2. In the example datafile:

1. In cell A14 type a label, n, for the number of pairs of data.
2. In B14 type a formula to work out the number of items you have: =COUNT(B2:B11). You get 10.
3. In A15 type a label, df, for the degrees of freedom.
4. In B15 type a formula to work out degrees of freedom: =B14-2. You get 8.
5. In A16 type a label, t, for the t-value.
6. In B16 type a formula to work out a t-value: =SQRT(D13^2*B15/(1-D13^2)). The result is 3.420.

Once you have a value for t you are on your way to determining the statistical significance.

Use TINV to get a critical value for t.

Use TDIST to get an exact p-value.

Top

#### Determine significance

Now you have a value for t you can:

• Get a critical value using the TINV function.
• Get an exact probability (a p-value) using the TDIST function.

The TINV function requires the significance level (usually 0.05) and the degrees of freedom.

The TDIST function requires a t-value, the degrees of freedom and a 2 (for a two-tailed test, which is most often what you need).

1. In cell A17 type a label, t-crit, for the critical value.
2. In B17 type a formula to calculate the critical t value: =TINV(0.05,B15). The result is 2.306.
3. In A18 type a label, p-value, for the exact probability.
4. In B18 type a formula to calculate the exact p-value: =TDIST(B16,B15,2). The result is 0.009.

You can compare your calculated value of t to the critical value and simply say that your correlation is significant at p < 0.05 or you can use the exact p-value (of 0.009).

This method of calculating Spearman Rank gives a good approximation of the correlation coefficient but the p-values are a little conservative. This is the method that the base distribution of R uses in its cor.test() command when method="spearman".

Top

My Publications

My Publications

See my personal pages at GardenersOwn