Dr. Mark Gardener

 
About

Statistics for Ecologists Edition 2 Cover
Available soon from
Pelagic Publishing


Statistics for Ecologists Using R and Excel (Edition 2)

Data Collection, Exploration, Analysis and Presentation

by: Mark Gardener

Available soon from Pelagic Publishing

Welcome to the support pages for Statistics for Ecologists. These pages provide information and support material for the book. You should be able to find an outline and table of contents as well as support datafiles and additional material.

Support Index | Exercises Index | Outline & TOC | Data files


 

Exercise 10.2a

Pelagic Publishing Logo

Table of Contents


It is recommended to adjust the result of the Kruskal-Wallis test when you have tied values (and so tied ranks).

Top

10.2b. Kruskal-Wallis test and tied values/ranks

These notes are concerned with the Kruskal-Wallis test (Section 10.2). The notes show you how to adjust your test statistic when you have tied values, and so tied ranks.

Introduction

The Kruskal-Wallis test is appropriate when you have non-parametric data and one predictor variable (Section 10.2). It it analogous to a one-way ANOVA but uses ranks of items in various groups to determine the likely significance.

When you have tied values, you will get tied ranks. In these circumstances you should apply a correction to your calculated test statistic. The notes show you how this can be done. The calculations are simple but in Excel it can be difficult to get the process "automated". In R the kruskal.test() command computes the adjustment for you.


The adjustment factor needs values for the number of ties and the total number of data items (replicates).

Top

Calculating the adjustment factor

In order to correct for tied ranks you first need to know which values are tied. Then you need to know how many ties there are for each rank value. Finally you'll need to know how many replicates there are in the dataset.

Once you've ascertained these things you can use the following formula to work out a correction, or adjustment, factor:

The adjustment statistic for Kruskal Wallis

In the formula t is the number of ties for each rank value. For each value of t you evaluate the t-cubed minus t part. This is then summed for all the tied values (values without ties can also be evaluated but 1^3 - 1 = 0). Once you have the numerator you work out the denominator using n, the number of replicates in the original dataset.

The final value of D is then 1 - your fraction.


The Kruskal-Wallis result is adjusted by dividing by the correction factor.

Top

Adjusting the KW test statistic

Once you have the value of D, the correction factor, you can use it to adjust the original Kruakal-Wallis test statistic (H).

Applying the adjustment to the Kruskal Wallis result

The correction is simple: H/D.

You then use the Hadj value in place of the original to determine the final test significance (using critical values tables).


Example data

When you see tied values you know there will be tied ranks.

Top

Example data

Have a look at the correction in action using the following example:

Upper Mid Lower
3 4 11
4 3 12
5 7 9
9 9 10
8 11 11
10  
9    

Here there are three samples and you can see that there are tied values, so you know there will be tied ranks.


Ranks:

Original values are replaced by their rank in the overall dataset.

The Kruskal-Wallis test evaluates differences in the rank sums between samples.

Top

Example ranks

The first step is to evaluate the ranks. Each value is replaced by its rank in the overall dataset.

Upper Mid Lower
1.5 3.5 15
3.5 1.5 17
5 6 9.5
9.5 9.5 12.5
7 15 15
12.5  
9.5    

The Kruskal-Wallis test looks at the sum of the ranks from each sample. If the ∑rank is different between samples there is a good chance that differences are statistically significant. If the ∑rank are close then differences are less likely to be significant.

In this example the rank sums are:

Upper Mid Lower
48.5 35.5 69

You can now calculate the Kruskal-Wallis test statistic, H.


The Kruskal-Wallis formula produces a test statistic, H.

Top

Original H value

Once you have the rank sums you can compute the Kruskal-Wallis test statistic:

Formula for the Kruskal Wallis test

The Kruskal–Wallis formula looks pretty horrendous but actually it is not that bad. The numbers 12 and 3 are constants. Uppercase N is the total number of observations. The R refers to the ranks of the observations in each sample and n is the number of observations per sample.

In the example the final value of the Kruskal-Wallis statistic works out to be H = 6.403.


It is not trivial to get Excel to work out tied ranks "automatically".

Use copy/paste to assemble ranks into a single column

Sort the ranks

Determine ties

Top

Tied ranks

The formula for working out the correction factor was given earlier. You need to work out, for each rank value, the number of repeats.

It is not easy to do this "automatically" using Excel. There are ways you could make a template to evaluate the ties for you but it is complicated. Since you can do it using a bit of copy and paste, this is what you'll see here.

Start by assembling the ranks into a single column. This means a bit of copy and paste but you'll probably need to use Paste Special to place the Values only.

Paste Button In Excel 2013
The Paste button in Excel 2013 allows you to paste values only

When you paste the ranks you only want the numbers and don't want the formula (which would give an error).

Once you have the column of ranks you can simply re-arrange them in ascending order (use the Home > Sort & Filter button or the Data > Sort button). In the following table the second column shows the sorted ranks (you don't need to make a second column, I have done this to show the effect).

Rnk Ord Ties T3-T
1.5 1.5
3.5 1.5 2 6
5 3.5
9.5 3.5 2 6
7 5
12.5 6
9.5 7
48.5 9.5
3.5 9.5
1.5 9.5
6 9.5 4 60
9.5 12.5
15 12.5 2 6
17 15
9.5 15
12.5 15 3 24
15 17
15 48.5

Once you have the ranks in order it is easy to work out the number of repeats by insepction. You can simply fill in the values as you work down the column. In the preceeding table the 3rd column shows the tied rank repeats. So, for example the rank 1.5 is repeated 2 times. The rank 9.5 has 4 repeats.

The final column shows the T^3 - T values. In other words, you take the number of repeats and cube it, then subtract the number of repeats.

Once you have these values you can sum them to get an overall value, in this case 102.


H-adj = H/D

Top

Final H adjustment

Once you have your final sum of T^3 -3 values you can work out the value of D using the formula given earlier. You need to know the total number of replicates in the original dataset (in this case 17).

The final value of D works out at: 0.9792.

The adjusted H value is then H/D = 6.403 / 0.9792 = 6.540.

You can now use the adjusted H-value to compare to critical value tables to see if your result is statistically significant.


Top
Support Index | Exercises Index | Outline & TOC | Data files
 
My Publications

My Publications

Managing Data Using Excel, Cover

See my personal pages at GardenersOwn

Follow me...
Facebook Twitter Google+ Linkedin Amazon

See also:

Writer's Bloc
MonogRaphs
Tips & Tricks

Keywords

Here is a list of keywords: it is by no means complete!

T-test, U-test, Kruskal-Wallis, Analysis of Variance, Spearman Rank, Correlation, Regression, Logistic Regression, Curved linear regression, histogram, scatter plot, bar chart, box-whisker plot, pie chart, Mean, Median, Mode, Standard Deviation, Standard Error, Range, Max, Min, Inter-quartile Range, IQR

Top DataAnalytics Home
Publications
Contact GardenersOwn Homepage