Dr. Mark Gardener 



Statistics for Ecologists Using R and Excel (Edition 2)
Data Collection, Exploration, Analysis and Presentation
by: Mark Gardener
Available soon from Pelagic Publishing

Welcome to the support pages for Statistics for Ecologists. These pages provide information and support material for the book. You should be able to find an outline and table of contents as well as support datafiles and additional material.

Exercise 10.2a 


It is recommended to adjust the result of the Kruskal–Wallis test when you have tied values (and so tied ranks).

10.2b. Kruskal–Wallis test and tied values/ranks

These notes are concerned with the Kruskal–Wallis test (Section 10.2). The notes show you how to adjust your test statistic when you have tied values, and so tied ranks.

Introduction

The Kruskal–Wallis test is appropriate when you have non-parametric data and one predictor variable (Section 10.2). It is analogous to a one-way ANOVA but uses the ranks of items in the various groups to determine the likely significance. When you have tied values, you will get tied ranks. In these circumstances you should apply a correction to your calculated test statistic. The notes show you how this can be done. The calculations are simple but in Excel it can be difficult to get the process "automated". In R the kruskal.test() command computes the adjustment for you.

The adjustment factor needs values for the number of ties and the total number of data items (replicates).

Calculating the adjustment factor

In order to correct for tied ranks you first need to know which values are tied. Then you need to know how many ties there are for each rank value. Finally you'll need to know how many replicates there are in the dataset. Once you've ascertained these things you can use the following formula to work out a correction, or adjustment, factor:

$$D = 1 - \frac{\sum (t^3 - t)}{n^3 - n}$$

In the formula t is the number of ties for each rank value. For each value of t you evaluate the t-cubed minus t part, t^3 - t. This is then summed over all the tied values (values without ties can also be evaluated, but 1^3 - 1 = 0). Once you have the numerator you work out the denominator, n^3 - n, using n, the number of replicates in the original dataset. The final value of D is then 1 - your fraction.
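The adjustment factor can be sketched in a few lines of Python (the notes themselves use Excel and R, so this is only an illustration). The function name `correction_factor` and the tie counts in the example are hypothetical and are not taken from the book's dataset.

```python
def correction_factor(tie_counts, n):
    """Return D = 1 - sum(t^3 - t) / (n^3 - n).

    tie_counts: how many times each rank value is repeated (the t values).
    n: total number of replicates in the dataset.
    """
    tie_sum = sum(t**3 - t for t in tie_counts)
    return 1 - tie_sum / (n**3 - n)


# Hypothetical example: 10 replicates, one rank shared by 2 values
# and another rank shared by 3 values.
d = correction_factor([2, 3], n=10)
print(round(d, 4))  # 0.9697
```

Untied ranks can be left in the list or omitted, since each contributes 1^3 - 1 = 0 to the sum.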

The Kruskal–Wallis result is adjusted by dividing by the correction factor.

Adjusting the KW test statistic

Once you have the value of D, the correction factor, you can use it to adjust the original Kruskal–Wallis test statistic (H). The correction is simple: Hadj = H/D. You then use the Hadj value in place of the original to determine the final test significance (using critical values tables).

Example data: When you see tied values you know there will be tied ranks.

Example data

Have a look at the correction in action using the following example:
Here there are three samples and you can see that there are tied values, so you know there will be tied ranks. 

Ranks: Original values are replaced by their rank in the overall dataset. The Kruskal–Wallis test evaluates differences in the rank sums between samples.

Example ranks

The first step is to evaluate the ranks. Each value is replaced by its rank in the overall dataset. Tied values share the mean of the ranks they occupy, which is why ranks such as 1.5 can appear.
The Kruskal–Wallis test looks at the sum of the ranks from each sample. If the rank sums (∑rank) differ between samples there is a good chance that the differences are statistically significant. If the rank sums are close then differences are less likely to be significant. In this example the rank sums are:
You can now calculate the Kruskal–Wallis test statistic, H.
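As a rough illustration of the ranking step, the following Python sketch pools the samples, gives tied values the mean of the positions they occupy, and sums the ranks per sample. The sample names and values are hypothetical and are not the example data from the book; `average_ranks` is an illustrative helper, not a library function.

```python
def average_ranks(values):
    """Rank values in ascending order; tied values share the mean rank."""
    positions = {}
    for pos, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(pos)
    mean_rank = {v: sum(p) / len(p) for v, p in positions.items()}
    return [mean_rank[v] for v in values]


# Hypothetical samples containing tied values (5 appears 3 times, 8 twice).
samples = {"A": [3, 5, 5], "B": [5, 7, 8], "C": [8, 9, 11]}

# Rank all values together, then split the ranks back into their samples.
pooled = [v for vals in samples.values() for v in vals]
ranks = average_ranks(pooled)

rank_sums = {}
start = 0
for name, vals in samples.items():
    rank_sums[name] = sum(ranks[start:start + len(vals)])
    start += len(vals)

print(rank_sums)  # {'A': 7.0, 'B': 14.5, 'C': 23.5}
```

A quick check is that the rank sums add up to N(N + 1)/2, here 9 × 10 / 2 = 45.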

The Kruskal–Wallis formula produces a test statistic, H.

Original H value

Once you have the rank sums you can compute the Kruskal–Wallis test statistic:

$$H = \frac{12}{N(N + 1)} \sum \frac{R^2}{n} - 3(N + 1)$$

The Kruskal–Wallis formula looks pretty horrendous but actually it is not that bad. The numbers 12 and 3 are constants. Uppercase N is the total number of observations. R is the sum of the ranks of the observations in each sample and n is the number of observations in that sample. In the example the final value of the Kruskal–Wallis statistic works out to be H = 6.403.
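A minimal Python sketch of the (unadjusted) H calculation, assuming you already have the rank sum and the number of observations for each sample. The function name `kruskal_wallis_h` and the numbers in the example are hypothetical; they are not the book's example data.

```python
def kruskal_wallis_h(rank_sums, sizes):
    """Unadjusted Kruskal-Wallis H from per-sample rank sums and sample sizes."""
    n_total = sum(sizes)  # uppercase N in the formula
    summed = sum(r**2 / n for r, n in zip(rank_sums, sizes))
    return 12 / (n_total * (n_total + 1)) * summed - 3 * (n_total + 1)


# Hypothetical rank sums for three samples of three observations each.
h = kruskal_wallis_h([7, 14.5, 23.5], [3, 3, 3])
print(round(h, 4))  # 6.0667
```

This value takes no account of ties, so it still needs dividing by the correction factor D when tied ranks are present.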

It is not trivial to get Excel to work out tied ranks "automatically". Use copy/paste to assemble the ranks into a single column, sort the ranks, then determine the ties.

Tied ranks

The formula for working out the correction factor was given earlier. You need to work out, for each rank value, the number of repeats. It is not easy to do this "automatically" using Excel. There are ways you could make a template to evaluate the ties for you but it is complicated. Since you can do it using a bit of copy and paste, this is what you'll see here. Start by assembling the ranks into a single column. This means a bit of copy and paste but you'll probably need to use Paste Special to place the Values only.
When you paste the ranks you only want the numbers and don't want the formula (which would give an error). Once you have the column of ranks you can simply rearrange them in ascending order (use the Home > Sort & Filter button or the Data > Sort button). In the following table the second column shows the sorted ranks (you don't need to make a second column, I have done this to show the effect).
Once you have the ranks in order it is easy to work out the number of repeats by inspection. You can simply fill in the values as you work down the column. In the preceding table the third column shows the tied rank repeats. So, for example, the rank 1.5 is repeated 2 times. The rank 9.5 has 4 repeats. The final column shows the t^3 - t values. In other words, you take the number of repeats and cube it, then subtract the number of repeats. Once you have these values you can sum them to get an overall value, in this case 102.
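Outside Excel, the sort-and-count step is straightforward to automate. The sketch below is one way to do it in Python using the standard library's `collections.Counter`; the helper name `tie_table` and the column of ranks are hypothetical and do not reproduce the table from the book.

```python
from collections import Counter


def tie_table(ranks):
    """Return (rank, repeats, repeats^3 - repeats) rows in ascending rank order."""
    counts = Counter(ranks)
    return [(rank, t, t**3 - t) for rank, t in sorted(counts.items())]


# Hypothetical single column of ranks assembled from all samples.
ranks = [1, 3, 3, 3, 5, 6.5, 6.5, 8, 9]

table = tie_table(ranks)
for rank, repeats, cubed_minus in table:
    print(rank, repeats, cubed_minus)

# Sum of the final column, ready for the correction factor formula.
tie_sum = sum(row[2] for row in table)
print(tie_sum)  # 30
```

Ranks that appear only once contribute 0, so only the tied ranks (here 3 and 6.5) affect the total.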

Hadj = H/D

Final H adjustment

Once you have your final sum of t^3 - t values you can work out the value of D using the formula given earlier. You need to know the total number of replicates in the original dataset (in this case 17). The final value of D works out at: 0.9792. The adjusted H value is then H/D = 6.403 / 0.9792 = 6.540. You can now use the adjusted H-value to compare to critical value tables to see if your result is statistically significant.
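The final step can be checked with a short Python sketch using the values quoted in the text (tie sum 102, 17 replicates, H = 6.403). Only the variable names are assumptions.

```python
tie_sum = 102  # sum of the t^3 - t values from the tied ranks
n = 17         # total number of replicates
h = 6.403      # unadjusted Kruskal-Wallis statistic, as quoted (rounded)

d = 1 - tie_sum / (n**3 - n)  # correction factor D
h_adj = h / d                 # adjusted statistic

print(f"D = {d:.4f}")          # D = 0.9792
print(f"H_adj = {h_adj:.3f}")  # H_adj = 6.539
```

Starting from the rounded H of 6.403 gives about 6.539, which differs in the third decimal place from the 6.540 quoted above; presumably the quoted figure was calculated from the unrounded H.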
