Dr. Mark Gardener 


Statistics for Ecologists Using R and Excel, Edition 2: get a 20% discount on "Statistics for Ecologists" when you buy direct from the publisher! Enter the voucher code S4E20 in the shopping basket at Pelagic Publishing.
Writer's Bloc
On this page you can find out about my latest writing project. I'll post updates on progress, tables of contents and also some of the R scripts (and possibly Excel spreadsheets) I am developing in support of the new book. I'll try to keep the material reasonably up to date. The Writer's Bloc homepage contains a table of contents and an index of the pages that contain custom R commands and R scripts.
I am working on a new edition of my book Statistics for Ecologists Using R and Excel. I am currently revising the chapter on exploring differences between more than two samples. These notes are about post hoc analysis when you use the nonparametric Kruskal-Wallis test.
Post Hoc testing for Kruskal-Wallis analysis
Introduction
The Kruskal-Wallis test is a nonparametric test for differences between more than two samples. It is essentially an analogue of a one-way anova. There is no "standard" method for carrying out post hoc analysis for KW tests. These notes show you how you can use a modified form of the U-test to carry out post hoc analysis. When you carry out a Kruskal-Wallis test you are looking at the ranks of the data and comparing them. If the ranks are sufficiently different between samples you may be able to determine that these differences are statistically significant. However, the main analysis only tells you about the overall situation: the result tells you that "there are differences". Look at the following graph, which shows three samples.

A Kruskal-Wallis test of these data gives a significant result, H = 6.54, p < 0.05, but does not give any information about the pair-by-pair comparisons. Looking at the graph, you might suppose that the Upper and Mid samples are perhaps not significantly different, as their error bars (IQR) overlap considerably. The Lower sample appears perhaps to be different from the other two. The way to determine these pairwise differences is with a post hoc test. You cannot simply carry out regular U-tests, because you'd need at least 3 of them and this "boosts" your chance of getting a significant result purely by chance (a Type I error) roughly threefold. You could simply adjust the p-values (e.g. using a Bonferroni correction), but this is generally too conservative. These notes show you how to carry out a modified version of the U-test as a post hoc tool. The approach is analogous to the Tukey HSD test you'd use with parametric data.
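The overall test and the Bonferroni alternative can both be run with standard R functions. This is a minimal sketch that assumes the three samples are the ones listed in the hog3 data further down this page; kruskal.test() and pairwise.wilcox.test() are part of base R (the stats package), not my custom functions.

```r
# The three samples from the hog3 example, as a named list
hog <- list(
  Upper = c(3, 4, 5, 9, 8, 10, 9),
  Mid   = c(4, 3, 7, 9, 11),
  Lower = c(11, 12, 9, 10, 11)
)
# Convert to response/predictor form, keeping the site order Upper, Mid, Lower
count <- unlist(hog, use.names = FALSE)
site  <- factor(rep(names(hog), lengths(hog)), levels = names(hog))

# Overall Kruskal-Wallis test: it only tells you that "there are differences"
kw <- kruskal.test(count, site)
kw

# All three pairwise U-tests with Bonferroni-adjusted p-values
# (exact = FALSE because the data contain ties)
bonf <- pairwise.wilcox.test(count, site, p.adjust.method = "bonferroni",
                             exact = FALSE)
bonf
```

The Upper vs Lower U-test on its own gives p = 0.01732. After Bonferroni adjustment that becomes about 0.052, which is no longer significant at the 5% level. This is an example of how conservative the correction can be.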

Use a modified version of the U-test to calculate a critical value using Q, the Studentized Range.
A modified U-test as a post hoc tool
With a bit of tinkering you can modify the formula for the U-test to produce the following:
$$U_{crit} = \frac{n^2}{2} + Q\sqrt{\frac{n^2(2n+1)}{24}}$$
In the formula, n is the harmonic mean of the sizes of the two samples being compared, and Q is the value of the Studentized Range at your chosen confidence level (e.g. 95%) for df = Inf, with the number of groups equal to the number of samples in the original Kruskal-Wallis test.
The formula calculates a critical U-value for the pairwise comparison. You simply carry out a regular U-test, then use the larger of the two U-values as the test statistic. If your value is equal to or larger than the critical value from the formula, the pairwise comparison is statistically significant.
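The critical-value formula translates directly into R. This is a minimal sketch that uses the assumptions stated above: a harmonic mean sample size, and the Studentized range with df = Inf taken from qtukey(). The name u_crit() is hypothetical and is not my KWu.post() function. It should, however, reproduce the critical values quoted for KWu.post() and KW.post() further down this page.

```r
# u_crit() is a hypothetical helper, not the KWu.post() function.
#   conf   : confidence level(s), e.g. 0.95 or c(0.95, 0.99)
#   grp    : number of samples in the original Kruskal-Wallis test
#   n1, n2 : sizes of the two samples being compared
u_crit <- function(conf, grp, n1, n2) {
  n <- 2 * n1 * n2 / (n1 + n2)                # harmonic mean sample size
  Q <- qtukey(conf, nmeans = grp, df = Inf)   # Studentized range, df = Inf
  n^2 / 2 + Q * sqrt(n^2 * (2 * n + 1) / 24)
}

u_crit(c(0.95, 0.99), grp = 3, n1 = 5, n2 = 5)
u_crit(0.95, grp = 3, n1 = 7, n2 = 5)
```

With equal sample sizes the harmonic mean is simply the common n. The first call should give about 23.72 and 26.45, and the second about 31.06.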

Harmonic mean is a way to get an "average" sample size. 
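Harmonic mean
The harmonic mean is easy to determine. For two samples of sizes n1 and n2 it is:
$$n = \frac{2}{1/n_1 + 1/n_2} = \frac{2 n_1 n_2}{n_1 + n_2}$$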
The harmonic mean is a way of overcoming differences in sample size. The more the sample sizes differ, the less reliable this approach will be.
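A short sketch of the harmonic mean for any number of sample sizes. The name harm_mean() is hypothetical; my own h.mean() function, described below, takes exactly two values. The second call shows how a large difference in sample sizes pulls the harmonic mean well below the arithmetic mean.

```r
# harm_mean() is a hypothetical helper: the harmonic mean of any number
# of sample sizes, k / sum(1 / n_i). For two samples this is the same as
# 2 * n1 * n2 / (n1 + n2).
harm_mean <- function(...) {
  n <- c(...)
  length(n) / sum(1 / n)
}

harm_mean(5, 7)    # 5.833333
harm_mean(2, 20)   # 3.636364, far below the arithmetic mean of 11
```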

Calculate Q, the Studentized Range, from the result of a U-test. R can compute an exact p-value from Q.
Calculate Q directly
If you rearrange the formula you can calculate a value for Q:
$$Q = \frac{U - n^2/2}{\sqrt{n^2(2n+1)/24}}$$
You can now use the result of a U-test (use the larger of the two calculated U-values) to work out a value for Q, and then compare your Q-value to the critical value. The Studentized range distribution is built into base R (see ptukey() and qtukey()). This gives you a way to compute an exact p-value for the pairwise comparison.
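The rearranged formula and the exact p-value can be sketched in R using ptukey(). With lower.tail = FALSE, ptukey() returns the upper tail of the Studentized range distribution. The helper names kw_q() and kw_p() are hypothetical and are not part of my KW posthoc.R file. The two calls at the end use the U-values from the KWp.post() examples on this page and should give p-values of about 0.026 and 0.985.

```r
# Hypothetical helpers (not part of the KW posthoc.R file).
# kw_q(): rearranged formula, turns the larger U-value into a Q-value
kw_q <- function(U, n1, n2) {
  n <- 2 * n1 * n2 / (n1 + n2)                 # harmonic mean sample size
  (U - n^2 / 2) / sqrt(n^2 * (2 * n + 1) / 24)
}

# kw_p(): exact post hoc p-value from the upper tail of the
# Studentized range distribution with df = Inf
kw_p <- function(U, grp, n1, n2) {
  ptukey(kw_q(U, n1, n2), nmeans = grp, df = Inf, lower.tail = FALSE)
}

kw_p(32.5, grp = 3, n1 = 5, n2 = 7)   # Upper vs Lower example
kw_p(18,   grp = 3, n1 = 7, n2 = 5)
```

A U-value exactly at the 95% critical value gives p = 0.05 by construction. The critical value and the p-value are therefore two views of the same test.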

Custom functions for Kruskal-Wallis post hoc tests are available in the file KW posthoc.R.
Custom R functions for Kruskal-Wallis post hoc
I've produced four custom functions for use with Kruskal-Wallis post hoc tests: h.mean(), KW.post(), KWp.post() and KWu.post().
These functions are contained in a single file: KW posthoc.R. If you source() the file, you will set up the functions and see a message giving some idea of what the functions do.

h.mean() calculates the harmonic mean of two values. 
The h.mean() function
This function simply returns the harmonic mean of two numbers, i.e. the sizes of two samples.
> h.mean(5, 7)
The function is called by the other post hoc functions (and is built into each of them), but it might be "handy" to have separately.

KW.post() calculates the post hoc significance between two samples from a larger dataset. Results include: 
The KW.post() function
The function returns several results as a list:
The function also displays the results to the console, even if you assign the result to a named object.
> hog3
Upper Mid Lower
1 3 4 11
2 4 3 12
3 5 7 9
4 9 9 10
5 8 11 11
6 10 NA NA
7 9 NA NA
> KW.post(Upper, Lower, data = hog3)
Data: hog3
Pairwize comparison of: Upper and Lower
U-value: 32.5
Ucrit (95%): 31.06011
Post-hoc p-value: 0.02641968

If your data are in scientific recording layout, that is you have a response variable and a predictor variable, then you need a slightly different approach. You'll have to work out a U-test result first, then run the KWp.post() and/or KWu.post() commands (see below).
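For data in sample layout, like hog3, the whole pairwise procedure can be wrapped up in one function: run the U-test, take the larger U-value, then compute the critical value and the p-value. This is a sketch of how such a wrapper might look. kw_pair() is a hypothetical name, not the KW.post() function, and it assumes that NA values in the shorter columns are only padding.

```r
# kw_pair() is a hypothetical helper, not the KW.post() function.
#   x, y : the two samples being compared (NA values are treated as padding)
#   grp  : number of samples in the original Kruskal-Wallis test
#   conf : confidence level for the critical U-value
kw_pair <- function(x, y, grp, conf = 0.95) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  n1 <- length(x)
  n2 <- length(y)
  # wilcox.test() reports W, the U-value for the first sample (x)
  W <- unname(wilcox.test(x, y, exact = FALSE)$statistic)
  U <- max(W, n1 * n2 - W)                # the larger of the two U-values
  n <- 2 * n1 * n2 / (n1 + n2)            # harmonic mean sample size
  s <- sqrt(n^2 * (2 * n + 1) / 24)
  list(U     = U,
       Ucrit = n^2 / 2 + qtukey(conf, nmeans = grp, df = Inf) * s,
       p     = ptukey((U - n^2 / 2) / s, nmeans = grp, df = Inf,
                      lower.tail = FALSE))
}

# The hog3 data in sample layout (shorter columns padded with NA)
hog3 <- data.frame(
  Upper = c(3, 4, 5, 9, 8, 10, 9),
  Mid   = c(4, 3, 7, 9, 11, NA, NA),
  Lower = c(11, 12, 9, 10, 11, NA, NA)
)
res <- kw_pair(hog3$Upper, hog3$Lower, grp = ncol(hog3))
str(res)
```

With Upper given first, wilcox.test() returns W = 2.5, so taking the larger value (35 - 2.5 = 32.5) matters here. Without that step, the comparison would be tested with the wrong U-value.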

KWp.post() calculates an exact p-value given a U-value. You supply the original number of groups and the sample sizes too.
The KWp.post() function
This function returns an exact p-value for a post hoc analysis. The value is returned immediately.
> KWp.post(18, grp = 3, 7, 5)
Post-hoc p-value: 0.9851855
You can carry out a wilcox.test() on two samples to obtain a U-value. You need to know the sample sizes because you need the larger of the two U-values. The wilcox.test() only gives one value, so you must work out whether the value you got was the larger one. It so happens that:
$$U_1 + U_2 = n_1 n_2$$
This means that you can easily work out the alternative U-value if you know the sample sizes. If your data are in recording layout, with a predictor and a response, you'll need to use the subset parameter to carry out the pairwise test:
> hog2
   count  site
1      3 Upper
2      4 Upper
3      5 Upper
4      9 Upper
5      8 Upper
6     10 Upper
7      9 Upper
8      4   Mid
9      3   Mid
10     7   Mid
11     9   Mid
12    11   Mid
13    11 Lower
14    12 Lower
15     9 Lower
16    10 Lower
17    11 Lower
> wilcox.test(count ~ site, data = hog2, subset = site %in% c("Upper", "Lower"))

	Wilcoxon rank sum test with continuity correction

data:  count by site
W = 32.5, p-value = 0.01732
alternative hypothesis: true location shift is not equal to 0

Check the sample sizes:
> replications(count ~ site, data = hog2)
Now you can see whether the U-value you got was the larger one:
> 5*7 - 32.5
[1] 2.5
Since 32.5 is the larger value, you can use it in the KWp.post() function:
> KWp.post(32.5, grp = 3, 5, 7)
Post-hoc p-value: 0.02641968
With the response ~ predictor format, wilcox.test() returns the U-value for the first level of the predictor (alphabetical by default, here Lower). This means it is not guaranteed to be the larger of the two. If you run the command on separate samples, the returned U-value depends on the order in which you specify the samples. Either way, check it against n1 × n2 minus U.
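The same check can be scripted rather than typed at the console. This sketch rebuilds hog2 from the listing above, runs the pairwise test with subset, and uses U1 + U2 = n1n2 to get the other U-value. The variable names (keep, sizes, n1n2) are illustrative only.

```r
# The hog2 data in recording layout (response = count, predictor = site)
hog2 <- data.frame(
  count = c(3, 4, 5, 9, 8, 10, 9, 4, 3, 7, 9, 11, 11, 12, 9, 10, 11),
  site  = rep(c("Upper", "Mid", "Lower"), times = c(7, 5, 5))
)
keep <- c("Upper", "Lower")

# Pairwise U-test on two sites only; exact = FALSE because of ties.
# W is the U-value for the first level of site (alphabetical: Lower).
wt <- wilcox.test(count ~ site, data = hog2,
                  subset = site %in% keep, exact = FALSE)
W <- unname(wt$statistic)

# Sample sizes of the two sites, then U1 + U2 = n1 * n2
sizes <- table(hog2$site[hog2$site %in% keep])
n1n2 <- prod(sizes)
U <- max(W, n1n2 - W)
c(W = W, other = n1n2 - W, larger = U)
```

Taking max() of the two values avoids having to remember which level came first.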

KWu.post() calculates a critical value for U at a given confidence level. You supply the original number of groups and the sample sizes too. You can compare a pairwise U-test result to your critical value.
The KWu.post() function
This function returns a critical U-value for a given confidence level. You supply the number of groups (samples) in the original Kruskal-Wallis test and the sizes of the two samples being compared. Because the result is a critical value, you can carry out the wilcox.test() and compare the resulting U-value to it.
> KWu.post(CI = c(0.95, 0.99), grp = 3, n1 = 5, n2 = 5)
Post-hoc critical U value: 23.71961 26.44729
The example shows that you can set multiple confidence levels. Here the critical U-values for p = 0.05 and p = 0.01 are returned.
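Because the critical value depends only on the number of groups and the two sample sizes, you can compute it for every pair of samples at once, at several confidence levels. This is a sketch assuming the hog3 samples. The loop over combn() pairs and the row names U95 and U99 are illustrative choices; this is not the output format of KWu.post().

```r
# The hog3 samples and the settings for the post hoc comparisons
hog3 <- list(
  Upper = c(3, 4, 5, 9, 8, 10, 9),
  Mid   = c(4, 3, 7, 9, 11),
  Lower = c(11, 12, 9, 10, 11)
)
grp  <- length(hog3)        # groups in the original Kruskal-Wallis test
conf <- c(0.95, 0.99)       # confidence levels for the critical values

pairs <- combn(names(hog3), 2)   # every pair of sample names, one per column
out <- apply(pairs, 2, function(p) {
  x <- hog3[[p[1]]]
  y <- hog3[[p[2]]]
  n1 <- length(x)
  n2 <- length(y)
  W <- unname(wilcox.test(x, y, exact = FALSE)$statistic)
  U <- max(W, n1 * n2 - W)                  # larger of the two U-values
  n <- 2 * n1 * n2 / (n1 + n2)              # harmonic mean sample size
  crit <- n^2 / 2 + qtukey(conf, nmeans = grp, df = Inf) *
    sqrt(n^2 * (2 * n + 1) / 24)
  c(U = U, U95 = crit[1], U99 = crit[2])
})
colnames(out) <- paste(pairs[1, ], pairs[2, ], sep = "-")
round(t(out), 3)
```

For Upper vs Lower, the U-value of 32.5 exceeds the 95% critical value (about 31.06) but not the 99% one (about 34.47).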
