Dr. Mark Gardener



Community Ecology: Analytical Methods Using R and Excel

Writer's Bloc

On this page you can find out about my latest writing project. I'll post updates on progress, tables of contents and also some of the R scripts (and possibly Excel spreadsheets) I am developing in support of the new book. I'll try to keep the material reasonably up to date.

The Writer's Bloc homepage contains a table of contents and an index of the pages that contain custom R commands and R scripts.

Community Ecology: Analytical Methods Using R and Excel

Available now from
Pelagic Publishing


Chapter 13. Association analysis: Identifying communities

Species Association Analysis

This chapter deals with association analysis, that is, determining whether species are associated with one another in a positive or negative way. This is one way to split species into groups and start to define communities. Species that tend to be found together when you sample are likely to be from the same community. Species that tend not to be found together (i.e. they turn up in different quadrats or other sampling units) are likely to be from separate communities.

I can see that I am going to need to make some custom R commands for this chapter because the chi squared approach is not one that is covered in this context. The general chisq.test() command carries out tests of association between categories and goodness of fit tests. Here I need to look at species co-occurrence as a way to make a dissimilarity measure and so split species into clusters (communities). Also required will be a pairwise (one species against one species) comparison to look at the significance of the association between two species.
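As a rough illustration of the pairwise comparison, here is a minimal sketch using only base R. It is not the custom command from the book; the two species vectors (spA, spB) and their counts are made up for the example. It builds a 2 x 2 presence-absence table for a pair of species and passes it to chisq.test(), then compares observed and expected joint presences to get the direction of the association.

```r
# Hypothetical presence (1) / absence (0) records for two species in 40 quadrats:
# 14 quadrats with both, 4 with spA only, 2 with spB only, 20 with neither
spA <- rep(c(1, 1, 0, 0), times = c(14, 4, 2, 20))
spB <- rep(c(1, 0, 1, 0), times = c(14, 4, 2, 20))

# 2 x 2 contingency table of joint presence/absence
tab <- table(spA = spA, spB = spB)

# Chi squared test of association (Yates' correction is the default for 2 x 2)
res <- chisq.test(tab)

# Direction of the association: observed vs. expected joint presences
obs  <- tab["1", "1"]
expd <- res$expected["1", "1"]
direction <- if (obs > expd) "positive" else "negative"

res$p.value
direction
```

A significant result with more joint presences than expected suggests a positive association; fewer than expected suggests a negative one.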


Chi squared species association

function: dist_chi()



Species that tend to be found together are likely to be from the same community. Species found separately are likely to be from different communities. If you examine species co-occurrence you can use a chi squared approach to work out the associations between species. Usually you have lots of samples. Your data can be simple presence-absence or quantitative but essentially the chi squared approach boils down to presence-absence.

The vegan package contains a useful command, designdist(), which allows you to make your own dissimilarity index. I used this to help compute the co-occurrence of species across multiple samples. The expected values can also be determined using the designdist() command. From that point it is a relatively easy matter to make a chi squared dissimilarity, which can be used as the basis for a hierarchical clustering.
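To show what the co-occurrence and expected values look like, here is a small base R sketch that does not need vegan. It is an illustration under my own assumptions rather than the script from the book: the data matrix comm is invented, and tcrossprod() and outer() stand in for what the "J" and "A*B/P" terms in designdist() should produce with binary data.

```r
# Hypothetical data: 3 species (rows) recorded in 6 samples (columns)
comm <- matrix(c(1, 1, 1, 1, 0, 0,
                 1, 1, 1, 0, 0, 0,
                 0, 0, 0, 1, 1, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("spA", "spB", "spC"), paste0("Q", 1:6)))

pa  <- (comm > 0) * 1    # reduce to presence-absence
P   <- ncol(pa)          # number of samples
tot <- rowSums(pa)       # number of occurrences of each species

co   <- as.dist(tcrossprod(pa))       # J: samples shared by each species pair
expd <- as.dist(outer(tot, tot) / P)  # A * B / P: expected co-occurrence

chi   <- (co - expd)^2 / expd         # chi squared contribution of each pair
resid <- (co - expd) / sqrt(expd)     # Pearson residuals (sign gives direction)

co
expd
resid
```

In this made-up example spA and spB share three samples against an expected two, giving a positive residual; spB and spC never occur together, giving a negative one.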

I made the result of my custom command take on a custom class "distchi" so that I could also make a plot and summary command. Here is the script I came up with:

## Chi-Squared Association/Dissimilarity
## Mark Gardener 2013
## www.dataanalytics.org.uk
dist_chi <- function(data) {
  # data = community data, species as rows
  # Make the dissimilarity matrices and do a chi.sq test
  # Co-occurrence based on presence-absence
  data.co <- vegan::designdist(data, method = "J", terms = "binary")
  # Expected values (spA total * SpB total / Grand total)
  data.exp <- vegan::designdist(data, method = "A*B/P", terms = "binary")
  # Chi squared values
  data.csq <- (data.co - data.exp)^2 / data.exp
  data.resid <- (data.co - data.exp) / sqrt(data.exp) # Pearson residuals
  csq.sum <- sum(data.csq)                            # Total chi sq value
  csq.pval <- pchisq(csq.sum, df = nrow(data), lower.tail = FALSE) # P-value
  # rescale residuals to be all +ve
  data.pr <- data.resid + abs(min(data.resid)) # Makes all +ve
  data.pr <- max(data.pr) - data.pr            # Converts similarity to dissimilarity
  # make results
  result <- list(chi = data.csq, co.occur = data.co,
                 expected = data.exp, residuals = data.resid,
                 distance = data.pr, p.val = csq.pval,
                 data = deparse(substitute(data)),
                 spp = nrow(data))
  class(result) <- "distchi" # custom class for plot and summary
  invisible(result)          # Save result invisibly
}
## END

## Plot Chi.Sq dissimilarity
## Mark Gardener 2013
## www.dataanalytics.org.uk
plot.distchi <- function(x, method = "complete", ...) {
  # x = result of dist_chi()
  # method = hclust() method
  # ... = other instructions to pass to plot()
  # match-up the hclust() joining method
  # ("ward" is called "ward.D" in R 3.1.0 and later)
  hc.join <- c("ward", "single", "complete", "average",
               "mcquitty", "median", "centroid")
  hc.meth <- match.arg(method, hc.join) # set method for hclust()
  # make cluster object and plot
  hc <- hclust(x$distance, method = hc.meth)
  plot(hc, ...)
  # Save result (can be used with rect.hclust() for example)
  invisible(hc)
}
## END

## Summary of dist_chi()
## Mark Gardener 2013
## www.dataanalytics.org.uk
summary.distchi <- function(x, ...) {
  # x = result of dist_chi()
  cat("\nAvailable components:\n")
  print(names(x))
  # result object
  result <- data.frame(Total.Chi = sum(x$chi),
                       df = x$spp,
                       p.val = x$p.val)
  rownames(result) <- x$data
  print(result)
}
## END

The main command, dist_chi(), calculates a dissimilarity matrix (class "dist"), stored in the $distance component, which can be used to plot a dendrogram. The result also contains the co-occurrence matrix and an overall chi squared test result. The plot.distchi() command carries out a hierarchical clustering and plots the resulting dendrogram. You can choose the cluster joining algorithm with the method argument (the options are the same as for the hclust() command), and you can add graphical instructions to the plot.distchi() command. The results from dist_chi() don't display "nicely" because they are dist objects and usually will not fit the display. The summary.distchi() command produces a simple overview of the chi squared association test.
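The step from residuals to a dendrogram can be sketched in base R as follows. This is only an illustration: the residual values and species names (sp1 to sp4) are invented, and the rescaling copies the two lines used in dist_chi() rather than calling the command itself.

```r
# Hypothetical Pearson residuals for four species (positive = found together)
m <- matrix(0, 4, 4, dimnames = list(paste0("sp", 1:4), paste0("sp", 1:4)))
m[lower.tri(m)] <- c( 1.5, -1.0, -1.2,  # sp2-sp1, sp3-sp1, sp4-sp1
                     -0.8, -1.1,        # sp3-sp2, sp4-sp2
                      1.4)              # sp4-sp3
resid <- as.dist(m)

# Rescale as in dist_chi(): shift so all values are +ve, then flip
pr <- resid + abs(min(resid))  # makes all +ve
d  <- max(pr) - pr             # strongest association = smallest distance

# Hierarchical clustering and a two-group split
hc <- hclust(d, method = "complete")
groups <- cutree(hc, k = 2)
groups

# To draw the dendrogram and outline the two groups:
# plot(hc); rect.hclust(hc, k = 2)
```

With these made-up values sp1 and sp2 end up in one group and sp3 and sp4 in the other, which matches the two positive residuals.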

