  Dr. Mark Gardener Community Ecology: Analytical Methods Using R and Excel

# Writer's Bloc

On this page you can find out about my latest writing project. I'll post updates on progress, tables of contents and also some of the R scripts (and possibly Excel spreadsheets) I am developing in support of the new book. I'll try to keep the material reasonably up to date.

The Writer's Bloc homepage contains a table of contents and an index of the pages that contain custom R commands and R scripts.


## Chapter 13. Association analysis: Identifying communities

### Species Association Analysis

This chapter deals with association analysis, that is, determining whether species are associated with one another in a positive or negative way. This is one way to split species into groups and start to define communities. Species that tend to be found together when you sample are likely to be from the same community. Species that tend not to be found together (i.e. they turn up in different quadrats or other sampling units) are likely to be from separate communities.

I can see that I am going to need to make some custom R commands for this chapter because the chi squared approach is not usually covered in this context. The general chisq.test() command carries out tests of association between categories and goodness of fit tests. Here I need to look at species co-occurrence as a way to make a dissimilarity measure and so split species into clusters (communities). Also required will be a one-to-one (pairwise) comparison to look at the significance of the association between two species.
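As a rough illustration of the pairwise comparison, the sketch below cross-tabulates the presence-absence of two species and passes the 2x2 table to chisq.test(). The species vectors are invented for the example, and this is not the custom command developed for the book, just the standard test applied to one pair.

```r
# Hypothetical presence-absence records for two species across 40 samples
# (15 samples with both, 5 with A only, 5 with B only, 15 with neither)
sp.A <- rep(c(1, 1, 0, 0), times = c(15, 5, 5, 15))
sp.B <- rep(c(1, 0, 1, 0), times = c(15, 5, 5, 15))

# 2x2 contingency table of co-occurrence
tab <- table(sp.A, sp.B)

# correct = FALSE switches off the Yates continuity correction,
# which chisq.test() applies to 2x2 tables by default
test <- chisq.test(tab, correct = FALSE)

# Direction of the association: more co-occurrences than expected = positive
positive <- tab["1", "1"] > test$expected["1", "1"]

print(test)
```

The test itself says nothing about direction, so comparing the observed co-occurrence with its expected value is what separates a positive association from a negative one.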

### Chi Squared species association

Species that tend to be found together are likely to be from the same community. Species found separately are likely to be from different communities. If you examine species co-occurrence you can use a chi squared approach to work out the associations between species. Usually you have lots of samples. Your data can be simple presence-absence or quantitative but essentially the chi squared approach boils down to presence-absence.
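To show what "boils down to presence-absence" means in practice, here is a minimal sketch that converts quantitative data to presence-absence and counts co-occurrences for two pairs of species. The abundance values and the species and quadrat names are made up for illustration.

```r
# Hypothetical abundance data: species as rows, samples (quadrats) as columns
abund <- matrix(c(5, 0, 2, 0, 1, 3,
                  1, 0, 4, 0, 0, 2,
                  0, 7, 0, 3, 0, 0),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("sp.A", "sp.B", "sp.C"),
                                paste0("Q", 1:6)))

# Any abundance above zero counts as "present"
pa <- (abund > 0) * 1

# Co-occurrence = number of samples in which both species are present
co.AB <- sum(pa["sp.A", ] & pa["sp.B", ])
co.AC <- sum(pa["sp.A", ] & pa["sp.C", ])

print(pa)
```

In this toy data set sp.A and sp.B share several quadrats, whereas sp.A and sp.C never occur together.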

The vegan package contains a useful command, designdist(), which allows you to make your own dissimilarity index. I used this to compute the co-occurrence of species across multiple samples. The expected values can also be determined using the designdist() command. From that point it is a relatively easy matter to make a chi squared dissimilarity, which can be used as the basis for a hierarchical clustering.
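The quantities involved can be cross-checked in base R. The sketch below assumes that, with species as rows and binary terms, designdist() treats J as the number of samples shared by a pair of species, A and B as the two species totals, and P as the number of samples. The data are hypothetical, and the matrix algebra is a stand-in for illustration rather than the vegan implementation.

```r
# Hypothetical presence-absence data: species as rows, samples as columns
pa <- matrix(c(1, 0, 1, 0, 1, 1,
               1, 0, 1, 0, 0, 1,
               0, 1, 0, 1, 0, 0),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("sp.A", "sp.B", "sp.C"),
                             paste0("Q", 1:6)))

J   <- tcrossprod(pa)   # J[i, j] = samples containing both species i and j
tot <- rowSums(pa)      # species totals (the A and B terms)
P   <- ncol(pa)         # number of samples

expd <- outer(tot, tot) / P        # expected co-occurrence, A * B / P
chi  <- (J - expd)^2 / expd        # chi squared contribution for each pair
pres <- (J - expd) / sqrt(expd)    # Pearson residuals (sign gives direction)

# Keep only the pairwise (lower triangle) values, as a "dist" object
print(as.dist(round(pres, 3)))
```

A positive residual marks a pair found together more often than expected, and a negative residual marks a pair found together less often than expected.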

I made the result of my custom command take on a custom class "distchi" so that I could also make a plot and summary command. Here is the script I came up with:

```r
## Chi-Squared Association/Dissimilarity
## Mark Gardener 2013
## www.dataanalytics.org.uk

dist_chi <- function(data) {
  # data = community data, species as rows

  # vegan supplies designdist(); stop early if it is not installed
  if (!requireNamespace("vegan", quietly = TRUE)) {
    stop("package 'vegan' is required")
  }

  # Make the dissimilarity matrices and do a chi.sq test
  # Co-occurrence based on presence-absence
  data.co <- vegan::designdist(data, method = "J", terms = "binary")
  # Expected values (spA total * spB total / number of samples)
  data.exp <- vegan::designdist(data, method = "A*B/P", terms = "binary")
  # Chi squared values
  data.csq <- (data.co - data.exp)^2 / data.exp

  data.resid <- (data.co - data.exp) / sqrt(data.exp) # Pearson residuals
  csq.sum <- sum(data.csq)                            # Total chi sq value
  csq.pval <- pchisq(csq.sum, df = nrow(data), lower.tail = FALSE) # P-value

  # rescale residuals to be all +ve
  data.pr <- data.resid + abs(min(data.resid)) # Makes all +ve
  data.pr <- max(data.pr) - data.pr            # Converts similarity to dissimilarity

  # make results
  result <- list(chi = data.csq, co.occur = data.co,
                 expected = data.exp, residuals = data.resid,
                 distance = data.pr, p.val = csq.pval,
                 data = deparse(substitute(data)),
                 spp = nrow(data))

  class(result) <- "distchi" # custom class for plot and summary
  invisible(result)          # Save result invisibly
}
## END
```
```r
## Plot Chi.Sq dissimilarity
## Mark Gardener 2013
## www.dataanalytics.org.uk

plot.distchi <- function(x, method = "complete", ...) {
  # x = result of dist_chi()
  # method = hclust() method
  # ... = other instructions to pass to plot()

  # match-up the hclust() joining method
  # ("ward.D" and "ward.D2" are the names used by current versions of
  #  hclust(); the older name "ward" is kept for existing scripts)
  hc.join <- c("ward", "ward.D", "ward.D2", "single", "complete",
               "average", "mcquitty", "median", "centroid")

  hc.meth <- match.arg(method, hc.join) # set method for hclust()

  # make cluster object and plot
  hc <- hclust(x$distance, method = hc.meth)
  plot(hc, ...)

  # Save result (can be used with rect.hclust() for example)
  invisible(hc)
}
## END
```
```r
## Summary of dist_chi()
## Mark Gardener 2013
## www.dataanalytics.org.uk

summary.distchi <- function(object, ...) {
  # object = result of dist_chi()
  # ... = unused, kept to match the summary() generic

  cat("\nAvailable components:\n")
  print(names(object))
  cat("\n")

  # result object
  result <- data.frame(Total.Chi = sum(object$chi),
                       df = object$spp,
                       p.val = object$p.val)

  rownames(result) <- object$data
  print(result)
  invisible(result)
}
## END
```

The main command, dist_chi(), calculates a dissimilarity matrix (class "dist"), stored in the $distance component, which can be used to plot a dendrogram. The result also contains the co-occurrence matrix and an overall chi squared test result. The plot.distchi() command carries out a hierarchical clustering and plots the resulting dendrogram. You can choose the cluster joining algorithm via the method argument (the options are the same as for the hclust() command), and you can pass additional graphical instructions to plot.distchi(). The results from dist_chi() don't display "nicely" because most components are dist objects, which usually will not fit the display. The summary.distchi() command produces a simple overview of the chi squared association test.
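The clustering step can be tried without vegan. The following sketch rebuilds the residual-based dissimilarity in base R for four hypothetical species, mirroring the rescaling used in dist_chi(), and then cuts the dendrogram into two groups with cutree(). The data and the choice of two groups are assumptions made purely for the example.

```r
# Hypothetical presence-absence data: species as rows, samples as columns
pa <- matrix(c(1, 1, 1, 0, 0, 0,
               1, 1, 1, 1, 0, 0,
               0, 0, 0, 1, 1, 1,
               0, 0, 1, 1, 1, 1),
             nrow = 4, byrow = TRUE,
             dimnames = list(paste0("sp.", LETTERS[1:4]),
                             paste0("Q", 1:6)))

# Observed and expected co-occurrence, then Pearson residuals
J    <- tcrossprod(pa)
tot  <- rowSums(pa)
expd <- outer(tot, tot) / ncol(pa)
r    <- as.dist((J - expd) / sqrt(expd))

# Same rescaling idea as dist_chi(): shift to non-negative, then reverse
d <- r + abs(min(r))
d <- max(d) - d

# Cluster and split into two candidate communities
hc     <- hclust(d, method = "complete")
groups <- cutree(hc, k = 2)
print(groups)
```

With the real commands the equivalent would be to save the invisible result of plot.distchi() and pass it to cutree() or rect.hclust().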
