
# Statistics for Ecologists Using R and Excel (Edition 2)

### by: Mark Gardener

Available now from Pelagic Publishing

Welcome to the support pages for Statistics for Ecologists. These pages provide information and support material for the book. You should be able to find an outline and table of contents as well as support datafiles and additional material.

## Exercises & supplementary notes

These exercises and supplementary notes provide a few additional details that I thought would be useful. The exercises are listed with some notes on this page. Use the Quick Index to view a summary of each exercise/note or the Links to exercises/notes to go directly to the individual exercise/notes page (the sidebar also has direct links).


### Links to exercises/notes

• Chapter 3
• Chapter 6
• Chapter 7
• Chapter 8
• Chapter 9
• Chapter 10
• Chapter 11
• Chapter 12

The exercises & examples are listed more or less in the order they appear in the book. At the time of writing (early 2016) these are all referred to in the book text. However, I will probably add further notes and exercises from time to time, so please check back occasionally.

## Chapter 3

The following exercises and notes relate to Chapter 3, Beginning data exploration – using software tools.


### 3.3 Getting data from Excel into R

These notes relate to Section 3.3, which is about how to export data from Excel and import to R. The notes show you what to do if you get an "incomplete final line" error message when importing a CSV file into R. Goto Exercise 3.3.

#### Introduction

R can read CSV files using the read.csv() command, and you can easily make CSV files from Excel spreadsheets. Occasionally, though, you will get an "incomplete final line" message when using the read.csv() command. It arises because Excel sometimes fails to add a linefeed at the end of the final data row. You have two options:

• Edit the CSV file in a text editor and press Enter at the end of the last line.
• Use the cat() command to append a newline character.

The notes (part of the Tips & Tricks series) show you how to achieve this.
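As a quick sketch of the second option (the file name incomplete.csv is purely illustrative):

```r
# Reproduce the problem: a small CSV with no newline after the last data row.
csv <- "incomplete.csv"
cat("x,y\n1,2\n3,4", file = csv)       # note: no trailing "\n"

# The fix: append the missing newline with cat(), then read as normal.
cat("\n", file = csv, append = TRUE)
dat <- read.csv(csv)
dat
```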


## Chapter 6

The following exercises & notes relate to Chapter 6, exploring data using graphs.


### 6.2.1 Histograms & tally plots

These notes relate to Chapter 6, exploring data using graphs. More specifically to Section 6.2.1 Exploratory graphs in Excel. The notes show how you can make a "quick and dirty" plot in Excel, a Tally plot. Goto Exercise 6.2.1.

#### Introduction

Data distribution is important. You need to know the shape of your data so that you can determine the best:

• Summary statistics
• Analytical routines

Some statistical tests use the properties of the normal (Gaussian) distribution, whilst others use data ranks. So, it's important to know if your data are normally distributed or otherwise.

The classic way to look at the shape of your data is with a histogram, showing the frequency of observations that fall into certain size classes (called bins). However, you can make a "quick and dirty" histogram using pencil and paper, a tally plot. The tally plot is useful as it is something you can do in a notebook in the field as you collect data.

These notes show how you can make a tally plot in Excel.


### 6.2.2 Tally plots in R

These notes relate to Chapter 6, exploring data using graphs. More specifically to Section 6.2.2 Exploratory graphs using R. The notes show how you can coerce the hist() command to produce a histogram with dots instead of bars, thus forming a Tally plot. Goto Exercise 6.2.2.

#### Introduction

The hist() command is flexible and allows you to make a range of histograms in R. You can also use the stem() command to make a kind of Tally plot.

The notes show you how to make a custom command that produces a histogram but with dots instead of bars, thus mimicking a Tally plot.
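A minimal sketch of the idea (the function name tally.plot is my own, not necessarily the one used in the exercise): use hist() only for its binning, then draw a column of dots for each bin.

```r
# Tally-style plot: one dot per observation, stacked over the bin midpoints.
tally.plot <- function(x, breaks = "Sturges") {
  h <- hist(x, breaks = breaks, plot = FALSE)   # binning only, no drawing
  plot(range(h$mids), c(0, max(h$counts)), type = "n",
       xlab = deparse(substitute(x)), ylab = "Tally")
  for (i in seq_along(h$mids))
    if (h$counts[i] > 0)
      points(rep(h$mids[i], h$counts[i]), seq_len(h$counts[i]), pch = 19)
}

tally.plot(rnorm(50))
```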


### 6.3 Colour in graphs

These notes relate to Chapter 6, exploring data using graphs. The notes are mentioned in Section 6.3 Graphs to illustrate differences although they are generally relevant to graphical presentation. These are general notes with a little more information about controlling colour than is mentioned in the book. Goto Exercise 6.3.

#### Introduction

Colour is very important in presenting data and results. Both Excel and R have a wide range of colours you can use when creating your graphs and charts (certainly more than 50 shades of gray!).

Controlling and managing the colours you display is an important element in presenting your work. With an increasing volume of work being presented via the Internet, colour is something not to take for granted. Using default colours is "easy" but for maximum impact you should think carefully about how to present the best colours for the job.

Traditional journals generally use monochrome, which you can think of as just another set of colours, but even if you are "stuck" with shades of grey you need to think carefully. Pattern filling can be an especially useful option when using monochrome.

These notes give a bit more information about controlling colour in Excel and R graphs than in the book.
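For example, a few of the ways base R lets you specify colour (the data are made up):

```r
# Named colours -- R knows several hundred, see colours() for the full list.
barplot(c(3, 5, 2), col = c("steelblue", "tomato", "grey80"))

# Exact colours via rgb() components (values between 0 and 1).
barplot(c(3, 5, 2), col = rgb(0.2, 0.4, 0.6))

# Monochrome: a run of grey shades for journal-style figures.
barplot(c(3, 5, 2), col = grey(seq(0.2, 0.8, length.out = 3)))
```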


### 6.3.1a Legends

These notes relate to Chapter 6, exploring data using graphs. The notes are especially relevant to adding legends to barplots in R (Section 6.3.1). Goto Exercise 6.3.1a.

#### Introduction

You can add a legend to a barplot() using the legend parameter. You can also add a legend to any plot via the legend() command. These notes provide a few more details about how to produce and control legends in R plots.
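A minimal sketch of both approaches, with made-up data:

```r
# Legend via the legend argument of barplot() itself.
m <- matrix(c(4, 6, 3, 7), nrow = 2,
            dimnames = list(c("A", "B"), c("s1", "s2")))
barplot(m, beside = TRUE, legend = rownames(m))

# Legend added to any plot afterwards with the legend() command.
plot(1:10, 1:10, pch = c(1, 19))
legend("topleft", legend = c("open", "filled"), pch = c(1, 19))
```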


### 6.3.1b Gridlines in graphs & charts

These notes relate to Chapter 6, exploring data using graphs. The notes are especially relevant to adding gridlines to bar charts in Excel and R. Goto Exercise 6.3.1b.

#### Introduction

Gridlines are potentially useful items you might want to incorporate in your charts. Gridlines can help the reader to gauge the height of bars in a column chart more easily for example, and so the readability is improved.

On the other hand gridlines can "get in the way" and hinder readability by making your chart cluttered. In scatter plots you may require both horizontal and vertical gridlines; having gridlines on one axis only can "lead the eye". Knowing when to apply gridlines is part of the skill of presentation.

These notes show you how to add and tweak gridlines in both Excel and R.
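A sketch of two ways to do this in R (made-up data):

```r
# Horizontal gridlines behind a bar chart: draw the gridlines with abline(),
# then redraw the bars on top so the lines do not cross them.
bp <- barplot(c(4, 6, 3), names.arg = c("A", "B", "C"))
abline(h = 1:6, lty = "dotted", col = "grey")
barplot(c(4, 6, 3), names.arg = c("A", "B", "C"), add = TRUE)

# On a scatter plot, grid() adds dotted gridlines at the axis tick positions.
plot(1:10, rnorm(10))
grid()
```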


### 6.3.2 Ordering the boxes of a boxplot()

These notes relate to Chapter 6, exploring data using graphs. The notes are relevant to box-whisker plots, which are used to display "differences" between samples. Goto Exercise 6.3.2.

#### Introduction

The boxplot() command is one of the most useful graphical commands in R. The box-whisker plot is useful because it shows a lot of information concisely. However, the boxes do not always appear in the order you would prefer. These notes show you how you can take control of the ordering of the boxes in a boxplot().
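The usual trick is to set the levels of the grouping factor explicitly; a sketch with made-up data:

```r
set.seed(1)
dat <- data.frame(resp = rnorm(30),
                  site = rep(c("Upper", "Mid", "Lower"), each = 10))

# Default: the boxes appear in alphabetical order (Lower, Mid, Upper).
boxplot(resp ~ site, data = dat)

# Set the factor levels to control the order of the boxes.
dat$site <- factor(dat$site, levels = c("Upper", "Mid", "Lower"))
boxplot(resp ~ site, data = dat)
```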


### 6.4.2 Axis labels using the expression() command

These notes relate to Chapter 6, exploring data using graphs. They are relevant to all types of R plot because they are concerned with the labelling of axes, especially the making of superscript and subscript elements. Goto Exercise 6.4.2.

#### Introduction

The labelling of your graph axes is an important element in presenting your data and results. You often want to incorporate text formatting to your labelling. Superscript and subscript are particularly important for scientific graphs. You may also need to use bold or italics (the latter especially for species names).

In Excel you can simply select the text of your labels and alter the formatting. In R you must create a specially formatted string using the expression() command. These notes show you how to use the expression() command and also how to place items using text(), title() and mtext() commands.
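A short sketch of the kind of thing expression() can do:

```r
plot(1:10, (1:10)^2, xlab = "", ylab = "")
title(xlab = expression(Density~(m^-2)))         # superscript
title(ylab = expression(CO[2]~concentration))    # subscript
mtext(expression(italic(Quercus~robur)), side = 3)  # italic species name
```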


### 6.5.1 Interactive labels in pie() charts

These notes relate to Chapter 6, exploring data using graphs. They are especially relevant to Section 6.5.1, which is about using pie charts to show association data. However, the notes are generally relevant as they show how you can place text onto an existing R plot in an interactive manner, using your mouse as a pointer. Goto Exercise 6.5.1.

#### Introduction

The locator() command is used to "read" the mouse position and generate x, y co-ordinates. These can be used in various ways, in commands that require those x, y co-ordinates. For example, sometimes the default placement of labels on a plot is not quite what you want. You can use the text() command with locator() to place the labels exactly where you want.

In this exercise you'll see the locator() command used to place labels on a pie() chart as well as some notes about making custom labels.
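A minimal sketch (locator() needs an interactive graphics device, so the interactive call is shown commented out, with a fixed-coordinate equivalent below it):

```r
pie(c(3, 5, 2), labels = NA)              # a pie chart with no default labels

# Interactively: click once where the label should appear.
# text(locator(1), labels = "Sample A")

# Non-interactive equivalent, with x, y given directly:
text(0.5, 0.5, labels = "Sample A")
```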


## Chapter 7

The following exercises & notes relate to Chapter 7, tests for differences.


### 7.1.1 Welch two-sample t-test

This exercise is concerned with the t-test in Chapter 7 (Section 7.1). It walks you through the process of a t-test in Excel and shows how to calculate the modified degrees of freedom associated with the TTEST function. Goto Exercise 7.1.1.

#### Introduction

The t-test is used to compare the means of two samples that have a normal (parametric or Gaussian) distribution. The "classic" t-test has two major variants:

• Assumption of equal variance for the two samples.
• Adjustment of degrees of freedom (Satterthwaite modification).

In the first case the common variance is calculated and used in place of the variance in the regular formula. The calculation for this is relatively simple but it is also pointless, since you still have to determine the variance of the two samples.

The most commonly used modification is to adjust the degrees of freedom to make the result of the t-test a little more conservative. The degrees of freedom are reduced slightly using the Satterthwaite modification. This version of the t-test is generally called the Welch 2-sample t-test.
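The Satterthwaite degrees of freedom can be computed directly from the sample variances; a sketch with made-up samples, cross-checked against R's t.test():

```r
a <- c(12, 15, 11, 14, 13)
b <- c(9, 11, 8, 10, 12, 9)

va <- var(a) / length(a)     # variance of the mean for each sample
vb <- var(b) / length(b)

# Welch-Satterthwaite modified degrees of freedom.
df <- (va + vb)^2 / (va^2 / (length(a) - 1) + vb^2 / (length(b) - 1))
df

t.test(a, b)   # R's default Welch test reports the same df
```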


### 7.1.2 Using R for the t-test

This exercise is concerned with the t-test in Chapter 7 (Section 7.1) and particularly with running the t-test using R. Goto Exercise 7.1.2.

#### Introduction

The t.test() command carries out the t-test in R. The default is to compute the Welch two-sample test (unequal variances).

You can have your data in several forms and this exercise is a brief reminder of how to manage data in different forms, such as:

• Two separate samples as two data vectors.
• Two separate samples but in a single data.frame object (i.e. sample format).
• A response variable and a predictor variable (i.e. scientific recording format).

The data used are the same as those illustrated in the book text.
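A sketch of calling t.test() on data held in the different forms (the values are made up, not the book's):

```r
# Two separate samples as two data vectors (Welch is the default).
x <- c(5.1, 4.8, 5.6, 5.0)
y <- c(4.2, 4.5, 3.9, 4.4)
t.test(x, y)

# The same samples held in a single data.frame (sample format).
df1 <- data.frame(x = x, y = y)
t.test(df1$x, df1$y)

# Recording format: a response and a predictor, via the formula interface.
df2 <- data.frame(resp = c(x, y), pred = rep(c("x", "y"), each = 4))
t.test(resp ~ pred, data = df2)
```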


### 7.3.3 Using Excel for the Wilcoxon Matched Pairs test

This exercise is concerned with matched pairs tests (Section 7.3) and in particular how to carry out the non-parametric Wilcoxon Matched Pairs test using Excel (Section 7.3.3). Goto Exercise 7.3.3.

#### Introduction

There is no in-built function that will carry out the Wilcoxon Matched Pairs test in Excel. However, you can rank the data and compute the rank sums you require using the RANK.AVG function.

You do need to omit zero differences from the calculations and also to separate ranks of positive differences from ranks of negative differences. In this exercise you can see how to use the IF function to help you do this separation.

Once you have your result you'll have to look up the critical values for W (see Table 7.13 in the book), to see if your result is a significant one.
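As a way to check your spreadsheet, R's wilcox.test() with paired = TRUE reports the same kind of statistic (made-up data; note that R computes differences as the first argument minus the second):

```r
# Made-up paired data: no zero differences and no tied |differences|.
before <- c(12, 15, 11, 14, 13)
after  <- c(14, 16, 16, 11, 17)

# Putting "after" first makes V the sum of ranks of the positive
# (after - before) differences, matching the hand calculation.
wilcox.test(after, before, paired = TRUE)
```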


### 7.3.5 Graphs and matched pairs results

This exercise is concerned with matched pairs tests (Section 7.3), and in particular how you can represent the data/results graphically. Goto Exercise 7.3.5.

#### Introduction

Usually you'll use a bar chart or box-whisker plot to display data when looking at differences between samples. When you have a matched pairs situation however, these sorts of graph may not always be the best way to summarize your data/results.

An alternative is to use a scatter plot, where you plot one sample against the other. If you add an isocline (a straight line with slope 1 and intercept 0) you can see more clearly how the pairs of observations match up with one another. These notes show you how to prepare such a scatter plot.
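A sketch of such a plot in R with made-up data:

```r
before <- c(12, 15, 11, 14, 13)
after  <- c(14, 16, 16, 11, 17)

# Use the same range on both axes so the isocline is meaningful.
plot(before, after,
     xlim = range(before, after), ylim = range(before, after))

# Isocline: slope 1, intercept 0. Points above the line increased,
# points below decreased.
abline(a = 0, b = 1, lty = "dashed")
```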


## Chapter 8

The following exercises & notes relate to Chapter 8, tests for linking data – correlations.


### 8.3.2 Spearman Rank correlation in Excel via t approximation

This exercise is concerned with correlation (Chapter 8) and in particular how you can use Excel to calculate Spearman's Rank correlation coefficient. Goto Exercise 8.3.2.

#### Introduction

Excel has built-in functions that can calculate correlation, but only for normally distributed data: the CORREL and PEARSON functions both calculate Pearson's product moment correlation coefficient. Once you have the correlation coefficient it is fairly easy to calculate the statistical significance: you compute a t-value, then use TINV to compute a critical value or TDIST to get an exact p-value (see Section 8.3.1 in the book).

This exercise shows how you can use the RANK.AVG function to rank the data, then use CORREL on the ranks to obtain a Spearman Rank coefficient.
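The same trick can be confirmed in R, which shows why the RANK.AVG-then-CORREL approach works (made-up data):

```r
x <- c(3, 7, 5, 9, 2, 6)
y <- c(1, 6, 4, 8, 3, 9)

# Spearman's coefficient is simply Pearson's coefficient computed on ranks.
rs <- cor(rank(x), rank(y))
rs

all.equal(rs, cor(x, y, method = "spearman"))   # the built-in route agrees
```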


## Chapter 9

The following exercises & notes relate to Chapter 9, tests for linking data – associations.


### 9.4 Using Excel for Chi-squared goodness of fit tests

This exercise is concerned with association (Chapter 9), in particular goodness of fit testing using Excel (Section 9.4). Goto Exercise 9.4.

#### Introduction

Excel has several functions related to the Chi-squared statistic. This allows you to undertake chi-squared goodness of fit testing for example.

In goodness of fit tests you have one set of frequency data in various categories. You also have a matching set of frequencies that you want to "compare". The comparison set may be a theoretical set of values or perhaps a previous set of observations. The goodness of fit test is often used in genetic studies where you match up observed phenotypes against a theoretical ratio of expected phenotypes.

The following Excel functions are helpful in carrying out the goodness of fit test:

• CHITEST - takes observed and expected values and computes an exact p-value.
• CHIINV - takes a probability and degrees of freedom and returns a chi-squared value (e.g. a critical value).
• CHIDIST - takes a chi-squared value and degrees of freedom and returns an exact p-value.

This exercise shows you how to carry out the calculations using these Excel functions.
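If you want to cross-check your spreadsheet, R's chisq.test() does the same job; a sketch with made-up phenotype counts tested against a 9:3:3:1 genetic ratio:

```r
obs <- c(90, 27, 33, 10)                 # made-up observed phenotype counts

# Goodness of fit against the theoretical 9:3:3:1 ratio.
chisq.test(obs, p = c(9, 3, 3, 1) / 16)
```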


## Chapter 10

The following exercises & notes relate to Chapter 10, Differences between more than two samples. Essentially this means analysis of variance (ANOVA) and the Kruskal-Wallis non-parametric equivalent of 1-way ANOVA.


### 10.1.4 Alter sample format data to scientific recording format

These notes are related to Chapter 10, which is concerned with differences between more than two samples. The notes are relevant to other analyses too, as they show you how to alter the layout of your data from the form "preferred" by Excel and that required by R. Goto Exercise 10.1.4.

#### Introduction

There are two main ways you can lay out your data. In sample format each column is a separate sample, which forms some kind of logical sampling unit. This is the typical layout if you are using Excel, because that is how your data must be arranged to make charts and carry out most forms of analysis.

In scientific recording format each column is a variable; you have response variables and predictor variables. This recording layout is more powerful and more flexible because you can add new variables or observations easily. In R this layout is also essential for any kind of complicated analysis, such as regression or analysis of variance.

I've written about scientific recording format before, see my Writer's Bloc page for a brief summary.

When you have data in the "wrong" layout you need to be able to rearrange them into a more "sensible" layout so that you can unleash the power of R most effectively. The stack() command is a useful tool that can help you achieve this layout.
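A minimal sketch of stack() with made-up data (the column names abund and site are my own):

```r
# Sample layout: one column per sample.
samp <- data.frame(upper = c(5, 7, 6), lower = c(9, 8, 11))

# stack() produces recording layout: a values column and a grouping column.
rec <- stack(samp)
names(rec) <- c("abund", "site")   # rename from the defaults (values, ind)
rec
```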


### 10.1.5 Two-Way ANOVA using Excel

This exercise is related to analysis of variance and in particular how you can carry out two-way ANOVA using Excel (Section 10.1.5). The calculations are not especially hard but they can be fiddly. Goto Exercise 10.1.5.

#### Introduction

Excel can carry out the necessary calculations to conduct ANOVA and has several useful functions that can help you. However, it is most suitable for one-way ANOVA, where you have a single predictor variable. When you have two predictor variables two-way ANOVA is possible, but can be tricky to arrange.

In order to carry out the calculations you need to have your data arranged in a particular layout, let's call it sample layout or "on the ground" layout. This is not generally a good layout to record your results but it is the only way you can proceed sensibly using Excel. In this exercise you'll see how to set out your data and have a go at the necessary calculations to perform a two-way ANOVA.

If you have Windows you can use the Analysis ToolPak to carry out the computations for you but you'll still need to arrange the data in a particular manner.


### 10.2a Critical values for Kruskal-Wallis test

These notes give critical values for the Kruskal-Wallis test. The K-W test is used to analyse differences between more than two samples (Chapter 10). The K-W test is presented in Section 10.2. Goto Exercise 10.2a.

#### Introduction

The Kruskal-Wallis test is appropriate when you have non-parametric data and one predictor variable (Section 10.2). It is analogous to a one-way ANOVA but uses the ranks of items in the various groups to determine the likely significance.

If there are at least 5 replicates in each group the critical values are close to the Chi-squared distribution. Exact critical values have been computed for equal group sizes, and also for situations where group sizes are unequal.


### 10.2b Adjustment for tied ranks in Kruskal-Wallis test

These notes relate to the Kruskal-Wallis test for differences between more than two samples (Section 10.2). The notes show how you can adjust the test statistic in situations where there are tied ranks. Goto Exercise 10.2b.

#### Introduction

The Kruskal-Wallis test is appropriate when you have non-parametric data and one predictor variable (Section 10.2). It is analogous to a one-way ANOVA but uses the ranks of items in the various groups to determine the likely significance.

When you have tied values, you will get tied ranks. In these circumstances you should apply a correction to your calculated test statistic. The notes show you how this can be done. The calculations are simple but in Excel it can be difficult to get the process "automated". In R the kruskal.test() command computes the adjustment for you.
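For example (made-up data containing tied values):

```r
# Tied values (two 5s, two 7s) produce tied ranks; kruskal.test()
# applies the tie correction to the statistic automatically.
resp <- c(3, 5, 5, 8, 7, 7, 9, 12, 11)
grp  <- factor(rep(c("a", "b", "c"), each = 3))

kruskal.test(resp ~ grp)
```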


### 10.2.3 Post-hoc testing in Kruskal-Wallis using R

These notes relate to Chapter 10, differences between more than two samples. Specifically the notes deal with post hoc analysis when you use the Kruskal-Wallis test (Section 10.2). Goto Exercise 10.2.3.

#### Introduction

The Kruskal-Wallis test is a non-parametric test for differences between more than two samples. It is essentially an analogue of one-way ANOVA. There is no "standard" method for carrying out post hoc analysis for K-W tests; these notes show you how to use a modified form of the U-test for this purpose.

The notes include the custom R commands in the file KW posthoc.R.
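As a rough illustration of the general idea (this is not the book's custom code): R's pairwise.wilcox.test() runs a U-test on each pair of groups, with a correction for multiple comparisons.

```r
# Made-up data: three groups, no ties.
resp <- c(3, 5, 4, 8, 7, 9, 12, 11, 14)
grp  <- factor(rep(c("a", "b", "c"), each = 3))

# Pairwise U-tests with a Bonferroni correction for multiple comparisons.
pairwise.wilcox.test(resp, grp, p.adjust.method = "bonferroni")
```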


## Chapter 11

The following exercises and notes relate to Chapter 11, tests for linking several factors. This means regression, multiple regression, curvilinear regression and logistic regression.


### 11.1 Graphing Multiple regression

These notes relate to Chapter 11, tests for linking several factors. The notes relate especially to Sections 11.1 (multiple regression) and 11.2 (curvilinear regression). The notes show how you can summarize your regression models graphically. Goto Exercise 11.1.

#### Introduction

When you only have two variables (a predictor and a single response) you can use a regular scatter plot to show the relationship. Even if the relationship is logarithmic or polynomial you can represent the situation, as long as there is only one predictor variable.

When you have two or more predictor variables it becomes hard to represent the situation graphically. You can try a 3D plot but they are rarely successful. Generally you'll stick to plotting the "most important" predictor variable and display the model as a standard regression table.

However, it can be helpful to show some diagnostics from your regression model as graphics. These notes show you how to produce some simple regression diagnostics and present them graphically.
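A sketch of two common diagnostic plots, using made-up data:

```r
set.seed(1)
x1 <- rnorm(30); x2 <- rnorm(30)
y  <- 2 + 1.5 * x1 - 0.8 * x2 + rnorm(30)
mod <- lm(y ~ x1 + x2)

# Residuals vs fitted values: look for patterns (there should be none).
plot(fitted(mod), resid(mod))
abline(h = 0, lty = "dotted")

# Normal Q-Q plot of the residuals: points should lie near the line.
qqnorm(resid(mod))
qqline(resid(mod))
```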


### 11.1.2 Beta Coefficients

These notes are related to Sections 11.1 (multiple regression) and 11.2 (curvilinear regression). More specifically they show how to calculate beta coefficients from regression models using R. Goto Exercise 11.1.2.

#### Introduction

In linear regression your aim is to describe the data in terms of a (relatively) simple equation. The simplest form of regression is between two variables:

y = mx + c

In the equation y represents the response variable and x is a single predictor variable. The slope, m, and the intercept, c, are known as coefficients. If you know the values of these coefficients then you can plug them into the formula for values of x, the predictor, and produce a value for the response.

In multiple regression you "extend" the formula to obtain coefficients for each of the predictors. If you standardize the coefficients (using standard deviation of response and predictor) you can compare coefficients against one another, as they effectively assume the same units/scale.

The functions for computing beta coefficients are not built-in to R. In these notes you'll see some custom R commands that allow you to get the beta coefficients easily. You can download the Beta coeff calc.R file directly (the code is explained in Exercise 11.1.2).
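The standardization itself is simple; a sketch (not the book's Beta coeff calc.R) with made-up data:

```r
set.seed(1)
dat <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
dat$y <- 2 + 1.5 * dat$x1 - 0.8 * dat$x2 + rnorm(30)

mod <- lm(y ~ x1 + x2, data = dat)
b <- coef(mod)[-1]                                 # slopes only, drop intercept

# Beta coefficient = slope * sd(predictor) / sd(response).
beta <- b * sapply(dat[names(b)], sd) / sd(dat$y)
beta
```

The same values fall out of running lm() on scale()d variables, which is a handy cross-check.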


### 11.3.1 Logistic regression: model prediction

These notes relate to Section 11.3, logistic regression. In particular the notes show how to predict the value of the response variable for given values of the predictor in a logistic regression. Goto Exercise 11.3.1. The notes are also generally relevant to regression (Chapter 11) as the methods apply to general regression models.

#### Introduction

When you carry out a regression you are looking to describe the data in terms of the variables that form the relationships. When you've got your regression model you are able to describe the relationship using a mathematical model (which is what the regression model is).

The regression model can be used in several ways; for example you can calculate fitted values, which are "idealised" values of the response variable. These are used in making lines of best fit and also in diagnostic plots (see Exercise 11.1). The differences between the idealised values and the actually observed values are called residuals, which are also used in diagnostic plots (see Exercise 11.1).

You can also use the regression model to make predicted values, which is where you use "new" values of the predictor (that is ones not observed in the original dataset) to predict the response variable. These are especially important in logistic regression, where your response is binary, that is it only has two possibilities. The result you get when you "predict" response values in a logistic regression is a probability; the likelihood of getting a "positive" result when the predictor variable is set to a particular value.
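A sketch with made-up presence/absence data, using R's predict() with type = "response" to obtain probabilities:

```r
set.seed(1)
depth   <- runif(40, 0, 10)                        # made-up predictor
present <- rbinom(40, 1, plogis(-2 + 0.5 * depth)) # made-up binary response

mod <- glm(present ~ depth, family = binomial)

# Predict for "new" predictor values; type = "response" returns
# probabilities of a "positive" result rather than log-odds.
newdata <- data.frame(depth = c(2, 5, 8))
p <- predict(mod, newdata, type = "response")
p
```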


### 11.3.2 Logistic regression: model building

These notes relate to Section 11.3 logistic regression. The notes show how to build a logistic regression model that contains only the "best" components. The principles apply to all regression models but the exercise uses an example dataset that has a binary response variable. Goto Exercise 11.3.2.

#### Introduction

When you have several (or indeed many) predictor variables you want to find a regression model that best describes the relationship between the variables. You should not simply incorporate every variable you've got: with enough explanatory predictors you will eventually "explain" all the variability in your response variable by chance alone.

The process of model-building allows you to select the "best" variable to add to your current regression model. In the book you see how to carry out stepwise model building using a regular multiple regression (Section 11.1.2). In this exercise you can have a go at building a logistic regression model. The process is much the same as described in Section 11.1.2.
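One way to sketch the stepwise process in R with made-up data (add1() tests each candidate term against the current model):

```r
set.seed(1)
d <- data.frame(a = rnorm(50), b = rnorm(50), c = rnorm(50))
d$y <- rbinom(50, 1, plogis(1.2 * d$a))   # response built around predictor a

mod0 <- glm(y ~ 1, family = binomial, data = d)   # start from the null model
add1(mod0, scope = ~ a + b + c, test = "Chisq")   # which single term helps most?

mod1 <- update(mod0, . ~ . + a)   # add the best term, then repeat with add1()
```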


## Chapter 12

These notes and exercises relate to Chapter 12, Community ecology. Specifically to diversity and similarity.


### 12.0.0 Preparing & managing community data

This exercise relates to all of Chapter 12 (community ecology), and is primarily aimed at helping you to prepare data and assemble it in a form that allows you to carry out further investigation. This follows on from an earlier exercise in Section 3.2.7, where you used an Excel Pivot Table. Goto Exercise 12.0.0.

#### Introduction

It is important that your data are arranged and set out in a manner that allows you to carry out the analyses that you require. In general a scientific recording layout is a good starting point (see Section 2.2). In the scientific recording layout (a.k.a. biological recording layout) you have a column for each variable (e.g. site name, species name, abundance). For most purposes this is the most "robust" way to record your data, as you can use the data most flexibly. For community analyses however, you'll generally want to have the data arranged by site and species, with the rows being site names and the columns the species (with the body of the table being the abundance).

This exercise shows you how to take a dataset that is in recording layout and convert to community layout. The exercise data are: Preparing and managing community data exercise.xlsx.

See also some notes about data layout and management from my book Managing Data Using Excel, on my Writer's Bloc page.
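If your recording-layout data are already in R, the xtabs() command performs the same rearrangement as the Pivot Table; a sketch with made-up records:

```r
# Recording layout: one row per observation.
rec <- data.frame(site    = c("S1", "S1", "S2", "S2", "S2"),
                  species = c("ash", "oak", "oak", "elm", "ash"),
                  abund   = c(4, 2, 7, 1, 3))

# Community layout: rows are sites, columns are species, body is abundance.
comm <- xtabs(abund ~ site + species, data = rec)
comm
```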


### 12.1.4 Comparing diversity

These notes relate to Section 12.1.4 comparing diversity. The notes show how to compare the Shannon diversity index from two separate community samples. This is done using the Hutcheson t-test, a variant of the classic t-test. Goto Exercise 12.1.4.

#### Introduction

When you have two samples of community data you can calculate a diversity index for each one. The Shannon diversity index is a commonly used measure of diversity. However, you cannot compare the two index values using classic hypothesis tests because you do not have replicated data.

The Hutcheson t-test is a modified version of the t-test that provides a way to compare two such samples. The key is the formula that determines the variance of the Shannon index. These notes show you how to conduct the Hutcheson t-test and so obtain a statistical significance for the difference in Shannon diversity between two samples. There is also a spreadsheet calculator (Shannon diversity t-test calculator.xlsx) that you can download and use for your own data.
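For reference, the Shannon index H itself is easy to compute; a sketch with made-up abundances:

```r
abund <- c(12, 5, 1, 2, 30)      # abundance of each species in one sample
p <- abund / sum(abund)          # proportional abundances

# Shannon diversity: H = -sum(p * ln(p)); maximum is ln(species richness).
H <- -sum(p * log(p))
H
```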


### 12.2.1 Visualizing Similarity

This exercise is concerned with looking at similarity between ecological communities (Section 12.2). This exercise shows you how to visualize the similarity between several communities using a dendrogram drawn using Excel. Goto Exercise 12.2.1.

#### Introduction

When you have two or more ecological communities you can use the presence-absence of species (or abundance information, if you have it) to determine measures of similarity (or the corollary, dissimilarity). In the case of presence-absence data you use the species richness of each sample and the number of shared species to calculate an index of (dis)similarity. Once you have a matrix of (dis)similarity you can visualize the relationship between the community samples using a dendrogram. Think of it as being like a family tree, with communities most similar being "near" one another in the diagram.

You can carry out the calculations for (dis)similarity using Excel, although things get rather tedious when you have more than a few samples. There is no built-in chart type that will create a dendrogram in Excel, so you must use other drawing tools. R can carry out the calculations and draw the dendrogram rather easily, but it is a worthwhile exercise to use Excel, as it helps you understand how the (dis)similarity is "converted" into a dendrogram and therefore what you are looking at.

You can get the sample data here: Dendrogram Exercise.xlsx.
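For comparison, the R route is short; a sketch with made-up presence-absence data:

```r
# Rows are community samples, columns are species (1 = present, 0 = absent).
comm <- matrix(c(1, 1, 0, 1,
                 1, 0, 1, 1,
                 0, 1, 1, 0), nrow = 3, byrow = TRUE,
               dimnames = list(c("siteA", "siteB", "siteC"), NULL))

d <- dist(comm, method = "binary")  # Jaccard-type dissimilarity
plot(hclust(d))                     # cluster and draw the dendrogram
```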


### 12.2.2 Abundance-based dissimilarity metrics

This exercise is concerned with similarity between ecological communities (Section 12.2) and in particular with the calculation of distance measures when you have abundance data (Section 12.2.2). In this exercise you'll get the chance to undertake some simple calculations using Excel. Goto Exercise 12.2.2.

#### Introduction

When your community data samples include abundance information (as opposed to simple presence-absence) you have a wider choice of metrics to use in calculating (dis)similarity. When you have presence-absence data you use the number of shared species (J) and the species richness of each sample (A & B). Measures of (dis)similarity obtained are therefore slightly "crude".

When you have abundance data your measures of (dis)similarity are a bit more "refined" and you have the potential to pick up patterns in the data that you would otherwise not see using presence-absence data.

There are many metrics that you might use to explore (dis)similarity; in this exercise you'll see four of the more commonly used ones:

• Bray-Curtis
• Canberra
• Manhattan (City Block)
• Euclidean

The exercise shows you how you can carry out the calculations using Excel (in the book you also see how to do this using R). You can get the sample spreadsheet here: Distance metrics.xlsx.
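A sketch of the four metrics for a pair of made-up samples (R's dist() covers three of them directly; Bray-Curtis is computed from its definition):

```r
a <- c(6, 2, 0, 5)   # abundances of four species at site a
b <- c(3, 4, 1, 0)   # abundances of the same species at site b
m <- rbind(a, b)

dist(m, method = "manhattan")   # Manhattan (city block)
dist(m, method = "euclidean")   # Euclidean
dist(m, method = "canberra")    # Canberra

# Bray-Curtis dissimilarity: sum|a - b| / sum(a + b).
bray <- sum(abs(a - b)) / sum(a + b)
bray
```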
