Dr. Mark Gardener 



Statistics for Ecologists Using R and Excel (Edition 2)
Data Collection, Exploration, Analysis and Presentation
by: Mark Gardener
Available now from Pelagic Publishing

Welcome to the support pages for Statistics for Ecologists. These pages provide information and support material for the book. You should be able to find an outline and table of contents as well as support datafiles and additional material.

Datasets:
Data files
There are a number of data files associated with the book. I've tried to ensure that all the data mentioned and illustrated in the text are available for you to download. Many of the datasets are used in the Have a Go exercises; you can download the data and follow along. On this page you can find details about each dataset and download the files.
See also the Exercises support page, where there are additional notes and exercises. You'll find more datasets with these exercises, along with links to the files as you need them.

Data resources listed by type of analysis (topic)
Click a name to go to a description of the data. The data are contained in two archives: one for the R data, S4E2e.RData, and one for spreadsheet data, S4E2e Archive Excel.zip. Note that some items may be listed in more than one section. Graphics are not mentioned as a topic because all the data can be shown graphically one way or another! Similarly, many of the data can be used for practice at manipulating data in Excel and/or R in some way (see Miscellaneous).

Summary Statistics: Mean, median
Data Distribution: Histogram, Tally plot
Differences (2 samples): t-test, U-test, Wilcoxon matched pairs
Correlations (2 variables): Spearman's Rank, Pearson's Product Moment, Curvilinear correlation
Associations: Chi Squared, Yates' correction, Goodness of Fit
Differences (>2 samples): Kruskal-Wallis, ANOVA
Regression: Linear regression, Curvilinear regression, Logistic regression
Diversity: Species richness, Diversity Index
Similarity: Jaccard, Sørensen, Bray-Curtis, Euclidean metric
Miscellaneous: Rearranging data, Managing data, Pivot Tables, Lookup tables


Archives
You can use the RData file in several ways:


Ant species and fire regime: Use for dissimilarity and diversity calculations.
Ants and fire
These data are adapted from Hoffmann, B.D. 2003. Austral Ecol. 28, p.182 and show the abundance of 91 species of ant in 10 samples. The samples are from two types of soil (red and black) and from 5 fire regimes. The data are arranged with the samples as columns; the column names indicate the soil and regime as follows:
The data are used for dissimilarity calculations (including visualisation of dissimilarity with a dendrogram) but you can also use them to explore diversity. The data are in the S4E2e.RData archive and are named ant.

Beetle sizes: Use for data summary or differences tests.
Beetle sizes
These data give the sizes (in mm) of a species of water beetle. The main sample can be used for data summary; a second sample is available (in R) to use for comparisons.
The data can be used for data summary, such as mean, median, standard deviation and so on. You can also practice drawing histograms. The two samples can also be compared with the t-test or U-test.

Beach hoppers: Use data for logistic regression.
Beach hoppers
These data show the allele frequencies at the mannose-6-phosphate isomerase (Mpi) locus in the amphipod crustacean Megalorchestia californiana (the Californian beach hopper). Data from McDonald, J.H. 1985 (Heredity 54: 359–366).
These data are used to demonstrate logistic regression. Each row of the data gives the latitude and the number of specimens that had each form of the allele. There are two forms, so the data are binary, which is why a logistic regression is the appropriate method of analysis. Logistic regression is a form of generalized linear modelling (GLM).

Butterflies and Year: Use data for graphical summaries: bar chart, column chart, pie chart, line chart. Use the data for diversity index: Simpson's or Shannon-Wiener. Explore similarity between years.
Butterflies and Year
These data show the abundance (as a count) of six butterfly species over five years at a site in Scotland. The data are arranged with the columns giving the year of the sample; each row gives the abundance of a species.
You can use the data for graphical summary, for example line plots of abundance against time, as well as bar charts and pie charts. You can also look at the diversity of the samples as a bit of practice with diversity indices, such as Shannon and Simpson's. You could also explore similarity between years. Although these are perhaps not the most sensible types of analysis, you might also use the data for comparison of differences or changes with time (as a correlation).

Butterflies and Habitat: Use data for Pivot Table practice. Use the data for summary, graphs and for exploring differences between samples.
Butterflies & Habitat
These data show the abundance of butterflies in three habitats. Each habitat was sampled several times. The datafile has three columns, for the abundance, habitat and an index variable (the replicate).
These data are used as an example of rearranging and managing data using a Pivot Table in Excel. You can also use the data to look at data summary, graphics and differences (there are 3 samples). To do that in R you'll need to save a copy as a CSV file and then import it.

Butterfly Food: Use the data for regression (multiple regression). Use the data for graphical summary (scatter plot).
Butterfly food
These data show the abundance of butterflies and the availability of food plants and nectar resources.
These data can be used to look at (multiple) regression (or correlation), and associated statistics (such as beta coefficients). You can also use the data for some graphical summary, such as scatter plots.

Birds and Habitat: Use the data for association analysis (Chi Squared test). Use the data for graphics such as bar chart (column chart) or pie chart. Use birds.xlsx for practice with Pivot Tables.
Birds & Habitat
These data show the abundance of some common UK bird species in various habitats.
The main purpose of these data is to look at tests of association (the Chi squared test). You can also use them for graphical summary, using bar charts and pie charts. The birds.xlsx file can also be used for translating data from recording layout into a contingency table (in Excel use a Pivot Table; in R you can use the xtabs() command).

Bluebell abundance: Use the data for regression using a polynomial model. Use the data for graphics (scatter plot and trendline).
Bluebell abundance
These data show the abundance of bluebell in a wood in England. Data are presented showing the abundance of the plant and the light intensity at the growing site.
These data show an interesting relationship between abundance and light, an inverted U shape. This lends itself to a regression using a polynomial equation. You can also use the data for graphical summary, such as a scatter plot and line of best fit (a trendline).

Dominance/Abundance scales: This file is used as part of an exercise in using lookup tables (=VLOOKUP in Excel).
Dominance/Abundance scales
This datafile is used in an exercise on using lookup tables. It is useful to be able to convert from an ordinal scale that uses text values to an ordinal scale with "real" numbers.
Use this to help practice using the =VLOOKUP function in Excel. This allows you to replace one value with another. In this case an abundance as a text label (D = dominant, A = abundant etc.) can be replaced with a numerical value. This allows you to carry out non-parametric statistics (e.g. the U-test). You are essentially replacing a text-based ordinal scale with a number-based ordinal scale.
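The replacement that a lookup table performs can also be sketched outside Excel. The Python snippet below is only an illustration, not the workbook's method: only the labels D and A come from the text above, while the remaining labels and all of the numeric values are assumptions chosen for the example.

```python
# Hypothetical lookup table: text-based ordinal scale -> numeric ordinal scale.
# Only D (dominant) and A (abundant) are named in the text; the other labels
# and every numeric value here are assumptions for illustration.
SCALE = {"D": 5, "A": 4, "F": 3, "O": 2, "R": 1}

def to_numeric(labels, scale=SCALE):
    """Replace each text label with its numeric rank (an exact-match lookup)."""
    return [scale[label] for label in labels]

sample = ["D", "A", "A", "R"]   # made-up abundance records
print(to_numeric(sample))       # [5, 4, 4, 1]
```

An unknown label raises a KeyError here, which is roughly the counterpart of an exact-match lookup returning #N/A in a spreadsheet.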

Diversity: Use these files to calculate indices of diversity, e.g. Shannon Entropy.
Diversity
These two files show you how to calculate two indices of diversity; both are in the S4E2e Archive Excel.zip archive.
These files can be used as the basis for a spreadsheet calculator (you can add extra rows as you need), which you can use to help compute the two commonly used indices of diversity.
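As a rough guide to what such a calculator computes, here is a minimal Python sketch. It assumes the two indices are Shannon and Simpson's (both are named elsewhere on this page), uses the natural logarithm for Shannon, and takes Simpson's index in the form 1 - sum(p^2); the spreadsheet files may use a different variant. The abundance values are made up.

```python
import math

def shannon(counts):
    """Shannon index H = -sum(p * ln p), skipping species with zero abundance."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def simpson(counts):
    """Simpson's index in the form D = 1 - sum(p^2)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

abundance = [10, 10, 10, 10]         # hypothetical counts for four species
print(round(shannon(abundance), 3))  # 1.386
print(simpson(abundance))            # 0.75
```

With four equally abundant species the Shannon index equals ln(4), which is a convenient check on any spreadsheet version.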

Flour beetles: Use these data to explore differences between two samples. Use these data for data summary and for graphics (e.g. boxplot, histogram).
Flour beetles
These data show the abundance of flour beetles in samples taken from two different (fictitious) farms. The Excel version shows the data in sample layout, with one column for each sample. There are several R versions, with the data in different layouts:
These data can be used for exploring differences between samples. You can also use them for graphical summary, e.g. bar charts, box-whisker plots, and for data summary (e.g. mean, median, standard error). The R-format data are in several forms so that you can practice carrying out commands on the different types of object.

Freshwater invertebrates (correlation): Use these data for correlation and for graphics (e.g. scatter plot).
Freshwater invertebrates (correlation)
These data show the abundance of a freshwater invertebrate and the water speed at the point of collection. See also the Mayfly (correlation) data, which are very similar.
These data can be used for exploration of correlation as well as some graphical summary (e.g. scatter plot).

Freshwater invertebrates (diversity): Use these data for calculation of diversity index.
Freshwater invertebrates (diversity)
These data give the abundance of some freshwater invertebrates from Goredale Beck in Yorkshire. There is also some taxonomic information for each invertebrate recorded.
You can use these data for looking at diversity. You can also practice transferring the data from Excel into R.

Growth (plant growth): Use these data to look at curvilinear regression, i.e. linear regression with a logarithmic equation. Use the data for graphics (scatter plot and trendline).
Growth (plant growth)
These data show the growth of a plant species in response to different levels of a nutrient.
These data show an interesting relationship. If you plot one variable against the other you'll see that the points "curve". In fact the relationship is a logarithmic one. You can use these data to look at curvilinear regression, in this case logarithmic regression. This is ordinary linear regression but with a logarithmic equation. You can also use the data to look at graphical summary, e.g. a scatter plot with line of best fit (trendline).

Heather species: A 2x2 contingency table showing co-occurrence of 2 species. Use the data for tests of association (Chi squared) and for graphical summary (bar chart).
Heather species
This dataset shows the abundance of two species of heather in Cornwall. The data are in the form of a contingency table. In total 137 quadrats were used and the presence of each species noted. The contingency table shows the frequency of occurrence, thus you have four options:
The data can be used to explore the association between the two species.
You can use the data for tests of association (e.g. the Chi squared test), and since this is a 2x2 contingency table you can apply Yates' correction. You can also use the data for graphical summary (e.g. bar chart, pie chart). You can also use the CSV file to practice transferring data from spreadsheet to R.
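To make the effect of Yates' correction concrete, here is a small Python sketch of the Chi squared statistic for a 2x2 table. It is an illustration only: the function name is hypothetical and the counts are made up, not the heather data.

```python
def chi_squared_2x2(table, yates=True):
    """Chi squared statistic for a 2x2 contingency table.

    With yates=True, 0.5 is subtracted from each |observed - expected|
    before squaring (Yates' continuity correction).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # never let the correction overshoot
            stat += diff ** 2 / expected
    return stat

counts = [[10, 20], [30, 40]]  # hypothetical frequencies
print(round(chi_squared_2x2(counts, yates=False), 3))  # 0.794
print(round(chi_squared_2x2(counts), 3))               # 0.446
```

The corrected statistic is always the smaller of the two, so the correction makes the test more conservative.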

Hoglouse abundance: Use these data for differences between >2 samples (e.g. Kruskal-Wallis test). Use these data for data summary and for graphics (e.g. boxplot or bar chart).
Hoglouse abundance
These data show the abundance of hoglouse (Asellus spp.), a freshwater invertebrate, at three sampling locations.
You can use the data for exploring differences between samples. In the book text you use these data for a Kruskal-Wallis test, which is a non-parametric test of differences between more than two samples. You can also use the data for practice at data summary and graphics (these data are used to draw bar charts in the book text). You could also subset the data and look at comparing just two samples.

Hornbill diet: Use these data for dissimilarity indices (e.g. Jaccard, Sørensen) and for dendrogram.
Hornbill diet
These data show the presence of different fruits in the diet of three species of hornbill from India (data adapted from Datta, A. & Rawat, G.S. 2003. Biotropica 35, p.208). The data are in the form of presence-absence, so if a fruit species was found in the diet a 1 is recorded; if the fruit was absent it is shown as 0.
You can use these data to look at similarity (dissimilarity) and to draw a simple dendrogram to show the relationship between the samples in terms of the presence of fruit species. You can also use the data for diversity (species richness).
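For presence-absence data like these, the Jaccard and Sørensen indices depend only on how many items are shared and how many are unique to each sample. The Python sketch below shows the usual textbook forms; the function names are hypothetical and the two 0/1 vectors are made up, not the hornbill data.

```python
def shared_and_unique(x, y):
    """Return (a, b, c): items in both samples, only in x, only in y."""
    a = sum(1 for i, j in zip(x, y) if i and j)
    b = sum(1 for i, j in zip(x, y) if i and not j)
    c = sum(1 for i, j in zip(x, y) if j and not i)
    return a, b, c

def jaccard(x, y):
    """Jaccard similarity: a / (a + b + c)."""
    a, b, c = shared_and_unique(x, y)
    return a / (a + b + c)

def sorensen(x, y):
    """Sørensen similarity: 2a / (2a + b + c)."""
    a, b, c = shared_and_unique(x, y)
    return 2 * a / (2 * a + b + c)

diet_1 = [1, 1, 0, 1, 0]  # hypothetical presence-absence records
diet_2 = [1, 0, 0, 1, 1]
print(jaccard(diet_1, diet_2))             # 0.5
print(round(sorensen(diet_1, diet_2), 3))  # 0.667
```

The corresponding dissimilarity is 1 minus the similarity. Items absent from both samples are ignored by both indices, and two samples with nothing present at all would cause a division by zero in this sketch.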

Invertebrates and Habitat: Use these data for tests of association (e.g. Chi squared). Use these data for graphical summary (e.g. bar chart, pie chart).
Invertebrates and Habitat
These data show the frequency of observation of some terrestrial invertebrate taxa on different parts of plants. There are two datasets, which are similar. Both give the frequency of observation in the form of a contingency table.
You can use these data for tests of association (e.g. Chi squared) and for data summary (e.g. bar charts, pie charts). Note that the CSV dataset invert.csv is not duplicated in R, which means you can practice importing CSV data into R.

Leaf sizes: Use these data for data summary (e.g. running mean). Use these data for graphical summary (e.g. line plot for running mean).
Leaf sizes
These data show the sizes of tree leaves in millimetres. The main (Excel) dataset gives 10 samples, each of 10 measurements.
You can use these data to look at data summary and especially the calculation of running means and standard error. You can use them for graphical summary too, such as line plots of the running mean. If you want to use the entire dataset in R you'll need to transfer the data from the Excel file, which will give you some practice. You could also explore differences between samples (pairs or several/all at once).
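A running mean can be sketched in a few lines of Python. This assumes that "running mean" here means the cumulative mean recalculated as each measurement is added, which is an interpretation rather than something this page states; the values are made up.

```python
def running_mean(values):
    """Cumulative mean after each successive measurement."""
    means = []
    total = 0.0
    for n, value in enumerate(values, start=1):
        total += value
        means.append(total / n)
    return means

leaf_sizes = [2, 4, 6]           # hypothetical measurements in mm
print(running_mean(leaf_sizes))  # [2.0, 3.0, 4.0]
```

Plotting these values against the number of measurements gives the kind of line plot mentioned above.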

Mayfly (correlation): Use these data for tests of correlation. Use these data for graphical summary (e.g. scatter plot).
Mayfly (correlation)
These data show the abundance of a mayfly species and the speed of the water at the sampling location.
Use these data for looking at correlation between the two variables. You can also look at graphical summary (scatter plot), as well as the data summary (averages, distribution).

Mayfly (regression): Use these data for multiple regression. Use these data for graphical summary (e.g. scatter plot and line of best fit).
Mayfly (regression)
These data give the sizes of a freshwater invertebrate and several environmental variables at the sampling location for each size measurement. The data contain the following variables:
The dataset is in two forms:
You can use these data for regression analysis (multiple regression), and for graphical summary (e.g. scatter plots, perhaps with a trendline). You can also look at the data summary and could conduct simple correlation between pairs of variables. The dataset provides a simple introduction to regression model building.

Mosses and trees: Use these data for analysis of similarity. Use these data for visualizing similarity (e.g. dendrogram).
Mosses and trees
These data show the abundance of some bryophyte species on trees in North Carolina (data adapted from Palmer M.W. 1986. The Bryologist 89, p.59). The data are in a community layout, with the columns giving the sample names (the trees) and the rows being the bryophyte species.
You can use these data to calculate indices of similarity (dissimilarity) and indeed the Excel file contains a second worksheet with a completed Euclidean dissimilarity matrix. You can also use these data for visualizing dissimilarity (e.g. with a dendrogram). You could also use the data to look at diversity indices and for graphical summary (e.g. bar charts, pie charts).
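A Euclidean dissimilarity matrix of the kind mentioned above can be sketched in Python as follows. The function names, sample names and abundances are all hypothetical; they only illustrate the calculation and are not taken from the moss data.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two abundance vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def dissimilarity_matrix(samples):
    """Pairwise Euclidean distances as a nested dict keyed by sample name."""
    names = list(samples)
    return {p: {q: euclidean(samples[p], samples[q]) for q in names} for p in names}

# Hypothetical community data: one abundance vector per sample (tree).
samples = {"tree1": [3, 0, 4], "tree2": [0, 0, 0], "tree3": [3, 0, 0]}
matrix = dissimilarity_matrix(samples)
print(matrix["tree1"]["tree2"])  # 5.0
print(matrix["tree1"]["tree3"])  # 4.0
```

The matrix is symmetric with zeros on the diagonal, so a spreadsheet version usually only needs the lower triangle.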

Newt presence-absence: Use these data for logistic regression and for regression model building.
Newt presence-absence
These data give the presence-absence of great crested newts at ponds in Buckinghamshire, UK. The presence of a newt is recorded with a 1 and the absence by a 0; hence the data are binary. The other columns give various habitat factors such as the area of the pond, an index based on presence of fish, and other factors.
You can use the data for logistic regression, which is a form of generalised linear modelling (GLM). You can carry out logistic regression on single factors or build a regression model with several terms.

Pearson correlation data: Use these data for correlation between two variables. Use the data for data summary and graphics (e.g. scatter plot).
Pearson correlation data
These data show the abundance of a freshwater invertebrate with corresponding flow rate.
Use these data to carry out correlation between the two variables. You can also use the data to look at data summary (including distribution) and for graphical summary (e.g. scatter plot).

Pea genetics: Use these data for Goodness of Fit testing (a form of association test using Chi squared). Use these data for graphical summary (e.g. bar chart).
Pea genetics
These data show the frequency of pea plants exhibiting combinations of coat colour and type.
You can use these data for tests of goodness of fit (a kind of association test), using Chi squared. You can also summarise the data graphically (e.g. bar charts). Note that the R-format data only contain the observed frequencies. If you want to conduct a goodness of fit test in R you will have to incorporate the "expected" ratio data in some manner.

Plant species lists: Use these data for practice at cross tabulation using a Pivot Table in Excel or table() in R. Use these data for species richness (diversity) and similarity.
Plant species lists
These data give vascular plant species names for samples from 10 sites from a survey in Shropshire, UK.
The data provide some practice at cross tabulation, using the Pivot Table in Excel or table() in R. You can rearrange the data to give a presence/absence table, where 1 = presence of a species at a site and 0 = absence (in fact you will need to do that in order to determine species richness). The S4E2e.RData file contains an object called ps, which has already been tabulated. Once the data are tabulated you can use them to explore species richness (a measure of diversity). You can also look at similarity (which you can also plot using a dendrogram).
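The rearrangement from a species list to a presence/absence table, and from there to species richness, can be sketched in Python. This is only an illustration of the idea behind the Pivot Table or table() step; the function names, site names and species labels are all made up.

```python
def presence_absence(records):
    """Cross-tabulate a list of (site, species) records into a 0/1 table.

    Returns a dict keyed by species, each holding a dict keyed by site.
    """
    sites = sorted({site for site, _ in records})
    species = sorted({name for _, name in records})
    seen = set(records)
    return {name: {site: int((site, name) in seen) for site in sites}
            for name in species}

def species_richness(table):
    """Number of species present at each site (column totals of the table)."""
    totals = {}
    for row in table.values():
        for site, present in row.items():
            totals[site] = totals.get(site, 0) + present
    return totals

# Hypothetical recording-layout data: one row per species found at a site.
records = [("site1", "sp_a"), ("site1", "sp_b"), ("site2", "sp_b")]
table = presence_absence(records)
print(table["sp_a"])            # {'site1': 1, 'site2': 0}
print(species_richness(table))  # {'site1': 2, 'site2': 1}
```

Duplicate records for the same species at the same site still count as a single presence, which is what a presence/absence table requires.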

Plant species and watering: Use these data for two-way ANOVA.
Plant species & watering
These data show the growth (in cm) of two plant species in response to three different watering regimes (low, middle and high).
Use these data for two-way analysis of variance (2-way ANOVA). The analysis is straightforward in R but less so in Excel (see the Exercises support page for an online exercise on computing 2-way ANOVA using Excel). You can also use the data for graphical representation of results (e.g. boxplot or bar chart) as well as general data summary.

Plant species abundance: Use these data to explore diversity using Simpson's D index or the Shannon-Wiener Index. Use these data to explore similarity and visualise it with a dendrogram.
Plant species abundance
These data give the abundance of some terrestrial vascular plants at 10 sites in Shropshire, UK. The data are the same as for the Plant species lists dataset but include the abundance information.
You can use these data to explore diversity using Simpson's or the Shannon index. You can also use these data to look at similarity, which you can also plot using a dendrogram.

Ridge and furrow meadow: Use these data to explore differences between two samples (t-test or U-test). Use the data for summary statistics and graphics.
Ridge & Furrow meadow
These data show the abundance of meadow buttercup plants in one metre square quadrats in an ancient ridge & furrow meadow in Buckinghamshire, UK.
You can use these data to explore differences between two samples (t-test or U-test). You can also display the results graphically (e.g. bar chart or box-whisker plot). You can use the data for summary statistics (mean, median, IQR etc.), and to look at data distribution. The data are in different forms in R-format so that you can explore how to use different syntax according to the layout of data you have.

Seashore seaweed: Use these data to replace a text-based ordinal scale with a numerical version. Use the =VLOOKUP function with a lookup table in Excel.
Seashore seaweed
These data show the abundance of some seaweed species on a rocky shore in South Devon, UK. The data are presented using a text-based abundance scale (ACFOR) and give the abundance of five species at 13 transect stations across the shore (the stations are at different heights above mean tide height).
The data are intended to be used as an example of how to use the =VLOOKUP function in Excel. You use this to replace one item with another. In this case the text-based ordinal scale is replaced by a numerical scale.

Sward height: Use these data to explore differences between multiple samples (e.g. Kruskal-Wallis or ANOVA). Use the data to look at summary statistics and graphics.
Sward height
These data give the height of vegetation (cm) in the sward at three sampling locations in a meadow in Shropshire, UK.
You can use these data for exploring differences between more than two samples, e.g. the Kruskal-Wallis test (non-parametric) or analysis of variance (ANOVA, parametric). You can also use the data to look at data summaries (e.g. median, mean) and to present results graphically (e.g. boxplot of results or histogram of sample distribution).

Tree sizes and growth: Use these data to draw a line plot of growth over time.
Tree sizes and month
These data show the size of Sitka spruce trees measured at monthly intervals. The size is determined as the height multiplied by the diameter squared, on a log scale (i.e. log(h × d²)). The data are modified from an original (larger) dataset from within R.
Use these data to draw a line plot of spruce growth.

Whitefly: Use these data for a matched pairs test (e.g. Wilcoxon matched pairs).
Whitefly
These data show the counts of whitefly (a greenhouse pest) attracted to different coloured sticky traps. Each trap is bicoloured so the data are in the form of matched pairs.
You can use these data to carry out a matched pairs test (either a paired t-test or the paired form of the U-test, known as the Wilcoxon matched pairs test). You can also summarise the results graphically.

Wilcoxon online exercise: Use these data for Wilcoxon matched pairs test.
Wilcoxon online exercise
These data are in the form of matched pairs (they are fictitious data).
You can use the data for Wilcoxon matched pairs analysis, as well as graphical summary. See the Exercises support page where there is an online exercise in using Excel to carry out a matched pairs test.



See also: 
Keywords
Here is a list of keywords: it is by no means complete! T-test, U-test, Kruskal-Wallis, Analysis of Variance, Spearman Rank, Correlation, Regression, Logistic Regression, Curved linear regression, histogram, scatter plot, bar chart, box-whisker plot, pie chart, Mean, Median, Mode, Standard Deviation, Standard Error, Range, Max, Min, Interquartile Range, IQR
