Dr. Mark Gardener

 

Statistics for Ecologists Using R and Excel (Edition 2)

Data Collection, Exploration, Analysis and Presentation

by: Mark Gardener

Available now from Pelagic Publishing

Welcome to the support pages for Statistics for Ecologists. These pages provide information and support material for the book. You should be able to find an outline and table of contents as well as support datafiles and additional material.

Datasets:
S4E2e.RData
S4E2e Archive Excel.zip

Data files

There are a number of data files associated with the book. I've tried to ensure that all the data mentioned and illustrated in the text are available for you to download. Many of the datasets are used in Have a Go exercises; you can get the data and then follow along with the exercises. On this page you can find some details about each dataset and of course download the files.

  • Archive files - there are two files, one for Spreadsheet files and one for R-format data.
  • Data resources listed by topic - this gives you an idea of what sorts of things you can do with each data resource.
  • List of data resources - this section lists all the data files (more or less in alphabetical order) and provides some information about each. There are some ideas about what you might "do" with each dataset.

See also the Exercises support page, where there are additional notes and exercises. You'll find more datasets with these exercises and links to the files as you need.


Data resources listed by topic

Click a name to go to a description of the data. The data are contained in two archives, one for the R data, S4E2e.RData and one for spreadsheet data, S4E2e Archive Excel.zip. Note that some items may be listed in more than one section. Graphics are not mentioned as a topic because all the data can be shown graphically one way or another! Similarly many of the data can be used for practice at manipulating data in Excel and/or R in some way (see Miscellaneous).

Summary Statistics | Data Distribution | Differences (2 samples) | Correlations (2 variables)
Associations | Differences (>2 samples) | Regression | Diversity | Similarity | Miscellaneous

Archives - Instructions (& download)


  • Summary Statistics: Mean, median; Standard deviation, IQR
  • Data Distribution: Histogram; Tally plot
  • Differences (2 samples): t-test; U-test; Wilcoxon matched pairs
  • Correlations (2 variables): Spearman's Rank; Pearson's Product Moment; Curvilinear correlation
  • Associations: Chi Squared; Yates' correction; Goodness of Fit
  • Differences (>2 samples): Kruskal-Wallis; ANOVA
  • Regression: Linear regression; Curvilinear regression; Logistic regression
  • Diversity: Species richness; Diversity Index
  • Similarity: Jaccard; Sørensen; Bray-Curtis; Euclidean metric
  • Miscellaneous: Rearranging data; Managing data; Pivot Tables; Lookup tables

Archives

S4E2e.RData

S4E2e Archive Excel.zip

You can use the RData file in several ways:

  • Open R then use load(file.choose()) and select the S4E2e.RData file (in Linux you should use the filename (in quotes) explicitly, including the path).
  • Double-click the file. If R is already open, the data are added to your workspace. If R is not open, it will start and the workspace will contain only these data (the working directory will be set to wherever the RData file was stored).
  • Drag the file to the R icon. The behaviour is the same as above.
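
As a rough sketch of the first option, the lines below save a small made-up object to a temporary .RData file and then load it by explicit path. The file name demo.RData and the object bd_demo are invented for illustration; with the real archive you would give the path to S4E2e.RData instead.

```r
# Build a small stand-in .RData file (made-up object, not the book's data)
demo_file <- file.path(tempdir(), "demo.RData")
bd_demo <- c(10.2, 11.5, 9.8, 12.1)
save(bd_demo, file = demo_file)
rm(bd_demo)

# Load by explicit path; load() returns the names of the objects it restored
loaded <- load(demo_file)
print(loaded)

# Optionally load into a separate environment so nothing in the
# current workspace is overwritten
e <- new.env()
load(demo_file, envir = e)
ls(e)
```

Loading into a separate environment is only a precaution; it can help if you already have objects with the same names in your workspace.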
 
 

List of data resources

Arranged more or less alphabetically

Ant species and fire regime

Use for dissimilarity and diversity calculations

Ants and fire

These data are adapted from Hoffmann, B.D. 2003. Austral Ecol. 28, p.182 and show the abundance of 91 species of ant in 10 samples. The samples are from two types of soil (red and black) and from 5 fire regimes. The data are arranged with the samples as columns; the column names indicate the soil and regime as follows:

  • r = red soil
  • b = black soil
  • E2 = burnt every 3yr with grazing early (May)
  • E3 = burnt, spelled & burnt in 2 successive yr
  • L2 = burnt every 3yr with late grazing (Oct)
  • L3 = burnt, spelled, burnt in 2 successive yr
  • U = unburnt control

The data are used for dissimilarity calculations (including visualisation of dissimilarity with a dendrogram) but you can also use them to explore diversity. The data are in the S4E2e.RData archive and are named ant.
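
A minimal sketch of a dissimilarity calculation and dendrogram, assuming data with samples as columns. The small matrix here is made up (it is not the ant object, and the column names are only illustrative), and the Euclidean metric is just one possible choice.

```r
# Made-up abundances: rows are species, columns are samples
comm <- matrix(c(5, 0, 3, 1,
                 2, 4, 0, 0,
                 0, 6, 1, 2),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("sp1", "sp2", "sp3"),
                               c("rE2", "rU", "bE2", "bU")))

# dist() compares rows, so transpose to put the samples in rows
d <- dist(t(comm), method = "euclidean")
d

# Hierarchical clustering, then draw the dendrogram
hc <- hclust(d, method = "complete")
plot(hc, main = "Sample dissimilarity", xlab = "", sub = "")
```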


Beetle sizes

Use for data summary or differences tests

Beetle sizes

These data give the sizes (in mm) of a species of water beetle. The main sample can be used for data summary; a second sample is available (in R) to use for comparisons.

  • The beetles.xls file in the S4E2e Archive Excel.zip archive contains the main sample as well as examples of histograms.
  • The S4E2e.RData file contains the main sample as an object called bd; there is a second sample called Mar (there are also copies called sunny and shady).

The data can be used for data summary, such as mean, median, standard deviation and so on. You can also practice drawing histograms. The two samples can also be compared with the t-test or U-test.
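
The kind of summary and comparison described above might look like the following sketch. The two vectors are made-up sizes, not the bd and Mar objects from the archive.

```r
# Made-up beetle sizes in mm for two samples
s1 <- c(17, 14, 16, 15, 18, 15, 16, 17)
s2 <- c(12, 14, 13, 15, 11, 13, 14, 12)

# Data summary for one sample
c(mean = mean(s1), median = median(s1), sd = sd(s1))

# Histogram of one sample
hist(s1, main = "Beetle size", xlab = "Size (mm)")

# Compare the two samples: Welch t-test (R's default) and the U-test
tt <- t.test(s1, s2)
ut <- wilcox.test(s1, s2, exact = FALSE)  # exact = FALSE because of tied values
tt$p.value
ut$p.value
```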


Beach hoppers

Use data for logistic regression

Beach hoppers

These data show the allele frequencies at the mannose-6-phosphate isomerase (Mpi) locus in the amphipod crustacean Megalorchestia californiana, Californian beach hopper. Data from McDonald, J.H. 1985 (Heredity 54: 359–366).

  • Beach hopper allele.csv - this file is in the S4E2e Archive Excel.zip archive and will open in a spreadsheet. You can use this for practice at importing data into R but see below...
  • The S4E2e.RData file contains the data as an object called cbh.

These data are used to demonstrate logistic regression. Each row of the data gives the latitude and the number of specimens that had each form of the allele. There are two forms, so the data are binary, which is why a logistic regression is the appropriate method of analysis. Logistic regression is a form of generalized linear modelling (GLM).
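
When each row holds counts of the two forms, one common way to fit the model in R is with a two-column response. This is only a sketch with made-up counts; the column names form1 and form2 are hypothetical and may not match those in cbh.

```r
# Made-up counts of two allele forms at five latitudes
hop <- data.frame(
  latitude = c(34, 37, 40, 43, 46),
  form1    = c(10, 14, 20, 26, 31),
  form2    = c(30, 26, 20, 14,  9)
)

# Two-column response: counts of each form in each row
mod <- glm(cbind(form1, form2) ~ latitude, data = hop, family = binomial)
summary(mod)

# Fitted proportion of form1 at each latitude
fitted(mod)
```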


Butterflies and Year

Use data for graphical summaries: bar chart, column chart, pie chart, line chart.

Use the data for diversity index: Simpson's or Shannon-Wiener.

Explore similarity between years.

Butterflies and Year

These data show the abundance (as a count) of six butterfly species over five years at a site in Scotland. The data are arranged with the columns giving the year of the sample; each row gives the abundance of a species.

  • Butterfly table - the data are in the S4E2e Archive Excel.zip archive as an XLSX file and a CSV. You can read the CSV file into R, in which case you need to add check.names = FALSE to the read.csv() command.
  • The S4E2e.RData file contains the data as two objects: bf is a matrix and butterfly is a data.frame.
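
The reason for check.names = FALSE is that column headings that are numbers (such as years) are otherwise altered to make valid R names. A small sketch with a made-up CSV file (the species labels and years here are invented) shows the difference:

```r
# Write a tiny made-up CSV whose column headings are years
csv_file <- file.path(tempdir(), "years_demo.csv")
writeLines(c("Species,1996,1997,1998",
             "sp.A,12,15,9",
             "sp.B,3,0,7"), csv_file)

# Default: headings are converted to valid R names (an X is prefixed)
names(read.csv(csv_file))

# check.names = FALSE keeps the headings exactly as written
bf_demo <- read.csv(csv_file, row.names = 1, check.names = FALSE)
names(bf_demo)
```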

You can use the data for graphical summary, showing line plots of abundance and time for example, as well as bar charts and pie charts. You can also look at the diversity of the samples as a bit of practice with diversity indices, such as Shannon and Simpson's. You could also explore similarity between years.

Although these are perhaps not the most sensible types of analysis, you might also use the data for comparison of differences or changes with time (as a correlation).


Butterflies and Habitat

Use data for Pivot Table practice.

Use the data for summary, graphs and for exploring differences between samples.

Butterflies & Habitat

These data show the abundance of butterflies in three habitats. Each habitat was sampled several times. The datafile has three columns, for the abundance, habitat and an index variable (the replicate).

These data are used as an example of rearranging and managing data using a Pivot Table in Excel. You can also use the data to look at data summary, graphics and differences (there are 3 samples). To do that using R you'll need to save a copy as a CSV file then import into R.


Butterfly Food

Use the data for regression (multiple regression)

Use the data for graphical summary (scatter plot)

Butterfly food

These data show the abundance of butterflies and the availability of food plants and nectar resources.

These data can be used to look at (multiple) regression (or correlation), and associated statistics (such as beta coefficients). You can also use the data for some graphical summary, such as scatter plots.


Birds and Habitat

Use the data for association analysis (Chi Squared test)

Use the data for graphics such as bar chart (column chart) or pie chart

Use birds.xlsx for practice with Pivot Tables

Birds & Habitat

These data show the abundance of some common UK bird species in various habitats.

  • birds.xlsx - the data are in the S4E2e Archive Excel.zip archive as an Excel format file. These data are in recording format and you can use these as practice at using a Pivot Table.
  • bird.csv - this file is also in the archive and holds the data in the form of a contingency table.
  • bird.xlsx - this datafile is in Excel format and shows the data in a contingency table. There is also a completed association analysis in another worksheet.
  • The S4E2e.RData file contains the data (contingency table layout) as two objects: birds is a matrix and bird is a data.frame.

The main purpose of these data is to look at tests of association (the Chi squared test). You can also use them for graphical summary, using bar charts and pie charts. The birds.xlsx file can also be used for translating data from recording layout into a contingency table (in Excel using a Pivot table, in R you can use the xtabs() command).
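
As a sketch of the R route, xtabs() can turn recording-layout data into a contingency table, which can then go to chisq.test(). The records below are made up; the species, habitats and column names are illustrative and are not taken from birds.xlsx.

```r
# Made-up records in recording layout: one row per species/habitat combination
rec <- data.frame(
  species = rep(c("Blackbird", "Robin", "Wren"), each = 2),
  habitat = rep(c("Garden", "Woodland"), times = 3),
  count   = c(40, 10, 15, 25, 5, 30)
)

# Cross-tabulate the counts into a contingency table
ct <- xtabs(count ~ species + habitat, data = rec)
ct

# Chi squared test of association on the table
res <- chisq.test(ct)
res$expected
res$p.value
```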


Bluebell abundance

Use the data for regression using a polynomial model

Use the data for graphics (scatter plot and trendline)

Bluebell abundance

These data show the abundance of bluebell in a wood in England. Data are presented showing the abundance of the plant and the light intensity at the growing site.

These data show an interesting relationship between abundance and light, an inverted U shape. This lends itself to a regression using a polynomial equation. You can also use the data for graphical summary, such as a scatter plot and line of best fit (a trendline).
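
One way to fit an inverted U shape in R is a quadratic model using lm(). The values below are made up to give a humped pattern, and the variable names light and abund are assumptions for illustration.

```r
# Made-up values with a humped (inverted U) pattern
light <- c(2, 5, 8, 11, 14, 17, 20)
abund <- c(3, 9, 14, 16, 13, 8, 2)

# Quadratic model: I() protects the ^ operator inside a formula
poly_mod <- lm(abund ~ light + I(light^2))
coef(poly_mod)

# Scatter plot with the fitted curve added
plot(light, abund)
xs <- seq(min(light), max(light), length.out = 50)
lines(xs, predict(poly_mod, newdata = data.frame(light = xs)))
```

A negative coefficient for the squared term corresponds to the inverted U shape.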


Dominance/Abundance scales

This file is used as part of an exercise in using lookup tables =VLOOKUP in Excel

Dominance/Abundance scales

This datafile is used in an exercise on using lookup tables. It is useful to be able to convert from an ordinal scale that uses text values to an ordinal scale with "real" numbers.

Use this to help practice using the =LOOKUP function in Excel. This allows you to replace one value with another. In this case an abundance as a text label (D = dominant, A = abundant etc.) can be replaced with a numerical value. This allows you to carry out non-parametric statistics (e.g. the U-test). You are essentially replacing a text-based ordinal scale with a number-based ordinal scale.
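
The exercise itself uses Excel, but the same replacement can be sketched in R with a named vector acting as the lookup table. The letters beyond D and A, and all of the numeric codes, are assumed here purely for illustration.

```r
# Hypothetical numeric codes for a text-based scale (the values are assumed)
lookup <- c(D = 5, A = 4, F = 3, O = 2, R = 1)

# Made-up text abundances
abund_txt <- c("A", "D", "O", "R", "F", "A")

# Index the named vector by the text labels to get the numeric scale
abund_num <- unname(lookup[abund_txt])
abund_num
```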


Diversity

Use these files to calculate indices of diversity e.g.

Shannon Entropy
Simpson's D index

Diversity

These two files show you how to calculate two indices of diversity; both are in the S4E2e Archive Excel.zip archive.

  • Diversity Simpson D.xls - as the name suggests, this calculates the Simpson's D index of diversity.
  • Diversity Shannon.xls - this spreadsheet computes the Shannon index (also called Shannon-Wiener or Shannon-Weaver).

These files can be used as the basis for a spreadsheet calculator (you can add extra rows as you need), which you can use to help compute the two commonly used indices of diversity.
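
For comparison with the spreadsheets, here is a sketch of the two indices in R using made-up abundances. It assumes the common forms, Simpson's D as 1 minus the sum of squared proportions and Shannon with natural logarithms; the spreadsheets may use a different variant.

```r
# Made-up abundances for one sample
n <- c(20, 10, 5, 5)

# Proportion of each species (drop zeros so log() is defined)
p <- n[n > 0] / sum(n)

simpson_D <- 1 - sum(p^2)       # Simpson's D as 1 - sum of squared proportions
shannon_H <- -sum(p * log(p))   # Shannon index using natural logs
c(simpson_D = simpson_D, shannon_H = shannon_H)
```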


Flour beetles

Use these data to explore differences between two samples

Use these data for data summary and for graphics (e.g. boxplot, histogram)

Flour beetles

These data show the abundance of flour beetles in samples taken from two different (fictitious) farms. The Excel version shows the data in sample layout, with one column for each sample. There are several R versions, with the data in different layouts:

  • flour beetles.xls - the data are in the S4E2e Archive Excel.zip archive. One column shows the counts of beetles from Woad Farm, the other column shows the counts from Glebe Farm.
  • The S4E2e.RData file contains the data as four objects:
    • flour1 - a data.frame with a column qty and a column site, i.e. in recording layout
    • flour2 - a data.frame with two columns, Woad.Fm and Glebe.Fm, each containing the counts from a separate farm
    • Woad.Fm - a vector of values representing the counts of beetles at Woad farm
    • Glebe.Fm - a vector of values representing counts of beetles at Glebe farm

These data can be used for exploring differences between samples. You can also use them for graphical summary, e.g. bar charts, box-whisker plots, and for data summary (e.g. mean, median, standard error). The R-format data are in several forms so that you can practice carrying out commands on the different types of object.
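
To illustrate how the layout affects the command you type, this sketch converts made-up counts from sample layout to recording layout with stack() and runs the same U-test both ways. The values are invented; only the column names follow the description above.

```r
# Made-up counts in sample layout: one column per farm
sample_layout <- data.frame(Woad.Fm = c(8, 11, 9, 12),
                            Glebe.Fm = c(5, 6, 4, 7))

# stack() gives recording layout: a column of values and a grouping column
rec_layout <- stack(sample_layout)
names(rec_layout) <- c("qty", "site")

# The same U-test written for each layout
u_sample <- wilcox.test(sample_layout$Woad.Fm, sample_layout$Glebe.Fm)
u_record <- wilcox.test(qty ~ site, data = rec_layout)
u_sample$p.value
u_record$p.value
```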


Freshwater invertebrates (correlation)

Use these data for correlation and for graphics (e.g. scatter plot)

Freshwater invertebrates (correlation)

These data show the abundance of a freshwater invertebrate and the water speed at the point of collection. See also the Mayfly (correlation) data, which are very similar.

  • freshwater correlation.xlsx - the data are in the S4E2e Archive Excel.zip archive. One column shows the Abundance and the other the Speed.
  • The S4E2e.RData file contains the data as a data.frame called fw.

These data can be used for exploration of correlation as well as some graphical summary (e.g. scatter plot).


Freshwater invertebrates (diversity)

Use these data for calculation of diversity index

Freshwater invertebrates (diversity)

These data give the abundance of some freshwater invertebrates from Goredale Beck in Yorkshire. There is also some taxonomic information for each invertebrate recorded.

  • Freshwater invertebrates.xlsx - the data are in the S4E2e Archive Excel.zip archive. There is a column for the count of each taxon. Other columns give the taxonomic information (e.g. phylum, order).

You can use these data for looking at diversity. You can also practice transferring the data from Excel into R.


Growth (plant growth)

Use these data to look at curvilinear regression, i.e. linear regression with a logarithmic equation

Use the data for graphics (scatter plot and trendline)

Growth (plant growth)

These data show the growth of a plant species in response to different levels of a nutrient.

  • Growth Logarithmic.xlsx - the data are in the S4E2e Archive Excel.zip archive. There are two columns Growth and Nutrient.
  • The S4E2e.RData file contains the data as a data.frame called pg.

These data show an interesting relationship. If you plot one variable against the other you'll see that the points "curve". In fact the relationship is a logarithmic one. You can use these data to look at curvilinear regression, in this case logarithmic regression. This is ordinary linear regression but with a logarithmic equation.

You can also use the data to look at graphical summary, e.g. a scatter plot with line of best fit (trendline).


Heather species

A 2x2 contingency table showing co-occurrence of 2 species

Use the data for tests of association (Chi squared) and for graphical summary (bar chart)

Heather species

This dataset shows the abundance of two species of heather in Cornwall. The data are in the form of a contingency table. In total 137 quadrats were used and the presence of each species noted. The contingency table shows the frequency of occurrence, thus you have four options:

  • Both species present together in a quadrat
  • Calluna vulgaris present only
  • Erica cinerea present only
  • Neither species present (i.e. both absent)

The data can be used to explore the association between the two species.

  • heather.csv - the data are in the S4E2e Archive Excel.zip archive and will open in a spreadsheet.
  • The S4E2e.RData file contains the data as a data.frame called heather.

You can use the data for tests of association (e.g. the Chi squared test), and since this is a 2x2 contingency you can apply Yates' correction. You can also use the data for graphical summary (e.g. bar chart, pie chart). You can also use the CSV file to practice transferring data from spreadsheet to R.


Hoglouse abundance

Use these data for differences between >2 samples (e.g. Kruskal-Wallis test)

Use these data for data summary and for graphics (e.g. boxplot or bar chart)

Hoglouse abundance

These data show the abundance of hoglouse (Asellus spp), a freshwater invertebrate, at three sampling locations.

  • Hoglouse.xlsx - the data are in the S4E2e Archive Excel.zip archive. These data are in sample format; there is a column of abundance data for each of the three sampling locations. The spreadsheet also contains a second worksheet giving the summary statistics and a bar chart with error bars.
  • The S4E2e.RData file contains the data as two data.frame objects:
    • hog2 - gives the data in recording layout; there is a column for count and a column for site.
    • hog3 - gives the data in sample layout; there is a column for each sample.

You can use the data for exploring differences between samples. In the book text you use these data for a Kruskal-Wallis test, which is a non-parametric test of differences between more than two samples. You can also use the data for practice at data summary and graphics (these data are used to draw bar charts in the book text). You could also subset the data and look at comparing just two samples.
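
A sketch of a Kruskal-Wallis test on recording-layout data follows. The counts and site labels are made up and are not the hog2 data.

```r
# Made-up counts in recording layout: a count column and a site column
hog_demo <- data.frame(
  count = c(3, 4, 5, 9, 8, 11, 15, 12, 14),
  site  = factor(rep(c("site1", "site2", "site3"), each = 3))
)

# Kruskal-Wallis test of differences among the three sites
kw <- kruskal.test(count ~ site, data = hog_demo)
kw

# Boxplot of the three samples
boxplot(count ~ site, data = hog_demo)
```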


Hornbill diet

Use these data for dissimilarity indices (e.g. Jaccard, Sørensen) and for dendrogram

Hornbill diet

These data show the presence of different fruits in the diet of three species of hornbill from India (data adapted from Datta, A. & Rawat, G.S. 2003. Biotropica 35, p.208). The data are in the form of presence-absence, so if a fruit species was found in the diet a 1 is recorded, if the fruit was absent it is shown as 0.

  • hornbill.csv - the data are in the S4E2e Archive Excel.zip archive and will open in a spreadsheet. There is a column for the species names of the fruit (as an abbreviated scientific name). Each of the next columns shows the presence-absence of these fruits in the diet of three species of hornbill:
    • GH = great hornbill
    • WH = wreathed hornbill
    • OPH = oriental pied hornbill
  • The S4E2e.RData file contains the data as a data.frame, hornbill. The rownames have been set as the fruit species and the main data are the three columns of fruit presence-absence for the hornbill species.

You can use these data to look at similarity (dissimilarity) and to draw a simple dendrogram to show the relationship between the samples in terms of the presence of fruit species. You can also use the data for diversity (species richness).
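
For presence-absence data, one option in base R is dist() with method = "binary", which gives the Jaccard dissimilarity (1 minus the Jaccard similarity). The sketch below uses a made-up table; only the column names follow the hornbill abbreviations.

```r
# Made-up presence-absence data: rows are fruit species, columns are hornbills
pa <- data.frame(GH  = c(1, 1, 0, 1, 0),
                 WH  = c(1, 1, 1, 0, 0),
                 OPH = c(0, 1, 0, 0, 1),
                 row.names = paste0("fruit", 1:5))

# dist() compares rows, so transpose to compare the hornbill species
jac <- dist(t(pa), method = "binary")
jac

# Species richness per hornbill is the column total
colSums(pa)

# Simple dendrogram of the three samples
plot(hclust(jac))
```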


Invertebrates and Habitat

Use these data for tests of association (e.g. Chi squared)

Use these data for graphical summary (e.g. bar chart, pie chart)

Invertebrates and Habitat

These data show the frequency of observation of some terrestrial invertebrate taxa on different parts of plants. There are two datasets, which are similar. Both give the frequency of observation in the form of a contingency table.

  • invert habitat.csv - the data are in the S4E2e Archive Excel.zip archive and will open in a spreadsheet. The data are in the form of a contingency table. The first column gives five invertebrate taxa and there are three sites (Upper, Lower, Stem).
  • invert.csv - the data are in the S4E2e Archive Excel.zip archive and will open in a spreadsheet. The data are in the form of a contingency table. The first column gives the names of four sites (Upper leaf, Lower leaf, Stem, Bud) and the subsequent columns show frequencies for three invertebrate taxa.
  • The S4E2e.RData file contains the data from invert habitat.csv as a data.frame called inv.hab.

You can use these data for tests of association (e.g. Chi squared) and for data summary (e.g. bar charts, pie charts). Note that the CSV dataset invert.csv is not duplicated in R, which means you can practice importing CSV data into R.


Leaf sizes

Use these data for data summary (e.g. running mean)

Use these data for graphical summary (e.g. line plot for running mean)

Leaf sizes

These data show the sizes of tree leaves in millimetres. The main (Excel) dataset gives 10 samples, each of 10 measurements.

  • leaves.xlsx - the data are in the S4E2e Archive Excel.zip archive. There are 100 measurements, separated into 10 samples of 10 readings.
  • The S4E2e.RData file contains a vector object called lf, which gives just one of the samples from the XLSX file (the 2nd).

You can use these data to look at data summary and especially the calculation of running means and standard error. You can use them for graphical summary too, such as line plots and the running mean. If you want to use the entire dataset in R you'll need to transfer the data from the Excel file, which will give you some practice.
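
A running mean can be sketched in R as the cumulative sum divided by the number of values so far. The measurements here are made up and are not the lf object.

```r
# Made-up leaf sizes in mm
leaf <- c(30, 34, 32, 36, 33, 35, 31, 37, 34, 38)

# Running mean: cumulative sum divided by the number of values so far
run_mean <- cumsum(leaf) / seq_along(leaf)
run_mean

# Standard error of the full sample
se <- sd(leaf) / sqrt(length(leaf))
se

# Line plot of the running mean
plot(run_mean, type = "b",
     xlab = "Number of measurements", ylab = "Running mean (mm)")
```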

You could also explore differences between samples (pairs or several/all at once).


Mayfly (correlation)

Use these data for tests of correlation

Use these data for graphical summary (e.g. scatter plot)

Mayfly (correlation)

These data show the abundance of a mayfly species and the speed of the water at the sampling location.

  • mayfly.csv - the data are in the S4E2e Archive Excel.zip archive. There is a column for Speed and one for the corresponding Abund.
  • The S4E2e.RData file contains a data.frame object called mayfly, which contains two columns.

Use these data for looking at correlation between the two variables. You can also look at graphical summary (scatter plot), as well as the data summary (averages, distribution).


Mayfly (regression)

Use these data for multiple regression

Use these data for graphical summary (e.g. scatter plot and line of best fit)

Mayfly (regression)

These data give the sizes of a freshwater invertebrate and several environmental variables at the sampling location for each size measurement.

The data contain the following variables:

  • Length = the length of the invertebrate in mm
  • Speed = the water speed (time taken for a hydroprop to complete)
  • Algae = the percentage cover of algae on the substrate
  • NO3 = the concentration of nitrates
  • BOD = the biological oxygen demand

The dataset is in two forms:

  • mayfly regression.csv - the data are in the S4E2e Archive Excel.zip archive. The file contains five data columns plus an extra "index" column giving a simple number.
  • The S4E2e.RData file contains a data.frame called mf, which has five columns corresponding to the items listed earlier.

You can use these data for regression analysis (multiple regression), and for graphical summary (e.g. scatter plots, perhaps with a trendline). You can also look at the data summary and could conduct simple correlation between pairs of variables. The dataset provides a simple introduction to regression model building.


Mosses and trees

Use these data for analysis of similarity

Use these data for visualizing similarity (e.g. dendrogram)

Mosses and trees

These data show the abundance of some bryophyte species on trees in North Carolina (data adapted from Palmer M.W. 1986. The Bryologist 89, p.59). The data are in a community layout, with the columns giving the sample names (the trees) and the rows being the bryophyte species.

  • Moss data.xls - the data are in the S4E2e Archive Excel.zip archive. The main worksheet gives the species abundance information and the second worksheet shows a completed dissimilarity matrix (Euclidean metric).
  • The S4E2e.RData file contains a data.frame called mosess, which has the data with rows as sites and columns as species (which is transposed from the layout in the Excel file).

You can use these data to calculate indices of similarity (dissimilarity) and indeed the Excel file contains a second worksheet with a completed Euclidean dissimilarity matrix. You can also use these data for visualizing dissimilarity (e.g. with a dendrogram).

You could also use the data to look at diversity indices and for graphical summary (e.g. bar charts, pie charts).


Newt presence-absence

Use these data for logistic regression and for regression model building

Newt presence-absence

These data give the presence-absence of great crested newts at ponds in Buckinghamshire, UK. The presence of a newt is recorded with a 1 and the absence by a 0; hence the data are binary. The other columns give various habitat factors such as the area of the pond, an index based on presence of fish, and other factors.

  • Newt HSI.csv - the data are in the S4E2e Archive Excel.zip archive. There are several columns in the file:
    • presence = the presence or absence of newts (1 or 0).
    • area = the area of the pond in square metres.
    • dry = an index of how often the pond dries (1 = never, 2 = rarely, 3 = occasional, 4 = annually).
    • water = an index of water quality (1 = bad, 2 = poor, 3 = moderate, 4 = good).
    • shade = a value for the % shade (from trees and so on).
    • bird = an index for the presence of waterfowl (1 = absent, 2 = minor, 3 = major).
    • fish = an index for the presence of fish (1 = major, 2 = minor, 3 = possible, 4 = absent).
    • other ponds = the number of other ponds within 1 km.
    • land = an index of land use quality (for newts: 1 = bad, 2 = poor, 3 = moderate, 4 = good).
    • macro = the % cover of macrophytes.
    • HSI = the overall Habitat Suitability Index (a measure of "how suitable" a pond might be as a habitat for newts).
  • The S4E2e.RData file contains a data.frame called gcn, which contains the data (there are 200 rows).

You can use the data for logistic regression, which is a form of generalised linear modelling (GLM). You can carry out logistic regression on single factors or build a regression model with several terms.


Pearson correlation data

Use these data for correlation between two variables

Use the data for data summary and graphics (e.g. scatter plot)

Pearson correlation data

These data show the abundance of a freshwater invertebrate with corresponding flow rate.

  • Pearson.xlsx - the data are in the S4E2e Archive Excel.zip archive. There are two columns: abund and flow.
  • The S4E2e.RData file contains a data.frame called pearson, which contains the data.

Use these data to carry out correlation between the two variables. You can also use the data to look at data summary (including distribution) and for graphical summary (e.g. scatter plot).


Pea genetics

Use these data for Goodness of Fit testing (a form of association test using Chi squared)

Use these data for graphical summary (e.g. bar chart)

Pea genetics

These data show the frequency of pea plants exhibiting combinations of coat colour and type.

  • peas.csv - the data are in the S4E2e Archive Excel.zip archive. These data are arranged with columns like so:
    • Colour = the colour of the pea (green or yellow)
    • Coat = the type of coat (wrinkled or smooth)
    • Obs = the number of peas for each combination of colour and coat
    • Ratio = the expected ratio of observations based on genetic theory
  • The S4E2e.RData file contains a simple vector called peas. This vector gives the frequency of observation for the various combinations of coat and colour.

You can use these data for tests of goodness of fit (a kind of association test), using Chi squared. You can also summarise the data graphically (e.g. bar charts).

Note that the R-format data contain only the observed frequencies. If you want to conduct a goodness of fit test in R you will have to incorporate the "expected" ratio data in some manner.
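
One possible way to supply the expected ratio is through the p argument of chisq.test(). In this sketch the observed counts are made up and the 9:3:3:1 ratio is an assumption for illustration; take the real ratio from the Ratio column of peas.csv.

```r
# Made-up observed counts for four colour/coat combinations
obs <- c(90, 30, 28, 12)

# Assumed expected ratio (illustrative only; check the Ratio column in peas.csv)
ratio <- c(9, 3, 3, 1)

# rescale.p = TRUE converts the ratio to proportions that sum to 1
gof <- chisq.test(obs, p = ratio, rescale.p = TRUE)
gof$expected
gof$p.value
```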


Plant species lists

Use these data for practice at cross tabulation using Pivot table in Excel or table() in R

Use these data for species richness (diversity) and similarity

Plant species lists

These data give vascular plant species names for samples from 10 sites from a survey in Shropshire, UK.

  • Plant species lists.csv - the data are in the S4E2e Archive Excel.zip archive. The first column gives the site name as a simple abbreviation (there are 10). The second column gives the scientific name of the species. There are 187 observations in total.
  • The S4E2e.RData file contains a data.frame called plrich. This gives the Site and Species names in two columns.

The data provide some practice at cross tabulation, using the Pivot Table in Excel or table() in R. You can re-arrange the data to give a presence/absence table, where 1 = presence of a species at a site and 0 is absence (in fact you will need to do that in order to determine species richness). The S4E2e.RData file contains an object called ps, which has already been tabulated.
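
A sketch of the cross tabulation in R, using a made-up species list with the same two column names as plrich (the site labels and records are invented):

```r
# Made-up species list in recording layout
pl <- data.frame(
  Site    = c("S1", "S1", "S1", "S2", "S2", "S3"),
  Species = c("Bellis perennis", "Poa annua", "Urtica dioica",
              "Poa annua", "Urtica dioica", "Poa annua")
)

# Cross-tabulate: rows are species, columns are sites
# Each cell is 1 (present) or 0 (absent), given one record per species per site
pa_tab <- table(pl$Species, pl$Site)
pa_tab

# Species richness is the number of species present at each site
richness <- colSums(pa_tab > 0)
richness
```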

Once the data are tabulated you can use them to explore species richness (a measure of diversity). You can also look at similarity (which you can also plot using a dendrogram).


Plant species and watering

Use these data for two-way ANOVA

Plant species & watering

These data show the growth (in cm) of two plant species in response to three different watering regimes (low, high and middle).

  • Two way online.xlsx - the data are in the S4E2e Archive Excel.zip archive. The data are in a particular layout that allows Excel to calculate ANOVA. The data are arranged in an "on the ground" layout with samples in separate blocks. There is a separate worksheet containing a completed 2-way ANOVA (see the Exercises support page).
  • The S4E2e.RData file contains a data.frame called pw. This gives the data in recording layout; the response variable is height, and the two predictor variables are plant and water.

Use these data for two-way analysis of variance (2-way ANOVA). The analysis is straightforward in R but less so in Excel (see the Exercises support page for an online exercise on computing 2-way ANOVA using Excel). You can also use the data for graphical representation of results (e.g. boxplot or bar chart) as well as general data summary.
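
To give an idea of how short the R version is, here is a sketch of a two-way ANOVA with interaction on made-up heights in recording layout. The variable names follow the description of pw, but the values, species labels and level labels are invented.

```r
# Made-up heights (cm): 2 plant species x 3 watering levels x 3 replicates
pw_demo <- data.frame(
  height = c( 9, 11, 10,  14, 13, 15,  18, 17, 20,
             13, 12, 14,  18, 16, 17,  20, 23, 21),
  plant  = factor(rep(c("spA", "spB"), each = 9)),
  water  = factor(rep(rep(c("low", "mid", "high"), each = 3), times = 2))
)

# Two-way ANOVA: main effects of plant and water plus their interaction
aov2 <- aov(height ~ plant * water, data = pw_demo)
summary(aov2)

# Boxplot of each plant/water combination
boxplot(height ~ plant + water, data = pw_demo)
```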


Plant species abundance

Use these data to explore diversity using Simpson's D index or Shannon-Wiener Index

Use these data to explore similarity and visualise it with a dendrogram

Plant species abundance

These data give the abundance of some terrestrial vascular plants at 10 sites in Shropshire, UK. The data are the same as for the Plant species lists dataset but include the abundance information.

  • Plant species abundance.csv - the data are in the S4E2e Archive Excel.zip archive. These data are in community layout with the first column giving the species name (scientific binomial) and subsequent columns for each of the 10 sample sites. The data are based on average domin scores (an abundance scale similar to Braun Blanquet) from five quadrats.
  • The S4E2e.RData file contains a data.frame called psa. This gives the data with rows as species and columns as sites.

You can use these data to explore diversity using Simpson's or the Shannon index. You can also use these data to look at similarity, which you can also plot using a dendrogram.


Ridge and furrow meadow

Use these data to explore differences between two samples (t-test or U-test)

Use the data for summary statistics and graphics

Ridge & Furrow meadow

These data show the abundance of meadow buttercup plants in one metre square quadrats in an ancient ridge & furrow meadow in Buckinghamshire, UK.

  • ridge furrow.xlsx - the data are in the S4E2e Archive Excel.zip archive. The file gives the data in sample layout, with a column for ridge and one for furrow. The spreadsheet also contains a completed t-test in a separate worksheet.
  • The S4E2e.RData file contains the data in four separate objects:
    • rf1 - gives the data in recording layout, with a column for count and one for area (Ridge or Furrow).
    • rf2 - gives the data in sample layout, with a column for Ridge and one for Furrow.
    • furrow - a separate object for the Furrow data sample.
    • ridge - a separate object for the Ridge data sample.

You can use these data to explore differences between two samples (t-test or U-test). You can also display the results graphically (e.g. bar chart or box-whisker plot). You can use the data for summary statistics (mean, median, IQR etc.), and to look at data distribution.

The R-format data are provided in several layouts so that you can explore the different syntax required for each layout.
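
To illustrate how the syntax changes with layout, here is a small sketch using stand-in objects that mimic ridge, furrow and rf1. The values are invented; loading S4E2e.RData would supply the real objects.

```r
# Stand-in values (invented) mimicking the layouts described above;
# load("S4E2e.RData") would supply the real objects instead.
ridge  <- c(4, 3, 5, 6, 8, 6, 5, 7)
furrow <- c(9, 11, 12, 8, 10, 13, 9, 12)

# Recording layout: one column for the count, one for the grouping variable
rf1 <- data.frame(
  count = c(ridge, furrow),
  area  = rep(c("Ridge", "Furrow"), each = 8)
)

# Separate vectors: pass the two samples directly
t1 <- t.test(ridge, furrow)
u1 <- wilcox.test(ridge, furrow, exact = FALSE)  # U-test; exact = FALSE because of ties

# Recording layout: use a formula, response ~ predictor
t2 <- t.test(count ~ area, data = rf1)

# Both calls compare the same two samples, so the p-values agree
all.equal(t1$p.value, t2$p.value)

# Summary statistics and a box-whisker plot from the recording layout
tapply(rf1$count, rf1$area, median)
boxplot(count ~ area, data = rf1)
```

For sample layout (as in rf2) you would pick out the columns yourself, for example t.test(rf2$Ridge, rf2$Furrow).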


Seashore seaweed

Use these data to replace a text-based ordinal scale with a numerical version

Use the =VLOOKUP function with a lookup table in Excel

Seashore seaweed

These data show the abundance of some seaweed species on a rocky shore in South Devon, UK. The data are presented using a text-based abundance scale (ACFOR) and give the abundance of five species at 13 transect stations across the shore (the stations are at different heights above mean tide level).

  • seashore.xlsx - the data are in the S4E2e Archive Excel.zip archive. The file contains two worksheets, with one giving an extra table of data where the ACFOR scale is converted to a numerical ordinal scale.

The data are intended to be used as an example of how to use the =VLOOKUP function in Excel. You use this to replace one item with another. In this case the text-based ordinal scale is replaced by a numerical scale.
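
In Excel the call has the general form =VLOOKUP(lookup_value, table_array, col_index_num, FALSE), where the final FALSE requests an exact match. For comparison, the same kind of replacement can be sketched in R with a named vector acting as the lookup table. The numeric values assigned to each ACFOR letter and the example scores are assumptions for illustration; the lookup table in the spreadsheet may differ.

```r
# Lookup table: ACFOR letter -> number. The numeric values are an assumption
# for illustration; the spreadsheet's own lookup table may differ.
acfor <- c(A = 5, C = 4, F = 3, O = 2, R = 1)

# Invented abundance scores for one species across five stations
scores <- c("R", "O", "F", "A", "C")

# Index the named vector by the text scores to do the replacement
numeric_scores <- unname(acfor[scores])
numeric_scores
```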


Sward height

Use these data to explore differences between multiple samples (e.g. Kruskal-Wallis or ANOVA)

Use the data to look at summary statistics and graphics

Sward height

These data give the height of vegetation (cm) in the sward at three sampling locations in a meadow in Shropshire, UK.

  • Sward height.xlsx - the data are in the S4E2e Archive Excel.zip archive. There are three columns, one for each of the samples: Upper, Middle and Lower.
  • The S4E2e.RData file contains the data in two data.frame objects:
    • sward2 - the data are in recording layout with a column for Height (the response variable) and a column for Site (the predictor variable).
    • sward3 - the data are in sample layout with a column for each of the three sites.

You can use these data to explore differences between more than two samples, e.g. with the Kruskal-Wallis test (non-parametric) or analysis of variance (ANOVA, parametric). You can also use the data to produce data summaries (e.g. median, mean) and to present results graphically (e.g. boxplot of results or histogram of sample distribution).
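
The sketch below shows one way these analyses might look in R. The values are invented, and the column names (Upper, Middle, Lower) are borrowed from the spreadsheet description, so the real sward2 and sward3 objects may differ in detail.

```r
# Stand-in data (invented values) in sample layout, one column per site
sward3 <- data.frame(
  Upper  = c(5, 6, 4, 7, 5),
  Middle = c(8, 9, 10, 8, 11),
  Lower  = c(13, 12, 15, 14, 16)
)

# stack() converts sample layout to recording layout (values + ind columns)
sward2 <- stack(sward3)
names(sward2) <- c("Height", "Site")

# Non-parametric: Kruskal-Wallis test
kw <- kruskal.test(Height ~ Site, data = sward2)

# Parametric: one-way analysis of variance
av <- aov(Height ~ Site, data = sward2)
summary(av)

# Medians per site and a boxplot of the three samples
tapply(sward2$Height, sward2$Site, median)
boxplot(Height ~ Site, data = sward2)
```

Both tests use the same response ~ predictor formula, which is why the recording layout is generally the more convenient form for analysis.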


Tree sizes and growth

Use these data to draw a line plot of growth over time

Tree sizes and growth

These data show the size of Sitka spruce trees measured at monthly intervals. The size is determined as the height multiplied by the diameter squared, on a log scale (i.e. log(h × d²)). The data are modified from an original (larger) dataset from within R.

  • The S4E2e.RData file contains the data in a data.frame object called tree. There are two columns, month and size.

Use these data to draw a line plot of spruce growth.
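
A line plot of this kind can be sketched in base R as follows. The data frame below is an invented stand-in with the same two column names (month and size) as the tree object.

```r
# Invented stand-in for the 'tree' data frame: columns month and size
tree <- data.frame(
  month = 1:6,
  size  = c(4.1, 4.5, 4.8, 5.2, 5.3, 5.6)
)

# type = "b" draws both the points and the connecting lines
plot(size ~ month, data = tree, type = "b",
     xlab = "Month", ylab = "Size, log(h x d^2)")
```

Use type = "l" instead if you want the line without the point markers.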


Whitefly

Use these data for a matched pairs test (e.g. Wilcoxon matched pairs)

Whitefly

These data show the counts of whitefly (a greenhouse pest) attracted to different coloured sticky traps. Each trap is bi-coloured so the data are in the form of matched pairs.

  • Whitefly.xlsx - the data are in the S4E2e Archive Excel.zip archive. There are two columns, one for the count of whitefly on the White side and the corresponding count for the Yellow side.
  • The S4E2e.RData file contains the data in a data.frame object called whitefly. There are two columns, white and yellow.

You can use these data to carry out a matched pairs test (either a paired t-test or the Wilcoxon matched pairs test, which is the paired counterpart of the U-test). You can also summarise the results graphically.
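
As a hedged sketch, matched pairs tests in R use the paired = TRUE argument. The data frame here is an invented stand-in with the same column names (white and yellow) as the whitefly object.

```r
# Invented stand-in for the 'whitefly' data frame: paired counts per trap
whitefly <- data.frame(
  white  = c(4, 3, 4, 1, 6, 5, 3, 2),
  yellow = c(6, 7, 5, 5, 9, 6, 8, 4)
)

# paired = TRUE makes both tests work on the within-trap differences
wp <- wilcox.test(whitefly$white, whitefly$yellow, paired = TRUE, exact = FALSE)
tp <- t.test(whitefly$white, whitefly$yellow, paired = TRUE)

# The paired t-test is equivalent to a one-sample t-test on the differences
diffs <- whitefly$white - whitefly$yellow
all.equal(tp$p.value, t.test(diffs)$p.value)

# Bar chart of the median count on each colour
barplot(sapply(whitefly, median), ylab = "Median count")
```

The order of the rows matters for paired data: each row must hold the two counts from the same trap.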


Wilcoxon online exercise

Use these data for Wilcoxon matched pairs test

Wilcoxon online exercise

These data are in the form of matched pairs (they are fictitious data).

You can use the data for Wilcoxon matched pairs analysis, as well as graphical summary. See the Exercises support page where there is an online exercise in using Excel to carry out a matched pairs test.


Keywords

Here is a list of keywords: it is by no means complete!

T-test, U-test, Kruskal-Wallis, Analysis of Variance, Spearman Rank, Correlation, Regression, Logistic Regression, Curved linear regression, histogram, scatter plot, bar chart, box-whisker plot, pie chart, Mean, Median, Mode, Standard Deviation, Standard Error, Range, Max, Min, Inter-quartile Range, IQR
