Dr. Mark Gardener

Statistics for Ecologists Using R and Excel.

Edition 2 is on the way.

Writer's Bloc

On this page you can find out about my latest writing project. I'll post updates on progress, tables of contents and also some of the R scripts (and possibly Excel spreadsheets) I am developing in support of the new book. I'll try to keep the material reasonably up to date.

The Writer's Bloc homepage contains a table of contents and an index of the pages that contain custom R commands and R scripts.

I am working on a new edition of my book Statistics for Ecologists Using R and Excel. I am currently revising the chapter(s) on graphics and thought I'd make some notes about Tally plots...

Statistics for Ecologists Using R and Excel:
Data collection, exploration, analysis and presentation

Original Edition 1 Available now from
Pelagic Publishing

Get a 20% discount on "Statistics for Ecologists" when you buy direct from the publisher! Enter the voucher code S4E20 in the shopping basket at Pelagic Publishing.

Visualizing Data distribution:

Make a Tally plot using Excel

Data distribution is important. You need to know the shape of your data so that you can determine the best:

• Summary statistics
• Analytical routines

Some statistical tests use the properties of the normal (Gaussian) distribution, whilst others use data ranks. So, it's important to know if your data are normally distributed or otherwise.

The classic way to look at the shape of your data is with a histogram, showing the frequency of observations that fall into certain size classes (called bins). However, you can make a "quick and dirty" histogram using pencil and paper, a tally plot. The tally plot is useful as it is something you can do in a notebook in the field as you collect data.

A classic histogram has a continuous x-axis.

Use the hist() command in R to make a classic histogram.

Top

The classic histogram

A classic histogram uses an x-axis that is a continuous variable, like the following example drawn using R:

A classic histogram has a continuous x-axis (this drawn using R).

The x-axis is split according to the bin boundaries and the bars show the frequency of observations that fall between two bin boundaries. This classic histogram is easy to draw using R with the hist() command. The command works out the axis breakpoints and calculates the frequencies for you.

Excel uses a column chart in lieu of a histogram.

The x-axis is therefore discontinuous and split into categories representing the bins.

Top

Histograms in Excel

Excel cannot draw a "proper" histogram; the nearest it can get is a column chart with separate categories representing the bin classes:

In Excel the column chart is used in lieu of a true histogram
The x-axis is split into categories representing the bin sizes.

This is not so terrible, but it is not quite the same as a true histogram. In the preceding example the bars have been widened to reduce the gap width, which makes the chart appear more like a real histogram.

Calculate frequency of observation in each bin class using Excel's FREQUENCY function.

FREQUENCY is an array function and places a result in several cells at once.

Top

Using Excel to calculate data frequency

You can draw the histogram (column chart) in Excel easily enough but need to compute the frequency of observations for each bin class. To do this you need the FREQUENCY function.

FREQUENCY(data_array, bins_array)

To use the function follow these steps:

1. Make sure you have your data!
2. Work out the minimum value you'll need (the MIN function is helpful).
3. Work out the maximum (use the MAX function).
4. Determine the interval you need to give you the number of bins you want.
5. Type in the values for the bins somewhere in your worksheet.
6. Highlight the cells adjacent to the bins you just made, these will hold the frequencies.
7. Type the formula for FREQUENCY, highlighting the appropriate cells (or type their cell range) for the data and bins.
8. Do NOT press Enter since FREQUENCY is an array function. Instead press Ctrl+Shift+Enter (on a Mac Cmd+Shift+Enter), which will complete the function and place the result into all the cells you highlighted.

The FREQUENCY function places a result into an array of cells.

Now you have the bins and the frequency so you can build your histogram (column chart), using the Insert > Column Chart button.

The Analysis ToolPak can calculate frequencies and optionally draw a column chart (histogram).

Top

Frequency calculation via Analysis ToolPak

You could skip the frequency calculation stage altogether and use the Analysis ToolPak (Insert > Data Analysis), which will also draw a column chart (histogram) for you if you like.

If you cannot see the Data > Data Analysis button you may not have the ToolPak installed. Go to the Excel options and the Add-Ins section, where you can enable it.

The Histogram section of the Analysis ToolPak requires you to have some data and a range of bins. So:

1. Make sure you have your data!
2. Work out the minimum value you'll need (the MIN function is helpful).
3. Work out the maximum (use the MAX function).
4. Determine the interval you need to give you the number of bins you want.
5. Type in the values for the bins somewhere in your worksheet.
6. Click the Data > Data Analysis button.
7. Scroll down and choose the Histogram option.
8. Fill in the boxes for the data range and the bins ranges.
9. Click OK and the Analysis ToolPak will compute the frequencies for you.

The Analysis ToolPak Histogram dialogue box will compute frequencies
and optionally draw a column chart (histogram).

Once you've clicked OK you should see the results, the bins will be repeated alongside the appropriate frequencies. If you selected to have a chart output then you'll see a column chart too.

If you do not have Windows or indeed Excel, then you can't use the Analysis ToolPak but you can still use the FREQUENCY function. Both FREQUENCY and REPT are part of the armoury of OpenOffice and LibreOffice (and probably others).

Use REPT to copy text (a tally mark) a number of times (frequency) to build a tally plot in Excel.

Top

Making a Tally plot

You may not want a graphic and require only a simple tally plot. It is easy to do this in Excel using the FREQUENCY and REPT functions. You use the FREQUENCY function as described before to determine the frequency of observation in various bins. Once you have the frequencies you use the REPT function to repeat some text a number of times (corresponding to the frequency).

The REPT function repeats some text a number of times.
Use with FREQUENCY to make a tally plot.

In the example I have set the character text I want to use in cell D6. The frequencies are in E2:E12. The REPT function takes the text I want to display as a tally mark and repeats it by the number on the frequency column. Note the cell reference D\$6, which "fixes" the row so that the formula stays correct as it it copied down (from G2 to G12).

You could use the text directly in the REPT formula e.g. =REPT("X", E2) but then if you wanted to alter the tally plot you'd need to alter all the cells. If you point to a single cell you can alter the tally plot simply by typing a new tally character into that one cell.

The tally plot doesn't really replace the histogram but it can be useful to have a plain text representation of your data rather than a graphic. You can copy the spreadsheet cells into various programs.

Bins
16 X
18 X
20 X
22 XXX
24 XXX
26 XXXXX
28 XXX
30 XXX
32 XXX
34 X
36 X

If it not too hard get the tally plot oriented with the tallies vertical. You'll need to format the tally cells so that they are oriented at 90˚ but otherwise this is straightforward. However, in the "vertical" orientation the cells do not copy/paste so nicely!

Top

Providing training for:

• Ecology
• Data analysis
• Statistics
• R The statistical programming language
• Data management
• Data mining

See my personal pages at GardenersOwn