Dr. Mark Gardener


Managing Data Using Exce, Cover

Managing Data Using Excel

Writer's Bloc

On this page you can find out about my latest writing project. I'll post updates on progress, tables of contents and also some of the R scripts (and possibly Excel spreadsheets) I am developing in support of the new book. I'll try to keep the material reasonably up to date.

The Writer's Bloc homepage contains a table of contents and an index of the pages that contain custom R commands and R scripts.

Managing Data Using Excel:
Organizing, Summarizing and Visualizing Scientific Data

Available now from
Pelagic Publishing

Pelagic Publishing

Get £5 discount when you buy direct from the publisher. Enter voucher code MDUE20 in the shopping basket at Pelagic Publishing

Use your data like a database

Using a scientific recording format allows your data to be utilized in the widest manner and gives them great flexibility. You should maintain your data so that each column represents a separate variable, whilst each row is a single record a.k.a. observation or replicate.

The scientific recording format is the arrangement expected by most computer programs that carry out serious statistical analyses. This layout also allows you a great deal of flexibility in exploring and managing your data. You can use this layout like a database, and Excel has the tools that enable you to utilize the data easily.

In the first chapter I wanted to show how easy it is to use your data like a database. There are two key elements to making the best use of the data:

  • Pivot Tables
  • Index Variables

The Pivot Table is the way Excel uses to extract the information from your database and present it, in tabular form. You can make a Pivot Table using the Insert > Pivot Table button on the Ribbon menu in Excel.

The Pivot Table is able to use your data variables to present summaries of your data in many ways. However, if you include a few index variables you can extend the power and flexibility of your dataset a great deal. This also helps when it comes to error checking (which is covered in chapter 3 of the book).

A Pivot Table allows you to arrange and rearrange your data easily

Use Insert > Pivot Table button from the Ribbon menu


Pivot Tables

The Pivot Table essentially takes the columns of your data and creates summaries based on their content. By using several variables you can cross-tabulate your data and get summaries based on the various groupings in your dataset. For example, you may have a variable representing a measured variable, like weight. This is a numeric variable (it is also probably your response variable). You may have another variable representing some kind of grouping, perhaps the weight was recorded for different species. This variable is a factor (it's categorical and has a series of levels) and is a predictor variable. The Pivot Table is able to give you the average weight per group (there are other statistics you can use).

The example given is very simple but I hope you get the idea. Once you realize what Pivot Tables can do you'll wonder how you ever managed without them!

The Pivot Table operation is started from the Insert > Pivot Table button on the Ribbon menu. You don't need to select any data but you do need to click once anywhere in the block of data, Excel will expand around the place you clicked to include all the data it finds. Once you've clicked in the data use the Pivot Table button to open the Create Pivot Table menu window.

Create Pivot Table menu window
Use Insert > Pivot Table button to open the Create Pivot Table menu window

The data should be selected automatically – look for the marching ants in the background. At the bottom of the menu you get to choose the destination for the results, usually a New Worksheet is the best option.

After you click OK you'll get to create and adjust your Pivot Table. You are presented with a blank table and the Pivot Table Field List. You use this list menu box to construct the Pivot Table.

Pivot Table Field List menu window
The Pivot Table Field List menu window allows you to manage your Pivot Table

The top part shows the available variables (called Fields). These correspond to the columns in your dataset. The lower part shows four boxes:

  • Column Labels – items here form columns in the Pivot Table.
  • Row Labels – items here form rows in the Pivot Table.
  • Values – items here form the data that is sliced-up and grouped by the rows and columns.
  • Report Filter – this is an overall grouping, which is applied to the entire Pivot Table.

You click and drag items from the list box at the top to the boxes at the bottom to make the Pivot Table. The boxes can contain multiple items.

The Values box defaults to summarizing the data using the Sum. You can click on the item to bring up a menu allowing you to alter the summary, to the mean, count or standard deviation for example. The Report Filter allows you to apply a Filter to the entire Pivot Table, based on the field(s) in that section. You can also use a Filter for any of the other variables in the Pivot Table.

Once you've got a Pivot Table you'll see the Pivot Table Tools menus appear in the Ribbon. These allow you access to a range of tools that enable you to alter the appearence of your Pivot Table.

Pivot Table Tools menus in Ribbon
The Pivot Table Tools menus in the Ribbon provide tools to manage your Pivot Table

Once you have a Pivot Table you can use the Pivot Table Tools menus to alter it and can also drag and rearrange the items in the Pivot Table Field List menu window. This allows you to explore your data without altering the original content.

Summaries are useful but there are times when you want to rearrange your dataset and display the individual data entries (the records, observations, replicates) in different ways. This is where adding index variables becomes helpful.

Index variables can help you manage your data

Use =COUNTIF function to make index variables


Index variables

The Pivot Table uses column variables to cross-tabulate your dataset. Data are summarized using various functions, the default being the Sum. It is often helpful to make additional index variables to add to your dataset. These can help you to rearrange your samples and display the original data, rather than a summary.

I am going to cover the use of index variables in Chapter 2 but in the opening chapter I've included some details about how to use the =COUNTIF function to help you create index variables quickly and easily.

The =COUNTIF function looks at a range of cells and returns a numeric result based on how many of those cells match a criterion, which you provide. You simply need a criterion that "matches" the variable you want to count, and which increases as you copy the formula down the column.

COUNTIF formula
The =COUNTIF formula returns a value based on the number of cells in the range that matches the criterion you set

In the example the column B (Wool) is being indexed. The formula in cell E2 looks at the cell range B$2 to B2 and matches... itself! The $ indicates that row 2 is "fixed". This forms the top row of the cell range so that when you copy the formula down the column row 2 will remain.

This is a quick way to make an index based on a single variable. There will be more about the =COUNTIF function and index variables in Chapter 2 (including how to make indices based on multiple variables). Note that the =COUNTIF function is quite menory hungry so if you have a very large dataset your file size may be quite large. Look out for this and if necessary only create and use the index variables as they are needed. For most datasets this is not likely to be a problem. The index variables are especially useful for error checking and there will be more of this in chapter 3.


Providing training for:

  • Ecology
  • Data analysis
  • Statistics
  • R The statistical programming language
  • Data management
  • Data mining

Follow me...
Facebook Twitter Google+ Linkedin Amazon
Contact DataAnalytics Homepage