Dr. Mark Gardener



Tips and Tricks - for R and Excel

On these pages you can find tips, tricks and hints for using both R and Excel. At the end of each tip there are links forwards and backwards as appropriate. There is also an index of R tips and an index of Excel tips.

For most analytical purposes the combination of Excel and R is unbeatable! Excel is great as a data management tool and for preparing data for analysis. You can also use it to get an overview of your data or to make simple (and not so simple) graphs. R is an analytical "Swiss Army knife": it can carry out a mind-boggling array of analytical routines and produce great graphics.



Use order() to select column order in a boxplot()

Use reorder() to change the order of a factor variable

Use ordered() to make a custom ordered factor variable


Ordering up boxplot()

The boxplot() command is one of the most useful graphical commands in R. The box-whisker plot is useful because it shows a lot of information concisely. However, the boxes do not always appear in the order you would prefer. These notes show you how you can take control of the ordering of the boxes in a boxplot().

There are four main methods, which in turn depend on the layout of the data:

  • Use order() to select column order when you have separate samples (i.e. vectors, columns in a data.frame or a list).
  • Use [row, column] to select an explicit column order when you have separate samples.
  • Use reorder() to change the order of a factor variable according to a function (e.g. mean), when you have response and predictor variables.
  • Use ordered() to make a custom ordered factor variable when you have response and predictor variables.

There are subtle differences between these methods but essentially you are creating an index, which you can use in the boxplot() command to control the order the boxes appear in the plot.


Use x[row, column] syntax to specify an explicit order to plot columns in a boxplot()

This works for data.frame and matrix data objects where the columns are individual samples.


Data in sample format

If your data are arranged as samples in a data.frame (or matrix) you can use boxplot() to plot the data in "one go". The order of the boxes will depend on the order of the columns.

> hog3
  Upper Mid Lower
1     3   4    11
2     4   3    12
3     5   7     9
4     9   9    10
5     8  11    11
6    10  NA    NA
7     9  NA    NA
> boxplot(hog3)

You can specify an explicit order for the columns using column numbers:

> boxplot(hog3[, 3:1])

[Figure: Selecting order of columns in a boxplot. The boxplot on the left uses the default column order. The boxplot on the right uses an explicit order x[, columns].]

Note the [row, column] syntax to specify the order for plotting.
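
Column names work just as well as column numbers inside the square brackets, and can be easier to read back later. The following is a minimal sketch, assuming the hog3 data shown above (rebuilt here so that the example runs on its own). The bp object is only there to inspect the result; it is not part of the original tip.

```r
# Rebuild the hog3 data from the listing above (NA pads the shorter samples)
hog3 <- data.frame(
  Upper = c(3, 4, 5, 9, 8, 10, 9),
  Mid   = c(4, 3, 7, 9, 11, NA, NA),
  Lower = c(11, 12, 9, 10, 11, NA, NA)
)

# Select the columns by name, in the order the boxes should appear
bp <- boxplot(hog3[, c("Lower", "Mid", "Upper")])

# boxplot() invisibly returns its statistics; $names holds the box order
bp$names
```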


Use apply() to get sample medians or means.

Use order() to get ascending or descending median or mean.

Use the new order in a boxplot() command using x[row, column] syntax.


Order columns by a function

Rather than give an explicit order you may want to have the boxplot appear in order of some function (e.g. mean or median). You can use the order() command to arrange items in ascending (or descending) order. To proceed use these general steps:

  1. Use a command that gives you the values you require e.g. colMeans(), apply().
  2. Use the result from step 1 and make an order() result.
  3. Use the result of step 2 to define the order of the columns in the boxplot().

The apply() command is most flexible:

> m <- apply(hog3, MARGIN = 2, FUN = median, na.rm = TRUE)
> m
Upper   Mid Lower
    8     7    11

Now you can set an order based on the medians you calculated:

> o <- order(m, decreasing = FALSE)
> o
[1] 2 1 3

Use the x[row, column] syntax like before but use your calculated order:

> boxplot(hog3[, o])

If you want decreasing order set decreasing = TRUE.

[Figure: Boxplot with variables ordered by ascending and descending median. The order() command was used to plot the samples in ascending and descending median order.]

In this example apply() was used but any function that gives you a vector of "results" will work.
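
As an illustration of swapping in a different summary function, the three steps can be repeated with colMeans() and a descending order. This is a sketch only, assuming the hog3 data shown earlier (rebuilt here so the example is self-contained).

```r
# Rebuild the hog3 data from the earlier listing
hog3 <- data.frame(
  Upper = c(3, 4, 5, 9, 8, 10, 9),
  Mid   = c(4, 3, 7, 9, 11, NA, NA),
  Lower = c(11, 12, 9, 10, 11, NA, NA)
)

# Step 1: one summary value per column (here the mean, ignoring NA)
m <- colMeans(hog3, na.rm = TRUE)

# Step 2: column indices sorted by descending mean
o <- order(m, decreasing = TRUE)

# Step 3: use the index in the [row, column] position for columns
bp <- boxplot(hog3[, o])

# Column names in plotting order: Lower, Upper, Mid for these data
names(hog3)[o]
```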


For list objects use lapply() to get a summary statistic per element.

Use unlist() on the result to get the order e.g. order(unlist(m))

A list has only 1 dimension so use boxplot(x[element]) when plotting.


Data in a list

If your data are in a list you can use the same principles but need a slightly modified procedure:

> hogl <- list(U = hog3$Upper, M = hog3$Mid, L = hog3$Lower)
> hogl
$U
[1] 3 4 5 9 8 10 9

$M
[1] 4 3 7 9 11 NA NA

$L
[1] 11 12 9 10 11 NA NA

Use the lapply() command to work out the median over the list elements.

> m <- lapply(hogl, median, na.rm = TRUE)

If you try to order() the result you get an error, so you must unlist() the result first:

> order(unlist(m))
[1] 2 1 3

Now save the new order and use it in the plot.

> o <- order(unlist(m))
> boxplot(hogl[o])

Note that you don't use [row, column] for the list, just give [element], as the list is one-dimensional.
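
An alternative that is not used in the steps above is sapply(), which simplifies the result to a named numeric vector, so order() can be applied without unlist(). A minimal sketch, assuming the same list as above (typed out here so it runs on its own):

```r
# The same three samples as the hogl list above
hogl <- list(
  U = c(3, 4, 5, 9, 8, 10, 9),
  M = c(4, 3, 7, 9, 11, NA, NA),
  L = c(11, 12, 9, 10, 11, NA, NA)
)

# sapply() returns a named numeric vector rather than a list
m <- sapply(hogl, median, na.rm = TRUE)

# A vector can be passed straight to order()
o <- order(m)

# Single-bracket indexing reorders the list elements for plotting
bp <- boxplot(hogl[o])
bp$names
```

Whether lapply() plus unlist() or sapply() reads better is a matter of taste; both give the same order here.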


Use reorder() to make a new predictor variable sorted by a function (e.g. median)

Use the new predictor in the boxplot() command

The order will be ascending. To make a descending result use negative response i.e. reorder(pred, -resp, FUN)


Order a factor using a function

When your data are in scientific recording format you will have a column for each variable and will have response variables and predictor variables e.g.

> hog2
   count  site
1      3 Upper
2      4 Upper
3      5 Upper
4      9 Upper
5      8 Upper
6     10 Upper
7      9 Upper
8      4   Mid
9      3   Mid
10     7   Mid
11     9   Mid
12    11   Mid
13    11 Lower
14    12 Lower
15     9 Lower
16    10 Lower
17    11 Lower

These are the same data as before but in a more "sensible" layout. However, when you try a boxplot() you get the boxes plotted in alphabetical order.
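
The default ordering can be checked directly, because the boxes follow the factor levels and levels are sorted alphabetically unless you say otherwise. A short sketch, assuming the hog2 data listed above (rebuilt with rep() so the example is self-contained):

```r
# Rebuild hog2 from the listing above
hog2 <- data.frame(
  count = c(3, 4, 5, 9, 8, 10, 9, 4, 3, 7, 9, 11, 11, 12, 9, 10, 11),
  site  = rep(c("Upper", "Mid", "Lower"), times = c(7, 5, 5))
)

# Default factor levels are sorted, not in order of appearance
levels(factor(hog2$site))

# The boxes are drawn in the same order as the levels
bp <- boxplot(count ~ site, data = hog2)
bp$names
```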

You can use the reorder() command to reorder a predictor variable by a function applied to the response variable. In other words you can determine the order of the boxes using a median or other function. Use the following general process:

  1. Use reorder(predictor, response, FUN) to determine an order for the predictor variable.
  2. Use the result of reorder() in place of the original predictor variable in the boxplot() command.

> bpm <- with(hog2, reorder(site, count, FUN = median))
> boxplot(count ~ bpm, data = hog2)

Here the with() command is used to "see inside" the hog2 data. Alternatively, you could use attach() and detach():

> attach(hog2)
> bpm <- reorder(site, count, FUN = median)
> detach(hog2)

The result is ordered ascending. If you want a descending order simply add a minus sign in front of the response variable:

> bpm <- with(hog2, reorder(site, -count, FUN = median))
> boxplot(count ~ bpm, data = hog2)

The procedure works with multiple predictors but you can only reorder() one at a time.
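
To see what reorder() has done before plotting, you can inspect the levels of the result and the scores it stores as an attribute. This is a sketch only, assuming the hog2 data listed earlier (rebuilt here so it runs on its own).

```r
# Rebuild hog2 from the earlier listing
hog2 <- data.frame(
  count = c(3, 4, 5, 9, 8, 10, 9, 4, 3, 7, 9, 11, 11, 12, 9, 10, 11),
  site  = rep(c("Upper", "Mid", "Lower"), times = c(7, 5, 5))
)

# Reorder the site levels by the median count at each site
bpm <- with(hog2, reorder(site, count, FUN = median))

# The levels give the box order from left to right
levels(bpm)

# reorder() keeps the per-level values it sorted on in the "scores" attribute
attr(bpm, "scores")

# Plot using the reordered factor in place of the original predictor
bp <- boxplot(count ~ bpm, data = hog2)
```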


Use ordered() to make an ordered factor.

Use the ordered factor in your boxplot() command.


Make a factor in an explicit order

You can make a factor variable into an explicit order using the ordered() command. You just give the name of the factor you want to order and then the names of the levels in the order you want.

The result of the ordered() command is an ordered factor. The upshot is that the order you set will take precedence over the default alphabetical order.

> o <- ordered(hog2$site, levels = c("Upper", "Lower", "Mid"))
> o
[1] Upper Upper Upper Upper Upper Upper Upper Mid Mid Mid Mid Mid
[13] Lower Lower Lower Lower Lower
Levels: Upper < Lower < Mid
> boxplot(count ~ o, data = hog2)
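
If you only need to control the plotting order and do not want the levels treated as ranked, a plain factor() with explicit levels is an alternative to ordered(). The sketch below is an assumption-labelled variation on the tip rather than the method described above: the column name site2 is hypothetical, and the hog2 data are rebuilt so the example runs on its own.

```r
# Rebuild hog2 from the earlier listing
hog2 <- data.frame(
  count = c(3, 4, 5, 9, 8, 10, 9, 4, 3, 7, 9, 11, 11, 12, 9, 10, 11),
  site  = rep(c("Upper", "Mid", "Lower"), times = c(7, 5, 5))
)

# Store a copy of the predictor with the levels in the required order
# (site2 is a hypothetical name chosen for this example)
hog2$site2 <- factor(hog2$site, levels = c("Upper", "Lower", "Mid"))

# The boxes follow the level order, exactly as with ordered()
bp <- boxplot(count ~ site2, data = hog2)
bp$names

# Unlike ordered(), the result is not flagged as an ordered factor
is.ordered(hog2$site2)
```

Keeping the new factor as a column in the data frame means the data argument of boxplot() can find it without relying on objects in the workspace.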
