Dr. Mark Gardener


Using R for statistical analyses - Introduction

This page is intended to be a help in getting to grips with the powerful statistical program called R. It is not intended as a course in statistics (see here for details about those). If you have an analysis to perform I hope that you will be able to find the commands you need here and copy/paste them into R to get going.

On this page learn how to create data files, read them into R and generally get ready to perform analyses. Also find out about getting further help and documentation.



I run courses in using R, which may be held at various locations. If you are interested then see our Courses page or contact us for details.


My publications about R

See my books about R on my Publications page

Statistics for Ecologists | Beginning R | The Essential R Reference | Community Ecology

Community Ecology is in production and expected by the end of 2013 from Pelagic Publishing.
Statistics for Ecologists is available now from Pelagic Publishing. Get a 20% discount using the S4E20 code!
Beginning R is available from Wrox the publisher or see the entry on Amazon.co.uk.
The Essential R Reference is available from the publisher Wiley now (see the entry on Amazon.co.uk)!

I have more projects in hand - visit my Publications page from time to time. You might also like my random essays on selected R topics in MonogRaphs. See also my Writer's Bloc page, details about my latest writing project including R scripts developed for the book.



R is Open Source
R is Free

Get R at the R Project Page


What is R?

R is an open-source (GPL) statistical environment modeled after S and S-Plus. The S language was developed at AT&T Bell Laboratories, starting in the mid-1970s. The R project was started in the early 1990s by Robert Gentleman and Ross Ihaka (hence the name, R) of the Statistics Department of the University of Auckland, and R was released under an open-source licence in 1995. It has quickly gained a widespread audience. It is currently maintained by the R core-development team, a hard-working, international team of volunteer developers. The R project web page is the main site for information on R. At this site are directions for obtaining the software, accompanying packages and other sources of documentation.

R is a powerful statistical program but it is first and foremost a programming language. Many routines have been written for R by people all over the world and made freely available from the R project website as "packages". However, the basic installation (for Linux, Windows or Mac) contains a powerful set of tools for most purposes.

Because R is a programming language it can seem a bit daunting; you have to type in commands to get it to work. However, it does have a Graphical User Interface (GUI) to make things easier. You can also copy and paste text from other applications into it (e.g. word processors, but beware of "smart quotes"). So, if you have a library of these commands it is easy to pop in the ones you need for the task at hand. That is the purpose of this web page: to provide a library of basic commands that the user can copy and paste into R to perform a variety of statistical analyses.


R maintains a list of previous commands. Use the up and down arrows to scroll through them. You can then use the left and right arrows to edit and modify the command.

Introduction

Once you have installed R and run the program you will see an opening window and a message along these lines:

R : Copyright 2006, The R Foundation for Statistical Computing
Version 2.3.1 (2006-06-01)
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]

>

The > is the "prompt"; this is the point where you type in commands (or paste them in from somewhere else). The window you see is part of the GUI and some operations are possible from the menus (including quit). When you quit you will generally be asked if you wish to save the workspace. R stores a list of commands and any data sets that are loaded, so it can be pretty useful to say "yes" and save the workspace. The command history is available by using the up and down arrows. You can easily scroll back through previous commands and edit them if needed. You can copy items from previous commands, or in fact from any window on the screen, and paste them into the current command line. You can also use the left and right arrow keys to move through the current command.



Data files

You are going to need some data to perform your analyses on. You can type your data into R directly but it is usually much better to use a separate program to hold the information. A spreadsheet is an invaluable tool for this as you can manipulate the data quite easily. R can read plain text files in various formats (e.g. tab delimited, space delimited, comma delimited) and most spreadsheets can save data in these ways. The most useful is comma delimited (.CSV), which R can handle quite easily.
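As a rough illustration of the delimited formats mentioned above, the sketch below reads the same two-column data from comma, tab and space delimited text. The inline strings and the object names (comma_text, from_comma and so on) are made up for this example; with a real file you would give a filename instead of the text argument.

```r
# The same small data set written in three delimited forms.
# Inline text stands in for files so the example runs anywhere.
comma_text <- "Upper,Lower\n12,7\n9,6"
tab_text   <- "Upper\tLower\n12\t7\n9\t6"
space_text <- "Upper Lower\n12 7\n9 6"

from_comma <- read.csv(text = comma_text)                    # comma delimited
from_tab   <- read.delim(text = tab_text)                    # tab delimited
from_space <- read.table(text = space_text, header = TRUE)   # space delimited

# All three should hold the same columns and values
print(from_comma)
print(from_tab)
print(from_space)
```

Note that read.csv() and read.delim() assume a header row by default, whereas read.table() needs header = TRUE to be set explicitly.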

The layout of the data file will depend upon the analysis you are going to run:

You can create a CSV file in a spreadsheet or a word processor. A spreadsheet is the most useful tool as you can easily manipulate the data later on.
Multiple variables

In this case you have multiple variables arranged in columns. The rows are the replicates. This sort of arrangement is useful for analysis of variance and multiple regression. However, it can also be used for comparing just two factors (you don't need to use all the information) as in a t-test.

Count  Site  Sward  Temp  Grass%
12     a     23     18    44
17     b     11     21    75

Rows and columns

In this case you have headings on both columns and rows. You have the same information as above and a bit extra. The data may be used for the same kinds of analysis as before but could also be used for tests of association (e.g. Chi-squared) or for ordination.

        Site a  Site b  Site c  Site d
Spp 1   23      17      9       11
Spp 2   47      19      22      15

Simple two-factor

In this instance you have two columns (samples) but the number of replicates is different. R reads the file as a rectangular frame and blank cells are recorded as NA. This may have to be taken account of in some analyses but for now we can assume it is not a problem.

Upper  Lower
12     7
9      6
7      3
6

You may also have data merely as numbers without any labels at all. This is not really to be recommended although R will assign row and column numbers to the data.
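The handling of blank cells can be checked with a small sketch like the one below. It writes a two-column file with unequal replicates to a temporary location (csv_path and two_sample are hypothetical names used only here) and reads it back.

```r
# Write a small comma-delimited file; the last row has a blank "Lower" cell.
csv_path <- tempfile(fileext = ".csv")   # hypothetical temporary file
writeLines(c("Upper,Lower",
             "12,7",
             "9,6",
             "7,3",
             "6,"), csv_path)

two_sample <- read.csv(csv_path)   # header = TRUE is the default for read.csv
print(two_sample)

# The blank cell has been recorded as NA
is.na(two_sample$Lower)

# Summaries return NA unless the missing value is dropped explicitly
mean(two_sample$Lower)
mean(two_sample$Lower, na.rm = TRUE)
```

Many summary functions accept na.rm = TRUE, which is one way of taking the missing values into account.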


 

R stores everything as variables. Your variable names can contain letters and numbers but the only punctuation marks allowed are the full stop and the underscore.

Inputting data

The next step is to get your data into R. If you have saved your data in a .CSV file then you can use the read.csv(filename) command to get the information. You need to tell R where to store the data and to do this you assign it a name. All names must have at least one letter (otherwise it is a number of course!) and cannot start with a digit. You can use a period (e.g. test.1) or an underscore (e.g. test_1) but no other punctuation marks. R is case sensitive so the variable test is different from Test or teSt.
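A short sketch of these naming rules is given below. The variable names are arbitrary, and make.names() is shown only because it is the base R function that read.csv() uses by default to tidy column headings that are not valid names (such as Grass% in the example table).

```r
# Names are case sensitive: these are two separate variables
test <- 1
Test <- 2
test == Test    # FALSE, the two variables hold different values

# A period is allowed inside a name
test.1 <- 3

# Column headings that break the rules are converted to valid names:
# illegal characters become periods and a leading digit gets an X prefix
fixed <- make.names(c("Grass%", "Site", "2nd.count"))
print(fixed)
```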

What you need to do is to copy the appropriate command into the clipboard. Then paste into R at the > prompt. You can then edit the command as you like and when ready press the enter key.

Reading data files

This command reads a .CSV file into R. You need to specify the exact filename.
variable = read.csv(filename)

This command reads a .CSV file but the file.choose() part opens up an explorer type window that allows you to select a file from your computer. By default R will take the first row as the variable names.
variable = read.csv(file.choose())

This reads a .CSV file, allowing you to select the file, and the header is set explicitly. If you change to header=F then the first row will be treated like the rest of the data and not as a label.
variable = read.csv(file.choose(), header=T)

In this case you can tell R that a specified column contains row names. This is likely to be the first so edit the # to 1.
variable = read.csv(file.choose(), row.names=#)

To get a file into R with basic columns of data and their labels use:
> variable = read.csv(file.choose(), header=T)

To get a file into R with column headings and row headings use:
> variable = read.csv(file.choose(), row.names=1)
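The difference between the two commands can be seen in a sketch like this one. To keep it self-contained it reads from inline text through the text argument rather than file.choose(); the heading "Species" and the object names are assumptions made for the illustration.

```r
# Inline text stands in for a .CSV file chosen with file.choose()
bats_text <- "Species,Hedge,River,Wood
Pip,21,43,77
Daub,23,11,32
Noct,26,9,11"

# Column labels only: the first row supplies the variable names
bats_cols <- read.csv(text = bats_text, header = TRUE)

# Column and row labels: the first column supplies the row names
bats_rows <- read.csv(text = bats_text, row.names = 1)

names(bats_cols)             # includes "Species" as an ordinary column
names(bats_rows)             # "Species" is no longer a column
rownames(bats_rows)          # the species labels
bats_rows["Daub", "River"]   # rows can now be picked out by label
```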

N.B. There are occasions when R won't like your data file. Check the file carefully. In some cases the addition of an extra linefeed at the end will sort out the issue. To do this open the file in a word processor and make sure that non-printing characters are displayed. Add the extra carriage return and save the file.


Seeing your data in R

Once you have persuaded R to read your data you will naturally want to check it is there! To view data stored in R you merely type the name of the variable that you stored it as.


In the case below you had both row and column headers. When you type in the variable name you see the data framed more or less like this.

       Hedge  River  Wood
Pip    21     43     77
Daub   23     11     32
Noct   26     9      11
Fruit  54     15     8
Leaf   54     43     7


In this case you only had column headings. When displayed R adds a simple number to each row.

   Upper  Lower  Old  New
1  3.0    5.1    7.0  8.0
2  4.0    4.7    6.8  7.0
3  5.0    4.3    5.2  6.0
4  3.0    3.8    3.8  7.0
5  2.9    5.2    NA   6.5
6  NA     6.4    NA   5.8
7  NA     NA     NA   6.1

If you had neither row nor column headings (and read the file with header=F) then R would also label the columns for you, as V1, V2 and so on.


If you wish to view only a single variable (i.e. column) from your data set then you can. Simply add the variable name to the end of the data name along with a dollar sign so: bats$Hedge or field$Upper might be examples from the above two data sets.
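As a small sketch of the dollar-sign syntax, the code below rebuilds the field data shown above with data.frame() (rather than reading a file) and then pulls out one column.

```r
# Recreate the example data set; NA marks the missing replicates
field <- data.frame(
  Upper = c(3.0, 4.0, 5.0, 3.0, 2.9, NA, NA),
  Lower = c(5.1, 4.7, 4.3, 3.8, 5.2, 6.4, NA),
  Old   = c(7.0, 6.8, 5.2, 3.8, NA, NA, NA),
  New   = c(8.0, 7.0, 6.0, 7.0, 6.5, 5.8, 6.1)
)

# View a single variable (column) from the data set
field$Upper

# The extracted column can be passed straight to other commands
upper_mean <- mean(field$Upper, na.rm = TRUE)
print(upper_mean)
```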

It is not terribly convenient to have to append the $variable every time you want to do something on a data set. R provides a way to read these variables directly. Here is an example:

 
This reads in a .CSV file and assigns it to the variable field. The header is set to "True" by default and you don't have row names so you can use this short version. The file.choose() part opens up an explorer type window and allows you to pick the file from your computer (unless you use Linux, in which case you have to type the filename in full).
field = read.csv(file.choose())

This looks at the data set field and reads the names of the variables. It then makes each one available as a variable in its own right. So in the example above you would now have new additional variables called Upper, Lower, Old, New.
attach(field)

Now you can look at the overall data set e.g.

> field

You can look at a single factor e.g.

> Upper

 

So, it is a good habit to read in your data set and then use the attach(data) function immediately. Use meaningful factor names and avoid single letters (e.g. x, y). If you already have a variable with the same name in your workspace, R warns you that the attached one is masked, and typing the name gives you the existing variable rather than the column. You can avoid confusion by only working on one set of data at once.
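A minimal sketch of attaching a data set is shown below, using a cut-down version of the field data built in place rather than read from a file. The detach() call at the end is there to leave the session tidy.

```r
# A cut-down version of the example data set
field <- data.frame(Upper = c(3.0, 4.0, 5.0), Lower = c(5.1, 4.7, 4.3))

attach(field)                            # columns become available by name

upper_mean <- mean(Upper)                # no need to type field$Upper
attached   <- "field" %in% search()      # the data set is on the search path

detach(field)                            # undo the attach when finished
still_attached <- "field" %in% search()

print(upper_mean)
print(c(attached, still_attached))
```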



What data are loaded?

To see what data, variables etc. are loaded in R you can type a simple command:

> ls()

This lists the variables in memory.

In Windows you can list all the "objects" in memory from the Misc menu on the GUI toolbar.
On a Mac you can do something similar using the Workspace menu. The Mac version also has a "workspace browser". This shows all the variables and their properties (you can also view the items).

In both operating systems you can save the current workspace to a file (you can also read in a previously saved workspace). This will save any data and variables currently in memory (on Windows use the File menu and on the Mac use Workspace).

You can also get a list of the variables for each dataset by typing:

> names(dataset)
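For instance, with two made-up objects in memory (field and counts are names chosen for this sketch), the two commands behave roughly as follows.

```r
# Create a data set and a plain vector so there is something to list
field  <- data.frame(Upper = c(12, 9, 7), Lower = c(7, 6, 3))
counts <- c(23, 17, 9, 11)

loaded <- ls()                 # names of the objects currently in memory
print(loaded)

field_vars <- names(field)     # names of the variables inside one data set
print(field_vars)
```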



Removing data sets

To remove a variable you can type a simple command:

> rm(variable)

This will remove the variable (in this case called variable) from the memory. Variables that were made available with attach(data) do not show up in the ls() listing. The opposite of attach(data) is detach(data), which takes those variables away again; use it when you remove the data themselves with rm(data).
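The sketch below walks through that sequence with a small made-up data set called field: attach it, confirm that the attached columns are absent from the ls() listing, then detach and remove it.

```r
field <- data.frame(Upper = c(12, 9, 7), Lower = c(7, 6, 3))
attach(field)

listed_by_ls <- "Upper" %in% ls()    # attached columns are not listed by ls()
can_be_found <- exists("Upper")      # but R can still find them by name

detach(field)                        # take the attached columns away again
rm(field)                            # then remove the data set itself

field_remains <- "field" %in% ls()
print(c(listed_by_ls, can_be_found, field_remains))
```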

In Windows you can remove all the "objects" in memory from the Misc menu on the GUI toolbar.
On a Mac you can do something similar using the Workspace menu.
This should be used with caution!
The Mac version also has a "workspace browser". This shows all the variables and their properties as well as allowing you to remove them.




 


Help and Documentation

My Publications

See my books about R at my Publications Page:

Statistics for Ecologists using R and Excel. Published December 2011

Beginning R: The Statistical Programming Language. Available wherever great books are sold in June 2012

The Essential R Reference. Published November 2012

Documents

There are plenty of sources of help and information regarding R. Most are to be found on the R-Project website. Look under the 'Documentation' section. In the manuals section the "Introduction to R" document is a good start (available as HTML or a PDF). Also very good are:

“Using R for Data Analysis and Graphics - Introduction, Examples and Commentary” by John Maindonald [PDF].
“Simple R” by John Verzani [PDF]

These are available via the 'Contributed Documentation' section.

Courses

I run courses in using R as well as basic statistics and data management – check the Courses page for more details.


Help within R

The help system within R is comprehensive. There are several ways to access help:

Click on the 'Help' menu. There are a number of options available (depending upon your OS) but the main documentation is in the form of HTML.

If you want help on a specific command you can enter a search directly from the keyboard:

> help(keyword)

A shortcut is to type:

> ?keyword

This is fine if you know the command you want. If you are not sure of the command you can try the following:

> apropos("part.word")

You type in a part.word and R will list all commands that contain that string of letters. For example:

> apropos("rank")
[1] "count.rank" "dsignrank" "psignrank" "qsignrank" "rsignrank" "rank"

This shows that there are actually 6 commands containing "rank"; we can now type help() for any of those to get more detail.

If you run the HTML help you will see a heading entitled "Packages". This will list the packages that you have installed with R. The basic package is 'base' and comes with another called 'stats'. These two form the core of R. If you navigate to one of those you can browse through all the commands available.

R comes with a number of data sets. Many of the help topics come with examples. Usually these involve data sets that are already included. This allows you to copy and paste the commands into the console and see what happens.
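As a brief sketch of these help tools together, the code below searches for commands by partial name and looks at cars, one of the data sets included with R. The help() call is wrapped in interactive() only so that the snippet also runs quietly in a non-interactive session.

```r
# Find commands whose names contain "rank"
matches <- apropos("rank")
print(matches)

# Open the help page for one of them (interactive sessions only)
if (interactive()) help("rank")

# cars is one of the data sets supplied with R
head(cars)
car_vars <- names(cars)
print(car_vars)
```

Typing data() at the prompt lists the data sets available, and example(mean) runs the examples from the help page for mean.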

