Dr. Mark Gardener

 


Statistics for Ecologists Using R and Excel (Edition 2)

Data Collection, Exploration, Analysis and Presentation

by: Mark Gardener

Available soon from Pelagic Publishing

Welcome to the support pages for Statistics for Ecologists. These pages provide information and support material for the book. You should be able to find an outline and table of contents as well as support datafiles and additional material.



 

Exercise 7.1.2



Section 7.1.2

Use the t.test() command for the t-test.

Input can be in several forms.

Get exercise data here:

ridge furrow.RData


7.1.2 Using R for the t-test, the t.test() command

This exercise is concerned with how to carry out the t-test (Chapter 7) using R (Section 7.1.2).

Introduction

The t-test is used to compare the means of two samples that have a normal (parametric or Gaussian) distribution. The t.test() command carries out the t-test in R. The default is to compute the Welch two-sample test (unequal variances).

You can have your data in several forms:

  • Two separate samples as two data vectors.
  • Two separate samples but in a single data.frame object (i.e. sample format).
  • A response variable and a predictor variable (i.e. scientific recording format).

In any event you can use the t.test() command to carry out the t-test. The example data for this exercise are the same as in the book and you can get the data in the three forms as an RData file: ridge furrow.RData.

Once you have the data you can type the commands shown here for yourself and so follow along.
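If you do not have the RData file to hand, the three forms can be rebuilt from the values printed on this page. This is only a sketch: the object names (ridge, furrow, rf2, rf1) match those used in the exercise, but the construction steps are an assumption and not necessarily how the original file was made.

```r
# Form 1: two separate numeric vectors
ridge  <- c(4, 3, 5, 6, 8, 6, 5, 7)
furrow <- c(9, 8, 10, 6, 7)

# Form 2: sample format, one column per sample
# (the shorter sample is padded with NA so the columns are the same length)
rf2 <- data.frame(
  Ridge  = ridge,
  Furrow = c(furrow, rep(NA, length(ridge) - length(furrow)))
)

# Form 3: recording format, one response column and one predictor column
rf1 <- data.frame(
  count = c(ridge, furrow),
  area  = rep(c("Ridge", "Furrow"), times = c(length(ridge), length(furrow)))
)

str(rf2)
str(rf1)
```

All three objects hold the same 13 observations; only the layout differs.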


If you have separate vector samples name them in the t.test() command:

t.test(x, y)

Default carries out Welch 2-sample test

Add var.equal = TRUE to assume equal variance.


Separate data objects

When you have two separate samples (probably as vector objects), you can just name them in the t.test() command.

> ridge ; furrow
[1] 4 3 5 6 8 6 5 7
[1]  9  8 10  6  7

> t.test(ridge, furrow)
 Welch Two Sample t-test
data:  ridge and furrow
t = -2.7584, df = 8.733, p-value = 0.02279
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-4.5598309 -0.4401691
sample estimates:
mean of x mean of y 
      5.5       8.0

The default carries out the Welch two-sample test (with modified degrees of freedom).
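To see where the modified degrees of freedom come from, the Welch calculation can be reproduced by hand. The sketch below uses the standard Welch-Satterthwaite approximation; the variable names (se2_ridge, t_welch and so on) are made up for illustration, and the data are typed in from the values shown above.

```r
ridge  <- c(4, 3, 5, 6, 8, 6, 5, 7)
furrow <- c(9, 8, 10, 6, 7)

# Squared standard error contributed by each sample: variance / n
se2_ridge  <- var(ridge)  / length(ridge)
se2_furrow <- var(furrow) / length(furrow)

# Welch t statistic: difference in means over the combined standard error
t_welch <- (mean(ridge) - mean(furrow)) / sqrt(se2_ridge + se2_furrow)

# Welch-Satterthwaite approximation for the degrees of freedom
df_welch <- (se2_ridge + se2_furrow)^2 /
  (se2_ridge^2  / (length(ridge)  - 1) +
   se2_furrow^2 / (length(furrow) - 1))

# Two-sided p-value from the t distribution
p_welch <- 2 * pt(-abs(t_welch), df = df_welch)

round(c(t = t_welch, df = df_welch, p = p_welch), 4)
```

The rounded values should agree with the t, df and p-value reported by t.test(ridge, furrow).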

To carry out a t-test with the assumption that the variances are equal, you need to add var.equal = TRUE.

> t.test(ridge, furrow, var.equal = TRUE)

The result gives a slightly different value for t, df and of course p-value.
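One way to compare the two versions side by side is to store each result and pull out the parts you need. The result of t.test() is a list whose elements include statistic, parameter (the degrees of freedom) and p.value. The object names welch, pooled and comparison are chosen here for illustration only.

```r
ridge  <- c(4, 3, 5, 6, 8, 6, 5, 7)
furrow <- c(9, 8, 10, 6, 7)

welch  <- t.test(ridge, furrow)                    # default: unequal variances
pooled <- t.test(ridge, furrow, var.equal = TRUE)  # classic pooled-variance test

# Collect the key values from both results in one small table
comparison <- data.frame(
  test    = c("Welch", "Equal variance"),
  t       = unname(c(welch$statistic, pooled$statistic)),
  df      = unname(c(welch$parameter, pooled$parameter)),
  p.value = c(welch$p.value, pooled$p.value)
)
print(comparison)
```

With equal variances assumed, the degrees of freedom are simply the two sample sizes minus two (8 + 5 - 2 = 11), whereas the Welch test gives a fractional value.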


If data samples are contained inside a data.frame you need to:

Use $ syntax
Use attach()
Use with()


Sample format

If your data are separate samples but contained within a data.frame, you'll need to alter your approach very slightly so that you can "get at" the variables in the data.frame.

There are three main ways:

  • Use $ syntax to specify the data.frame and sample name explicitly.
  • Use attach() to place the variables in the search path.
  • Use with() to open the data.frame temporarily.

Here are the example data:

> rf2
  Ridge Furrow
1     4      9
2     3      8
3     5     10
4     6      6
5     8      7
6     6     NA
7     5     NA
8     7     NA

Note that the shorter sample is padded with NA items.
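The NA padding matters for some commands but not for t.test(). The short sketch below (with rf2 typed in from the values above, and using the $ syntax from the list of approaches) shows that mean() returns NA unless told to drop missing values, while t.test() removes NA items from each sample before calculating.

```r
rf2 <- data.frame(
  Ridge  = c(4, 3, 5, 6, 8, 6, 5, 7),
  Furrow = c(9, 8, 10, 6, 7, NA, NA, NA)
)

# Missing values propagate through mean() by default
furrow_mean_raw   <- mean(rf2$Furrow)
# Dropping the NA items gives the mean of the five real observations
furrow_mean_clean <- mean(rf2$Furrow, na.rm = TRUE)

# t.test() discards NA items itself, so no extra argument is needed
res <- t.test(rf2$Ridge, rf2$Furrow)

furrow_mean_raw
furrow_mean_clean
res$estimate
```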


Use data$variable syntax to specify a variable explicitly from a data.frame


Use $ syntax

You can specify a variable by using the name of the enclosing object, a $ and the variable name:

> t.test(rf2$Ridge, rf2$Furrow)


Welch Two Sample t-test

data: rf2$Ridge and rf2$Furrow
t = -2.7584, df = 8.733, p-value = 0.02279

Note that R presents the name exactly as you typed it in the command.


Use attach() to open a data object and place its variables on the search() path.

Use detach() to close the object afterwards.

Any variables in the attached object that share a name with objects in the workspace are masked by the workspace versions whilst attach() is "active"


Use attach()

If you try to use a variable that is "inside" a data.frame you get an error:

> Ridge
Error: object 'Ridge' not found

One way around this is to use attach() to "open" the data.frame and allow the separate variables to be found in the search path. Once you have attached an object its contents appear when you use the search() command and can be used without needing the $ syntax.

Type search() to see the current search path (this is mine):

> search()
[1] ".GlobalEnv" "tools:RGUI" "package:stats"
[4] "package:graphics" "package:grDevices" "package:utils"
[7] "package:datasets" "package:methods" "Autoloads"
[10] "package:base"

Use attach() to open the data object you want:

> attach(rf2)

The rf2 object now appears in the search path:

> search()
[1] ".GlobalEnv" "rf2" "tools:RGUI"
[4] "package:stats" "package:graphics" "package:grDevices"
[7] "package:utils" "package:datasets" "package:methods"
[10] "Autoloads" "package:base"

Now you can use the variables within rf2 in your t-test:

> t.test(Ridge, Furrow)

Welch Two Sample t-test

data: Ridge and Furrow
t = -2.7584, df = 8.733, p-value = 0.02279

Note that R presents the names exactly as you typed them.

You should use detach() after you are done. This removes the item from the search() path. You can get confusion if you have data objects with the same name as those contained within data.frames.

> detach(rf2)

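The attach() command will not overwrite any data objects. If the data.frame contains items with the same names as objects already in your workspace, the workspace objects take precedence: the attached data.frame sits after .GlobalEnv in the search path, so R finds the workspace versions first and reports the attached ones as masked. The attached variables do mask any same-named objects further along the search path (e.g. in packages) until you use detach().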
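A name clash is easy to demonstrate. In this sketch the workspace vector Ridge is a made-up object created only to clash with the column of the same name; rf2 is typed in from the values shown earlier. Because an attached object is placed after .GlobalEnv in the search path, R finds the workspace vector first.

```r
rf2 <- data.frame(
  Ridge  = c(4, 3, 5, 6, 8, 6, 5, 7),
  Furrow = c(9, 8, 10, 6, 7, NA, NA, NA)
)

# Hypothetical workspace object with the same name as a column in rf2
Ridge <- c(100, 200)

attach(rf2)   # at the top level R reports that Ridge is masked by .GlobalEnv

found_ridge <- Ridge                       # the workspace vector, not the column
furrow_mean <- mean(Furrow, na.rm = TRUE)  # no clash, so the column is found

detach(rf2)   # always tidy up the search path afterwards

found_ridge
furrow_mean
rf2$Ridge     # the $ syntax reaches the column regardless of any clash
```

This is one reason the $ syntax or with() is often the safer choice when your workspace holds many objects.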


The with() command opens a data object temporarily and so allows the variables to be "seen" by R commands.


Use with()

The attach() command is useful but you do need to be careful to use detach() after you are done. An alternative approach is to use the with() command, which acts like attach() but only for the duration of one command line.

with(data.name, ...)

So, you give the command the name of the data object you want to "open", followed by the command you want to execute. In that command you can give the variable names as they are and don't need the $ syntax.

> with(rf2, t.test(Ridge, Furrow, var.equal = TRUE))

In the example the variance is considered equal.

So, you don't have to use detach() afterwards.
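As a quick check, the sketch below runs the test through with(), stores the result, and confirms that nothing has been added to the search path. The object name res is arbitrary, and rf2 is typed in from the values shown earlier.

```r
rf2 <- data.frame(
  Ridge  = c(4, 3, 5, 6, 8, 6, 5, 7),
  Furrow = c(9, 8, 10, 6, 7, NA, NA, NA)
)

# The variables are visible only while the t.test() call is evaluated
res <- with(rf2, t.test(Ridge, Furrow, var.equal = TRUE))

res$data.name              # the names as typed inside with()
res$parameter              # degrees of freedom for the equal-variance test
still_attached <- "rf2" %in% search()
still_attached             # FALSE: nothing to detach
```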


Use formula syntax y ~ x to specify
response ~ predictor
when you have data in scientific recording format


Recording format

If your data are in scientific recording format then you'll have the data in a different form from that shown previously (sample format). You will have response variables and predictor variables. For a t-test you will have one response and one predictor e.g.

> rf1
   count   area
1      4  Ridge
2      3  Ridge
3      5  Ridge
4      6  Ridge
5      8  Ridge
6      6  Ridge
7      5  Ridge
8      7  Ridge
9      9 Furrow
10     8 Furrow
11    10 Furrow
12     6 Furrow
13     7 Furrow

This layout is more flexible than sample format but you need a slightly different way to specify the variables in your t-test. Essentially you give a formula y ~ x where y is the response (count) and x is the predictor (area). You can also give the name of the enclosing data object.

> t.test(count ~ area, data = rf1)

   Welch Two Sample t-test
data:  count by area
t = 2.7584, df = 8.733, p-value = 0.02279    

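Note that R presents the names of the variables as they appear in the enclosing data object. The value of t is positive here because R orders the levels of the predictor alphabetically (Furrow before Ridge), so the difference is computed as Furrow minus Ridge; the size of t, the df and the p-value are unchanged.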

The formula syntax is very powerful and is used in many statistical and graphical commands. You can extend the formula for more complicated scenarios, such as analysis of variance and multiple regression.
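With the formula syntax the order of the groups comes from the levels of the predictor, which R sorts alphabetically by default. The sketch below (rf1 typed in from the values above; the result names default_order and ridge_first are arbitrary) shows how setting the levels explicitly changes the sign of t without changing anything else.

```r
rf1 <- data.frame(
  count = c(4, 3, 5, 6, 8, 6, 5, 7, 9, 8, 10, 6, 7),
  area  = rep(c("Ridge", "Furrow"), times = c(8, 5))
)

# Default: levels are sorted alphabetically (Furrow, Ridge), so the
# difference is Furrow minus Ridge and t is positive
default_order <- t.test(count ~ area, data = rf1)

# Put Ridge first to get Ridge minus Furrow, as in t.test(ridge, furrow)
rf1$area <- factor(rf1$area, levels = c("Ridge", "Furrow"))
ridge_first <- t.test(count ~ area, data = rf1)

c(default     = unname(default_order$statistic),
  ridge_first = unname(ridge_first$statistic))
```

The two statistics have the same size and opposite sign; the degrees of freedom and p-value are identical.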

