Dr. Mark Gardener


R: MonogRaphs

A series of essays on random topics using R: The Statistical Programming Language

R is a powerful and flexible beast. Getting started using R is not too difficult and you can learn to start using R in an afternoon. However, mastering R takes rather longer! These monographs are my way of exploring various topics in a completely unstructured manner.



The 3 Rs:

Reading data
Writing data
Doing maths (arithmetic)

The 3 Rs: Reading, wRiting and aRithmetic

Part 1. Reading

I remember reading about the 3 Rs as the most important topics in education: reading, writing and arithmetic. I thought it would be interesting to examine these in an R context, e.g. reading data, writing data and doing maths. This page is all about reading; parts 2 and 3 are about wRiting and aRithmetic respectively.

Index for Part 1.

Reading:

  • Using the keyboard
  • Using the clipboard
  • Data in a spreadsheet
  • Multiple paste
  • Rows of data
  • End of line characters
  • Converting separators

Reading from disk files:

  • Using scan() to read a disk file
  • Comments in data files
  • The working directory
  • Reading from URL
  • Reading multi-column files
  • Column headings
  • Row headings
  • read.xxxx() commands
  • Column formats

Reading R objects from disk:

  • Use the OS to open R objects
  • If R is not running
  • If R is already running
  • Get binary files with load()
  • Get text with dget() or source()
  • Un-named R objects as text
  • Named R objects as text
  • R script files

Reading other types of file:

  • Package foreign
  • Package xlsx

MonogRaphs Index | Part 2. wRiting


Reading:

Data
Objects
Scripts


Reading

You can probably think of "reading matter" as being one of three sorts:

  • data
  • R objects
  • scripts

Data is a general term and by this I mean numbers and characters that you might reasonably suppose could be handled by a spreadsheet. Your data are what you use in your analyses.

R objects may be data or other things, such as custom R commands. They are usually stored (on disk) in a format that can only be read by R but sometimes they may be in text form.

Scripts are collections of R commands that are designed to be run "automatically". They are generally saved (on disk) in text format.


Use the keyboard:

c()

Use commas to separate elements


Reading – using the keyboard

The simplest way to "read" data into R is to type it yourself at the keyboard, and the c() command is the simplest method. Think of c for "combine" or "concatenate". To make some data you type the values into the command, separating the elements with commas:

> dat1 <- c(2, 3,4,5, 7, 9)
> dat1
[1] 2 3 4 5 7 9

Notice that R ignores any spaces.


Use the keyboard:

c()

Enclose elements in "quotes" to make text (character) values

Items made using c() are usually vector objects (i.e. 1-D)


If you want text values you will have to enclose each element in quotes; you can use single or double quotes as long as they "match".

> dat2 <- c("First", 'Second', "Third")
> dat2
[1] "First" "Second" "Third"

When you "display" your text (character) data R includes the quotes (R always shows double quotes). You can use the c() command to join many items, not just numbers or character data. However, here you'll see it used simply for "reading" regular data. The result of the c() command is generally a vector, a 1-dimensional data object. You can add to existing data but if you "mix" data types you may not get what you expect:

> dat3 <- c(1, 2, 3, "First", "Second", "Third")
> dat3
[1] "1" "2" "3" "First" "Second" "Third"

R has converted the numbers to text (character) values.


Use the keyboard:

scan() can save typing commas to separate items

Character values can be typed without quotes by using the what = "character" parameter

The scan() command can read keyboard, clipboard or disk files

Specify: type of data, separator, decimal point character


Typing the commas and/or quotes can be a pain, especially when you have more than a few items to enter. This is where the scan() command comes in useful. The scan() command is quite flexible and can read the keyboard, clipboard and files on disk. The simplest way to use scan() is for entering numbers. You invoke the command and then type numerical values, separated by spaces. You can press the ENTER key at any time but the data entry will not stop until you press ENTER on a blank line (i.e. you press ENTER twice).

> dat4 <- scan()
1: 3 6 7 12 13 9 8 4
9: 7 7 8 12 5
14: 4
15:
Read 14 items
> dat4
[1] 3 6 7 12 13 9 8 4 7 7 8 12 5 4

In the preceding example eight values were typed and then ENTER was pressed. You can see that the R prompt changes from the > to a running count: this helps you to keep track of how many items you've typed. After the first eight items five more were typed (the line beginning with 9:) and ENTER pressed again. R displayed 14: to show that the next item would be the 14th. One value was typed and then ENTER pressed twice to complete the process.

If you want to type character data you must add the what = "character" parameter to the command, then R "knows" what to expect and you do not need to type quotes:

> dat5 <- scan(what = "character")
1: Mon Tue Wed Thu Fri
6: Sat Sun
8:
Read 7 items
> dat5
[1] "Mon" "Tue" "Wed" "Thu" "Fri" "Sat" "Sun"

You can specify the separator character using the sep = parameter, although the default space seems adequate! If you are so inclined you can also specify an alternative character for the decimal point via the dec = parameter.

> dat6 <- scan(sep = "", dec = ",")
1: 34 4,6
3: 1,4 7,34
5:
Read 4 items
> dat6
[1] 34.00 4.60 1.40 7.34

Notice that the default is actually two quotes with no space between! The sep and dec parameters are more useful for reading data from clipboard or disk files, as you will see next.


Use the clipboard:

Copy data column by column in a spreadsheet

Mac can also copy row data

Use scan() command to get data into R


Reading – using the clipboard

The scan() command can read the clipboard. This can be useful to shift larger quantities of data into R. You'll probably have data in a spreadsheet but any program that can display text will allow you to copy to the clipboard.

Data in a spreadsheet

If you open the data in a spreadsheet then you can copy a column of data and use scan() as if you were going to type it from the keyboard. You do not need to specify the separator character (the default "" will suffice) when using a spreadsheet but you can specify the decimal point character if necessary.

Excel & EOL
Opening a text file in Excel allows you to see the data separator

If you need to copy items by row then you cannot do it using a spreadsheet unless you have a Mac. To get a column of data into R you simply copy the data to the clipboard:

Copying data in Excel
Copying a column of data in Excel

Then switch to R and use the scan() command:

> dat7 <- scan()
1: 34
2: 32
3: 30
4: 36
5: 36
6: 36
7: 41
8: 35
9: 30
10: 34
11:
Read 10 items

To finish you press ENTER on a blank line; this means you could copy more data and append it if you wish.

Multiple paste

Once you've pasted some data into R using scan() you finish the operation by pressing ENTER on a blank line. Usually this means that you have to press ENTER twice. If you want to add more data you can simply press ENTER once and copy/paste more stuff.


Rows can be copied if in space or comma separated format

Mac OS can handle TAB separated data in rows


Rows of data

If the data are in a spreadsheet or are separated by TAB characters you cannot read rows of data (unless you have a Mac). If you open the data in a text editor (Notepad or Wordpad for example) then you'll be able to select and copy multiple rows of data as long as the items are not TAB separated. In the following example Notepad is used to open a comma separated file. Several rows are copied to the clipboard:

Copying rows in Notepad
A comma separated file can have its rows copied

The scan() command can now be used, using the sep = "," parameter as the data are comma separated:

> dat8 <- scan(sep = ",")
1: 34,35,33,26,35,35,35,32,38,29
11: 32,34,33,29,37,37,31,37,31,28
21: 30,38,34,33,33,32,29,35,29,30
31:
Read 30 items

In total 30 data elements were read (note that the data are read row by row). ENTER was pressed after the first paste operation but you could add more data, completing data entry by pressing ENTER on a blank line.


Different OS use different end of line (EOL) characters

Linux/Mac = LF
Windows = CR/LF
Old Mac = CR

Notepad cannot handle LF only EOL but Wordpad can

Notepad++ can alter EOL characters in Windows

TextWrangler can alter EOL characters in OSX


End of Line characters

Text files can be split into lines using several methods. In general Linux and Mac use LF whereas Windows uses CR/LF (older Mac used CR). This can mean that if you use Windows and Notepad to open a file the data are not displayed "neatly" and everything appears on one line.

Notepad and EOL
Notepad cannot deal with Unix/Linux EOL characters

Wordpad is built into Windows and will handle alternative line endings quite adequately.

Wordpad & EOL
Wordpad can deal with Unix EOL characters

The Notepad++ program will also handle alternative end of line characters and allows you to alter the EOL setting for a file. It also supports syntax highlighting.

On a Mac the default TextEdit app will deal with alternative line endings but TextWrangler is a useful editor, which also supports syntax highlighting.

If you use Linux then it is likely that your default text editor will deal with any EOL setting and also allow you to alter the encoding.
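Whatever editor you use for viewing, R itself copes with any of these line endings when reading from disk. A minimal sketch (using temporary files to stand in for real data files) shows scan() giving the same result for LF and CRLF versions of the same data:

```r
# Sketch: R's scan() reads LF and CRLF files identically.
# Temporary files stand in for real data files.
lf_file   <- tempfile(fileext = ".txt")
crlf_file <- tempfile(fileext = ".txt")

writeBin(charToRaw("34 32 30\n36 36 36\n"),     lf_file)    # Unix/Mac style EOL
writeBin(charToRaw("34 32 30\r\n36 36 36\r\n"), crlf_file)  # Windows style EOL

dat_lf   <- scan(lf_file,   quiet = TRUE)
dat_crlf <- scan(crlf_file, quiet = TRUE)

identical(dat_lf, dat_crlf)  # TRUE - the EOL style makes no difference
```

So the EOL character only really matters when you are viewing or editing the file outside of R.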


To convert data separator characters open the file in a spreadsheet and use File > Save As... to alter the format


Converting separators

If you have data in TAB separated format and want to convert it to simple space separated or comma separated format then the easiest way is to open the data in your spreadsheet. Then use File > Save As... to save the file in a new format.

Save As options in Excel
Excel has various options for saving files

The preceding example shows Excel 2010 but other versions of Excel have similar options. OpenOffice and LibreOffice also allow saving of files in various formats.

The TAB separated format is useful for displaying items for people to read but not so good for copy/paste. Comma separated files are generally most useful and can be produced by many programs. Space separated files are somewhere in-between.

Of course there is no need to convert TAB separated files; they are only a problem if you are going to use copy/paste. R can handle TAB separators quite easily if the file is to be read directly from disk.
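You can also do the conversion in R itself, without a spreadsheet: read the TAB separated file with read.delim() and write it back out with write.csv(). A minimal sketch, using a temporary file in place of a real data file:

```r
# Sketch: convert a TAB separated file to CSV using R itself.
# Temporary files stand in for real data files.
tab_file <- tempfile(fileext = ".txt")
csv_file <- tempfile(fileext = ".csv")

writeLines(c("a\tb", "34\t35", "32\t34"), tab_file)  # make a small TAB file

dat <- read.delim(tab_file)                  # read.delim() expects TAB separators
write.csv(dat, csv_file, row.names = FALSE)  # write the data out in CSV format

read.csv(csv_file)  # the same data, now comma separated
```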


Reading: from disk

The scan() command makes vector objects

The read.table() command makes data.frame objects


Reading – from disk files

If you have a data file on disk you can read it into R without opening it first. There are two main options:

  • the scan() command
  • the read.table() command

Generally speaking you'd use scan() to read a data file and make a vector object in R. That is, you use it to get/make a single sample (a 1-D object). You use the read.table() command to access a file with multiple columns, each one representing a separate sample or variable. The result is a data.frame object in R (a 2-D object with rows and columns).

The read.table() command has many options and there are several convenience functions, such as read.csv(), allowing you to read some popular formats with appropriate defaults.
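The difference is easy to see with a small sketch (a temporary file stands in for a real data file): scan() flattens the file into a single vector, while read.table() keeps the rows and columns:

```r
# Sketch: the same file read two ways.
tmp <- tempfile(fileext = ".txt")
writeLines(c("34 35", "32 34"), tmp)  # two rows, two columns

v <- scan(tmp, quiet = TRUE)  # a single vector of 4 items
d <- read.table(tmp)          # a data.frame with 2 rows and 2 columns

length(v)  # 4
dim(d)     # 2 2
```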


Reading from disk:

The scan() command can read disk files

Use file = file.choose() to open an interactive file chooser in Windows or Mac


Use scan() to read a disk file

To get scan() to read a file from disk you simply specify the filename (in quotes) in the command. You can set the separator character and decimal point character as appropriate using sep = and dec = parameters. If you are using Windows or Mac you can use file.choose() instead of the filename:

> dat8 <- scan(file.choose())
Read 100 items

The file.choose() part opens a browser-like window and permits you to select the file you require.

File choose in Windows
Interactive file selection in R (via Windows)

The file selected in the preceding example was called L1.txt and you can view this in your browser. It is simply separated with spaces. The result is a vector of 100 data items even though the original file seems composed of ten columns.

If you have a file separated with commas, like L2.txt, then you simply specify sep = "," like so:

> dat9 <- scan(file.choose(), sep = ",")
Read 100 items

The result is the same, a vector of 100 items. When the separator character is a TAB (e.g. L3.txt) you can use the sep = "\t" parameter.

> dat10 <- scan(file.choose(), sep = "\t")
Read 100 items

Sometimes your data may include comments; these are helpful to remind you what the data were, but you need to be able to deal with them.


Use the comment.char parameter to "ignore" comments in data files
Comments in data files

It is helpful to include comments in plain data files. This helps you (and others) to determine what the data represent without having to look elsewhere. The scan() command can "read and ignore" rows that are "flagged" as comments.

You can set rows that begin with a special character to be ignored using the comment.char = parameter. Of course you could use any character as a comment marker but in R the # is used so that would seem sensible.

> dat11 <- scan(file.choose(), comment.char = "#")
Read 100 items

Have a look at the L1c.txt file. This has three comment lines at the start. The file itself is space separated.


R uses a default working directory

Use getwd() to see the current directory

Use setwd() to set a new working directory

Filepaths in quotes

Use "~" to reset working directory

In Windows choose.dir() allows interactive directory selection


Working directory

If you do not use file.choose() or you are using Linux then you'll have to specify the filename exactly (in quotes). R uses a working directory, where it stores files and where it looks for items to read. You can see the current working directory using getwd():

> getwd() # on a Mac
[1] "/Users/markgardener"
> getwd() # in Windows
[1] "C:/Users/Mark/Documents"

You can set the working directory to a new folder by specifying the new filepath (in quotes) in the setwd() command:

> getwd() # Get the current wd
[1] "/Users/markgardener"
> setwd("Desktop") # Set to new value
> getwd() # Check that wd is set as expected
[1] "/Users/markgardener/Desktop"
> setwd("/Users/markgardener") # set back to original
> getwd()
[1] "/Users/markgardener"

You can "reset" your working directory to your home folder using the tilde:

> setwd("~")
> getwd()
[1] "/Users/markgardener"

If you use Windows you can use choose.dir() in the setwd() command to interactively set a working directory like so:

> setwd(choose.dir())

Interactive directory selection
In Windows you can choose a new working directory interactively


Read files direct from a URL by specifying the URL instead of the filename (in quotes)


Reading from URL

You can use scan() to read a file directly from the Internet if you have the URL: use that instead of the filename. The URL needs to be complete and in quotes. Note that you cannot use spaces in the URL; if you copy/paste the URL from the website any spaces will be shown as %20:

> dat12 <- scan(file = "http://www.mysite/data%20files/test.txt")
Read 100 items

You can use all the other parameters that you've seen before. Have a go yourself using these files:

  • L1.txt - a space separated file
  • L1c.txt - a space separated file with comments
  • L2.txt - a comma separated file
  • L3.txt - a TAB separated file

If you right-click a hyperlink you can copy the URL to the clipboard.


The read.table() command reads multi-column files and makes data.frame objects as a result

By default column headings are not used

The default separator is sep = "" (white space); note that a space is not needed between the quotes!


Reading multi-column files

The scan() command reads files and stores the result as a vector. However, many data files will contain multiple columns that represent different variables. In this case you don't want a single vector item but a data.frame, which reflects the rows and columns of the original data.

The read.table() command is a general tool for reading files from disk (or URL) and making data.frame results. You can set decimal point characters and separators in a similar way to scan(). However, you can also set names for columns and/or rows if required.

There are several convenience commands to help in reading "popular" formats:

  • read.csv() - this reads comma separated files
  • read.csv2() - uses semi-colon as the separator and the comma as a decimal point
  • read.delim() - uses TAB as a separator
  • read.delim2() - uses TAB as a separator and comma as decimal point

The read.csv() command is probably the most generally useful command as CSV format is the most widely used and can be saved by most programs.

The files L1.txt, L2.txt, L3.txt and L1c.txt are all ten columns (and ten rows) in size. Previously you saw how the scan() command can read these files and create a single sample, as a vector object, from the data. By contrast, the read.table() command treats each column as a separate variable:

> dat12 <- read.table(file = "L1.txt")
> dat12
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 34 35 33 26 35 35 35 32 38 29
2 32 34 33 29 37 37 31 37 31 28
3 30 38 34 33 33 32 29 35 29 30
4 36 32 31 33 35 35 31 34 29 29
5 36 34 37 28 27 32 28 32 36 32
6 36 31 33 38 33 32 36 33 34 31
7 41 34 30 27 34 34 29 27 37 31
8 35 34 27 35 28 31 31 36 32 32
9 30 33 36 34 29 33 28 34 37 34
10 34 33 31 30 36 26 29 35 34 35

In the preceding example the L1.txt file was used; this is space separated so the default separator need not be specified. Notice how R has "invented" names for the columns (the original file does not have column names). If the data are comma separated you can use the sep = "," parameter:

> dat13 <- read.table(file = "L2.txt", sep = ",")

Any comment lines are dealt with by the comment.char parameter:

> dat14 <- read.table(file = "L1c.txt", sep = "", comment.char = "#")

The read.table() command assumes that there are no column names in the original data. You can assign names as part of the read.table() command using the col.names parameter. You need to specify exactly the correct number of names.

> cn <- paste("X", 1:10, sep = "") # Make some names
> cn
[1] "X1" "X2" "X3" "X4" "X5" "X6" "X7" "X8" "X9" "X10"
> dat15 <- read.table(file = "L1.txt", col.names = cn)
> head(dat15, n = 3)
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 34 35 33 26 35 35 35 32 38 29
2 32 34 33 29 37 37 31 37 31 28
3 30 38 34 33 33 32 29 35 29 30

You can always alter the names afterwards if you prefer.
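For example, using the names() command on the result. A small sketch (with a made-up data.frame standing in for one read from disk):

```r
# Sketch: renaming columns after reading.
dat <- data.frame(V1 = c(34, 32), V2 = c(35, 34))  # default-style column names

names(dat) <- c("Smpl1", "Smpl2")  # replace all the names at once
names(dat)[1] <- "Sample1"         # or alter just one of them

names(dat)  # "Sample1" "Smpl2"
```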


Use header = TRUE to read column headings from a file with read.table()

Use sep = "\t" for TAB separated data


Column headings

In most cases your data will contain column headings. You can read these column headings using header = TRUE in the read.table() command.

> dat16 <- read.table(file = "L6.txt", header = TRUE)
> dat16
Smpl1 Smpl2 Smpl3 Smpl4 Smpl5 Smpl6 Smpl7 Smpl8 Smpl9 Smpl10
1 34 35 33 26 35 35 35 32 38 29
2 32 34 33 29 37 37 31 37 31 28
3 30 38 34 33 33 32 29 35 29 30
4 36 32 31 33 35 35 31 34 29 29
5 36 34 37 28 27 32 28 32 36 32
6 36 31 33 38 33 32 36 33 34 31
7 41 34 30 27 34 34 29 27 37 31
8 35 34 27 35 28 31 31 36 32 32
9 30 33 36 34 29 33 28 34 37 34
10 34 33 31 30 36 26 29 35 34 35

The file L6.txt is space separated. If you have TAB or comma separated you simply alter the sep parameter as appropriate. Try the following files for yourself:

  • L4.txt - a comma separated file
  • L5.txt - a TAB separated file
  • L6.txt - a space separated file

Use "\t" to represent a TAB character:

> dat17 <- read.table(file = "L5.txt", header = TRUE, sep = "\t")

Note that R will check to see that the column names are "valid", and will alter the names as necessary. You can over-ride this process using the check.names = FALSE parameter. This can be useful to allow numbers to be used as headings (such as years, e.g. 2001, 2002). However, it is a good idea to check the original datafile first.
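A sketch of the difference, using an inline temporary file to stand in for a real file with years as headings:

```r
# Sketch: check.names and numeric column headings.
tmp <- tempfile(fileext = ".csv")
writeLines(c("2001,2002", "10,12", "11,14"), tmp)

checked <- read.csv(tmp)                       # names made "valid": X2001, X2002
raw     <- read.csv(tmp, check.names = FALSE)  # names kept as-is: 2001, 2002

names(checked)  # "X2001" "X2002"
names(raw)      # "2001"  "2002"
```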

Sometimes your data will have row headings; if so you will need an additional parameter, row.names.


Use row.names = n to read column n as row headings for your data with read.table()


Row headings

If your data contains row headings you can use the row.names parameter to instruct the read.table() command which column contains the headings (usually the first).

> dat18 <- read.table(file = "L7.txt", sep = "", header = TRUE, row.names = 1)
> dat18
Smpl1 Smpl2 Smpl3 Smpl4 Smpl5 Smpl6 Smpl7 Smpl8 Smpl9 Smpl10
Obs1 34 35 33 26 35 35 35 32 38 29
Obs2 32 34 33 29 37 37 31 37 31 28
Obs3 30 38 34 33 33 32 29 35 29 30
Obs4 36 32 31 33 35 35 31 34 29 29
Obs5 36 34 37 28 27 32 28 32 36 32
Obs6 36 31 33 38 33 32 36 33 34 31
Obs7 41 34 30 27 34 34 29 27 37 31
Obs8 35 34 27 35 28 31 31 36 32 32
Obs9 30 33 36 34 29 33 28 34 37 34
Obs10 34 33 31 30 36 26 29 35 34 35

Notice that the default is sep = ""; this is used for space separated files but you do not need to put a space between the quotes! The row.names parameter needs a single numerical value to identify the column that contains the headings. If the separator character is different you simply change the sep parameter. Try the following files for yourself:

  • L7.txt - a space separated file
  • L8.txt - a comma separated file
  • L9.txt - a TAB separated file

All these files contain both row and column names.

Usually you give a column number in the row.names parameter. However, there are several ways to specify the row headings:

  • A number corresponding to the column containing the names
  • The name of the column header (in quotes) containing the names
  • A vector of names to use as row headings/names

If the data are simple space delimited and contain column headings for all columns except the first (so the heading row is one element shorter than the data rows) then the first column will be treated as a column of row headings (that is, you do not need to specify row.names). In practice it is best to use a definite value for row.names. You can "force" simple numbered rows by setting row.names = NULL.
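A sketch of the alternative forms, using a small inline file in place of the L7.txt example:

```r
# Sketch: ways to specify row.names (inline file for illustration).
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,Smpl1,Smpl2", "Obs1,34,35", "Obs2,32,34"), tmp)

d1 <- read.csv(tmp, row.names = 1)            # by column number
d2 <- read.csv(tmp, row.names = "id")         # by column header name
d3 <- read.csv(tmp, row.names = NULL)         # force simple numbered rows
d4 <- read.csv(tmp, row.names = c("A", "B"))  # supply your own vector of names

rownames(d1)  # "Obs1" "Obs2"
rownames(d3)  # "1" "2" - the id column is kept as ordinary data
```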


The read.csv() command uses defaults likely for CSV files

The read.delim() command uses defaults for TAB separated files


The read.xxxx() convenience commands

There are four convenience functions, designed to make things a bit easier when dealing with some common formats.

  • read.csv() - this reads comma separated files
  • read.csv2() - uses semi-colon as the separator and the comma as a decimal point
  • read.delim() - uses TAB as a separator
  • read.delim2() - uses TAB as a separator and comma as decimal point

All these commands have header = TRUE and comment.char = "" set as default. This makes things simpler; the following commands all achieve the same result:

> read.csv("L8.txt", row.names = 1)
> read.table("L8.txt", sep = ",", header = TRUE, row.names = 1)
> read.delim2("L8.txt", sep = ",", header = TRUE, row.names = 1, dec = ".")

You can use any of the read.xxxx() commands as long as you over-ride the defaults. See the help entry for read.table() by typing help(read.table) in R.


The read.table() command "interprets" text columns as factor variables


Column formats

When you read a multi-column file into R the various columns are "inspected" and converted to an appropriate type. Generally any character columns are converted to factor variables whilst numbers are kept as numbers. This is usually an acceptable process, since in most analytical situations you want character values to be treated as variable factors.

However, there may be times when you want to alter the default behaviour and read columns in a particular format. Here is a data file with several columns:

     count treat day int
obs1    23    hi mon 1
obs2    17    hi tue 1
obs3    16    hi wed 1
obs4    14    lo mon 2
obs5    11    lo tue 2
obs6    19    lo wed 2

The first column can act as row headers/names. The "count" column is fairly obviously a numeric value. The "treat" column is an experimental factor whilst "day" is simply a label (text). The "int" column is intended to be a treatment label. It is common in some branches of science to use plain numerical values as treatment labels.

When you read this file into R the columns are "interpreted" like so:

> df1 <- read.csv("L11.txt", row.names = 1)
> df1
count treat day int
obs1 23 hi mon 1
obs2 17 hi tue 1
obs3 16 hi wed 1
obs4 14 lo mon 2
obs5 11 lo tue 2
obs6 19 lo wed 2
> str(df1) # Examine object structure
'data.frame': 6 obs. of 4 variables:
$ count: int 23 17 16 14 11 19
$ treat: Factor w/ 2 levels "hi","lo": 1 1 1 2 2 2
$ day : Factor w/ 3 levels "mon","tue","wed": 1 2 3 1 2 3
$ int : int 1 1 1 2 2 2

When the L11.txt file is read in, the "day" column has been converted to a factor variable while the "int" column remains a number rather than becoming a factor.


Control column class in read.xxxx() using:

  • stringsAsFactors
  • as.is
  • colClasses


You can control the interpretation in several ways using parameters in the read.xxxx() commands:

  • stringsAsFactors - set this to FALSE to keep character columns as character variables
  • as.is - give the column numbers to keep "as is" and not convert
  • colClasses - give a vector of character strings corresponding to the classes required

The stringsAsFactors parameter is the simplest and is over-ridden if one of the others is used. It provides "blanket coverage" and operates on the entire data file being read:

> df2 <- read.csv("L11.txt", row.names = 1, stringsAsFactors = FALSE)

> str(df2)
'data.frame': 6 obs. of 4 variables:
$ count: int 23 17 16 14 11 19
$ treat: chr "hi" "hi" "hi" "lo" ...
$ day : chr "mon" "tue" "wed" "mon" ...
$ int : int 1 1 1 2 2 2

The as.is parameter operates only on the columns you specify – don't forget that the column containing row names is column 1:

> df3 <- read.csv("L11.txt", row.names = 1, as.is = 4)
> str(df3)
'data.frame': 6 obs. of 4 variables:
$ count: int 23 17 16 14 11 19
$ treat: Factor w/ 2 levels "hi","lo": 1 1 1 2 2 2
$ day : chr "mon" "tue" "wed" "mon" ...
$ int : int 1 1 1 2 2 2

All other parameters work as they did before so if you have a space or TAB separated file for example you can use the sep parameter as necessary. Try these files if you want some practice:

  • L10.txt - a TAB separated file with row names
  • L11.txt - a comma separated file with row names
  • L12.txt - a space separated file with row names

Using the colClasses parameter gives you the finest control over the columns. You must specify (as a character vector) the required class for each column. In the example here you would want "count" to be an integer (or simply numeric), "treat" to be a factor, "day" to be character and "int" to be a factor (remember we said this represented an experimental treatment). Don't forget the column of row names – use character for these:

## Define the column classes
> cc <- c("character", "integer", "factor", "character", "factor")
## Read the file and set column classes
> df4 <- read.csv("L11.txt", row.names = 1, colClasses = cc)
> str(df4)
'data.frame': 6 obs. of 4 variables:
$ count: int 23 17 16 14 11 19
$ treat: Factor w/ 2 levels "hi","lo": 1 1 1 2 2 2
$ day : chr "mon" "tue" "wed" "mon" ...
$ int : Factor w/ 2 levels "1","2": 1 1 1 2 2 2

You can of course convert the columns of the resulting data.frame afterwards but it makes sense to set the classes right at the start.
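Converting afterwards looks like this (a sketch on a small made-up data.frame matching the example data):

```r
# Sketch: altering a column's class after reading.
df <- data.frame(count = c(23, 17, 16), int = c(1, 1, 2))

df$int <- factor(df$int)  # treat the numeric treatment labels as a factor

class(df$int)   # "factor"
levels(df$int)  # "1" "2"
```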


View R objects in the workspace using ls() or objects() commands

.RData files are binary and readable only by R

R objects can be stored as plain text files, often with .R extension


Reading R objects from disk

R objects can be stored on disk in one of two formats:

  • Binary - usually .RData files that are encoded and only readable by R
  • Text - plain text files that can be read by text editors (and R)

Generally speaking an R object is something you see when you use the ls() command; the objects() command does the same thing and the two are synonymous. Any R object can be saved to disk.

Usually text files will be "raw data" files and scripts (collections of R commands that are executed when the script is run). However, it is possible to have text versions of other R objects.

Binary files are generally used to keep custom commands, data and results objects. Often you'll have several objects bundled together.

There are several ways to get these objects into R.


Open R files from the OS by double clicking the data file icon


Use the OS to open R objects

In Windows and Mac you can usually open .RData and .R files directly from the operating system. Usually .RData files are associated with R so if you double click a file it will open R and load the objects it contains. There are two scenarios:

  • R is already running
  • R is not running

You get a different result in each case.


R data files .RData can be opened by double clicking in Windows or Mac

If R is closed then data opens and working directory set to folder containing .RData file

R text files .R only open in text editors from the OS


If R is not running

If R is not running and you double click a .RData file, R will open and read the objects contained in the file. In addition, the working directory will be set to the folder containing the .RData file.

Windows knows R files
Windows usually knows what files contain R data objects

So, if you double click the file or drag its icon to the R program icon (shortcut on desktop, taskbar or dock) then R will open. You can check the working directory, which should point to the folder containing the file you opened:

> getwd()
[1] "Y:/Dropbox/Source data files/R Source files"

In Windows if you drag the icon to the R program icon you can "pin" it to the taskbar; this allows you to open a data file later with a right click of the taskbar icon:

Pin a datafile to the taskbar
In Windows you can pin a data file to the taskbar

Files that are stored in text format, usually with the .R extension, are handled differently. Since they are plain text the default program may well be a text editor, depending on what is installed on your computer. You can right click a file or (in Windows) click the Open button on the menu bar to see what programs are available.

Windows file opening options
Right clicking a file can show what options are available to open it

In Windows you will probably have to open the file with a text editor of some kind. If you have the RStudio IDE you can choose to open the file in this; RStudio will run and your file will appear in the script window. Note that if R has saved a text representation of objects the file will have Unix-style EOL characters (LF), and Notepad will not display the data properly. Wordpad or Notepad++ are fine though.

An R text file in Wordpad
Wordpad opens R text files

In Mac you may have the file associated with a text editor or with R. If you open the file using R then the text appears in the script editor window.

If you use Linux you cannot open .RData files from the OS. R text files can be opened in a text editor. If you use RStudio IDE then .RData or .R files can be opened (in RStudio) from the OS.

So, .RData objects can be read into R directly via the OS but text representations of R objects (.R files) cannot.


If R is already running then drag the file icon to the console to append data to the workspace

The working directory is unchanged


If R is already running

If R is already running when you try to open an .RData file by double clicking, what happens depends on the OS.

  • In Windows R opens in a new window
  • In Mac R appends the data to the current workspace

In Windows you get another R window, the data in the .RData file is loaded and the working directory set to the folder containing the file, just as if R had not been open.

In Mac the data are appended to the current workspace and the working directory remains unchanged.

If you are using Windows and want to append the data to your current workspace then you must drag the file icon to the running R program window and drop it in. If you drag to the taskbar and wait a second the program window will open and you can drop the file into the console.

Something similar happens on a Mac: you can drag a file to the dock icon or into the running R console window.


Get binary .RData files using load() command

Windows and Mac GUI have menu

Data is appended to workspace

Items with same name are overwritten

Working directory is unaffected

Top

Get binary files with load() command

You can load .RData files directly from within R using the load() command. You need to supply the name of the file (in quotes), although in Windows or Mac you can use file.choose() instead. The filename must include the full path if the file is not in your working directory.

> ls()
character(0)
> load(file = "Obj1.RData")
> ls()
[1] "data_item_1" "data_item_2"

In Windows and Mac you can also use a menu:

  • Mac: Workspace > Load Workspace File...
  • Win: File > Load Workspace...

Load workspace file in Windows
Load a data file in Windows

The data are appended to anything in the current workspace and any items that are named the same as incoming items are overwritten without warning. The working directory is unaffected.

If you want a practice file try the Obj1.RData file. You'll have to download it, although you could try loading the file using the URL.
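If you do not have a practice file to hand, here is a minimal self-contained sketch of the save()/load() round trip; it uses a temporary file and example object names (dat_a, dat_b) so it runs anywhere:

```r
# Minimal sketch: save two objects to a temporary .RData file,
# remove them, then restore both with a single load() call.
dat_a <- c(2, 3, 4, 5, 7, 9)
dat_b <- c("First", "Second", "Third")
tmp <- tempfile(fileext = ".RData")
save(dat_a, dat_b, file = tmp)   # write both objects to disk
rm(dat_a, dat_b)                 # clear them from the workspace
load(file = tmp)                 # both come back under their own names
ls(pattern = "^dat_")
```

Note that load() restores the objects under the names they were saved with; you do not assign the result to anything.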


R objects as text can be named or unnamed

Use dget() or source()

Top

Get text files with dget() or source() command

R can make text representations of objects. Generally there are two forms:

  • Un-named objects
  • Named objects

You'll need to use the dget() or source() commands to read these text files into R.


Use dget() to get a text object into R

Top

Un-named R objects as text

Un-named objects are created using the dput() command and are text representations of single objects. The text can look slightly odd, see the D2.R example:

structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("high", "low"), class = "factor")

To read one of these objects into R you need the dget() command. Since the object is un-named in the text file you'll need to assign a name in the workspace:

> ls() # Check what is in workspace
character(0)
> fac1 = dget(file = "D2.R") # Get object
> fac1
[1] high high high low low low
Levels: high low

Note that there is no GUI menu command that will read this kind of text object; you have to use dget(). Other text representations look more like regular R commands (i.e. the objects are named), and you need a different approach for these.
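The whole dput()/dget() round trip can be sketched in a few self-contained lines (using a temporary file rather than the D2.R example):

```r
# Write a factor out with dput() and read it back with dget().
fac <- factor(c("high", "high", "high", "low", "low", "low"))
tmp <- tempfile(fileext = ".R")
dput(fac, file = tmp)      # text representation of a single, un-named object
fac1 <- dget(file = tmp)   # you assign the name on the way back in
identical(fac, fac1)       # TRUE: levels and class survive the round trip
```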


Use source() to get named text objects into R

Windows and Mac GUI have menu

Top

Named R objects as text

Some R text objects look more like regular R commands. These can be saved to disk using the dump() command, which can save more than one object at a time (unlike the dput() command). An example is the D1.R file:

dat1 <- c(2, 3, 4, 5, 7, 9)
dat2 <- c("First", "Second", "Third")

The D1.R file was created using the dump() command to save two objects (dat1 and dat2).

To get these into R you need the source() command:

> ls() # Check what is in workspace
character(0)
> source(file = "D1.R") # Get the objects
> ls()
[1] "dat1" "dat2"
> dat1 ; dat2
[1] 2 3 4 5 7 9
[1] "First" "Second" "Third"

In the preceding example the file was named explicitly, but in Windows or Mac you can use file.choose() instead (also in Linux if you are running the RStudio IDE). Notice that you do not need to name the objects; the data file already has the names assigned.
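As a self-contained sketch of the dump()/source() round trip (using a temporary file in place of D1.R):

```r
# dump() takes a character vector of object *names* and writes
# assignment-style R code; source() runs that code to recreate them.
dat1 <- c(2, 3, 4, 5, 7, 9)
dat2 <- c("First", "Second", "Third")
tmp <- tempfile(fileext = ".R")
dump(c("dat1", "dat2"), file = tmp)
rm(dat1, dat2)
source(file = tmp)   # dat1 and dat2 reappear, already named
```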

It is possible to use menu commands in Windows or Mac to get this kind of file into R:

  • Win: File > Source R code...
  • Mac: File > Source File...

This kind of text file is more like a series of R commands, which would generally be called an R script. Such script files are very useful, as you can use them for many purposes, such as making a custom calculation that can be run whenever you want.


Use source() command to run R script files

Mac GUI script editor has syntax highlighting

Top

R script files

R script files are simply a series of R commands saved in a plain text file. When you use the source() command, the commands are executed as if you had typed them into the R console directly.

Generally speaking you'll want to "look over" a script before you run it. This usually means opening the file in a text editor. Some text editors "know" R syntax and can highlight the text accordingly, making the script easier to follow. Both the Windows and Mac R GUIs have a script editor, but only the Mac version has syntax highlighting.

To open a script file in the R GUI use the File menu:

  • Win: File > Open script...
  • Mac: File > Open Document...

You can open a script in any text editor of course. In Windows the Notepad++ program is a simple editor with syntax highlighting. On the Mac the TextWrangler program is highly recommended. On Linux there are many options; the default text editor will often support syntax highlighting, and Geany is one IDE that not only has syntax highlighting but also integrates with the terminal.

The RStudio IDE is very capable and makes a good platform for using R on any OS; its script editor has syntax highlighting.
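To see source() running a script, here is a self-contained sketch that writes a tiny two-line script to a temporary file and then runs it (the script contents are just an example):

```r
# Write a two-line script to a temporary file, then run it with source().
tmp <- tempfile(fileext = ".R")
writeLines(c(
  "x <- 1:10",
  "cat('Mean of x:', mean(x), '\\n')"
), con = tmp)
source(tmp)   # runs the commands as if typed at the console
              # prints: Mean of x: 5.5
```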


Package foreign allows reading of some other file types e.g. SPSS, Systat, MiniTab, dBase

Package xlsx allows reading of Excel files

Use colClasses parameter in read.xlsx() to force column class(es)

Top

Reading other types of file

R cannot read other file types "out of the box", but there are add-on packages that can. Two commonly used ones are foreign and xlsx.

Package foreign

The foreign package can read files created by several programs including (but not limited to):

  • SPSS – files made by save or export can be read
  • dBase – .dbf files, which are also used by other programs in the XBase family
  • MiniTab – .mtp MiniTab portable worksheets
  • Stata – .dta files up to version 12
  • Systat – .sys or .syd files
  • SAS – XPORT format files (there is also a command to convert SAS files to XPORT format)

There are various limitations, and the package has not been updated for a while. In general it is best to save data in CSV format, which most programs can do. If you are sent a file in an odd format you can ask the sender to convert it, or try the package and see what you can do. The index of the package help should give you a fair indication of what is available.
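As a self-contained sketch (foreign is a "recommended" package, so it normally ships with R), here is a dBase round trip using write.dbf() and read.dbf(); reading files from the other programs follows the same read.xxx() pattern:

```r
library(foreign)  # recommended package, usually installed with R

# Round-trip a small data frame through a .dbf (dBase) file.
df <- data.frame(site = c("A", "B", "C"), count = c(12, 7, 30))
tmp <- tempfile(fileext = ".dbf")
write.dbf(df, tmp)
df2 <- read.dbf(tmp)   # character fields may come back as factors
head(df2)
```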

Package xlsx

The xlsx package reads Excel files. The main command is read.xlsx(), which has various options for Excel-format files. There is also a read.xlsx2() command; this is more or less the same but uses Java more heavily, and is probably better for large spreadsheets.

The essentials of the command are:

read.xlsx(file, sheetIndex, sheetName = NULL)

You provide the filename (file) and either a number corresponding to a worksheet (sheetIndex) or the name of the worksheet (sheetName = "worksheet_name"). You can use file.choose() to select the file interactively (except on Linux).

There is also a parameter colClasses, which you can use to force various columns into the class you require. You give a character vector containing the classes you want – they are recycled as necessary. In general you should try a regular read and see what classes result. If there are some "oddities" you can either alter them in the imported object or re-import and set the classes explicitly.
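A hypothetical sketch of using colClasses (the file name "survey.xlsx" and sheet name "Data" are assumptions, not real files; the xlsx package and a working Java installation are required):

```r
library(xlsx)

# Force the first column to character and the second to numeric;
# the vector is recycled across any remaining columns.
dat <- read.xlsx("survey.xlsx", sheetName = "Data",
                 colClasses = c("character", "numeric"))
str(dat)   # check the resulting column classes
```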


  MonogRaphs Index | Part 2. wRiting

See my Publications about statistics and data analysis.

Writer's Bloc – my latest writing project includes R scripts

Courses in data analysis, data management and statistics.
