Dr. Mark Gardener 





R: MonogRaphs
A series of essays on random topics using R: The Statistical Programming Language

R is a powerful and flexible beast. Getting started using R is not too difficult and you can learn to start using R in an afternoon. However, mastering R takes rather longer! These monographs are my way of exploring various topics in a completely unstructured manner.

Use the stem() command to make a stem-leaf plot to visualise data distribution

Dot charts as an alternative to the histogram

Recently I saw a message in a forum asking about the difference between dot plots and histograms. This got me thinking and so I decided to work out how to make R produce a dot plot from scratch.

stem-leaf | frequency tables | bar charts | histograms | towards a dot histogram | the script

A histogram is a way of showing the frequency of your numeric data in a visual manner. The histogram looks more or less like a bar chart except that the bars are touching: the x-axis is a continuous scale rather than being discrete categories. Look at the following data:

> mydata = c(6, 7, 8, 7, 6, 3, 8, 9, 10, 7, 6, 9)

Stem-leaf plot

You can visualise the distribution using a stem-leaf plot:

> stem(mydata)

  The decimal point is at the |

   2 | 0
   4 |
   6 | 000000
   8 | 0000
  10 | 0

The stem() command does not give much flexibility when it comes to the bins separating the data categories but you can use the scale = n instruction. The default is 1 so making the value larger will increase the number of bin categories:

> stem(mydata, scale = 2)

  The decimal point is at the |

   3 | 0
   4 |
   5 |
   6 | 000
   7 | 000
   8 | 00
   9 | 00
  10 | 0

Making the scale smaller gives a different impression:

> stem(mydata, scale = 0.5)

  The decimal point is 1 digit(s) to the right of the |

  0 | 3
  0 | 6667778899
  1 | 0

The stem() command can be useful but it does not really match the histogram.


Use table() to split integers into frequency categories

Make a frequency table with the table() command

Another method of looking at the data is to make a frequency table:

> table(mydata)

Not very visual but it does a job. It splits the data into chunks and shows the frequency for each. The table() command also really only works sensibly on integer values, since every distinct value becomes its own category.
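For measurements that are not whole numbers, one possible approach (not taken from the text above) is to bin the values with cut() before tabulating. The following is a minimal sketch; the vector heights and the breakpoints 3:10 are invented purely for illustration.

```r
# Hypothetical non-integer data, invented for this illustration
heights <- c(6.2, 7.1, 8.4, 7.7, 6.9, 3.5, 8.0, 9.3, 10.0, 7.2, 6.1, 9.8)

# table() on the raw values treats each distinct number as its own category
raw_tab <- table(heights)

# cut() assigns each value to an interval; table() then counts per interval.
# The breakpoints 3, 4, ..., 10 are an arbitrary choice.
binned <- cut(heights, breaks = 3:10, include.lowest = TRUE)
bin_tab <- table(binned)
print(bin_tab)
```

Intervals with no observations are kept in the result with a count of zero, which is closer to what a histogram shows.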


Use barplot() to visualize the result of a table() command and get a histogram substitute 
Visualize frequency with a bar chart

The resulting table can be turned into a visual representation of the data if you make a bar chart:

> barplot(table(mydata))

The resulting bar chart gives you an impression of the frequency distribution.
The barplot is useful but can be misleading. The bars are discrete categories (bins or size classes) and are discontinuous. In the preceding barplot you can see that there is a jump from the 3-bin to the 6-bin. The barplot() command is very flexible and you can customize your plot in many ways but you cannot get around this problem.
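A partial workaround, offered here as a sketch rather than as part of the original approach, is to convert the data to a factor with a level for every integer in the range before tabulating. Empty categories then appear as zero-height bars, although the axis is still categorical rather than continuous. Using every integer from the minimum to the maximum as the levels is an assumption that only suits whole-number data like mydata.

```r
mydata <- c(6, 7, 8, 7, 6, 3, 8, 9, 10, 7, 6, 9)

# Declare every integer from min to max as a level, so that values
# with no observations (4 and 5) still get a category
full_levels <- min(mydata):max(mydata)
tab <- table(factor(mydata, levels = full_levels))
print(tab)

# The categories for 4 and 5 now show up as gaps with zero height
barplot(tab)
```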


Use the hist() command to make a true histogram with a continuous x-scale
Use breaks = value to control the breakpoints in a histogram
A true histogram

A true histogram has a continuous x-axis and you can make one using the hist() command:

> hist(mydata)

The histogram can be jazzed up and customized in various ways, which I won't delve into at this point. However, one important aspect is the control of the x-axis. The x-axis is a continuous scale and you can see the difference between this and the earlier barplot by looking at the position of the axis labels. In the barplot they are in the middle of each bar but in the histogram they are placed at the edges of the bars.

You can control the breakpoints using the breaks instruction. The default is breaks = "sturges", which uses an algorithm to determine the breakpoints. You can also give a single number for how many bins you want (R treats this as a suggestion and may adjust it to get tidy breakpoints) or specify the exact position of the breakpoints by giving the values explicitly.
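The different ways of setting breaks can be compared without drawing anything by using the plot = FALSE option of hist(). This is only a sketch; the values 4 and 3:10 are arbitrary choices for illustration.

```r
mydata <- c(6, 7, 8, 7, 6, 3, 8, 9, 10, 7, 6, 9)

# Default: an algorithm chooses the breakpoints
h_default <- hist(mydata, plot = FALSE)

# A single number is a suggestion for the number of bins;
# R may adjust it to obtain "pretty" breakpoints
h_number <- hist(mydata, breaks = 4, plot = FALSE)

# A vector gives the breakpoints explicitly; it must span the data range
h_exact <- hist(mydata, breaks = 3:10, plot = FALSE)

print(h_exact$breaks)   # the breakpoints that were requested
print(h_exact$counts)   # the frequency in each bin
```

Whichever method is used, the number of bins is always one less than the number of breakpoints.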


Developing a custom function to make a dot histogram (or tally plot)
Use plot = FALSE to calculate the statistics for a hist() without plotting the histogram
Developing a script to draw a tally plot or dot histogram

What I wanted was to make a chart that replaced the bars with dots, the number of dots in each column being equal to the frequency. One feature of the hist() command is that you can make a histogram without actually making the final plot. In other words you can calculate all the required statistics. I started by making a result object of the histogram data like so:

> hg = hist(mydata, plot = FALSE)

The result contains several elements in a list; useful elements are the midpoints of the columns and the counts (frequency):

> hg$mids

I reasoned that I could use the $mids as the x-values in a regular plot. The y-values would come from the $counts data. A frequency of 3 would get plotted three times, at y = 1, y = 2 and y = 3. This meant I had to replicate the count data to make a sequence, which would have to be matched up to the x-data. A loop of some sort seemed unavoidable and the number of times the loop would need to run would be equal to the number of bins, that is the number of bars. Put another way, it is the number of breaks - 1. It is simplest to count the number of items in the $counts:

> bins = length(hg$counts)

To make the y-values I needed to make each frequency into a series, so a value of 3 would become 1, 2, 3. I also needed to take care of 0 values so I decided to make each frequency a series 0:frequency. Actually it was logical to do this the other way around, frequency:0, so the loop becomes:

> yvals = numeric(0)
> for(i in 1:bins) {
+ yvals = c(yvals, hg$counts[i]:0)
+ }

The first line simply creates a blank numeric vector. The loop creates the appropriate values and appends them to the vector. For the data under consideration this produces:

> yvals

Each count value is a sequence ending in zero; the count that was a zero remains so. The x-values are derived from the $mids result. Since I added an extra 0 to each y-value, each item needed to be repeated a number of times equivalent to the count + 1. This has the bonus of dealing with the 0 count, as a repeat of 0 would be "difficult". A loop is needed again and it will run for as many times as there are bin categories.

> xvals = numeric(0)
> for(i in 1:bins) {
+ xvals = c(xvals, rep(hg$mids[i], hg$counts[i]+1))
+ }
> xvals

The xvals and yvals cannot be used directly because there are zero items and we don't want points plotted at 0. The simplest way to deal with this is to join up the values in a data.frame and then remove rows where y = 0.

> dat = data.frame(xvals, yvals)
> dat = dat[yvals > 0, ]

Now the data are ready to make into a plot. A regular scatter plot will do the job via the plot() command:

> plot(yvals ~ xvals, data = dat)

However, the points are too small and the plot does not look "tidy". The trick is to remove the axes, allow the points to spill over the plot area a little and to make the points larger. In addition it is helpful to plot each point a little bit higher on the y-axis so that the bottom row does not overlap the axis too much. A few extra tweaks are also necessary to get the axis scales to come out right. After a bit of tweaking I get the final plot to appear the way I want.
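As an aside on the data preparation step, the two loops are not strictly unavoidable. The sketch below uses a different technique from the loop approach described in the text: base R's sequence() and rep() build the same dot coordinates directly, with no zero rows to remove afterwards. Explicit breaks of 3:10 are used only so that the result does not depend on the default breakpoint algorithm.

```r
mydata <- c(6, 7, 8, 7, 6, 3, 8, 9, 10, 7, 6, 9)
hg <- hist(mydata, breaks = 3:10, plot = FALSE)

# sequence() expands each count n into 1, 2, ..., n; a count of 0
# contributes nothing, so no zero rows are created
yvals <- sequence(hg$counts)

# rep() repeats each midpoint once per observation in its bin
xvals <- rep(hg$mids, times = hg$counts)

dat <- data.frame(xvals, yvals)
print(dat)
```

The rows come out in a different order from the loop version (ascending rather than descending within each bin), which makes no difference to the plot.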
The command uses the default breaks = "sturges" to work out the breakpoints; you can specify other breakpoints in exactly the same way as for the hist() command. The plotting symbols are set to pch = 19 (a solid circle) and enlarged somewhat with cex = 3. You can specify other values. The offset = 0.4 instruction plots each point slightly "upwards". You can alter this offset and, together with the cex and pch instructions, get the appearance you want. The biggest alteration you can make is with the graphics window. It seemed a lot of hassle to attempt to match the plot window size to the other parameters. It is easiest to simply use the mouse to resize the plot window to give the appearance you like. You can easily save the plot to a file once it is completed.
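If dragging the window is not convenient, for example when running a script, one option is to open a file device with a fixed size before plotting. This is a minimal sketch using pdf() from base R; the file name, the 6 by 3 inch size and the plain plot() call standing in for the dot histogram are all assumptions for illustration.

```r
mydata <- c(6, 7, 8, 7, 6, 3, 8, 9, 10, 7, 6, 9)
hg <- hist(mydata, breaks = 3:10, plot = FALSE)

# Dot coordinates (a stand-in for the dot histogram data)
dat <- data.frame(xvals = rep(hg$mids, times = hg$counts),
                  yvals = sequence(hg$counts))

# Hypothetical output file in the session's temporary directory
outfile <- file.path(tempdir(), "dot_histogram.pdf")

pdf(outfile, width = 6, height = 3)   # size in inches, fixed in advance
plot(yvals ~ xvals, data = dat, pch = 19, cex = 3,
     axes = FALSE, ylab = "", xpd = NA)
axis(1)                               # add the x-axis only
dev.off()                             # close the device to write the file
```

Changing width and height here plays the same role as resizing the window with the mouse, so some trial and error is still likely to be needed.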


Function hg_dot() produces a dot histogram of numerical data
Use breaks = value to control breakpoints just like hist()
Alter plot symbol and size using pch and cex
Resize graphics window to alter appearance
Get the hg_dot() command as a script file
The hg_dot() command

When made up into a function the command lines look like the following:

    ## Dotplot histogram
    ## Mark Gardener 2013
    ## www.dataanalytics.org.uk

    hg_dot <- function(x, breaks = "sturges", offset = 0.4, cex = 3, pch = 19, ...) {
      # x = data vector
      # ... = other instructions for plot

      hg <- hist(x, breaks = breaks, plot = FALSE)  # Make histogram data but do not plot
      bins <- length(hg$counts)                     # How many bin categories are needed?
      yvals <- numeric(0)                           # A blank variable to fill in
      for(i in 1:bins) {                            # Start a loop
        yvals <- c(yvals, hg$counts[i]:0)           # Work out the y-values
      }                                             # End the loop
      xvals <- numeric(0)                           # A blank variable
      for(i in 1:bins) {                            # Start a loop
        xvals <- c(xvals, rep(hg$mids[i], hg$counts[i]+1)) # Work out x-values
      }                                             # End the loop
      dat <- data.frame(xvals, yvals)               # Make data frame of x, y variables
      dat <- dat[yvals > 0, ]                       # Knock out any zero y-values
      minx <- min(hg$breaks)                        # Min value for x-axis
      maxx <- max(hg$breaks)                        # Max value for x-axis
      miny <- min(dat$yvals)                        # Min value for y-axis
      maxy <- max(dat$yvals)                        # Max value for y-axis
      # Make the plot, without axes, allow points to overspill plot region
      plot(yvals + offset ~ xvals, data = dat,
           xlim = c(minx, maxx), ylim = c(miny, maxy),
           axes = FALSE, ylab = "", xpd = NA,
           cex = cex, pch = pch, ...)
      axis(1)                                       # Add in the x-axis
      # Make results of original data, histogram and plot data
      result <- list(hist = hg, original = x, plot.data = dat)
      invisible(result)                             # Save all the results invisibly
    } # end
    ## END

Once you run the command your chart will be created in whatever size your default graphics window is set to. Simply drag the window to a new size as appropriate. The command produces a list result that contains the following:

$hist: the histogram data, as calculated by hist()
$original: the original data vector
$plot.data: the data frame of x and y values used to draw the dots
If you assign the result of the command to a named object you can access these results afterwards.

> hg = hg_dot(mydata)

You can get the R script here.
