Dr. Mark Gardener 



Statistics for Ecologists Using R and Excel (Edition 2)Data Collection, Exploration, Analysis and Presentationby: Mark GardenerAvailable soon from Pelagic Publishing Welcome to the support pages for Statistics for Ecologists. These pages provide information and support material for the book. You should be able to find an outline and table of contents as well as support datafiles and additional material. Support Index  Exercises Index  Outline & TOC  Data files 

Exercise 9.4 

Table of Contents


Goodness of fit test used to compare observed phenotypes with theoretical ratio of types. Get the example data: 
9.4 Using Excel for chisquared goodness of fit testingThis exercise is concerned with association (Chapter 9), in particular goodness of fit testing using Excel (Section 9.4). IntroductionExcel has several functions related to the Chisquared statistic. This allows you to undertake chisquared goodness of fit testing for example. In goodness of fit tests you have one set of frequency data in various categories. You also have a matching set of frequencies that you want to "compare". The comparison set may be a theoretical set of values or perhaps a previous set of observations. The goodness of fit test is often used in genetic studies where you match up observed phenotypes against a theroetical ratio of expected phenotypes. The following data will be used:
The data show the results of crossing two varieties of pea plants. The attributes of colour and coat type are of interest. There are green and yellow coloured varieties and two coat types: wrinkled and smooth. After crossing, the offspring were sorted into categories according to their colour and coat type. The Obs column shows how many offspring there were in each category (a frequency or count). The theoretical ratio of phenotypes is 9:3:3:1. The goodness of fit test can compare the observed frequencies to the theroretical ratios. Does the data match up with the genetic theory? Get the example data: pea genetics.xlsx. 

Calculate expected values from the theoretical ratios and original observations. 
Calculate expected valuesOpen the spreadsheet pea genetics.xlsx. There are two worksheets, Data contains the basic data and Completed version has all the calculations (and a graph) completed for you. First of all you need to fill in the theoretical ratios and then calculate the expected frequencies based on those:
Your spreadsheet should now appear more or less like the following:
You are now ready to move on to calculating the chisquared values and overall significance. 

Chisquared values: 
Calculate Chisquared valuesIt is useful to see the individual chisquared values:
You'll need to compare your total chisquared to a critical value but you may as well compute the Pearson residuals first. 

Pearson residuals: Pearson residuals have a zdistribution. Critical value 1.96 at p = 0.05 Small residuals show that there is a good "fit" between observed and expected values. 
Pearson residualsThe Pearson residuals are more or less the square root of the chisquared values.
The advantage is that the sign of the residual tells you about the direction of the difference between the oberved and expected values. Pearson residuals have a zdistribution so values of approximately 2 (1.96) are significant at p < 0.05. In a goodness of fit test small residuals indicate that there is little difference between the observed and expected values.
You don't really need a sum for the residuals but you could easily copy the adjacent sum of chisquared values across. You can use the residuals in a column chart as a visualization of the results later. Your spreadsheet should now look more or less like the following:
Now you can move on to look at the statistical significance of the results shortly but you can already see from the Pearson residuals than none are > 1.96 and so the result is likely to be not significant. 

Excel functions: Use CHITEST to get exact pvalue from Obs and Exp values. Use CHIINV to get a critical value or to get a chisquared result from a probability. Use CHIDIST to get an exact pvalue from your calculated chisquared result. A value of p > 0.05 means that there is no significant departure between the observed and expected values. 
Critical values and significanceYou can check to see if your result is a statistically significant one in several ways. Since you have a total chisquared value you can compare that to a critical value. You can look up a critical value or use the CHIINV function. You can determine the exact pvalue for your total chisquared value using the CHIDIST function. In either case you'll need the degrees of freedom (df), which is the number of categories (rows) minus 1. In this case 41 = 3. You can also use the CHITEST function to get an exact pvalue from the observed and expected values.
Now you have all the appropriate statistics, some of them computed in alternative ways. In this case the result is not significant. The calculated chisquared result is smaller than the critical value. Also, the pvalue is > 0.05. You have determined that there is no significant departure between the observed and expected frequencies. In other words the ratio of pea phenotypes you observed does not depart significantly from the theoretical ratio (9:3:3:1). 

Use a column chart of Pearson residuals to visualize goodness of fit results. Use control key to highlight noncontiguous blocks of cells for charting. Highlighting two columns of category labels produces multiword labels. 
Visualize the resultsThere are many ways to visualize chisquared results. One option is to plot the Pearson residuals, as this shows the departures from the "goodness" of the fit between Obs and Exp values and shows the significance.
With a bit of formatting and general juggling you can get a plot that resembles the one below.
You could also plot the Obs and Exp values as a column chart, try it by highlighting the cells A1:C5 then with the control key E1:E5. 

Top  
My Publications  My Publications See my personal pages at GardenersOwn 

Follow me... 


See also: 
KeywordsHere is a list of keywords: it is by no means complete! Ttest, Utest, KruskalWallis, Analysis of Variance, Spearman Rank, Correlation, Regression, Logistic Regression, Curved linear regression, histogram, scatter plot, bar chart, boxwhisker plot, pie chart, Mean, Median, Mode, Standard Deviation, Standard Error, Range, Max, Min, Interquartile Range, IQR 

Top  DataAnalytics Home  Contact  GardenersOwn Homepage 