This lesson demonstrates inquiries into differences between groups, specifically by using Student’s t-Test for Independent Samples. Student’s t-Test is a very common test for determining whether a single measured variable (e.g., Systolic Blood Pressure, weight of dairy cow milk production per lactation, length of shark dorsal fin) differs between the two breakout groups of a grouping variable (e.g., Female v Male humans, Guernsey v Jersey cows, Mako v Great White sharks). The t-Test was developed more than 100 years ago as part of quality assurance work for a beverage company and was published under the pen name Student. Student’s t-Test is the appropriate test for comparing differences between small samples, typically 30 or fewer. However, it is also common to see Student’s t-Test for Independent Samples used with larger samples.
Along with the use of p, you will also see the term alpha in any discussions about the level of probability, but p will be used in this lesson.
As an addition to the Housekeeping syntax in prior lessons, note the ls(all.names=TRUE) function and argument, which lists hidden objects (i.e., objects whose names start with a . character).
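As a minimal sketch of this behavior, the following compares ls() with and without the all.names argument. The two object names are hypothetical and were created only for this illustration.

```r
# Hypothetical objects, created only for this illustration
visible.obj <- 1:3
.hidden.obj <- "name starts with a period"

ls()                  # lists visible.obj but omits .hidden.obj
ls(all.names = TRUE)  # lists both objects
```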
The data are in four separate columns: Subject, Breed, PctButterfat, and PctProtein. The data are in stacked (i.e., long) format, as opposed to unstacked (i.e., wide) format. The difference between the two data formats, stacked and unstacked, is detailed in later lessons. Once again, this lesson starts with a simple confidence-building approach to data organization, with more detail added as skills with R increase.
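To make the stacked layout concrete, a small sketch follows. The column names match those listed above, but the dataframe name Cows.df and every value in it are hypothetical, invented only to show the structure, and are not the lesson's data.

```r
# Hypothetical values, invented only to show the stacked (long) layout
Cows.df <- data.frame(
  Subject      = c("C01", "C02", "C03", "C04"),
  Breed        = factor(c("Guernsey", "Guernsey", "Jersey", "Jersey")),
  PctButterfat = c(4.5, 4.7, 4.9, 5.1),
  PctProtein   = c(3.4, 3.5, 3.7, 3.8)
)

# One row per subject; Breed is the grouping variable
str(Cows.df)

# Mean of a measured variable for each breakout group
tapply(Cows.df$PctButterfat, Cows.df$Breed, mean)
```

In stacked format each measured variable occupies a single column and group membership is recorded in its own column, which is the layout that formula-based functions generally expect.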
All syntax required for replication of this lesson is presented. Most screen prints generated by the syntax in the main body of the lesson are also presented, with only a few excluded. This practice also applies to the figures. The syntax for all figures is presented throughout this lesson, but the output of this graphically-focused syntax is occasionally excluded to keep this lesson at a reasonable length.
Avoid the use of pie charts. Pie charts are of questionable value for communicating the concept of varying degrees of membership by breakout groups and they are not well-received by the professional community, even though their use is common in the mass media.
It is also possible to perform a simple copy and paste against each graphical image or to use R syntax to save a graphical image to a file.
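One way to save a figure with R syntax is to open a file-based graphics device, draw the figure, and then close the device. The sketch below assumes a PDF file written to the session's temporary directory. The file name is hypothetical, and the built-in ToothGrowth dataset stands in for the lesson's data.

```r
# Hypothetical file name; tempdir() is used so nothing is overwritten
out.file <- file.path(tempdir(), "example_boxplot.pdf")

grDevices::pdf(file = out.file, width = 7, height = 5)  # open the device
graphics::boxplot(len ~ supp, data = datasets::ToothGrowth,
  main = "Tooth Length by Supplement",
  xlab = "Supplement", ylab = "Length")
grDevices::dev.off()                                     # close the device

file.exists(out.file)  # confirm that the file was written
```

Other devices, such as grDevices::png(), follow the same open, draw, close pattern, although their availability can depend on how R was built.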
The ggplot2 package and supporting packages are used to produce a variety of figures associated with the concept of Beautiful Graphics. These packages are external to what is available when R is first downloaded and it is necessary to actively download these packages to take advantage of their specialized functionality.
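A minimal sketch of this download-then-use pattern follows. The install.packages() line is commented out because it only needs to be run once, and the built-in ToothGrowth dataset is used here only as a stand-in for the lesson's data.

```r
# install.packages("ggplot2")  # run once if the package is not yet installed

# Check whether the package is available before attempting to use it
has.ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)

if (has.ggplot2) {
  fig <- ggplot2::ggplot(datasets::ToothGrowth,
      ggplot2::aes(x = supp, y = len)) +
    ggplot2::geom_boxplot() +
    ggplot2::labs(x = "Supplement", y = "Length")
  print(fig)
} else {
  message("ggplot2 is not installed; see install.packages().")
}
```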
For nearly all statistical tests, the Null Hypothesis is worded in a negative fashion and is typically stated as There is no statistically significant difference between A and B in terms of C. In contrast, the Null Hypothesis for a normality test is worded in the affirmative and is typically stated as The data follow a normal distribution. Give attention to this different wording of normality tests when interpreting the p-values, significance levels, and outcomes.
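The affirmative wording can be illustrated with a normality test from base R. The sketch below assumes the Shapiro-Wilk test (stats::shapiro.test) as one example of such a test, applied to simulated data that are hypothetical and not part of the lesson. The 0.05 threshold is likewise an assumption for illustration.

```r
set.seed(123)                                # for reproducibility
x <- stats::rnorm(30, mean = 120, sd = 10)   # hypothetical simulated values

# Null Hypothesis: The data follow a normal distribution.
sw <- stats::shapiro.test(x)
sw$p.value

threshold <- 0.05                            # assumed for this illustration
if (sw$p.value <= threshold) {
  decision <- "Reject the Null Hypothesis: evidence of non-normality"
} else {
  decision <- "Fail to reject the Null Hypothesis: no evidence against normality"
}
decision
```

Note that here a small p-value argues against normality, so a large p-value is the outcome that supports continuing with a test that assumes normal distribution.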
The z-Test is similar to the t-Test in that both tests are used to determine if there is a statistically significant difference in the means of two populations. The z-Test and the t-Test are also similar in that, ideally, there is normal distribution (or at least a reasonable semblance of normal distribution) for each population in question. However, the two tests differ in a few respects. An assumption associated with the t-Test is that the standard deviation of each population is unknown, whereas for the z-Test the standard deviation of each population should be known. Another key difference is that the z-Test is used when samples are large, while the t-Test is the preferred test when samples are small (typically 30 or fewer datapoints for each sample). Again, the t-statistic begins to closely approximate the z-statistic as sample size increases, justifying use of the t-Test for samples of more than 30 subjects.
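The convergence of the t-statistic toward the z-statistic can be checked directly by comparing two-tailed critical values at the 0.05 level. This is a sketch only; the degrees of freedom listed are arbitrary choices for illustration.

```r
# Two-tailed critical value at the 0.05 level for the standard normal (z)
z.crit <- stats::qnorm(0.975)

# Matching critical values for t at selected degrees of freedom
df.values <- c(5, 10, 30, 100, 1000)
t.crit <- stats::qt(0.975, df = df.values)

# The gap between t and z shrinks as degrees of freedom increase
data.frame(df = df.values, t.crit = t.crit, z.crit = z.crit,
  gap = t.crit - z.crit)
```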
Syntax is provided throughout this addendum, but of course this syntax is only a suggestion. Experiment and take other approaches to how the data can be analyzed and outcomes presented by using other functions and other arguments. Use this addendum as a confidence-building resource on how R is used with increasingly complex analyses.
There are many different approaches to the construction of a self-generated dataframe when using R. This method was purposely selected to demonstrate a detailed process and to also demonstrate functions such as the replicate() function and the rbind() function.
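As a rough sketch of how replicate() and rbind() can be combined to build a dataframe, consider the following. This is not the lesson's own syntax. The object names, group sizes, means, and standard deviations are all hypothetical.

```r
set.seed(2024)  # for reproducibility

# replicate() evaluates the expression once per repetition
Female.SBP <- replicate(5, round(stats::rnorm(1, mean = 118, sd = 8)))
Male.SBP   <- replicate(5, round(stats::rnorm(1, mean = 124, sd = 12)))

# Build one dataframe per breakout group, with matching column names
Female.df <- data.frame(Gender = "Female", SBP = Female.SBP)
Male.df   <- data.frame(Gender = "Male",   SBP = Male.SBP)

# rbind() stacks the two dataframes by row into a single long dataframe
Combined.df <- rbind(Female.df, Male.df)
Combined.df$Gender <- factor(Combined.df$Gender)

str(Combined.df)
```

A single vectorized call such as stats::rnorm(5, mean = 118, sd = 8) would be shorter, but the step-by-step form makes each stage of construction visible.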
Again, this dataset was created for teaching purposes and is not intended to model clinical measurements.
Note the use of Package::Function notation, in an attempt to be fully descriptive.
Recall that SBPFM.df was user-created for teaching purposes, largely to emphasize concerns about overall normal distribution when breakout groups have such widely different standard deviations.
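The sketch below uses Package::Function notation throughout and shows how widely different standard deviations between breakout groups can be inspected before running the test. The dataframe Example.df is hypothetical and is not SBPFM.df; its values are simulated only for illustration.

```r
set.seed(42)  # for reproducibility

# Hypothetical data: two groups with very different standard deviations
Example.df <- data.frame(
  Gender = factor(rep(c("Female", "Male"), each = 20)),
  SBP    = c(stats::rnorm(20, mean = 120, sd = 5),
             stats::rnorm(20, mean = 125, sd = 20))
)

# Standard deviation of SBP for each breakout group
stats::aggregate(SBP ~ Gender, data = Example.df, FUN = stats::sd)

# Student's t-Test for Independent Samples (equal variances assumed)
student.res <- stats::t.test(SBP ~ Gender, data = Example.df,
  var.equal = TRUE)
student.res

# Default in stats::t.test(): Welch's test (equal variances not assumed)
welch.res <- stats::t.test(SBP ~ Gender, data = Example.df)
welch.res
```

Writing stats::t.test() rather than t.test() makes explicit which package supplies the function, at the cost of slightly longer syntax.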
Again, consider how in the social sciences there is a growing trend to report calculated p-values and to give less attention to a rules-based decision to either reject or fail to reject the Null Hypothesis.
The terms GTE (i.e., Greater Than or Equal to) and LTE (i.e., Less Than or Equal to) are common means of expressing these conditions of comparison.
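In R syntax these comparisons are written with the >= and <= operators. The values below are hypothetical, chosen only to show the operators in use.

```r
p.value   <- 0.03  # hypothetical calculated p-value
threshold <- 0.05  # hypothetical criterion level

p.value <= threshold  # LTE comparison; TRUE for these values
p.value >= threshold  # GTE comparison; FALSE for these values
```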