TEACHING IN THE LABORATORY
Department of Biology, Swarthmore College, Swarthmore, Pennsylvania
| Abstract |
|---|
Key words: controls; randomization; Student's t-test; teaching
| Introduction |
|---|
Most students know that an experiment should contain a control, but many find it difficult to define exactly what a control is. A high school science teacher in a recent laboratory workshop said that, in general, a control is "the animal doing nothing," by which she meant that the animal is in its home cage where it is not exposed to any features of the experimental treatment, not even the features for which the experiment needs to be controlled. Students often have difficulty both in verbalizing the general way in which a control is related to experimental treatments and in specifying the appropriate control(s) for a particular experiment. Many also believe that there must always be a specific treatment group that is designated as a control and that the only permissible way to create these different treatment groups is by random assignment. Students who have taken a statistics course may know more about statistical tests, but they do not always remember how to apply them to an appropriate experimental design. For example, once students are introduced to the statistical power of the paired t-test, they are eager to design experiments that use this test but frequently fail to incorporate adequate controls. A typical experiment design proposed by students is one in which levels of some physiological variable are measured first to get a baseline reading; these baseline measurements are then compared with measurements on the same individuals in response to a later treatment. Students are surprised to learn that such an experiment cannot, in fact, tell us whether the experimental treatment had any effect.
As those of us who are not full-time statisticians know from personal experience, it is far more effective to learn experimental design and statistical analysis in the context of an actual experiment. I find that the most effective incentive is an experiment that students must design to address a question to which they do not know the answer. Thus, I use a student-designed experiment on chicken embryo metabolism as an opportunity to teach the basics of simple experimental design and correct some of these misconceptions.
Classroom Implementation
I use the design process described here as the second 3-h laboratory session in a three-session module in which students learn laboratory techniques (session 1), design an experiment (session 2), and perform the experiment (session 3). Many implementations are possible, however, and some alternatives are discussed in Ref. 10.
The overall goal of the 3-h experiment design session is to reduce teacher-centered instruction and engage students in active learning as much as possible (3, 20, 21). I begin the laboratory session by providing a brief overview of the three types of t-tests (two-sample, paired, and one-sample t-tests). Most students entering the course in which I use this laboratory exercise have had previous practice with at least two of these t-tests. Armed with this information, students in each laboratory section (12 students) are asked to break into groups of two or three students. Each group is asked to arrive at a single design that meets the following criteria:
[...] (V̇O2) at different temperatures to determine whether 17-day-old chicken embryos are endotherms or ectotherms. During the student-centered design process, I am prepared to provide handouts with the relevant additional information that the students will need to design a realistic biological experiment. In the case of the chicken embryo metabolism experiment, such information includes the safe range of temperatures for avian embryos (abstract from Ref. 26; 16–41°C) and the time that might be required between measurements for temperature equilibration (egg cooling curve, Fig. 2 in Ref. 10). The rationale for giving this information to students when they ask for it rather than at the beginning of the design session is that in real experiments, investigators are not presented with all the information they need at the outset. Students who learn to think logically about what they need to know before proceeding, and who can formulate specific questions that they can answer by consulting the instructor or literature, are better prepared to launch their own independent projects later in the course.
After each student group has finished this task, one group is chosen to share its design with the class. The design process progresses as this first design is modified to (or replaced by) a mutually agreed upon design in a discussion moderated by the instructor. As a starting point for the group discussion, it is actually instructive to choose a design with flaws, so that the class can engage in a discussion about how to improve the design to meet the requirements of the experiment. In the process of revising the design, I ask students to come up with some "rules" (general principles) for experimental design that they can apply to future experiments as well.
The thought questions presented at the end of this article can be used in a variety of ways to support the design process. They may be assigned, for example, as preparatory or followup homework. However, in my laboratory, I prefer students to work on at least some of these questions in small groups during the class period, when I am available for discussion. This means that relevant student questions can be addressed immediately and can serve as a learning experience for everyone in the class, not just the student who posed the question. Each time I lead an experimental design session, the discussion takes a slightly different course. I assign individual questions when the concept covered by that question becomes relevant to the discussion; students break into small groups to answer the question(s) before reviewing the answers as a class and moving on to the next stage of the design process.
In the weeks following an intensive session on experiment design, do not be surprised if your students need further reinforcement to establish firmly the concepts introduced in this lesson. Practice is essential. In my course, students write a strong-inference protocol after each design session. Writing this short but important document helps students to process what they have discussed in class, prepares them for performing the experiment, and provides ideas and text for the final laboratory report. For a general discussion of the strong-inference protocol and specific examples for the chicken embryo metabolism experiment, see the companion article (10).
Classroom outcomes.
To ensure that students have engaged with the concepts on their own before participating in the small-group discussions, they are asked to bring a draft experimental design to class on the day that we design our chick embryo metabolism experiment. Although I do not grade these draft designs, I collect them because they serve as a useful gauge of students' prior knowledge and proficiency in experimental design. In a typical laboratory section of 12 students, only 1 or 2 students will have produced a well-designed experiment in which they have explained controls explicitly. Of the remaining students, roughly one-half will have designed an experiment with unclear controls, with wording such as "of course we would have to include controls" but without explaining what treatment the controls would undergo. The remaining designs typically incorporate obvious flaws, such as measuring all embryos at one temperature and then all of the embryos at another temperature.
Despite the need for periodic reinforcement of the concepts they have learned, students who have completed the 3-h experimental design session for the chicken embryo metabolism laboratory are better able to design future experiments and to do so faster. Three weeks after the chicken embryo metabolism exercise, the entire laboratory section goes on to design a second experiment concerning the effect of temperature on lizard locomotion. This time, it takes only 20 min of class time for students to design a well-controlled experiment that all members of the class can readily agree on. These two laboratory exercises are designed to prepare students for an independent project that they will undertake individually or in small groups in the second half of the semester. As with the lizard locomotion exercise, students appear readily able to apply the general concepts that they have learned from the chicken embryo experiment to other experimental problems. An example is provided by a student who approached me at the very beginning of the semester with the concern that she had never designed an experiment before and that she was therefore worried about her ability to complete the independent project that would be required of her later in the semester. When it came time for the independent project, however, she formulated an interesting question and designed a well-controlled experiment without any instructor assistance.
Experimental Designs for Comparing Two Treatment Groups
Two types of experimental designs typically emerge from the classroom discussion during a design session.
Design 1: control group.
The general design of this experiment (Fig. 1A) calls for dividing subjects into two groups, one of which is designated the control. Baseline measurements on all animals at a starting condition are used to create balanced treatment groups (see Creating treatment groups). The experimental group is then exposed to the experimental treatment, while controls are maintained at the starting condition under which baseline measurements were made. A second set of measurements is then made on all animals. The data are analyzed by comparing this second set of measurements between control and experimental animals with a two-sample t-test.
EXAMPLE FOR THE CHICKEN EMBRYO METABOLISM EXPERIMENT.
The rate of oxygen consumption (V̇O2) of all 12 eggs is measured at one ambient temperature (usually 38°C, the normal incubation temperature for chicken eggs). Eggs are then divided into balanced groups (see Creating treatment groups) based on baseline V̇O2. A randomly chosen one of these groups (designated the experimental group) is then placed at a lower temperature (room temperature, 23°C, is convenient). Control eggs are handled similarly but returned to 38°C. After 90 min of equilibration at these two temperatures, the V̇O2 of all experimental and control eggs is measured again, and a two-sample t-test is used to compare these measurements between the experimental eggs at 23°C and control eggs at 38°C.
Design 2: treatment order control.
The general design of this experiment (Fig. 1B), often referred to as a crossover design, calls for each subject to be measured under both conditions, with one-half of the subjects experiencing the conditions in reverse order. A paired analysis is used to compare the measurements obtained in the two conditions. Because the analysis considers only the differences between the two measurements from each individual, this powerful design eliminates the "noise" resulting from differences between individuals.
EXAMPLE FOR THE CHICKEN EMBRYO METABOLISM EXPERIMENT.
The V̇O2 of each embryo is measured at each of two ambient temperatures (38 and 23°C are convenient, as above). Six of the embryos are measured first at 38°C and then at 23°C, while the other six embryos are measured at the same times but in the reverse order of temperature treatments. As in design 1, the two measurements are separated by 90 min to allow for thermal equilibration.
General Considerations for Both Designs
Regardless of which design is ultimately chosen, several important elements should be included in the classroom discussion. Note that a discussion of statistical analysis cannot wait until after the data are collected. Rather, the experiment is designed specifically with statistical analysis in mind, to ensure that the data will be suitable for analysis and that the data will actually answer the question addressed by the experiment. See the companion article, "The strong-inference protocol: not just for grant proposals" (9), for further discussion and specific examples.
Controls.
In both designs, we measure V̇O2 of some or all of the eggs first at one temperature and then at another. It is important to remember, however, that because the two measurements did not occur at the same point in time, temperature is not the only variable that the embryos will have experienced differently in the two measurements. Other variables that differ between measurements taken at different times include 1) the total amount of handling, 2) time since first handling, 3) learning, 4) time of day, and 5) stage of development, when the time difference between measurements is large and/or the animals are developing rapidly.
Thus, we need to control for the effects of the time-dependent variables in which we are not interested (the extraneous variables) to gain information about the independent variable of interest (temperature). There are two methods for doing so. The first is to run a parallel control group that receives exactly the same treatment as the experimental group except for the variable of interest (design 1; Fig. 1A). In this case, all of the subjects experience exactly the same amount of handling, are being measured again the same number of hours after first handling, and are being measured at the same times of day and at the same developmental stage. Therefore, the only variable that differs between the control and experimental groups at the second measurement is temperature, and if we compare the V̇O2 between the two groups at the second time point, any differences should be due to temperature alone. We thus define a control group as a group of animals that receives treatment identical to that of the experimental group except for the independent variable of interest.
The second general method for controlling for extraneous variables is to measure each individual in both experimental conditions (here, high and low ambient temperature) but to randomize the order in which subjects experience these conditions so that any directional effects of the four extraneous variables are averaged and therefore cancel one another out when V̇O2 is compared between the two temperatures (design 2; Fig. 1B). In this design, there is no group designated as a control; rather, the overall design of the experiment controls for the extraneous variables.
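The cancellation argument can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original exercise: the temperature effect, the size of the order effect, and the baseline values are all invented for illustration, and the readings are noise-free so that the bookkeeping is easy to follow.

```python
from statistics import mean

# All numbers below are hypothetical and chosen only for illustration.
TEMP_EFFECT = -0.30   # assumed true change in VO2 when moving from 38 to 23 C
ORDER_EFFECT = 0.10   # assumed drift added to whichever measurement comes second

baselines = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8, 1.05, 0.95, 1.15, 1.25, 0.85, 1.0]


def measure(baseline, at_low_temp, is_second):
    """Return a noise-free VO2 reading for one embryo."""
    value = baseline
    if at_low_temp:
        value += TEMP_EFFECT
    if is_second:
        value += ORDER_EFFECT
    return value


def difference(baseline, low_temp_first):
    """VO2 at 23 C minus VO2 at 38 C for one embryo."""
    low = measure(baseline, True, is_second=not low_temp_first)
    high = measure(baseline, False, is_second=low_temp_first)
    return low - high


# Uncontrolled before-after design: every embryo goes 38 C -> 23 C.
before_after = mean(difference(b, low_temp_first=False) for b in baselines)

# Crossover design: half go 38 C -> 23 C, the other half 23 C -> 38 C.
half = len(baselines) // 2
crossover = mean(
    [difference(b, low_temp_first=False) for b in baselines[:half]]
    + [difference(b, low_temp_first=True) for b in baselines[half:]]
)

print(f"before-after estimate: {before_after:+.2f}")
print(f"crossover estimate:    {crossover:+.2f}")
```

With these assumed values, the before-after estimate absorbs the order effect (-0.30 + 0.10 = -0.20), whereas the crossover estimate recovers the assumed temperature effect of -0.30 because the order effect enters half of the differences with a plus sign and the other half with a minus sign.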
Creating treatment groups: is random assignment always best?
Ask any student how individuals should be separated into treatment groups, and he or she will dutifully reply, "Randomly." Yet, most students are intuitively aware that when a group of organisms is divided into two treatment groups, it would not be a good idea to put all the females in one group and all the males in the other or all the largest individuals in one group and all the smallest in the other. By assigning subjects to treatment groups in a way that avoids creating groups with obvious differences, however, they would be violating the edict of random assignment. Which procedure, then, is the correct one and how can we decide which to use?
First, it must be understood that random assignment is not the goal. The goal is to avoid creating unbalanced samples. Under many conditions, random assignment is the preferred method for achieving this goal (e.g., Refs. 4 and 16). When we have little or no information about the individuals we plan to test, random assignment is the only way to proceed. However, because random assignment is random, occasionally this method will result in unbalanced samples (e.g., one group might contain significantly more males or significantly larger individuals than the other). Even if unintentional, such differences pose problems for interpreting the results of the experiment later on. Consider an experiment in which birds are randomly assigned to two treatment groups. One group is given a hormone treatment, and the metabolic rates of all the animals are then measured to determine whether the hormone has any effect on metabolic rate. If random assignment had, by chance, resulted in two groups that differed substantially in average body mass or sex ratio, this experiment would be unable to separate the effects of mass, sex, and hormone treatment, all of which are known to affect metabolic rate. In other words, the effect of the hormone is confounded with the effects of mass and sex in this experiment. Treatment groups can validly be rebalanced after data have been collected (22), but this procedure is undesirable because it reduces sample size, and thus statistical power, and necessitates a more complex subsequent statistical analysis that is beyond the scope of students just beginning to learn the basics. It is far better to balance groups at the beginning of the experiment, and this is a practice we should be teaching our students explicitly so that they too can get the most from every one of their experiments.
Whether or not an investigator knows about or routinely practices preexperiment balancing seems to be related largely to the particular organisms the investigator studies. When inbred strains of rat of a particular sex and age are being studied, for example, the animals are so similar genetically and physiologically that random assignment to treatment groups is highly unlikely to produce groups with systematic differences. Experimenters who work with wild animals, on the other hand, tend to balance treatment groups as a matter of course, particularly when sample sizes are limited because the animals are rare, difficult to catch, or require elaborate holding facilities. In general, randomized group assignment is preferred when sample sizes are large or when the experimental organisms are very similar. When sample sizes are small and there is considerable interindividual variation, however, creating balanced groups greatly reduces the chance that a variable other than the intended independent variable is responsible for the effects observed. Many students are in fact already familiar with the general principle that the risk of selecting a sample that does not represent the population increases as sample size decreases. They may know this phenomenon as "sampling error," and may have encountered it in the form of the founder effect in the evolution of small, isolated populations. This is an excellent opportunity to point out that the same principle is at work here.
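The link between sample size and chance imbalance can be made concrete with a small simulation. This is a sketch under stated assumptions rather than an analysis from the article: the trait is drawn from a standard normal distribution, "unbalanced" is arbitrarily defined as group means differing by more than 0.5 SD, and the function name, threshold, and seed are all hypothetical.

```python
from random import Random
from statistics import mean


def chance_of_imbalance(n_per_group, threshold=0.5, trials=2000, seed=1):
    """Fraction of random splits whose group means differ by more than
    `threshold` standard deviations of the trait (an arbitrary cutoff)."""
    rng = Random(seed)
    unbalanced = 0
    for _ in range(trials):
        # Trait values (e.g., body mass) expressed in SD units.
        trait = [rng.gauss(0.0, 1.0) for _ in range(2 * n_per_group)]
        rng.shuffle(trait)  # random assignment to two groups
        group_a, group_b = trait[:n_per_group], trait[n_per_group:]
        if abs(mean(group_a) - mean(group_b)) > threshold:
            unbalanced += 1
    return unbalanced / trials


for n in (6, 20, 60):
    print(f"n = {n:2d} per group: {chance_of_imbalance(n):.3f}")
```

Running the loop for several group sizes lets students see for themselves how quickly the risk of a chance imbalance falls as groups get larger, which is the same sampling-error principle described above.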
Far from being an "advanced" technique unnecessary for undergraduates to learn, balancing groups at the outset of an experiment is a basic procedure that can improve the odds of success for any experiment involving small numbers and/or animals with large interindividual differences. Just as it is important to inculcate students with the principles of ethical animal treatment at the earliest possible stage, it is important to train them to design the most effective experiments with the smallest feasible sample sizes. Treatment group balancing is one of the important tools that we can use to comply with the 3 Rs of research (reduce, refine, and replace), as mandated by the Animal Welfare Act and expanded on in the National Institutes of Health Guide for the Care and Use of Laboratory Animals (13).
WHICH VARIABLES SHOULD BE USED TO BALANCE TREATMENT GROUPS?
If we are interested in the effects of an independent variable on V̇O2 but we know that individuals vary considerably in their baseline metabolic rates, then baseline V̇O2 would be the most useful measure for dividing experimental subjects into balanced groups. However, in many experiments, baseline measurements are never made; instead, it is assumed that random assignment will create evenly matched treatment groups. If this assumption is met, such an experimental design is perfectly adequate. The results of such experiments appear simply as the data that are collected in the measurement at time t in experimental design 1 (Fig. 1A).
There are some experiments for which baseline measurements of the variable being studied are not practical. For example, it might be useful to know what an animal's baseline response to an injection of endotoxin (lipopolysaccharide) is before assigning animals to treatment groups in a study of how the response to lipopolysaccharide changes with age or is affected by a particular pharmaceutical agent. However, because animals habituate rapidly and strongly to lipopolysaccharide treatment, the baseline test would severely blunt or possibly even obliterate the response of interest in the experimental test. Even in such cases when information about the variable of interest is not available, it is to the experimenter's advantage to increase the probability of creating balanced groups by considering other independent variables that are likely to have an impact on an animal's response in an experiment. Body mass (1, 2, 5, 19), sex (5, 6, 14, 15, 27), age (e.g., 5, 12, 25), and strain or population (4, 24, 25) are four of the most common, most easily determined, and potentially most important variables.
HOW DO WE KNOW WHEN TREATMENT GROUPS ARE BALANCED?
Mean and variance in any trait potentially affect the mean and variance of the dependent variable, which in turn contribute to the outcome of statistical comparisons. Thus, both parameters should be balanced for each variable being considered. A conservative rule of thumb is to create groups with similar SEs (or variances or SDs) in which the ranges for means ± SE in the treatment groups overlap one another (8). For instructions on how to set up and use a spreadsheet to create balanced groups, see Fig. 2. Once balanced groups have been created, the groups should be randomly assigned to the experimental and control treatments (7). Note that the initial measurements used as the basis for balancing groups (e.g., V̇O2) should be collected in as unbiased a way as possible; alternatively, potentially biased variables can be added to the list of variables to be balanced. Consider, for example, an experiment in which eggs arriving in one carton are numbered 1–12 and those arriving in another carton are numbered 13–24; the V̇O2 of the eggs in the first carton is measured first and V̇O2 of the eggs in the second carton is measured second. In this case, carton number should be one of the variables balanced.
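For readers who prefer a script to a spreadsheet, the balancing rule of thumb can be sketched as follows. This is an illustration under assumptions, not the spreadsheet procedure of Fig. 2: the baseline readings are invented, the imbalance score (difference in means plus difference in SDs) is one arbitrary way of weighing the two parameters, and an exhaustive search is practical only because 12 eggs yield fewer than 1,000 equal splits.

```python
from itertools import combinations
from math import sqrt
from random import Random
from statistics import mean, stdev

# Hypothetical baseline VO2 readings for 12 eggs, keyed by egg number.
baseline = {1: 0.82, 2: 1.10, 3: 0.95, 4: 1.31, 5: 0.88, 6: 1.02,
            7: 1.18, 8: 0.91, 9: 1.25, 10: 0.99, 11: 1.07, 12: 0.86}


def sem(values):
    """Standard error of the mean."""
    return stdev(values) / sqrt(len(values))


def imbalance(group_a, group_b):
    """Score a split: difference in means plus difference in SDs (assumed metric)."""
    a = [baseline[e] for e in group_a]
    b = [baseline[e] for e in group_b]
    return abs(mean(a) - mean(b)) + abs(stdev(a) - stdev(b))


def balanced_split(eggs):
    """Try every equal split and keep the one with the lowest imbalance score."""
    eggs = sorted(eggs)
    best = None
    for group_a in combinations(eggs, len(eggs) // 2):
        if eggs[0] not in group_a:  # skip mirror-image duplicates
            continue
        group_b = tuple(e for e in eggs if e not in group_a)
        score = imbalance(group_a, group_b)
        if best is None or score < best[0]:
            best = (score, group_a, group_b)
    return best[1], best[2]


def ranges_overlap(group_a, group_b):
    """Rule of thumb from the text: the mean +/- SE ranges should overlap."""
    a = [baseline[e] for e in group_a]
    b = [baseline[e] for e in group_b]
    return (mean(a) - sem(a) <= mean(b) + sem(b)
            and mean(b) - sem(b) <= mean(a) + sem(a))


group_a, group_b = balanced_split(baseline)
# Chance enters only after balancing, to decide which group gets the treatment.
rng = Random(2006)  # fixed seed so the sketch is repeatable
experimental, control = rng.sample([group_a, group_b], 2)
print("experimental:", experimental)
print("control:     ", control)
print("mean +/- SE ranges overlap:", ranges_overlap(group_a, group_b))
```

Additional variables to be balanced, such as carton number or egg mass, could be folded into the score in the same way, although how to weight them against one another is a judgment call that this sketch does not attempt to settle.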
DESIGN 1.
The only t-test required is a two-sample comparison between the experimental and control groups at the second measurement, as discussed above. In fact, no other comparison will isolate the effect of the independent variable on the dependent variable (see Fig. 1A for a visual explanation).
DESIGN 2.
In this design, a paired t-test takes advantage of the additional statistical power offered by measuring each individual in both conditions, thus eliminating from the analysis any differences due to an intrinsically high or low V̇O2 in any particular individual. The paired t-test computes the difference between the two measurements for each embryo; it then computes the mean of these differences and compares that mean with zero, the mean difference that would result if there were no consistent effect of temperature on V̇O2 (see Fig. 1B for a visual explanation). The power of the paired t-test is illustrated in Fig. 3.
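The calculation described above can be written out directly. The sketch below uses invented readings for six embryos (these are not the data behind Fig. 3) and computes two statistics from the same numbers: the paired t statistic, and a two-sample t statistic that does not assume equal variances (Welch's form, chosen here as an assumption about which two-sample variant to show). Only the t statistics are computed; P values would still come from a table or a statistics package.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical VO2 readings for six embryos, each measured at both temperatures.
vo2_38 = [0.80, 1.30, 0.95, 1.10, 0.70, 1.20]
vo2_23 = [0.65, 1.10, 0.85, 0.90, 0.60, 1.05]


def paired_t(x, y):
    """t statistic for the mean within-individual difference tested against zero."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def welch_t(x, y):
    """Two-sample t statistic that does not assume equal variances."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))


print(f"paired t (df = {len(vo2_38) - 1}): {paired_t(vo2_38, vo2_23):.2f}")
print(f"two-sample t on the same numbers: {welch_t(vo2_38, vo2_23):.2f}")
```

With these made-up numbers, the paired statistic is about 8.2, well beyond the two-tailed 5% critical value of 2.571 for 5 degrees of freedom, whereas the two-sample statistic is only about 1.2. The mean difference is identical in both calculations; what changes is the denominator, which in the paired case no longer contains the large differences among individual embryos.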
For the two-sample t-test, measurements obtained in each condition should be roughly normally distributed (there should be no obvious outliers, the data should not be bimodally distributed, and the data should not be strongly skewed to high or low values). Students frequently make the mistake of plotting frequency histograms of all the data, which will produce a strikingly bimodal distribution if there is a significant difference between the two treatment groups; they must be sure that the data are plotted separately for each group being compared. In the chicken embryo metabolism experiment, this would mean checking the distributions for the data for control embryos separately from the data for experimental embryos. t-Tests that do not assume equal variances are easiest to use; otherwise, any of several tests offered by your statistical software package can be used to test for heterogeneity of variance; note that P values of <0.05 indicate that variances are significantly different.
To test the assumptions of the paired t-test, data from each individual are reduced to a single value: the difference between the two measurements. The resulting set of differences should be plotted to determine whether they are normally distributed.
If the assumptions of the t-test are not met, the easiest solution is to substitute a nonparametric test. For example, the Mann-Whitney U-test may be substituted for the two-sample t-test or the Wilcoxon matched-pairs test may be substituted for the paired t-test; your statistical software package may have slightly different choices. Nonparametric tests are less powerful but make no assumptions about the form of the data; instead of relying on distributions around means, which can be strongly affected by outliers, many nonparametric tests use the relative ranks of the data, which are only minimally affected by outliers. Other solutions, such as transforming the data so that the distribution becomes normal, can also be used (23). It is not possible to include a more thorough discussion of the statistical tests and principles referred to here; readers unfamiliar with basic statistics should consult a textbook or other source, such as Biostats Basics by Gould and Gould (8), which includes references to a website where statistical tests can be performed online.
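To show why rank-based tests are only minimally affected by outliers, the Mann-Whitney U statistic can be computed by hand. This is a minimal sketch with invented data and hypothetical function names; it returns only the U statistic, which would then be compared with a tabulated critical value or passed to a statistics package for a P value.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return the smaller of the two U statistics for samples x and y."""
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:len(x)])
    u_x = rank_sum_x - len(x) * (len(x) + 1) / 2
    u_y = len(x) * len(y) - u_x
    return min(u_x, u_y)


# Hypothetical second-measurement VO2 values; 3.90 is a deliberate outlier.
control = [1.05, 0.98, 1.12, 1.01, 1.08, 3.90]
experimental = [0.81, 0.77, 0.90, 0.84, 0.79, 0.88]
print("U =", mann_whitney_u(control, experimental))
```

Replacing the outlying 3.90 with a more typical value such as 1.10 leaves U unchanged, because the reading keeps the same rank order relative to every experimental value; the mean and SD of the control group, by contrast, would change substantially.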
General Principles
These are some of the general principles that I hope my students will derive from the exercise of designing a simple experiment. Some of the thought questions included at the end of this article are designed to prompt students to state these principles for themselves. I find that this student-centered method is generally more effective in instilling the principles than simply providing them for students to memorize.
Principle 1.
To control for time-related variables, one must always compare experimentally treated animals with controls that, with the exception of the independent variable that is being tested, have experienced identical conditions throughout the experiment. A simple before-after comparison within individuals is not valid because it does not control for extraneous time-dependent factors (see thought questions 1, 4, and 6).
Principle 2.
A control group is a group of animals that receives treatment identical to that of the experimental group except for the independent variable of interest. Similarly, a control treatment is identical to the experimental treatment except for the independent variable of interest (see thought questions 1, 2, and 6–8).
Principle 3.
A well-designed experiment does not necessarily have a "control group." If each individual receives all the treatments in random order, there is no specifically designated control group, but the experiment as a whole is controlled (see thought question 6).
Principle 4.
Some variables, such as temperature, do not have an obvious control condition because it is not possible to design a treatment lacking this variable. In such cases, a controlled experiment compares two treatments that are identical except for the variable of interest (e.g., temperature) (see thought questions 6 and 7).
Principle 5.
The goal of random assignment to treatment groups is the creation of balanced groups. However, when sample sizes are small and/or there is large variation among individuals, actively balancing groups before the experiment starts is a more reliable method than random assignment for creating balanced groups (see thought questions 2 and 4).
Principle 6.
Experiments that can be analyzed with a paired statistical test are powerful because they eliminate baseline differences among individuals (see thought question 8).
Thought Questions
These questions are designed to help students discover some of the general principles of experimental design and analysis in an active way (see Classroom Implementation for specific suggestions on using these questions to supplement your classroom discussion). Depending on the way in which you use these questions, you may want to change the order of the questions or omit some of them. Suggested answers are shown in italics.
Question 1.
The V̇O2 of 10 frogs was measured shortly after the frogs had been captured in the wild and brought into the laboratory. An insecticide was then added to the aquarium water of all 10 frogs at a dose similar to what the frogs might experience in their natural habitat. After 4 wk of insecticide treatment, the V̇O2 of the 10 frogs was measured again.
[...] V̇O2 of the frogs? Answer: results suggest that the insecticide has no effect on the V̇O2 of the frogs.
[...] V̇O2 of all 20 frogs was measured shortly after the frogs had been captured. However, the aquarium water for the second group of 10 frogs was not treated with the insecticide. The V̇O2 of this control group of 10 frogs was also remeasured after 4 wk. The complete set of results obtained in this experiment is shown in Fig. 4B. Assuming that no mistakes were made in the experiment or in the collection of the data, and that all of the frogs were in equally good health when captured, propose a biological explanation for the results shown in Fig. 4B. Answer: the control animals demonstrate that over time in captivity, V̇O2 increases. A possible explanation is that the animals may have been affected by the stress of capture and the new captive environment when they were first brought into the laboratory and that as they habituated to this new environment, this response abated and V̇O2 increased. However, the V̇O2 of insecticide-treated frogs did not change over time; possible explanations are that the insecticide induced a continuous stress state or that the insecticide directly inhibited V̇O2 in some other way. Students with more detailed knowledge of the molecular pathways in cellular respiration could be asked to follow up their hypothesis by proposing particular molecular targets of the insecticide.
[...] V̇O2 of the frogs? Answer: the insecticide reduces the V̇O2 of the frogs.
Question 4.
Figure 6 shows the results of a study in which V̇O2 was first measured in all animals at 30°C (time = 0 min). These baseline V̇O2 values were used to separate the animals into two groups. One of the groups (experimental) was then subjected to a new temperature (20°C), while the other group (control) remained at the original temperature. After 90 min, when the animals had equilibrated at the new temperatures, the V̇O2 of all animals was measured again. (Note that although error bars would normally be included in such a graph, they have been omitted for the sake of clarity.)
[...] V̇O2.
Question 5.
Figure 7 shows the results of an experiment testing the effect of a mild sedative on activity in a nocturnal rodent. Baseline activity data were recorded at time 0 and used to create balanced treatment groups. Activity was then recorded again after each group had received its treatment: the experimental group was treated with sedative plus vehicle, whereas controls were treated with vehicle only.
Question 6.
Question 7.
What is a control? Write a general definition that could apply to any experiment in which a group of subjects is designated as a control group. How would you describe the control condition (or treatment) in an experiment in which all subjects experience the control condition but in different orders? Answer: a control group is a group of animals that receives treatment identical to that of the experimental group except that they are not exposed to the independent variable of interest (general principle 2). Similarly, a control condition or treatment is identical to the experimental treatment except that the subjects are not exposed to the independent variable of interest. Note that what constitutes the control condition for an omnipresent, continuous variable such as ambient temperature is not always obvious. The experiment may simply be comparing what happens at two temperatures, but the rules for setting up and analyzing the results of the experiment are the same.
Question 8.
Does every well-designed experiment have a group that is designated the "control"? Describe a controlled experimental design in which no single group is the control group. Answer: see experimental design 2 (Fig. 1). See also the final note in the suggested answer to thought question 7.
| Acknowledgments |
|---|
Received for publication May 13, 2006. Accepted for publication September 4, 2006.
| REFERENCES |
|---|