## Abstract

Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This third installment of *Explorations in Statistics* investigates confidence intervals. A confidence interval is a range that we expect, with some level of confidence, to include the true value of a population parameter such as the mean. A confidence interval provides the same statistical information as the *P* value from a hypothesis test, but it circumvents the drawbacks of that hypothesis test. Even more important, a confidence interval focuses our attention on the scientific importance of some experimental result.

- estimation
- R
- software

This third article in *Explorations in Statistics* (see Refs. 11 and 12) provides an opportunity to explore confidence intervals. A confidence interval estimates our uncertainty about the true value of some population parameter.^{1} For example, when we construct a confidence interval for the mean of some population, we expect, with some level of confidence, that the true value of the population mean will fall within that interval. A confidence interval provides the same statistical information as the *P* value from a hypothesis test, it circumvents the drawbacks inherent to that hypothesis test, and it provides information about scientific importance. The routine reporting of confidence intervals is recommended (1–3, 9, 13, 15, 18), but the meaning of a confidence interval is difficult to understand (7, 16). To be blunt, the meaning of a confidence interval is nearly impossible to understand unless you observe the development of its underlying concept. In this exploration, we will.

### A Brief History of Confidence Intervals

Unlike hypothesis tests, whose origins can be traced to 1279 (25), confidence intervals are a recent development: Jerzy Neyman derived them in the 1930s (20–22). There would be a 50-year lag before medical journals advocated the use of confidence intervals (4, 5, 17, 18). It would be just 10 years before George Snedecor added confidence intervals to his historic *Statistical Methods* (24).^{2} In 1913, 6 years before Fisher went to Rothamsted Station (11, 26), Snedecor arrived at Iowa State College as an assistant professor of mathematics (6, 8, 10, 19). In his courses, Snedecor derived examples based on agricultural and biological data from researchers at Iowa State. These collaborations led Snedecor to create the Mathematics Statistical Service (1927) and then the Statistical Laboratory (1933) at what is now Iowa State University. The hallmark of *Statistical Methods* is its focus on the application of statistical methods to actual scientific problems and data.

### R: Basic Operations

In the inaugural article (12) of this series, I summarized the freeware package R (23) and outlined its installation. For this exploration, there is just one additional step: download the script Advances_Statistics_Code_CI.R^{3} to your Advances folder (see Ref. 12).

If you use a Mac, highlight the commands in Advances_Statistics_Code_CI.R that you want to submit and then press Command+Enter. If you use a PC, highlight the commands you want to submit, right-click, and then click Run line or selection. Or, highlight the commands you want to submit and then press Ctrl+R.

### The Simulation: Observations and Sample Statistics

For these explorations (11, 12), we drew a total of 1000 random samples, each with 9 observations, from our population, a standard normal distribution with mean μ = 0 and standard deviation σ = 1 (see Ref. 12, Fig. 2). The script prints the observations, the data, for *samples 1*, *2*, and *1000* in a listing that begins `> # Sample Observations`.

Each time we drew a random sample, we calculated the sample statistics listed in Table 1 for *samples 1*, *2*, and *1000*. The commands in *lines 35–63* of Advances_Statistics_Code_CI.R generate the observations and compute the sample statistics. These commands are identical to those in the first two scripts (11, 12).

With these 1000 sets of sample statistics, we are ready to explore confidence intervals.

### Confidence Intervals

When we began these explorations, we wanted in part to estimate μ = 0, the mean of our population. In the first iteration of our simulation, the sample mean *ȳ* = 0.797 estimated the population mean. In the second iteration, the sample mean 0.517 estimated the population mean. All told, we have 1000 sample estimates of the population mean: 900 of them are between −0.523 and +0.552 (Fig. 1).^{4} We can generalize from this empirical distribution of sample means to the theoretical distribution of the sample mean, a normal distribution with mean μ and standard deviation σ/√*n* (12, 15), where *n* is the number of observations in the sample.

In the theoretical distribution of the sample mean (Fig. 2), 100(1 − α)% of the possible sample means are covered by the interval

[μ − *a*, μ + *a*],

where the allowance *a* is

*a* = *z*_{α/2} · SD{*ȳ*}. (1)

In *Eq. 1*, *z*_{α/2} is the 100[1 − (α/2)]th percentile from the standard normal distribution, and SD{*ȳ*} is the standard deviation of the sample means, σ/√*n*. The standard deviation of the distribution of the sample mean is also called the standard error of the sample mean, SE{*ȳ*}.

Suppose we want the interval [μ − *a*, μ + *a*] to cover 90% of the possible sample means for 9 observations. In this situation, μ = 0, α = 0.10, and *z*_{α/2} = 1.645. Because we defined the population standard deviation σ to be 1 (see Ref. 12),

SD{*ȳ*} = σ/√*n* = 1/√9 = 0.333,

and the resulting allowance *a* is

*a* = *z*_{α/2} · SD{*ȳ*} = 1.645 × 0.333 = 0.548.

Therefore, the interval [−0.548, +0.548] covers 90% of the sample means for 9 observations. This theoretical interval agrees with the empirical interval of [−0.523, +0.552].
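This arithmetic is easy to check in R. The following sketch (not part of the article's script; the seed and the number of samples are arbitrary) computes the allowance with `qnorm` and then verifies empirically that the interval [μ − *a*, μ + *a*] covers about 90% of 1000 simulated sample means:

```r
# Sketch: verify that [mu - a, mu + a] covers ~90% of sample means for n = 9
set.seed(1)                                # arbitrary seed; your values will vary
n <- 9; mu <- 0; sigma <- 1; alpha <- 0.10
a <- qnorm(1 - alpha/2) * sigma/sqrt(n)    # 1.645 * 0.333 = 0.548
means <- replicate(1000, mean(rnorm(n, mu, sigma)))
coverage <- mean(means >= mu - a & means <= mu + a)
coverage                                   # close to 0.90
```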

When we calculate the interval [μ − *a*, μ + *a*], we use the population mean μ to learn about possible values of the sample mean *ȳ*. Is this what we do when we calculate a confidence interval for the mean of some population? Sadly, no. When we calculate a confidence interval, we use the sample mean *ȳ* to learn about possible values of the population mean μ. Happily, we can use the interval [μ − *a*, μ + *a*] to derive a confidence interval.

First, we write the interval [μ − *a*, μ + *a*] as the probability expression

Pr{μ − *a* ≤ *ȳ* ≤ μ + *a*} = 1 − α.

What does this expression mean in words? It means that the probability is 1 − α that a sample mean is covered by, that is, lies within, the interval [μ − *a*, μ + *a*] (see Fig. 2). Then, we rearrange the joint inequality portion of the expression to get

Pr{*ȳ* − *a* ≤ μ ≤ *ȳ* + *a*} = 1 − α.

In this form, the interval

[*ȳ* − *a*, *ȳ* + *a*] (2)

is called the 100(1 − α)% confidence interval for the population mean μ. Now we have what we want.

In an actual experiment, we do not know the population standard deviation σ. Therefore, we use the sample standard deviation *s* to estimate the population standard deviation σ and *s*/√*n* to estimate the standard error of the sample mean. In addition, when we calculate a 100(1 − α)% confidence interval for some population mean μ, we handle our uncertainty about the actual value of σ by replacing *z*_{α/2} (*Eq. 1*) with *t*_{α/2,v}, the 100[1 − (α/2)]th percentile from a Student *t* distribution with *v* = *n* − 1 degrees of freedom. As a result, the allowance we apply to the sample mean to obtain the 100(1 − α)% confidence interval (*Eq. 2*) becomes

*a* = *t*_{α/2,v} · SE{*ȳ*},

where SE{*ȳ*} = *s*/√*n*.^{5} This allowance is bigger than the allowance in *Eq. 1*: we are more uncertain about the value of the population mean μ. This happens because if *v* < ∞, then *t*_{α/2,v} > *z*_{α/2} for all values of α.
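A quick check in R (a sketch, not part of the article's script) confirms that the Student *t* percentile exceeds the standard normal percentile when *v* = 8:

```r
# For alpha = 0.10: compare the t percentile (v = 8) with the z percentile
t_pct <- qt(0.95, df = 8)   # 1.860
z_pct <- qnorm(0.95)        # 1.645
t_pct > z_pct               # TRUE: the t-based allowance is bigger
```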

Suppose we want to calculate a confidence interval for the population mean μ = 0 using the observations 0.422, 1.103, …, 1.825 of the first sample. The mean and standard deviation of these 9 observations are *ȳ* = 0.797 and *s* = 0.702, and the estimated standard error of the mean is

SE{*ȳ*} = *s*/√*n* = 0.702/√9 = 0.234.

Because *n* = 9, there are *v* = 8 degrees of freedom. If we want a 90% confidence interval, then α = 0.10, *t*_{α/2,v} = 1.860, and the allowance *a* = 1.860 × 0.234 = 0.435. Therefore, the 90% confidence interval is

[*ȳ* − *a*, *ȳ* + *a*] = [0.797 − 0.435, 0.797 + 0.435] = [0.36, 1.23].
What does this expression mean in words? We can declare, with 90% confidence, that the population mean is included in the interval [0.36, 1.23]. Because 0 is outside this interval, we can state, with 90% confidence, that 0 is not a plausible value of the population mean. This inference is consistent with our second exploration in which we rejected the null hypothesis *H*_{0}: μ = 0 and concluded that the sample observations were consistent with having come from a population that had a mean other than 0 (see Ref. 12).
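This calculation can be reproduced in R from the summary statistics alone. The sketch below (not part of the article's script) uses `qt` to obtain *t*_{α/2,v}:

```r
# Reproduce the 90% confidence interval for sample 1 from its summary statistics
ybar <- 0.797; s <- 0.702; n <- 9; alpha <- 0.10
se <- s/sqrt(n)                          # 0.234
a  <- qt(1 - alpha/2, df = n - 1) * se   # 1.860 * 0.234 = 0.435
ci <- c(ybar - a, ybar + a)
round(ci, 2)                             # [0.36, 1.23]
```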

But now we have a problem: a single confidence interval either does or does not include the true value of some population parameter. In a real experiment, we do not know which outcome has occurred. So the question is, where does the notion of confidence in a confidence interval come from? The answer: not from a single confidence interval but from a theoretical process of calculating a whole bunch of confidence intervals. For these explorations, we drew a total of 1000 random samples. Each time we drew a random sample, we calculated its mean and standard deviation. Because the population from which we drew the samples was distributed over a range of possible values, the sample means (see Fig. 1) and standard deviations (see Ref. 12, Fig. 3) varied among our 1000 samples. Therefore, we calculated 1000 different confidence intervals. We expect about 100(1 − α)% of these confidence intervals to include the actual value of the population mean (Fig. 3). This is the underlying concept of confidence in a confidence interval. The next question is, how do we use a confidence interval to help us make an inference about scientific importance?
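This theoretical process can be sketched in R (the seed is arbitrary; this is not the article's script): draw 1000 random samples, calculate a *t*-based 90% confidence interval for each, and count the intervals that include μ = 0.

```r
# Sketch: what fraction of 1000 simulated 90% CIs include the true mean?
set.seed(2)                                # arbitrary seed; your values will vary
n <- 9; mu <- 0; sigma <- 1; alpha <- 0.10
includes_mu <- replicate(1000, {
  y <- rnorm(n, mu, sigma)
  a <- qt(1 - alpha/2, df = n - 1) * sd(y)/sqrt(n)
  mean(y) - a <= mu && mu <= mean(y) + a
})
prop <- mean(includes_mu)
prop                                       # close to 0.90
```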

In a manner similar to Ref. 14, suppose you find three articles in *Physiological Genomics* that independently investigated the impact of three different drugs on the expression of some gene. Suppose also that a fractional change of 0.25 (25%) results in an altered phenotype. Each study involved a sample of 9 subjects, and each reported a 90% confidence interval for the fractional change in expression of the gene. For each drug, these are the sample mean *ȳ*, *P* value, and 90% confidence interval:

| Drug | *ȳ* | *P* | 90% Confidence Interval |
| --- | --- | --- | --- |
| *A* | 0.80 | 0.005 | [0.36, 1.23] |
| *B* | 0.01 | 0.005 | [0.004, 0.01] |
| *C* | 0.80 | 0.14 | [−0.51, 2.10] |
How do you interpret these results?

*Drug A* increased expression by 80%, a change that differed from 0 (P = 0.005). The confidence interval suggests the true impact of *drug A* is probably a 36–123% increase in expression, a change that is scientifically meaningful. *Drug A* produced a convincing change of scientific importance.

*Drug B* increased expression by 1%, a change that also differed from 0 (P = 0.005). The confidence interval suggests the true impact of *drug B* is probably a 0.4–1% increase in expression, a change that is scientifically trivial but quite precise. *Drug B* produced a convincing change of no scientific importance.

*Drug C* increased expression by 80%, a change consistent with 0 (P = 0.14). The confidence interval suggests the true impact of *drug C* could range from a 51% decrease to a 210% increase in expression. Either would be scientifically meaningful. Because it is relatively long, the confidence interval for *drug C* is an imprecise estimate of the true impact of *drug C* on expression of the gene. *Drug C* bears further study using a larger sample size.

Note that the scientific importance of the upper and lower bounds of a confidence interval depends on scientific context.
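As a sketch only (the intervals for *drugs A–C* are taken, or in the case of *drug B* approximated, from the text above, and 0.25 is the assumed importance threshold), R can screen each interval against 0 and against the threshold:

```r
# Screen each drug's 90% CI: does it exclude 0, and could the true change
# reach the 0.25 importance threshold? (Intervals approximated from the text.)
threshold <- 0.25
ci <- list(A = c(0.36, 1.23), B = c(0.004, 0.01), C = c(-0.51, 2.10))
screen <- sapply(ci, function(x)
  c(excludes_0   = x[1] > 0 || x[2] < 0,    # convincing change?
    could_matter = x[2] >= threshold))      # upper bound scientifically important?
screen
```

Drug *A* passes both screens, drug *B* is convincing but trivial, and drug *C* is inconclusive, matching the interpretations above.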

### Summary

As this exploration has demonstrated, a confidence interval is a range that we expect, with some level of confidence, to include the true value of a population parameter such as the mean. For example, when we construct a confidence interval for the mean of some population, we assign numerical limits to the expected discrepancy between the sample mean *ȳ* and the population mean μ. A confidence interval is useful because it focuses our attention away from a singularly statistical *P* value and toward the scientific importance of some experimental result.

In the next installment of this series, we will explore bootstrapping, a statistical technique even more recent than confidence intervals. Bootstrapping gives us an approach we can use to assess whether the inferences we make from hypothesis tests and confidence intervals are justified.

## Footnotes

↵1 A parameter is a numerical constant: for example, the population mean.

↵2 Snedecor published the early editions of *Statistical Methods* in 1937, 1938, and 1940. William Cochran contributed a chapter to the 1956 edition and helped author the 1967, 1980, and 1989 editions.

↵3 This file is available through the Supplemental Material link for this article at the *Advances in Physiology Education* website.

↵4 The command in *line 75* of Advances_Statistics_Code_CI.R returns these values. Your values will differ slightly.

↵5 The standard error of the sample mean SE{*ȳ*} is identical to the standard deviation of the theoretical distribution of the sample mean SD{*ȳ*} in *Eq. 1*.

- Copyright © 2009 the American Physiological Society