We scientists rely on statistics. In part, this is because we use statistics to report our own science and to interpret the published science of others. For some of us, reporting and interpreting statistics can be like reading an unfamiliar language: it is awkward to do, and it is easy to misinterpret meaning. To facilitate these tasks, in 2004, we wrote an editorial (8) in which we proposed specific guidelines to help investigators analyze data and communicate results. These guidelines for reporting statistics can be accessed through the American Physiological Society (APS) Information for Authors (4).

In this follow-up editorial, we report initial reactions to the guidelines and the subsequent impact of the guidelines on reporting practices. We also revisit three of the guidelines. In 2004, we hoped the guidelines would improve and standardize the caliber of statistical information reported throughout journals published by the APS. We still do.

### Initial Reactions

Initial reactions to the guidelines were mixed. Most of what we heard, however, was quite positive:

> This is really very helpful indeed. I wish all journals would adopt this as standard.
>
> Physiologist^{1}

We were delighted that many of the people who congratulated—even thanked—us were statisticians:

> I have just read the guidelines published in *Physiological Genomics* and wish to congratulate you on a nice job! [About] 6 years ago I reviewed a paper for an APS journal … I offered several trivial biological suggestions and then asked [the authors] to report the [standard deviation] rather than [standard error] to represent the variability about their sample mean. The authors adopted all of the biological suggestions but rejected the one statistical critique, stating that it was standard policy to report the [standard error] and their colleagues all expected it.
>
> I am quite certain that these guidelines will prompt much grousing from the biologists about being too theoretical and unnecessary, as well as from the statisticians that you left out the rules nearest and dearest to their hearts.
>
> Statistician

If statisticians groused about the guidelines, they never groused to us:

> Good guidelines … Quite reasonable without being overly fussy. The interpretation of *P* values is much more sensible than one often gets with medical journals, where *P* < 0.05 is all they care about.
>
> Statistician

Every now and then, a biologist did grouse:

> I do not agree with the edict about presenting data as [standard deviations] rather than [standard errors of the mean]. These presentations are for visual effect only … To me, this edict is silly, particularly since showing [standard deviations rather than standard errors of the mean] is a cosmetic issue only.
>
> Physiologist

One biologist was moved to write a Letter to the Editor (15). For the most part, when someone did complain about the guidelines, it was *Guideline 5: Report variability using a standard deviation* about which they complained.

### Subsequent Impact

Within a year, the Editor of *Clinical and Experimental Pharmacology and Physiology* had solicited a critique of the guidelines (18). The critique set the stage for the journal to revise its own guidelines for reporting statistics (J. Ludbrook, personal communications).^{2}

In May 2006, we received an email from Australia that hinted the guidelines had little immediate impact on reporting practices:

> Many of us were pleased to see this article come out, particularly in a major, trend-setting publication. Early optimism faded as authors and editors did not appear to want to reinforce your message…
>
> Most of my colleagues [report standard errors and *P* < 0.05] and, when questioned, tell me that they are using the appropriate method: that is what I will find in the journals, that is what their predecessors used, they know what it means (so why don't I?), and in any case [the result is obvious].
>
> Physiologist

To estimate the actual impact of the guidelines on reporting practices, we reviewed all original articles published from August 2003 through July 2004, the year before the guidelines, and all original articles published from August 2005 through July 2006, the second year after the guidelines. If the guidelines affected the reporting of statistics, we expected the incidence of standard errors to decrease and the incidence of standard deviations, confidence intervals, and precise *P* values to increase.

What did our literature review reveal? That the guidelines had virtually no impact on the occurrence of standard errors, standard deviations, confidence intervals, and precise *P* values (Table 1). There were two exceptions: in one journal, the use of standard errors decreased from 88% to 81%; in another journal, the use of precise *P* values increased from 4% to 17%.

### The 2004 Guidelines Revisited

These guidelines addressed the reporting of statistics in the results section of a manuscript:

- Guideline 5. Report variability using a standard deviation.
- Guideline 6. Report uncertainty about scientific importance using a confidence interval.
- Guideline 7. Report a precise *P* value.

Our 2004 editorial (8) summarized the theoretical rationale for each of these guidelines. The subsequent publication of Scientific Style and Format (6) reinforced their application. Because they offer the greatest benefit to authors and readers of results sections, we revisit each of these guidelines.

#### Guideline 5. Report variability using a standard deviation.^{3}

The distinction between standard deviation and standard error of the mean is far more than cosmetic: it is an essential one. These statistics estimate different things: a standard deviation estimates the variability among individual observations in a sample, but a standard error of the mean estimates the theoretical variability among sample means (8, 9).

Individual observations in a sample differ because the population from which they were drawn is distributed over a range of possible values. The study of this intrinsic variability is important: it may reveal something novel about underlying scientific processes (12). The standard deviation describes the variability among the observations we investigators measure; it characterizes the dispersion of sample observations about the sample mean.

In contrast, the standard error of the mean provides an answer to a theoretical question: If I repeat my experiment an infinite number of times, by how much will the possible sample means vary about the population mean?

The fundamental difference between standard deviation and standard error of the mean is reflected further in how these statistics are defined. The standard deviation *s* and the standard error of the mean SE{*ȳ*} are

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \quad \text{and} \quad \text{SE}\{\bar{y}\} = \frac{s}{\sqrt{n}}, \tag{1}$$

where *n* is the number of observations in the sample, *y*_{i} is an individual observation, and *ȳ* is the sample mean. Because it incorporates information about sample size, the standard error of the mean is a misguided estimate of variability among observations (Fig. 1 and Ref. 8). By itself, the standard error of the mean has no particular value (9). Even with a large sample size (*n* = 35), the interval

$$[\bar{y} - \text{SE}\{\bar{y}\},\; \bar{y} + \text{SE}\{\bar{y}\}] \tag{2}$$

is just a 68% confidence interval. In other words, we can declare, with modest 68% confidence, that the population mean is included in the interval [*ȳ* − SE{*ȳ*}, *ȳ* + SE{*ȳ*}].
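The two definitions are easy to check numerically. A minimal sketch in Python (the sample values are illustrative, not from any study):

```python
import math
import statistics

# Illustrative sample of n = 5 observations (hypothetical values)
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
n = len(sample)

# Standard deviation: variability among the individual observations
s = statistics.stdev(sample)  # uses the n - 1 denominator

# Standard error of the mean: theoretical variability of sample means
sem = s / math.sqrt(n)

print(f"SD  = {s:.4f}")   # describes the observations themselves
print(f"SEM = {sem:.4f}") # shrinks as n grows, regardless of the spread
```

Because the standard error divides by √*n*, it necessarily understates the variability among the observations themselves, which is the quantity Guideline 5 asks authors to report.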

To summarize, in most experiments, it is essential that an investigator reports on the variability among the actual individual measurements. A sample standard deviation does this: it describes the variability among the actual experimental measurements. On the other hand, if an investigator were to repeat an experiment many times, and each time calculate a sample mean, the average of those sample means will be the population mean; the standard deviation of those sample means will be the standard error of the mean (9). A standard error is simply the standard deviation of a statistic: here, the sample mean. In nearly all experiments, however, a single sample mean is computed. Therefore, it is inappropriate to report a standard error of the mean–a theoretical estimate of the variability of possible values of a sample mean about a population mean–as an estimate of the variability among actual experimental measurements.

#### Guideline 6. Report uncertainty about scientific importance using a confidence interval.^{4}

A confidence interval focuses attention on the magnitude and uncertainty of an experimental result. In essence, a confidence interval helps answer the question, is the experimental effect big enough to be relevant? A confidence interval is a strong tool for inference: it provides the same statistical information as the *P* value from a hypothesis test, it circumvents the drawbacks inherent to a hypothesis test, and it provides information about scientific importance (9).

In biomedical research, the routine use of confidence intervals is recommended (1–3, 8, 9, 13), and, in clinical medicine, the use of confidence intervals is indeed widespread (2, 3). In journals published by APS, however, the incidence of confidence intervals was as rare in 2006 as it was in 1996 (see Table 1).
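A confidence interval for a sample mean is also simple to compute. The sketch below uses a large-sample normal approximation (for small samples, the *t* distribution would be the usual choice); the function name and sample values are illustrative assumptions, not part of the guidelines:

```python
import math
from statistics import NormalDist, fmean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Large-sample (normal-approximation) confidence interval for the mean.
    Illustrative helper; small samples would call for the t distribution."""
    n = len(sample)
    m = fmean(sample)
    sem = stdev(sample) / math.sqrt(n)              # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g., about 1.96 for 95%
    return m - z * sem, m + z * sem

lo, hi = mean_confidence_interval([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"95% CI: [{lo:.2f}, {hi:.2f}]")
```

The point of reporting the interval is that its location and width speak directly to magnitude and uncertainty, which a bare *P* value cannot do.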

#### Guideline 7. Report a precise P value.^{5}

In 2004 we wrote that a precise *P* value communicates more information with the same amount of ink, and it permits readers to assess a statistical result according to their own criteria (8). You would think most authors would report a precise *P* value. Most do not (see Table 1).

On occasion, an author who reported a precise *P* value did so with unnecessary precision. Guidelines for rounding *P* values to sufficient precision are listed in Table 2.
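As an illustration only (the rounding rules that govern APS journals are those in Table 2, which we do not reproduce here), a common convention is to round *P* to two significant digits and to report very small values as an inequality:

```python
def format_p(p):
    """Format a P value using a common convention (an assumption, not Table 2):
    report very small values as an inequality; otherwise round to two
    significant digits."""
    if p < 0.001:
        return "P < 0.001"
    return f"P = {p:.2g}"

# P = 0.0374281 carries no more useful information than P = 0.037
print(format_p(0.0374281))
print(format_p(0.0004))
```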

### The Next Step

All of us–authors, reviewers, editors–gravitate toward the familiar. For most of us, reporting standard deviations, confidence intervals, and precise *P* values is quite unfamiliar. To make matters worse, these reporting practices have been unfamiliar for decades. There is considerable inertia to overcome before standard deviations, confidence intervals, and precise *P* values become commonplace in journals published by the APS.

Reform is difficult (5, 7, 10, 11, 14, 16). The question is, how can we help it along? In part, the answer is that all of us–authors, reviewers, editors, Publications Committee–must make a concerted effort to use and report statistics in ways that are consistent with best practices. With these guidelines (8), we summarized the best practices of statistics.

In 2004, we hoped that the guidelines would improve and standardize the caliber of statistical information reported throughout journals published by the APS. It is clear that the mere publication of the guidelines failed to impact reporting practices. We still have an opportunity.

## Acknowledgments

We thank Margaret Reich (Director of Publications and Executive Editor, American Physiological Society) for a 1-yr print subscription to APS journals, and we thank Matthew Strand (National Jewish Medical and Research Center, Denver, CO) and the Editors of the APS journals for comments and suggestions.

## Footnotes

↵1 Because these comments reflect unsolicited personal correspondence, we have elected to withhold the names.

↵2 It has: see http://www.blackwellpublishing.com/submit.asp?ref=0305-1870&site=1.

↵3 This guideline is mirrored in *Scientific Style and Format*, Sections 12.5.2.3 and 12.5.2.4 of Ref. 6.

↵4 This guideline is mirrored in *Scientific Style and Format*, Section 12.5.2.2 of Ref. 6.

↵5 This guideline is mirrored in *Scientific Style and Format*, Section 12.5.1.2 of Ref. 6.

- © 2007 American Physiological Society