Recently, Curran-Everett and Benos (5) developed guidelines for reporting statistics in journals published by the American Physiological Society. In this brief commentary, I will comment on those guidelines and present my own view of the discipline, in the hope of making it meaningful for readers of Advances in Physiology Education.

My interest in statistics was sparked by a rather unfortunate remark made by one of my teachers in medical school over 40 years ago. Emphatically stressing the wrong syllable and with a characteristic jiggle of the head, he declared “Statistics is an extremely boring subject, but I'm afraid I have to teach it to you.” I was cussed enough to find inspiration in that remark.

Statistics is far too exciting to be labeled dull or boring. The essence of any inquiry is to tease out meaningful information from a welter of noise. Proper use of statistics can give us the confidence that any statements we make at the end of our research can be justified or have “warranted assertibility,” to use Dewey's expressive term (7). The guidelines prepared by Curran-Everett and Benos (5) provide a checklist of do's and don'ts. In several earlier reports (3, 4), Curran-Everett and others have provided the appropriate background material, and reading those reports is crucial to appreciate the logic underlying the guidelines.

Modern science, which is built on 17th century foundations, seeks to infer events in the real world by manipulating it. Scientists provoke changes and assess the meaning and directions of those changes with calibrated instruments. They can select, at best, a small sample from a larger population (rats, mice, dogs, or humans) and attempt to infer from the effects observed in the small sample, what the effects would be if the entire population had been subject to the same manipulations. Knowledge of statistics helps them to extrapolate the information they gather from these limited samples to the larger population with varying degrees of confidence.

This helps them deal with two fundamental questions: *1*) Has there been a change in any of the quantities measured and *2*) Is the change large enough to be meaningful? The first question is answered by hypothesis testing and the second by estimation. It is important to establish whether or not a change has occurred that cannot be accounted for by random variations. However, hypothesis testing is fairly limited, and the authors point out that it is largely an artificial construct. The more important issue is whether the magnitude and direction of the change have any relevance. Unfortunately, considerable misunderstanding exists as to what statistics can and cannot do, and the authors deal with some very important elements in their short review. There is an unwarranted obsession with *P* values, in particular, and this may lead to either exaggerated or understated claims. This point is made more trenchantly by Goodman (8), when he refers to the *P* value fallacy. The authors emphasize that there is a clear distinction between statistical significance and scientific significance, with hypothesis testing pointing to the first, but only estimation revealing the latter.
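The distinction between statistical and scientific significance can be made concrete with a small numerical sketch. The data below are entirely hypothetical, constructed only to show that, with a large enough sample, a trivially small shift yields a minute *P* value even though the standardized effect size is negligible:

```python
# Hypothetical illustration: a practically trivial difference becomes
# "statistically significant" once n is large enough.
import math
import statistics

n = 10_000
a = [float(i % 100) for i in range(n)]   # scores spread uniformly 0..99
b = [x + 2.0 for x in a]                 # shifted by a trivial 2 points

mean_a, mean_b = statistics.mean(a), statistics.mean(b)
sd = statistics.stdev(a)                 # same spread in both groups

# Two-sample test statistic (normal approximation is fine at this n)
se = sd * math.sqrt(2.0 / n)
t = (mean_b - mean_a) / se
p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(t) / math.sqrt(2.0))))

cohens_d = (mean_b - mean_a) / sd        # standardized effect size

print(f"p = {p:.2e}, Cohen's d = {cohens_d:.3f}")
```

The *P* value here is far below any conventional threshold, yet the effect (d of roughly 0.07) is negligible by any practical standard: hypothesis testing answers "did anything change?", while only estimation answers "does the change matter?"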

The guidelines themselves are based on the information presented in this brief review. The guidelines are clear and provide very useful information for both authors and reviewers. Their suggestion that authors report a precise *P* value is a good one, since it gives the reader the option of considering or discarding the results presented. However, this point could be disputed, since some texts would state that there really is no point in quibbling once you have set an acceptable significance level. Daniel (6) also noted that authors have a tendency to place “more and more zeroes to the right of the decimal place to make a calculated *P* value more noteworthy,” so that the reported probabilities look like the scoreboard for a no-hitter in baseball, even though this may have “absolutely nothing to do with the practical significance of the results.”

There are other instances where the authors appear to be excessively prescriptive, particularly for contributors to this journal.

Consider, for example, the injunction to report variability using a standard deviation. This is excellent advice. However, for some studies reported in this journal, it may not be the best approach. Authors should be able to provide their data in the format that best suits their purposes without being constrained. In their review, Curran-Everett et al. (3) do agree that although a standard deviation is useful, it could be a deceptive index of variability, since even subtle departures from a normal distribution can make it useless. In educational settings, deviations from a normal distribution may occur fairly often when one is dealing with selected samples of students. Since the authors are aware of this issue, it is surprising that their guidelines are so prescriptive. Several years ago, I published a report (11) in this journal, in which I described a course designed to teach students the elements of scientific discovery. I included a table of student evaluations of the course, presenting the data as the median, mode, and range of responses to each question on a 5-point scale. I could well have given just the mean and standard deviation, but as the responses were skewed toward the higher end, the format I chose was more meaningful.
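The point about skewed ratings is easy to demonstrate. With hypothetical 5-point responses clustered at the top of the scale (these numbers are invented for illustration, not taken from the report cited above), the median, mode, and range convey the shape of the responses more faithfully than a mean and standard deviation:

```python
# Hypothetical 5-point course ratings, skewed toward the high end.
import statistics

ratings = [5, 5, 5, 4, 5, 4, 5, 3, 5, 4, 5, 5, 2, 5, 4]

mean_r = statistics.mean(ratings)     # pulled down by a few low scores
sd_r = statistics.stdev(ratings)
median_r = statistics.median(ratings)
mode_r = statistics.mode(ratings)
spread = (min(ratings), max(ratings))

print(f"mean {mean_r:.2f} (SD {sd_r:.2f})  vs  "
      f"median {median_r}, mode {mode_r}, range {spread}")
```

The mean of 4.4 suggests a middling "4-ish" response, whereas the median and mode of 5 show that the typical student gave the top rating, with a tail of lower scores captured by the range.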

Curran-Everett and Benos (5) also cautioned authors against presenting variability in terms of standard errors of the mean rather than standard deviations. This is a reasonable position but could, again, be more rigid than necessary. I would argue that what we are really trying to do in most studies is to estimate the population parameters from our sample characteristics. Thus, giving the standard error of the mean allows the reader to have some idea of how closely the mean values reported approximate the population mean. Furthermore, giving standard errors of the mean and *n* values helps readers gauge the confidence limits of the mean.
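The arithmetic connecting these quantities is simple, which is partly why either format can serve the reader. A minimal sketch, using invented numbers, shows how a reported SD and *n* yield the SEM and approximate 95% confidence limits for the mean:

```python
# Hypothetical reported summary statistics (invented for illustration).
import math

mean = 98.6    # reported sample mean
sd = 0.72      # reported standard deviation
n = 36         # reported sample size

sem = sd / math.sqrt(n)        # standard error of the mean
ci_low = mean - 1.96 * sem     # normal-approximation 95% confidence limits
ci_high = mean + 1.96 * sem

print(f"SEM = {sem:.3f}; 95% CI approx ({ci_low:.2f}, {ci_high:.2f})")
```

Because SEM = SD/sqrt(n), a reader given any two of SD, SEM, and *n* can recover the third; the choice of format is thus less consequential than the guideline might suggest, provided *n* is reported.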

Another point on which authors should be given leeway is the choice not to do any statistical tests at all. The authors point out, in an earlier report (3), the problem with tests of significance. This particular issue has been debated at length (1, 2, 6). Daniel (6) made much the same point when he suggested that editors should “require authors to avoid using SSTs (statistical significance tests) where not appropriate.” Carver (1), in an earlier commentary, was less conciliatory, stating bluntly that “Statistical significance testing has involved more fantasy than fact. The emphasis on statistical significance testing over scientific significance in educational research represents a corrupt form of the scientific method. Educational research would be better off if it stopped testing its results for statistical significance.” I would have liked the guidelines to state explicitly that when changes are clear enough, there is really no need to do any tests at all, or to fall back on what pharmacologists call “the bloody obvious test” (9). Motulsky (10) made this point quite explicit when he said that “If your data speak for themselves, don't interrupt.” Although I am not disputing the need for clear guidelines, and the authors have succeeded admirably in their aims, I feel a bit uncomfortable that these may be interpreted in a narrow sense to exclude good information from being published in this journal.

- © 2007 American Physiological Society