## Abstract

Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This ninth installment of *Explorations in Statistics* explores the analysis of ratios and normalized–or standardized–data. As researchers, we compute a ratio–a numerator divided by a denominator–to compute a *proportion* for some biological response or to derive some *standardized variable.* In each situation, we want to control for differences in the denominator when the thing we really care about is the numerator. But there is peril lurking in a ratio: only if the relationship between numerator and denominator is a straight line through the origin will the ratio be meaningful. If not, the ratio will misrepresent the true relationship between numerator and denominator. In contrast, regression techniques–these include analysis of covariance–are versatile: they can accommodate an analysis of the relationship between numerator and denominator when a ratio is useless.

- analysis of covariance
- model II regression
- ordinary least-squares regression

This ninth paper in *Explorations in Statistics* (see Refs. 13–20) explores the analysis of ratios. As researchers, we compute a ratio–a numerator divided by a denominator–to compute a *proportion* for some biological response or to derive some *standardized variable*.^{1} In each situation, we want to control for differences in the denominator when the thing we really care about is the numerator: that is, we compute a ratio to normalize–or standardize–the numerator to the denominator. Often, we want to know if a ratio differs between groups: for example, between old and young or between treated and control.

Physiology is rife with ratios (Table 1). No wonder. A ratio is seductively simple. There is just one problem: the analysis of a ratio is complex. In this exploration, we will see why. First, we need to review the software we will use to investigate ratios.

### R: Basic Operations

The first paper in this series (13) summarized R (46) and outlined its installation. For this exploration there are two additional steps: download Advances_Statistics_Code_Ratios.R^{2} to your Advances folder and install the extra packages lmodel2 and smatr.^{3}

To install these packages, open R and then click Packages | Install package(s). . . .^{4} Select a CRAN mirror close to your location and then click OK. Select lmodel2 and smatr and then click OK. When you have installed these packages, you will see a confirmation in the R Console.

#### To run R commands.

If you use a Mac, highlight the commands you want to submit and then press (command key+enter). If you use a PC, highlight the commands you want to submit, right-click, and then click Run line or selection. Or, highlight the commands you want to submit and then press Ctrl+R.

### The Trouble with Ratios: an Overview

Unless there is reason to believe that the regression passes through the origin, the ratio is of dubious value. It may do for rough work, but careful experimentation deserves [a] more efficient statistical method.

George Snedecor (1946)

If a ratio controls for the impact of the denominator on the numerator, then there will be no relationship between the ratio and the denominator: the magnitude of the ratio will remain constant across all observed values of the denominator. But there is a paradox: if there is no relationship between the numerator and the denominator, then the mere calculation of the ratio will create a relationship between the ratio and the denominator (Fig. 1). If there *is* a relationship between the numerator and the denominator, a ratio will be effective only when the relationship between the numerator and the denominator is a straight line that intersects the origin (Fig. 2). This inescapable phenomenon is the foundation of Snedecor's understated warning.
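The paradox can be sketched with a small simulation. The Python snippet below is a minimal illustration, not the article's R script: the sample size, the ranges, and the helper `pearson_r` are assumptions chosen for the example. It draws a numerator *y* and a denominator *x* independently and then correlates each of *y* and *y*/*x* with *x*.

```python
import random

def pearson_r(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    s_uv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    s_uu = sum((a - mean_u) ** 2 for a in u)
    s_vv = sum((b - mean_v) ** 2 for b in v)
    return s_uv / (s_uu * s_vv) ** 0.5

rng = random.Random(1)  # fixed seed so the sketch is repeatable
n = 2000

# Hypothetical values: the denominator x and the numerator y are drawn independently.
x = [rng.uniform(5.0, 15.0) for _ in range(n)]
y = [rng.gauss(10.0, 1.0) for _ in range(n)]
ratio = [yi / xi for xi, yi in zip(x, y)]

r_y_x = pearson_r(x, y)          # near 0: y does not depend on x
r_ratio_x = pearson_r(x, ratio)  # strongly negative: the ratio depends on x

print(f"r(y, x)   = {r_y_x:.2f}")
print(f"r(y/x, x) = {r_ratio_x:.2f}")
```

Although *y* carries no information about *x*, dividing by *x* builds a dependence on *x* into the ratio.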

In a 1949 paper published in the *Journal of Applied Physiology*, Tanner (57) illustrated the perils associated with ratios using physiological metrics like stroke volume and glomerular filtration rate. People in other disciplines have followed suit (1–4, 6, 35, 42, 43, 48).

A recent paper (5) in the *Journal of Applied Physiology* reinforced the notion that ratios can be troublesome, and it advocated transformation as a remedy. When we explored the bootstrap (16), we discovered that transformation can indeed be useful. If we transform a ratio, however, we simply rescale it: that fails to address its fundamental mathematical flaws. In other words, if we transform a ratio, we try to finesse a square peg into a round hole.

Now that we have a sense there is trouble lurking in the shadows of a ratio, let us explore the havoc a ratio can wreak.

### The Trouble with Ratios: an Example

In order to see the trouble ratios can cause, we need some data. Unlike most of our earlier explorations (13–19), however, this time we want to choose our data. Because the pig has played a conspicuous role within physiology,^{5} suppose we want to estimate–in pigs–the relationship between weight gain and the amount of feed consumed (Table 2). It turns out this relationship may provide critical information about some metabolic pathway, or it may provide essential economic information for pig farmers.

If we compute the ratio *y*/*x* to standardize the numerator *y*, weight gain, to the denominator *x*, the amount of feed consumed, then we discover that our sample of 10 pigs gained, on average, 0.20 lb for every pound of feed consumed (Fig. 3).^{6} In Fig. 3, the line that depicts this relationship intersects the origin.

On the other hand, if we estimate the relationship between weight gain and the amount of feed consumed using ordinary least-squares regression (see Ref. 19), then we obtain

$$\hat{y} = b_0 + b_1 x, \quad b_1 = 0.26 \tag{1}$$

In *Eq. 1*, ŷ represents the predicted weight gain, *b*_{0} estimates the weight gain when no feed is eaten, and *b*_{1} estimates the change in weight gain for each additional 1 lb of feed eaten.^{7} How do we interpret this result? Our pigs gained 0.26 lb for every pound of feed consumed. This approach assumes there is no appreciable measurement error in *x*, the amount of feed consumed.

When we explored regression, we saw that if *x* does include measurement error, then our estimate of the slope between *y* and *x* will be depressed (19). If we estimate the relationship between weight gain and the amount of feed consumed using model II regression, a form of regression that can account for measurement error in *x* (38, 55, 61), then we obtain

$$\hat{y} = b_0 + b_1 x, \quad b_1 = 0.41 \tag{2}$$

Our pigs gained 0.41 lb for every pound of feed consumed. The commands in *lines 135–145* of Advances_Statistics_Code_Ratios.R return these values.
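The three summaries can be compared side by side in a short Python sketch. Everything here is an assumption made for illustration: the data are simulated rather than taken from Table 2, the helper `sums_of_squares` is hypothetical, and the standardized major axis slope stands in as one model II estimator (the article's script may use a different one).

```python
import random

def sums_of_squares(x, y):
    """Return the corrected sums of squares Sxx, Syy and cross-products Sxy."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxx, syy, sxy

rng = random.Random(2)  # fixed seed so the sketch is repeatable

# Hypothetical pigs (NOT the values in Table 2): a true line that misses the origin.
true_feed = [rng.uniform(150.0, 250.0) for _ in range(10)]
gain = [-30.0 + 0.4 * f + rng.gauss(0.0, 3.0) for f in true_feed]
feed = [f + rng.gauss(0.0, 10.0) for f in true_feed]  # feed measured with error

sxx, syy, sxy = sums_of_squares(feed, gain)

mean_ratio = sum(g / f for f, g in zip(feed, gain)) / len(feed)  # average y/x
ols_slope = sxy / sxx                                            # ordinary least squares
r = sxy / (sxx * syy) ** 0.5                                     # Pearson correlation
sma_slope = (1.0 if sxy >= 0 else -1.0) * (syy / sxx) ** 0.5     # standardized major axis

print(f"mean ratio = {mean_ratio:.3f}")
print(f"OLS slope  = {ols_slope:.3f}")
print(f"SMA slope  = {sma_slope:.3f}")
```

Because the ordinary least-squares slope equals the correlation times the magnitude of the standardized major axis slope, it can never be the steeper of the two; the mean ratio answers a different question altogether unless the line passes through the origin.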

The theoretical flaws of a ratio have now shown themselves. In most cases, a ratio perverts the relationship between the numerator, the thing we really care about, and the denominator, the thing for which we want to adjust. Because many denominators in physiology are subject to measurement error, we will continue our exploration using model II regression. We will revisit ordinary least-squares regression in *Practical Considerations*.

### Regression: Another Approach

Suppose we want to study the relationship between glomerular filtration rate and kidney weight in two groups of pigs: a control group and a treated group. We randomly assign 30 animals to each group. We define our null hypothesis to be that, after treatment, glomerular filtration rate per gram of kidney weight will be similar in the two groups, and we establish a critical significance level of α = 0.01 (21). Table 3 lists the observations from this simulated experiment.

If we plunge ahead and calculate a ratio to standardize glomerular filtration rate to kidney weight, we find that glomerular filtration rate per gram was 1.20 units in the treated group and 1.15 units in the control group (Fig. 4). If we assess our null hypothesis using either a two-sample *t* test or a permutation method (20), we reject the null hypothesis (*P* < 0.001) and conclude that glomerular filtration rate per gram differs between the two groups. The commands in *lines 301–309* of Advances_Statistics_Code_Ratios.R return these values and results.
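A permutation comparison of two groups of ratios might look like the Python sketch below. It is a Monte Carlo approximation written for illustration, not the routine the article runs in R; the function name `permutation_p_value`, the number of permutations, and the example ratios are all hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign the group labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Count the observed arrangement itself so the estimate is never exactly 0.
    return (extreme + 1) / (n_perm + 1)

# Hypothetical ratios (not the values in Table 3).
treated = [1.22, 1.18, 1.25, 1.19, 1.21, 1.17]
control = [1.14, 1.16, 1.12, 1.17, 1.15, 1.13]
p_value = permutation_p_value(treated, control)
print(f"P = {p_value:.4f}")
```

A small *P* value here says only that the ratios differ between groups; it says nothing about whether the ratio was a sound summary in the first place.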

Let us now step back, however, and consider model II regression as a tool to study the relationship between glomerular filtration rate and kidney weight in these two groups of pigs. First, we need to see how we generated the observations in Table 3.

To simplify our lives, suppose that the first-order model

$$Y = \beta_0 + \beta_1 X + \varepsilon \tag{3}$$

defines the true relationship between glomerular filtration rate *Y* and kidney weight *X*. In *Eq.* 3, β_{0} represents the *Y*-intercept, β_{1} represents the slope of the relationship between *Y* and *X*, and ε represents random error in *Y* at each value of *X* (19). Suppose also that the random error ε is distributed normally with a mean of 0 and a standard deviation σ_{ε} = 1.

It turns out that we defined the coefficients β_{0} and β_{1} as in *lines 166–173* of Advances_Statistics_Code_Ratios.R. By using these values in *Eq. 3*, we generated the observed values of glomerular filtration rate *Y* for the control (*group 0*) and treated (*group 1*) groups listed in Table 3.

We generated the measured values of kidney weight by adding random measurement error to each known value of kidney weight *X*:

$$x = X + \xi$$

where ξ, the random error associated with the measurement of *X*, is distributed normally with a mean of 0 and a standard deviation σ_{ξ} = 1. The commands in *lines 193–218* of Advances_Statistics_Code_Ratios.R generate the observed values of *y* and the measured values of *x* listed in Table 3. Your values will differ slightly.
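The data-generating scheme just described (a first-order model for *Y* plus independent measurement error added to *X*) can be sketched in Python as follows. This is not the article's R code: the function `simulate_group`, the coefficient values, the range of true kidney weights, and the choice to reuse the same true weights in both groups are assumptions made for illustration.

```python
import random

def simulate_group(beta0, beta1, true_x, sigma_eps=1.0, sigma_xi=1.0, rng=None):
    """Return (measured x, observed y) for one group.

    y = beta0 + beta1 * X + eps, with eps ~ Normal(0, sigma_eps)
    x = X + xi,                  with xi  ~ Normal(0, sigma_xi)
    """
    rng = rng or random.Random()
    y = [beta0 + beta1 * X + rng.gauss(0.0, sigma_eps) for X in true_x]
    x = [X + rng.gauss(0.0, sigma_xi) for X in true_x]
    return x, y

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Hypothetical true kidney weights and coefficients (not the article's values).
true_kidney = [rng.uniform(20.0, 40.0) for _ in range(30)]
x_control, y_control = simulate_group(5.0, 1.0, true_kidney, rng=rng)
x_treated, y_treated = simulate_group(8.0, 1.0, true_kidney, rng=rng)

print(len(x_control), len(y_treated))
```

Keeping the true *X* separate from the measured *x* makes it possible to see later how much of the distortion in a fitted slope comes from measurement error alone.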

If we use model II regression (60, 61) to estimate the true relationship between glomerular filtration rate *Y* and measured kidney weight *X* + ξ in our two groups, then we obtain an estimate of β_{0} and an estimate of β_{1} for each group. In addition to these estimates, we also learn that the slope of the relationship between glomerular filtration rate and kidney weight is similar in the two groups (*P* = 0.62) but that the elevation of the relationship differs (*P* < 0.001; see Fig. 5). The commands in *lines 337–468* of Advances_Statistics_Code_Ratios.R execute the model II regressions and return these values. Your values will differ slightly.

Now we have trouble. If we analyze the ratio *y*/*x*, then we conclude that glomerular filtration rate per gram of kidney weight is greater in the treated group. But if we analyze the relationship between glomerular filtration rate and kidney weight using regression (see Fig. 5), then we conclude that the relationship between changes in glomerular filtration rate and changes in kidney weight–the slope Δ*y*/Δ*x*–is similar in the two groups. We conclude also that, at a given kidney weight, glomerular filtration rate in the treated group is about 3 units greater than in the control group.

How do we make sense of these disparate conclusions? By recalling that a ratio is sound only when the relationship between the numerator *y* and the denominator *x* is a straight line through the origin. Here, because the group relationships between glomerular filtration rate and kidney weight fail to meet that requirement, our conclusion based on the ratio is meaningless.
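A short derivation makes the requirement explicit. As a sketch, assume the first-order model $Y = \beta_0 + \beta_1 X + \varepsilon$ holds and, for simplicity, that $X$ is measured without error. Dividing through by $X$ gives

```latex
\frac{Y}{X} = \frac{\beta_0 + \beta_1 X + \varepsilon}{X}
            = \beta_1 + \frac{\beta_0}{X} + \frac{\varepsilon}{X},
\qquad
\operatorname{E}\!\left[\frac{Y}{X} \,\middle|\, X\right] = \beta_1 + \frac{\beta_0}{X}.
```

Only when $\beta_0 = 0$ does the expected ratio equal the slope $\beta_1$ at every value of $X$. When $\beta_0 \neq 0$, the ratio varies with $X$, and two groups that share a slope but differ in $\beta_0$ will differ in their ratios.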

Suppose we define the coefficients β_{0} and β_{1} in *Eq.* 3 differently. If we compute a ratio, we find that glomerular filtration rate per gram was 1.40 units in the treated group and 1.15 units in the control group (Fig. 6). Using either a two-sample *t* test or a permutation method, we conclude that glomerular filtration rate per gram differs between the two groups (*P* < 0.001). If we use model II regression, we obtain an estimate of the slope for each group.

These slopes also differ convincingly (*P* < 0.001). Once again, however, the ratio *y*/*x* distorts the magnitude of the relationship between the numerator *Y* and the denominator *X.*

When we explored regression (19), we discovered that residual plots help us decide if our statistical model of the relationship between *Y* and *X* is appropriate. Residual plots confirm that our model II regression models are appropriate (Fig. 7).

### Practical Considerations

Let us pretend we have some real data that others express typically as ratios (see Table 1). How do we proceed to analyze these data? If we are wedded to the notion of a ratio, then the first thing we must do is plot the data (see Ref. 19, p. 351, *Rule 1*).^{8} If the relationship between the numerator *y* and the denominator *x* is a straight line through the origin, which means the magnitude of the ratio *y*/*x* remains constant across observed values of the denominator *x*, then the ratio can be a meaningful quantity (see Fig. 2). What happens if this condition is not satisfied? The ratio will misrepresent the true relationship between *Y* and *X* (see Figs. 3 and 5) and will render meaningless between-group comparisons of the ratio.

In contrast, regression techniques are versatile: they can accommodate an analysis of the relationship between a numerator *Y* and a denominator *X* when the ratio *Y*/*X* is useless. When we used ordinary least-squares regression to estimate the relationship between weight gain and the amount of feed consumed by pigs (see Fig. 3), our approach was tantamount to analysis of covariance, a technique that blends regression with analysis of variance (12, 25–27, 40, 51) and one that others (1–8, 30, 37, 42, 43, 48, 49, 57, 59) have advocated for the analysis of ratios. Like ordinary least-squares regression, analysis of covariance assumes there is no measurement error in the denominator *X.*

If we use ordinary least-squares regression to estimate the relationship between a numerator *Y* and a denominator *X* when there is measurement error in *X*, then our estimate of the slope of the relationship between *Y* and *X* will be smaller by a factor of

$$\frac{\sigma_X^2}{\sigma_X^2 + \sigma_\xi^2} \tag{4}$$

In *Eq. 4*, σ_{X} is the standard deviation of *X* and σ_{ξ} is the standard deviation of the random error associated with the measurement of *X* (10, 19). If σ_{X} swamps σ_{ξ}, then the slope estimated using ordinary least-squares regression will resemble the slope estimated using model II regression.
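The attenuation can be checked numerically. The Python sketch below assumes the classical attenuation factor σ_X² / (σ_X² + σ_ξ²) and uses hypothetical values for the coefficients, the standard deviations, and the sample size; the helper `ols_slope` is written for this example and is not the article's R code.

```python
import random

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(4)  # fixed seed so the sketch is repeatable
n = 20000

# Hypothetical settings chosen only to make the attenuation easy to see.
beta0, beta1 = 5.0, 1.0
sigma_X, sigma_xi, sigma_eps = 3.0, 2.0, 1.0

true_X = [rng.gauss(50.0, sigma_X) for _ in range(n)]
y = [beta0 + beta1 * X + rng.gauss(0.0, sigma_eps) for X in true_X]
x = [X + rng.gauss(0.0, sigma_xi) for X in true_X]  # X measured with error

slope_true_x = ols_slope(true_X, y)   # regress on the error-free X
slope_measured_x = ols_slope(x, y)    # regress on the measured x
factor = sigma_X ** 2 / (sigma_X ** 2 + sigma_xi ** 2)

print(f"slope using true X     = {slope_true_x:.3f}")
print(f"slope using measured x = {slope_measured_x:.3f}")
print(f"beta1 * factor         = {beta1 * factor:.3f}")
```

With these settings the factor is 9/13, so the slope fitted to the measured *x* should sit near 0.69 rather than 1; shrinking `sigma_xi` relative to `sigma_X` moves the two slopes together.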

If we want to compare in two groups the relationship between *Y* and *X*, the question is, does analysis of covariance (ordinary least-squares regression) provide a reasonably accurate estimate of the group difference in slopes even though it may underestimate the actual magnitudes of the individual group slopes? The answer appears to be yes (Fig. 8). For different values of σ_{ξ}, the standard deviation of the random error associated with the measurement of *X*, analysis of covariance returns about the same group difference in slopes as does model II regression (Table 4). This is useful: analysis of covariance is more common in statistical packages than is model II regression. Of course, you can always collaborate with a statistician (Ref. 21, *guideline 1*).

### Summary

Any criticism of the use of ratios by biologists [can elicit] a Pavlovian response . . . similar to that resulting from [a criticism] of motherhood, America, and apple pie.

William Atchley and Dwane Anderson (1978)

Researchers love ratios. Statisticians loathe them.

Michal Jasienski and Fakhri Bazzaz (1999)

As this exploration has demonstrated, the computation of a ratio, a quantity designed to normalize a numerator to some denominator, either as a *proportion* or as a *standardized variable*, is deceptively simple. The value of that ratio, however, is often suspect. Why? A ratio accounts for differences in the denominator only if the relationship between numerator and denominator is a straight line through the origin. If this condition is not satisfied, then all bets are off: the ratio will misrepresent the true relationship between numerator and denominator. As a result, we may fail to find a group difference that does exist, or we may find a group difference that does not exist. Who wants to gamble like this?

In contrast, regression techniques^{9} that estimate the numerator as a function of the denominator can accommodate an analysis of the relationship between numerator and denominator when the ratio is useless.

In the next installment of this series, we will explore the analysis of a change in some physiological response. Often, we express actual change as percent change so we can account for different initial values. But this creates a problem: percent change is just another ratio. In the next exploration, we will investigate how we can analyze the change in some physiological response.

## ACKNOWLEDGMENTS

I thank Gerald DiBona (University of Iowa College of Medicine, Iowa City, IA), Gordon Drummond (University of Edinburgh, Edinburgh, UK), John Ludbrook (Department of Surgery, The University of Melbourne, Victoria, Australia), and Matthew Strand (National Jewish Health, Denver, CO) for their helpful comments and suggestions.

## Footnotes

^{1} When we compute a *proportion*, the numerator and denominator represent the same biological response; we measure them at different times but in the same units. When we derive a *standardized variable*, the numerator and denominator represent different biological responses; we measure them at the same time but in different units.

^{2} This file is available through the Supplemental Material link for this article at the *Advances in Physiology Education* website.

^{3} The lmodel2 package executes regression in situations where the predictor variable *X* includes measurement error, and the smatr package executes standardised major axis estimation and testing routines. This exploration requires also the beeswarm and coin packages from our preceding exploration of permutation methods (20).

^{4} The notation click *A* | *B* means click *A*, then click *B*.

^{6} The command in *line 99* of Advances_Statistics_Code_Ratios.R returns this value.

^{7} The commands in *lines 124–125* of Advances_Statistics_Code_Ratios.R return these values.

^{8} No matter how we analyze the data, it is essential to plot the data.

^{9} These regression techniques include analysis of covariance.

- Copyright © 2013 the American Physiological Society