Ann Thorac Surg 2007;83:1240-1244
© 2007 The Society of Thoracic Surgeons
The Statistician's Page
What Are the Odds?
Gary L. Grunkemeier, PhD,
YingXing Wu, MD*
Providence Health System Cardiovascular Study Group, Providence Health & Services, Portland, Oregon
* Address correspondence to Dr Wu, 9205 SW Barnes Rd. Suite 33, Portland, OR 97225 (Email: yingxing.wu{at}providence.org).
Introduction
This article coincides with the publication of two reports from The Society of Thoracic Surgeons (STS) Quality Measurement Task Force concerning quality measurement in adult cardiac surgery [1, 2]. As described in their Technical Appendix [2], the authors' analyses incorporate, appropriately, odds instead of probabilities. Here we compare the familiar concepts of probability and risk ratio with the less familiar concepts of odds and odds ratio (OR). In the process, we argue that the risk-adjusted OR is a technically better measure for comparing providers, such as hospitals and physicians, than the more commonly used observed-to-expected (O/E) risk ratio. Moreover, unlike the O/E ratio, the computation of the OR and its confidence interval is available in most statistical software programs. Before discussing their ratios, we introduce the measures themselves.
Measuring Risk: Odds Versus Probability
The procedure-related risks that we will be considering are binary events such as death, in which each patient has either 1 or 0 events. Repeated events or composite events can be converted to this data type by the all-or-none scoring approach to data reduction [2]. Probabilities are widely used to describe such risks (eg, operative mortality). The probability of death is estimated as the number of deaths divided by the total number of patients. For example, if 1 of 5 patients dies, the probability of dying is 1/5 = .20, which is usually converted to a percentage, 20%. The odds is the number of deaths divided by the number of survivors. In the above example, the odds of dying is 1/4 = .25, sometimes written as 1:4 and spoken as "1 to 4".
The solid line in Figure 1 displays the relationship between probability and odds. The odds is always larger, since its denominator (number of survivors) is smaller than the denominator of the probability (total number of patients). The dashed line of identity in Figure 1 shows their similarity for probabilities up to about 20%, which includes most cardiac procedural complication rates. The example mentioned above (probability = .20 and odds = .25) is shown by the black circle in Figure 1. In this case, the probability of not dying would be .80 (80%), and the odds of not dying (odds in favor of living) would be "4 to 1," 4:1, or simply, 4 (Figure 1, gray circle).

Fig 1. The relationship between odds and probabilities (solid line). Probabilities range from 0 to 1 (0% to 100%). Odds range from 0 to infinity and are always larger than probabilities. The dashed line is the line of identity. For low-risk events, odds and probabilities are similar; for example, 0.25 and 0.20 (20%), respectively, shown by the black circle. For high-risk events, odds are much larger; for example, 4 and 0.80 (80%), shown by the gray circle.
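The conversion between probability and odds described above can be sketched in a few lines of Python. This is only an illustration of the definitions in the text; the function names are illustrative and do not come from any statistical package.

```python
def probability_to_odds(p):
    """Odds = p / (1 - p); defined for 0 <= p < 1."""
    if not 0 <= p < 1:
        raise ValueError("probability must satisfy 0 <= p < 1")
    return p / (1 - p)


def odds_to_probability(odds):
    """Probability = odds / (1 + odds); defined for odds >= 0."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)


# Example from the text: 1 death among 5 patients (4 survivors).
deaths, survivors = 1, 4
p_death = deaths / (deaths + survivors)            # probability of dying, 1/5
odds_death = probability_to_odds(p_death)          # odds of dying, 1 to 4
odds_survival = probability_to_odds(1 - p_death)   # odds of living, 4 to 1

print(f"{p_death:.2f} {odds_death:.2f} {odds_survival:.2f}")  # prints 0.20 0.25 4.00
```

The two functions are inverses of each other, so a result reported on the odds scale can always be translated back to a probability for presentation.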
Gamblers speak in terms of odds, in horse racing and other sporting events, whereas physicians and their patients generally prefer to use probabilities. Why not leave this "odd" language to gamblers? Well, although probability may be a more familiar concept, odds is a more natural metric for analysis, and especially for the comparison of risk-adjusted probabilities [2].
Most physicians have been exposed to odds because logistic regression is so widely used in medical research, and its output is usually presented as odds ratios. In logistic regression, it is not the probability itself but the logarithm of the odds that is used as the dependent variable in an otherwise linear-looking regression [3]. The reason for this is that the odds, and particularly its logarithm (the log odds, also called logit), is more natural to model, has no range restrictions as does probability, is symmetric about 0, and has a well-behaved likelihood function. The coefficients produced by logistic regression are transformed into ORs by exponentiation to provide intuitive measures of effects of the risk factors.
Comparing Providers: Odds Ratio Versus Observed/Expected Ratio
It is well accepted that risk adjustment should be used when comparing results among providers to allow for variations in case mix. An easy-to-interpret and widely used statistic for measuring a provider's risk-adjusted performance is the O/E ratio. In this ratio, O is the observed number of events, and E is the expected number of events, that is, the sum of the patients' event probabilities derived from a risk model such as the STS [4] or European System for Cardiac Operative Risk Evaluation (EuroSCORE) [5]. The same ratio is obtained if O and E are expressed as proportions or percentages instead of numbers of events. If a provider's observed rate is exactly what is predicted, then O/E will be exactly 1. O/E less than 1 indicates better performance than expected, and O/E greater than 1 indicates worse performance than expected.
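As a minimal sketch of the O/E calculation just described, the following Python fragment sums the observed events and the model-based probabilities for one provider. The ten patients and their expected risks are hypothetical values chosen for illustration, not data from this study.

```python
def observed_expected_ratio(died, expected_risk):
    """O/E: observed event count divided by the sum of predicted probabilities."""
    observed = sum(died)            # O: number of observed events
    expected = sum(expected_risk)   # E: sum of each patient's expected risk
    return observed / expected


# Hypothetical provider with 10 patients (1 = died, 0 = survived).
died = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
expected_risk = [0.02, 0.05, 0.30, 0.01, 0.04, 0.10, 0.20, 0.03, 0.15, 0.10]

ratio_counts = observed_expected_ratio(died, expected_risk)

# The same ratio results when O and E are expressed as proportions.
n = len(died)
ratio_rates = (sum(died) / n) / (sum(expected_risk) / n)

print(f"O/E = {ratio_counts:.2f} (from counts), {ratio_rates:.2f} (from rates)")
```

Here 2 deaths were observed where about 1 was expected, so the O/E ratio is about 2, indicating worse performance than expected before any allowance for chance.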
Technical Problems With Observed/Expected Ratios
Although widely used, the O/E ratio has some technical defects. A provider's O/E ratio should serve as a calibration factor, the number by which a patient's expected risk is multiplied to produce her true risk if treated by that provider. For example, if a patient has an expected risk of 5%, her true risk should be 7.5% if treated by a provider with an O/E ratio of 1.5, and 2.5% if treated by a provider with an O/E ratio of 0.5. But this statistical requirement cannot hold for all combinations of O/E ratio and expected risk. If a provider with an O/E ratio of 2.0 treats a patient with an expected risk of 60%, then that patient's true risk would be 120%, a mathematical impossibility. This value is shown by the gray circle in Figure 2. All of the points on the dashed lines in Figure 2 that exceed 100% represent mathematical violations that can occur with O/E ratios.

Fig 2. The relationship between a patient's expected mortality (horizontal axis) and that patient's mortality (vertical axis) at the hands of providers with various observed-to-expected (O/E) ratios (dashed lines) and odds ratios (ORs) (solid lines). At a provider with O/E = 1 and OR = 1, the patient's mortality risk is exactly as expected. At providers with O/E greater than 1, a statistical impossibility can arise because the mortality risk can exceed 100%, revealing a defect in the O/E measure itself. Unlike O/E ratios, all combinations of patient risk and provider ORs result in valid, that is, less than 100%, risks. The patient with an invalid risk of 120% based on O/E = 2 (gray circle) has a valid risk of 75% based on OR = 2 (black circle).
Another problem with O/E ratios is their lack of symmetry. If death is the event of interest, we should reach the same conclusions whether death or its complement event, survival, is chosen for analysis. For example, if the O/E ratio for death is 2.0 (say, if O = 40% and E = 20%) then the "risk" of survival should be 1/2; however, in this case, the O/E for survival is .60/.80 = .75, not .5. Conversely, if the O/E ratio for dying is .5, then the O/E ratio for living should be 2, but it is not (except for one special case).
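The asymmetry in this example can be checked directly. The short Python sketch below uses only the observed and expected rates quoted above; the function name is illustrative.

```python
def oe_ratio(observed_rate, expected_rate):
    """Observed-to-expected ratio computed from rates (proportions)."""
    return observed_rate / expected_rate


observed_death, expected_death = 0.40, 0.20

oe_death = oe_ratio(observed_death, expected_death)              # 2.0
oe_survival = oe_ratio(1 - observed_death, 1 - expected_death)   # about 0.75

# If O/E were symmetric, the survival ratio would be the reciprocal, 0.5.
print(f"O/E for death    = {oe_death:.2f}")
print(f"O/E for survival = {oe_survival:.2f}")
print(f"1 / (O/E death)  = {1 / oe_death:.2f}")
```

The survival ratio (.75) differs from the reciprocal of the death ratio (.5), so the conclusion depends on which of the two complementary events is analyzed.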
Solution: Odds Ratio
The provider-specific OR is the ratio of the observed-to-expected odds instead of probability. The OR is a more natural measure of provider performance and does not suffer from the technical problems described for the O/E ratio. Using the OR instead of the O/E ratio solves the problem of defective (greater than 100%) probabilities (Fig 2), because an OR can inherently accommodate any combination of patient risk and provider risk. The solid lines in Figure 2 show the relationship between expected patient risk and the provider-specific risk, for various provider ORs. The patient mentioned above, with an expected risk of 60%, now has a risk of 75% (black circle) in the hands of a provider with an OR of 2, a legal probability, rather than the impossible 120% (gray circle) in the hands of a provider with an O/E ratio of 2. Also, symmetry holds for the OR. If the OR for death is A, the OR for survival will always be 1/A, and vice versa, of course.
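A small numerical sketch of this paragraph follows. It applies a provider's ratio to a patient's expected risk in both ways, using the relation p' = pA/(pA + 1 - p) given in the Appendix for the OR. The function names are illustrative.

```python
def odds(p):
    """Odds corresponding to probability p."""
    return p / (1 - p)


def risk_under_oe(expected_risk, oe):
    """Provider-specific risk if O/E acts as a multiplier on probability."""
    return expected_risk * oe


def risk_under_or(expected_risk, odds_ratio):
    """Provider-specific risk if the OR acts as a multiplier on odds."""
    p, a = expected_risk, odds_ratio
    return p * a / (p * a + 1 - p)


expected = 0.60
invalid = risk_under_oe(expected, 2.0)   # 1.20, not a probability
valid = risk_under_or(expected, 2.0)     # 0.75, always between 0 and 1

# Symmetry: the OR for survival is the reciprocal of the OR for death.
or_death = odds(valid) / odds(expected)
or_survival = odds(1 - valid) / odds(1 - expected)

print(f"O/E = 2: {invalid:.2f}   OR = 2: {valid:.2f}")
print(f"OR death = {or_death:.2f}, OR survival = {or_survival:.2f}")
```

Because the OR multiplies odds, which range from 0 to infinity, converting back to a probability can never yield a value above 100%.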
A provider's OR can be obtained using logistic regression with the specification of an offset term (see Appendix); unlike the O/E calculation, this is a component of most statistical packages. This use of logistic regression was described by Cox as a technique for "testing of the agreement between an observed binary sequence and a corresponding sequence of probabilities" [6]. His method can be used to validate risk models by studying their discrimination and calibration properties [7–9]. For estimating a provider's OR, the analog to O/E, only the calibration parameter of the Cox technique is needed to "calibrate" the provider: it estimates the provider's OR; that is, the ratio of the patients' observed odds at that provider to the patients' expected odds (Appendix).
Confidence Intervals
Because of random variation, a provider performing as expected will not have an OR or O/E ratio exactly equal to 1. A common way of accounting for chance variability is to provide confidence intervals (CIs) around the point estimates. The point estimate is the best single estimate, but the CI contains a range of estimates that are also consistent with the data. As with most statistical estimates, there are several methods available to construct these intervals, each with advantages and disadvantages.
Observed/Expected Ratio
Hosmer and Lemeshow compare three CI methods for an O/E ratio derived from an external model: the commonly used normal approximation method that produces an interval two standard errors on each side of the point estimate, and two bootstrap methods, percentile and accelerated bias-corrected (BCa) [10]. The normal approximation method can produce a negative lower end point, although a logarithmic transformation is often used to deal with this deficiency [10, 11]. The bootstrap methods will not work if the observed mortality is 0. These methods usually require additional programming and are not directly obtainable from most statistical programs.
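Two of these interval methods can be sketched in Python to show why they need extra programming and where they break down. The normal approximation below treats E as fixed and takes the variance of O as the sum of p(1 - p) over patients; that is one common form and is an assumption here, not necessarily the exact formulation of reference [10]. The BCa interval is omitted, and all patient data are hypothetical.

```python
import math
import random


def oe_normal_ci(died, risk, z=1.96):
    """Normal-approximation CI for O/E, treating E as fixed (an assumption)."""
    observed = sum(died)
    expected = sum(risk)
    se = math.sqrt(sum(p * (1 - p) for p in risk)) / expected
    ratio = observed / expected
    return ratio - z * se, ratio + z * se


def oe_bootstrap_percentile_ci(died, risk, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for O/E, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(died)
    ratios = []
    for _ in range(n_resamples):
        picks = [rng.randrange(n) for _ in range(n)]
        ratios.append(sum(died[i] for i in picks) / sum(risk[i] for i in picks))
    ratios.sort()
    low_index = max(0, round(alpha / 2 * n_resamples))
    high_index = min(n_resamples - 1, round((1 - alpha / 2) * n_resamples) - 1)
    return ratios[low_index], ratios[high_index]


# Hypothetical provider: 200 patients, 20 deaths, 18.5 expected deaths.
risk = [0.02, 0.05, 0.10, 0.20] * 50
died = [1 if i % 10 == 3 else 0 for i in range(200)]
print(oe_normal_ci(died, risk))
print(oe_bootstrap_percentile_ci(died, risk))

# Hypothetical provider with no deaths: both methods break down.
no_deaths = [0] * 20
low_risk = [0.05] * 20
print(oe_normal_ci(no_deaths, low_risk))                # lower limit is negative
print(oe_bootstrap_percentile_ci(no_deaths, low_risk))  # collapses to (0.0, 0.0)
```

The last two lines reproduce the deficiencies noted above: with zero observed deaths, the normal approximation gives a negative lower limit, and every bootstrap resample has O = 0, so the percentile interval has no width.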
Odds Ratio
Of the many CI methods available for various estimates, arguably the best method uses the likelihood function directly. The interval is formed by inverting the likelihood ratio test, and naturally provides estimates within the parameter value range. When CI methods are compared in various settings, this method is usually found to have superior properties [12–15], but it cannot be applied to the O/E ratio because of the previously mentioned defect. It can, however, be applied to the OR (Appendix). When the ORs are produced by logistic regression, CIs are automatically part of the output. Likelihood-based intervals may be available, but if not, normal approximation intervals based on the logarithm of the odds are similar, as will be demonstrated in the next section.
Clinical Example
To compare the OR and O/E ratios, we will examine the mortality associated with percutaneous coronary interventions performed from June 2001 through December 2004 at six Providence Health System (PHS) hospitals. The unadjusted mortality was 1.1% to 2.3%, but the patient characteristics also differed among the hospitals (Table 1). Is there evidence that these hospitals are performing differently than expected? Could the observed differences, after adjusting for differences in risk factors (patient mix), be due to chance alone? To obtain the expected probability of dying for each patient, we used the Clinical Outcomes Assessment Program risk model developed during a concurrent timeframe from hospitals in a geographically overlapping area [16]. This model was previously validated by using PHS hospital data [17].
Table 1 Characteristics of Patients Undergoing Percutaneous Coronary Interventions at Six Providence Health System Hospitals
Results
Figure 3 portrays all of the statistics mentioned in this article for each of the six hospitals. Note that the risk-adjusted order of performance is quite different than the raw order of performance (Table 1). The O/E ratios are shown by wide horizontal bars with the three CI methods recommended by Hosmer and Lemeshow (from left to right: normal approximation, bootstrap percentile, and bootstrap BCa, using 1000 bootstrap resamples) [10]. ORs are shown by solid circles with two CI methods: likelihood-based confidence limits (vertical lines) and normal approximation limits applied to the logarithm (short horizontal bars). Note that the OR and O/E point estimates are almost equal for each hospital.

Fig 3. Risk-adjusted mortality after percutaneous coronary intervention (PCI) for the six Providence Health System hospitals (Table 1), as measured by observed-to-expected (O/E) ratios (horizontal bars) and odds ratios (ORs; solid circles). Confidence intervals are computed by three methods for the O/E ratios: normal approximation, bootstrap percentile, and bootstrap accelerated bias-corrected (BCa), and by two methods for the ORs: the likelihood-based confidence limits (vertical lines) and normal approximation limits applied to the logarithm of the OR (short horizontal bars).
Comment on Interpretation
The chief complaint of those who advocate using risk ratios instead of ORs [18–20] is the issue of interpretation. Investigators often interpret ORs as if they were O/E ratios by claiming, for example, that an OR of 4 means that "patients are four times more likely to die than expected." This is not quite true. Figure 4 shows the relationship between OR and O/E, and the dependence of this relationship on the baseline probability. An OR is always more extreme (further from 1) than the corresponding O/E ratio. But the differences are small when the baseline probability is small or the OR itself is close to 1. Both of these conditions often hold in cardiac surgery applications, so the OR is very close to the O/E (Figure 3). But in extreme cases, the OR can be much larger than the O/E ratio. For example, at a provider with an OR of 4, for an event with 10% prevalence, the event is only about "3 times more likely" (Figure 4, black circle), and for an event with 20% prevalence, the event is only "2.5 times more likely" (Figure 4, gray circle).

Fig 4. The relationship between odds ratios (ORs) and observed-to-expected (O/E) ratios depends on the expected event rate. For ratios less than 1, they are very close (ORs are slightly smaller). For ratios greater than 1, they are fairly close when the expected event rates are 5% or less. The greater the expected mortality, the greater the difference. An OR of 4 is equivalent to an O/E ratio of 3.1 when the expected risk is 10% (black circle), and to an O/E ratio of 2.5 when the expected risk is 20% (gray circle).
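The two circled values can be reproduced with a short calculation. Dividing the provider-specific risk p' = pA/(pA + 1 - p) from the Appendix by the expected risk p gives the O/E ratio implied by an OR of A, namely A/(pA + 1 - p). The Python sketch below applies that relation; the function name is illustrative.

```python
def oe_from_or(odds_ratio, expected_risk):
    """O/E ratio implied by a given OR at a given expected (baseline) risk."""
    p, a = expected_risk, odds_ratio
    return a / (p * a + 1 - p)


for expected_risk in (0.01, 0.05, 0.10, 0.20):
    implied = oe_from_or(4.0, expected_risk)
    print(f"expected risk {expected_risk:.0%}: OR = 4 implies O/E = {implied:.2f}")
```

At an expected risk of 10% the implied O/E is about 3.1, and at 20% it is 2.5, matching the black and gray circles; at 1% it is close to 3.9, which shows why the two measures nearly coincide for rare events.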
Conclusions
The O/E ratio is widely used to compare providers. For most cardiac events, the OR is very similar. The OR is a preferred measure because (1) it has superior technical properties, (2) it submits to a superior method of confidence interval construction, and (3) it can be obtained from most standard statistical packages, via the logistic regression routine.
Appendix
Confidence Intervals for the Odds Ratio
A. Likelihood-based confidence intervals
The likelihood is the probability of the observed data, considered as a function of the unknown parameter(s). The value of the parameter that maximizes the likelihood is called the maximum likelihood estimate (MLE). Values of the parameter in the neighborhood of the MLE that produce likelihood values close to its maximum can be used to form a confidence interval [21]. If $L(m)$ represents the likelihood as a function of the parameter $m$, and $M$ is the maximum likelihood estimate of $m$, then the likelihood ratio is $R = L(m)/L(M)$, and the likelihood ratio statistic, $-2 \cdot \log(R)$, has a $\chi^2$ distribution with 1 degree of freedom ($\chi^2_1$). Thus, a 95% confidence interval around $M$ includes all of the $m$ for which $-2 \cdot \log(R) < 3.84$ (the 95th percentile of the $\chi^2_1$ distribution), that is, for which $L(m) > .147 \cdot L(M)$.
In the present application, the likelihood is expressed in terms of a provider's OR, $A = O'/O$, where $O = p/(1 - p)$ is the expected odds for the patient from the risk model, and $O' = p'/(1 - p')$ is the patient's odds in the hands of that provider.
Using algebra, the patient's provider-specific risk (probability) is $p' = p \cdot A/(p \cdot A + 1 - p)$, and the logarithm of the likelihood (LL) is $LL = \sum [d \cdot \log(p') + (1 - d) \cdot \log(1 - p')]$, where $\sum$ represents summation, $d = 1$ if the patient died and $d = 0$ if not. LL is maximized numerically to solve for A, and can then be used to find a range of values that form the likelihood-based confidence interval for A, as described above.
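The procedure in part A can be sketched in plain Python. This is a minimal illustration rather than a production routine: the log likelihood is maximized by Newton's method on a = log(A), and the confidence limits are found by bisection at the points where twice the drop in log likelihood equals 3.84. The solver choices, function names, and patient data are all assumptions made for the example, and the sketch requires at least one death and one survivor.

```python
import math


def provider_risk(p, log_or):
    """p' = p*A / (p*A + 1 - p), written in terms of a = log(A)."""
    odds = math.exp(log_or) * p / (1 - p)
    return odds / (1 + odds)


def log_likelihood(log_or, died, risk):
    """LL = sum of d*log(p') + (1 - d)*log(1 - p') over patients."""
    total = 0.0
    for d, p in zip(died, risk):
        q = provider_risk(p, log_or)
        total += math.log(q) if d else math.log(1 - q)
    return total


def fit_log_or(died, risk, tol=1e-10, max_iter=50):
    """Maximize LL over a = log(A) by Newton's method, starting at A = 1."""
    a = 0.0
    for _ in range(max_iter):
        fitted = [provider_risk(p, a) for p in risk]
        score = sum(d - q for d, q in zip(died, fitted))   # dLL/da
        info = sum(q * (1 - q) for q in fitted)            # -d2LL/da2
        step = score / info
        a += step
        if abs(step) < tol:
            break
    return a


def likelihood_ci(died, risk, cutoff=3.84):
    """Return (lower, estimate, upper) for the OR by inverting the LR test."""
    a_hat = fit_log_or(died, risk)
    ll_max = log_likelihood(a_hat, died, risk)

    def excess(a):
        # Negative inside the interval, positive outside it.
        return 2 * (ll_max - log_likelihood(a, died, risk)) - cutoff

    def find_limit(direction):
        inner, outer = a_hat, a_hat + direction
        while excess(outer) < 0:          # step outward until outside
            outer += direction
        for _ in range(100):              # bisect between inside and outside
            middle = (inner + outer) / 2
            if excess(middle) < 0:
                inner = middle
            else:
                outer = middle
        return (inner + outer) / 2

    return math.exp(find_limit(-1.0)), math.exp(a_hat), math.exp(find_limit(1.0))


# Hypothetical provider: 200 patients, 20 deaths, 18.5 expected deaths.
risk = [0.02, 0.05, 0.10, 0.20] * 50
died = [1 if i % 10 == 3 else 0 for i in range(200)]
lower, estimate, upper = likelihood_ci(died, risk)
print(f"OR = {estimate:.2f}, 95% likelihood-based CI {lower:.2f} to {upper:.2f}")
```

The cutoff of 3.84 on the scale of twice the log likelihood difference corresponds to the condition L(m) > .147 · L(M) stated above, because exp(-3.84/2) is approximately .147.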
B. Using logistic regression
Using the above notation, the Cox equation is [6]: $\log(O') = a + b \cdot \log(O)$, where a measures calibration and b measures discrimination. To estimate the OR, we use the calibration parameter only. To force $b = 1$, that is, to prevent the program from producing a coefficient for b, specify that $\log(O)$ is an offset term, so the above equation becomes: $\log(O') = a + \log(O)$.
For a single provider, the logistic regression yields a single term, an estimate of a. Its exponential, $A = \exp(a)$, is the (point) estimate of the OR, the ratio of the odds at that provider to the predicted odds, as can be seen by exponentiating and rearranging the last equation: $O' = A \cdot O$, or $A = O'/O$. In some programs, likelihood-based confidence intervals for A can be requested. If not, the usual normal approximation confidence limits $(l, u)$ for a can be exponentiated to get the corresponding confidence limits [$L = \exp(l)$, $U = \exp(u)$] for A.
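To make the single-provider case concrete, the sketch below fits the intercept-only logistic regression with log(O) as an offset by Newton's method, then exponentiates the normal approximation limits for a. In practice a statistical package's logistic regression routine with an offset option would be used instead; this hand-written solver, its function name, and the patient data are hypothetical and serve only to show what such a routine computes.

```python
import math


def fit_offset_logistic(died, risk, tol=1e-10, max_iter=50):
    """Fit log(O') = a + log(O); return the estimate of a and its standard error."""
    offsets = [math.log(p / (1 - p)) for p in risk]   # log(O) for each patient
    a = 0.0
    for _ in range(max_iter):
        fitted = [1 / (1 + math.exp(-(a + off))) for off in offsets]
        score = sum(d - q for d, q in zip(died, fitted))
        info = sum(q * (1 - q) for q in fitted)
        step = score / info
        a += step
        if abs(step) < tol:
            break
    # Standard error from the information evaluated at the final estimate.
    fitted = [1 / (1 + math.exp(-(a + off))) for off in offsets]
    info = sum(q * (1 - q) for q in fitted)
    return a, 1 / math.sqrt(info)


# Hypothetical provider: 200 patients, 20 deaths, 18.5 expected deaths.
risk = [0.02, 0.05, 0.10, 0.20] * 50
died = [1 if i % 10 == 3 else 0 for i in range(200)]

a, se = fit_offset_logistic(died, risk)
l, u = a - 1.96 * se, a + 1.96 * se          # limits for a (log OR scale)
A, L, U = math.exp(a), math.exp(l), math.exp(u)
print(f"OR = {A:.2f}, 95% normal approximation CI {L:.2f} to {U:.2f}")
```

Because the limits are formed on the logarithmic scale and then exponentiated, the lower limit for the OR can never be negative.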
For n (= 2 or more) providers, one could use logistic regression as above for each provider, or the same result can be achieved with a single regression by entering "provider" as a categorical variable (x), with one level for each provider, specifying the offset term as above and suppressing the intercept term, so the equation becomes: $\log(O') = a_1 x_1 + a_2 x_2 + \dots + a_n x_n + \log(O)$, where $x_i = 1$ if provider = i, and $x_i = 0$ otherwise. This will generate one coefficient $a_i$ for each provider, and $A_i = \exp(a_i)$ will estimate that provider's OR, with confidence limits given by the program, as above.
For related articles, see page 1237 and April 2007 Supplement (Ann Thorac Surg 2007;83:S1–26)
Acknowledgments
These hospitals contributed to the PHS Cardiovascular Study Group PCI database: Providence Anchorage Medical Center, Providence Everett Medical Center, Providence St. Peter Hospital, Providence Yakima Medical Center, Providence Portland Medical Center, Providence St. Vincent Medical Center.
References
1. Shahian DM, Edwards FH, Ferraris VA, et al. Quality measurement in adult cardiac surgery: Part 1 - Conceptual framework and measure selection. Ann Thorac Surg 2007;83:S3-S12.
2. O'Brien SM, Shahian DM, DeLong ER, et al. Quality measurement in adult cardiac surgery: Part 2 - Statistical considerations in composite measure scoring and provider rating. Ann Thorac Surg 2007;83:S13-S26.
3. Anderson RP, Jin R, Grunkemeier GL. Understanding logistic regression analysis in clinical reports: an introduction. Ann Thorac Surg 2003;75:753-757.
4. Shroyer AL, Coombs LP, Peterson ED, et al. The Society of Thoracic Surgeons: 30-day operative mortality and morbidity risk models. Ann Thorac Surg 2003;75:1856-1864; discussion 1864-5.
5. Nashef SA, Roques F, Michel P, Gauducheau E, Lemeshow S, Salamon R. European system for cardiac operative risk evaluation (EuroSCORE). Eur J Cardiothorac Surg 1999;16:9-13.
6. Cox DR. Two further applications of a model for binary regression. Biometrika 1958;45:562-565.
7. Miller ME, Hui SL, Tierney WM. Validation techniques for logistic regression models. Stat Med 1991;10:1213-1226.
8. Miller ME, Langefeld CD, Tierney WM, Hui SL, McDonald CJ. Validation of probabilistic predictions. Med Decis Making 1993;13:49-58.
9. DeLong ER, Peterson ED, DeLong DM, Muhlbaier LH, Hackett S, Mark DB. Comparing risk-adjustment methods for provider profiling. Stat Med 1997;16:2645-2664.
10. Hosmer DW, Lemeshow S. Confidence interval estimates of an index of quality performance based on logistic regression models. Stat Med 1995;14:2161-2172.
11. Smith DW. Evaluating risk adjustment by partitioning variation in hospital mortality rates. Stat Med 1994;13:1001-1013.
12. Maldonado G, Greenland S. A comparison of the performance of model-based confidence intervals when the correct model form is unknown: coverage of asymptotic means. Epidemiology 1994;5:171-182.
13. Faraggi D, Izikson P, Reiser B. Confidence intervals for the 50 per cent response dose. Stat Med 2003;22:1977-1988.
14. Goodall RL, Dunn DT, Babiker AG. Interval-censored survival time data: confidence intervals for the non-parametric survivor function. Stat Med 2004;23:1131-1145.
15. Moerbeek M, Piersma AH, Slob W. A comparison of three methods for calculating confidence intervals for the benchmark dose. Risk Anal 2004;24:31-40.
16. Maynard C, Goss JR, Malenka DJ, Reisman M. Adjusting for patient differences in predicting hospital mortality for percutaneous coronary interventions in the Clinical Outcomes Assessment Program. Am Heart J 2003;145:658-664.
17. Wu Y, Jin R, Grunkemeier GL. Validating the Clinical Outcomes Assessment Program risk model for percutaneous coronary intervention. Am Heart J 2006;151:1276-1280.
18. Montreuil B, Bendavid Y, Brophy J. What is so odd about odds? Can J Surg 2005;48:400-408.
19. Shrier I, Steele R. Understanding the relationship between risks and odds ratios. Clin J Sport Med 2006;16:107-110.
20. Katz KA. The (relative) risks of using odds ratios. Arch Dermatol 2006;142:761-764.
21. Pawitan Y. In all likelihood: statistical modelling and inference using likelihood. Oxford: Oxford University Press; 2001. pp. 528.
Related Article
Grover FL. An innovative new concept for quality measurement in adult cardiac surgery. Ann Thorac Surg 2007;83:1237-1239.