Ann Thorac Surg 2005;79:16-20
© 2005 The Society of Thoracic Surgeons
The National Heart and Lung Institute, Imperial College of Science, Technology, and Medicine, Department of Cardiothoracic Surgery, St Mary's Hospital, London, United Kingdom
* Address reprint requests to Dr Athanasiou, Robotic and Minimally Invasive Cardiothoracic Surgery, 70 St Olaf's Rd, Fulham, London, SW6 7DN UK; (E-mail: tathan5253{at}aol.com).
| Abstract |
|---|
| Background |
|---|
Summary receiver operating characteristic (SROC) analysis is applied to data that have been pooled from multiple sources. Why use SROC if simple averages will suffice? Data pooling can produce misleading results if the data sets differ from one another in size or study quality [9]. Poorly conducted or reported studies are more likely to produce outlying results, which skew the overall pooled data. A weighted average can be biased towards large studies, or towards studies with very similar results. It can be difficult to identify outlying data and exclude them. On the other hand, more data mean wider conclusions can be reached. The SROC analysis deals with pooled data without these pitfalls.
| Guidelines for Systematic Review of Diagnostic Tests |
|---|
The simplest method for analyzing pooled data from multiple studies is calculating sensitivities and specificities (Table 1) and their averages. This is valid when the same criteria for a positive result have been used in each study, and each study is of similar size and quality. If different criteria, or thresholds, have been used, there will be a relationship between sensitivity and specificity across the studies. As sensitivity increases, specificity will generally drop. This is the threshold effect. In these cases, weighted averages will not reflect the overall accuracy of the test, as the extremes of threshold criteria can skew the distribution.
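The effect of averaging across different thresholds can be illustrated with a minimal sketch. The three studies and their 2x2 counts are invented for illustration and are not taken from the article; the odds ratio helper uses the standard definition of the diagnostic odds ratio discussed later in the article.

```python
from dataclasses import dataclass


@dataclass
class Study:
    """2x2 counts for one hypothetical primary study."""
    tp: int  # test positive, disease present
    fp: int  # test positive, disease absent
    fn: int  # test negative, disease present
    tn: int  # test negative, disease absent

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def odds_ratio(sens: float, spec: float) -> float:
    """Diagnostic odds ratio for a (sensitivity, specificity) pair."""
    return (sens / (1 - sens)) / ((1 - spec) / spec)


# Invented counts: one test, three different positivity thresholds.
studies = [Study(45, 20, 5, 30), Study(30, 5, 20, 45), Study(40, 10, 10, 40)]

# Simple (unweighted) averages of the per-study values.
mean_sens = sum(s.sensitivity for s in studies) / len(studies)
mean_spec = sum(s.specificity for s in studies) / len(studies)

# Compare the accuracy implied by the averaged point with each study.
study_dors = [odds_ratio(s.sensitivity, s.specificity) for s in studies]
mean_point_dor = odds_ratio(mean_sens, mean_spec)

print(f"mean sensitivity {mean_sens:.3f}, mean specificity {mean_spec:.3f}")
print("per-study odds ratios:", [round(d, 1) for d in study_dors])
print(f"odds ratio at the averaged point: {mean_point_dor:.1f}")
```

In this invented data set the averaged point implies a lower odds ratio than any single study, which is one way of seeing why simple averages can understate accuracy when thresholds differ.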
| Principles of ROC |
|---|
Once the test and true results are known for a diagnostic threshold, sensitivity and specificity are calculated. Another threshold is chosen, and sensitivity and specificity calculated again. Eventually there is a number of (sensitivity, specificity) pairs for the test, each corresponding to a different diagnostic threshold. (Sensitivity, 1-specificity) pairs are then calculated and plotted. Sensitivity is on the vertical axis, and (1-specificity) on the horizontal. These points make up the basis of the ROC graph (Fig 1).
To demonstrate its use, consider the contrived VB example. An airway stenosis is distinguished from a normal airway by the degree of airway narrowing. The investigator nominates a minimum degree of narrowing required for a positive diagnosis. By changing the threshold, the number of positive and negative results on VB changes. For example, if a minimum of 5% narrowing is required for diagnosis of a stenosis, many airways will be positive for stenosis (high sensitivity, low specificity), whereas a minimum threshold of 90% narrowing produces far fewer positive results (low sensitivity, high specificity).
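The threshold sweep described above can be sketched as follows. The narrowing percentages are invented for illustration, and the function name `roc_point` is hypothetical; an airway is called positive when its narrowing meets or exceeds the chosen threshold.

```python
# Invented % narrowing measurements (not data from the article).
stenosed = [35, 50, 60, 70, 80, 95]  # airways with a true stenosis
normal = [0, 5, 10, 20, 30, 55]      # airways without a stenosis


def roc_point(threshold: float) -> tuple[float, float]:
    """Return (1 - specificity, sensitivity) for one diagnostic threshold."""
    true_pos = sum(x >= threshold for x in stenosed)
    false_pos = sum(x >= threshold for x in normal)
    sensitivity = true_pos / len(stenosed)
    one_minus_spec = false_pos / len(normal)
    return one_minus_spec, sensitivity


# Each threshold contributes one point to the ROC graph.
for threshold in (5, 30, 50, 70, 90):
    fpr, sens = roc_point(threshold)
    print(f"threshold {threshold:>2}%: sensitivity {sens:.2f}, "
          f"specificity {1 - fpr:.2f}")
```

With these invented values, the 5% threshold calls almost every airway positive, while the 90% threshold calls almost none positive, mirroring the trade-off described in the text.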
The advantage of ROC is that accuracy is plotted for different thresholds and compared. Overall test accuracy is measured by the closeness of the graph to the top left corner, which represents high sensitivity and specificity. This is more easily visualized by a curve placed over the points. The closer the curve to the top left corner of the unit square, the better the overall accuracy. The curve is made either by joining the points with straight lines, or by using a smoothing function [11–15].
Sensitivity and specificity values lie between zero and one inclusive. This makes the area under the curve (AUC) for a perfect test equal to one. The random test, allocating positive results half the time, has an AUC of 0.5.
The AUC can be calculated for different diagnostic tests, and the values then compared with each other [11–13]. An AUC closer to one indicates a better test. The AUC is the probability that the diagnostic test correctly ranks a randomly selected pair consisting of one truly positive and one truly negative case [16–18].
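Both views of the AUC, the area under straight lines joining the ROC points and the probability of correctly ranking a positive and negative pair, can be checked against each other in a short sketch. The measurements are invented for illustration, and ties are counted as half a correct ranking, which is an assumption of this sketch.

```python
# Invented % narrowing measurements (not data from the article).
stenosed = [35, 50, 60, 70, 80, 95]  # truly positive cases
normal = [0, 5, 10, 20, 30, 55]      # truly negative cases


def auc_trapezoid(pos: list[float], neg: list[float]) -> float:
    """Area under the ROC points joined by straight lines."""
    thresholds = sorted(set(pos) | set(neg), reverse=True)
    points = [(0.0, 0.0)]  # strictest possible threshold: nothing positive
    for t in thresholds:
        fpr = sum(x >= t for x in neg) / len(neg)
        tpr = sum(x >= t for x in pos) / len(pos)
        points.append((fpr, tpr))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def auc_ranking(pos: list[float], neg: list[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly by the test."""
    score = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                score += 1.0
            elif p == n:
                score += 0.5  # ties count as half
    return score / (len(pos) * len(neg))


print(f"trapezoid AUC: {auc_trapezoid(stenosed, normal):.4f}")
print(f"ranking AUC:   {auc_ranking(stenosed, normal):.4f}")
```

The two functions return the same value for any data set, which is the sense in which the AUC can be read as a ranking probability.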
| Principles of SROC Curves |
|---|
The SROC curve is placed over the points to form a smooth curve. It can be calculated with one of several formulas. The most commonly used is a regression model [10] in which sensitivity and (1-specificity) are transformed into complex logarithmic variables and graphed (for purposes of discussion, log variables 1 and 2). A regression equation is calculated, and the variables are rearranged to express sensitivity as a function of (1-specificity). This is the equation for the SROC curve, which is then plotted over the original (sensitivity, 1-specificity) points on the original axes.
The VB example demonstrates these steps. The sensitivity and (1-specificity) values in Table 2 are plotted on the normal axes (Fig 2). The points are transformed into the complex logarithmic variables and plotted with a regression line (Fig 3). The variables in the regression equation are transformed back into sensitivity (the true-positive rate, TPR) and 1-specificity (the false-positive rate, FPR). The regression line is thereby transformed from a straight line (Fig 3) into the SROC curve on the original axes (Fig 4). Not all of the points lie on the SROC curve. The pooled sensitivity and specificity values (the X in Figs 2 and 4) do not lie on the curve; they appear to underestimate the accuracy of the test.
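The transform, regress, and back-transform steps can be sketched in a few lines. This sketch assumes the regression model is the widely used logit-difference on logit-sum form, D = a + bS with D = logit(TPR) - logit(FPR) and S = logit(TPR) + logit(FPR); the article does not spell out its two log variables, so this choice is an assumption. The function names and the data points are hypothetical and are not the Table 2 values.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def fit_sroc(points: list[tuple[float, float]]) -> tuple[float, float]:
    """Fit D = a + b*S by ordinary least squares.

    points: (fpr, tpr) pairs, each value strictly between 0 and 1.
    """
    s_vals = [logit(t) + logit(f) for f, t in points]  # log variable "S"
    d_vals = [logit(t) - logit(f) for f, t in points]  # log variable "D"
    n = len(points)
    s_mean = sum(s_vals) / n
    d_mean = sum(d_vals) / n
    sxx = sum((s - s_mean) ** 2 for s in s_vals)
    sxy = sum((s - s_mean) * (d - d_mean) for s, d in zip(s_vals, d_vals))
    b = sxy / sxx
    a = d_mean - b * s_mean
    return a, b


def sroc_tpr(fpr: float, a: float, b: float) -> float:
    """Back-transform the regression line: sensitivity as a function of FPR.

    Rearranging D = a + b*S gives
    logit(TPR) = a / (1 - b) + (1 + b) / (1 - b) * logit(FPR).
    """
    z = a / (1 - b) + (1 + b) / (1 - b) * logit(fpr)
    return 1 / (1 + math.exp(-z))


# Invented (FPR, TPR) pairs from five hypothetical studies.
points = [(0.10, 0.55), (0.20, 0.65), (0.25, 0.80), (0.40, 0.85), (0.50, 0.88)]
a, b = fit_sroc(points)
for fpr in (0.1, 0.25, 0.5):
    print(f"FPR {fpr:.2f} -> SROC sensitivity {sroc_tpr(fpr, a, b):.3f}")
```

Fitting a straight line in the transformed space and mapping it back is what turns a scatter of study results into a single smooth curve on the original axes.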
The AUC is calculated for SROC as for ROC. The diagnostic test is constant throughout the studies, so the AUC reflects overall performance of that test. The perfect test will again have an AUC of one. The SROC is reproducible and thus the AUC can be used to compare accuracy of different diagnostic tests [18].
The AUC for the VB example is 0.82. While reasonable, it shows that in this example, VB accuracy needs improvement before VB is adopted as a first-line investigation in these patients. Weighted analysis to assess the impact of study numbers showed no difference in AUC values. This implies that there was no significant bias from study size.
If the FPR values are limited to part of the range, the SROC curve and AUC calculation will only be accurate for this range. A partial AUC (for the range of experimental FPR points) is one possible solution [14, 15, 18], although it cannot be used for comparison with other tests.
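A full and a partial AUC can both be obtained by numerical integration of the fitted curve. The sketch below assumes the same logit regression form, D = a + bS, and uses hypothetical coefficients and a hypothetical observed FPR range; none of these values come from the article.

```python
import math


def sroc_tpr(fpr: float, a: float, b: float) -> float:
    """Sensitivity on the SROC curve; assumes -1 < b < 1."""
    if fpr <= 0.0:
        return 0.0  # limit of the curve as FPR approaches 0
    if fpr >= 1.0:
        return 1.0  # limit of the curve as FPR approaches 1
    z = a / (1 - b) + (1 + b) / (1 - b) * math.log(fpr / (1 - fpr))
    return 1 / (1 + math.exp(-z))


def sroc_auc(a: float, b: float, lo: float = 0.0, hi: float = 1.0,
             steps: int = 10_000) -> float:
    """Trapezoidal area under the SROC curve between FPR = lo and FPR = hi."""
    width = (hi - lo) / steps
    total = 0.0
    prev = sroc_tpr(lo, a, b)
    for i in range(1, steps + 1):
        cur = sroc_tpr(lo + i * width, a, b)
        total += 0.5 * (prev + cur) * width
        prev = cur
    return total


# Hypothetical coefficients and a hypothetical observed FPR range.
a, b = math.log(9), 0.0
full_auc = sroc_auc(a, b)
partial_auc = sroc_auc(a, b, lo=0.1, hi=0.5)
print(f"full AUC {full_auc:.3f}, partial AUC over [0.1, 0.5] {partial_auc:.3f}")
```

The partial area depends on the width of the chosen FPR range as well as on the curve, which is one reason it is awkward to compare between tests.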
| Q and Diagnostic Odds Ratio |
|---|
Q is the point on the SROC curve at which sensitivity equals specificity, that is, where the curve crosses the antidiagonal of the plot. Q is appropriate provided high sensitivity and high specificity are equally desirable. If one is clinically more important than the other, Q does not address the clinical usefulness of the test. In these cases, overall accuracy is not as relevant as overall sensitivity or specificity.
For the VB example, a positive result may lead to further investigation or surgery, while a negative result may prompt no further immediate or invasive action. High sensitivity is clinically more important than high specificity, although the latter is also desirable. Generally, failing to recognize a potentially malignant airway lesion has greater implications than performing an unnecessary investigation. As well as overall accuracy, we are interested in overall sensitivity. In this case, the antidiagonal crosses the SROC curve at (0.25, 0.75), giving a Q of 0.75. This shows that overall accuracy is reasonable, but it does not indicate the test's overall sensitivity.
Diagnostic odds ratio (DOR) is calculated from sensitivity and specificity (Table 1). It is another measure of the overall diagnostic power of the test. A high DOR implies that the test shows good diagnostic accuracy in all patients. For the VB example, DOR was 8.98, significantly greater than one. This implies that VB is better than the random test at diagnosing airway lesions.
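The relationship between DOR, Q, and AUC can be sketched under one simplifying assumption: that the SROC curve is symmetric, meaning the DOR is the same at every threshold. The article does not state that this holds for the VB example, so the sketch is illustrative only. The DOR helper uses the standard definition; the function names are hypothetical.

```python
import math


def dor(sens: float, spec: float) -> float:
    """Diagnostic odds ratio: odds of a positive test in diseased patients
    divided by the odds of a positive test in non-diseased patients."""
    return (sens / (1 - sens)) / ((1 - spec) / spec)


def q_from_dor(d: float) -> float:
    """Point where sensitivity = specificity on a symmetric SROC curve."""
    root = math.sqrt(d)
    return root / (1 + root)


def auc_from_dor(d: float) -> float:
    """Area under a symmetric SROC curve with constant odds ratio d (d != 1)."""
    return d / (d - 1) ** 2 * ((d - 1) - math.log(d))


# The crossing point quoted in the text, sensitivity = specificity = 0.75.
print(f"DOR at Q = 0.75: {dor(0.75, 0.75):.2f}")

# The DOR quoted in the text, fed through the symmetric-curve formulas.
print(f"Q for DOR 8.98:   {q_from_dor(8.98):.3f}")
print(f"AUC for DOR 8.98: {auc_from_dor(8.98):.3f}")
```

Under this assumption the reported DOR of 8.98 corresponds to a Q of about 0.75 and an AUC of about 0.82, in line with the values quoted for the VB example.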
| How Many Studies? |
|---|
| Comment |
|---|
Study numbers are important: small studies are prone to producing outlying results, which can shift the overall outcome. Calculations can be weighted for study size and the results compared with nonweighted calculations. If there is a difference, then study size may be contributing to the different sensitivity and specificity results. Heterogeneity of the studies may also contribute to these differences. Its significance can be assessed through graphical exploration (Galbraith plots), or metaregression of DOR against relevant study or patient variables.
A fair test shows better than chance accuracy, and has an AUC above 0.5. To demonstrate excellent accuracy, the AUC should be in the region of 0.97 or above. An AUC of 0.93 to 0.96 is very good; 0.75 to 0.92 is good. Less than 0.75 can still be reasonable, but the test has obvious deficiencies in its diagnostic accuracy, and is approaching the random test. It is important to remember that the AUC must be interpreted according to the context of the individual analysis and that these guidelines are not absolute.
A major advantage of SROC analysis is that data pooling problems are overcome. It takes a greater effect to shift a curve than to change an average. Outliers are easier to spot on a graph. It can be used for different tests, and the same statistic (AUC, Q, DOR, as appropriate) used to compare their accuracies. Finally, other variables can be used to weight the analysis if clinically indicated.
Limitations of SROC
The SROC modeling has several limitations. When the calculations are weighted, there is a bias towards studies with lower DOR [8], leading to underestimation of accuracy. The variables in the regression for the SROC curve are themselves functions of sensitivity and specificity. They are not independent of each other, and theoretically should not be used to calculate the SROC curve formula without accounting for this interdependence.
The most important drawback of SROC is its assumption that the primary studies are random samples of one large common study, and that differences in results are random error. It does not account for patient variables, study variables, physician experience and training, and institutional characteristics. One solution to this is hierarchical SROC, which accounts for variation both within and between studies [19]. Both threshold and accuracy are included in the model. In practice, it is rarely used because the calculations and interpretations are complex and few software packages include it. Its use may increase as software is developed and general understanding of SROC analysis widens.
Sometimes the sensitivity and specificity will be available for different thresholds within the same study. Depending on the predetermined diagnostic threshold, and amount of literature available, the most appropriate threshold should be chosen for the analysis. With enough literature available, it is possible to perform SROC analysis for different thresholds of the same test. The AUC or Q would be used, where appropriate, to compare the accuracy of the same test for different thresholds. This requires multiple analyses which are often published separately.
The range of commercial software calculating SROC statistics is limited, and such software is difficult for nonstatisticians to use. Freeware such as Meta-Test [20], developed by Dr Joseph Lau, is commonly used to generate curves and statistics. Important steps forward would include user-friendly software, individual patient data analysis [21], and methods to allow conditional dependence between multiple test results in individuals.
| Conclusion |
|---|
There is a lack of understanding by clinicians of the concepts and interpretation of SROC. It is our hope that as SROC becomes more popular and understanding grows, interpretation of SROC analysis will become easier.
| Acknowledgments |
|---|
| References |
|---|