¹From the Manchester Royal Eye Hospital, Manchester, United Kingdom; and the ²Research Group for Eye and Vision Sciences, University of Manchester, Manchester, United Kingdom.
Abstract
METHODS. HRT images were obtained from one eye of 121 patients with glaucoma (median age, 70.2 years; median mean deviation [MD], −3.6 dB; range, +2.0 to −9.9 dB) and 95 healthy control subjects (median age, 59.7 years; median MD, −0.1 dB; range, +2.5 to −3.7 dB). The diagnostic performances of GPS and MRA were evaluated with borderline classifications included either as test negatives (most specific criteria) or as test positives (least specific criteria). Agreement between the global and sectoral classifications of both analyses was established. Logistic regression analyses were performed to evaluate the effect of covariates such as optic disc size and age on the classification outcomes of both the GPS and the MRA.
RESULTS. In 8 (7%) patients with glaucoma and 10 (11%) control subjects, the GPS failed to provide a complete global and sectoral optic disc classification. Although we could not identify a single distinct cause of this failure in the glaucoma group, failures in the control subjects occurred most often (7/10) with small and crowded optic discs. In subjects who were successfully classified at least globally by the GPS (117 patients with glaucoma, 88 control subjects), the diagnostic performances of GPS and MRA were similar (areas under the receiver operating characteristic [ROC] curve of 0.78 and 0.77, respectively; P > 0.1). With the GPS, sensitivity and specificity were 59% and 91% (most specific criteria) and 78% and 63% (least specific criteria), respectively. Combining GPS and MRA did not increase diagnostic performance significantly (ROC area of combined classifiers, 0.81). Both GPS and MRA were affected by disc size. In patients with glaucoma as well as healthy control subjects, the odds of a positive GPS classification (borderline or outside normal limits) increased by 21% (95% confidence interval [CI], 12%–30%) for each 0.1 mm² increase in optic disc area. With the MRA, the corresponding increase was 15% (95% CI, 7%–23%). Optic disc area alone accounted for approximately 30% and 22% of the explained variance with the GPS and MRA, respectively (P < 0.001). The proportional-odds logistic regression confirmed that optic disc size affected mainly the tradeoff between true- and false-positive classifications (criterion) rather than the absolute performance of the analyses (area under the ROC curve). There was some evidence of an age effect with the MRA, which showed a 53% (95% CI, 16%–102%) increase in the odds of a positive test (borderline or outside normal limits) associated with each decade of age (P = 0.002), but no age effects were observed with the GPS (P > 0.1).
CONCLUSIONS. The diagnostic performance of the contour line–independent GPS analysis is similar to that of the MRA. However, clinicians should be aware of the strong size dependence of both GPS and MRA. In large optic discs, both GPS and MRA are likely to produce many false-positive classifications. Correspondingly, the sensitivity to early damage is likely to be low in small optic discs. There is a need for automated classification systems that explicitly address the size dependence of current analyses.
The Heidelberg Retina Tomograph (HRT; Heidelberg Engineering GmbH, Dossenheim, Germany) is a confocal scanning laser ophthalmoscope that acquires three-dimensional topography images of the optic disc and the surrounding retina. Until recently, all diagnostic analyses of the HRT depended on the position of a manually placed contour line to outline the area of the optic disc. However, contour lines drawn by different observers vary, sometimes considerably.
A fully automated diagnostic decision-support system that does not rely on a manually drawn contour line has recently been incorporated into the software of the HRT. This analysis, referred to as the glaucoma probability score (GPS), is based on an innovative technique proposed by Swindale et al.,7 who fitted a surface over the area of the optic disc and parapapillary retina. In the implementation incorporated in the commercial HRT software, a Bayesian machine-learning classifier then compares the parameters of the fitted surface with those obtained in healthy and glaucomatous optic discs and derives a numerical index of the likelihood of damage.
In this study, we applied the GPS analysis to an independent sample of healthy and glaucomatous eyes from the Manchester Imaging Study. We estimated the diagnostic accuracy of the technique and compared it with that of the Moorfields Regression Analysis (MRA). To establish how these analyses are affected by optic disc size, age, and image quality, we performed logistic regression analyses to determine the effect of these covariates on the classifications.
Methods
The patients with glaucoma were consecutively recruited from the clinics at the Manchester Royal Eye Hospital (MREH). Inclusion criteria were a clinical diagnosis of open-angle glaucoma, age >40 years, refractive error within ±5.00 D equivalent sphere and ±3.00 D astigmatism, best corrected visual acuity (VA) of 6/18 (+0.5 logMAR; logarithm of the minimum angle of resolution), and a repeatably detectable visual field defect. Visual fields were examined using program 24-2 of the Humphrey Field Analyzer (HFA; Carl Zeiss Meditec, Dublin, CA) with the full-threshold strategy, and a defect was defined as a glaucoma hemifield test (GHT) result outside normal limits and/or a corrected pattern standard deviation (CPSD) significantly elevated beyond the 5% level. For the analyses reported herein, eyes with a mean deviation (MD) worse than −10 dB were excluded. If both eyes were eligible, one eye was randomly selected as the study eye. For patients who participated in the longitudinal arm of the study, we selected the image in which the mean pixel height standard deviation (MPHSD), a measure of image quality, was closest to the median value observed during the entire follow-up; thus, we analyzed the most representative image of the available series. Three patients and three healthy control subjects were removed from the analysis because their median MPHSD exceeded 50 µm, the recommended cutoff for image quality (HRT user manual; Heidelberg Engineering GmbH).

Normal control subjects were recruited from patients' spouses and by advertising through leaflets and posters distributed in local medical centers, universities, and other communal areas. Inclusion criteria were age >40 years, intraocular pressure below 22 mm Hg, refractive error within ±5.00 D equivalent sphere and ±3.00 D astigmatism, best corrected VA of 6/18 (+0.5 logMAR), and normal findings in a visual field examination (HFA 24-2 full-threshold test, both CPSD >10% and GHT results within normal limits).
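The image-selection and exclusion rules described for the patients can be expressed as a short Python sketch. It assumes each eye's follow-up is available as a list of MPHSD values in micrometres; the function name and the example series are hypothetical, and ties are resolved by keeping the earlier image, which the text does not specify.

```python
from statistics import median
from typing import Optional, Sequence

# Image-quality cutoff quoted in the text (HRT user manual): 50 micrometres.
MPHSD_CUTOFF_UM = 50.0

def select_representative_image(mphsd_um: Sequence[float]) -> Optional[int]:
    """Return the index of the image whose MPHSD is closest to the series median.

    Returns None when the median MPHSD exceeds the cutoff, i.e. the eye would be
    excluded. A single-visit series simply returns index 0 if it passes.
    """
    if not mphsd_um:
        raise ValueError("at least one image is required")
    series_median = median(mphsd_um)
    if series_median > MPHSD_CUTOFF_UM:
        return None
    # min() keeps the earliest image if two are equally close to the median.
    return min(range(len(mphsd_um)), key=lambda i: abs(mphsd_um[i] - series_median))

# Hypothetical follow-up series (micrometres), not study data.
print(select_representative_image([22.0, 35.0, 18.0, 27.0, 30.0]))  # 3
print(select_representative_image([60.0, 55.0, 52.0]))              # None
```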
Data Collection
All participants were imaged with the HRT1 (Heidelberg Engineering GmbH). In each eye, five individual 10° scans were obtained. If the pupil diameter was smaller than 3 mm or if image quality was thought to be affected by media opacity, the pupil was dilated with 0.5% tropicamide. The three best images, as judged by the examiner, were used to generate a mean topographic image that was subsequently recalculated and converted to HRT3 format by version 3.0.2 of the HRT viewer module. Contour lines were placed on the margin of the optic disc by experienced users, according to the instructions provided on the HRT Web site (www.heidelbergengineering.com). All contour lines were reviewed by the authors.
Glaucoma Probability Score
The principles of the analyses underlying the GPS have been described by Swindale et al.7 Briefly, a geometric model is used to approximate the optic disc topography with a three-dimensional surface, described by five parameters of optic disc and peripapillary retinal shape. With a standard nonlinear least-squares fitting technique, these parameters are adapted to the individual topography globally as well as in six separate sectors of the optic disc (Fig. 1). The obtained parameters are then interpreted by a relevance vector machine, a state-of-the-art machine learning classifier operating on Bayesian principles,9 which provides a numerical index ranging from 0 (low probability of disease) to 1 (high probability of disease) that describes the estimated probability of finding similar data in the glaucoma group of the training data. Although details of the training data are not in the public domain, the score can be interpreted as an ordinal index of abnormality. Accordingly, sectors with a value of >0.28 or >0.64 (Volz D, Heidelberg Engineering GmbH, personal communication, November 2005) are classified as borderline (flagged with a yellow exclamation mark) or outside normal limits (flagged with a red cross), respectively. The overall outcome of the GPS analysis is determined by the sector with the highest probability score (the worst result of the global and sectoral analyses).
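The thresholding step can be illustrated with a minimal Python sketch. It assumes the global and six sectoral GPS values are already available as numbers between 0 and 1; the function names, dictionary keys, and example scores are hypothetical, and only the cut-offs (0.28, 0.64) and the worst-result rule come from the text.

```python
from typing import Mapping

# Cut-offs quoted in the text (Heidelberg Engineering, personal communication).
BORDERLINE_CUTOFF = 0.28
OUTSIDE_CUTOFF = 0.64

def classify_gps(score: float) -> str:
    """Map one GPS value (0 = low, 1 = high probability of disease) to a category."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("GPS values range from 0 to 1")
    if score > OUTSIDE_CUTOFF:
        return "outside normal limits"   # red cross
    if score > BORDERLINE_CUTOFF:
        return "borderline"              # yellow exclamation mark
    return "within normal limits"

def overall_gps(scores: Mapping[str, float]) -> str:
    """Overall result = category of the worst (highest) global or sectoral score."""
    return classify_gps(max(scores.values()))

# Hypothetical scores for one eye (illustration only, not study data).
eye = {"global": 0.31, "temporal": 0.22, "temporal-superior": 0.40,
       "temporal-inferior": 0.71, "nasal": 0.18, "nasal-superior": 0.25,
       "nasal-inferior": 0.52}
print(overall_gps(eye))  # outside normal limits
```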
Analysis
Both GPS and MRA have a borderline category for optic discs not clearly identified as either within or outside normal limits. Clinically, this graded approach may be more useful than the somewhat arbitrary division into healthy and diseased categories that is necessary for the calculation of sensitivity and specificity. We therefore evaluated diagnostic performance in two ways, first by considering borderline cases as test negatives (most specific but least sensitive criterion) and second by considering borderline cases as test positives (least specific but most sensitive criterion). Similarly, the agreement between the overall classifications of the GPS analysis and the MRA was analyzed with a contingency table of three categories (within normal limits, borderline, and outside normal limits).
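The two criteria and the agreement table can be sketched in Python. This is an illustration under assumptions: the categories are coded 0/1/2 (a hypothetical coding), the data are toy values, and the ROC area shown is the standard Mann–Whitney estimate for ordinal ratings, which may differ from how the ROC areas in this study were computed.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical ordinal coding of the three categories shared by GPS and MRA.
WNL, BORDERLINE, ONL = 0, 1, 2
LEVELS = (WNL, BORDERLINE, ONL)

def sensitivity_specificity(glaucoma: Sequence[int], controls: Sequence[int],
                            borderline_positive: bool) -> Tuple[float, float]:
    """Borderline as positive gives the least specific criterion; as negative, the most specific."""
    cutoff = BORDERLINE if borderline_positive else ONL
    sensitivity = sum(c >= cutoff for c in glaucoma) / len(glaucoma)
    specificity = sum(c < cutoff for c in controls) / len(controls)
    return sensitivity, specificity

def ordinal_roc_area(glaucoma: Sequence[int], controls: Sequence[int]) -> float:
    """Empirical ROC area (Mann-Whitney estimate); tied categories count one half."""
    score = 0.0
    for g in glaucoma:
        for c in controls:
            score += 1.0 if g > c else (0.5 if g == c else 0.0)
    return score / (len(glaucoma) * len(controls))

def agreement_table(gps: Sequence[int], mra: Sequence[int]) -> List[List[int]]:
    """3 x 3 contingency table: rows are GPS categories, columns are MRA categories."""
    counts = Counter(zip(gps, mra))
    return [[counts[(i, j)] for j in LEVELS] for i in LEVELS]

# Toy data (not study data): one overall category per eye.
glaucoma = [ONL, ONL, BORDERLINE, WNL, ONL]
controls = [WNL, WNL, BORDERLINE, WNL, ONL]
print(sensitivity_specificity(glaucoma, controls, borderline_positive=False))  # (0.6, 0.8)
print(sensitivity_specificity(glaucoma, controls, borderline_positive=True))   # (0.8, 0.6)
print(ordinal_roc_area(glaucoma, controls))                                    # 0.74
```

With the two operating points of the toy data, (false-positive rate, sensitivity) = (0.2, 0.6) and (0.4, 0.8), the trapezoidal area under the empirical ROC curve, 0.2 × 0.6 / 2 + 0.2 × (0.6 + 0.8) / 2 + 0.6 × (0.8 + 1) / 2 = 0.74, matches the Mann–Whitney estimate, as expected for three-category ratings.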
To evaluate differences in the topographical information provided by the GPS and MRA, we compared the classifications in each of the six optic disc sectors. This analysis was performed separately in patients with glaucoma and healthy control subjects. A combined classification of GPS and MRA was derived to assess whether such a combination improves diagnostic performance over that achieved with either GPS or MRA alone.
Several groups have reported a difference in sensitivity and specificity of optic disc assessment in small versus large optic discs.12 13 Because we wanted to evaluate such effects quantitatively, while accounting for other covariates, we performed proportional odds logistic regression (POLR).14 15 Similar to the more familiar binary logistic regression, POLR models the effect of several covariates, providing odds ratios and measures of relative importance. Whereas standard logistic regression models binary outcomes, POLR models ordinal outcomes (such as the classification as within normal limits, borderline, and outside normal limits in the present analysis). In preliminary analyses, separate logistic regression models were fitted in the patients with glaucoma and in the healthy control group to establish the effect of visual field loss, measured by the MD and PSD global indices, and to confirm that the assumptions of a combined POLR model were met. To obtain the final POLR model, disease status (glaucoma or control) was entered as a factor alongside the continuous explanatory variables (optic disc size, age, and image quality). Stepwise backward analyses were performed manually to remove those covariates that did not appear to contribute significantly to the model (likelihood ratio χ² test, P > 0.1). Nagelkerke R² values were used to express the proportion of variance explained by the model. The predicted probabilities of a borderline and outside-normal-limits classification in patients with glaucoma and healthy control subjects with various optic disc sizes were combined into a "pseudo-ROC" curve for comparison with the empiric data. All analyses were performed in the open-source statistical environment R.15 16 17 Anonymized data and code for this analysis are available from the corresponding author on request.
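As a reading aid, the POLR model described above can be written in cumulative-logit form. The notation is ours and is only one of several equivalent parameterizations (R's `MASS::polr`, for instance, writes the model as logit P(Y ≤ j) = ζ_j − xᵀβ); the numerical example merely rescales the GPS odds ratio for disc area reported in the abstract.

```latex
% Y = ordinal classification (within normal limits < borderline < outside normal limits)
\operatorname{logit} P(Y \ge j \mid \mathbf{x}) = \alpha_j + \mathbf{x}^{\top}\boldsymbol{\beta},
\qquad j \in \{\text{borderline},\ \text{outside normal limits}\},
\qquad \alpha_{\text{borderline}} > \alpha_{\text{outside}}

% Proportional odds: the same \beta applies at both cut points, so e^{\beta_k}
% is the odds ratio for a more abnormal classification per unit increase in x_k.
% Example (GPS, disc area): odds ratio 1.21 per 0.1 mm^2
\beta_{\text{area}} = \frac{\ln 1.21}{0.1} \approx 1.91 \ \text{per mm}^2,
\qquad 1.21^{10} \approx 6.7 \ \text{(odds ratio per 1 mm}^2\text{)}

% Nagelkerke R^2: L_0 = likelihood of the intercept-only model,
% L_M = likelihood of the fitted model, n = sample size
R^2_{\mathrm{N}} = \frac{1 - (L_0 / L_M)^{2/n}}{1 - L_0^{2/n}}
```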
Results
Topographical Differences
Whereas, with the MRA, the inferior sectors of the optic disc were more often classified as borderline and outside normal limits than the superior sectors, topographical differences were much less apparent with the GPS (Fig. 4). To quantify the association between pairs of the six sectors, we computed the Cramér V statistic19 for the GPS and the MRA in both groups of subjects (Fig. 5). The Cramér V statistic expresses the association between two categorical variables as a proportion of its largest possible value; a value of 0 means no association, and 1 stands for a perfect association. In both groups of subjects, GPS classifications of individual disc sectors were closely associated with each other (Fig. 5). For example, the association between the temporal and nasal sectors of glaucomatous optic discs was 0.95 with the GPS, whereas the corresponding value with the MRA was 0.47. In the glaucomatous eyes, the associations between sectors classified with the GPS ranged from 0.86 to 0.99, whereas those of the MRA ranged from 0.26 to 0.57. A high between-sector association with the GPS was also observed in the healthy control eyes.
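For reference, Cramér's V for an r × c table is V = √(χ² / (n·(k − 1))), where n is the number of eyes and k is the smaller of the number of rows and columns. A minimal Python sketch follows; the helper that cross-tabulates two sectors assumes classifications coded 0, 1, 2 (a hypothetical coding), and the study's own computations were done in R, not with this code.

```python
import math
from typing import List, Sequence

def cramers_v(table: Sequence[Sequence[int]]) -> float:
    """Cramér's V for an r x c contingency table of counts.

    Rows or columns with zero totals are dropped first, because they carry no
    information and would otherwise produce zero expected counts.
    """
    rows = [list(r) for r in table if sum(r) > 0]
    if not rows:
        raise ValueError("table is empty")
    keep = [j for j in range(len(rows[0])) if sum(r[j] for r in rows) > 0]
    t = [[r[j] for j in keep] for r in rows]
    k = min(len(t), len(keep))
    if k < 2:
        raise ValueError("need at least two non-empty rows and columns")
    n = sum(sum(r) for r in t)
    row_totals = [sum(r) for r in t]
    col_totals = [sum(r[j] for r in t) for j in range(len(keep))]
    chi2 = 0.0
    for i, r in enumerate(t):
        for j, observed in enumerate(r):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))

def sector_table(a: Sequence[int], b: Sequence[int], levels: int = 3) -> List[List[int]]:
    """Cross-tabulate two sectors' ordinal classifications (0, 1, 2) for the same eyes."""
    t = [[0] * levels for _ in range(levels)]
    for x, y in zip(a, b):
        t[x][y] += 1
    return t

# Hypothetical classifications of two sectors in eight eyes (not study data).
print(round(cramers_v(sector_table([0, 0, 1, 1, 2, 2, 2, 0], [0, 0, 1, 2, 2, 2, 1, 0])), 2))
```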
2). To illustrate the effects of disc size on the discrimination between patients with glaucoma and healthy control subjects, and to assess graphically the fit of the models to the empiric data, we derived the estimated probabilities of positive classifications (borderline, outside normal limits) in patients with glaucoma and healthy control subjects with optic disc sizes ranging from 1.0 to 3.0 mm², at a mean age of 65 years (Figs. 6a–6d). A "pseudo-ROC" curve was then constructed from the predicted probabilities in patients with glaucoma (sensitivity) and healthy control subjects (false-positive rate) over the given range of optic disc sizes (Figs. 6e, 6f). To compare the predictions of the model with the empiric data, the entire study sample was arbitrarily split into six approximately equal-sized groups according to optic disc size, and sensitivity and specificity estimates were derived within each of those subgroups (Figs. 6e, 6f).
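The construction of a pseudo-ROC curve from a proportional-odds model can be sketched in Python. Only the disc-area odds ratio (21% per 0.1 mm² for the GPS, from the abstract) is taken from the study; the cut points, the glaucoma coefficient, and the reference disc area are hypothetical values chosen for illustration, and age is omitted because no age effect was found with the GPS.

```python
import math
from typing import List, Sequence, Tuple

# From the abstract: odds of a positive GPS result rise by 21% per 0.1 mm^2 of disc area.
BETA_AREA = math.log(1.21) / 0.1          # log-odds per mm^2 (about 1.91)

# HYPOTHETICAL values for illustration only (not estimated from the study data).
BETA_GLAUCOMA = 2.0        # log-odds shift, glaucoma vs. control
ALPHA_BORDERLINE = -0.7    # cut point for P(Y >= borderline) at the reference area
ALPHA_OUTSIDE = -2.2       # cut point for P(Y >= outside normal limits)
REFERENCE_AREA = 2.0       # disc area (mm^2) at which the cut points apply

def expit(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def linear_predictor(disc_area: float, glaucoma: bool) -> float:
    return BETA_AREA * (disc_area - REFERENCE_AREA) + BETA_GLAUCOMA * float(glaucoma)

def category_probs(disc_area: float, glaucoma: bool) -> Tuple[float, float, float]:
    """P(within normal limits), P(borderline), P(outside), with logit P(Y >= j) = alpha_j + eta."""
    eta = linear_predictor(disc_area, glaucoma)
    p_ge_borderline = expit(ALPHA_BORDERLINE + eta)
    p_ge_outside = expit(ALPHA_OUTSIDE + eta)
    return 1.0 - p_ge_borderline, p_ge_borderline - p_ge_outside, p_ge_outside

def prob_positive(disc_area: float, glaucoma: bool, borderline_positive: bool) -> float:
    """Probability of a positive result under the chosen criterion."""
    alpha = ALPHA_BORDERLINE if borderline_positive else ALPHA_OUTSIDE
    return expit(alpha + linear_predictor(disc_area, glaucoma))

def pseudo_roc(areas: Sequence[float], borderline_positive: bool = True) -> List[Tuple[float, float]]:
    """One (false-positive rate, sensitivity) point per disc area."""
    return [(prob_positive(a, False, borderline_positive),
             prob_positive(a, True, borderline_positive)) for a in areas]

for fpr, sens in pseudo_roc([1.0, 1.5, 2.0, 2.5, 3.0]):
    print(f"FPR {fpr:.2f}  sensitivity {sens:.2f}")
```

Because disc area shifts the log-odds of a positive result by the same amount in both groups, logit(sensitivity) − logit(false-positive rate) equals the glaucoma coefficient at every disc size, so all points lie on a single curve. In this toy model, disc size moves the operating point (the criterion) without changing the curve itself, which is the behavior the text describes.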
Discussion
GPS and MRA had similar diagnostic performances and agreed in most eyes (complete agreement in approximately 70% of eyes and at least partial agreement in approximately 95% of cases). When borderline classifications, which were frequent (19% in the glaucomatous eyes, 28% in the control eyes), were included as test negatives, the GPS analysis performed with specificity close to 90% and sensitivity of just below 60%. When borderline classifications were included as test positives, sensitivity increased to approximately 80%, whereas specificity dropped to approximately 60%. With respect to the MRA, our findings were similar to those reported by Ford et al.13 in a different dataset in which the glaucomatous eyes had a similar degree of visual field damage (median MD, −4.8 dB). Combining the two classifiers did not produce a meaningful gain in performance (i.e., the two analyses did not appear to provide complementary information).
Similar to the MRA, the GPS attempts to localize damage to six sectors of the optic disc. We therefore compared the frequency with which borderline or outside-normal-limits classifications are made with GPS and MRA in each of the six sectors. Whereas with the MRA the nasal-inferior and temporal-inferior optic disc sectors of the glaucomatous eyes were classified as borderline or outside normal limits more often than other sectors, no distinct spatial predominance was seen with the GPS. In fact, the GPS classifications within the individual disc sectors appeared strongly associated with each other, adding little if any information to that already available from the global classification.
The GPS describes the shape of the optic disc with a simple geometric model from which it extracts several shape parameters.7 The simplicity of this model is an attractive feature of the analysis: each of the parameters has a straightforward morphologic interpretation. However, in a sizable proportion of eyes in our study, both glaucomatous (n = 4, 3%) and healthy (n = 7, 7%), the GPS algorithm did not find a satisfactory surface fit to the optic disc topography and therefore failed to provide a classification. In the glaucoma group we did not find a single distinct cause of failure, but most of the unclassified discs in the healthy control group were small and crowded (Table 2). The Manchester Glaucoma Imaging Study,8 from which our data were drawn, had excluded participants with refractive error outside the range of ±5.00 D equivalent sphere and ±3.00 D astigmatism. Subjects with high axial hyperopia or myopia are likely to have a higher prevalence of optic discs outside the spectrum of typical appearances,22 23 and in these subjects, automated (GPS) or semiautomated (MRA) analyses are likely to be less reliable. If, as we believe, decision-support systems are particularly important in those cases that pose a challenge to clinicians, it is a concern that the performance of automated systems in atypical optic discs (for example, those of particularly small or large size) has not yet been satisfactorily addressed by research.
The GPS derives its numerical index with a relevance vector machine, a state-of-the-art machine learning classifier potentially capable of solving complex classification problems.9 Machine learning classifiers have been applied previously, for example by Bowd et al.,24 who trained support vector machines to discriminate between healthy and diseased eyes based on the conventional stereometric parameters, and by Hothorn and Lausen,25 who used classification trees to discriminate between optic disc surfaces in patients with glaucoma and healthy control subjects imaged with the HRT. Machine learning tools may be more robust and flexible than traditional (and more familiar) statistical analyses, whose validity may stand or fall with assumptions that may not be met in clinical populations. The MRA, for example, performs statistical hypothesis testing by comparing the measured rim area with that expected in an age-matched healthy subject with a similar disc size, but there is evidence that at least three assumptions of this analysis (linearity of the relationship between log rim area and disc area, equality of variance across the entire range of values, and Gaussian properties of the underlying distributions) may not be met in clinical data. One of the potential advantages of the particular Bayesian classifier applied in the GPS is that it provides a probabilistic interpretation of optic disc status, rather than a simple binary classification into normal or abnormal.9 However, in the absence of published details on the training sample, the precise statistical interpretation of the GPS, beyond that of an ordinal index of optic disc abnormality, remains somewhat unclear.
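Regression-based normative classification of the kind described for the MRA can be sketched in Python. This is a generic illustration, not the MRA as implemented by Heidelberg Engineering: it fits a straight line of log rim area on disc area in a reference sample (the age term is omitted for brevity), computes a standardized deviation from the prediction-interval standard error using a normal rather than a t approximation, and applies one-sided 95% and 99.9% cut-offs that are assumptions here. Its validity rests on exactly the assumptions listed above: linearity, constant variance, and Gaussian residuals.

```python
import math
from dataclasses import dataclass
from statistics import NormalDist, mean
from typing import Sequence

@dataclass
class LogRimModel:
    """Straight-line fit of log(rim area) on disc area in a healthy reference sample."""
    intercept: float
    slope: float
    residual_sd: float
    mean_disc: float
    sxx: float
    n: int

def fit_log_rim_model(disc_areas: Sequence[float], rim_areas: Sequence[float]) -> LogRimModel:
    x = list(disc_areas)
    y = [math.log(r) for r in rim_areas]
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return LogRimModel(intercept, slope, math.sqrt(rss / (len(x) - 2)), xbar, sxx, len(x))

def rim_z(model: LogRimModel, disc_area: float, rim_area: float) -> float:
    """Standardized deviation of the observed log rim area from its prediction,
    using the standard error of a new observation (prediction interval)."""
    predicted = model.intercept + model.slope * disc_area
    se = model.residual_sd * math.sqrt(
        1 + 1 / model.n + (disc_area - model.mean_disc) ** 2 / model.sxx)
    return (math.log(rim_area) - predicted) / se

def classify_z(z: float, p_borderline: float = 0.05, p_outside: float = 0.001) -> str:
    """One-sided cut-offs; the 95% and 99.9% levels are assumptions for illustration."""
    std_normal = NormalDist()
    if z < std_normal.inv_cdf(p_outside):
        return "outside normal limits"
    if z < std_normal.inv_cdf(p_borderline):
        return "borderline"
    return "within normal limits"

# Synthetic reference sample (hypothetical, not study data): areas in mm^2.
areas = [1.2, 1.5, 1.8, 2.1, 2.4, 2.7, 3.0, 1.4, 1.9, 2.6]
rims = [math.exp(-0.1 + 0.2 * a + e) for a, e in zip(areas, [0.05, -0.05] * 5)]
model = fit_log_rim_model(areas, rims)
print(classify_z(rim_z(model, 2.0, 0.05)))  # outside normal limits
```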
The most striking, though not unexpected, finding of our study is the strong dependence of both GPS and MRA on optic disc size. Both GPS and MRA showed poor sensitivity to damage in small optic discs and poor specificity in large optic discs (Fig. 6). These findings mirror those previously reported by others with the MRA13 26 and are similar to those reported for expert observers.27 28 A classification as borderline or outside normal limits in a small optic disc is much more likely to be a true finding than is the same result in a large optic disc (which is relatively more likely to be a false positive). With increasing optic disc size, the probability of a borderline or outside-normal-limits result increases in both the patients with glaucoma and the healthy control subjects, to a similar extent. Consequently, optic disc size primarily affects the criterion (i.e., the threshold used to discriminate between healthy and glaucomatous discs), rather than the absolute accuracy of the analyses (Fig. 6). We believe that the most likely explanation for the apparent criterion shift of the GPS is a sampling bias in the training data. If early damage is difficult to detect in small optic discs by ophthalmoscopy, such discs are likely to be relatively underrepresented among the patients with glaucoma attending secondary centers. Similarly, if damage is more readily detectable in large discs, eyes with large discs may be relatively overrepresented. A more representative sample of glaucomatous optic disc damage could be drawn from subjects exhibiting glaucomatous visual field damage in a screening study, so that disc-related features do not influence selection. However, the number of subjects necessary to obtain a sufficiently large sample makes this approach impractical. Sampling biases in the source population are likely to persist in study populations despite strict inclusion and exclusion criteria.
The small but statistically significant difference in optic disc size between patients with glaucoma and healthy control subjects seen in our sample is compatible with this explanation. Although we believe that sampling biases provide the most likely, as well as the most parsimonious, explanation for the size dependence of both GPS and MRA, at least two alternative explanations should be considered. First, larger discs may be at a greater risk of the structural damage characteristic of glaucoma.29 Second, glaucomatous damage itself may lead to expansion of the scleral canal30 and therefore be responsible for the slightly larger optic disc size of the patients with glaucoma.
Independent of whether any disc size differences between patients with glaucoma and healthy control subjects are due to sampling bias, our data provide evidence of a large size-dependent criterion shift with both the GPS and the MRA. Since even experts may find it more difficult to ascertain glaucomatous changes in small optic discs, and more difficult to rule out glaucomatous changes in large optic discs, this issue may present a considerable problem in clinical practice. Ideally, an objective analysis would complement the clinician's judgment and would be particularly valuable in those cases that pose a known challenge to subjective evaluation (for example, particularly large or small discs). If our findings are confirmed by other centers, it would be worthwhile (and probably not difficult) to remove any size-related criterion shifts from the analysis.
An unexpected finding, to our knowledge not previously reported, was the apparent age dependence of the MRA results. Our data suggest that, with each decade of increasing age, the odds of a borderline or outside-normal-limits result with the MRA increase by roughly 50% in both patients with glaucoma and healthy control subjects. Although the CI around the odds ratio was wide (16%–102%), age accounted for nearly 12% of the explained variance in the data and was confirmed as a significant predictor even when glaucomatous and control eyes were evaluated in separate logistic regressions. Because the MRA explicitly accounts for the subject's age in calculating the expected neuroretinal rim area,10 we can only speculate that the age-related decline in rim area factored into the MRA is smaller than that observed in our sample. Because the age coefficient of the MRA was estimated by linear regression, it may have been attenuated by the large biological variation in neuroretinal rim area between subjects. It is important for our findings to be confirmed by other centers.
Finally, our findings underscore the importance of estimating covariate effects when evaluating the performance of diagnostic tests. Although the authors of the STARD initiative (Standards for the Reporting of Diagnostic Accuracy) have called for more complete reporting of relevant covariates (e.g., measures of disease severity) that may influence diagnostic performance,31 such details have been reported incompletely in many previous studies of the HRT.32 Because relevant covariates often vary between study samples, quantitative modeling of their effects, both on diagnostic performance33 and on the tradeoff between sensitivity and specificity, may make it easier to compare findings between centers. Moreover, covariate modeling may occasionally reveal important features of the tests that would otherwise not have been apparent.
Footnotes
Supported by Nova Scotia Health Research Foundation Project Grant Med-727 (PHA) and National Health Service Research and Development Grant 95/18/04 (DBH).
Submitted for publication May 29, 2006; revised July 28 and August 16, 2006; accepted October 4, 2006.
Disclosure: A. Coops, None; D.B. Henson, None; A.J. Kwartz, None; P.H. Artes, None
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Corresponding author: Paul Habib Artes, Faculty of Life Sciences, University of Manchester, Office C32, Moffat Building, North Campus, PO Box 188, M60 1QD, UK; paul_h_artes{at}yahoo.co.uk.