From the (1) Hamilton Glaucoma Center, Department of Ophthalmology, and the (2) Division of Biostatistics and Bioinformatics, Department of Family and Preventive Medicine, University of California, San Diego, La Jolla, California.
| Abstract |
|---|
METHODS. Ninety-nine eyes with repeatable standard automated perimetry results showing glaucomatous damage and 62 normal eyes were included from the longitudinal Diagnostic Innovations in Glaucoma Study (DIGS). The severity of glaucomatous visual field defects ranged from early to severe (average [95% CI] pattern standard deviation [PSD] was 5.7 [5.0–6.5] dB). The GPS (HRTII ver. 3.0; Heidelberg Engineering, Heidelberg, Germany) utilizes two measures of peripapillary retinal nerve fiber layer shape (horizontal and vertical retinal nerve fiber layer curvature) and three measures of optic nerve head shape (cup depth, rim steepness, and cup size) as input into a relevance vector machine learning classifier that estimates a probability of having glaucoma. The MRA compares measured rim area with predicted rim area adjusted for disc size to categorize eyes as outside normal limits, borderline, or within normal limits. The effect of disc size and severity of disease on the diagnostic accuracy of both GPS and MRA was evaluated using the generalized estimating equation marginal logistic regression analysis.
RESULTS. Using the manufacturer's suggested cutoffs for GPS global classification (>64% as outside normal limits), the sensitivity and specificity (95% CI) were 71.7% (62.2%–79.7%) and 82.3% (71.0%–89.8%), respectively. The sensitivity and specificity (95% CI) of the MRA result were 66.7% (58.0%–76.1%) and 88.7% (78.5%–94.34%), respectively. Likelihood ratios for regional GPS and MRA results outside normal limits ranged from 4.0 to 10.0, and 6.0 to infinity, respectively. Disc size and severity of disease were significantly associated with the sensitivity of both GPS and MRA.
CONCLUSIONS. GPS tended to have higher sensitivities and somewhat lower specificities and lower likelihood ratios than MRA. These results suggest that in this population, GPS and MRA differentiate between glaucomatous and healthy eyes with good sensitivity and specificity. In addition, the likelihood ratios suggest that GPS may be most useful for confirming a normal disc, whereas MRA may be most helpful in confirming a suspicion of glaucoma. Larger disc size and more severe field loss were associated with improved diagnostic accuracy for both GPS and MRA.
It is well established that the accuracy of tests increases with increasing severity of disease. It is therefore important to provide an estimate of diagnostic precision at various stages of disease. In addition, optic disc size has been shown to influence the diagnostic accuracy of imaging instruments, particularly the confocal scanning laser ophthalmoscope.6 7 8 9 10 11 12 Recently Medeiros et al.8 have used marginal logistic regression methods13 14 15 for simultaneous evaluation of the effect of severity of disease and disc size on the diagnostic accuracy of imaging instruments.
Each imaging instrument has specific advantages and limitations.1 A limitation of a commercially available CSLO, the Heidelberg Retina Tomograph (HRT, Heidelberg Engineering, Heidelberg, Germany) has been its reliance on an operator to outline the disc margin before topographic optic disc parameters can be calculated. The outlining of the disc margin adds processing time, and differences in how it is completed can lead to interobserver variability in stereometric variables.16 17 In addition, many of these topographic optic disc parameters are calculated with a reference plane.
The recently released HRT software version 3.0 includes the Glaucoma Probability Score (GPS). As the GPS calculation is based on the overall shape of the optic nerve head and posterior pole and does not rely on the outlining of the disc margin for its calculation,18 it may be less influenced by optic disc size than are conventional CSLO topographic optic disc parameters and the Moorfields Regression Analysis (MRA). The objective of this study was to compare the GPS with the MRA for discriminating between glaucomatous and healthy eyes and to evaluate the influence of disease severity and optic disc size on the diagnostic accuracy of these two classification systems.
| Methods |
|---|
All participants underwent a complete ophthalmic examination including slit lamp biomicroscopy, intraocular pressure measurement, dilated stereoscopic fundus examination, and standard automated perimetry (SAP) using the Swedish Interactive Threshold Algorithm (SITA) and the 24-2 program (Humphrey Field Analyzer; Carl Zeiss Meditec, Inc., Dublin, CA). Visual fields were reliable (fixation losses and false positive and false negative responses <33%). To be included in DIGS, at study entry, all participants had open angles, a best corrected acuity of 20/40 or better, a spherical refraction within ±5.0 D, and cylinder correction within ±3.0 D. A family history of glaucoma was allowed.
Participants were excluded from DIGS if they had a history of intraocular surgery except for uncomplicated cataract or glaucoma surgery. Participants were also excluded if there was evidence of secondary causes of elevated IOP (e.g., iridocyclitis, trauma), other intraocular eye disease, other diseases affecting the visual field (e.g., pituitary lesions, demyelinating diseases, HIV+ or AIDS, or diabetic retinopathy), or medications known to affect visual field sensitivity.
For purposes of this analysis, patients were classified as having glaucoma if they had at least two consecutive standard automated perimetry examinations with either a pattern standard deviation (PSD) outside the 95% normal limits or a glaucoma hemifield test (GHT) result outside the 99% normal limits. At least one of the abnormal fields was obtained within 6 months of CSLO imaging. The appearance of the optic disc was not used as a criterion for designation as glaucomatous.
Normal control eyes had IOP <22 mm Hg with no history of elevated IOP and with normal visual field results, defined as a PSD within 95% confidence limits and a GHT result within normal limits (WNL). Normal control eyes also had no evidence of glaucomatous optic disc damage (no diffuse or focal rim thinning, or RNFL defects) as evaluated by clinical examination.
The severity of visual field damage was assessed on a scale of 0 (no field loss) to 20 (end-stage glaucoma), according to the Advanced Glaucoma Intervention Study (AGIS) severity score. The AGIS score is based on the extent of damage measured by the total deviation plot at different visual field locations and has been described in detail elsewhere.19
The research adhered to the tenets of the Declaration of Helsinki. Informed consent was obtained from all participants and the University of California, San Diego, Human Subjects Committee approved all methodology.
Confocal Scanning Laser Ophthalmoscopy
The HRTII provides topographical measures of the optic disc and peripapillary retina and has been discussed in detail elsewhere.1 Three scans centered on the optic disc were automatically obtained for each test eye, and a mean topography was created. Magnification errors were corrected by using patients' corneal curvature measurements. The optic disc margin was outlined on the mean topography image by trained technicians while they viewed simultaneous stereoscopic photographs of the optic disc. All images included in the analysis were reviewed for adequate centration, focus, and illumination; all mean topography images had a standard deviation of <50 µm. The scans were obtained with HRT software version 1.5.9.0 or earlier, but were analyzed with the recently released software version 3.0.
HRT software version 3.0 includes improved alignment algorithms, a larger normative database, and the calculation of the GPS.18 Two measures of peripapillary retinal nerve fiber layer shape (horizontal and vertical retinal nerve fiber layer curvature) and three measures of optic nerve head shape (cup depth, rim steepness, and cup size) are used as input into a relevance vector machine learning classifier to estimate the probability of having glaucoma as between 0% and 100%. Two mathematical functions are used to model the topography of the optic nerve head: (1) a Gaussian cumulative distribution function is used to model the optic disc, and (2) a quadratic (parabolic) surface is used to model the peripapillary retina. To parameterize the cup, a parabolic surface is fitted to the peripapillary region of each topograph. As outlined in Swindale et al.,18 the parabolic surface, which serves as a reference plane for estimating the cup parameters, is then subtracted from the topography. The average location of the deepest points in the difference topograph is used to identify the cup center. In contrast to Swindale et al.,18 the GPS constructs a cumulative Gaussian distribution of topograph heights to estimate the cup radius (r) such that p(radius ≤ r) = 0.5. The cup radius r serves as a cup margin. Thus, the cup area is computed as the area of a circle of radius r, and the mean cup depth is computed as the average height of the measurements inside the cup in the difference topograph. The rim steepness estimates are derived from the radial topograph–height gradients. No contour line or reference plane is used in the GPS calculation. GPS output is then automatically classified into three categories: outside normal limits (ONL; GPS > 64%), borderline (BL; GPS between 24% and 64%), and within normal limits (WNL; GPS < 24%).
The MRA compares measured rim area to predicted rim area adjusted for disc size, to categorize eyes as ONL, BL, or WNL.9 It relies on a contour line and the standard reference plane (50 µm below the mean height of the contour in the temporal sector between 350° and 356°) for its measurements. By using the HRT 3.0 software, both the GPS and MRA classify eyes as within normal limits (WNL), borderline (BL), or outside normal limits (ONL), according to the same normative database of 700 eyes of whites and 200 eyes of African-Americans. For this analysis, the white normative database was used because most of the DIGS participants are of European descent. The comparison to the normative database is provided in six regions (superior temporal, inferior temporal, temporal, superior nasal, inferior nasal, and nasal), and as an overall global classification (if any of the six regions are ONL, then the eye is classified as ONL). For analysis using the MRA and GPS as categorical variables (ONL versus WNL), BL values were considered WNL for estimates of the sensitivity, specificity, and likelihood ratio.
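The categorical rules described above can be sketched in a few lines. This is a minimal illustration, not the HRT software's implementation: the function names are hypothetical, the regional scores in the usage line are made up, and the rule that returns a global borderline result when any region is BL (and none is ONL) is an assumption, since the text states only the ONL rule.

```python
def classify_gps(score_percent):
    """Map a GPS probability (0-100) to ONL / BL / WNL using the cutoffs in the text."""
    if score_percent > 64:
        return "ONL"
    if score_percent < 24:
        return "WNL"
    return "BL"


def global_classification(regional_labels):
    """Combine six regional labels into one global label.

    Any ONL region makes the eye ONL (stated in the text).
    Returning BL when any region is BL is an assumption of this sketch.
    """
    if "ONL" in regional_labels:
        return "ONL"
    if "BL" in regional_labels:
        return "BL"
    return "WNL"


def is_test_positive(label):
    """For sensitivity and specificity estimates, BL is counted as WNL (test negative)."""
    return label == "ONL"


# Hypothetical regional GPS scores for one eye
regions = [classify_gps(s) for s in (12, 30, 70, 20, 18, 22)]
print(regions, global_classification(regions))
```

Treating BL as a negative result keeps the dichotomized analysis conservative with respect to sensitivity: only an ONL output counts as a detection.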
In addition to estimating diagnostic accuracy by using the MRA and GPS as categorical variables, we evaluated the sensitivity of each at fixed specificities of 80% and 90%. We converted the MRA into a continuous variable by subtracting the predicted MRA from the actual MRA. This difference is used for determining whether the MRA is ONL. The difference between the predicted and actual MRA for each region was used to estimate sensitivity. For GPS the continuous variable used was the RVM output between 0 and 100.
The area under the receiver operating characteristic curve (AUROC) was calculated both for the three-level categorical variables (WNL, BL, and ONL) and for the continuous variables: MRA (predicted minus actual) and GPS (relevance vector machine output, 0%–100%).
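For readers less familiar with the AUROC, the sketch below computes it directly as the probability that a randomly chosen glaucomatous eye scores higher than a randomly chosen healthy eye, with ties counted as one half. The function name and the example scores are hypothetical, and the sketch assumes scores are oriented so that higher values indicate glaucoma; it is not the software used in the study.

```python
def auroc(diseased_scores, healthy_scores):
    """AUROC via pairwise comparison (Mann-Whitney U divided by n1 * n2)."""
    wins = 0.0
    for d in diseased_scores:
        for h in healthy_scores:
            if d > h:
                wins += 1.0      # diseased eye correctly ranked above healthy eye
            elif d == h:
                wins += 0.5      # ties contribute one half
    return wins / (len(diseased_scores) * len(healthy_scores))


# Hypothetical continuous outputs (e.g., GPS probabilities on a 0-1 scale)
glaucoma = [0.9, 0.8, 0.4]
normal = [0.5, 0.3, 0.2]
print(auroc(glaucoma, normal))
```

The same function applies to a three-level categorical output if the categories are coded as ordered numbers (for example WNL = 0, BL = 1, ONL = 2); the many ties then simply contribute half-credit.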
Statistical Analysis
The sensitivity, specificity, and likelihood ratios of the MRA and GPS were compared for both global and regional results. The 95% CIs for sensitivity and specificity were calculated by the Wilson Score Method without continuity correction for proportions.20 21 Likelihood ratio confidence intervals (CIs) were computed with the Simel et al. method,22 and AUROC CIs were calculated using the method of DeLong et al.23
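The Wilson score interval without continuity correction has a closed form, sketched below. The function name is hypothetical, and the counts in the usage lines (71 of 99 glaucomatous eyes and 51 of 62 normal eyes classified correctly by the global GPS) are inferred from the reported percentages rather than stated in the text.

```python
import math


def wilson_ci(successes, n, z=1.959964):
    """Wilson score interval for a proportion, without continuity correction.

    z defaults to the two-sided 95% normal quantile.
    """
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half_width, centre + half_width


# Counts inferred from the reported 71.7% sensitivity and 82.3% specificity (assumption)
print(wilson_ci(71, 99))   # GPS global sensitivity
print(wilson_ci(51, 62))   # GPS global specificity
```

Under these assumed counts, the intervals come out close to the reported 62.2%–79.7% and 71.0%–89.8%. Unlike the simple Wald interval, the Wilson interval is not symmetric about the observed proportion and stays within 0 and 1, which matters for proportions near the boundaries with samples of this size.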
We compared the influence of disc size and severity of glaucoma on the diagnostic accuracy of the MRA and GPS as categorical variables (ONL versus not ONL) by using a generalized estimating equation (GEE) marginal logistic regression modeling approach. This method directly compares the effect of covariates on several tests performed on the same group of subjects and adjusts for subject-specific correlation.13 14 15 Sensitivity is the dependent variable in the GEE logistic regression model. The covariates, disc area, AGIS score, and test type (GPS or MRA) were entered into the model as independent variables, and first-order interaction terms were included. An exchangeable correlation structure among observations was assumed. This method has recently been applied to compare the influence of disc size and severity of glaucoma on the sensitivity of the GDxVCC (Carl Zeiss Meditec, Inc.), the HRT (Heidelberg Engineering), and the Stratus OCT (Carl Zeiss Meditec, Inc.) for glaucoma detection.8
In addition, as a confirmatory analysis of the categorical diagnostic findings, we evaluated the sensitivity for detection of glaucoma at two levels of specificity: 80% and 90%. For this analysis, the 80% and 90% cutoffs for specificity were determined by using the healthy control eyes and were applied to the glaucomatous eyes to determine the corresponding sensitivity cutoffs. This allowed sensitivity to be studied at the same level of specificity. GEE marginal logistic regression was then used to evaluate the influence of disc area and severity of glaucoma on diagnostic performance. The dependent variable, sensitivity, was dichotomized to 1 or 0 if the result of the diagnostic test was above or below the sensitivity cutoff.
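The fixed-specificity procedure can be sketched as follows, assuming a continuous score in which higher values are more abnormal. The function names and example scores are hypothetical, and the way the cutoff is chosen from the sorted healthy scores is one reasonable convention, not necessarily the one used in the study.

```python
import math


def cutoff_at_specificity(healthy_scores, target_specificity):
    """Smallest observed healthy score such that at least the target fraction
    of healthy eyes fall at or below it (and are therefore test negative)."""
    ordered = sorted(healthy_scores)
    # Number of healthy eyes that must be classified negative; the small
    # epsilon guards against floating-point error in the product.
    k = max(1, math.ceil(target_specificity * len(ordered) - 1e-9))
    return ordered[k - 1]


def sensitivity_at_cutoff(glaucoma_scores, cutoff):
    """Dichotomize each glaucomatous eye (1 = above cutoff, 0 = otherwise)
    and return the sensitivity together with the 0/1 outcomes."""
    outcomes = [1 if s > cutoff else 0 for s in glaucoma_scores]
    return sum(outcomes) / len(outcomes), outcomes


# Hypothetical scores
healthy = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
glaucoma = [30, 48, 60, 90, 44]
cut = cutoff_at_specificity(healthy, 0.90)
sens, outcomes = sensitivity_at_cutoff(glaucoma, cut)
print(cut, sens, outcomes)
```

The 0/1 outcomes per glaucomatous eye are what would enter a marginal logistic regression as the dependent variable, so both tests are compared at the same false-positive rate.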
Statistical analyses were performed with commercial software (JMP ver. 6.0; SAS Institute, Cary, NC) and R (http://www.r-project.org/), version 2.1.1.
| Results |
|---|
6, indicating a moderate to large effect on the posttest probability of glaucoma. The GPS ONL value had a moderate effect on posttest probabilities of glaucoma, with values ranging from 4.04 to 10.02. The likelihood ratios for a WNL result were smaller for GPS than for MRA, with most GPS likelihood ratios having a moderate to large effect on posttest probabilities (range, 0.014–0.18), whereas MRA likelihood ratios had an insignificant effect (range, 0.20–0.59). BL values had an insignificant or weak effect on posttest probabilities, with likelihood ratios for MRA ranging from 1.06 to 3.54 and for GPS ranging from 0.66 to 1.19. Moreover, as shown in Table 4 and Figure 1, the GEE logistic marginal regression models indicate that for each region, the independent variables AGIS score and disc size influenced the diagnostic accuracy (normal versus ONL) of both GPS and MRA (P < 0.05). For the global parameter, however, only severity of glaucoma (AGIS) was positively associated with increased sensitivity (P = 0.007); disc size did not reach statistical significance (P = 0.081).
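The GEE marginal logistic model included disc area, AGIS score, and test type (GPS or MRA) as covariates, together with the disc size by test type and AGIS score by test type interaction terms.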
The parameter estimate of the slope suggests that as disc size increases, the sensitivity increases. Similarly, sensitivity increases with the severity of visual field damage. The interaction terms (disc size by test type and AGIS score by test type) were not statistically significant, suggesting that disc size and severity of glaucomatous visual field damage do not affect MRA differently than they affect GPS.
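Written out, a marginal logistic model with the covariates and interaction terms named in the text might take the following form. This is a sketch only: the coefficient symbols $\beta_0,\dots,\beta_5$, the 0/1 coding of test type, and the exact set of terms are assumptions, not the authors' published formula. Here $Y_{ij} = 1$ when glaucomatous eye $i$ is classified as ONL by test $j$.

```latex
\operatorname{logit}\,\Pr(Y_{ij} = 1)
  = \beta_0
  + \beta_1\,\mathrm{DiscArea}_i
  + \beta_2\,\mathrm{AGIS}_i
  + \beta_3\,\mathrm{Test}_j
  + \beta_4\,(\mathrm{DiscArea}_i \times \mathrm{Test}_j)
  + \beta_5\,(\mathrm{AGIS}_i \times \mathrm{Test}_j)
```

Under this form, positive $\beta_1$ and $\beta_2$ correspond to sensitivity rising with disc size and with field-loss severity, and nonsignificant $\beta_4$ and $\beta_5$ correspond to those effects being similar for GPS and MRA.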
In addition to analyzing the GPS and MRA as categorical variables, we modeled the sensitivity of the GPS and MRA for detecting glaucoma at two fixed specificities, 80% and 90%, to ensure that the analysis on GPS and MRA as categorical variables was not driven by the inherent tradeoff between sensitivity and specificity. At a fixed specificity of 90%, both severity of visual field damage and disc size were significantly associated with GPS and MRA sensitivity for detecting glaucoma (Table 5, Fig. 2). At both levels of fixed specificity, no significant differences were found between the sensitivity of the GPS and MRA. However, at a specificity of 80%, the influence of disc size on sensitivity reached statistical significance in the temporal superior region (P = 0.029), but not in the global (P = 0.093), temporal inferior (P = 0.098), nasal superior (P = 0.141), and nasal inferior (P = 0.158) regions.
| Discussion |
|---|
The finding that GPS sensitivity tends to be higher than that of the MRA is consistent with the sensitivity for detection of early glaucoma reported by Harizman et al.,25 72.3% and 59.6%, respectively. This study also confirms previous reports that the diagnostic accuracy of HRT parameters, particularly linear discriminant functions and the MRA, improves with increasing disc size,6 7 8 9 10 11 12 26 probably due to the difficulty in detecting neuroretinal rim loss in a small disc compared with a large disc.8 Most of these studies evaluated the association of diagnostic accuracy with disc size using univariate stratified analysis or regression techniques that do not take into consideration the severity of disease. It is likely that disc size and other covariates have more of an effect on diagnostic accuracy in eyes with early compared with severe glaucomatous visual field damage. The advantage of the multivariate technique used in the present study is that it can evaluate and control for the severity of visual field damage and disc size in the same analysis.
To compare more directly the results of the two classification parameters, we also chose to examine the sensitivity at two levels of specificity. At fixed specificities, the sensitivities of GPS and MRA were not significantly different. At 90% specificity, both disc size and severity of visual field damage were associated with the sensitivity of the measurements. However, at 80% specificity, the sensitivity was significantly associated with severity of visual field damage, but was not consistently associated with disc size (P ranged from 0.029 for the temporal superior region to 0.158 for the nasal superior region). It is unclear why disc size was not consistently associated with the sensitivity at 80% specificity.
To complete the analysis at fixed specificity, we incorporated the MRA in the analysis as a continuous variable (actual minus predicted) and compared it directly to the GPS. This comparison of continuous variables utilizes the normative database to calculate the predicted values, and can be considered comparable to the categorical values provided to the clinician on an HRT printout.
There are several ways to describe and summarize the ability of a diagnostic test to detect disease. The most common measures of diagnostic accuracy include sensitivity, specificity, and the AUROC. The advantages and limitations of these measures have been described recently.8 27 In brief, sensitivity, specificity, and AUROC provide important information about the overall diagnostic accuracy of a test and were therefore used to compare the diagnostic accuracy of the MRA and GPS parameters, and to evaluate the effects of glaucoma severity and disc size. However, sensitivity, specificity, and AUROC do not necessarily provide information in a form that is useful to the clinician or patient in clinical decision-making. For example, sensitivity and specificity are related (as sensitivity increases, specificity decreases and vice versa) and depend on the specific cutoffs used to define the disease. For this reason, we included a comparison of the sensitivity of the GPS and MRA at fixed specificities. By definition, at a high fixed specificity of 90%, 10% of normal eyes will be classified as glaucomatous. Unfortunately, it is difficult to apply this information to a specific patient. Similarly, the AUROC is important for comparing the diagnostic accuracy of two different diagnostic tests, but has little clinical meaning for making decisions regarding a particular patient. In contrast, the likelihood ratio provides this type of information; it expresses the magnitude by which the probability of a diagnosis in a given patient is modified by the results of the test. In other words, the likelihood ratio indicates how much a given diagnostic test result will raise or lower the pretest probability of the disease in question. We therefore reported the likelihood ratios for the three categorical outputs of the MRA and GPS: ONL, BL, and WNL.
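The way a likelihood ratio modifies the pretest probability can be made concrete with a short sketch. The function names are hypothetical, the 20% pretest probability is an arbitrary illustration, and the counts used for the global GPS (71 of 99 and 51 of 62) are inferred from the reported percentages. The ratio computed here is for the dichotomized result (ONL versus not ONL), which is not the same as the separate WNL and BL category ratios reported in the study.

```python
import math


def positive_likelihood_ratio(sensitivity, specificity):
    """LR for a positive (ONL) result: sensitivity / (1 - specificity).
    Returns infinity when no healthy eye tests positive."""
    if specificity >= 1.0:
        return math.inf
    return sensitivity / (1.0 - specificity)


def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert the pretest probability to odds, multiply by the LR, convert back."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1.0 + posttest_odds)


# Global GPS, counts inferred from the reported 71.7% and 82.3% (assumption)
lr_onl = positive_likelihood_ratio(71 / 99, 51 / 62)
print(lr_onl)
print(posttest_probability(0.20, lr_onl))  # hypothetical 20% pretest probability
```

With these assumed counts the ratio comes out near 4, in line with the lower end of the reported GPS range for ONL results. A specificity of 100% in a region yields an infinite ratio, which is how an upper bound of infinity can arise.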
We found that an MRA output of ONL had a moderate to large effect on the posttest probability of glaucoma both globally and for each region. A GPS output of ONL had a moderate effect on posttest probability. A GPS output of WNL was much more strongly associated with the probability that the test result was normal than was the same output from the MRA. The results in this study population suggest that GPS provides better information for confirming a normal disc, whereas MRA is most helpful in confirming a suspicion of glaucoma. It should be noted that even small changes in posttest probability may be relevant, depending on other relevant clinical information and the pretest probability of disease.
There are several possible limitations to the present study. First, the subjects with glaucoma were older than the normal subjects, which may lead to an overestimation of the diagnostic accuracy of the methods. However, the main objective of this study was to use the same population to compare the influence of disc size and severity of disease between the MRA and GPS, and since there was no relationship between age and disc size (R2 = 0.006, P = 0.153) and a very weak relationship between age and AGIS score (R2 = 0.017, P = 0.018), it is unlikely that the age difference influenced the comparison. Second, the sample size of this study was modest, leading to relatively wide confidence limits for estimates of AUROC, sensitivity, specificity, and likelihood ratios. Larger studies are needed to provide more precise estimates of the diagnostic accuracy. Finally, in our limited population, disc area was larger in the glaucomatous eyes than in the normal eyes, which may affect how disc size influenced the diagnostic accuracy of the MRA and GPS. However, the analysis of sensitivity at a fixed specificity using glaucomatous eyes only confirmed the relationship of better sensitivity with increasing disc size for both MRA and GPS. Other investigators also have found larger discs in subjects with glaucoma than in normal control subjects.26 A likely explanation for this difference in disc size is sampling bias related to glaucomas being more difficult to detect in small optic discs.8 If this is the case, then small discs are likely to be underrepresented and large discs overrepresented among patients with diagnosed glaucoma in glaucoma specialty clinics.26
In conclusion, GPS can differentiate between glaucomatous and healthy eyes with relatively good sensitivity and specificity. Using the manufacturer's suggested cutoff values for classification as ONL, the GPS tended to have higher sensitivities and lower specificities and likelihood ratios than did the MRA. Disc size influences the diagnostic accuracy of both the GPS and MRA.
| Footnotes |
|---|
Submitted for publication November 1, 2006; revised January 24, 2007; accepted April 4, 2007.
Disclosure: L.M. Zangwill, Carl Zeiss Meditec (F), Heidelberg Engineering (F); S. Jain, None; L. Racette, None; K.B. Ernstrom, None; C. Bowd, None; F.A. Medeiros, Carl Zeiss Meditec (F); P.A. Sample, Carl Zeiss Meditec (F), Welch-Allyn (F), Haag Streit (F); R.N. Weinreb, Carl Zeiss Meditec (F, R), Heidelberg Engineering (F, R)
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Corresponding author: Linda M. Zangwill, Hamilton Glaucoma Center, Department of Ophthalmology 0946, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0946; zangwill{at}glaucoma.ucsd.edu.