From the Hamilton Glaucoma Center and Department of Ophthalmology, University of California, San Diego, California.
| Abstract |
|---|
METHODS. Analysis 1 included 67 eyes with glaucomatous visual field loss and 56 eyes of normal volunteers. Estimates of diagnostic accuracy in this analysis were compared to those obtained from analysis 2, which included a cohort of patients with suspected glaucoma, but without visual field loss at the time of CSLO imaging. For analysis 2, 40 eyes with progressive glaucomatous optic disc change were included in the glaucoma group and 43 eyes without any evidence of progressive damage to the optic nerve that were observed untreated for an average time of 9.01 ± 3.09 years were included in the normal group. Areas under the receiver operating characteristic (ROC) curves (AUC) were used to evaluate diagnostic accuracy of CSLO parameters.
RESULTS. There was a statistically significant difference between the performance of the parameter with the largest AUC, the linear discriminant function (LDF) Bathija, in analysis 1 (AUC = 0.91) and its performance in analysis 2 (AUC = 0.71; P = 0.002). For the contour-line-independent parameter glaucoma probability score (GPS), the difference between analysis 1 (AUC = 0.89) and analysis 2 (AUC = 0.65) was also statistically significant (P < 0.001).
CONCLUSIONS. Estimates of diagnostic accuracy of CSLO in glaucoma can differ substantially depending on the population studied and the reference standard used to define disease. Diagnostic accuracy estimates obtained from case-control studies including well-defined groups of subjects with or without disease may not be applicable to the clinically relevant population.
Spectrum bias is a general term used to indicate differences between the type of patients in the clinically relevant population and the study population.2 3 4 Diagnostic accuracy can be biased or overestimated if a test is evaluated in a group of patients already known to have the disease and a separate group of normal subjects, rather than in a relevant clinical population. The true value of a diagnostic test is established only in a study that closely resembles clinical practice, and a diagnostic test is useful only to the extent that it distinguishes between conditions that might otherwise be confused. In fact, the purpose of a diagnostic test is to assist in the diagnosis of patients with suspected disease, that is, patients without any obvious finding that could promptly confirm or exclude the diagnosis, such as repeatable abnormal visual fields.
In the design of any clinical diagnostic study, a fundamental requirement is to use a proper reference standard to which the diagnostic test is to be compared.5 There is, however, a limitation in designing clinically relevant diagnostic studies in glaucoma, which is the absence of a perfect reference standard for the disease. For example, in studies with cross-sectional design, it is clearly difficult to establish whether a patient with suspect optic disc appearance, but no visual field loss, has glaucoma. This occurs due to the wide variability of the optic nerve appearance in the normal population, which makes a single optic disc examination frequently nondiagnostic in the early stages of disease. This limitation, however, can be overcome by the use of longitudinal information regarding the appearance of the optic nerve.6 That is, the presence of documented evidence of a history of progressive glaucomatous damage to the optic nerve can confirm the diagnosis of glaucoma in a patient who presents with a suspect optic disc appearance in a cross-sectional evaluation. Similarly, the absence of any evidence of progressive glaucomatous damage in a patient with suspect optic disc appearance who is observed without treatment for a sufficiently long period provides confidence that this patient has only a normal variation of optic disc morphology, but no glaucomatous damage.
The purpose of the present study was to evaluate the effects of study design and spectrum bias on the evaluation of the diagnostic accuracy of one of the imaging technologies, CSLO, in glaucoma. In one analysis, we applied a case-control design including patients with glaucomatous visual field loss versus healthy subjects. Estimates of diagnostic accuracy in this analysis were compared to those obtained from an analysis of a cohort of patients with suspected glaucoma in which the reference standard was based on history of documented progressive glaucomatous optic nerve head damage.
| Methods |
|---|
Each subject underwent a comprehensive ophthalmic examination, including review of medical history, best corrected visual acuity, slit-lamp biomicroscopy, intraocular pressure (IOP) measurement with Goldmann applanation tonometry, gonioscopy, dilated funduscopic examination with a 78-D lens, stereoscopic optic disc photography, and standard automated perimetry (SAP) with 24-2 Swedish Interactive Threshold Algorithm (SITA; Carl Zeiss Meditec, Inc., Dublin, CA). To be included, subjects had to have best corrected visual acuity of 20/40 or better, spherical refraction within ±5.0 D, cylinder correction within ±3.0 D, and open angles on gonioscopy. Eyes with coexisting retinal disease, uveitis, or nonglaucomatous optic neuropathy were also excluded from the investigation.
Instrumentation
The HRT II (software version 3.0, Heidelberg Engineering, Dossenheim, Germany) was used to acquire CSLO images in the study. It uses confocal scanning laser principles to obtain a three-dimensional (3-D) topographic image of the optic nerve. Its operating principles have been described in detail elsewhere.7 For each patient, three topographic images were obtained, automatically aligned, and combined into a single mean topography used for analysis. Magnification errors were corrected using the patients' corneal curvature measurements. An experienced examiner outlined the optic disc margin on the mean topographic image while viewing stereoscopic photographs of the optic disc. Good-quality images required a focused reflectance image with a standard deviation not greater than 50 µm.
Topographical parameters included with the HRT software and investigated in this study were disc area, cup area, rim area, cup/disc area ratio, rim/disc area ratio, cup volume, rim volume, mean cup depth, maximum cup depth, mean height contour, height variation contour, cup shape measure, mean RNFL thickness, RNFL cross-sectional area, linear cup/disc ratio, and two linear discriminant functions (LDFs), from Mikelberg et al.8 (LDF Mikelberg) and Bathija et al.9 (LDF Bathija). The software on HRT II also incorporates the Moorfields regression analysis (MRA), which compares the subject's rim area with the rim area predicted for a given disc area and age, based on confidence limits of a regression analysis derived from healthy subjects included in the instrument's normative database.10 This database contains information from 733 eyes of white subjects, 215 eyes of African American subjects, and 104 eyes of Indian subjects. These subjects were selected based on the presence of normal IOP (<23 mm Hg), normal visual fields, no family history of glaucoma, and no history of ocular disease (Mike Sinai, PhD, Heidelberg Engineering, written communication, May 2006). Rim area is evaluated separately for individual sectors of the optic disc. Each sector is classified as within normal limits (WNL) if the measurement falls within the 95% CI, borderline (BL) if it falls between the 95% and 99.9% CIs, and outside normal limits (ONL) if it falls below the 99.9% CI. The MRA also provides results for the global rim area (MRA global) and a final classification (MRA classification). A normal MRA classification requires all sectors and the global rim area to be within normal limits. A borderline MRA classification occurs when at least one sector or the global rim area is borderline, and an outside normal limits classification occurs when at least one sector or the global rim area is outside normal limits.
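The MRA decision rule described above can be sketched as a small function. This is a minimal illustration, not Heidelberg Engineering's implementation; the names `MRAResult` and `mra_classification` are hypothetical, and we assume that an ONL result takes precedence over a BL result when both occur, a case the rules as stated do not address explicitly.

```python
from enum import Enum
from typing import Iterable


class MRAResult(Enum):
    """Per-sector or global rim-area result (hypothetical names)."""
    WNL = "within normal limits"
    BL = "borderline"
    ONL = "outside normal limits"


def mra_classification(sector_results: Iterable[MRAResult],
                       global_result: MRAResult) -> MRAResult:
    """Combine sector and global results into a final MRA classification.

    Assumption: ONL takes precedence over BL, and BL over WNL.
    """
    results = list(sector_results) + [global_result]
    if any(r is MRAResult.ONL for r in results):
        return MRAResult.ONL
    if any(r is MRAResult.BL for r in results):
        return MRAResult.BL
    # Normal only when every sector and the global rim area are WNL.
    return MRAResult.WNL


# Hypothetical example: one borderline sector, everything else normal.
sectors = [MRAResult.WNL] * 5 + [MRAResult.BL]
print(mra_classification(sectors, MRAResult.WNL).name)  # BL
```

The precedence ordering makes the function a simple "worst result wins" rule, which is the most natural reading of the three conditions in the text.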
The HRT 3.0 software uses a new manufacturer-developed automated analysis for the detection of glaucomatous damage, called the glaucoma probability score (GPS), which is independent of the contour line traced by the examiner around the optic disc margin.11 It is based on a 3-D model of the entire topographic image, including the optic disc and surrounding peripapillary retinal nerve fiber layer. Five shape-based parameters are used in the model to characterize the shape of the optic disc and RNFL. Three parameters characterize the optic disc: cup size (width), cup depth (depth), and rim steepness (slope). Two parameters characterize the RNFL: the vertical RNFL curvature (superior to inferior curvature) and the horizontal RNFL curvature (nasal to temporal curvature). A 3-D model incorporating information from these five parameters is then constructed for the optic disc being examined. The parameter values are then fed into a machine learning classifier called a relevance vector machine (RVM), which compares a patient's results with previously defined healthy and glaucomatous models. According to the manufacturer, the final GPS result is the probability that the scan has structural characteristics consistent with glaucoma.
Analysis 1
For this analysis, eyes were classified as glaucomatous if they had repeatable (at least two consecutive) abnormal visual field test results, defined as a pattern standard deviation (PSD) outside the 95% normal confidence limits and/or a Glaucoma Hemifield Test result outside normal limits, regardless of the appearance of the optic disc.
Normal subjects were recruited from the staff and employees of the University of California San Diego and from the general population through advertisement. They were selected so that the age range was similar to that of subjects with glaucoma (40 to 89 years). They were required to have an intraocular pressure of 21 mm Hg or less with no history of increased IOP, a healthy appearance of the optic disc and RNFL (no diffuse or focal rim thinning, optic disc hemorrhage, or RNFL defects), as evaluated by clinical examination, and a normal visual field result. A normal visual field was defined as a mean deviation (MD) and PSD within the 95% confidence limits, and a glaucoma hemifield test (GHT) result within normal limits.
All patients had visual field and imaging tests within 6 months of each other. When both eyes of the same patient fulfilled the inclusion criteria, one eye was randomly selected for inclusion in the analysis.
A total of 67 patients with glaucomatous visual field loss at the time of HRT imaging and 56 healthy subjects were included in analysis 1. The mean ± SD age of patients with glaucoma and normal subjects in analysis 1 was 65 ± 10 years and 62 ± 8.1 years, respectively (P = 0.10). The median (first quartile, third quartile) of MD of the visual field test closest to the imaging date in glaucomatous and healthy eyes was 4.91 dB (9.92 dB, 2.81 dB) and 0.65 dB (1.35 dB, 0.13 dB), respectively (P < 0.001).
Figure 1 illustrates typical subjects included in analysis 1 in the glaucoma and normal groups.
These patients were then classified based on documented evidence of progressive glaucomatous change in the appearance of the optic disc that occurred before the HRT imaging date. Patients with documented evidence of progressive glaucomatous nerve damage at any time before the HRT imaging date were considered to have glaucoma. Progressive glaucomatous change in the appearance of the optic disc was assessed with simultaneous stereoscopic optic disc photographs (TRC-SS; Topcon Instrument Corp of America, Paramus, NJ). Stereoscopic sets of slides were examined with a stereoscopic viewer (Asahi Optical Co.-Pentax, Tokyo, Japan). The photographs were evaluated by two experienced graders (FAM, CB), each masked to the subject's identity and to the other test results. For inclusion, photographs had to be graded as of adequate quality or better. For each patient, the most recent stereophotograph was compared with the oldest available one, to maximize the chance of detecting progressive optic disc change. Each observer was masked to the temporal sequence of the photographs. Change was defined as focal or diffuse thinning of the neuroretinal rim, increased excavation, or enlargement of RNFL defects. Changes in rim color, the presence of disc hemorrhage, or progressive parapapillary atrophy were not sufficient to characterize progression. Discrepancies between the two graders were resolved either by consensus or by adjudication by a third experienced grader. The two graders initially agreed in 88% of the cases (93% agreement for judgments of no progression and 83% agreement for judgments of progression). When both eyes of the same patient showed progressive optic disc changes and met the inclusion criteria, one eye was randomly selected for inclusion in the study.
A total of 40 eyes with progressive glaucomatous optic disc change and no visual field loss before the HRT imaging date were included in the glaucoma group in analysis 2. These patients were observed for an average of 8.21 ± 3.26 years.
Patients without evidence of progressive change in the appearance of the optic disc or visual field loss in either eye, observed without any history of intraocular pressure-lowering treatment, were considered to be normal. One eye of each subject was randomly selected for analysis. A total of 43 eyes of 43 subjects were included in the normal group in analysis 2. These subjects were observed without treatment for an average of 9.01 ± 3.09 years without showing any evidence of progressive damage to the optic nerve, providing reasonable confidence that they had only suspected disease, but no glaucomatous damage.
The mean ± SD ages of glaucoma and normal subjects in analysis 2 were 66.1 ± 12.8 years and 62.7 ± 12.9 years (P = 0.23). The median (first quartile, third quartile) of the MD of the visual field closest to the imaging date was 1.28 dB (2.79 to 0.09 dB) and 0.25 dB (1.05 to 0.32 dB), respectively (P = 0.004).
Figure 2 illustrates typical subjects included in analysis 2 in the glaucoma and normal groups. Table 1 summarizes the differences between subjects included in the two analyses.
Receiver operating characteristic (ROC) curves were used to describe the ability of CSLO to differentiate glaucomatous from normal eyes in each of the two analyses. The ROC curve shows the tradeoff between sensitivity and 1 − specificity. An ROC curve area of 1.0 represents perfect discrimination, whereas an area of 0.5 represents chance discrimination. The ROC curve area of each parameter in analysis 1 was then compared with that obtained in analysis 2. According to Zhou et al.,5 two independent ROC curve areas can be compared with a z statistic, which asymptotically follows the standard normal distribution, given by:
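$$
z = \frac{\hat{A}_1 - \hat{A}_2}{\sqrt{\widehat{\operatorname{Var}}(\hat{A}_1) + \widehat{\operatorname{Var}}(\hat{A}_2)}}
$$

where $\hat{A}_1$ and $\hat{A}_2$ are the estimated ROC curve areas of a given parameter in analyses 1 and 2, and $\widehat{\operatorname{Var}}(\cdot)$ denotes their estimated variances.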
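As a rough illustration of the z test for two independent ROC areas, z = (A1 − A2) / sqrt(SE1² + SE2²), the sketch below plugs in the AUCs reported for LDF Bathija (0.91 in analysis 1 with 67 glaucomatous and 56 normal eyes; 0.71 in analysis 2 with 40 and 43 eyes). The standard errors are not reported in this section, so the code approximates them with the Hanley and McNeil formula; that choice is an assumption, and the resulting P value need not match the reported P = 0.002. The function names are hypothetical.

```python
import math


def hanley_mcneil_se(auc: float, n_disease: int, n_normal: int) -> float:
    """Approximate SE of an ROC area (Hanley & McNeil, 1982); an assumption here."""
    q1 = auc / (2.0 - auc)             # P(two diseased scores both exceed one normal)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)  # P(one diseased score exceeds two normals)
    var = (auc * (1.0 - auc)
           + (n_disease - 1) * (q1 - auc ** 2)
           + (n_normal - 1) * (q2 - auc ** 2)) / (n_disease * n_normal)
    return math.sqrt(var)


def compare_independent_aucs(auc1: float, se1: float,
                             auc2: float, se2: float) -> tuple[float, float]:
    """z = (A1 - A2) / sqrt(SE1^2 + SE2^2) and its two-sided P value."""
    z = (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Two-sided P under the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


se_a1 = hanley_mcneil_se(0.91, n_disease=67, n_normal=56)
se_a2 = hanley_mcneil_se(0.71, n_disease=40, n_normal=43)
z, p = compare_independent_aucs(0.91, se_a1, 0.71, se_a2)
print(f"z = {z:.2f}, P = {p:.4f}")
```

Swapping in nonparametric (e.g., DeLong) or bootstrap standard errors only changes the SE inputs; the z statistic itself is computed the same way.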
Sensitivities at fixed specificities of 80% and 95% were reported for each parameter in each analysis. Statistical analyses were performed with STATA (ver. 9.0; StataCorp, College Station, TX) and SPSS (ver. 13.0; SPSS Inc., Chicago, IL). The α level (type I error) was set at 0.05.
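The two ROC summaries used in this study, the area under the curve and sensitivity at a fixed specificity, can be computed directly from raw scores. The sketch below is a minimal illustration on hypothetical toy scores (not study data). It assumes higher values indicate glaucoma, computes the nonparametric AUC as the Mann-Whitney probability that a glaucomatous eye outscores a normal eye, and picks the cutoff as the smallest normal-eye score that leaves at least the target fraction of normal eyes test-negative. The function names are hypothetical, and STATA or SPSS may handle ties and interpolation differently.

```python
import math
from typing import Sequence, Tuple


def empirical_auc(glaucoma: Sequence[float], normal: Sequence[float]) -> float:
    """Nonparametric AUC: P(glaucoma score > normal score), ties counted as 1/2."""
    wins = 0.0
    for g in glaucoma:
        for n in normal:
            if g > n:
                wins += 1.0
            elif g == n:
                wins += 0.5
    return wins / (len(glaucoma) * len(normal))


def sensitivity_at_specificity(glaucoma: Sequence[float],
                               normal: Sequence[float],
                               target_spec: float) -> Tuple[float, float]:
    """Return (cutoff, sensitivity) for the rule 'positive if score > cutoff'.

    The cutoff is the k-th smallest normal score, k = ceil(target_spec * n),
    so at least `target_spec` of normal eyes fall at or below it (0 < target_spec <= 1).
    """
    ordered = sorted(normal)
    # Small tolerance guards against floating-point error such as 0.95 * 20.
    k = math.ceil(target_spec * len(ordered) - 1e-9)
    cutoff = ordered[k - 1]
    sensitivity = sum(g > cutoff for g in glaucoma) / len(glaucoma)
    return cutoff, sensitivity


# Hypothetical toy scores, not study data.
glaucoma_scores = [0.9, 0.8, 0.75, 0.6, 0.4]
normal_scores = [0.1, 0.2, 0.3, 0.5, 0.7]
print(empirical_auc(glaucoma_scores, normal_scores))                     # 0.88
print(sensitivity_at_specificity(glaucoma_scores, normal_scores, 0.80))  # (0.5, 0.8)
```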
| Results |
|---|
| Discussion |
|---|
Estimates of diagnostic accuracy obtained in analysis 1 are similar to those reported in several other studies that included patients with glaucomatous visual field defects and normal subjects defined in a similar way.13 14 15 16 17 18 19 The parameter with the largest ROC curve area, LDF Bathija, had a sensitivity of 79% for detecting eyes with visual field loss at a fixed specificity of 80%, indicating good ability to discriminate between these two groups. A clinician, however, is often interested in using an imaging instrument to help establish whether a patient with suspicious optic disc appearance, but no visual field loss, has glaucomatous damage or only a normal variation of optic nerve morphology. This example illustrates an important paradox in the assessment of imaging technologies for glaucoma diagnosis: the sensitivity and specificity of the test are usually measured in a patient population with characteristics different from those of the population in whom the test is typically used. Estimates of diagnostic accuracy obtained from analysis 1, or from the above-mentioned studies, are not directly applicable to the evaluation of patients with suspected glaucoma, as these patients fall into neither the category of patients with glaucomatous visual field loss nor that of healthy subjects with no suspected disease.
In analysis 2, patients were selected based on the presence of suspect appearance of the optic nerve. These patients did not have confirmed visual field abnormalities, and their diagnoses at the time of HRT imaging could not be clearly established based on cross-sectional assessment. This design replicates a common situation faced by clinicians, which is the use of imaging instruments to help evaluate patients with suspected glaucoma. Therefore, estimates of accuracy obtained from analysis 2 are more likely to represent the performance expected from the test in this clinical situation.
Analysis 2 differed fundamentally from analysis 1 in the presence of diagnostic uncertainty, which existed in the former but not in the latter. Unlike patients in analysis 1, patients in analysis 2 could not be clearly separated based on the optic nerve appearance at the time of the imaging session. In fact, the presence of diagnostic uncertainty is considered a fundamental requirement for a clinically relevant study evaluating a diagnostic test.20 21 22 According to the American Medical Association evidence-based guidelines for evaluation of the medical literature, one of the fundamental questions to ask when evaluating the validity of a diagnostic study is whether the clinician faced diagnostic uncertainty.20 In studies with no diagnostic uncertainty, the power of the test to detect disease tends to be overestimated. Lijmer et al.21 evaluated the impact of design-related bias in studies of diagnostic tests. They found that studies with a case-control design including patients with well-established disease and a separate group of normal control subjects resulted in a threefold overestimation of the power of the test. In the present study, we found that diagnostic accuracy estimates obtained from analysis 1 were significantly higher than those found in analysis 2. For the LDF Bathija, for example, the AUC for analysis 1 was 0.91 compared with 0.71 for analysis 2. A similar trend was observed for all other parameters, including the contour-line-independent parameter GPS, for which the AUC was 0.89 in analysis 1 but decreased to 0.65 in analysis 2. The decrease in performance of topographic parameters in analysis 2 compared with analysis 1 is probably related, at least in part, to the less severe stage of disease in the glaucomatous eyes included in analysis 2. It is also probably related to the method of selecting normal subjects for the two analyses.
In analysis 1, normal subjects were required to have normal optic discs, whereas in analysis 2, normal subjects had suspect appearance of the optic disc, making it more difficult for the diagnostic test to differentiate them from subjects with disease.
Significant differences were found for some HRT parameter values between glaucoma and normal subjects in analysis 2. This indicates that HRT can provide useful additional information for clinicians evaluating patients with suspected glaucoma. These results agree with those of Bowd et al.,23 who reported on the baseline predictive value of HRT for identifying which patients with suspected glaucoma would later develop glaucomatous visual field loss. As part of the Ocular Hypertension Treatment Study, Zangwill et al.24 also showed that HRT topographic parameters had significant predictive value in discriminating ocular hypertensive eyes that converted to glaucoma from those that did not. It is important to note, however, that estimates of accuracy obtained from these studies and from analysis 2 were, in general, much lower than those obtained from case-control studies with a design similar to that of analysis 1. In the work by Zangwill et al., the sensitivity of MRA classification to detect converting eyes was 28% for 85% specificity. Considering borderline results to be within normal limits, as in Zangwill et al., the sensitivity of MRA classification in analysis 2 was 47.5%, with a specificity of 67%.
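Because the MRA yields three levels, its sensitivity and specificity depend on how borderline results are counted. The sketch below shows both conventions, borderline treated as within normal limits (as in Zangwill et al.) or as abnormal, on hypothetical classifications (not study data); the function name and the string codes are illustrative assumptions.

```python
from typing import Sequence, Tuple


def mra_sens_spec(glaucoma: Sequence[str], normal: Sequence[str],
                  borderline_positive: bool) -> Tuple[float, float]:
    """Sensitivity and specificity of an MRA classification ('WNL', 'BL', 'ONL')
    collapsed to a binary test."""
    positive = {"ONL", "BL"} if borderline_positive else {"ONL"}
    sensitivity = sum(r in positive for r in glaucoma) / len(glaucoma)
    specificity = sum(r not in positive for r in normal) / len(normal)
    return sensitivity, specificity


# Hypothetical MRA classifications, not study data.
glaucoma_mra = ["ONL", "ONL", "BL", "WNL"]
normal_mra = ["WNL", "WNL", "BL", "ONL"]
print(mra_sens_spec(glaucoma_mra, normal_mra, borderline_positive=False))  # (0.5, 0.75)
print(mra_sens_spec(glaucoma_mra, normal_mra, borderline_positive=True))   # (0.75, 0.5)
```

Counting borderline results as abnormal trades specificity for sensitivity, so the convention has to match before MRA results from different studies can be compared.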
Important differences could be identified in the performance of some topographic measures between analyses 1 and 2. For example, disc area was significantly lower in normal than in glaucomatous eyes in analysis 1, but not in analysis 2. Disc area has not been identified as a risk factor for glaucomatous damage (although some controversy still exists on this topic)25 26 27 and is not generally considered a useful parameter to differentiate between glaucomatous and normal eyes.24 28 29 Therefore, this finding is probably related to a bias in selection of normal subjects in analysis 1. As healthy subjects are required to have a healthy appearance of the optic nerve in analysis 1, their selection may exclude patients with suspect appearance of the optic nerve, such as those with large discs and large cups. Similar findings can be observed in studies in the literature with design similar to that of analysis 1.15 Another important difference between the two analyses was that cup-related parameters, such as cup area or cup shape measure, were significantly different between glaucomatous and normal eyes in analysis 1, but not in analysis 2. This is probably explained by the fact that patients in analysis 2 were selected based on the presence of suspect optic disc appearance, which generally includes patients with large cups. Results from analysis 2 are in agreement with the clinical observation that an enlarged cup is generally not sufficient to differentiate a suspect optic disc that has early glaucomatous damage from one with a physiologic large cup.
It is also important to note that cutoffs of abnormality for imaging instrument parameters recommended by manufacturers are generally based on case-control studies with design similar to that of analysis 1. However, these cutoffs are not likely to have the desired properties when applied to the situation of diagnosing disease in a patient with suspected glaucoma. For example, although a cutoff of 82 for the GPS parameter would provide 95% specificity (with 64% sensitivity) in analysis 1, it would result in a specificity of 86% in analysis 2 (with 35% sensitivity).
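The transportability problem can be illustrated by applying one fixed cutoff to two populations. In the sketch below, scores are hypothetical GPS-like values on a 0 to 100 scale (not study data), an eye is assumed to be test-positive when its score is at or above the cutoff, and the two toy populations loosely mimic a case-control sample and a cohort with suspect discs. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def sens_spec_at_cutoff(glaucoma: Sequence[float], normal: Sequence[float],
                        cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when an eye is called positive if score >= cutoff."""
    sensitivity = sum(s >= cutoff for s in glaucoma) / len(glaucoma)
    specificity = sum(s < cutoff for s in normal) / len(normal)
    return sensitivity, specificity


# Hypothetical scores (0-100), not study data.
case_control = {"glaucoma": [95, 90, 88, 85, 60], "normal": [5, 10, 20, 30, 40]}
suspect_cohort = {"glaucoma": [85, 70, 60, 50, 40], "normal": [90, 75, 55, 30, 20]}

for name, data in [("case-control", case_control), ("suspect cohort", suspect_cohort)]:
    sens, spec = sens_spec_at_cutoff(data["glaucoma"], data["normal"], cutoff=82)
    print(f"{name}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
# case-control: sensitivity = 0.80, specificity = 1.00
# suspect cohort: sensitivity = 0.20, specificity = 0.80
```

The cutoff itself is unchanged; only the score distributions differ, which is enough to move both sensitivity and specificity.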
For analysis 2, we used evidence of documented progressive disc change to separate glaucoma suspect patients into those who were disease positive and those who were disease negative. As all these patients were required to have normal visual fields and suspect optic disc findings, no other reference standard would be available to classify them. Because of the wide variability of the optic nerve appearance, a single optic disc examination is frequently nondiagnostic in the early stages of glaucoma. In the absence of visual field loss, a definitive diagnosis of glaucoma can only be made by demonstrating a previous history of progressive glaucomatous changes to the optic nerve. In fact, documented progressive glaucomatous change in the optic disc appearance has been suggested to be the best currently available reference standard for glaucoma diagnosis.6 The use of progressive optic disc change as the reference standard, however, has some limitations. Demonstration of progressive optic disc change requires longitudinal follow-up and serial documentation of optic disc appearance, which may not be available for all patients. Patients with suspect optic disc appearance who did not show any evidence of optic disc change or visual field loss during follow-up were considered to be normal subjects in analysis 2. It might be argued that some of these patients had glaucomatous damage, but the follow-up time was insufficient to detect progression. Although it is unlikely that patients with glaucoma would show neither progression nor functional loss when observed for 9 years without treatment, this possibility cannot be completely discounted. It should be noted, however, that this limitation also applies to any other study design in glaucoma, as it is currently impossible to completely exclude a diagnosis of glaucoma in patients who may have the disease.
In fact, the absence of a perfect reference standard is a relatively common problem in diagnostic medicine. Several statistical approaches have also been suggested to deal with this issue, including the evaluation of multiple tests in the absence of a reference standard.30 31 These approaches, however, have not yet been used for evaluation of diagnostic accuracy of imaging instruments in glaucoma and deserve further study.
It should also be emphasized that the study designs used in analysis 1 and 2 are not appropriate to evaluate whether a particular diagnostic test fulfills the requirements of a screening device for glaucoma. Studies including unselected populations are necessary to resolve this question. Also, the study design used in analysis 2 is helpful to address one question, which is the performance of imaging instruments in detecting disease in patients with suspected glaucoma by the appearance of the optic disc. This is not the only situation to which imaging tests can be applied. In fact, these methods have been shown to be helpful for evaluation of patients with ocular hypertension24 32 or for longitudinal follow-up of persons with suspected glaucoma and patients with confirmed disease.33 Also, detection of progressive structural change by longitudinal evaluation with imaging instruments could be used to help establish diagnosis in some cases. Different study designs may be appropriate for obtaining answers to other clinically relevant questions.
In conclusion, estimates of diagnostic accuracy of CSLO in glaucoma can differ substantially depending on the population studied and the reference standard used to define disease. Diagnostic accuracy estimates obtained from case-control studies including well-defined groups of subjects with or without disease may not be applicable to the clinically relevant population.
| Footnotes |
|---|
Submitted for publication June 7, 2006; revised August 9, 2006; accepted November 21, 2006.
Disclosure: F.A. Medeiros, Carl-Zeiss Meditec, Inc. (F); D. Ng, None; L.M. Zangwill, Heidelberg Engineering (F, R); P.A. Sample, Carl-Zeiss Meditec, Inc., Alcon Laboratories, Inc., Allergan, Pfizer, Inc., and Santen, Inc. (F); C. Bowd, None; R.N. Weinreb, Carl-Zeiss Meditec, Inc. (F) and Heidelberg Engineering (F)
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Corresponding author: Felipe A. Medeiros, Hamilton Glaucoma Center, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0946; fmedeiros{at}eyecenter.ucsd.edu.
| References |
|---|