1From the UPMC Eye Center, Ophthalmology and Visual Science Research Center, Eye and Ear Institute, Department of Ophthalmology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania; 3Institute for Human and Machine Cognition, Pensacola, Florida; and the 4Department of Philosophy, Carnegie Mellon University, Pittsburgh, Pennsylvania.
| Abstract |
|---|
METHODS. Forty-seven patients with glaucoma (47 eyes) and 42 healthy subjects (42 eyes) were included in this cross-sectional study. Of the glaucoma patients, 27 had early disease (visual field mean deviation [MD] ≥ −6 dB) and 20 had advanced glaucoma (MD < −6 dB). Machine-learning classifiers were trained to discriminate between glaucomatous and healthy eyes using parameters derived from OCT output. The classifiers were trained with all 38 parameters as well as with only 8 parameters that correlated best with the visual field MD. Five classifiers were tested: linear discriminant analysis, support vector machine, recursive partitioning and regression tree, generalized linear model, and generalized additive model. For the last two classifiers, a backward feature selection was used to find the minimal number of parameters that resulted in the best and simplest prediction. The cross-validated receiver operating characteristic (ROC) curve and accuracies were calculated.
RESULTS. The largest area under the ROC curve (AROC) for glaucoma detection was achieved with the support vector machine using eight parameters (0.981). The sensitivity at 80% and 95% specificity was 97.9% and 92.5%, respectively. This classifier also performed best when judged by cross-validated accuracy (0.966). The best classification between early glaucoma and advanced glaucoma was obtained with the generalized additive model using only three parameters (AROC = 0.854).
CONCLUSIONS. Automated machine classifiers of OCT data might be useful for enhancing the utility of this technology for detecting glaucomatous abnormality.
Optical coherence tomography (OCT) is a noncontact, noninvasive imaging technology that uses light to create high-resolution, cross-sectional tomographic images of the retina and the ONH.7 The device differentiates layers in the retina due to the differences in time delay of reflection from various components of the tissue. Previous studies have shown that OCT data are highly reproducible,8 9 10 11 and that the device has the capability to differentiate glaucomatous from nonglaucomatous eyes.12 13 14 15 16
Since OCT provides numerous stereometric measurements of the disc, macula and peripapillary RNFL, it is important to find the parameters that serve best in the detection of glaucoma. Previous experience with confocal scanning laser ophthalmoscopy17 18 and scanning laser polarimetry19 20 showed that the combination of multiple parameters and advanced data analysis methods can improve the sensitivity and specificity of glaucoma detection. Specifically, improved discrimination between glaucomatous and healthy eyes was obtained with machine-learning classifiers.21 22 23 24
Machine-learning classifiers are trained computerized systems with the ability to detect the relationship between multiple input parameters and a diagnosis. Trained classifiers can be used to predict the diagnosis of new cases. The purpose of this cross-sectional study was to test the performance of numerous machine-learning techniques using OCT data to discriminate between glaucomatous and healthy eyes.
| Methods |
|---|
All subjects had a comprehensive ophthalmic evaluation, and all tests were completed within six months. The evaluation included medical history, best-corrected visual acuity, manifest refraction, intraocular pressure (IOP) measurements by Goldmann applanation, gonioscopy, slit-lamp examination before and after pupil dilation, VF testing, and OCT scanning of the disc, macula, and peripapillary RNFL. All subjects underwent pupillary dilation with 1% tropicamide and 2.5% phenylephrine, both from Alcon Laboratories, Inc. (Fort Worth, TX).
All the participants had best-corrected visual acuity of 20/40 or better and refractive error between −6.00 and +6.00 diopters (D; spherical equivalent). Subjects were excluded if they exhibited signs of retinal or ONH pathologies other than glaucoma, if media opacity or a poorly dilating pupil interfered with clinical viewing or imaging of the fundus, or if they chronically used medications that are known to affect retinal thickness. Patients were also excluded if they had systemic diseases that might affect the retina or VF or if they had any previous operation in the study eye other than uneventful cataract extraction.
Glaucomatous Eyes
Eyes were defined as glaucomatous if there was both glaucomatous optic neuropathy (GON) and glaucomatous VF loss. GON was defined as either intereye cup-disc ratio asymmetry > 0.2, accounting for disc size; rim thinning or notching; peripapillary hemorrhages; or cup-disc ratio ≥ 0.6. Glaucomatous VF loss was diagnosed if any of the following findings were evident on two consecutive VF tests: a glaucoma hemifield test outside normal limits, pattern standard deviation (PSD) < 5%, or a cluster of three or more nonedge points in typical glaucomatous locations, all depressed on the pattern deviation plot at a level of P < 0.05, with one point in the cluster depressed at a level of P < 0.01.
Healthy Eyes
Eyes were defined as healthy if there was no history or evidence of glaucoma, IOP ≤ 21 mm Hg, ONH not meeting the criteria for GON as previously described, and a normal Humphrey 24-2 pattern VF not meeting the criteria for glaucomatous VF loss as previously described.
VF Testing
All subjects underwent Humphrey Swedish interactive thresholding algorithm standard or full-threshold 24-2 perimetry (Carl Zeiss Meditec, Dublin, CA). A reliable VF test was defined as one with fewer than 30% fixation losses, false-positive, or false-negative responses. The VF results were considered reproducible if the same type, location, and index of abnormality were evident in two consecutive VF tests.
OCT Scanning
All OCT scans were performed using commercially available equipment (Stratus OCT with software version 2.0; Carl Zeiss Meditec) with an in vivo tissue resolution of approximately 8 to 10 µm.
OCT measurements of the macula were generated with a fast protocol of six 6-mm linear scans in a spoke pattern configuration centered on the fovea, lines 30° apart. OCT measurements of the ONH were done with a fast protocol in a similar spoke pattern. Peripapillary RNFL scans were done with a fast protocol of three circumpapillary scans centered on the ONH with a diameter of 3.4 mm.
Scans were defined as poor quality if the signal-noise ratio was below 35 dB and/or there was overt misalignment of the surface detection algorithm of at least 15% consecutively or 20% cumulatively of the total sampling points. All OCT data were aligned according to the orientation of the right eye. In this way, clock hour 9 of the circumpapillary scan represented the temporal aspect of the ONH for both eyes.
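The scan-quality rule above can be sketched as a small check. This is only an illustration under stated assumptions: the function names and the per-point boolean input (`misaligned`) are hypothetical, and the source does not say how misalignment was judged at each sampling point.

```python
def longest_run(flags):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def is_poor_quality(snr_db, misaligned):
    """Apply the poor-quality rule to one scan.

    snr_db: signal-noise ratio of the scan in dB.
    misaligned: one boolean per sampling point, True where the surface
    detection was judged misaligned (hypothetical input format).
    """
    n = len(misaligned)
    consecutive_fraction = longest_run(misaligned) / n
    cumulative_fraction = sum(misaligned) / n
    return (
        snr_db < 35
        or consecutive_fraction >= 0.15
        or cumulative_fraction >= 0.20
    )
```

A scan with adequate signal can still be rejected on either misalignment criterion, which is why the two fractions are computed separately.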
Thirty-eight OCT parameters, which all appear in the conventional printouts, were used for the analysis. From the macular scan, we used retinal thickness in nine sectors as well as macular volume. We also used the global mean macular thickness, which was derived from a weighted mean of the regional measurements taking into account the relative area of each sector, as described previously.25 From the ONH scans, we used all 10 parameters: vertical integrated rim area, horizontal integrated rim width, disc area, cup area, rim area, cup-disc area ratio, horizontal cup-disc ratio, vertical cup-disc ratio, cup area (topographic), and cup volume (topographic). Circumpapillary analysis resulted in an additional 17 parameters: global mean RNFL thickness, 4 quadrant mean thicknesses, and 12 clock hour means.
Machine Classifiers
The following classifying methods were tested: linear discriminant analysis (LDA), generalized linear model (GLM), and generalized additive model (GAM). In addition, the following machine learning classifiers were tested: support vector machine (SVM) and recursive partitioning and regression tree (RPART). All classifiers were implemented in the statistical software (R version 1.9; R-Project, available at http://cran.r-project.org).
LDA assumes a Gaussian distribution of data and defines linear discrimination boundaries between the categories where it maximizes the variance between classes while minimizing the variance within classes. The classification of a new data point is determined by the likelihood that it is generated from each of the different categories.26 27
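To make the LDA description concrete, here is a minimal sketch restricted to a single parameter with a pooled variance. It is a simplification for illustration, not the study's implementation (which used R with many parameters); the function names and the example values are hypothetical.

```python
import math
from statistics import mean


def fit_lda_1d(healthy, glaucoma):
    """Fit a two-class LDA on one parameter, assuming a shared (pooled) variance."""
    n_h, n_g = len(healthy), len(glaucoma)
    mu_h, mu_g = mean(healthy), mean(glaucoma)
    within_ss = sum((v - mu_h) ** 2 for v in healthy) + sum(
        (v - mu_g) ** 2 for v in glaucoma
    )
    return {
        "means": {"healthy": mu_h, "glaucoma": mu_g},
        "var": within_ss / (n_h + n_g - 2),
        "priors": {"healthy": n_h / (n_h + n_g), "glaucoma": n_g / (n_h + n_g)},
    }


def discriminant(model, label, x):
    """Linear discriminant score: log-likelihood plus log-prior, up to a constant."""
    mu, var = model["means"][label], model["var"]
    return x * mu / var - mu ** 2 / (2 * var) + math.log(model["priors"][label])


def classify(model, x):
    """Assign x to the class with the larger discriminant score."""
    return max(model["means"], key=lambda label: discriminant(model, label, x))


# Hypothetical mean RNFL thickness values (micrometers), for illustration only.
model = fit_lda_1d([100, 105, 95, 110, 90], [70, 65, 75, 60, 80])
```

With equal group sizes, the decision boundary falls midway between the two class means, which is the one-dimensional analogue of the linear boundary described above.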
GLM assumes that the log of the odds ratio of a patient having glaucoma versus being healthy can be expressed as a linear function of the parameters.28 The decision boundary between glaucomatous and healthy eyes is the hyperplane where the predicted odds of a patient having glaucoma are equal to the predicted odds of the same patient being healthy.
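The decision rule described for the GLM can be sketched as follows. The intercept and coefficient in the usage lines are hypothetical values chosen for illustration; they are not fitted values from the study.

```python
import math


def log_odds(intercept, coefs, x):
    """Linear predictor: log of the odds of glaucoma versus healthy."""
    return intercept + sum(b * v for b, v in zip(coefs, x))


def glaucoma_probability(intercept, coefs, x):
    """Logistic transform of the log-odds into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds(intercept, coefs, x)))


def classify(intercept, coefs, x):
    """The boundary is the hyperplane where the log-odds equal zero (equal odds)."""
    return "glaucoma" if log_odds(intercept, coefs, x) > 0 else "healthy"


# Hypothetical one-parameter model: thinner RNFL raises the odds of glaucoma.
INTERCEPT, COEFS = 8.5, [-0.1]
label_thin = classify(INTERCEPT, COEFS, [70])
label_thick = classify(INTERCEPT, COEFS, [100])
```

Equal odds correspond to a predicted probability of 0.5, so thresholding the probability at 0.5 gives the same classification.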
GAM assumes that the conditional expectation of glaucoma severity given by OCT parameters (unchanged) can be expressed as a sum of univariate smooth functions of the OCT parameters.29 The fitted model minimizes mean squared prediction error subject to certain penalty of model complexity.
SVMs map multidimensional input space into a high-dimensional feature space.30 31 In this feature space, the classifier finds the hyperplane separating glaucomatous from healthy eyes that maximizes the distance of any case from the hyperplane. The transformation of input space to feature space is called a kernel; in this study a linear kernel was used. The SVM used in this study also allowed for imperfect classification of glaucomatous and healthy eyes by the algorithm in situations where perfect classification is not possible. Intuitively, this makes the classification correct for testing data that is near but not identical to the training data.
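One common way to formalize a linear SVM that tolerates imperfect separation is the soft-margin objective, sketched below. The penalty weight `c` and the coding of labels as +1 and −1 are assumptions for illustration; the source does not report the settings that were used.

```python
def decision_value(w, b, x):
    """Signed score of x relative to the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def soft_margin_objective(w, b, samples, labels, c=1.0):
    """Soft-margin objective: 0.5 * ||w||^2 + c * (sum of hinge losses).

    labels use +1 and -1 (which class is +1 is an arbitrary choice here).
    A case contributes a hinge loss only when it lies inside the margin
    or on the wrong side of the hyperplane.
    """
    margin_term = 0.5 * sum(wi * wi for wi in w)
    hinge_total = sum(
        max(0.0, 1.0 - y * decision_value(w, b, x))
        for x, y in zip(samples, labels)
    )
    return margin_term + c * hinge_total


# Two cases are outside the margin; the third lies inside it (hinge loss 0.5).
value = soft_margin_objective(
    [1.0, 0.0], 0.0, [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]], [1, -1, 1]
)
```

Minimizing the first term widens the margin, while the second term charges for cases that violate it; `c` sets the trade-off between the two.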
The RPART function is an implementation of the decision-tree algorithm. It recursively partitions the parameter space along some of the parameters.32 The partition process can be represented by a binary tree, and the partitioned regions of the parameter space are called leaves. The choice of the parameters to be split and the points at which the parameter space is split are chosen to maximize certain scores, such as information gain. A new case is classified using majority vote of cases in the training data belonging to the same leaf as the new case.
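A single partitioning step can be illustrated by searching one parameter for the threshold with the highest information gain. This sketch covers only one split on one parameter, uses hypothetical values, and takes information gain as the score because the text names it as an example; it is not a statement about the default criterion of the RPART function.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / n
        result -= p * math.log2(p)
    return result


def best_split(values, labels):
    """Return (threshold, information_gain) for the best binary split."""
    parent = entropy(labels)
    n = len(values)
    distinct = sorted(set(values))
    best = (None, 0.0)
    # Candidate thresholds are midpoints between adjacent distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        gain = (
            parent
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right)
        )
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Hypothetical thickness values with labels g (glaucoma) and h (healthy).
threshold, gain = best_split(
    [60, 70, 80, 90, 100, 110], ["g", "g", "g", "h", "h", "h"]
)
```

A full tree repeats this search within each resulting region, and a new case receives the majority label of the training cases in its leaf.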
Feature Selection
The classification was performed using all 38 available parameters and with the 8 parameters that had the highest correlation with VF mean deviation (MD). A limited number of parameters was used to ensure the reliability of the machine classifiers given the limited study sample size. A backward selection using the Akaike information criterion (AIC) from these eight parameters was used to further simplify the classifier formula while preserving the discrimination capabilities. Backward feature selection could not be used for SVM and RPART, for which AIC is not defined, or for LDA, in which the AIC is not reliable.
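The first step, ranking parameters by their correlation with MD, might look like the sketch below. Ranking by the absolute value of the Pearson coefficient is an assumption (the text does not specify the coefficient or how negative correlations were treated), the parameter names and data are hypothetical, and the backward AIC step is not shown.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; assumes neither input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def top_k_by_correlation(parameters, md, k=8):
    """Return the k parameter names most strongly correlated with MD.

    parameters: dict mapping a parameter name to its values, one per eye.
    md: visual field mean deviation for the same eyes, in the same order.
    """
    ranked = sorted(
        parameters,
        key=lambda name: abs(pearson(parameters[name], md)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical data for five eyes, for illustration only.
md = [-1, -3, -6, -10, -15]
parameters = {
    "rnfl_mean": [100, 90, 80, 65, 50],
    "disc_area": [2.1, 2.5, 1.9, 2.4, 2.0],
    "cup_disc_ratio": [0.3, 0.4, 0.6, 0.7, 0.9],
}
selected = top_k_by_correlation(parameters, md, k=2)
```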
Data Analysis and Statistics
The study population characteristics were compared using Student's t-test for continuous parameters and a χ² test for categorical parameters (JMP software; SAS Institute, Cary, NC).
Receiver operating characteristic (ROC) curves were used to describe the ability of the classifier to differentiate between glaucomatous and healthy eyes. ROC was calculated for each individual parameter. To get an unbiased estimate of the ROC curves, all 89 patients were divided into six equal groups (one patient appears in two groups). Six tests were conducted for each classifier; in each test, a different group of patients was chosen to be the testing set. The other five groups were used as training sets. ROC was calculated for each of the tests, and the final cross-validation ROC curve was computed as the pointwise average of the six ROC curves. The area under the ROCs (AROCs) for the six folds across algorithms were compared using the DeLong method.33 The sensitivity was calculated at the arbitrary specificities of 80% and 95%.
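The fold construction and the per-fold area under the ROC curve can be sketched as follows. Several details are assumptions: the source does not say how patients were assigned to groups (a deterministic assignment is used here), nor how the repeated patient was handled during training (this sketch removes held-out patients from the training set). The area is computed with the rank formulation, and the pointwise averaging of the six curves is not shown.

```python
def make_folds(n_items, n_folds=6):
    """Split indices into equal-sized folds, repeating the first indices
    to fill the last fold when n_items is not a multiple of n_folds."""
    size = -(-n_items // n_folds)  # ceiling division
    padded = list(range(n_items)) + list(range(size * n_folds - n_items))
    return [padded[i * size:(i + 1) * size] for i in range(n_folds)]


def train_indices(folds, test_fold):
    """Distinct indices from the other folds, excluding any held-out index."""
    held_out = set(folds[test_fold])
    seen, train = set(), []
    for k, fold in enumerate(folds):
        if k == test_fold:
            continue
        for i in fold:
            if i not in held_out and i not in seen:
                seen.add(i)
                train.append(i)
    return train


def auc(scores, labels):
    """Area under the ROC curve: the fraction of (glaucoma, healthy) pairs
    in which the glaucoma case scores higher, counting ties as one half.
    labels: 1 = glaucoma, 0 = healthy."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


folds = make_folds(89)  # six folds of 15; index 0 appears in two folds
```

With 89 patients, six groups of 15 require 90 slots, which is why exactly one patient appears twice.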
Cross-validation accuracy was used to estimate the ability of the different classifiers to discriminate between glaucomatous and healthy eyes.34 35 The accuracy was the number of true predictions out of the total number of observations. In this method, the model is created on all the data except one eye as a training set, then testing is performed on the remaining eye and reported with accuracy. This is repeated a number of times equal to the number of eyes tested, and the accuracies are averaged. This way of cross-validation maximizes utilization of the data set for creating the model.
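The leave-one-out procedure can be written generically, independent of the classifier. In the sketch below, the nearest-class-mean classifier is a hypothetical stand-in used only to make the example runnable; it is not one of the classifiers tested in the study, and the data are invented.

```python
def leave_one_out_accuracy(samples, labels, fit, predict):
    """Train on all cases but one, test on the held-out case, and repeat
    once per case. Returns the fraction of held-out cases predicted correctly."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def fit_nearest_mean(xs, ys):
    """Stand-in classifier: store the mean of one parameter per class."""
    means = {}
    for cls in set(ys):
        values = [x for x, y in zip(xs, ys) if y == cls]
        means[cls] = sum(values) / len(values)
    return means


def predict_nearest_mean(means, x):
    """Predict the class whose stored mean is closest to x."""
    return min(means, key=lambda cls: abs(x - means[cls]))


# Hypothetical single-parameter data for six eyes.
accuracy = leave_one_out_accuracy(
    [100, 105, 95, 70, 65, 75],
    ["healthy", "healthy", "healthy", "glaucoma", "glaucoma", "glaucoma"],
    fit_nearest_mean,
    predict_nearest_mean,
)
```

Because each model is trained on all but one eye, the training set is as large as possible, at the cost of fitting as many models as there are eyes.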
| Results |
|---|
Seventy-five healthy volunteers were recruited. Among them, three had visual acuity below 20/40, two had diabetes, three had myopia exceeding 6 D, and five had a nonreliable VF test. Among the qualified healthy volunteers, 52 had both normal VF and normal ONH appearance. Ten more subjects were excluded due to poor scans (5 ONH, 4 macula and 1 NFL).
Eyes of 42 healthy subjects (42 healthy eyes) and 47 glaucoma patients meeting eligibility criteria (47 glaucomatous eyes) were analyzed in this study. The study population characteristics are summarized in Table 1. The healthy subjects were significantly younger than the glaucoma patients (P = 0.001), and the mean VF MD of the glaucomatous eyes was −6.4 ± 5.0 dB.
Adding age as one of the attributes of the machine classifiers did not improve the prediction of the analysis. The sample size prevents drawing any conclusion about the importance of gender as a predictor.
Grading the Severity of Glaucoma
The glaucoma patients participating in this study were divided into those with early and late stages of disease, defining MD ≥ −6.0 dB as an indication of early glaucoma and MD < −6.0 dB as indicative of advanced glaucoma. Twenty-seven of the 47 patients were classified as having early glaucoma, whereas 20 had advanced glaucoma. Using machine classifiers, we noted a substantial reduction in the AROC for differentiating between early and advanced glaucoma compared to those found for differentiating between healthy and glaucomatous eyes. The best model to distinguish between these groups was GAM, using 3 of the 38 parameters that were tested (Table 3). The AROC was 0.854 and the accuracy of GAM(3) was 0.745. The model correlation coefficient with MD was 0.811 (P < 0.001, confidence interval 0.891–0.683; Fig. 2).
| Discussion |
|---|
Our study population included a spectrum of glaucomatous damage that approximated the average glaucoma practice. In this population we found that even the use of a single parameter allowed for better differentiation between healthy and glaucomatous eyes than the majority of previously published studies.13 15 16 36 37 This might be due to the inclusion criteria used in our study. However, Budenz et al.38 have recently reported an AROC of 0.971 for an RNFL parameter, similar to the findings in our study. Nouri-Mahdavi et al.39 used logistic regression of numerous parameters, which did not significantly improve the discrimination between glaucoma and healthy subjects. Applying Fourier analysis to OCT circumpapillary data resulted in an AROC of 0.925.16 Hougaard et al.40 used an NFL symmetry test on the RNFL scan data, which improved the sensitivity in detecting glaucoma compared to the best single parameter, but the difference was not significant.
We found that a limited number of parameters provided an improved differentiation between eyes. This in turn may allow implementation of these algorithms into OCT software for clinical use. Interestingly, all eight parameters used were acquired either in the ONH or the peripapillary regions, with no contribution from macular data.
We used two methods of cross-validation, sixfold and leave-one-out, to avoid training the classifiers and testing their performance on the same group. In sixfold cross-validation, the training is performed on five sixths and tested on the remaining sixth of the entire population. The procedure is repeated six times; thus, each group serves as a testing group one time. In the leave-one-out cross-validation, the training is done on the entire population, except one subject that is tested. This procedure is repeated multiple times, equal to the number of the participating subjects, and each time a single subject is tested. The cross-validated AROC and accuracy results thus provided unbiased estimates of the performance of the machine-learning classifiers trained with relatively small samples. It should be noted that with limited sample size, complex machine-learning classifiers such as SVM, LDA, or GLM tend to perform worse than simpler classifiers such as single OCT parameters. However, as sample size increases, complex classifiers tend to perform better. Our finding of improved cross-validated AROC and accuracy of the machine-learning classifiers compared with the single OCT parameters with only 89 participants is encouraging, although the one-sided P-value for comparison with a single parameter only approached the significance level (P = 0.07). Nevertheless, these methods do not eliminate the possible confounder due to undetermined findings that might be exclusive to our study group. Therefore, it would be beneficial to test our models on a separate independent group of subjects.
Since glaucomatous damage can cause either local or generalized abnormalities, we used both segmental and global measurements with the machine classifiers (e.g., overall mean macula and segmental macular measurement). The eight selected parameters (Table 3) included overall NFL thickness, NFL thickness in the inferior quadrant, and NFL thickness at clock hours 6 and 7, which can be perceived as giving additional weight for the inferior sector as a typical location of glaucomatous damage.15 16 36 37 38
Linear discriminating methods (LDA and GLM) could differentiate between groups in a capacity similar to the multidimensional discrimination method of SVM. This can be appreciated in Figure 3, where the multidimensional OCT data are projected onto a two-dimensional plane. It is easily observed that a linear plane can be placed between healthy and glaucomatous eyes.
We found a significant difference in age between the healthy subjects and the patients with glaucoma. There was concern that if age was included as a parameter in the machine classifier model, it might unduly influence the outcome, overriding the parameters produced by the OCT. To test this, we added age as an input parameter to the machine classifiers. The accuracy of the classifiers was not improved. Nevertheless, we did not use age as one of the attributes of the machine classifiers to avoid the possibility of biasing the results.
Another limitation of our study was the small sample size. This might affect the findings when using all 38 OCT printout data parameters. As was mentioned earlier, complex machine classifiers that use numerous input parameters tend to perform better in larger datasets. Further investigation with a larger number of participants is currently underway.
In summary, machine classifiers of OCT measurements can provide a simple and accurate index for diagnosing the presence or absence of glaucoma as well as its severity. The classifiers that used a limited number of parameters (8) yielded the best discriminating capacity. A grading system for the severity of glaucoma was developed. A long-term prospective study is needed to determine the utility of this grading index in assessing glaucoma progression, compared to existing parameters.
| Footnotes |
|---|
Supported by the National Eye Institute, Bethesda, MD (R01-EY13178, P30-EY13078); the Pennsylvania Lions Eye Research Fund, Pittsburgh, PA; Research to Prevent Blindness, New York, NY; and the Eye and Ear Foundation, Pittsburgh, PA.
Submitted for publication March 22, 2005; revised May 22, June 23, and July 21, 2005; accepted September 15, 2005.
Disclosure: Z. Burgansky-Eliash, None; G. Wollstein, None; T. Chu, None; J.D. Ramsey, None; C. Glymour, None; R.J. Noecker, None; H. Ishikawa, None; J.S. Schuman, Carl Zeiss Meditec (F, P)
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Corresponding author: Gadi Wollstein, UPMC Eye Center, Department of Ophthalmology, University of Pittsburgh School of Medicine, 203 Lothrop Street, Eye and Ear Institute Suite 827, Pittsburgh, PA 15213; wollsteing{at}upmc.edu.