1 From the Glaucoma Center, Department of Ophthalmology, and the 2 Institute for Neural Computation, University of California at San Diego, La Jolla, California; the 4 Computational Neurobiology Laboratory, Salk Institute, La Jolla, California; the 3 Department of Ophthalmology, University of Dresden, Germany; and 5 Discoveries in Sight, Devers Eye Institute, Portland, Oregon.
| Abstract |
|---|
METHODS. The visual fields of 114 eyes of 114 patients with OHT with four or more visual field tests with standard automated perimetry over three or more years and for whom stereophotographs were available were assessed. The mean (±SD) number of visual field tests was 7.89 ± 3.04. The mean number of years covered (±SD) was 5.92 ± 2.34 (range, 2.81-11.77). Fields were classified as normal or abnormal based on Statpac-like methods (Humphrey Instruments, Dublin, CA) and by several machine learning classifiers. The machine learning classifiers were two types of support vector machine (SVM), a mixture of Gaussian (MoG) classifier, a constrained MoG, and a mixture of generalized Gaussian (MGG). Specificity was set to 96% for all classifiers, using data from 94 normal eyes evaluated longitudinally. Specificity cutoffs required confirmation of abnormality.
RESULTS. Thirty-two percent (36/114) of the eyes converted to abnormal fields during follow-up based on the Statpac-like methods. All 36 were identified by at least one machine classifier. In nearly all cases, the machine learning classifiers predicted the confirmed abnormality, on average, 3.92 ± 0.55 years earlier than traditional Statpac-like methods.
CONCLUSIONS. Machine learning classifiers can learn complex patterns and trends in data and adapt to create a decision surface without the constraints imposed by statistical classifiers. This adaptation allowed the machine learning classifiers to identify abnormality in visual field converts much earlier than the traditional methods.
| Introduction |
|---|
In a previous study, we compared the ability of several classifiers to detect early field loss.4 The inputs to the classifiers in that study were threshold values from standard visual fields plus the age from either healthy eyes or from eyes with glaucomatous optic neuropathy (GON). Because there is no absolute agreed-on gold standard for the presence of early glaucoma, the surrogate gold standard in this previous study, used to train the classifiers, was the absence or presence of GON. Visual field results were not used to select subjects or as a gold standard to train the output. The output from each classifier was a designation of either "normal field" or "glaucomatous field". The classifiers' results were also compared with those of two glaucoma experts and the Statpac 2 indices5 6 (Humphrey Instruments, Dublin, CA) that are typically used to identify field abnormalities. We found that several machine learning classifiers representing different methods of learning and reasoning performed well in comparison with both Statpac 2 and the glaucoma experts when classifying the visual fields.
The purpose of the present study was to apply the best candidate machine learning classifiers from our previous study, along with more Statpac-like traditional classifiers, to a new set of longitudinal standard automated perimetry (SAP) data from 114 ocular hypertensive eyes. If the classifiers could identify visual field converts from this group, they might have great utility in situations in which experts in glaucoma are not available and for standardization of methods in clinical trials.
| Methods |
|---|
Each subject underwent a complete ophthalmic examination, which included review of relevant medical history, best corrected visual acuity, slit lamp biomicroscopy (including gonioscopy), applanation tonometry, dilated funduscopy, and fundus photography.
All patients had OHT with intraocular pressures more than 23 mm Hg when measured on at least two separate occasions and normal SAP visual field test results (defined later) on baseline examination. All had best corrected acuity of 20/40 or better, spherical refraction within ±5.0 D, and cylinder correction within ±3.0 D. Patients with significant lens opacity at the baseline clinical examination or on subsequent ophthalmic examinations were excluded. Patients with other disorders known to affect visual fields were excluded. Neither visual field nor optic nerve status was used to select these subjects.
Visual Fields
Visual field testing consisted of SAP, with the Full-Threshold test strategy and the 24-2 stimulus presentation pattern of the Humphrey Visual Field Analyzer (Humphrey Instruments) with 31.5 apostilbs (10 candelas/m²) white background and a Goldmann size III stimulus. Patients had to have normal visual fields (see definition of abnormality below) at baseline with at least three additional follow-up fields over a 3-year period. The mean (±SD) number of fields was 7.89 ± 3.04. The mean (±SD) number of years covered was 5.92 ± 2.34 (range, 2.81-11.77 years). All visual fields from all eyes were evaluated by all machine learning and statistical classification methods.
Optic Disc
Included eyes also had serial simultaneous stereophotographs evaluated for evidence of glaucomatous optic neuropathy determined independently in a masked review by two glaucoma specialists at the Optic Disc Reading Center at UCSD. Photographs were masked for temporal order. Disagreements were resolved by consensus or adjudication. Optic discs were considered abnormal when one or more of the following was present: excavation or undermining of the cup, nerve fiber layer defects, notching or rim thinning, or cup-to-disc asymmetry between eyes of more than 0.2. Normal optic discs showed no evidence of these abnormal findings. The findings were not part of the inclusion/exclusion criteria for the study, but the presence and timing of detectable GON is reported in those eyes showing conversion from normal to abnormal visual fields by the various classifiers.
Normative Data
There were two sets of normative data used in this study. The first was used to develop a Statpac-like analysis package for determining single field abnormality. The second was a longitudinal data set used to set specificity cutoffs for conversion from normal to abnormal fields. Each is described in the following sections.
Statpac-like Visual Field Analysis.
To compare the machine learning classifiers to more traditional statistical approaches for analyzing visual fields, we used a Statpac-like analysis developed for the short wavelength automated perimetry (SWAP) ancillary arm of the Ocular Hypertension Treatment Study (OHTS).7
This analysis with its own normative database was developed to allow extraction of data that are not on the field analyzer printout (e.g., numerical glaucoma hemifield sector values) and to provide export of all data to spreadsheets. Although some of the information could be extracted from the field analyzer printout, we thought it important that all analyses be based on the same normative data set, to allow us to make comparisons between SAP and SWAP in future studies on machine learning classification of SWAP, using the same normative data for both tests.
The normative database consisted of one eye from each of the same 348 normal subjects, between the ages of 20 and 85, tested on both SAP and SWAP. The data were collected at five different centers by a standardized test protocol identical with that used to establish the field analyzer's internal normative database. To be included in the normative database, all subjects had to have normal findings in an eye examination, 20/30 or better visual acuity, normal color vision, no history of ocular or neurologic disease or surgery, refractive error of less than 5 D spherical equivalent and 3 D cylinder, no diabetes, and normal optic nerve appearance. They could not be taking any medications known to affect visual fields or color vision.
After age correction for each of the 52 test locations of program 24-2 (the two blind spot locations are not included), the total deviation plot, pattern deviation plot, their associated probability cutoffs, and probability plots were computed. The package then computed the global indices and the cutoff values at specific probabilities for mean deviation (MD) and pattern standard deviation (PSD) along with an asymmetry analysis patterned after the glaucoma hemifield test (GHT) analysis.6
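The quantities named above can be illustrated with a minimal sketch. It assumes an unweighted simplification: total deviation is the pointwise difference between the measured and age-expected thresholds, and the global indices are plain (not variance-weighted) summaries, whereas Statpac-style indices weight each location by its normal variability. The function names and the four-location toy field are hypothetical and are not taken from the analysis package described here.

```python
from math import sqrt

def total_deviation(measured_db, age_normal_db):
    """Pointwise difference (dB) between measured and age-expected thresholds."""
    if len(measured_db) != len(age_normal_db):
        raise ValueError("both fields must have the same number of locations")
    return [m - n for m, n in zip(measured_db, age_normal_db)]

def mean_deviation(td):
    """Unweighted mean of the total deviation values (simplified MD)."""
    return sum(td) / len(td)

def pattern_sd(td):
    """Unweighted sample SD of total deviation about its mean (simplified PSD)."""
    md = mean_deviation(td)
    return sqrt(sum((v - md) ** 2 for v in td) / (len(td) - 1))

# Hypothetical toy field with 4 locations; a 24-2 field would supply 52 values.
measured = [28.0, 27.0, 20.0, 29.0]
expected = [30.0, 29.0, 30.0, 30.0]

td = total_deviation(measured, expected)
md = mean_deviation(td)
psd = pattern_sd(td)
print(td, md, psd)
```

A single depressed location lowers the simplified MD slightly but raises the simplified PSD much more, which is why a PSD-type index is sensitive to localized loss.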
Setting Specificities.
When developing an algorithm for conversion from normal to abnormal fields, it is important to identify a meaningful specificity for field conversion in longitudinal data sets. To determine the parameters from the analysis that would provide a high specificity for visual field conversion, 94 normal eyes from a longitudinal study that had been performed at the University of California, Davis, were used. Each had been followed up annually and had had four visual field tests. They ranged in age from 21 to 85. The inclusion/exclusion criteria for these normal subjects were identical with those for the 114 eyes with OHT used in the present study, with the exception that they had intraocular pressure less than 20 mm Hg and a family history of glaucoma. None was part of the normative database sample of 348 eyes used to develop the Statpac-like analysis package described in the previous section. Fifty candidate criteria for change were evaluated, and the specificity for each in the 94 longitudinally followed-up normal eyes was determined. This analysis resulted in five best criteria for change from a normal to an abnormal field.7
The specificities and confidence limits for the five are summarized in Table 1. The resultant algorithm for abnormality based on any one of the five criteria with confirmation is called Statpac-like Analysis for Glaucoma Evaluation (SAGE).
Machine Learning Classifiers
Several machine learning classifiers were compared with the results obtained with SAGE to assess their ability to classify the fields of the 114 ocular hypertensive eyes to determine which fields would be classified as abnormal. The machine learning classifiers were chosen based on the results of our previous study, in which we trained a set of classifiers to categorize SAP visual fields as normal or abnormal.4
In that study, the surrogate gold standard for glaucoma was the presence of glaucomatous optic neuropathy. Visual fields were not used to classify the subjects. The study indicated several classifiers that could separate fields from normal eyes and eyes with GON with a high specificity and sensitivity. These already trained classifiers were used in the present study. The sensitivity (the proportion of fields from eyes with GON classified as abnormal) and the specificity (the proportion of fields from normal subjects classified as normal) depended on the selection of a threshold cutoff value along the range of outputs for each classifier. We set the cutoff values for the present study to obtain a specificity of 96% for each of the classifiers, using the same 94 normal eyes from Dr. Johnson's longitudinal study that determined the 96% overall specificity for SAGE.7
As with SAGE, the cutoff needed for that specificity was based on two confirmed abnormal visual fields.
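One way to picture how a cutoff can be chosen for a target specificity with confirmation is the sketch below. It rests on assumptions that are not stated in the text: each classifier emits a continuous score per field in which higher means more abnormal, and "confirmed" means the score exceeds the cutoff on two consecutive fields. The function names and toy scores are hypothetical.

```python
def confirmed_score(visit_scores):
    """Highest level reached on two consecutive visits of one eye."""
    pairs = zip(visit_scores, visit_scores[1:])
    return max(min(a, b) for a, b in pairs)

def cutoff_for_specificity(normal_eyes, target=0.96):
    """Smallest observed cutoff that flags at most (1 - target) of normal eyes."""
    confirmed = sorted((confirmed_score(v) for v in normal_eyes), reverse=True)
    allowed_false_positives = int(len(confirmed) * (1.0 - target) + 1e-9)
    return confirmed[allowed_false_positives]

def specificity(normal_eyes, cutoff):
    """Share of normal eyes with no confirmed score above the cutoff."""
    flagged = sum(confirmed_score(v) > cutoff for v in normal_eyes)
    return 1.0 - flagged / len(normal_eyes)

# Hypothetical scores: 25 normal eyes with four visits each.
normal_eyes = [[0.01 * i, 0.02 * i, 0.01 * i, 0.0] for i in range(25)]

cutoff = cutoff_for_specificity(normal_eyes, target=0.96)
print(cutoff, specificity(normal_eyes, cutoff))
```

Because a single high score never counts without a second one beside it, the cutoff that yields a given specificity is lower than it would be for unconfirmed single fields.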
Consistent with the previous study, the input to each of the classifiers listed in the following sections was the absolute threshold at each of the 52 locations of the visual field and age. Training and testing were performed in our previous study, using cross-validation to classify eyes with known GON versus healthy eyes. In the present study, we asked these same already trained classifiers to classify visual fields from an independent group of 114 eyes selected based on IOP and normal baseline visual fields and not on optic nerve status.
SAGE is a type of classifier that uses statistically determined cutoffs to distinguish between classes. The attractive aspect of machine learning classifiers is their ability to learn complex patterns and adapt to the data. They are not constrained to linear analysis, which would, for example, result in a decision surface that is a line in two dimensions between the data of patients with abnormal fields and those with normal fields, or a plane in three dimensions. Instead, the decision surface can be any shape in the dimension that provides the best separation between the groups. Abnormal fields fall on one side of the surface and normal on the other. The decision surface itself is a boundary between the two clusters of data. A brief description of the classifiers used follows. More detail on each can be found in the detailed Appendix to our previous study.4
Support Vector Machines.
These are a new class of supervised machine learning algorithms or neural networks that are able to solve a variety of classification and regression (model-fitting) problems.8 9 A support vector machine (SVM) can separate data that are not easily separable in the original data space, by mapping the data of interest into a much higher dimensional space until a decision surface is identified that allows the separation of the input data, in our case, into two groups of visual fields: normal and abnormal. The name of this classifier refers to support vectors, which are those data points that lie closest to the decision surface and therefore are the most difficult to classify. As such, they have a direct bearing on the optimum location of the decision surface.2 Training maximizes the margin of separation between the normal and abnormal vectors while minimizing the estimated generalization errors in classification.10 11 12 The architecture (structure of the network) is similar to that of a multilayer perceptron (MLP), a basic form of neural network. It is a feed-forward network with an input layer, a hidden layer, and an output layer.
We used two types of SVM. For linearly separable data, the parameters used in the SVM-linear (SVMl) analysis are chosen so that the margin between the decision plane and the training examples is at maximum. To avoid the assumption of linear separability, we also used a multivariate Gaussian distribution (SVMg) analysis.2 Both SVMs significantly outperformed the MLP in our previous study. They have also shown good generalization of performance in face recognition,13 text categorization,14 recognition of handwritten digits,15 and breast cancer diagnosis and prognosis.16
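The difference between the two SVM variants can be sketched through the standard kernel decision function, f(x) = Σ αᵢ yᵢ K(svᵢ, x) + b. This is an illustration only: it assumes that SVMg corresponds to a Gaussian (radial basis function) kernel, and the support vectors, weights, and `gamma` value are hand-picked for a toy problem rather than learned, so nothing here reproduces the trained classifiers of the study.

```python
from math import exp

def linear_kernel(u, v):
    """Inner product; yields a flat (hyperplane) decision surface."""
    return sum(a * b for a, b in zip(u, v))

def gaussian_kernel(u, v, gamma=0.5):
    """Gaussian (RBF) similarity; allows a curved decision surface."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return exp(-gamma * sq_dist)

def svm_decision(x, support_vectors, labels, alphas, bias, kernel):
    """f(x) = sum_i alpha_i * y_i * K(sv_i, x) + bias; the sign is the class."""
    return sum(a * y * kernel(sv, x)
               for sv, y, a in zip(support_vectors, labels, alphas)) + bias

# XOR-like toy data: the two classes cannot be split by a straight line.
svs = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
ys = [1, 1, -1, -1]
alphas = [1.0, 1.0, 1.0, 1.0]  # hand-picked for illustration, not trained

for x, y in zip(svs, ys):
    f_lin = svm_decision(x, svs, ys, alphas, 0.0, linear_kernel)
    f_rbf = svm_decision(x, svs, ys, alphas, 0.0, gaussian_kernel)
    print(x, y, f_lin, f_rbf)
```

With these weights the linear form returns zero at every point, so it cannot separate the classes, while the Gaussian form gives each point the sign of its own label.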
Mixture of Gaussian.
The MoG is a special case of a committee machine.2 Committee machines use a set of hidden analyses to divide a computationally complex task into a number of computationally simple tasks, performed by "committee members". Each member does well at modeling its own simplified data set. In the associative MoG model, the members perform self-organized learning (unsupervised learning) on the input data to achieve good partitioning. The fusion of all the members' outputs is combined with supervised learning to model the desired response. In our case, the desired response is "normal visual field" or "abnormal visual field." We made two adjustments to facilitate the computation of MoG. To help the MoG manage the high dimensionality of 53 inputs, we constrained the MoG analyses to one Gaussian cluster for each class. This constraint results in a quadratic discrimination function (QDF). In our previous work we found that this improved performance relative to Statpac indices. Also, these classifiers sometimes have difficulty with high-dimension input, and therefore we also projected the data by using principal component analysis (PCA) from the original 53 dimensions to a space of eight dimensions.17 PCA is a way of reducing the dimensionality of the data space while retaining most of the information in terms of its variance. Our previous work showed that for visual field data, QDF on the full-dimension data worked comparably to the MoG on principal component analyzed data. We used both the QDF and MoG with PCA for the present study.
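The one-Gaussian-per-class constraint can be illustrated with a small sketch of a quadratic discriminant. It assumes equal class priors and uses two input dimensions so that the covariance inverse can be written out by hand; the study used 53 inputs. The function names and toy data are hypothetical.

```python
from math import log

def fit_gaussian_2d(points):
    """Sample mean and covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def log_density_score(x, mean, cov):
    """Gaussian log-density, dropping the constant shared by both classes."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    mahalanobis = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return -0.5 * (mahalanobis + log(det))

def qdf_classify(x, normal_model, abnormal_model):
    """Pick the class whose single Gaussian scores higher (equal priors assumed)."""
    s_normal = log_density_score(x, *normal_model)
    s_abnormal = log_density_score(x, *abnormal_model)
    return "abnormal" if s_abnormal > s_normal else "normal"

# Hypothetical thresholds (dB) at two locations for each class.
normal_pts = [(30, 30), (31, 29), (29, 31), (30, 31), (31, 30), (29, 29)]
abnormal_pts = [(22, 25), (18, 20), (26, 28), (20, 27), (24, 21), (22, 29)]

normal_model = fit_gaussian_2d(normal_pts)
abnormal_model = fit_gaussian_2d(abnormal_pts)
print(qdf_classify((30, 30), normal_model, abnormal_model))
print(qdf_classify((21, 24), normal_model, abnormal_model))
```

Because each class keeps its own covariance, the boundary where the two scores are equal is a quadratic curve rather than a straight line, which is what the name QDF refers to.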
Mixture of Generalized Gaussians.
The mixture of generalized Gaussians (MGG) uses the same architecture as MoG, except it is designed for situations in which the underlying distributions of the data for the classification problem are not necessarily Gaussian. For instance, the data may distribute with heavier tails or may even be bimodal. It would degrade performance of the classifier to model these problems with Gaussian distributions. With the development of a generalized Gaussian mixture model,17 18 we are able to model the class conditional density distributions with higher flexibility, while preserving a comprehension of the statistical properties of the data in terms of, for example, means, variances, and kurtosis. It has been demonstrated in real-data experiments that this model generally improves classification performance over the standard MoG in those cases in which the assumption of a Gaussian distribution of data is incorrect.17 18
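To make "heavier tails" concrete, the sketch below evaluates one common univariate parameterization of a generalized Gaussian density, p(x) = β / (2αΓ(1/β)) · exp(−(|x − μ|/α)^β). This parameterization is an assumption chosen for illustration and may differ from the one used in the cited model; the parameter names `mu`, `alpha`, and `beta` are likewise illustrative.

```python
from math import exp, gamma, pi, sqrt

def generalized_gaussian_pdf(x, mu=0.0, alpha=1.0, beta=2.0):
    """Density beta / (2*alpha*Gamma(1/beta)) * exp(-(|x - mu| / alpha) ** beta).

    beta = 2 recovers a Gaussian (with alpha = sqrt(2) * sigma);
    beta = 1 gives a Laplace density, whose tails are heavier.
    """
    norm = beta / (2.0 * alpha * gamma(1.0 / beta))
    return norm * exp(-(abs(x - mu) / alpha) ** beta)

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Standard Gaussian density, used only as a reference."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))

# Two unit-variance shapes compared far from the mean (x = 4).
gaussian_tail = generalized_gaussian_pdf(4.0, alpha=sqrt(2.0), beta=2.0)
laplace_tail = generalized_gaussian_pdf(4.0, alpha=1.0 / sqrt(2.0), beta=1.0)
print(gaussian_tail, laplace_tail)
```

Lowering the shape parameter below 2 places more probability far from the mean at the same variance, which is the kind of flexibility the paragraph above attributes to the MGG.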
In summary, for our initial study, we chose the classifiers that have recently become popular due to their excellent classification performance and robustness in analysis of many data sets in different applications.9 19 20 21 The best among these for separating fields from eyes with GON and fields from normal eyes were used in the present study.4 The best were two types of discriminative classifiers, SVMl and SVMg, as well as three generative classifiers (QDF, MoG, and MGG). Discriminative classifiers, such as SVM, minimize error by finding optimal boundaries between classes, whereas generative classifiers try to estimate the probability density of each class. These two principles are currently the state of the art in classifiers applied to pattern-recognition tasks.9 19 20 21
| Results |
|---|
Table 2 shows the results for the various classifiers. Thirty-eight percent (43/114) of the eyes converted by one or more methods. One or more of the classifiers identified all 36 eyes identified by SAGE. QDF identified 31 plus 1 additional eye; SVMl, 28 plus 6 additional eyes; SVMg, 29 plus 7 additional eyes; MoG, 26 plus 1 additional eye; and MGG, 25 plus 1 additional eye.
= 0.63-0.91; Table 3).22
For the most part, the same eyes were identified as converting to abnormal fields by SAGE and by the classifiers under test. The two versions of SVM showed agreement of 96%, as did the MoG and MGG. This can be attributed to the similarities in their architecture. The QDF agreed most closely with the SAGE method (95% agreement).
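Percent agreement between two methods is simply the share of eyes on which they give the same label. The sketch below computes it and, as an assumption rather than a description of the analysis reported here, also computes Cohen's kappa, a commonly used chance-corrected agreement statistic. The label vectors are hypothetical (1 = converted, 0 = not converted).

```python
def percent_agreement(labels_a, labels_b):
    """Fraction of eyes on which two methods give the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement for two binary (0/1) label lists."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    rate_a = sum(labels_a) / n
    rate_b = sum(labels_b) / n
    expected = rate_a * rate_b + (1.0 - rate_a) * (1.0 - rate_b)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)

# Hypothetical conversion labels for 10 eyes from two methods.
method_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
method_b = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

agreement = percent_agreement(method_a, method_b)
kappa = cohens_kappa(method_a, method_b)
print(agreement, kappa)
```

When most eyes do not convert, raw agreement is high even for unrelated methods, so a chance-corrected value is usually the more informative of the two.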
The classifiers used in this study were chosen because we had previously shown they could successfully separate the visual fields of normal eyes from those of eyes with GON. To determine whether use of these classifiers in a new data set would be consistent with our previous findings, we assessed the presence of GON in the 43 (of 114) eyes that showed confirmed abnormal fields based on one or more classifiers. Two of these eyes had photographs that could not be assessed because of poor quality or missing information, leaving 41 eyes for the analysis. Based on the SVMg results, 66% (27/41) of these eyes had a glaucomatous optic disc at baseline or within 1 year thereafter. An additional 12% (5/41) showed development of GON sometime during follow-up, although 22% (9/41) showed no discernible evidence of GON during the course of the study. Table 4 shows the GON results, by classifier, in eyes identified as having converted visual fields. The table shows that SVMg identified as many eyes as SAGE and that it identified mostly, although not always, the same eyes. However, SVMg showed the best agreement with the presence of GON at 94% (32/34). The percentage of eyes with confirmed abnormal fields identified by the other classifiers as having GON was 81% (26/32) with SVMl, 81% (21/26) with MoG, 75% (24/32) with QDF, 75% (18/24) with MGG, and 74% (25/34) with SAGE.
| Discussion |
|---|
To differentiate true change from random fluctuation, evidence for change in visual fields should be confirmed on subsequent visual fields. For example, the Ocular Hypertension Treatment Study (OHTS),27 the Advanced Glaucoma Intervention Study (AGIS),28 the Collaborative Initial Glaucoma Treatment Study (CIGTS),29 and the Early Manifest Glaucoma Trial (EMGT)30 require that change from baseline be observed in three consecutive follow-up visual field analyses before change is verified. In each of these studies, a different algorithm was used for identifying change in their differing patient samples. There is no gold standard for change in visual fields.
This lack of a gold standard influenced the present study as well. In our initial work to determine the classifiers, we used the presence or absence of GON as our surrogate gold standard for glaucoma. The advantage of this was that it eliminated the bias in comparing classifiers with each other, Statpac, or the glaucoma experts that would be present if the fields themselves were used in the training. If a chosen set of criteria for field abnormality were used as the gold standard, it would probably include some elements of Statpac or expert judgment. This would, by definition, make those criteria perform the best. Using GON, a definite marker for glaucoma, eliminated this confounder. This said, some of the classifiers performed comparably to the best glaucoma expert.
In the present study, we avoided a gold standard for determining the sensitivity of each of the different classifiers and instead simply compared them. Therefore, the true sensitivity of the determinations is unknown. Whether these improvements in early detection are valid and whether they will be clinically useful remains to be seen. However, we can make a strong argument that the classifiers are indeed seeing something consistent with glaucomatous visual field loss for two reasons. First, they identified the same eyes that were later identified by traditional methods for assessing field abnormality. Second, they identified the same eyes that had GON or later development of GON. However, we must stress that our conclusions are based on the assumption that characteristic visual field abnormality determined by traditional methods in conjunction with GON indicates glaucoma.
With regard to specificity, there is an advantage to using a large longitudinal data set from normal eyes to determine which criteria identify glaucomatous change in visual fields from normal to early abnormality. This approach yielded five commonly used traditional criteria for SAGE, each highly specific individually, with only one confirmation required. Even in combination, they maintained a specificity of 96%. When cutoff values for each of the machine learning classifiers were also set this way, fair comparisons among methods were possible. Use of the longitudinal normal data set to select criteria for change is also supported by the high level of agreement among the various classifiers, and by the presence of GON in a high percentage of the eyes identified as changed, especially with SVMg.
The identification of change from a normal to abnormal visual field should be considered within the context of glaucoma progression and available treatment options. In general, the time course of glaucoma is slow and the need for early intervention requires assessment of many factors in addition to the vision loss, including the patient's age, family history, other risk factors, and quality of life. Some elderly patients with a newly found loss of vision may expect to live out their lives without any noticeable change in performance or quality of life. However, we cannot, as yet, accurately predict the likely rate of change for each individual. Younger patients at higher risk may show rapid change, and early detection and intervention may significantly prolong good vision.
Although current treatment for glaucoma involves lowering of intraocular pressure to target levels and ongoing follow-up for evidence of the success or failure of treatment, the advent of better medical and surgical therapies, genetic marking, and neuroprotective agents will most likely influence this treatment paradigm. The earlier detection of vision loss by machine learning classifiers, and their use in clinical trials to provide quantifiable and comparable evaluation of the data across sites, could be very important in the accurate assessment of these new therapies. In our study, the machine learning classifiers detected visual field abnormality, on average, 4 years before traditional SAGE classification. In theory, their use could significantly shorten the time of clinical trials assessing small changes in SAP visual fields over time.
The use of appropriate machine learning classifiers may be even more important in studies in which other methods are used to measure visual function or optic nerve structure. Some of these newer methods, such as SWAP, frequency-doubling technology perimetry, and confocal scanning laser ophthalmoscopy, have already been used in clinical trials.31 32 Clinicians are not as familiar with these tests as they are with SAP. Interpretation of their results is therefore more difficult. In addition, the analysis packages available with the newer visual function tests are modifications of those developed for SAP and therefore may not be optimal. The use of machine learning classifiers with these newer tests should improve their utility and shorten clinical trial durations, although this remains to be studied.
In summary, we found that machine learning classifiers were able to identify confirmed change in visual fields of eyes with OHT substantially earlier than more traditional methods of analysis of SAP results.
| Footnotes |
|---|
Submitted for publication September 25, 2001; revised March 20, 2002; accepted April 1, 2002.
Commercial relationships policy: N.
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Corresponding author: Pamela A. Sample, Department of Ophthalmology, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0946; psample{at}eyecenter.ucsd.edu.