1 From the Department of Optometry and Vision Sciences, The University of Melbourne, Victoria, Australia.
The recent study by Åsman et al.1 demonstrated the practical limitations of a method commonly used to identify the general height (86th percentile value) of visual field sensitivity. In that paper, a zone of abnormality, whose features were derived individually from a large group (n = 123) of patients with glaucoma, was substituted into normal regions of visual fields obtained from a large sample (n = 82) of normal observers to yield a synthetic abnormal field. The outcomes derived from this synthetic field were then compared with those of the normal field before corruption. These simulations showed that the presence of a local scotoma results in an underestimate of the general height (overestimate of the mean defect; MD), with the magnitude of error being related to the size of the scotoma (number of involved points). Although the average effect on MD was small (range, 0.2 to 2.3 dB), it produced a substantial corruption of the pattern defect index and its associated probability scales, frustrating the detection of progression. The authors1 conclude that improved methods are needed for describing the general height or sensitivity of the visual field. In this article we describe and evaluate two candidate methods that can be applied for such purposes.
Our logic stems from the fact that one of the challenges in clinical science is to identify normal signals in the presence of abnormality or noise. In perimetry, the distribution of outcomes for the dependent variable (in this case, decibels) can be described by a probability density function (PDF).2 A bell-shaped or unimodal PDF (Fig. 1A) can be summarized with descriptive statistics of central tendency, such as the mean or median, and its spread.3 Unfortunately, single-peaked distributions are not typically found in clinical populations.4 5 6 Clinical PDFs have long tails or become altered by disease7 to show multi-lobed distributions.4 5 6 This effect has been demonstrated in patients with primary open-angle glaucoma, optic neuritis, and/or ocular hypertension, as well as persons with normal eyes with reduced sensitivity.7 Indeed, in some cases, disease can yield a bi-lobed PDF4 5 6 with few normal values.8 These multi-lobed distributions challenge traditional descriptors of central tendency and their ability to summarize normal parts of the visual field. The problem rests in extracting those few remaining normal data points from the abnormal values, because these will assist in monitoring the development of new defects in such eyes.9 Moreover, the recent work of Åsman et al.1 also shows that such extraction can have a significant bearing on diagnostic capacity. There are two problems in this process for perimetry: the adoption of traditional statistical descriptors, such as the mean, to derive perimetric indices, and the determination of the general height or typical sensitivity of the patient, which is often determined from the 86th percentile value. Although traditional methods have adopted these two different approaches for these applications, we will describe an alternative approach that can yield both outcomes simultaneously.
The literature13 proposes two procedures that can be used to yield robust means, where "robustness" refers to the capacity to remain unaffected by outliers. The first is called data trimming and the other is termed weighting. For perimetry, robustness is important, given that perimetric outcomes are negatively skewed.4 5 6 This skew leads the mean to misrepresent the central tendency, and the average collapses in the presence of far-advanced visual field losses,8 so the 86th percentile has been suggested as a more robust indicator of threshold.11 12 Although intuitively reasonable, formal testing of the capacity of the 86th percentile to return robust outcomes was lacking until the seminal work of Åsman et al.1 Their approach is based on real data and so does not cover the full range of factors that can corrupt a clinical data set, such as false-positive responses, because these are not common in patients or may have been censored from acceptable outcomes. Some of these issues have been previously canvassed by Turpin et al.,14 who propose that the only way to appreciate fully the limitations of a method is with simulation. We agree with their proposal and describe such an approach.
In this article, we compare methods that can be used to enhance the robustness of the estimate of the general height, identify the remaining normal data points by trimming and weighting,13 and compare the outcomes from these methods to the 86th percentile estimate of general height. The two approaches being considered can be used to identify normal values in the presence of disease or high variability. We demonstrate the benefits of implementing these processes with simulations. We have chosen to use simulations instead of empiric evaluation because the threshold PDF of diseased eyes is not known, and the true endpoint can never be known in clinical data.4 However, to ensure meaningful outcomes, our simulation is based on real data sets3 5 6 and has been tested by applying the methods to clinical data sets where it is known that the usual summary index (MD) fails.8 Finally, our findings should complement those derived from clinical simulations.1
Methods
The problem just described is one of identifying and extracting outliers to return normal values that can then be used to yield the general height. In the following, our methods make use of simulated clinical data, so we first define the clinical PDFs used in our simulations and then describe the different methods for extracting outliers (trimming or weighting).
A PDF gives the probability that various threshold values (in decibels) occur across a visual field, being a normalized frequency distribution of the total population. Because the population PDF can never be determined directly, our estimate of the normal PDF (Fig. 1A) is derived from the normal data extracted from 11,400 central (0–30°) thresholds (75 control subjects; ages, 52–85 years), as detailed elsewhere.5 6 These can be described by the hyperbolic secant of equation 1.
[Equation 1: hyperbolic secant description of the normal PDF; equation image not recovered] (1)
The PDF of any clinical population varies, depending on the inclusion criteria of the patient group.4 5 6 To control for such variability, we have chosen to develop composite PDFs that were created by polling at a specified rate from a normal PDF, an abnormal PDF, and a false-positive PDF, as given by equation 2.
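[Equation 2: composite PDF polled from the normal, abnormal, and false-positive PDFs; equation image not recovered] (2)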
For completeness, we also simulated a generalized depression in field sensitivity (Fig. 1F), using PDFs with a 6 dB loss (µg6 = 20.5 dB) giving an MD of −6.3 dB, and a 12 dB loss (µg12 = 15.2 dB) giving an MD of −12.9 dB. Again, any reduction in threshold was associated with an increased variability.7 In developing the composite PDF for generalized loss, 80% of values were drawn from the distribution having the generalized depression, 10% came from the normal distribution, and the remaining 10% represented false-positive responses.
We also considered the effect that a large false-positive rate (40%) can have on our algorithms. In these cases, the normal-to-abnormal ratio was kept constant. For example, with severe defects, 40% of data were polled from the false-positive distribution and 60% came from the other two distributions in a 15:75 ratio (see previous), meaning that normal data comprised 10% and abnormal data comprised 50% of the composite PDF. False responses were drawn from a Gaussian whose mean was displaced beyond the 0.5th percentile limit of the normal data, with little variability (µfp = 45 ± 1 dB; Fig. 1A). We acknowledge that our approach should not be taken as descriptive of the false-positive PDF, which can never be determined. We adopted it because the proximity of our false-positive PDF to the normal PDF should provide the most severe test of the robustness of the outlier method for detecting small departures from normality.
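The polling scheme can be sketched as a three-component mixture. The following is a minimal illustration, not the authors' implementation: the function names are hypothetical, and Gaussian stand-ins with assumed parameters replace the normal and abnormal components (the article describes the normal PDF with a hyperbolic secant). Only the 45 ± 1 dB false-positive Gaussian and the 40% / 15:75 proportions are taken from the text.

```python
import random


def mixture_weights(fp_rate, normal_part=15.0, abnormal_part=75.0):
    """Share the non-false-positive fraction between normal and abnormal
    data while holding the normal-to-abnormal ratio constant."""
    remaining = 1.0 - fp_rate
    total = normal_part + abnormal_part
    return {
        "normal": remaining * normal_part / total,
        "abnormal": remaining * abnormal_part / total,
        "false_positive": fp_rate,
    }


def sample_composite(n, weights, samplers, rng):
    """Draw n thresholds (dB) by polling each component at its rate."""
    labels = list(weights)
    picks = rng.choices(labels, weights=[weights[k] for k in labels], k=n)
    return [samplers[label](rng) for label in picks]


# Component samplers. The normal and abnormal parameters are ASSUMED
# placeholders; only the false-positive values (45 +/- 1 dB) are from the text.
samplers = {
    "normal": lambda rng: rng.gauss(30.0, 2.0),          # assumed
    "abnormal": lambda rng: rng.gauss(10.0, 6.0),        # assumed
    "false_positive": lambda rng: rng.gauss(45.0, 1.0),  # from the text
}

weights = mixture_weights(0.40)  # 40% false positives, 15:75 ratio kept
thresholds = sample_composite(1000, weights, samplers, random.Random(0))
```

With a 40% false-positive rate the helper reproduces the proportions quoted above: 10% normal and 50% abnormal data.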
Data trimming13 removes outliers to give asymptotic (near constant) outcomes and can be improved with an iterative process. The iteration can cease when it fails to affect outcomes by a reasonable amount, in our case 2 dB, which for our purpose was found to occur by the sixth iteration in all cases. We recognize that this criterion can vary according to the precision needed in a particular setting. The challenge for the trimming process becomes to define the window that can be used to identify normal data.
Most statistical approaches recommend that the trim profile should be symmetric and located at some statistically meaningful limits, such as the 95% confidence limits or ±1.96 SD from the mean.13 This approach loses efficiency when dealing with an asymmetric distribution, where the mean value will not represent the central tendency. In these cases, an alternative would be to apply the same limits to the median value, being the central point of a skewed distribution. Note that as the distribution becomes less skewed, the median and mean collapse onto each other, justifying such an approach. However, in cases in which bi-modal PDFs exist, the median also provides a poor representation of the central tendency of the data set. We propose another approach to trimming skewed data, such that the trim profile is determined from the PDF, where its upper limit is set to the upper 95% confidence limit (+1.96 SD), and the lower limit is set to have a common area under the curve (integral). This leads to trim limits of −0.78 and +1.96 standard deviations for the PDF, as in Figure 1A. This window should yield optimal performance with skewed data, especially in the presence of sensitivity losses due to disease.
In terms of our previous example, the bi-modal data set returns a mean of 12.6 and an 86th percentile of 30, with an SD of 15.7. It is obvious that trimming around either of these values using the 95% confidence interval (±1.96 SD; ±30.8 dB) will yield the same outcome, due to the large SD. However, applying our asymmetric trim window gives limits of 0.3 dB (12.6 − 0.78 · 15.7) and 43.3 dB (12.6 + 1.96 · 15.7), removing the low outliers and returning a trimmed mean of 29.3 dB (30, 30, 28). Although in this example a robust mean was returned after a single trimming procedure, we find an iterative process is needed before the mean changes by less than 2 dB in most applications (Figs. 2A, 2C).
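The arithmetic of this example can be checked with a short sketch. The seven-value data set below is hypothetical: it was chosen only because it reproduces the quoted mean (12.6), SD (15.7), and retained values (30, 30, 28). The function names, the use of the sample SD, and the choice to recompute the mean and SD from the retained values on each pass are likewise assumptions, not details confirmed by the text.

```python
import statistics

LOWER_SD = 0.78  # lower trim limit, in SDs below the mean (from the text)
UPPER_SD = 1.96  # upper trim limit, in SDs above the mean (from the text)


def trim_once(values, lower_sd=LOWER_SD, upper_sd=UPPER_SD):
    """Apply the asymmetric window once; return kept values and the limits."""
    centre = statistics.mean(values)
    spread = statistics.stdev(values)  # sample SD (assumed)
    low = centre - lower_sd * spread
    high = centre + upper_sd * spread
    kept = [v for v in values if low <= v <= high]
    return kept, low, high


def iterative_trimmed_mean(values, criterion_db=2.0, max_iter=10):
    """Repeat the trim until the mean changes by less than criterion_db."""
    data = list(values)
    current = statistics.mean(data)
    for _ in range(max_iter):
        if len(data) < 2:  # the SD is undefined for a single value
            break
        kept, _low, _high = trim_once(data)
        if not kept:
            break
        updated = statistics.mean(kept)
        data = kept
        converged = abs(updated - current) < criterion_db
        current = updated
        if converged:
            break
    return current


# Hypothetical bi-modal field (dB) matching the quoted summary statistics.
field = [30, 30, 28, 0, 0, 0, 0]
kept, low, high = trim_once(field)
single_pass_mean = statistics.mean(kept)
```

A single pass gives limits of about 0.3 and 43.3 dB and retains 30, 30, and 28. Because this sketch recomputes the SD from the retained values, later passes use a much narrower window; whether the window should instead stay tied to the normal PDF is a design choice the sketch does not settle.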
For weighting, the purpose is to define a suitable tuning constant. The starting point is to use the normal PDF, because this emphasizes data that fall within the region of interest and thus de-emphasizes outliers. The literature describes weighted measures as Horvitz-Thompson estimators and, where weighting is based on the PDF, as a Hajek estimator.15 Determination of the Hajek estimate (Mw) is given by equation 3.
[Equation 3: Hajek estimate (Mw); equation image not recovered] (3)
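As a rough illustration of PDF-based weighting, the sketch below computes a weighted mean in which each threshold is weighted by the value of an assumed normal PDF at that threshold. It makes several assumptions that the text does not confirm: the weight is the PDF raised to a power (power 1 standing in for the c1 tuning constant, power 2 for the quadratic c2), the PDF is the standard symmetric hyperbolic secant density, and the location and scale values (29 dB, 3 dB) and the data set are hypothetical.

```python
import math


def sech_pdf(x, mu, sigma):
    """Standard hyperbolic secant density with mean mu and SD sigma."""
    return 1.0 / (2.0 * sigma * math.cosh(math.pi * (x - mu) / (2.0 * sigma)))


def weighted_mean(values, mu, sigma, power=1):
    """Mean of values weighted by the (assumed) normal PDF raised to power."""
    weights = [sech_pdf(v, mu, sigma) ** power for v in values]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


# Hypothetical bi-modal field (dB) and ASSUMED normal-PDF parameters.
field = [30, 30, 28, 0, 0, 0, 0]
m_c1 = weighted_mean(field, mu=29.0, sigma=3.0, power=1)
m_c2 = weighted_mean(field, mu=29.0, sigma=3.0, power=2)
```

Under these assumptions the zero-sensitivity points receive negligible weight, so both estimates sit very close to the mean of the three remaining normal values, and the squared weighting suppresses the outliers slightly more.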
We evaluate the accuracy of the trimming and weighting algorithms by comparing their estimates to the modal level or the 86th percentile of the normal PDF. We also compare the capacity of the various methods to extract normal data and to return sensible estimates in clinical cases in which the general height and MD have been shown to fail.1 8
Results
Our data show that the Hajek estimate provides a robust outcome of perimetric data except for extremely large defects, for which the quadratic of the weighting function performs better. The outcomes of the algorithms are compared in Figure 2. Figures 2A and 2C show the mean returned from a trimming process (in decibels) as a function of the iteration step, whereas Figures 2B and 2D show the W estimate (weighted mean) for various tuning constants. The leftmost symbols in each panel identify the mean for the entire data set, as might be specified by many modern perimeters. As expected, this mean value is progressively reduced with increasing defect severity. Figures 2A and 2C refer to the trimming procedure, with Figure 2A showing the outcome for the three clinical PDFs and Figure 2C giving the results for generalized depressions in sensitivity and high false-positive (40%) responses. Figure 2A shows that trimming reaches the 2 dB criterion by two to four iterations (Fig. 2, asterisk) in mild (large open circles), moderate (filled circles), and severe (small open circles) defects. However, it is only in mild defects that the procedure returns a robust estimate of the true threshold (solid horizontal line). Figure 2C shows that trimming fails to give the desired outcome in the presence of 40% false-positive responses (shaded squares), where the procedure asymptotes onto the false-positive mean of 45 dB. Although trimming rapidly extracts the real threshold with mild generalized depressions, it requires many iterations with severe generalized losses.
The results for the W estimate are shown in Figures 2B and 2D. Figure 2B shows the effect that various tuning constants have on the W estimate. In all cases, except in the severe defect group, the W estimate provides a robust statistic of the underlying mean. However, uncertainty (error) of the estimate is large with moderate (filled circles) and severe (small open circles) cases for the Hajek estimate (c1 tuning constant). Weighting with a quadratic tuning constant (c2) yields not only robust estimates but also reasonable errors in these values. Higher-order power functions fail to give any greater advantage beyond the quadratic. The results of the simulations indicate that the W estimate asymptotes onto the 60th percentile value of the normal PDF.
The simulations indicate that the presence of high false-positive rates can modify outcomes as can severe loss, both of which challenge the adoption of the 86th percentile for this purpose. Indeed, Blumenthal and Sapir-Pichhadze8 recognized this limitation in their paper, in which they describe the problem of extracting a summary index in far-advanced glaucomatous field losses.
Discussion
We have detailed a mathematical procedure that can be used to return values unaffected by outliers, typical of diseased eyes. We described how a robust estimate of central tendency can be derived using a weighted mean (Figs. 2B, 2D). The method is superior to that based on visual inspection by the clinician, as it removes any intrinsic biases introduced by subjective assessment. This W index is robust to false-positive responses and at least 12 dB of generalized depression, as might be found with severe cataract or refractive error. Although we have applied the procedure to the entire field, we feel that similar applications on a regional basis may detect false responses better and provide added diagnostic information. Although we have simulated losses in our study, the method has the potential to flag the presence of supernormal thresholds as well. Finally, it would be useful to compare the methods described in this communication to those presently being used, to emphasize their benefits, and to consider practical implementations.
Practical Applications
In Figure 3, the candidate methods have been compared to the mean and 86th percentile values of the normal PDF. Figure 3A summarizes our simulation findings for all candidate methods, to show that the trim mean (circles) and the 86th percentile (squares) provide useful indices that become corrupted by the presence of false positives and moderate or severe defects. In comparison, the W indices (Fig. 3A, triangles) provide a robust statistic in all cases except for the Hajek estimate (c1 tuning constant) with severe defects. As an aside, one benefit of these procedures is that the normal values identified by these methods can be indicated to clinicians to guide their attention to the unaffected data to detect early change. The simulations imply that the power tuning constant (c2) gives better capacity for very severe losses. In the following, we consider a practical application of these methods, as earlier we argued that the full range of factors used in simulation may not manifest in clinical patients.
The Hajek estimate could be easily adopted by current perimeters. Although the simulations show that a quadratic tuning constant (c2) is most robust to the presence of severe defects (Fig. 3A), there is a tradeoff between adopting the higher-order tuning constants for robustness and an overexpression of abnormality, as the kurtotic nature of these higher-order tuning constants yields a limited normal domain. Given the findings that we obtained from clinical data, we favor the adoption of the c1 tuning constant (Hajek). The effect that the shape of the tuning constant has on disease detection in clinical patients needs further consideration.
Finally, Åsman et al.1 observed that there is a need for novel and robust methods of data extraction that can be used to return the general height of visual field data, and we detail such methods in this communication. As suggested by them, this approach can also be used with other clinical tests, such as optic nerve indices or laboratory assays, to yield robust means in the presence of outliers. Here, some prior knowledge of expected outcomes, as may be provided by a pilot trial, is needed to generate a PDF that can then be applied as the tuning constant. As experimental data are more likely to be homogeneous rather than bi-modal, we propose that a Hajek estimate should be returned, to give robust outcomes of an unbiased data set.
Footnotes
2 Present affiliation: Department of Ophthalmology and Visual Science, The University of Chicago, Illinois.
Supported by Australian Research Council Grant ARC-LP0211474.
Submitted for publication August 23, 2004; revised February 21 and August 24, 2005; accepted October 21, 2005.
Disclosure: A.J. Vingrys, None; A.J. Zele, None
Corresponding author: Algis J. Vingrys, Department of Optometry and Vision Sciences, The University of Melbourne, Victoria, 3010, Australia; algis{at}unimelb.edu.au.