From the ¹School of Computer Science and Information Technology, RMIT University, Melbourne, Australia; and the ²Department of Optometry and Vision Sciences, University of Melbourne, Carlton, Victoria, Australia.
| Abstract |
|---|
METHODS. Computer simulation was used to determine the error, distribution of errors, and presentation count for a series of perimetric algorithms. Baseline procedures were Full Threshold and Zippy Estimation by Sequential Testing (ZEST). Retest strategies were (1) allowing ZEST to continue from the previous test without reinitializing the probability density function (pdf); (2) running ZEST with a Gaussian pdf centered about the previous result; and (3) retest minimizing uncertainty (REMU), a new procedure combining suprathreshold and ZEST procedures incorporating prior test information. Empiric visual field data of 265 control subjects and 163 patients with glaucoma were input into the simulation. Four error conditions were modeled: patients who make no errors, 15% false-positive (FP) with 3% false-negative (FN) errors, 15% FN with 3% FP errors, and 20% FP with 20% FN errors.
RESULTS. If sensitivity was stable from test to retest, all the retest algorithms were faster than the baseline algorithms by, on average, one presentation per location and were significantly more accurate (P < 0.05). When visual fields changed from test to retest, REMU was faster and more accurate than the other retest approaches and the baseline procedures. Relative to the baseline procedures, REMU showed decreased test-retest variability in impaired regions of visual field.
CONCLUSIONS. The obvious approaches to retest, such as continuing the previous procedure or seeding with previous values, have limitations when sensitivity changes between tests. REMU, however, significantly improves both accuracy and precision of testing and displays minimal bias, even when fields change and patients make errors.
Many perimetric strategies incorporate population information regarding likely thresholds. For example, the Swedish Interactive Thresholding Algorithm (SITA) of the HFA maintains two probability functions based on population data: one representing the probability of each possible outcome, assuming the test location is abnormal, and the other assuming the location is normal.2 When SITA terminates, the mode of the probability functions is returned as the sensitivity estimate. Similarly, the Zippy Estimation by Sequential Testing (ZEST) procedure commences with a probability density function (pdf) that typically represents the distribution of population sensitivity.3 4 A notable exception is the implementation of ZEST within a perimeter (Matrix; Carl Zeiss Meditec, Inc.) which commences with a pdf that assigns equal probability to all possible outcomes, to avoid having the prior pdf bias the final sensitivity estimate.5 Population information can also be used to influence the selection of the starting intensity for staircase procedures.
As well as incorporating population-based knowledge into perimetric test procedures, it may also be beneficial to include information from a given patient's previous tests. However, as perimetric results often display high test-retest variability, particularly in areas of visual field damage,6 7 automatically using previous sensitivity estimates to seed subsequent tests may counterproductively increase test times or measurement error. Furthermore, if there is a change in visual field sensitivity, relying too heavily on previous estimates may hamper detection of such change. It is not readily apparent whether greater advantages arise from seeding perimetric tests with population information or individual previous test information.
We used computer simulation to explore test procedures developed for retesting patients. Although we investigated hundreds of procedure variations, we report only the best performers herein. Specifically, we used a Bayesian procedure (ZEST), as Bayesian-like procedures are common in perimetry (for example, the SITA family of algorithms used in the HFA, and the ZEST procedure itself, used in the Medmont perimeter [Medmont Pty., Ltd., Camberwell, Australia] and the Matrix [Carl Zeiss Meditec]). We compared the performance of retest strategies with the error and precision obtained by simply rerunning the initial test strategy. We also explored the utility of a retest modification of a combined suprathreshold-threshold procedure that we have described previously, Estimation Minimizing Uncertainty (EMU).8 When modified for retesting visual fields, it is called Retest Minimizing Uncertainty (REMU). We sought to find a retest algorithm that enables rapid determination of sensitivity estimates with significantly improved accuracy and reduced test-retest variability. Priority was given to reducing variability, in preference to simply saving test time. We aimed for a similar average number of presentations per location to that of the SITA Standard algorithm, as SITA Standard is generally considered to have an acceptable test length in practice.
| Methods |
|---|
Two initial test procedures (Full-Threshold [FT] and ZEST) and three retest procedures (ZEST Continue [Z-Cont], ZEST Gaussian [Z-Gauss], and REMU) were incorporated in the simulation.
Test Procedure 1: Full Threshold.
FT is a staircase algorithm that has been largely replaced by the SITA family of algorithms.1 We included FT, as the full details of SITA are not available in the public domain. According to the developers of SITA Standard, it was designed to have similar test-retest characteristics as FT but to terminate more quickly, a development goal that appears to have been met based on clinical comparisons of these test procedures.6 7 9 10 11 Consequently, we have included FT to provide a surrogate comparison to the test-retest performance of SITA Standard.
FT commences with 4-dB luminance changes until the first response reversal (seeing to nonseeing or vice versa). The step size is then reduced to 2 dB. After two reversals the procedure terminates and sensitivity is estimated as the "last seen" intensity. For each location, the starting estimate of FT is determined according to a "growth pattern." The growth pattern used herein was the same as we have described previously (illustrated in McKendrick and Turpin,8 Fig. 1). If the measured estimate differs from the starting estimate of FT by more than 4 dB, a second staircase is commenced, using the first returned estimate as the starting intensity for the second staircase. In this situation, the HFA reports both estimates with no instructions on how they should be interpreted and does not use the second estimates when calculating the mean deviation (MD) or pattern standard deviation. Herein, we report the first estimate.
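The staircase rules described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the HFA implementation: the function names, the 0 to 40 dB clamping range, the early exit when the staircase is pinned at a limit of that range, and the value returned when no stimulus is ever seen are all hypothetical choices. `sees` is a hypothetical callback that reports whether a stimulus of a given dB value was seen.

```python
from typing import Callable, Optional, Tuple


def full_threshold_staircase(
    start_db: float,
    sees: Callable[[float], bool],
    min_db: float = 0.0,   # assumed lower limit of the dynamic range
    max_db: float = 40.0,  # assumed upper limit of the dynamic range
) -> Tuple[float, int]:
    """Run one 4-2 dB staircase; return (last-seen level, presentations)."""
    level = min(max(start_db, min_db), max_db)
    step = 4.0
    reversals = 0
    last_seen: Optional[float] = None
    previous: Optional[bool] = None
    presentations = 0
    while True:
        seen = sees(level)
        presentations += 1
        if seen:
            last_seen = level
        if previous is not None and seen != previous:
            reversals += 1
            step = 2.0  # step size drops to 2 dB after the first reversal
        if reversals == 2:
            break  # terminate after two reversals
        previous = seen
        # Seen: go dimmer (higher dB). Not seen: go brighter (lower dB).
        target = level + step if seen else level - step
        pinned_high = target > max_db and level == max_db
        pinned_low = target < min_db and level == min_db
        if pinned_high or pinned_low:
            break  # assumption: stop when pinned at a range limit
        level = min(max(target, min_db), max_db)
    # Assumption: report the range floor if nothing was ever seen.
    estimate = last_seen if last_seen is not None else min_db
    return estimate, presentations


def full_threshold(
    start_db: float, sees: Callable[[float], bool]
) -> Tuple[float, Optional[float], int]:
    """Return (first estimate, second estimate or None, total presentations)."""
    first, total = full_threshold_staircase(start_db, sees)
    second = None
    if abs(first - start_db) > 4.0:
        # Second staircase seeded with the first returned estimate.
        second, extra = full_threshold_staircase(first, sees)
        total += extra
    return first, second, total
```

With an idealized observer who sees every stimulus at or below 27 dB, `full_threshold(25.0, lambda s: s <= 27.0)` finishes in three presentations without a second staircase, whereas a starting estimate of 15 dB triggers the second staircase and costs 11 presentations. The first element of the returned tuple corresponds to the estimate reported herein.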
0.8 dB, as this provided a similar average number of presentations per location as SITA Standard.
Retest Procedure 2: Z-Gauss.
Z-Gauss is a ZEST procedure in which the prior pdf for a given location is a Gaussian centered on the sensitivity estimate returned at the first test visit. Gaussian standard deviations from 2.0 to 3.5 dB in steps of 0.5 dB were tested. Z-Gauss was terminated when the standard deviation of the pdf was 1.5 dB. Details for a pdf standard deviation of 3.0 dB are reported as this provided a similar average number of presentations per test location as SITA Standard. A further modification of the Z-Gauss procedure involved a standard deviation that varied with the sensitivity estimate of the first test according to the "combined" formulas proposed in Table 1 of Henson et al.,13 but capped to a maximum of 5 dB. This procedure is referred to as Z-Gauss-H.
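The seeding idea behind Z-Gauss can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the procedure as simulated: the 1-dB domain from 0 to 40 dB, the cumulative-Gaussian likelihood with its slope and error rates, presenting each stimulus at the rounded mean of the pdf, the stopping comparison (standard deviation at or below `stop_sd`), and the presentation cap are all hypothetical choices, as are the function names.

```python
import math
from typing import Callable, List, Tuple

DOMAIN = list(range(0, 41))  # assumed candidate sensitivities, 0 to 40 dB


def gaussian_prior(previous_db: float, sd: float = 3.0) -> List[float]:
    """Gaussian pdf centered on the previous estimate, truncated to DOMAIN."""
    weights = [math.exp(-0.5 * ((t - previous_db) / sd) ** 2) for t in DOMAIN]
    total = sum(weights)
    return [w / total for w in weights]


def p_seen(stimulus_db: float, threshold_db: float,
           slope_sd: float = 1.0, fp: float = 0.03, fn: float = 0.03) -> float:
    """Assumed likelihood: cumulative Gaussian with FP and FN asymptotes."""
    z = (threshold_db - stimulus_db) / slope_sd
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return fp + (1.0 - fp - fn) * phi


def pdf_mean_sd(pdf: List[float]) -> Tuple[float, float]:
    mean = sum(t * p for t, p in zip(DOMAIN, pdf))
    var = sum((t - mean) ** 2 * p for t, p in zip(DOMAIN, pdf))
    return mean, math.sqrt(var)


def z_gauss(previous_db: float, sees: Callable[[float], bool],
            prior_sd: float = 3.0, stop_sd: float = 1.5,
            max_presentations: int = 50) -> Tuple[float, int]:
    """Return (sensitivity estimate in dB, number of presentations)."""
    pdf = gaussian_prior(previous_db, prior_sd)
    mean, sd = pdf_mean_sd(pdf)
    n = 0
    while sd > stop_sd and n < max_presentations:
        stimulus = round(mean)  # assumption: present at the pdf mean
        seen = sees(stimulus)
        n += 1
        # Bayesian update: multiply by the likelihood of the response.
        likelihood = [p_seen(stimulus, t) if seen else 1.0 - p_seen(stimulus, t)
                      for t in DOMAIN]
        pdf = [p * l for p, l in zip(pdf, likelihood)]
        total = sum(pdf)
        pdf = [p / total for p in pdf]
        mean, sd = pdf_mean_sd(pdf)
    return mean, n
```

Because the prior is truncated to the assumed domain, a previous estimate near 40 dB yields a lopsided pdf, which is the kind of asymmetry a truncated Gaussian seed introduces at the top of the dynamic range.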
REMU uses a quick suprathreshold test to check whether the sensitivity has decreased from the previous test as shown in steps 4.2.1 through 4.2.4 of Table 1. Steps 1 through 3 in the procedure are important, because any change in general height since the previous test would yield an inaccurate setting of the suprathreshold values used in Step 4.2.1. The check in step 4.1 filters out locations where either there was genuine damage or a mistake was made in the previous test resulting in low sensitivity. In either case variability is high, and a short suprathreshold test at that location is not advisable. In step 4.1, the Gaussian pdf is located around the previous sensitivity adjusted by GH, whereas in step 4.2.5, the pdf is located 2 dB below the suprathreshold stimulus value.
Error Models Entered into the Simulation
To assess the effect of erroneous responses on the test procedures, four error models were applied.
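A simulated observer with FP and FN errors might be sketched in Python as below. The four rate pairs are those listed in the abstract; the pairing of rate pairs with the labels used here, the cumulative-Gaussian shape of the underlying psychometric function, its slope parameter, and all names are assumptions made for illustration.

```python
import math
import random
from typing import Callable

# FP and FN rates for the four error conditions (labels are hypothetical).
ERROR_MODELS = {
    "no_error":   {"fp": 0.00, "fn": 0.00},
    "typical_fp": {"fp": 0.15, "fn": 0.03},
    "typical_fn": {"fp": 0.03, "fn": 0.15},
    "unreliable": {"fp": 0.20, "fn": 0.20},
}


def probability_seen(stimulus_db: float, true_db: float, model: str,
                     slope_sd: float = 1.0) -> float:
    """Probability of a 'seen' response under the named error model.

    Assumption: the error-free response follows a cumulative Gaussian in dB,
    which is then compressed between the FP floor and the (1 - FN) ceiling.
    """
    rates = ERROR_MODELS[model]
    z = (true_db - stimulus_db) / slope_sd
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return rates["fp"] + (1.0 - rates["fp"] - rates["fn"]) * phi


def make_observer(true_db: float, model: str,
                  rng: random.Random) -> Callable[[float], bool]:
    """Return a callable that simulates one response per presentation."""
    def sees(stimulus_db: float) -> bool:
        return rng.random() < probability_seen(stimulus_db, true_db, model)
    return sees
```

Under the unreliable condition, a stimulus far brighter than threshold is seen on about 80% of presentations and one far dimmer than threshold on about 20%, so no single response is fully informative.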
Visual Fields Input to the Simulation and Change to Fields Incorporated at Retest
To examine the performance of the procedures on real visual fields, 265 normal visual fields and 163 glaucomatous visual fields were used as input (FT algorithm, 24-2 spatial pattern). The fields were collected for a previous study, at which time written informed consent, in agreement with the tenets of the Declaration of Helsinki, was obtained from the subjects to have their visual field data kept in a deidentified database for further research purposes. Normal patients were aged 47 ± 16 years and glaucomatous patients were aged 61 ± 13 years. Within the glaucomatous group, the visual field deficits ranged from mild to severe (median MD = −1.81 dB, 5th percentile = +2.14 dB, and 95th percentile = −22.55 dB). Visual fields were age corrected to 45 years, altered by 1 dB per decade. The locations adjacent to the blind spot (15°, ±3°) were excluded from analysis.
Three change conditions were applied to the whole visual fields to assess the performance of the retest procedures:
We also assessed performance when the whole-field sensitivity varied by ±2, ±4, and ±6 dB. The results for the intermediate ±3-dB case are reported herein. The ability of the algorithms to cope with diffuse change was assessed because diffuse variation in sensitivity is common and may reflect causes such as media opacification and nonvisual factors such as anxiety, learning effects, fatigue, or attention.
Performance was also assessed for locations within a simulated deepening scotoma. An artificial visual field was used (Fig. 1) with a scotoma that progressively deepened at each visit. We have demonstrated that FT and Staircase-Quest (an algorithm incorporating those aspects of SITA that appear in the public domain) have increased variability when the starting estimate provided to the procedure is inaccurate,3 which is most likely to arise on the edge of a scotoma. The artificially progressing fields were not intended to model glaucomatous progression per se, but to explore performance for a known situation where FT (and presumably SITA) performs suboptimally. The field consisted of a true sensitivity of 33 dB for all locations except those labeled A, B, and C in Figure 1. The sensitivity at these three locations was decreased by 3 dB per visit to create a sequence of eight visual fields containing an isolated deepening scotoma. The sensitivity of the remainder of the visual field was stable.
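The sequence of artificial fields can be generated with a few lines of Python. This is an illustrative sketch only: the number of locations, the indices standing in for locations A, B, and C, and the choice to leave the first visit at the 33-dB baseline (so that the scotoma reaches 12 dB at the eighth visit) are assumptions, since the layout is given only in Figure 1.

```python
from typing import List, Sequence


def scotoma_sequence(
    n_visits: int = 8,
    baseline_db: float = 33.0,
    loss_per_visit_db: float = 3.0,
    scotoma: Sequence[int] = (20, 21, 29),  # hypothetical indices for A, B, C
    n_locations: int = 54,                  # hypothetical field size
) -> List[List[float]]:
    """Return one list of true sensitivities (dB) per visit."""
    fields = []
    for visit in range(n_visits):
        field = [baseline_db] * n_locations
        for loc in scotoma:
            # Assumption: visit 0 is undamaged; each later visit loses 3 dB.
            field[loc] = max(0.0, baseline_db - loss_per_visit_db * visit)
        fields.append(field)
    return fields
```

Each field in the returned list would serve as the true input to the simulation at that visit, with the retest procedures seeded by the estimates measured at the preceding visit.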
Simulations were run 1000 times for each test procedure, error response model, and visual field change per visit.
| Results |
|---|
Figure 3 demonstrates that when sensitivity was stable from one test to the next, all the retest procedures were faster than the test procedures, with some reduction in the MAE and spread of error. Statistical comparison (ANOVA) resulted in significant differences in the MAE (defined as P < 0.05 on post hoc Holm-Sidak testing) between almost all the test procedures (no-error condition: all procedures significantly different with the exception of Z-Gauss and REMU for patients with glaucoma, and ZEST and REMU for normal subjects; typical FN errors: all procedures different within both groups; typical FP errors: all different except ZEST and REMU for glaucoma group; unreliable: all significantly different except Z-Cont and REMU for glaucoma and Z-Gauss and REMU for normal subjects). Inspection of Figure 3 demonstrates that, in most cases, although statistically significant, the magnitude of the differences was small.
Comparison of All Procedures for a Uniform Change in Sensitivity across the Whole Visual Field
Figures 4 and 5 show the performance of the test and retest strategies when the entire visual field sensitivity was either reduced (Fig. 4) or elevated (Fig. 5) by 3 dB. The figures are plotted in the same format as Figure 3.
Figures 4 and 5 demonstrate that Z-Gauss (up triangles) performed better than Z-Cont and, in general, displayed better performance (similar or reduced error and faster test time) than simply repeating the test strategy ZEST. Z-Gauss performed similarly to REMU when there was a whole-field improvement in visual field sensitivity (Fig. 5), but was substantially slower to terminate when there was a reduction in sensitivity (Fig. 4). The asymmetry in the performance of Z-Gauss was due to the truncation of the pdf at the top end of the dynamic range (40 dB). Statistical comparison of the presentations required to terminate demonstrated that REMU terminated faster on average than Z-Gauss, and that Z-Gauss terminated faster than ZEST in all error conditions, for both whole-field increases and decreases in sensitivity (ANOVA, Holm-Sidak post hoc testing; P < 0.05). Although statistically significant, it is important to consider whether the magnitude of the difference is likely to be clinically significant. When averaged across all error conditions, when the whole field decreased by 3 dB, ZEST required approximately seven presentations to terminate, Z-Gauss approximately six, and REMU approximately five. When sensitivity increased by 3 dB, ZEST required approximately 6.5 presentations on average, Z-Gauss approximately 4.5, and REMU approximately four. For comparison, FT terminated using between five and six presentations for both the whole-field increase and decrease in sensitivity. For most error conditions, the MAE of REMU and Z-Gauss was not different (P > 0.05). REMU was more accurate on average than Z-Gauss in the presence of FN errors or unreliable performance if there was a whole-field decrease in sensitivity (P < 0.05, Holm-Sidak post hoc testing). The magnitude of the difference was unlikely to be of clinical significance.
Performance of REMU as a Function of Input Sensitivity
The data in Figures 3 to 5 are pooled across the entire visual field; however, it is well known that test-retest variability of visual field algorithms is greatest when visual field sensitivity is reduced.6 7 Increased variability is expected, as it has been established that the slope of the psychometric function for white-on-white perimetric stimuli decreases with reducing perimetric sensitivity.13 Accurate and precise thresholds can be determined in the presence of such variability; however, a larger number of presentations are required.8 This fact is critical to the design of the REMU test procedure: a minimum number of presentations are expended in normal test locations, to enable more presentations to be used in abnormal locations. To explore this more thoroughly, Figure 6 shows box plots of the mean errors of our (on average) best performing retest algorithm (REMU) as a function of the true input to the simulation for the situation where the whole field was stable (Fig. 6A) and where there was either a uniform increase or decrease in sensitivity. The x-axis of Figure 6 represents the known true sensitivity of the patient. This figure is a little different from similar figures reported in clinical studies, in which it is typical to plot the sensitivity measured at visit 1 against the sensitivity measured at visit 2, hence incorporating errors in both dimensions. For the ZEST and FT procedures, the errors derived from clinical test-retest should, on average, be double those derived from measured versus actual, as the error distribution will be the same at test and retest. However, for REMU, the error for the initial visit will be that of ZEST, with a different error distribution at the retest visit when the REMU procedure is used.
Performance of the Retest Procedures When There Is a Localized Decrease in Sensitivity
The previous figures compare performance of the procedures when used twice (test then retest). A simulated localized scotoma was modeled that decreases in sensitivity by 3 dB on each of eight visits. At each visit, the retest procedures were seeded by the previous visit, enabling observation of whether compounding errors result when the visual field is changing. Figure 7 shows the performance of the test (FT and ZEST) and retest (Z-Gauss and REMU) procedures. It is important to note that the x-axis in Figure 7 should not be interpreted as necessarily linear or evenly spaced in scale. Z-Cont is not shown here, as this procedure was shown to perform poorly for whole-field sensitivity change. As shown previously,8 Figure 7 demonstrates that when patients make FP errors, the starting estimate for the FT procedure becomes an increasingly poorer predictor of true sensitivity each visit, hence the error increases. ZEST, Z-Gauss, and REMU are all able to track the change in visual field sensitivity, even when the patient is responding unreliably.
| Discussion |
|---|
REMU demonstrated the best overall retest performance. In our simulations, a ZEST procedure was initiated when the suprathreshold test in REMU was not passed. There is no reason why other thresholding algorithms cannot be substituted in this step. REMU is designed to minimize presentations in areas of previously measured normal sensitivity, yet expends more presentations in areas of sensitivity loss. These additional presentations enable accurate and repeatable estimates in areas of visual field loss. In particular, REMU demonstrated a substantially narrower range of outcomes for each true sensitivity than did either ZEST or FT when the true sensitivity was below approximately 15 dB (see Fig. 6). This observation has several important implications. First, it predicts that REMU will have reduced clinical test-retest variability when compared with current commercially available algorithms, thereby enhancing detection of visual field change. Second, it demonstrates that a considerable proportion of the wide distribution of test-retest variability measured with current strategies is likely to result from the algorithms used to measure threshold. In our model, response variability was increased with decreasing sensitivity but was fixed in magnitude for all low sensitivities. Consequently, if performing well, the procedures should display a fixed level of test-retest variability across all low sensitivities as was demonstrated by REMU (see Fig. 6). This was not the case for either FT or ZEST.
The improved average performance of REMU comes at a small cost. REMU employs a suprathreshold check to see whether sensitivities that were previously in the age-matched normal range have decreased, and only fully determines these locations if they fail the suprathreshold test. If a location has sensitivity at or above age-matched normal and the sensitivity increases, then the suprathreshold test should be passed and the result reported as the previous value. That is, REMU is unlikely to detect localized improvements in sensitivity from an age-matched normal baseline. Whole-field improvements in sensitivity should be detected due to the general height check at stage 3 of the REMU algorithm. We see this as an acceptable tradeoff for the decrease in variability in areas of field loss. Of course, if a change in the visual field is expected, for example an improvement due to cataract extraction, then the standard algorithms can be run instead of REMU to create a new seeding field, and REMU can be run thereafter.
In this work we have explored ways of using previous sensitivity estimates to seed the current test. There is a wealth of prior test information that we have not used, such as individual location response sequences. How to make the best use of this information and whether it yields significant benefit is a topic of ongoing study in our laboratory. Other prior information includes the gradient of an individual's visual field. Many current test algorithms choose the starting estimate of threshold based on information from neighboring locations plus an eccentricity adjustment; for example, the growth pattern of both FT and SITA.1 The eccentricity adjustment can be customized to an individual using the gradient of their previous visual field, thus resulting in a more accurate starting estimate of threshold. We have performed experiments with this approach (data not reported) but found that the gain in performance was minimal relative to the effort required.
The utility of computer simulation ultimately depends on how closely the model represents human performance. To this end, we have used empiric visual field data, and known rates of response variability collected in clinical populations. Although simulation studies have limitations, in the absence of computer simulation, it is impossible to assess the accuracy of perimetric procedures, as a patient's true sensitivity can never be known with certainty. Simulation also enables us to explore in detail the performance of test procedures for situations that are less common, for which it is difficult to obtain large quantities of clinical data, yet which may have important implications if the algorithms perform poorly. Simulation is an essential precursor to clinical assessment of perimetric algorithm performance and has been used successfully for this purpose.2 3 4 8 14 15 We are currently collecting real patient data on the best-performing procedures described herein.
The experiments predict that sensitivity estimates from previous test visits can be used to obtain more accurate and repeatable visual field assessment outcomes at subsequent tests. The obvious approaches to retesting visual fields, such as continuing the prior procedure or directly seeding with previous values, have performance limitations when sensitivity changes from one test to the next. REMU, however, significantly improves both accuracy and precision of retesting perimetric sensitivity, and furthermore displays minimal bias, even when fields change and patients make errors.
| Footnotes |
|---|
Submitted for publication September 10, 2006; revised November 26 and December 14, 2006; accepted February 15, 2007.
Disclosure: A. Turpin, None; D. Jankovic, None; A.M. McKendrick, None
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Corresponding author: Allison M. McKendrick, Department of Optometry and Vision Sciences, University of Melbourne, Corner of Cardigan and Keppel Streets, Carlton, Victoria 3053, Australia; allisonm{at}unimelb.edu.au.