From the Research Group in Eye and Vision Science, School of Medicine, University of Manchester, United Kingdom.
| Abstract |
|---|
METHODS. Probability theory was applied to examine various suprathreshold pass criteria (i.e., the number of stimuli that have to be seen for a test location to be classified as normal). A suprathreshold strategy that requires three seen or three missed stimuli per test location (multisampling suprathreshold) was selected for further investigation. Simulation was used to determine how the multisampling suprathreshold, conventional suprathreshold, and full-threshold strategies detect localized field loss. To determine the systematic error and variability in estimates of loss area, artificial fields were generated with clustered defects (0 to 25 field locations with 8- and 16-dB loss) and, for each condition, the number of test locations classified as defective (suprathreshold strategies) and with pattern deviation probability less than 5% (full-threshold strategy) was derived from 1000 simulated test results.
RESULTS. The full-threshold and multisampling suprathreshold strategies had similar sensitivity to field loss. Both detected defects earlier than the conventional suprathreshold strategy. The pattern deviation probability analyses of full-threshold results underestimated the area of field loss. The conventional suprathreshold perimetry also underestimated the defect area. With multisampling suprathreshold perimetry, the estimates of defect area were less variable and exhibited lower systematic error.
CONCLUSIONS. Multisampling suprathreshold paradigms may be a powerful alternative to other strategies of visual field testing. Clinical trials are needed to verify these findings.
In suprathreshold perimetry, stimuli are presented above the estimated detection threshold of a normal visual field location. If the patient responds, it is assumed that the corresponding test location does not have significant loss. In normal observers and those with early glaucoma, most stimulus presentations occur well above threshold, and the observer may be less uncertain of how to respond. Suprathreshold tests may therefore be easier tasks to perform with these patients, who often have little or no experience with perimetry. Although it is widely accepted that suprathreshold tests may be less sensitive to shallow visual field loss than threshold tests, they have often been used in epidemiologic screening10 11 12 13 and are routinely used in primary eye care.14 15 As with threshold perimetry, the results of conventional suprathreshold tests exhibit large test-retest variability in patients with glaucoma.16
In this study, we address the issue of the sensitivity and test variability in suprathreshold perimetry. Using probability theory, we examine the influence of the pass criterion (i.e., the number of stimuli that have to be seen for a test location to be classified as normal) on a suprathreshold test's performance. We propose an optimized suprathreshold strategy, based on a pass criterion designed to reduce variability and improve sensitivity, without sacrificing specificity. The performance of this multisampling suprathreshold strategy in detecting visual field loss and quantifying its area is evaluated by computer simulation and compared with that of the full-threshold and conventional suprathreshold strategies.
| Methods |
|---|
Equation 1 gives the probability of obtaining exactly k responses to n stimuli, Pn(k), given the probability p of a response. (The ! symbol stands for the factorial expansion.) The classification function, that is, the probability of obtaining at least k responses to n stimuli, Fn(k), is the cumulative form of this function (equation 2).
$$P_n(k) = \frac{n!}{k!\,(n-k)!}\, p^{k} (1-p)^{n-k} \tag{1}$$

$$F_n(k) = \sum_{i=k}^{n} P_n(i) \tag{2}$$
We derived classification curves for suprathreshold examinations with 1 to 5 stimulus presentations per test location (see the Results section). From several alternatives, we selected a pass criterion that demands three seen stimuli (of up to five presentations at each location, i.e., 3/5) for subsequent simulations. The suprathreshold strategy based on this criterion is referred to as a multisampling suprathreshold.
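The classification curves described above can be sketched in a few lines of Python. This is a minimal illustration of the binomial formulas, not the authors' original software; the function names are hypothetical, and the curve is assumed to plot the probability of a location being classified as defective against the per-stimulus response probability p.

```python
from math import comb


def prob_exactly(n, k, p):
    """Probability of exactly k responses to n stimuli (binomial term)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(n, k, p):
    """Probability of at least k responses to n stimuli (cumulative form)."""
    return sum(prob_exactly(n, i, p) for i in range(k, n + 1))


def prob_defective(n, k, p):
    """Probability that a location fails a k/n pass criterion."""
    return 1.0 - prob_at_least(n, k, p)


# Tabulate a few points of the 1/2 and 3/5 curves (illustrative p values).
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(prob_defective(2, 1, p), 4), round(prob_defective(5, 3, p), 4))
```

Stopping early once three stimuli have been seen or missed does not change the classification, so the 3/5 curve computed for five presentations also applies to a test that presents only as many stimuli as needed.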
Evaluation by Computer Simulation
Computer simulations were performed to derive measures of test performance for the multisampling suprathreshold strategy, as well as for two established strategies for comparison (described in the Perimetric Strategies subsection).
The stimulus matrix used in all simulations corresponded to that of the 24-2 test of the Humphrey Field Analyzer (HFA; Zeiss-Humphrey Systems, Inc., Dublin, CA), excluding the two most nasal locations and that of the blind spot. At each of the 51 test locations, the average age-corrected sensitivity was calculated from normal control data (see the Patient Data section) to establish a reference field. Normal visual fields were simulated by varying the reference field through adding a constant value to the sensitivity at each location. This value was a random number from a normal distribution with a mean of 0.0 and a standard deviation of 1.5 dB. Point-wise normative limits for 5% pattern deviation probability were derived by simulating normal visual fields with the full-threshold strategy (empiric 5th percentiles of 10,000 simulated estimates). Pattern deviation was defined as the deviation of the test location relative to the reference field, corrected for overall elevation or depression (estimated by the 85th percentile of total-deviation values, i.e., the seventh least-defective location in the field).
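As a rough sketch of the field model described above, the following Python shows how a normal field and its pattern deviation values might be computed. The function names are hypothetical, the flat 30-dB reference field is a placeholder rather than the study's normative data, and taking the seventh-highest total-deviation value follows the text's description of the 85th percentile for a 51-location field.

```python
import random


def simulate_normal_field(reference, sd_db=1.5, rng=random):
    """Shift the whole reference field by one normally distributed offset (dB)."""
    offset = rng.gauss(0.0, sd_db)
    return [r + offset for r in reference]


def pattern_deviation(field, reference, rank=7):
    """Total deviation corrected for overall elevation or depression.

    The correction is the rank-th least-defective (i.e., highest)
    total-deviation value; rank=7 is assumed for a 51-location field.
    """
    total_deviation = [f - r for f, r in zip(field, reference)]
    shift = sorted(total_deviation, reverse=True)[rank - 1]
    return [td - shift for td in total_deviation]


# Placeholder reference field: 51 locations at 30 dB (not the study's data).
reference = [30.0] * 51
field = simulate_normal_field(reference, rng=random.Random(1))
print(pattern_deviation(field, reference)[:3])
```

Because the simulated normal field differs from the reference only by a constant, its pattern deviation values are zero up to rounding; a localized defect would appear as a negative value at the affected locations only.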
The simulation software was written in commercial software (Delphi, ver. 4; Borland Software Corp., Scotts Valley, CA). In simulating a visual field test, a perimeter unit would repeatedly pass references of a stimulus (i.e., location and intensity) to a patient unit, which would then return a "seen" or "not seen" response, based on sensitivity and response variability at the given location and on the likelihood of a false-positive or -negative patient error.
The model for response variability used in the simulations has been described previously.17 It was established from frequency-of-seeing (FOS) data, from which sensitivity (50%-seeing threshold, in decibels) and response variability (slope of FOS curve, standard deviation in decibels) parameters were estimated by probit analysis. The log-transformed response variability varied linearly with sensitivity, independent of stimulus eccentricity and age.
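A patient unit of this kind could be sketched as follows. This is an illustration under stated assumptions, not the published model: the FOS curve is taken to be a cumulative Gaussian (consistent with probit analysis), the way false-positive and false-negative rates are combined with it is an assumption, and the standard deviation is passed in by the caller because the coefficients of the log-linear relationship are not given here. Stimulus levels follow the perimetric convention that higher decibel values denote dimmer stimuli.

```python
import math
import random


def prob_seen(stimulus_db, sensitivity_db, sd_db, fp_rate=0.0, fn_rate=0.0):
    """Probability of a 'seen' response to a stimulus at stimulus_db."""
    z = (sensitivity_db - stimulus_db) / sd_db
    # Cumulative Gaussian frequency-of-seeing curve (0.5 at threshold).
    p_fos = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # Assumed error model: errors compress the curve toward fp_rate and 1 - fn_rate.
    return fp_rate + (1.0 - fp_rate - fn_rate) * p_fos


def patient_response(stimulus_db, sensitivity_db, sd_db,
                     fp_rate=0.0, fn_rate=0.0, rng=random):
    """Draw a single simulated response (True = seen)."""
    p = prob_seen(stimulus_db, sensitivity_db, sd_db, fp_rate, fn_rate)
    return rng.random() < p
```

For example, `prob_seen(25, 30, 2)` describes a stimulus 5 dB brighter than threshold at a location with a 2-dB response variability, which is seen on almost every presentation.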
Perimetric Strategies
Both suprathreshold strategies were threshold related; that is, the stimulus intensities were adapted to the general height (GH; defined as the 85th percentile of all sensitivity estimates in a visual field) of the simulated patient's visual field, using an algorithm designed to give more precise estimates than that used in the suprathreshold programs of the HFA.18 19 In brief, six stimuli were presented in a 1-dB up-down staircase at each of four midperipheral seed locations, and the GH was estimated from an average of the four last stimulus levels. To guard against the influence of localized and diffuse losses affecting the seed locations, only staircases with two or more reversals were averaged. The test intensity would be set to the age-setting if fewer reversals had occurred. The suprathreshold increment was 5 dB in both suprathreshold strategies.
Multisampling Suprathreshold Strategy.
Based on the analysis of classification curves, a criterion of three seen or three missed stimuli was chosen to classify a location as normal or defective, respectively. This requires between three and five stimuli to be presented at each location (pass criterion 3/5).
Conventional Suprathreshold Strategy.
At each location, a single stimulus is presented. If there is no response, the presentation is repeated, and locations are classified as defective only if both presentations were missed (pass criterion 1/2).
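Both suprathreshold pass criteria can be expressed as one sequential stopping rule. The sketch below is illustrative only; `classify_location` and its helpers are hypothetical names, and `respond` stands for any callable that returns True when the simulated patient reports the stimulus.

```python
def classify_location(respond, need_seen, need_missed):
    """Present stimuli until enough are seen (normal) or missed (defective).

    Returns the classification and the number of presentations used.
    """
    seen = missed = 0
    while seen < need_seen and missed < need_missed:
        if respond():
            seen += 1
        else:
            missed += 1
    label = "normal" if seen >= need_seen else "defective"
    return label, seen + missed


def multisampling(respond):
    """Pass criterion 3/5: three seen or three missed stimuli."""
    return classify_location(respond, need_seen=3, need_missed=3)


def conventional(respond):
    """Pass criterion 1/2: one seen stimulus, or two missed."""
    return classify_location(respond, need_seen=1, need_missed=2)


def scripted(responses):
    """Helper for demonstrations: replay a fixed sequence of responses."""
    return iter(responses).__next__


print(multisampling(scripted([True, False, True, False, True])))
print(conventional(scripted([False, False])))
```

Writing both strategies against the same function makes the difference explicit: only the two counts that end the test at a location change.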
Full-Threshold Strategy.
The full-threshold strategy1 is a staircase algorithm with a step size of 4 dB, reduced to 2 dB after the first response reversal. Sensitivity is estimated by the intensity of the last seen stimulus after the second reversal. Similar to the implementation in the HFA, our simulations commenced testing at four seed locations in the midperiphery of the visual field, and the remaining staircases were started 3 dB above or below the expected value (estimated as the average sensitivity at neighboring points). For consistency between the three strategies, we did not, however, repeat staircases when estimates differed by more than 4 dB from the expected value, and no double determinations were performed to estimate the short-term fluctuation.
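A single 4-2 dB staircase of the kind described above might look like the following sketch. It is not the HFA implementation or the authors' code: the function name, the 0 to 40 dB stimulus range, and the handling of locations where nothing is seen are assumptions, and seeding from neighboring points is left out. Higher decibel values denote dimmer stimuli.

```python
def full_threshold_staircase(respond, start_db, min_db=0, max_db=40):
    """Run one 4-2 dB staircase; return the last seen level (None if never seen).

    respond(level_db) must return True if the stimulus is reported as seen.
    """
    step = 4
    level = start_db
    last_response = None
    last_seen = None
    reversals = 0
    while True:
        seen = respond(level)
        if seen:
            last_seen = level
        if last_response is not None and seen != last_response:
            reversals += 1
            if reversals == 1:
                step = 2  # finer steps after the first reversal
            if reversals == 2:
                break  # staircase ends at the second reversal
        last_response = seen
        # Assumed safeguard: stop when the instrument range is exhausted.
        if (seen and level >= max_db) or (not seen and level <= min_db):
            break
        level = min(max_db, level + step) if seen else max(min_db, level - step)
    return last_seen


# Idealized observer with a sharp 27-dB threshold and no response variability.
print(full_threshold_staircase(lambda db: db <= 27, start_db=25))
```

With a deterministic observer the estimate depends only on the starting level; in a simulation, `respond` would instead draw from the frequency-of-seeing model.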
Patient Data
The three samples of visual field data used in this study were as follows: (1) Data from normal control subjects were used to establish a normal reference field of age-normal sensitivity estimates on which normal visual fields were modeled. (2) Test-retest full-threshold results from patients with glaucoma (complete glaucoma sample) were used to establish the validity of the response variability model on which our simulations are based.17 (3) From the latter sample, we selected results with repeatable early to moderate defects (glaucoma subsample) as end points for simulations to investigate the sensitivity of the three strategies to increasing levels of visual field loss.
Normal Control Sample.
This sample consisted of 109 full-threshold tests from 109 normal control subjects from a prospective study at the Department of Ophthalmology, Dalhousie University, Halifax, Canada.20 Inclusion criteria were normal findings in an ocular examination, visual acuity of 6/9 or better, a negative family history of glaucoma, intraocular pressure 19 mm Hg or less and previous experience with full-threshold perimetry.
Complete Glaucoma Sample: Normal to End-Stage Glaucomatous Field Loss.
Test-retest full-threshold data from 190 glaucoma patients were collected at the Manchester Royal Eye Hospital.21 Patients were included if the visual acuity was 6/18 or better, and if at least one full-threshold test had been performed earlier. A history of unreliable test performance did not lead to exclusion. Patients underwent two HFA 24-2 full-threshold tests, administered by the same examiner using the same instrument, during separate sessions within 2 weeks. Both eyes were examined in 152 patients and one eye in 38 patients, yielding 342 pairs of test-retest results. The mean deviation (MD) ranged from +2.6 dB to -28.9 dB (median, -3.0 dB). We ignored a small learning effect (test-retest MD difference, mean -0.4 dB; 95% confidence interval [CI], -0.5 to -0.2 dB).
Glaucoma Subsample: Early to Moderate Visual Field Loss.
A subsample of 113 test-retest pairs from 90 patients was selected from the complete glaucoma sample. Tests were included if, on both tests, the MD was better than -10.0 dB and the glaucoma hemifield test (GHT) result was either outside normal limits on both occasions (n = 71) or outside normal limits on one occasion and borderline or "general reduction of sensitivity" on the other (n = 42). The MDs ranged from +1.5 to -9.5 dB (mean, -4.2 dB). Figures 1a, 1b, and 1c show examples of either end of this spectrum.
Simulation 2: Comparing the Tests' Sensitivity to Localized Field Loss.
Each field of the subsample (averaged pairs with early to moderate field loss) was set to represent the end point (loss stage 1) of a deteriorating series of fields. The start field (loss stage 0) was a normal field with the same GH as the end point (loss stage 1). Between series, the GH of the fields varied in the same way as for the simulated normal fields from which the normative pattern deviation limits had been generated. Intermediate levels of loss (0.1, 0.2, ..., 0.9) were calculated by linear interpolation between the sensitivities of the starting and ending fields. For each level of loss, we simulated 3000 test results with each strategy. Criteria for abnormality were set to give specificities above 95% when normal fields were simulated (Table 1).
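The construction of a deteriorating series can be illustrated with a short sketch. The function names and the two-location example fields are hypothetical; only the point-wise linear interpolation between a start field (stage 0) and an end field (stage 1) is taken from the description above.

```python
def interpolate_field(start_field, end_field, stage):
    """Point-wise linear interpolation of sensitivities (dB) at a loss stage in [0, 1]."""
    if not 0.0 <= stage <= 1.0:
        raise ValueError("stage must lie between 0 and 1")
    return [s + stage * (e - s) for s, e in zip(start_field, end_field)]


def loss_series(start_field, end_field, steps=10):
    """Fields for loss stages 0, 0.1, ..., 1 when steps is 10."""
    return [interpolate_field(start_field, end_field, i / steps)
            for i in range(steps + 1)]


# Toy example: one location losing 10 dB, one unaffected location.
for field in loss_series([30.0, 28.0], [20.0, 28.0]):
    print(field)
```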
Simulation 3: Estimating Defect Area.
Clustered visual field defects (depth, 8 and 16 dB), extending over 1 to 25 locations of the stimulus matrix, were superimposed on simulated normal visual fields. Areas of loss did not extend across the horizontal midline. For each defect size, the median and the 5th and 95th percentiles of the estimated number of defective test locations (pattern deviation probability beyond the 5% level) were derived from 1000 simulations.
| Results |
|---|
The performance of a near-ideal paradigm is illustrated by the family of classification curves in Figure 2a, derived from the binomial distribution with n = 1000 presentations and pass criteria (k) of 250, 500, and 750. At normal test locations, where the probability of a response is high, the likelihood of being classified as defective is virtually zero. However, at damaged test locations, where the probability of a response is lower than some arbitrary choice of cutoff, the classification curves rapidly approach 1. These curves suggest that the near-ideal paradigm provides for a highly sensitive and highly specific test. Its outcomes would be robust to large proportions of false-negative and -positive response errors and independent of the response variability of the visual field. Such performance, however, is far removed from that attainable in clinical perimetry, in which the number of responses collected is limited by the subject's attention span, the time available for examinations, and the number of test locations.
The diagonal lines in Figures 2a to 2e represent the classification curve of a suprathreshold test based on presentation of only a single stimulus (pass criterion 1/1). This pass criterion fully reflects the response variability of the visual field location (slope of the psychometric function), and it is equally affected by false-positive and -negative errors on the part of the patient. To avoid false-positive test results, the conventional suprathreshold strategy repeats presentations if a stimulus has been missed (pass criterion 1/2). The classification curve of this criterion (Fig. 2b, lower curve) is convex downward. Toward the right limit of the abscissa, the 1/2 curve runs very low, reflecting the fact that this criterion is fairly tolerant against occasional false-negative response errors. However, the 1/2 curve runs below the diagonal even at low response probabilities (toward the left limit of the abscissa, i.e., where genuine visual field loss is likely), illustrating that the tolerance to occasional lapses (specificity) is obtained at the cost of sensitivity. This particular tradeoff would be reversed if the 2/2 criterion were chosen (Fig. 2b, upper curve); that is, by presenting two stimuli at each location and by flagging as defective any test location at which one or both presentations were missed. Such a test would have high sensitivity, but poor specificity.
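The relationship between these three curves can be made explicit with a short worked calculation. The symbol $D_{k/n}(p)$ is introduced here only for illustration and denotes the probability that a location is classified as defective under pass criterion k/n when each stimulus is seen with probability p; it is assumed that this is the quantity plotted on the ordinate.

```latex
\begin{aligned}
D_{1/1}(p) &= 1 - p            && \text{(the single stimulus is missed)}\\
D_{1/2}(p) &= (1 - p)^2        && \text{(both presentations are missed)}\\
D_{2/2}(p) &= 1 - p^2          && \text{(at least one of two presentations is missed)}\\[4pt]
(1 - p)^2 \;&\le\; 1 - p \;\le\; 1 - p^2 && \text{for } 0 \le p \le 1
\end{aligned}
```

At a normal location with p = 0.9, these expressions give 0.10, 0.01, and 0.19 for the 1/1, 1/2, and 2/2 criteria, respectively; at a damaged location with p = 0.3 they give 0.70, 0.49, and 0.91. The 1/2 criterion therefore lies below the diagonal everywhere and the 2/2 criterion above it, which is the tradeoff between specificity and sensitivity described in the text.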
The classification curves steepen when more stimuli are presented, reflecting the inverse relationship between sample sizes and the variability of estimates (Figs. 2c, 2d, 2e). Although several different pass criteria could have been chosen to increase the sensitivity beyond that achieved with the conventional suprathreshold criterion (e.g., 2/3, 3/4, 4/5), the 3/5 criterion was the most attractive. The lower extreme of its classification curve was almost indistinguishable from that of the 1/2 criterion (promising good specificity), whereas its upper part was substantially higher (promising better sensitivity). Although the classification curve of the 2/3 criterion appeared similar, its lower extreme ran a little higher, suggesting a slightly lower specificity (Fig. 2f). We therefore chose to explore the performance of the 3/5 criterion in our simulations.
Results of Simulation Experiments
Results of Simulation 1: Validating the Model.
Figure 3 shows the test-retest intervals of full-threshold perimetry, established both from real data and from simulated test results. The variability of full-threshold estimates is well documented. Whereas the intervals are moderate for high sensitivities, they increase considerably when sensitivity is lower.2 5 Although the simulations tended to overestimate variability of high sensitivity estimates and to underestimate variability in the midrange, the good overall agreement between real and simulated intervals validated the response behavior model of our simulations.
Results of Simulation 3: Estimating Defect Area.
Figure 5 shows the variability (length of the reference intervals) and the systematic error (position of the median with respect to the diagonal) in estimates of defect area with the three strategies, for enlarging defects of constant depth.
| Discussion |
|---|
The sensitivity of a diagnostic test is closely related to its variability in normal and abnormal populations. To achieve a high level of specificity, a variable test requires a strict criterion for what constitutes a positive result, which in turn reduces its sensitivity. The disparity between the performance curve of the ideal scenario (no response variability and no between-subject variability in the shape of the visual field) and those of the three strategies highlights the delay between the occurrence of visual field loss and its reliable detection by clinical tests. In a visual field test, abnormality is statistically detectable only if it exceeds the noise level of variability inherent in psychophysical examinations.22 Current techniques for analysis of threshold data, for example, rely on statistical significance testing to classify individual test locations for total and pattern deviation probability maps. The simulations show that this approach consistently underestimates the area of localized visual field loss.
Threshold tests demand a high level of attention, are perceived as demanding by novice subjects, and usually require some level of training before reliable results are obtained.6 7 8 9 Although it might be speculated that suprathreshold tests are somewhat easier for the patient to perform, the conventional suprathreshold strategy demands only a single stimulus to be seen for a location to be classified as normal. It is therefore highly sensitive to false-positive patient errors, and the small samples of responses collected lead to unacceptable test variability when fields are defective.16
The simulations support our hypothesis that, with increased sampling, a suprathreshold strategy may perform as well as a threshold test in detecting visual field loss and perhaps is better at quantifying its spatial extent. By sacrificing information on the depth of a defect (which is difficult to obtain, owing to the large response variability in damaged areas), multisampling suprathreshold strategies may obtain better information on its size and location. This performance gain with respect to the conventional suprathreshold technique comes at the cost of a larger number of stimulus presentations. However, most tested locations (whether normal or defective) will only require three presentations, and the largest number of presentations (5) of the 3/5-multisampling criterion sets an upper limit that compares well with modern threshold techniques.
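The number of presentations that the 3/5 criterion needs at a location can be worked out directly from the stopping rule. The sketch below is an illustration, not a result from the study: the function names are hypothetical, and responses are assumed to be independent with a fixed probability p of a "seen" response.

```python
from math import comb


def presentation_distribution(p, need=3):
    """Probability that testing stops after n presentations, for n = need .. 2*need - 1.

    Testing stops at presentation n when the n-th response completes `need`
    seen (or `need` missed) responses, the previous n - 1 responses having
    contained exactly need - 1 of that kind.
    """
    q = 1.0 - p
    dist = {}
    for n in range(need, 2 * need):
        ways = comb(n - 1, need - 1)
        dist[n] = ways * (p**need * q ** (n - need) + q**need * p ** (n - need))
    return dist


def expected_presentations(p, need=3):
    """Mean number of presentations per location."""
    return sum(n * pr for n, pr in presentation_distribution(p, need).items())


for p in (0.05, 0.5, 0.95):
    print(p, presentation_distribution(p), expected_presentations(p))
```

Under these assumptions, locations where the response probability is close to 0 or 1 are usually resolved after three presentations, whereas locations with response probabilities near 0.5 most often need four or five.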
Simulations often simplify reality. Their simplicity confers both benefit and bane. A system may be studied quickly and in the absence of confounding factors, but the results may never reflect the entire complexity of real life. For example, our model of response variability did not account for the possible increase in variability between test sessions. Because it was based on constant-stimulus data of patients who had previously met the conventional reliability criteria (<33% false-positive and -negative response errors, <20% fixation losses), we may underestimate the variability in patients who respond poorly, or overestimate the variability inherent in suprathreshold paradigms. Furthermore, there were minor differences between the implementations of the full-threshold strategy in our simulation and in the HFA. However, the good agreement between simulated and real patient data suggests that our simulations account for most of the variability observed in real visual field tests and that different implementations of the full-threshold strategy do not greatly influence the results. Although several assumptions made for simulating normal and damaged visual fields (no variations in the shape of the visual field in normal subjects, purely localized visual field loss) are not met in clinical populations,23 24 they were identical for all simulated strategies and are therefore unlikely to affect any comparison between them.
Our simulations explored the detection and quantification of localized loss. Diffuse loss, even though it may be common in glaucoma, is not pathognomonic to the disease. By design, threshold-related suprathreshold paradigms are insensitive to diffuse visual field loss, because the test intensity is adapted (within given limits) to the estimated general height of the visual field. However, the GH estimate is itself an important index for diffuse reduction of visual field sensitivity.
Computer simulations cannot replace clinical trials, but they are well suited to precede and complement them. Our results indicate that multisampling suprathreshold techniques may be a powerful alternative to other, established strategies of visual field examination. Clinical trials are now needed to verify these results with real patients.
| Acknowledgements |
|---|
| Footnotes |
|---|
Submitted for publication October 8, 2002; revised November 16, 2002; accepted January 10, 2003.
Disclosure: P.H. Artes, None; D.B. Henson, None; R. Harper, None; D. McLeod, None
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Corresponding author: Paul H. Artes, Department of Ophthalmology, Dalhousie University, QEII Health Sciences Centre, 1278 Tower Road, Halifax, Nova Scotia, B3H 2Y5 Canada; paul_h_artes{at}yahoo.co.uk.