From Discoveries in Sight, Devers Eye Institute, Portland, Oregon.
| Abstract |
|---|
METHODS. The right-eye perimetric results of 100 subjects were analyzed. Subjects had visual acuities of 6/12 or better, no history of eye disease, and normal slit lamp biomicroscopic and ophthalmoscopic examinations. Subjects performed test-retest visual field examinations on a Humphrey Field Analyzer (HFA) 24-2 test (Zeiss Humphrey Systems, Dublin, CA), and on a custom frequency-doubling (FD) perimeter with targets spaced in the same 24-2 pattern.
RESULTS. Test-retest correlations (Spearman rank correlation coefficients, r_s) for mean defect (MD) and pattern SD (PSD) were 0.65 and 0.40 (HFA), and 0.82 and 0.39 (FD perimeter). Three subjects with HFA MDs in the lower 5% had similarly low MDs on retest, whereas no subject was common between the test and retest for the lower 5% of HFA PSD. Correlations between the HFA and FD test results were 0.41 (MD) and 0.05 (PSD). Based on these correlations, the bias introduced into perimetric probability limits was determined by using Monte Carlo simulations.
CONCLUSIONS. Although a criterion of a normal MD may produce a subpopulation with supernormal perimetric performance, a criterion of a normal PSD is less likely to do so. Also, a criterion on one test type is less likely to create a supernormal group on a different test type. The bias introduced into perimetric probability limits is small.
Investigators have stressed the importance of specifying study populations when evaluating clinical diagnostic tests,10 11 and subject inclusion criteria provide a means through which to do this. Inherent in the use of inclusion criteria for subjects in a study of normal observers is the assumption that a classification of "normal" is equivalent to that of "disease-free." Unfortunately, "normal" and "disease" may be part of a continuum, as in a disease process such as hypertension, thereby making the distinction between the two categories less clear. To avoid making this distinction, it is possible to create a perimetric database without any specified inclusion criteria for subjects. Ignoring the possible effects of unintentional recruitment bias,12 the resultant probability limits give the likelihood of a particular index value arising from the population as a whole (i.e., disease and disease-free observers), rather than from a group of normal observers. Such limits, however, would have a reduced sensitivity for detecting ocular disease, particularly once the prevalence of disease rose above the probability limit defining an abnormal result. Because the prevalence of glaucoma alone reaches above 5% (a commonly accepted limit for "normality") in older populations,13 the use of a criterion-determined normal population to create perimetric databases is important for maximizing a test's sensitivity for detecting ocular disease.
The use of inclusion criteria, however, means that the resultant database no longer reflects the performance of the general population, but rather that of a criterion-determined subpopulation.12 Using a perimetric-based criterion raises an interesting question, however: Is it appropriate to use an inclusion criterion (perimetric performance) that is based on the variable for which normal limits are being determined? In particular, what is the meaning of normative probability limits of 2%, 1%, and 0.5%, when they are based on a group from which the lowest 5% was removed? For example, using 5% probability levels as a criterion for normality results in a database that contains only the top 95% of normal performers. Furthermore, requiring subjects in the database to have two normal indices (e.g., MD and PSD) at the 5% level makes things worse, resulting in only the top 90% (0.95 squared, assuming complete independence of the indices) of performers. Such a database would produce a high false-positive rate for detecting abnormal visual fields, as the subject group who formed the database had perimetrically "supernormal" performance.
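As a worked check of the arithmetic in this example, assuming the two indices are fully independent and each is screened at the 5% level, the retained and excluded proportions are:

```latex
\[
0.95 \times 0.95 = 0.9025, \qquad 1 - 0.9025 = 0.0975
\]
```

That is, about 9.75% of otherwise normal observers would be excluded, leaving roughly the top 90% of performers.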
This example assumes that an otherwise normal observer with an abnormal MD or PSD always returns an abnormal index on subsequent testing. This is unlikely to be true, however. Variability in both indices would result in a variety of subjects periodically returning abnormal fields, rather than a fixed 5% of the otherwise normal population.
Furthermore, even if perimetric indices could be determined without variability, the supernormal phenomenon should only manifest if there is a good correlation between the perimeter used for the inclusion criteria and the perimeter whose normative database is being created, in normal observers. This correlation is distinct from that which exists between two tests in subjects with ocular disease14 15 16 and is likely to be lower, given the comparatively restricted range of test indices returned by normal observers.17 Although it may be expected that good correlation exists among perimeters that have similar test parameters, this may not be true among perimeters designed to measure different visual functions (e.g., frequency-doubling [FD] perimetry,18 or SWAP3). Previous work has failed to find a significant correlation between the MD index for conventional increment-threshold perimetry and FD perimetry in a group of normal observers, despite the presence of a strong correlation when a similarly sized group of glaucomatous observers was used.15
Therefore, two important factors determine whether using a perimetry-based inclusion criterion generates a supernormal group of perimetric observers: the variability of perimetric indices in normal observers and the correlation between the perimetric indices of two tests (the established perimetry test and the new perimetry test) in normal observers. In the current study, we investigated how well a normal subject's perimetric performance predicts his or her performance on subsequent perimetric examinations. In addition, we examined the correlation between the performance of normal observers for two different types of perimetry: increment-threshold perimetry (HFA) and FD perimetry.19 Based on our empirical findings, we performed a Monte Carlo investigation of the effects of inclusion criteria on perimetric normative database probability limits.
| Materials and Methods |
|---|
For experiments 1 and 2, we analyzed the right eye results of 100 subjects recruited from staff, associates, and patients of the Ophthalmology Department of the University of California, Davis. All subjects had normal biomicroscopic and ophthalmoscopic examination results, visual acuities equal to or better than 20/40 (6/12), spectacle refraction not greater than ±5.00 DS and ±2.00 DC, and no history of ocular disease or systemic disease known to affect vision, and none was taking any medications known to affect vision. Subjects had at least one prior visual field examination (HFA II 24-2 full threshold) at a session separate from the main study, at which time both MD and PSD indices were normal (P ≥ 5%). Because of this, our subjects were not naïve perimetric observers for the data presented in this study and so may not demonstrate the same improvement in performance with serial field testing (the "learning effect") expected in a naïve sample. The significance of this is discussed in the following sections.
If a perimetric test is used purely as a screening procedure on naïve subjects, then it would be appropriate for a learning effect to be accounted for so that false-positive results do not arise simply through subject inexperience. Such a test, however, would have its sensitivity to detect disease compromised, due to the increased variability in results (and, correspondingly, the increased width of the 95% limits) obtained with naïve observers.2 A test that is used primarily to monitor patients should not account for a learning effect, however, as it is expected that subjects taking the test are either not perimetrically naïve or are naïve and so may require training to achieve consistent results. We believe that it generally is undesirable to have a database that is influenced by a learning effect and that it is preferable to be aware that some naïve subjects may require training. Our approach is consistent with that used in the development of the Humphrey Visual Field Analyzer, in which only subjects experienced with perimetry were included. As noted by Heijl et al.,2 "... [I]f a model of the normal visual field were to be based on subjects without any previous experience in visual field testing, the normal variability would be very large and nonrepresentative for many clinically examined patients."
Subjects performed test and retest sessions (right and left eyes) on both the 24-2 HFA II (full threshold) and a customized FD perimeter using the same 24-2 test pattern,19 with testing spaced over four visits and interspersed with other perimetric tests (not analyzed in this study). We performed customized calculations20 of the indices MD and PSD for each eye and for each test type, for both test and retest sessions, using a linear model for the effect of aging2 19 and the formulas used by the commercial HFA device.21 Both test indices also are available on the newer SITA test algorithm for the HFA perimeter.9 Percentile limits were calculated empirically, using linear interpolation. Table 1 shows the distribution of subjects, by age.
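The empirical percentile limits mentioned above can be illustrated with a minimal sketch. It assumes that "linear interpolation" means interpolating between adjacent order statistics, which is the default behavior of NumPy's `percentile`; the exact scheme used in the study is not stated. The function name `empirical_limit` and the MD values are hypothetical.

```python
import numpy as np

def empirical_limit(index_values, percent):
    """Empirical percentile of a set of index values.

    np.percentile interpolates linearly between adjacent order
    statistics by default (an assumed stand-in for the study's method).
    """
    values = np.asarray(index_values, dtype=float)
    return float(np.percentile(values, percent))

# Made-up MD values in dB, for illustration only.
md_values = [-2.0, -1.0, 0.0, 1.0, 2.0]
limit_5 = empirical_limit(md_values, 5)  # value below which 5% of the sample falls
```

With a real sample, `empirical_limit(md_values, 5)` would give the index value used as the 5% probability limit.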
To generate two distributions with known correlation, we combined each element in two independent Gaussian distributions (X and Y; mean = 0, SD = 1) using the following equation:
$$Z = \alpha X + (1 - \alpha) Y \tag{1}$$
where $\alpha$ gives the proportion of each distribution contributing to Z and can vary between 0 and 1. It should be noted that $\alpha$ is not identical with Pearson's coefficient of determination ($r^2$) nor the Spearman rank correlation coefficient ($r_s$). The variance of the distribution Z is:
$$\sigma_Z^2 = \alpha^2 \sigma_X^2 + (1 - \alpha)^2 \sigma_Y^2 \tag{2}$$
where $\sigma_X^2$ and $\sigma_Y^2$ are the variances of distributions X and Y, respectively. The SD of distribution Z could then be normalized by dividing each element by the root of this variance:
$$Z_{(\mathrm{norm})} = \frac{Z}{\sqrt{\sigma_Z^2}} \tag{3}$$
Based on these equations, we produced a normally distributed set of test indices, X, and another set of normally distributed indices, Z(norm), with a known correlation between the two sets. We simulated 2000 indices for each of the two distributions X and Z(norm), using two combined multiplicative congruential random number generators, as implemented by Press et al.,23 giving a period of approximately 2.3 × 10^18. Serial correlations were removed by using a Bays and Durham shuffle.23
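The construction described above can be sketched as follows. The sketch assumes the mixing takes the linear form Z = alpha·X + (1 − alpha)·Y with a weight `alpha` between 0 and 1. It also swaps in NumPy's default generator for the combined congruential generator and shuffle of Press et al. The function names are hypothetical.

```python
import numpy as np

def correlated_indices(alpha, n=2000, seed=0):
    """Return X and a unit-SD mix of X and Y with weight alpha on X."""
    rng = np.random.default_rng(seed)  # not the Press et al. generator
    x = rng.standard_normal(n)         # mean 0, SD 1
    y = rng.standard_normal(n)         # independent of x
    z = alpha * x + (1.0 - alpha) * y
    # Variance of z when x and y each have unit variance.
    var_z = alpha**2 + (1.0 - alpha)**2
    return x, z / np.sqrt(var_z)

def expected_pearson(alpha):
    """Pearson correlation between X and Z(norm) implied by alpha."""
    return alpha / np.sqrt(alpha**2 + (1.0 - alpha)**2)

x, z_norm = correlated_indices(0.5)
sample_r = np.corrcoef(x, z_norm)[0, 1]
```

Under this assumed form, cov(X, Z) = alpha and SD(Z) = sqrt(alpha² + (1 − alpha)²), so the weight and the resulting Pearson correlation differ except at 0 and 1. This is consistent with the remark that the weight is not itself a correlation coefficient.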
| Results |
|---|
| Discussion |
|---|
5%) MD and/or PSD for their right and/or left eye on initial testing, as determined by the commercial HFA database, despite having normal (P ≥ 5%) indices at a prior screening visit (see the Methods section).
Based on our correlation findings, it is possible to predict by how much nominal probability limits would shift when a criterion of normal perimetric results is used in generating a perimetric database. We will take the example of generating a normative database for the FD perimeter, using only subjects with a normal (P ≥ 5%) MD index on the HFA. The correlation between the FD perimeter and the HFA was 0.41 (95% CI, 0.23 to 0.57; Fig. 4), suggesting that the 5% limits in our new database would actually exclude approximately 5.9% (95% CI, 5.5% to 6.7%) of the normal population not previously screened on the HFA (Fig. 5). Similarly, the 1% limits would exclude approximately 1.5% (95% CI, 1.2% to 1.8%) of the normal population. Such shifts are small compared with those expected if there were a perfect correlation between the two tests (9.75% and 5.95% for the 5% and 1% limits, respectively: see the Results section). As the correlation between any new test and the HFA should be less than the autocorrelation (i.e., test-retest) of the HFA, the test-retest correlation of the HFA should set an upper limit on what probability limit shifts are expected in practice. We found a test-retest correlation coefficient of 0.65 (95% CI, 0.52 to 0.76) for the HFA index MD (Fig. 2), which predicts an upper limit of 7.2% (6.4% to 7.9%) and 2.0% (1.7% to 2.6%) for the 5% and 1% probability limit shifts, respectively (Fig. 5). Given the uniformly lower correlations found for the index PSD, it is likely that a criterion based on PSD will cause a probability limit shift smaller than those expected with a criterion based on the index MD. It is possible that many or all the described probability limit shifts are smaller than those introduced by inexact modeling of the change in sensitivity with age19 or by assuming that the variance of sensitivity distributions is constant with age.2
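The shift in probability limits discussed above can be explored with a rough Monte Carlo sketch. It is not the study's simulation: it assumes a linear mix Z = alpha·X + (1 − alpha)·Y of two unit-variance Gaussians, sets the mix by Pearson rather than Spearman correlation, uses NumPy's default generator, and uses a much larger sample than the 2000 indices in the text. All function and parameter names are hypothetical.

```python
import numpy as np

def alpha_for_correlation(r):
    """Mixing weight giving Pearson correlation r (0 <= r <= 1) under the assumed mix."""
    return r / (r + np.sqrt(1.0 - r**2))

def excluded_fraction(alpha, criterion=0.05, limit=0.05, n=200_000, seed=0):
    """Fraction of the unscreened population falling below a limit
    derived only from subjects who passed the criterion test."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)                 # index on the criterion test
    y = rng.standard_normal(n)
    z = alpha * x + (1.0 - alpha) * y          # index on the new test
    z /= np.sqrt(alpha**2 + (1.0 - alpha)**2)  # normalize to unit SD
    keep = x >= np.quantile(x, criterion)      # drop the lowest 5% on the criterion test
    cutoff = np.quantile(z[keep], limit)       # nominal limit from the screened group
    return float(np.mean(z < cutoff))          # share of everyone below that limit

perfect = excluded_fraction(1.0)      # tests perfectly correlated
independent = excluded_fraction(0.0)  # tests uncorrelated
partial = excluded_fraction(alpha_for_correlation(0.65))
```

With perfect correlation the sketch should reproduce the 9.75% figure quoted in the text, and with no correlation the nominal 5%. Intermediate correlations fall between the two, though the exact values need not match the published ones, given the differences in method noted above.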
It could be argued that to exclude subjects with otherwise undetected disease reliably, it would be best to use a visual field index that has high test-retest repeatability. However, high repeatability will produce greater biases in the resultant database (Fig. 5). It must be remembered that the repeatability between normal observers, as assessed in this study, may not be the same as when repeatability is assessed in a group of normal and diseased observers.15 In particular, variability must be viewed in light of the range of index values encountered clinically. Because of this, a test index may show little or no repeatability among normal observers, but still be a useful diagnostic index provided diseased observers return test index values outside the normal range. For example, we found PSD to have poorer test-retest reliability than MD for normal observers, which could be interpreted to mean that PSD would be the poorer choice for detecting early disease. In contrast, though, previous work has found that PSD is superior to MD in detecting glaucomatous visual field damage24 and that PSD (in an analogous form, corrected loss variance) is superior to MD for detecting the onset and progression of glaucomatous visual field damage.25
If we use 5% limits of normality on conventional perimetry as a normative criterion, on average we expect to exclude 1 person in 20 from the database. If we use multiple test indices as a criterion for normality (for example, normal MD, PSD, and glaucoma hemifield test [GHT]), then the proportion of subjects rejected increases, and the specificity for disease detection decreases. By repeat testing of subjects with abnormal test indices, test specificity may be improved,20 26 and some of the subjects initially falling outside the normative criteria may become eligible for inclusion in the database. If retesting is allowed, however, it is important that this be noted in the eligibility criteria.
In conclusion, we find that using a criterion of normal perimetric performance is unlikely to result in large biases in perimetric normative databases, particularly if the criterion test differs in type from the test whose database is being produced. In addition, we find that criteria based on the index PSD introduce smaller biases than those based on the index MD. Based on these findings, we recommend that future developers of perimetric databases might further minimize biases by adopting liberal criteria for those reference test indices showing strong correlation with the new test, and relatively stricter criteria for those indices showing poor correlation. For example, in our study, a normative criterion of MD ≥ 1%, and a PSD ≥ 5% on the HFA may be more appropriate than setting ≥ 5% limits for both indices. Considering that the correlation between indices on the reference and new test typically is unknown when commencing data collection for a new database, it may be worthwhile to apply perimetric criteria post hoc once data collection is completed, correlation analyses performed, and the likely level of bias calculated.
| Footnotes |
|---|
Submitted for publication January 21, 2003; revised May 5 and June 25, 2003; accepted June 30, 2003.
Disclosure: A.J. Anderson, None; C.A. Johnson, None
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Corresponding author: Chris A. Johnson, Discoveries in Sight, Devers Eye Institute, 1225 NE Second Avenue, Portland OR 97232; cajohnso{at}discoveriesinsight.org.