From the ¹Department of Optometry and Visual Science, City University, London, United Kingdom; and the ²Glaucoma Research Unit, Moorfields Eye Hospital, London, United Kingdom.
| Abstract |
|---|
METHODS. RA noise was estimated from patient data and characterized by fitting theoretical distributions to the observed data. Multilevel regression was used to determine factors that significantly affect noise. Computer simulations of disease progression were performed by adding noise generated from the distribution derived from the observed data to the average rate of loss in RA estimated from longitudinal data. Rates of detection of disease progression were investigated for various progression rates, follow-up periods, and rates of imaging.
RESULTS. Noise was not normally distributed and was best characterized by the hyperbolic distribution, which fit averages well while allowing for extreme values. Noise was greatly influenced by image quality, but age did not have a significant effect. Rates of detection improved for more frequent imaging, better quality images, and faster rates of disease progression.
CONCLUSIONS. Noise in HRT measurement of RA is well characterized by the hyperbolic distribution. Sensitivity of detection improves with more frequent testing, but if consistently poor-quality images are yielded for a patient, the probability of detection is low. Results from this work could be used to tailor individual follow-up patterns for patients with different rates of RA loss and image quality, especially in a clinical trial setting.
In using instruments to assess visual structure and function, measurements are rarely made without error. Thus, variability is inherent in the measurement of RA and can be separated into variability resulting from true physiological change and variability resulting from measurement error. This measurement error (or noise) may be attributed to a range of factors, including patient characteristics (e.g., lens opacity8,9), machine characteristics (e.g., image alignment10), and operator characteristics (e.g., different operators,11 placement of the contour line outlining the optic disc margin12,13).
The practical consequences of "noisy" measurements of RA are clear: if a deterioration in RA is seen in the sequential imaging of glaucomatous eyes, how can the clinician determine whether this change is caused by worsening disease or by measurement error? Making an incorrect decision has consequences for the patient and the health provider: failing to detect true change (a false-negative) may leave the patient with worsening glaucoma that is undertreated, thus compromising his or her visual function. Alternatively, incorrectly diagnosing progression (a false-positive) may result in unnecessary and expensive intervention or treatment changes. Consequently, the major challenge in the quantitative evaluation of RA lies in distinguishing true change in RA from noise.14
The aims of this study were to use cross-sectional and longitudinal patient data to characterize the noise in HRT measurement of RA, to determine which factors significantly affect noise, and to use this information to investigate the optimum frequency of HRT imaging in follow-up for the reliable detection of glaucomatous disease progression.
| Methods |
|---|
Both studies used the HRT confocal scanning laser ophthalmoscope (Heidelberg Engineering) for acquisition of three-dimensional images of the posterior segment of the eye. Images were acquired as they are in the clinical setting, with technicians obtaining the best possible images and using the HRT software checks for image quality. All images were carefully inspected by one of the authors (NGS) for clinically apparent misalignment and were manually realigned if necessary. Mean images were generated and analyzed (Eye-Explorer software, version 1.7.0, and HRT Viewing Module, version 3.0.4.10; Heidelberg Engineering). The 320-µm reference plane was used for all analyses because it has been shown to result in measurements that are less variable than the standard reference plane.8 11 17 Briefly, the height of the standard reference plane can vary depending on the height of the contour line at the temporal optic nerve head margin, whereas the 320-µm reference plane is fixed and thus yields less variable morphometric data. Studies were performed in accordance with the tenets of the Declaration of Helsinki, informed consent was obtained from the participants, and the research was approved by the appropriate ethics committee.
Estimation of Noise
RA was calculated for the whole optic disc (global RA). The noise in RA measurements was estimated as follows:
Characterization of Noise
Theoretical distributions were fitted to the cross-sectional noise. In approximating medical data, the normal distribution is typically used because of its broad applicability and mathematical tractability. This is a symmetrical distribution (also referred to as the Gaussian distribution) whose shape is determined by two parameters, location (mean) and spread (standard deviation, SD). If it is assumed that measurements follow a normal distribution, then certain reasonably well-known properties are true, such as approximately two thirds of the data falling within a range ±1 SD from the mean. We also assessed the suitability of the hyperbolic distribution.18 This distribution belongs to the family of "stable" distributions, where stable refers to the property of distributions that retain shape when added together. These distributions generalize the normal distribution. They are more "peaked" (more observations fall close to the average than are seen in a normal distribution), and their tails are heavier than those of the normal distribution. These distributions are used widely in financial mathematics for modeling stable random variables with extreme values that occur more frequently than in the normal distribution. We hypothesized that the hyperbolic model would mimic the clinical observation of HRT measurements, in which most values are highly reproducible but in which noise sometimes increases dramatically because of image acquisition or processing difficulties. In contrast to the normal distribution, the hyperbolic distribution has four parameters: location, scale, peak, and symmetry. These parameters may be manipulated to give a family of distributions to fit data according to patient characteristics. The goodness-of-fit of these distributions was assessed by the Kolmogorov-Smirnov statistic (for which the null hypothesis states that the distribution fits the data).
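The goodness-of-fit comparison described above can be illustrated with a minimal sketch. It is not the authors' analysis: it uses only the Python standard library, substitutes a Laplace distribution as a convenient heavy-tailed stand-in for the hyperbolic distribution, and uses a hypothetical noise scale of 0.03 mm². It computes the Kolmogorov-Smirnov distance between each sample and a normal distribution fitted to that sample.

```python
import math
import random
import statistics

def normal_cdf(x, mean, sd):
    """CDF of a normal distribution with the given mean and SD."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov statistic: largest gap between empirical and model CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d

def laplace_noise(rng, scale):
    """Heavy-tailed stand-in for measurement noise (an assumption, not the paper's fit)."""
    # The difference of two exponentials with equal mean is Laplace-distributed.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def ks_against_fitted_normal(sample):
    """KS distance between a sample and the normal distribution fitted to it."""
    mean, sd = statistics.fmean(sample), statistics.stdev(sample)
    return ks_statistic(sample, lambda x: normal_cdf(x, mean, sd))

rng = random.Random(0)
heavy = [laplace_noise(rng, 0.03) for _ in range(2000)]  # hypothetical scale, mm^2
gauss = [rng.gauss(0.0, 0.03) for _ in range(2000)]      # truly normal comparison sample

d_heavy = ks_against_fitted_normal(heavy)
d_gauss = ks_against_fitted_normal(gauss)
print(f"KS distance to fitted normal: heavy-tailed {d_heavy:.3f}, Gaussian {d_gauss:.3f}")
```

A peaked, heavy-tailed sample sits further from its fitted normal curve than a genuinely normal sample does, which is the kind of misfit the Kolmogorov-Smirnov statistic is used to expose.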
The distribution that best described the test-retest noise was then validated by assessing its goodness-of-fit to the noise in the longitudinal data. This analysis was repeated for measurements of RA within the six predefined sectors of the optic disc: temporal, temporal superior, temporal inferior, nasal, nasal superior, and nasal inferior.
The relationship between the cross-sectional noise and potential predictive factors was investigated using regression methods to determine those factors that had a statistically significant effect on noise. Patient factors (age, sex, diagnosis, and lens opacity) and image characteristics (image quality, visit number, and operator) were considered. A measure of lens opacity was obtained using Scheimpflug lens photography (Marcher Case 2000 series; Marcher Diagnostics, Hereford, UK). This system produces a central nuclear dip (CND) value.19 CND is a measure of the density in the center of the lens nucleus that gives an objective assessment of the degree of nuclear opacification. Image quality was assessed by the SD of the topographic images, each of which comprises the mean of three single images. This SD is known as topographic SD or mean pixel height SD (MPHSD) and is the HRT manufacturer's index for image quality.20
The regression method used to model the data was multilevel modeling (MLM), a standard statistical technique used in the medical, social, and educational sciences.21 MLM is similar to ordinary multiple linear regression in that a model between a number of predictor variables and a single outcome variable may be developed and approximated by a straight line. Ordinary linear regression makes the assumption that all outcome observations are independent of each other, but in the cross-sectional data each patient contributes five deviations so that deviations are nested within patients and are thus not independent. A deviation-level analysis ignoring this clustering may result in the underestimation of the standard errors of regression coefficients, giving overly small P values, whereas a patient-level analysis (e.g., using average deviations) loses potentially valuable information.22 MLM adjusts for the hierarchical structure of the data, allowing for the correlation between deviations for each patient and explicitly modeling the way in which deviations are grouped within patients. Essentially, in MLM, patients are regarded as a (random) sample from the population of all patients, and inference is made about the variation between patients in general. Intercepts and slopes of the fitted regression lines can vary randomly between patients. Multilevel modeling was carried out with the use of a software package (MLwiN, version 2.01; Multilevel Models Project, Institute of Education, London, UK).23
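Why ignoring clustering understates standard errors can be shown with a small worked calculation. The sketch below is not the multilevel model fitted in MLwiN; it uses the standard design-effect formula for equal-sized clusters, 1 + (m − 1) × ICC, which strictly applies to the standard error of a mean. The intraclass correlation (ICC) values are hypothetical; only the cluster size of five deviations per patient comes from the text.

```python
import math

def design_effect(cluster_size, icc):
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc

def se_inflation(cluster_size, icc):
    """Factor by which a naive (independence-assuming) SE of a mean is too small."""
    return math.sqrt(design_effect(cluster_size, icc))

m = 5  # deviations per patient, as in the cross-sectional data
for icc in (0.0, 0.2, 0.5):  # hypothetical intraclass correlations
    print(f"ICC={icc:.1f}: design effect {design_effect(m, icc):.2f}, "
          f"naive SE too small by a factor of {se_inflation(m, icc):.2f}")
```

When deviations within a patient are uncorrelated the factor is 1 and nothing is lost by ignoring the clustering; as the correlation grows, the naive standard error becomes progressively too small.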
Frequency of Imaging in Follow-up
Computer simulations of disease progression in patients with glaucoma were performed by combining the best estimate of noise (i.e., the cross-sectional deviations) with the best estimate of noise-free progression. Noise-free progression was calculated by performing a regression of RA over time for all patients who converted to glaucoma in the longitudinal data set (n = 44) and taking the average of these regression slopes. Conversion to glaucoma was defined on the basis of VF change by AGIS criteria.15 16 One thousand "virtual" patients were simulated to have this rate of progression, to which noise generated from the distribution of noise observed in the test-retest data was added. Sensitivity and specificity of RA linear regression to disease progression, defined as the average slope, were calculated for a range of frequencies of imaging and lengths of follow-up. A test outcome positive for progression was defined as a negative regression slope of RA over time, with P < 0.05. Computer simulations were performed in the statistical programming language R, version 2.0.1 (The R Foundation for Statistical Computing, Vienna, Austria),24 and the R package HyperbolicDist25 was used to model the hyperbolic distribution.
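The simulation procedure described above can be sketched as follows. This is an illustrative reimplementation in Python rather than the authors' R code. It makes several labeled assumptions: Laplace noise stands in for the fitted hyperbolic distribution, the noise scale of 0.03 mm² is hypothetical, a baseline scan at time zero is included, and the P value is the two-sided P value of the slope. The rate of loss of 0.023 mm² per year is the upper-quartile rate reported in this article.

```python
import math
import random

def t_two_sided_p(t, df, steps=400):
    """Two-sided P value for a t statistic, by Simpson integration of the t density."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central)

def slope_test(times, values):
    """Ordinary least-squares slope and its two-sided P value."""
    n = len(times)
    tbar = sum(times) / n
    ybar = sum(values) / n
    sxx = sum((t - tbar) ** 2 for t in times)
    slope = sum((t - tbar) * (y - ybar) for t, y in zip(times, values)) / sxx
    intercept = ybar - slope * tbar
    rss = sum((y - intercept - slope * t) ** 2 for t, y in zip(times, values))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, t_two_sided_p(slope / se, n - 2)

def detection_rate(rate, scans_per_year, years, noise_scale, n_patients=1000, seed=1):
    """Fraction of virtual patients flagged as progressing at the end of follow-up."""
    rng = random.Random(seed)
    n_scans = scans_per_year * years + 1  # assumed baseline scan plus follow-up scans
    times = [i / scans_per_year for i in range(n_scans)]
    detected = 0
    for _ in range(n_patients):
        # Noise-free trend plus Laplace noise (stand-in for the hyperbolic model).
        ra = [rate * t
              + rng.expovariate(1 / noise_scale) - rng.expovariate(1 / noise_scale)
              for t in times]
        slope, p = slope_test(times, ra)
        if slope < 0 and p < 0.05:  # positive test: significant negative slope
            detected += 1
    return detected / n_patients

FAST_LOSS = -0.023  # mm^2 per year, upper quartile of loss reported in the text
NOISE = 0.03        # hypothetical noise scale, mm^2

yearly = detection_rate(FAST_LOSS, 1, 4, NOISE)
quarterly = detection_rate(FAST_LOSS, 4, 4, NOISE)
stable = detection_rate(0.0, 4, 4, NOISE)  # false-positive rate for stable disease
print(f"yearly {yearly:.2f}, quarterly {quarterly:.2f}, stable {stable:.2f}")
```

Because the noise model and scale are placeholders, the printed rates are not expected to match the detection rates reported in this article; the sketch only shows the mechanics of combining a fixed slope with sampled noise and testing the fitted slope.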
| Results |
|---|
Results of the multilevel regression analysis are summarized in Table 1, showing the statistically significant factors affecting the size of RA deviations. The factors shown not to affect noise were sex, diagnosis (POAG or OHT), visit number, operator, and, of particular importance, age. When fitted in a model as the only variable, MPHSD, age, and CND individually had a statistically significant effect on noise. However, when fitted together in a multiple regression model, image quality, as measured by MPHSD, had an overwhelming statistically significant effect on noise. An average increase in MPHSD of 1 µm produced an increase in noise of 0.0005 mm² (95% confidence interval [CI]: 0.0003–0.0007 mm²). Lens opacity, as measured by CND value, had a moderately strong effect. A unit increase in CND increased noise by 0.002 mm² (95% CI: 0.0005–0.0040 mm²). Age was not statistically significant in the multiple regression model.
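As a worked example of how these coefficients translate into differences between scans, the sketch below scales the reported estimates and confidence limits by a change in one predictor. The 30-µm difference in MPHSD (for instance, a scan at 20 µm versus one at 50 µm) is a hypothetical illustration, and the calculation assumes the fitted linear relationship holds across that range with the other predictor held fixed.

```python
# Coefficients reported in the text (multiple regression model), in mm^2 of noise:
# (estimate, lower 95% confidence limit, upper 95% confidence limit)
MPHSD_EFFECT = (0.0005, 0.0003, 0.0007)  # per 1-µm increase in MPHSD
CND_EFFECT = (0.002, 0.0005, 0.0040)     # per unit increase in CND

def noise_change(delta, effect):
    """Change in noise implied by a change `delta` in one predictor."""
    return tuple(delta * value for value in effect)

# Hypothetical comparison: MPHSD of 50 µm versus 20 µm, a 30-µm difference.
est, low, high = noise_change(30, MPHSD_EFFECT)
print(f"Extra noise: {est:.3f} mm^2 (95% CI {low:.3f} to {high:.3f} mm^2)")
```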
… good (≤30 µm), acceptable (31–50 µm), and unacceptable (>50 µm). These categories of MPHSD reflect the categories given in the HRT literature.20 Histograms of the observed longitudinal noise along with data generated from the hyperbolic and normal models are shown in Figure 4. The hyperbolic distribution provided a very close approximation to the data, whereas the normal model failed to predict the peakedness of the data or the long tails.
Figure 6 shows the cumulative detection rates for the 1000 virtual patients simulated using the noise characteristics yielded from the results reported. Column A shows detection rates assuming a zero rate of loss of RA (i.e., stable disease). Column B shows rates assuming the median rate of loss. Column C shows rates associated with a faster rate of loss: the upper quartile of loss (0.023 mm² per year, a loss of approximately 1.5% of an average normal RA per year). Simulations were repeated for the three categories of MPHSD: good (≤30 µm) shown in row I, acceptable (31–50 µm) shown in row II, and unacceptable (>50 µm) shown in row III (Fig. 6). As expected, these simulation experiments indicated that increasing the frequency of testing improved detection rates in patients with progressive disease at all lengths of follow-up and with all image qualities. For example, for a virtual patient with an average rate of RA loss and good-quality images, imaging once a year for 4 years (Fig. 6BI) resulted in a detection rate of 37%, whereas imaging four times a year for 4 years gave a more acceptable 78% detection rate. Of course, detection rates are better still in eyes with disease that progresses faster (Fig. 6CI). For the upper quartile of loss in RA, over a follow-up period of 4 years, imaging once a year will detect 71% and imaging four times a year will detect 98% of patients with progressive disease. Detection rates also improve as image quality improves. For example, imaging twice a year over a 5-year follow-up period will detect 98% of patients with fast-progressing disease and good-quality images (Fig. 6CI), 89% of those with acceptable-quality images (Fig. 6CII), and 56% of those with unacceptable-quality images (Fig. 6CIII). Column A of Figure 6 shows the percentage of virtual patients with nonprogressing disease incorrectly identified as progressing, giving an indication of the specificity of detection. Specificity deteriorates over time, more steeply for more frequent testing.
| Discussion |
|---|
When the cross-sectional noise was separated into the six segments of the optic disc, significant differences were clear in the spread of noise across the sectors. It has been suggested that early glaucomatous changes often result in narrowing of RA in the inferior and superior temporal sectors.33 Therefore, it is particularly important that any reduction in RA in these areas be reliably detected. We found the noise in these areas to be relatively small compared with the noise in the temporal and nasal sectors, which showed the greatest spread. Any RA changes occurring in these latter two sectors would therefore have to be of larger magnitude to be reliably detected. The results described in this study provide a foundation for developing a technique for detecting progression in sectoral RA. The differences in noise distribution in the different disc sectors cannot be explained by a relationship between RA and variability, whereby RA in more damaged discs is noisier than RA in discs with early damage. No relationship has been established between RA and variability,2 and no statistically significant differences have been found in the test-retest variability of HRT II stereometric parameters between glaucomatous and normal eyes.34
Modeling the relationship between noise and possible predictive patient or scan factors allows us to understand which patients are likely to have reliable (low noise) scans. Because the nature of the test-retest data (i.e., more than one measurement of RA per patient) violates the independence assumption of ordinary linear regression, we used multilevel techniques to account for this clustering in the data. This nonindependence is also true of the longitudinal data because patients underwent repeated imaging over time. MLM may also be used to model this sort of data structure. MLM is particularly appealing because the interpretation of the parameter estimates is similar to that of estimates arising from ordinary linear regression. Results from this analysis suggest MPHSD to be the factor with an overriding effect on noise to the exclusion of most other patient factors, including age. Thus one important clinical finding from this study is that useful scans can be obtained during follow-up of older patients with glaucoma, a significant finding given the high prevalence of glaucoma in the elderly population. In fact, taking MPHSD into account when interpreting changes in the RA of patients with POAG would remove much of the uncertainty in deciding how frequently and over how long a follow-up period to image. Noise was found to be less spread in images of better quality, enabling true change to be more easily distinguished from noise and requiring less frequent imaging for the reliable detection of disease progression in patients with high-quality images. This important finding should be incorporated into planned methods for detecting change in RA over time. Our measure of lens opacity, CND, had a statistically significant effect on noise independently of MPHSD; however, CND is primarily used as a research tool and is not readily available in the clinic.
Our computer simulation experiments of frequency of testing indicated that, in general, the sensitivity for detecting disease progression increased with more frequent testing, with testing over a longer follow-up period, and with better quality images. For example, if we consider a virtual patient progressing at an average rate, imaging twice a year over 4 years gives a sensitivity of 42% for good quality images and 29% for acceptable images. However, sensitivities of 61% and 36% are achieved by imaging four times a year over 4 years for good and acceptable quality images, respectively. Of course, faster rates of loss are detected with better precision; for a virtual patient whose disease is progressing at the upper quartile of loss, imaging twice a year over 4 years would give sensitivities of 86% and 64% for good and acceptable quality images, respectively, and imaging four times a year over 4 years would result in detection rates of 95% and 82%. In these analyses, the mean (in the cross-sectional sample) and regression line (of longitudinal sample) are only estimates of true RA and might have been biased, indicating that the deviances (or residuals) could have underestimated the true noise. We must also emphasize that the simulation experiments simply demonstrate how ordinary linear regression performs in the presence of measurement noise sampled from a hyperbolic distribution; of course, the process of fitting trend lines by the method of least squares assumes that the errors (more precisely, residuals from the fit) are normally distributed. Alternative methods for fitting a trend to a series of observations, in which the process considers these attributes of the data, may provide more accurate estimates of rates of loss but are the subject of future work. It is hoped that these might improve the diagnostic precision we report from the current computer experiments.
One important caveat regarding the assessment for progression at each point in time during repeated sequential imaging is that it results in deteriorating specificity analogous to an inflated type I error brought about by repeated statistical hypothesis testing. Corrective statistical methods are required to maintain an acceptable level of specificity throughout follow-up, and any method for detecting change should incorporate solutions for this. Additionally, further modifications may be carried out to reflect the relative importance of tests conducted over a fixed observation period, such as the duration of a clinical trial.
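The inflation of the false-positive rate under repeated testing can be illustrated with a short calculation. The sketch below makes two simplifying assumptions that do not hold exactly for sequential imaging: that the tests are independent (in practice each test reuses the earlier scans, so the results are correlated) and that each test is run at a nominal level of 0.05. The Bonferroni adjustment shown is one conservative, generic correction, not a method proposed in this article.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false positive across n independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test level that keeps the overall false-positive rate at or below alpha."""
    return alpha / n_tests

for n_tests in (1, 4, 8, 16):  # e.g., number of times a patient is assessed for change
    raw = familywise_error(0.05, n_tests)
    corrected = familywise_error(bonferroni_alpha(0.05, n_tests), n_tests)
    print(f"{n_tests:2d} tests: uncorrected {raw:.3f}, Bonferroni-corrected {corrected:.3f}")
```

The trade-off is that a stricter per-test threshold lowers sensitivity at each visit, which is why the choice of correction matters for the detection rates discussed above.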
As is customary in statistical methods, the computer simulations were based on average rates of RA loss, and this use of averages is often at odds with the needs of clinicians who necessarily think in terms of individual patients. However, it may be possible to tailor rates of progression and rates of imaging to individual patients. In a larger data set, patients may be divided according to their rates of loss and their values of MPHSD (the factor that determines the level of noise and thus the rate of imaging necessary to detect progression). The rates of change in RA used in our simulation experiments were based on data from patients with glaucoma that developed according to VF criteria.15 The patterns of VF change in glaucomatous progression are well documented, but given the lack of any criterion for progression and the measurement error inherent in perimetric assessment, these rates of change are necessarily approximations of any true underlying change.14
The value of the HRT for detecting glaucomatous progression will be realized as standards for specificity and optimal image acquisition frequencies are established. Alternative techniques for detecting glaucomatous progression in series of HRT images include topographic change analysis (TCA)35 36 and, more recently, statistical image mapping (SIM).37 These methods detect change at the pixel level (or group of pixels in TCA) rather than with summary measures such as RA. Change is evaluated within each patient, thus obviating the need for average measures of change and variability. The noise characteristics in RA measurements may also be apparent in these analyses; this is the subject of future work. Methods that make use of stereometric parameters such as RA still have a role in determining progression; they are clinically familiar, and change in an area is easier to grasp than change in topographic height. Summary measures of disc changes are useful for describing disease progression in large samples of patients in clinical trials. It is likely that the complete analysis of longitudinal HRT data may be best served by an amalgam of global analysis of changes in the optic nerve head coupled with techniques that can help the clinician visualize the localized areas of likely change.
In conclusion, we have established that the distribution of measurement error in HRT imaging of RA is best approximated by the hyperbolic distribution, thus allowing for computer simulations of progression and estimates of the sensitivity and specificity of detection of progression using RA. Issues concerning the attributes of noise may be relevant to other imaging modalities and other structural measures. Image quality is critical in terms of determining progression, and any method for detecting change must take this into account. Detection rates will improve with more frequent imaging, but techniques for correcting false-positive rates must also be applied. The results presented here will be used to develop statistical methods that will improve rates of detection and monitor change more reliably.
| Footnotes |
|---|
Submitted for publication January 30, 2006; revised June 1 and July 10, 2006; accepted October 16, 2006.
Disclosure: V.M.F. Owen, None; N.G. Strouthidis, Heidelberg Engineering (F); D.F. Garway-Heath, Carl Zeiss Meditec (C, F), Heidelberg Engineering (F); D.P. Crabb, None
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Corresponding author: David P. Crabb, Reader in Statistics and Measurement in Vision, Department of Optometry and Visual Science, City University, Northampton Square, London EC1V 0HB, UK; d.crabb{at}city.ac.uk.