|
|
||||||||
1From the School of Biomedical and Natural Sciences, The Nottingham Trent University, Nottingham, United Kingdom; and the 2Glaucoma Research Unit, Moorfields Eye Hospital, London, United Kingdom.
| Abstract |
|---|
|
|
|---|
METHODS. Proven quantitative techniques, collectively referred to as statistic image mapping (SIM), are widely used in neuroimaging. These techniques are applied to HRT images. A pixel-by-pixel analysis of topographic height over time yields a statistic image that is generated by using permutation testing, derives significance limits for change wholly from the patients own data, and removes the need for reference data sets. These novel techniques were compared to the Topographic Change Analysis (TCA super-pixel analysis) available in the current HRT software, by means of an extensive series of computer experiments. The SIM and TCA techniques were further tested and compared to linear regression of rim area (RA) against time, in real longitudinal HRT series of eyes of 20 normal subjects and 30 ocular hypertensive (OHT) patients that were known to have converted to glaucoma, on the basis of visual field criteria.
RESULTS. Computer simulation indicated that SIM has better diagnostic precision at detecting change. In the real longitudinal series, SIM flagged false-positive structural progression in two (10%) of normal subjects, whereas TCA identified three (15%), and linear regression of RA against time identified two (10%). SIM identified 22 (73%) of the OHT converters as having structural progression, whereas the TCA and linear regression of RA against time each identified 16 (53%) over the course of the follow-up.
CONCLUSIONS. SIM has better diagnostic precision in detecting change in series of HRT images when compared to current quantitative techniques. The clinical utility of these techniques will be established on further longitudinal data sets.
An alternative approach, devised by Chauhan et al.,16 17 considers change over time at the level of groups of pixels within the image: the Topographic Change Analysis (TCA). Now included in the HRT software, this technique divides the image into a 64 x 64-superpixel array. (Each superpixel is 4 x 4 pixels, thus containing 16 pixels.) Change in topographic height in superpixels is quantified with a standard statistical method comparing a set of baseline images to the most recent follow-up images.16 17
In neuroimaging, positron emission tomography (PET) or magnetic resonance imaging (MRI) scans yield a sequence of 3-D images of the subjects brain from which the temporal and spatial characteristics of neuronal activity can be deduced. In the case of MRI, for example, this is done by measuring changes in cerebral blood oxygenation related to brain activity. The images are complex and high-dimensional, typically containing as many as 100,000 measured volume elements or voxels (3-D pixels). Consequently, the neuroimaging research community has been forced to develop an extensive suite of techniques to register, align, process, and analyze arrays of imaging data.18 We propose to exploit part of this catalog of proven methods, specifically the techniques collectively referred to as statistic image mapping (SIM) which are used for determining areas of activity and change in series of MRI- and PET-type images, by applying them to series of retinal and optic nerve head topography images. In particular, we use a nonparametric version of these techniques. These are intuitive to understand, and assessment of change in the image is based solely on the subjects own data and within-subject image variability, rather than any a priori information or patient population characteristics.
The purpose of this study is to describe and apply SIM techniques to HRT images. We also evaluated the performance of this new statistical approach by comparing it to the TCA method currently made available on the HRT software. We did this by means of an extensive series of computer simulation experiments that used a novel technique for generating simulated series of stable and progressing "virtual patient" HRT images, in which noise, typical of the misalignment inherent in serial topographical images, was mimicked. In addition, we applied SIM techniques to longitudinal sets of real HRT data from patients and normal subjects, and made comparisons with the TCA method and trend analysis of HRT rim area measurements.
| Methods |
|---|
|
|
|---|
Statistic Image Mapping
The methods take advantage of proven statistical techniques that have been developed to analyze series of MRI and PET images. These analyses usually proceed by computing a statistic at the pixel level (or the voxel in the case of MRI and PET images), indicating evidence for the observed effect of interest, and resulting in an "image of statistics," or a statistic image map. The entire statistic image must be assessed for significant effects, using a method that accounts for the inherent multiplicity involved in testing all the pixels at once. This analysis can be accomplished in a classic parametric statistical framework,23 24 but we used an alternative that is based on permutation testing. The latter is conceptually simple, does not rely on any theoretical probability models that may or may not be appropriate for the HRT data, deals with the multiple comparison problem of testing a vast image space, and critically derives significance limits for change based only on the individual patients series of data. These specific techniques and the mathematics that underpin them, as applied to PET and MRI data, are extensively described elsewhere.25 26 27 28 29 30 31 What follows is a description of three component parts of this approach and how we applied them to a series of HRT data, with the descriptive order chosen to facilitate the explanation of the methods, rather than to replicate the computational sequence, details of which can be found in the Appendix.
Permutation Testing at Individual Pixels.
Consider that three HRT images, at each patient visit, are acquired at regular intervals during a clinical follow-up. After registration of the image series, the topographic height at each individual pixel is considered in turn. Visually, this can be done by plotting the topographic height at each pixel as a time series (Fig. 1) . Next, a suitable statistic is derived for summarizing the change, or stability, of the topographic height at that pixel over time: the line of best fit (slope) derived from ordinary least-squares regression. The standard error (SE) of this slope gives an indication of how well the data fit the linear trend, with relatively high values indicating a poor fit or a noisy series of observation. Our test statistic at each pixel is simply the absolute value of the slope divided by the SE. A relatively large test statistic would be evidence of clear linear change of topographic height at that pixel. This process is performed at all the pixels, and the patients series of data is reduced to a statistic imageno longer a physiological image, but a 256 x 256-pixel map of statistics summarizing change within the image. The next step is to determine whether the observed test statistic at each pixel is unusual, or more extreme, than would be expected by chance. This testing of the significance of the test statistic is not completed in the conventional manner, by considering the observed test statistic as a random variable from a probability model, but uses a permutation test. We randomly shuffle, or relabel, the order of the observed data and recalculate the test statistic for all possible permutations of the order of images. If we let N denote the number of all possible labelings, ti the statistic corresponding to labeling i, then the set of ti for all possible relabeling constitutes the permutation distribution. For example, there would be 369,600 [12!/(3! x 3! x 3! x 3!)] of these in a series of four clinical visits with three scans at each visit (see Appendix for more details of this calculation). We then assume that all the ti are equally likely and determine the significance of the observed test statistic by counting the proportion of the permutation distribution as, or more, extreme than our observed value, giving us our P-value. If P is, for example, <5% we label the pixel as active or changing. (We therefore assume that images acquired at the same visit are no more correlated than images acquired between visits. Previous work on the influence of time separation on interimage topographic variability support the intuition behind this approach.32 ) This permutation test is performed pixel by pixel, and the statistic image becomes "thresholded" at the 5% level, with pixels flagged if they are significant (Fig. 1) . In practice, a sample of 1000 randomizations (drawn without replacement from all the possible labelings) are used to generate the permutations distribution.33 34 This eases the computation burden but still allows for a statistically exact test at standard levels of significance testing. (Larger samples would be needed to evaluate P < 0.1%.)
|
|
Evaluation of the New Approach
We compared the performance of our new approach against the TCA method in a computer virtual-patient simulation. The TCA method was replicated in consultation with the authors of the technique (David Hamilton, Department of Mathematics and Statistics, Dalhousie University, Canada, private communication, 2004). In short, the 256 x 256-pixel array from each topographical image is divided into a 64 x 64-superpixel array. An ANOVA is conducted to measure the extent of a constant shift in the topographic height over all 16 pixels within each superpixel from one set of images (three replicates at baseline) to another (three replicates at the follow-up visit), but the method also considers an interaction term allowing for different changes at different pixels within each superpixel. The significance of change at each superpixel is evaluated using an F distribution, in which the degrees of freedom are adjusted by a correction. It is worth highlighting that this adjustment is used for analysis within a superpixel and does not correct for the spatial correlation or multiple testing across the whole image. The criteria for change implemented in a recent publication were applied exactly.17 Any virtual patient who showed a cluster of 20 or more significant superpixels bound within the contour line for the optic disc, where the topographic change compared with baseline occurred in three consecutive sets of follow up images, was considered to have confirmed progression of glaucoma.
Computer Simulation.
Simulations of topographic images have been used previously to test the ability of new techniques to distinguish glaucomatous from normal optic discs.35 36 In this study, simulation experiments were designed to quantify the specificity and sensitivity of our technique and the TCA superpixel method. Subjects with stable or unstable images were simulated. Those with unstable images had gradual and episodic change applied to a region of the neuroretinal rim. Each subject comprised a longitudinal series of 30 images: 10 sets of 3 images (baseline set, and nine follow-up visits).
We simulated each stable series by using 30 identical copies of an HRT topographic image (replicating 10 visits with three scans per visit) and then applied noise to each image. Progressing patients series with gradual change (linear) were simulated by creating 30 identical images and applying a cumulative decay of 5 µm per visit to a cluster of 480 pixels to the neuroretinal rim. Progressing patients series with episodic change (sudden) were simulated by applying a height decay of 50 µm to the cluster at a randomly selected visit between visits 2 and 10, inclusive.
Between-image variability had two elements: "misalignment noise" and background noise. Misalignment noise was simulated by applying a series of transformations to each image in translations (x', y', and z') and rotations about each axis (
x',
y', and
z'). Transformations are applied at a subpixel level, using bicubic interpolation algorithms.37 The magnitude of each of the six transformations was made unique in each simulation by using a random number sampled from a normal distribution, wherein the mean of the size of the transformation is set at zero and a variance is fixed for each transformation. To mimic background noise, Gaussian noise was added to each pixel with variance v and mean zero. (A proven nonbiased random-number generator was used to sample from a normal distribution.38 ) To replicate the repeatability of topographic height measurements in clinical data, groups of virtual subjects were simulated having a mean pixel height standard deviation (MPHSD) of 15, 25, or 35 µm. The result of applying movement noise mimicked the between-image variability illustrated in previous studies, with higher variation in areas of high gradient change, such as across blood vessels and in the cup.39 40 Each simulated series was stored to computer disc, allowing the specificity and sensitivity of both techniques to be evaluated on identical image series.
Computer Experiments.
Specificity was examined in our first set of experiments by generating 300 stable virtual patient series. Three groups of 100 virtual patients were generated with an MPHSD of 15, 25, or 35 µm. We then applied our new SIM technique to these data, using the criteria for change specified in the Appendix, recording for each patient series the visit at which (false-positive) change was first detected. We then applied the TCA method to the same data set, again recording for each patient series the visit at which (false-positive) change was first detected.
The sensitivity of the techniques was tested in six separate experiments: for gradual (linear) change and sudden (episodic) change; with change applied to a cluster of an area of 480 pixels; and with an MPHSD of 15, 25, or 35 µm. The same progression criteria were used as for the specificity experiment. The follow-up visit at which change was first detected was recorded for both the SIM and the TCA analyses.
The SIM technique, the replicated TCA method, the simulations, and the computer experiments were all developed in purpose-written software using C++.
Real Longitudinal HRT Series.
Patients with ocular hypertension (OHT) who had reproducible visual field loss that developed while they were under observation and normal subjects were selected from the OHT clinic at Moorfields Eye Hospital. The study groups are described in detail elsewhere.12 13 In short, OHT patients had an intraocular pressure (IOP) of
22 mm Hg on two or more occasions, two initial reliable visual field results with AGIS (Advanced Glaucoma Intervention Study) score of 0, absence of other significant ocular disease that would affect visual field performance, and age >35 years. The eligibility criteria for the normal subjects included IOP consistently <22 mm Hg, baseline reliable visual field results with an AGIS score of 0, no significant ocular disease, no family history of glaucoma, and age >35 years. A reliable visual field was defined as <25% fixation errors, <30% false-positive errors and <30% false-negative errors. The normal subjects were followed up concurrently with the OHT patients. The study was in compliance with the guidelines established in the Declaration of Helsinki.
Thirty OHT eyes that converted to a diagnosis of glaucoma (converters) during the follow-up and 20 eyes of 20 normal subjects were randomly selected. A "converter" was defined as an eye with an initial AGIS score of 0 and follow-up AGIS scores of
1 on three consecutive reliable visual field test results. Both groups were imaged at regular intervals; the converters follow-up period ranged from 2.8 to 7.3 years and the control subjects ranged from 2.8 to 7.3 years. Twenty-one topography images (representing seven visits with three scans per visit) were selected from each subject, taking the images from the baseline and last visit and images from five interim visits. Image quality was not a factor in the selection of subjects.
The topography images were extracted from the Moorfields HRT database, using the scientific features of the HRT Eye-Explorer software v1.4 (Heidelberg Engineering). The image data were exported as aligned for analysis by the HRT software and then subjected to SIM analysis, exactly as described for the simulation experiments (using the same progression criteria at visits 4 to 7). TCA was performed using the HRT software. In addition, simple linear regression of rim area (RA), as estimated by the HRT software (320-µm reference plane), against time was performed. The significance of the slope was examined sequentially from visits 4 to 7 on all subjects. Progression was defined at a visit if a subject had a statistically significant negative slope (P < 0.05).
| Results |
|---|
|
|
|---|
|
|
|
| Discussion |
|---|
|
|
|---|
The computer simulation and analysis of real longitudinal HRT data provided evidence that SIM is more powerful at detecting localized change than the TCA method and analysis of HRT rim area measurements against time. This result was achieved without the expense of more false positives. The developers of the TCA method originally demonstrated a high level of sensitivity and specificity in detecting change in computer simulation experiments,16 but series of confirmation tests and a requirement for a certain cluster size were needed to produce similarly adequate levels of diagnostic precision in real longitudinal image data sets.17 This necessity is not surprising, as the original simulations centered on a single superpixel rather than results across the whole image. A statistical adjustment (the Satterthwaite correction) was used to correct for similarity (spatial correlation) of the topographic height within a superpixel, but no real account was made for the multiplicity of testing across the whole image. The empiric solution to the problem of multiplicity of testing included the requirement for clusters of pixels to be above a certain size, based on observed series of normal subjects.17 However, this empiric solution is based on observations of variability within a population and not on observed variability within a subjects own data series. One possible reason our technique outperforms this analysis in our computer simulation, and in real longitudinal data, is that it inherently corrects for the multiple comparison problem. Handling this aspect of imaging data is one of the key features of the SIM approach.
At the center of our technique is the use of permutation testing, tailoring the analysis to the data itself without incorrectly assuming that topographic heights, across the whole image, follow the behavior of a random variable from a known probability distribution, or without reliance on some reference patient population database. Statistically speaking, permutation methods are known to be both flexible and exact.33 With the increased computational power now available, there seems to be no important argument against their preferred use in situations in which there may be arbitrary properties of the observed data that cannot be accounted for by a probability model. The most plausible explanation, however, for the better diagnostic precision of SIM in comparison to the TCA technique in these computer experiments is simply the use of the whole series of the data. The TCA method uses only the baseline images and three follow-up images. This may be reasonable when the follow-up is short, but when the available series of data lengthens beyond four visits, it results in considerable data redundancy, as illustrated in Figure 3 when the difference between the two methods appears approximately halfway through the potential follow-up of 10 visits. It is also interesting to note that there is no discernible difference between the power of the methods when episodic or sudden loss is specified (Fig. 3) . This aspect of the results is reassuring, because our choice of the pixel-by-pixel test statistic is essentially a rate (trend) parameter, which might not be considered sensitive to detecting a sudden change. However, we have reported for threshold measurements in the visual field that linear regression adequately identifies sudden change, unless a series of data becomes very long.44 At the same time, there is a real advantage of using a rate parameter, as it may provide clinically interpretable information once the technique has identified a significant region of change. Of course, there is no firm evidence about structural loss in glaucoma being either gradual or sudden, but it seems the new technique described herein will be sufficiently sensitive to both types of deterioration.
This work serves an additional purpose in reporting the statistical image mapping techniques as an example of the catalog of proven methodology developed by the neuroimaging community that should be exploited by clinicians and scientists who are developing analysis techniques for retinal images. Moreover, the specific statistical techniques described in this work should not be restricted to quantifying glaucomatous structural progression, but could be used to identify features in topography images acquired to monitor other disease processes. A good example would be the recently proposed macula thickness maps for diabetic edema.45 Furthermore, these techniques should not be confined to one type of retinal imaging modality and could be applied, for example, to series of images acquired from optical coherence tomography and scanning laser polarimetry.
The computer simulation of series of HRT images reported in this work is the first of its kind, with previous computer simulations restricted to separating normal and glaucomatous. It is also novel because it imitates noise by replicating the repeatability of topographic height measurements by applying Gaussian and misalignment noise that typical remain in serial topographical images, even after they have been registered by the HRT software. Of course, an evidence base for the clinical validity of SIM, as applied to series of retinal images, can only be achieved in a study of a longitudinal cohort of patients and control subjects large enough to provide sufficient statistical power, but this was beyond the scope of the present study and is the subject of future work. In addition, we see the main value of these techniques as being a way of providing the clinician with a much needed, reliable method of visualizing, quantifying, and assessing rates of glaucomatous change in small localized areas in series of retinal images, rather than binary progression or stable classifications that rely on topographic summary parameters. The permutation test provides great advantages over the parametric approach, because it always works, given a short series of data, and makes no assumptions about the underlying distribution of the slopes or the topographic heights. In conclusion, we have demonstrated the application of a new set of techniques to detect change in series of retinal and ONH topography images. The evidence provided by the results from computer simulation experiments and the real longitudinal HRT data and their proven use in the field for which they were originally developed suggest they can be a useful clinical tool.
| Appendix 1 |
|---|
|
|
|---|
The number of possible unique permutations is expressed as
![]() |
![]() |
The following steps represent the computational paradigm to compute a permutation distribution and test statistical significance:
Preprocessing: The Pseudo Test Statistic.
The pseudo test statistic tstat(i,j) is calculated by dividing slope b(i,j) with a smoothed SE se(i,j). The smoothed SE is calculated by convolving the SE se(i,j) with a Gaussian kernel. We used a square Gaussian kernel of symmetrical full width at half maximum of 11 and size 17 x 17 to smooth the SE se(i,j). The pseudo test statistic is calculated for the observed case and for each unique permutation.
Permutation Testing for Thresholded Clusters.
The following paradigm is a programming methodology for thresholding clusters:
| Acknowledgements |
|---|
| Footnotes |
|---|
Supported, in part, by funding from the Moorfields Eye Hospital Special Trustees.
Submitted for publication August 6, 2004; revised December 20, 2004; accepted January 11, 2005.
Disclosure: A.J. Patterson, None; D.F. Garway-Heath, Heidelberg Engineering (C); N.G. Strouthidis, None; D.P. Crabb, None
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be marked "advertisement" in accordance with 18 U.S.C.
1734 solely to indicate this fact.
Corresponding author: David P. Crabb, School of Biomedical and Natural Sciences, The Nottingham Trent University, Clifton Campus, Nottingham NG11 8NS, UK; david.crabb{at}ntu.ac.uk.
| References |
|---|
|
|
|---|
This article has been cited by other articles:
![]() |
M. Balasubramanian, C. Bowd, R. N. Weinreb, G. Vizzeri, L. M. Alencar, P. A. Sample, N. O'Leary, and L. M. Zangwill Clinical Evaluation of the Proper Orthogonal Decomposition Framework for Detecting Glaucomatous Changes in Human Subjects Invest. Ophthalmol. Vis. Sci., January 1, 2010; 51(1): 264 - 271. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. H. Artes and D. P. Crabb Estimating Normative Limits of Heidelberg Retina Tomograph Optic Disc Rim Area with Quantile Regression Invest. Ophthalmol. Vis. Sci., January 1, 2010; 51(1): 355 - 361. [Abstract] [Full Text] [PDF] |
||||
![]() |
R Asaoka, N G Strouthidis, V Kappou, S K Gardiner, and D F Garway-Heath HRT-3 Moorfields reference plane: effect on rim area repeatability and identification of progression Br J Ophthalmol, November 1, 2009; 93(11): 1510 - 1513. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. N. Weinreb and P. L. Kaufman The Glaucoma Research Community and FDA Look to the Future: A Report from the NEI/FDA CDER Glaucoma Clinical Trial Design and Endpoints Symposium Invest. Ophthalmol. Vis. Sci., April 1, 2009; 50(4): 1497 - 1505. [Full Text] [PDF] |
||||
![]() |
B. C. Chauhan, D. M. Hutchison, P. H. Artes, J. Caprioli, J. B. Jonas, R. P. LeBlanc, and M. T. Nicolela Optic Disc Progression in Glaucoma: Comparison of Confocal Scanning Laser Tomography to Optic Disc Photographs in a Prospective Study Invest. Ophthalmol. Vis. Sci., April 1, 2009; 50(4): 1682 - 1691. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. Bowd, M. Balasubramanian, R. N. Weinreb, G. Vizzeri, L. M. Alencar, N. O'Leary, P. A. Sample, and L. M. Zangwill Performance of Confocal Scanning Laser Tomograph Topographic Change Analysis (TCA) for Assessing Glaucomatous Progression Invest. Ophthalmol. Vis. Sci., February 1, 2009; 50(2): 691 - 701. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Poli, N. G. Strouthidis, T. A. Ho, and D. F. Garway-Heath Analysis of HRT Images: Comparison of Reference Planes Invest. Ophthalmol. Vis. Sci., September 1, 2008; 49(9): 3970 - 3975. [Abstract] [Full Text] [PDF] |
||||
![]() |
V. M. F. Owen, N. G. Strouthidis, D. F. Garway-Heath, and D. P. Crabb Measurement Variability in Heidelberg Retina Tomograph Imaging of Neuroretinal Rim Area Invest. Ophthalmol. Vis. Sci., December 1, 2006; 47(12): 5322 - 5330. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. J. Patterson, D. F. Garway-Heath, and D. P. Crabb Improving the repeatability of topographic height measurements in confocal scanning laser imaging using maximum-likelihood deconvolution. Invest. Ophthalmol. Vis. Sci., October 1, 2006; 47(10): 4415 - 4421. [Abstract] [Full Text] [PDF] |
||||
![]() |
N. G. Strouthidis, A. Scott, N. M. Peter, and D. F. Garway-Heath Optic disc and visual field progression in ocular hypertensive subjects: detection rates, specificity, and agreement. Invest. Ophthalmol. Vis. Sci., July 1, 2006; 47(7): 2904 - 2910. [Abstract] [Full Text] [PDF] |
||||
![]() |
N G Strouthidis, E T White, V M F Owen, T A Ho, and D F Garway-Heath Improving the repeatability of Heidelberg retina tomograph and Heidelberg retina tomograph II rim area measurements Br J Ophthalmol, November 1, 2005; 89(11): 1433 - 1437. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |