1 From the Department for Medical Informatics, Biometry, and Epidemiology, Erlangen, Germany; and the 2 Department of Ophthalmology, University of Erlangen-Nürnberg, Germany.
| Abstract |
|---|
METHODS. In a cross-sectional clinical study, 406 eyes of 203 glaucoma patients and 200 eyes of 100 normal control subjects 18 to 70 years old underwent optic disc morphometry, automated perimetry, measurement of temporal contrast sensitivity by a full-field flicker test, blue-on-yellow visually evoked potential (VEP), and black-and-white pattern-reversal electroretinogram (ERG). Diagnosis of glaucoma was based on a qualitative classification of the optic nerve head and retinal nerve fiber layer independent of intraocular pressure and visual field. Confirmatory factor analysis was performed in the patient group as a whole and in a subgroup showing moderate to advanced glaucomatous optic nerve head damage.
RESULTS. The confirmatory factor analysis models explained the data satisfactorily (P > 0.18, all patients; P > 0.34, subgroup). Global glaucomatous damage was quantified best by the mean defect of automated perimetry (r = 0.81; r = 0.87), followed by the area of the neuroretinal rim (r = 0.64; r = 0.73), the full-field flicker test (r = 0.59; r = 0.65), the pattern-reversal ERG amplitude (r = 0.54; r = 0.55), and the VEP peak time (r = 0.55; r = 0.54).
CONCLUSIONS. Confirmatory factor analysis allows quantification of the validity of established and new procedures that measure global glaucomatous damage using cross-sectional data. The results are not dependent on the preselection of a specific gold standard. Psychophysical testing and morphometry quantified glaucomatous damage best, compared with electrophysiological procedures.
| Introduction |
|---|
In the biostatistical literature, the determination of agreement of repeated measurements on identical measurement scales is a matter of controversy.1 2 The use of correlation measures is criticized sharply by the investigators in the second study.2 However, if different measurement scales are compared, the use of correlation and regression analyses is characterized as at least of limited use.2 3 To our knowledge no really convincing alternatives have been proposed. Therefore, in studies concerning glaucoma, correlation analyses using accepted measures of glaucomatous damage as reference criteria are used frequently. These criteria, the gold standard, are, for example, the neuroretinal rim area (NRRA) and the mean visual field defect. This strategy, however, has several drawbacks. It is difficult to prove the validity of the gold standard (e.g., morphometry or perimetry) itself. Measures that are sensitive to the same type of glaucomatous damage (e.g., diffuse glaucomatous damage) show more similar results than measures that reflect a different pattern (e.g., localized glaucomatous damage) of the disease. Measures may be influenced by common factors independent of the disease: Different parameters derived from the same measurement, or different psychophysical tests influenced by the compliance of the subject, show more similar results than measures with independent errors. To cite an example, correlations among different psychophysical tests are expected to be higher than correlations between one psychophysical test and an electrophysiological or morphometric measure. If perimetry is set as the gold standard, agreement with electrophysiological measures may be underestimated, whereas agreement with psychophysical measures may be overestimated.
The purpose of the present study was twofold: We describe a statistical tool, confirmatory factor analysis,4 that does not rely on the selection of a gold standard and that allows evaluation of how well diagnostic measures quantify global glaucomatous damage. We used this method to analyze five different procedures (one morphometric measurement and four tests of visual function) that have been proven5 6 7 to be useful in the diagnosis of the glaucomatous diseases. One particular reason for this study was the finding reported in earlier studies,3 that correlations between the NRRA and sensory measures, which have been observed in patients with moderate or advanced glaucomatous damage, are observed to a much smaller degree among patients in the early stage of the disease.6 This finding may be due to measurement errors or to the pathogenic mechanisms of the disease. We therefore investigated whether in those patients sensory measures provide any information about glaucomatous damage. We restricted the analysis to measures that were sensitive to global damage, omitting measures exclusively sensitive to localized glaucomatous damage or to the variance of the damage (e.g., perimetric loss variance). We did not include methods that allow differentiation between diffuse and localized loss, such as the pattern deviation probability map.8
| Methods |
|---|
Perimetric Mean Defect.
The perimeter (Octopus 501; Interzeag, Schlieren, Switzerland; 59 measure points, program G1, three phases) was used. Local or diffuse visual field loss was defined according to Bebié et al.11 (pathologic cumulative perimetric defect curves based on graphical display of ranked local defects compared with the 95th and 99th percentiles of normal curves with identification of localized, diffuse, and broadly distributed visual field losses).
Flicker Test.
A system6 with a full-field bowl (58 cm in diameter) and a white flicker light was used. The test was performed under photopic conditions and required no fixation by the subject. The flicker threshold was determined at a constant frequency of 37.1 Hz at a time-average luminance of 10 candelas (cd)/m². The mean luminance of the full-field bowl was corrected by taking into account the pupil diameter and the Stiles-Crawford effect. The contrast sensitivity was assessed using a staircase tracking procedure. The mean value of at least six threshold crossings entered the evaluation.
Both electrophysiological tests were performed with a two-channel Maxwellian view system with a Xenon arc lamp as the light source. The circular field was 32° in diameter in all recordings. With this stimulus system, retinal illuminance is independent of pupil width; therefore, no correction for pupil width was necessary. For peak latency of the blue-on-yellow onset visual evoked potential (VEP),7 one channel provided a high-contrast, 0.88-cyc/deg square-wave stripe pattern of blue light (460 nm, 3.3 × 10² trolands [td]), the other channel provided a homogeneous yellow adaptation light (570 nm, 1.3 × 10⁴ td) that was superimposed on the stripe pattern. Stimulation was in the onset (200 msec)-offset (500 msec) mode. Recording was monopolar from the inion against the left ear lobe while the right ear lobe was grounded. After amplification (EMP 88 [Electronic Medicine Technique, Pölzl, Munich, Germany], filter: 0.5-70 Hz), 150 sweeps (400 msec in length) were averaged (500-Hz sampling rate). Peak time measurements of the onset responses were made from the moment of pattern onset to the peak of the main negative wave (N1). For amplitude of the black-and-white pattern-reversal electroretinogram (ERG), only one channel of the viewing system was used. The stimulus was a vertical, high-contrast (0.93), black-and-white square-wave stripe pattern with a spatial frequency of 0.88 cyc/deg. The pattern reversal was square wave and occurred at a frequency of 7.8 Hz. The mean luminance was 4263 photopic td. The responses were recorded with a carbon glide electrode hooked over the subject's lower eyelid. After amplification (EMP 88, filter: 0.5-70 Hz, no notch filter) the responses were averaged and stored in a digital computer (IBM-AT, Armonk, NY; sampling rate 1000 Hz, 256 msec sweep, n = 30). Four pattern-reversal responses, and therefore eight amplitudes, were analyzed within one sweep.
A subsequent fast Fourier analysis evaluated the amplitude of the second harmonic component of a total of 240 pattern-reversal responses. In both procedures, ERG and VEP, two recordings were made to check for reproducibility.
Diagnostic Criteria
The definition of glaucoma was based on the optic disc damage. Criteria were: glaucomatous changes of the optic nerve head such as an unusually small NRRA in relation to the optic disc size, an abnormal shape of the neuroretinal rim, cup-to-disc ratios that were higher vertically than horizontally, and/or localized or diffuse retinal nerve fiber layer defects.5 Visual field loss and intraocular pressure (IOP) were not inclusion criteria. The description of the sample, however, included visual field loss and tonometry. The definition of normal-pressure glaucoma (max IOP, ≤21 mm Hg) and open-angle glaucoma (max IOP, >21 mm Hg) was based on at least two IOP measurements before initial medical therapy.
Patients and Control Subjects
Glaucoma Patients.
Four hundred six eyes of 203 patients with chronic open-angle glaucoma were included in the study (Table 1). The patient group was divided into two subgroups: One subgroup included 109 eyes of 76 patients with an NRRA of at least 1.35 mm² (equivalent to the mean - 1 SD in the control group), and the other subgroup contained 297 eyes of 170 patients with an NRRA of less than 1.35 mm². One hundred eighty-six of all 203 glaucoma patients had primary open-angle glaucoma, 17 had secondary open-angle glaucoma (9 primary melanin dispersion, 6 pseudoexfoliative syndrome, 1 anterior chamber angle recession after ocular contusion, 1 who developed glaucoma under systemic cortisone therapy). Three hundred fifteen of the 406 glaucomatous eyes were treated topically. On the day of examination, no subject had an intraocular pressure of more than 24 mm Hg.
Only patients and control subjects whose two eyes were both classified into the same one of the two groups were included in the study. All subjects had clear optic media; exclusion criteria were other eye diseases (optic media opacities, retinal diseases) and systemic diseases (diabetes mellitus). Subjects performing visual field testing with false-positive and false-negative responses of more than 12% were excluded (three control subjects, eight patients). All subjects had visual acuity of 20/30 or better. The principles of the Declaration of Helsinki were complied with: The participants were informed about the purpose of the study and the nature of the measurements. They were informed that they had the right to withdraw from the study at any time and signed an informed consent form.
Statistical Methods
The normal distribution assumption could be accepted for the NRRA and the amplitude of the pattern-reversal ERG. For the perimetric mean defect and the flicker sensitivity, the log transformation provided the best adjustment to normality, and for the peak latency of the blue-on-yellow VEP, the log-log transformation provided the best adjustment. The flicker sensitivity, the amplitude of the pattern-reversal ERG, and the peak latency of the blue-on-yellow VEP were age adjusted by linear regression analysis in the control group. Correlation analyses used the Pearson product moment correlation coefficient. If outliers or nonlinearity were present, the Spearman correlation coefficient was computed additionally. In every subject both eyes were used. The necessary corrections for statistical testing and estimation of standard errors were taken into account. For group comparisons of continuous variables the mean value of the left and right eyes of subjects was used in the t-test for independent samples. For significance testing of correlations and path coefficients (see later description) the SEs were determined by the bootstrap method and corrected according to the number of subjects, not the number of eyes. This means that, although both eyes of one patient were included, the precision of our results refers to the number of patients instead of the number of eyes. In the subgroup analyses of patients with an NRRA of less than 1.35 mm² or of at least 1.35 mm², the Bonferroni correction with factor 2 was applied. For categorical variables the χ² test was used.
The level of significance was 0.05 (two-sided) in all statistical tests. With 203 patients, a true correlation coefficient r = 0.20 was detectable with a power of 80%; r = 0.25 was detectable with a power of 90%. For convenience of presentation, all procedures were rescaled so that higher values always indicated the pathologic domain. For graphical presentation of more than one procedure on the same axis of a figure, values were standardized by subtracting the mean value of the control group and dividing by the SD of this group. Nonlinear curve fitting was performed using the Lowess algorithm.12 All statistical analyses were conducted with commercially available software (SPSS for Windows, ver. 6.1.3; SPSS, Chicago, IL)13 with an integrated module (AMOS, ver. 3.6; SmallWaters, Chicago, IL).14
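The stated power figures can be checked approximately. The sketch below assumes a two-sided test of a zero correlation based on Fisher's z transformation with one value per subject; the text does not say how the power was actually computed, and the function name `correlation_power` is introduced here only for illustration.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(r_true, n_subjects, alpha=0.05):
    """Approximate two-sided power for testing rho = 0 (Fisher z approximation)."""
    norm = NormalDist()
    # Under the alternative, atanh(r) is roughly normal with SD 1 / sqrt(n - 3).
    shift = atanh(r_true) * sqrt(n_subjects - 3)
    crit = norm.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic falls in either rejection region.
    return (1 - norm.cdf(crit - shift)) + norm.cdf(-crit - shift)


for r in (0.20, 0.25):
    print(f"r = {r:.2f}: power = {correlation_power(r, 203):.2f}")
```

With 203 subjects this approximation gives a power slightly above 80% for r = 0.20 and above 90% for r = 0.25, in line with the figures quoted above.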
The Method of Confirmatory Factor Analysis
Confirmatory factor analysis is a special case of structural equation modeling. This method is a standard statistical approach in other fields of application, especially in the psychometric literature. However, because to our knowledge these or related approaches have, with only a few exceptions,15 16 17 18 19 20 rarely been applied in ophthalmology, we provide a short description of how the method works.
Assuming that a gold standard exists that perfectly measures glaucomatous damage, then any diagnostic procedure could be judged by the size of the correlation r between this procedure and the perfect gold standard. If the value of this correlation lies near 1, the procedure would measure glaucomatous damage very accurately. Now, assume that we knew for two different measures the correlations with this gold standard, say r1 and r2. Then it can be shown by simple calculations that the correlation between the two measures, say r12, would be at least equal to the product r1 · r2. For example, if the first measure correlated to the gold standard with r1 equal to 0.8 and the second measure with r2 equal to 0.9, then the correlation r12 between both measures would be at least 0.72. However, under certain circumstances a correlation r12 would be considerably larger than r1 · r2. This would be the case, if the two measures were also influenced by common factors different from the glaucoma disease. If both measures were psychophysical, such a common factor could be, for example, the concentration of the patient. The dependency of both measures on the concentration of the patient would increase the correlation coefficient. This, of course, would not increase the validity of both measures concerning the underlying disease. Procedures that do not share common factors different from glaucomatous damage and for which therefore r12 equals exactly r1 · r2, are called conditionally independent.
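The product rule can be made explicit with a small linear model. This is a sketch under assumptions the text does not spell out: each measure X_i is standardized, G stands for the hypothetical perfect gold standard, and e_i is the part of X_i that G does not explain. The symbols X_i, G, and e_i are introduced here for illustration only.

```latex
\begin{align}
X_i &= r_i\,G + e_i, \qquad \operatorname{Var}(X_i) = \operatorname{Var}(G) = 1, \qquad \operatorname{Cov}(G, e_i) = 0, \\
r_{12} &= \operatorname{Cov}(X_1, X_2) = r_1 r_2 \operatorname{Var}(G) + \operatorname{Cov}(e_1, e_2) = r_1 r_2 + \operatorname{Cov}(e_1, e_2).
\end{align}
```

Under conditional independence the residual covariance is zero and r12 equals r1 · r2. A shared influence such as the concentration of the patient makes the residual covariance positive and raises r12 above the product, for example above 0.8 · 0.9 = 0.72.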
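If a perfect gold standard does not exist, only the pairwise correlations r12 between diagnostic measures can be determined. However, if at least three measures are conditionally independent, then it is possible to calculate from the pairwise correlations r12, r13, r23 the values of r1, r2, and r3 in the absence of any gold standard procedure. The system of equations

$$ r_{12} = r_1 r_2, \qquad r_{13} = r_1 r_3, \qquad r_{23} = r_2 r_3 \tag{1} $$

has the solution

$$ r_1 = \sqrt{\frac{r_{12}\, r_{13}}{r_{23}}}, \qquad r_2 = \sqrt{\frac{r_{12}\, r_{23}}{r_{13}}}, \qquad r_3 = \sqrt{\frac{r_{13}\, r_{23}}{r_{12}}} \tag{2} $$

The values r1, r2, and r3 obtained in this way are the path coefficients of the model.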
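For three conditionally independent measures, each pairwise correlation is the product of the two corresponding correlations with the unobserved damage (r12 = r1 · r2, and likewise for r13 and r23), so the three unknowns can be solved for directly. The following sketch illustrates that calculation; the function name `triad_path_coefficients` and the input values are illustrative assumptions, not data from the study.

```python
from math import sqrt


def triad_path_coefficients(r12, r13, r23):
    """Solve r_ij = r_i * r_j for three conditionally independent measures."""
    if min(r12, r13, r23) <= 0:
        raise ValueError("all three pairwise correlations must be positive")
    r1 = sqrt(r12 * r13 / r23)
    r2 = sqrt(r12 * r23 / r13)
    r3 = sqrt(r13 * r23 / r12)
    return r1, r2, r3


# Illustrative input: pairwise correlations implied by coefficients 0.8, 0.9, 0.7.
r1, r2, r3 = triad_path_coefficients(0.8 * 0.9, 0.8 * 0.7, 0.9 * 0.7)
print(round(r1, 3), round(r2, 3), round(r3, 3))
```

With sample correlations instead of exact products, an estimate can exceed 1, which is one sign that the conditional independence assumption or the sample size is inadequate.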
However, the crucial assumption of conditional independence cannot be tested for only three measures. Only for more than three measures can the suitability of the model be examined. With, for example, five procedures, 10 correlations are available, and the system (1) contains 10 equations for only five unknowns. Moreover, with more than three measures the crucial assumption may be weakened to a certain degree: Some additional correlations between the measures, caused by factors distinct from the glaucoma disease, may be quantified. The price is that exact formulae are not available, and numerical algorithms have to be used. In addition to the determination of the path coefficients, the model also allows the computation of an index quantifying the damage for the individual patient.22 This index comprises the best approximation of a gold standard by using the procedures under investigation.
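To give an idea of what such a numerical algorithm does, the sketch below fits five path coefficients to the 10 pairwise correlations by unweighted least squares, updating one coefficient at a time. This is a simplified stand-in and not the estimation procedure of the AMOS software used in the study; it also omits the additional correlations between measures. The function names are hypothetical, and the input matrix is built from the five coefficients quoted in the abstract purely as illustrative data.

```python
def fit_one_factor(corr, sweeps=500):
    """Unweighted least-squares one-factor fit by coordinate descent.

    corr: symmetric correlation matrix as a list of lists (diagonal ignored).
    Returns coefficients c_i such that corr[i][j] is approximated by c_i * c_j.
    """
    k = len(corr)
    coef = [0.5] * k  # arbitrary positive starting values
    for _ in range(sweeps):
        for i in range(k):
            # Exact minimizer of the squared error in coef[i], others held fixed.
            num = sum(corr[i][j] * coef[j] for j in range(k) if j != i)
            den = sum(coef[j] ** 2 for j in range(k) if j != i)
            coef[i] = num / den
    return coef


def max_residual(corr, coef):
    """Largest gap between an observed and a model-implied correlation."""
    k = len(corr)
    return max(abs(corr[i][j] - coef[i] * coef[j])
               for i in range(k) for j in range(i + 1, k))


# Illustrative data: correlations implied exactly by five known coefficients.
true = [0.81, 0.64, 0.59, 0.54, 0.55]
corr = [[1.0 if i == j else true[i] * true[j] for j in range(5)]
        for i in range(5)]
est = fit_one_factor(corr)
print([round(x, 2) for x in est], max_residual(corr, est))
```

On real data the residuals do not vanish, and their size indicates how well five coefficients reproduce the 10 observed correlations.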
In summary, the size of the path coefficients obtained in our analysis shows the ability of the measurements to quantify glaucomatous damage. They provide a ranking of these measurements: The measurement that shows the largest coefficient is able to quantify glaucomatous damage best. The conditional independence assumption is essential in our approach.
Confirmatory Analysis for Global Glaucomatous Damage
The basis of the analysis is the fact that all five procedures
under investigation measure global glaucomatous damage by using clearly
different approaches. With the exceptions listed below, we therefore
assume that the observed correlations are due to the global
glaucomatous damage. However, three limitations have to be respected:
1) The decrease of the NRRA may be observable before sensory damage,
2) perimetry and the flicker test may also correlate with each other
because of some general psychophysical fitness, and 3) the
electrophysiological procedures (pattern-reversal ERG, blue-on-yellow
VEP) may also correlate, because the same technical device is used
(Fig. 1). The assumptions of conditional independence were checked in several
ways: First, we performed correlation analyses in the control group. In
this group, no glaucomatous damage is present. Therefore, we do not
expect correlations between diagnostic measures of glaucoma within this
group. In contrast, if we observe such correlations, factors different
from glaucoma influence the respective diagnostic measures. A similar
analysis concerning only the correlations of the rim area with
functional test results has been performed previously.23
Second, for all pairs of measures we compared the correlations in the
sample to the correlations predicted by the model. This answers the
question of whether the smaller set of path coefficients is able to
explain the larger set of pairwise correlations. Third, for all
possible groups of three and four variables we computed the path
coefficients and compared the different results. If the path
coefficients differed in these analyses, this would contradict the
assumptions of our model.
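The third check can be sketched for groups of three variables as follows: if conditional independence holds, every triple that contains a given measure yields the same path coefficient for it, so a large spread across triples signals a violation. The code is an illustration only; the measure labels, the function names, and the added correlation of 0.15 between the perimetric mean defect (MD) and the flicker test are hypothetical, and the base matrix is built from the coefficients quoted in the abstract.

```python
from itertools import combinations
from math import sqrt


def triple_estimates(corr, names):
    """Path coefficient of each measure from every triple that contains it."""
    out = {name: [] for name in names}
    for a, b, c in combinations(range(len(names)), 3):
        rab, rac, rbc = corr[a][b], corr[a][c], corr[b][c]
        out[names[a]].append(sqrt(rab * rac / rbc))
        out[names[b]].append(sqrt(rab * rbc / rac))
        out[names[c]].append(sqrt(rac * rbc / rab))
    return out


def spread(values):
    """Range of the estimates obtained for one measure."""
    return max(values) - min(values)


names = ["MD", "NRRA", "flicker", "ERG", "VEP"]
coef = [0.81, 0.64, 0.59, 0.54, 0.55]  # coefficients quoted in the abstract
clean = [[1.0 if i == j else coef[i] * coef[j] for j in range(5)]
         for i in range(5)]

# Hypothetical violation: extra shared correlation between MD and flicker.
shared = [row[:] for row in clean]
shared[0][2] += 0.15
shared[2][0] += 0.15

for label, corr in (("independent", clean), ("shared factor", shared)):
    est = triple_estimates(corr, names)
    print(label, {n: round(spread(v), 3) for n, v in est.items()})
```

In the first case all spreads are zero up to rounding; in the second, the estimates for the measures involved in the shared factor disagree across triples.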
… χ² test.14
| Results |
|---|
Subgroup with NRRA of at Least 1.35 mm².
In this group a path analysis was not adequate, as can be seen from Table 2, and also because of the small size of the sample. Therefore, the structure depicted in Figure 1 may not hold. However, correlation analysis showed a strong association between the flicker sensitivity and the perimetric mean defect and, to a somewhat lower degree, between the perimetric mean defect and the peak latency of the blue-on-yellow VEP (Table 2, Fig. 5). The correlation between the flicker sensitivity and the perimetric mean defect was substantially higher than that expected from the analyses in the control group. Therefore, to some degree, even in this subsample the measurements provided information about the early stage of glaucomatous damage.
| Discussion |
|---|
There have been only a few applications of factor analysis or related statistical methods in ophthalmology. In one study16 a model with a priori hypotheses was performed to explain multistage mechanisms of activities of daily living after cataract surgery. To our knowledge in all other applications, exploratory factor analysis was used. In two studies17 20 the classic application of factor analysis, identification of underlying concepts in items of questionnaires, was performed. In one study19 factor analysis was used on a grouping of clinical history, preoperative findings, and operative problems to explore associations within these findings and with postoperative visual acuity. In two older studies15 the purely exploratory approach was also used. Principal component analysis, a method devoted to data reduction but not to exploration of correlational structures, was performed on normal visual field data18 in the Barbados Eye Study.26 27 In summary, to our knowledge factor analysis as a tool for avoidance of a gold standard has not been performed before in ophthalmic research.
The main finding of our analysis was a ranking of procedures according to their ability to quantify global glaucomatous damage. The best result was obtained for the perimetric mean defect, followed by the area of the neuroretinal rim and flicker sensitivity. The two electrophysiological measures, amplitude of the pattern-reversal ERG and peak latency of the blue-on-yellow VEP, showed the lowest, nearly identical coefficients. Additionally, in the present analysis we considered the possibility of common factors different from the glaucoma disease that might increase the correlations between the procedures under investigation. The analyses clearly showed that there was indeed a common factor shared by flicker testing and perimetry that we termed psychophysical fitness. The same was not true, however, for the two electrophysiological procedures, pattern-reversal ERG and blue-on-yellow VEP, although the same technical device was used.
The splitting of the patient group into two subsamples was performed by using a cutoff point of 1 SD from the mean value of the control group. This cutoff point is probably not the same as would be chosen in a study on diagnostic discrimination of glaucoma patients from control subjects with no disease. Inspection of Figure 4 shows, however, that this choice is useful, because the statistical assumptions of correlation analyses are clearly better fulfilled in the subsample compared with the complete group. However, the ranking of procedures according to their path coefficients was identical in both analyses.
There was a significant age difference of 6 years between the control subjects and the glaucoma patients. However, all diagnostic measures were age corrected by linear regression analysis in the control group. Dichotomizing age at the median of 47 years did not lead to any significant differences between younger (at most 46 years) and older (at least 47 years) control subjects (results not given in detail). Furthermore, the control group served only to explore unexpected correlations between the diagnostic procedures. It was not the intent of this study to prove how well the procedures were able to discriminate between patients and healthy control subjects. However, we cannot completely exclude the possibility that the concentration and compliance of control subjects, who were recruited from the university staff, were higher than those of the patients.
The classification of the subjects was according to qualitative morphologic criteria. These criteria are based on the same photograph as the measurement of the NRRA, which could have produced a statistical bias. However, parameter estimates did not differ systematically in the models with NRRA compared with those without (Tables 4A, 4B; rows 6, 7, 12). Therefore, we did not expect a bias concerning the morphologic variable rim area because of the morphologic selection criteria.
There was a clear discrepancy between the rather high path coefficients of the flicker sensitivity and the low weight of this measure in the final index for both models. The reason for this lies in the high correlation between the flicker sensitivity and the perimetric mean defect. If the perimetric mean defect is excluded from the analysis, the path coefficients do not change considerably (Tables 4A, 4B), but in the index the flicker test is then assigned the highest weight among the remaining four procedures (results not given). Therefore, it is the path coefficient, not the weight in the index, that truly reflects the usefulness of a procedure.
The perimetric mean defect was the best variable in quantifying global glaucomatous damage. However, in the present study, fewer than half of all glaucomatous eyes showed localized or diffuse visual field loss. For the perimetric mean defect there was no significant difference between control subjects and the subgroup of glaucoma patients with a rim area of at least 1.35 mm² (Table 1). This result is in accordance with the literature.28 29 A growing number of histologic and clinical studies have convincingly shown that optic nerve damage in patients with glaucoma occurs and can be detected before conventional perimetry uncovers early visual field defects.1 Clinical investigations using morphologic techniques have shown that quite a number of optic nerve head variables, such as the neuroretinal rim as a whole and measured separately in various disc sectors, the shape of the neuroretinal rim, and the presence and size of peripapillary atrophy, were abnormal in some individuals who had ocular hypertension but normal findings in conventional visual field examinations.2 In contrast, the flicker test showed a significant difference between the control group and the group with an NRRA of at least 1.35 mm² but was inferior to perimetry in quantifying global glaucomatous damage. The two results are not contradictory.
The reason that the perimetric mean defect performs very well in the quantification of glaucomatous damage but much more poorly in the early diagnosis of the disease lies in the great overlap of the pathologic and the normal range (4 dB) in perimetry. This has much less consequence for correlation analyses within patients than for early diagnosis of glaucoma. In the subsample with an NRRA of at least 1.35 mm², we observed no correlation between the NRRA and sensory measures. Nevertheless, the analyses showed that within this subsample, compared with the healthy control group, the perimetric mean defect and the flicker sensitivity correlated with each other to a much higher degree, but not with the rim area. One explanation is that morphologic diagnosis of glaucoma is not based on the NRRA alone, but also on other morphologic criteria, such as the presence of localized retinal nerve fiber layer defects. We therefore conclude that also in this study group with a rim area of at least 1.35 mm² the psychophysical measures quantify global glaucomatous damage to a certain degree. Measures that reflect the status of disease to only a moderate degree may still be useful in early diagnosis. The amplitude of the ERG and the peak latency of the VEP also did not show significant differences between the control subjects and the patients with an NRRA of at least 1.35 mm². Therefore, at least in patients with "normal" NRRA (but other morphologic signs that indicate glaucoma) neither measure is sensitive in early diagnosis of glaucoma.
One purpose of this study was to demonstrate the usefulness of confirmatory factor analysis for the problem of validating measures of glaucomatous damage. The method is most powerful if a whole group of measurements is available but a true gold standard does not exist. We believe that this is exactly the situation in many studies concerning primary open-angle glaucoma. In their simplest form, formulae (2) allow evaluation of any electrophysiological procedure if quantitative morphologic data on optic nerve damage and results of a psychophysical test of global glaucomatous damage are available. This approach is a substantial improvement over the mere calculation of correlations.
| Footnotes |
|---|
Submitted for publication March 10, 1999; revised July 27, 1999; accepted September 23, 1999.
Commercial relationships policy: N.
Corresponding author: Peter Martus, Institute for Medical Informatics, Biometry and Epidemiology, University of Erlangen-Nürnberg, Waldstr. 6, 91054 Erlangen, Germany. peter.martus{at}imbe.med.uni-erlangen.de