1From the Department of Optometry, University of Bradford, Bradford, United Kingdom; and the 2School of Education, Flinders University of South Australia, Bedford Park, South Australia, Australia.
| Abstract |
|---|
METHODS. Forty-three patients with cataract underwent visual acuity (VA) and contrast sensitivity (CS) testing and completed the ADVS. The data were Rasch analyzed and the value of response scale and item reduction explored. A shortened version and the original ADVS were tested for criterion validity by determining correlations with VA and CS.
RESULTS. The ADVS data contained nonnormally distributed items and items with ceiling effects and empty response categories. Therefore, items benefited from shortening the response scale, the optimum length being three responses. There was poor targeting of item difficulty to patient ability, because many patients with cataract were sufficiently able that they had no difficulty with many activities. Items were eliminated if the task was too easy or did not fit with the overall concept of visual disability determined by the Rasch model. A reduced ADVS version was established that had adequate precision, equivalent criterion validity, and improved targeting of item difficulty to patient ability, but this version was still not ideal.
CONCLUSIONS. Despite careful traditional validation, the ADVS data contained inadequacies exposed by Rasch analysis. Through Rasch scaling, particularly with response scale reduction, the ADVS can be improved, but additional questions seem to be needed to suit the more able, including patients undergoing second eye cataract surgery. There remains a need to develop Rasch-scaled measures of visual disability for use in ophthalmic outcomes research.
Questionnaire scores provide a simple and convenient numerical representation of patient-centered outcome. The overall score is usually arrived at by adding up ordinal numerical values assigned to the subject's ratings for a series of questions. Such a response scale is called a Likert scale, and simply adding up the scores is called Likert scoring.19 However, the validity of such scores has been called into question by modern test theory, which includes Rasch analysis.20 21 22 23 At issue is what justification exists for assigning numerical values to responses and how a series of scores should be added together to produce an overall score. Although Likert scaling assumes all questions are equally weighted, Rasch analysis assumes items vary in difficulty. Rasch analysis calculates item difficulty in relation to person ability and weights overall scores accordingly. The scores are on a linear scale, allowing easy comparison of measures. Other benefits of Rasch analysis include powerful investigation of instrument validity, particularly the fit of items to the overall construct and the effectiveness of targeting of items to patients.
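For readers less familiar with the model, the contrast with Likert scoring can be made concrete. The expression below is the standard rating scale form of the Rasch model, shown only as an illustration; the symbols $B_n$, $D_i$, and $F_k$ are notation introduced here and are not taken from the ADVS documentation.

```latex
\ln\!\left(\frac{P_{nik}}{P_{ni(k-1)}}\right) = B_n - D_i - F_k
```

Here $P_{nik}$ is the probability that patient $n$ responds in category $k$ of item $i$, $B_n$ is the ability of the patient, $D_i$ is the difficulty of the item, and $F_k$ is the threshold between categories $k-1$ and $k$, all expressed in logits. A Likert sum, by contrast, treats every item and every category step as equal in size.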
In recent times, Rasch analysis has been applied to disability measurement, both for the validation or modification of existing scales and for the development of new scales, in many areas of medicine, including rheumatology,24 rehabilitation medicine,25 gerontology,26 and overall health-related quality of life.27 It has been applied to measures of visual disability in low-vision populations28 29 30 31 and has been used to examine the validity of one visual disability questionnaire suitable for a cataract population: the 14-item Visual Functioning Index (VF-14).23 32
We have chosen the Activities of Daily Vision Scale (ADVS) as the subject of our investigation, because it was the first widely used and (traditionally) validated visual disability questionnaire and remains one of the most commonly used.3 6 8 14 33 34 35 36 37 38 39 Much of what is accepted about patient-centered outcomes of cataract surgery relies on the validity of the ADVS and VF-14. Therefore, we think it is important to examine the psychometric properties of the ADVS in a Rasch model.
| Methods |
|---|
Clinical Assessment
Logarithm of the minimum angle of resolution (LogMAR) visual acuity (VA) was measured monocularly and binocularly using standard Early-Treatment Diabetic Retinopathy Study (ETDRS) charts at 4 m with a luminance of 100 cd/m² and letter-by-letter scoring, and contrast sensitivity (CS) was measured with the Pelli-Robson chart at 3 m with a luminance of 100 cd/m² and letter-by-letter scoring.16 Only the binocular VA and CS results are included in the analyses for this study, because these have been shown to be more closely related to disability than better or worse eye measurements.40 41
Disability Assessment
Visual disability was assessed with the Activities of Daily Vision Scale (ADVS). The original ADVS contains 22 items, each of which examines the patient's ability to perform an activity. These activities are listed in Table 1. The 22 items are examined with 61 questions, usually 3 questions per item. The first assesses whether the patient engages in the activity (if not, the item is "Not Applicable," which is treated as missing data). The second scales patient responses from no difficulty (5), to a little difficulty (4), moderate difficulty (3), and extreme difficulty (2). The third question asks whether the patient is unable to perform the activity because of poor vision (if not, it is missing data; if so, the most disabled score, 1, is assigned). Thus, in Table 1, many items are assigned three questions: for example, 1a-c covers driving at night. For the convenience of discussion and Rasch analysis, scores from such groups of three questions pertaining to the same item are considered to fall on one scale, which is consistent with the original ADVS scoring system.3
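The three-question structure described above can be sketched as a small scoring function. This is an illustrative reading of the description in the text, not the official ADVS scoring code; the function name, argument names, and response labels are assumptions made for the example.

```python
from typing import Optional

# Difficulty ratings for the second question of an item, as described in the text.
DIFFICULTY_SCORES = {
    "no difficulty": 5,
    "a little difficulty": 4,
    "moderate difficulty": 3,
    "extreme difficulty": 2,
}


def score_advs_item(engages: bool,
                    difficulty: Optional[str],
                    unable_due_to_vision: bool) -> Optional[int]:
    """Return a 1-5 score for one item, or None when the item is missing data."""
    if engages:
        # Question 2: difficulty rating from 5 (none) down to 2 (extreme).
        # An absent or unrecognized rating is treated as missing.
        return DIFFICULTY_SCORES.get(difficulty)
    if unable_due_to_vision:
        # Question 3: the activity is not performed because of poor vision.
        return 1
    # Activity not performed for non-visual reasons: "Not Applicable".
    return None
```

Collapsing the three questions into a single 1 to 5 score in this way is what allows each item to be treated as one rating scale in the Rasch analysis.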
, and Rasch Analysis. The distribution of responses was examined for compliance with normality (skew and kurtosis), missing data, and ceiling effect (the percentage of responses in the most able end category of the response scale). Various versions of the ADVS were compared with the original ADVS by criterion validity testing, in which the visual measures of VA and CS were correlated with disability scores using Spearman rank correlations. These analyses were performed on computer (SPSS for Windows; SPSS Sciences, Chicago, IL). Rasch analysis was also performed on computer (Winsteps ver. 3.35, produced by John M. Linacre),42 which calculates the Wright and Masters43 version of Rasch model estimates by using joint maximum likelihood estimation. In using Rasch analysis, we implicitly assume as a goal reengineering an assessment so that it is better targeted to the people who will be evaluated with it. Targeting implies that the challenge presented by the items and the response categories of the rating scale cover the same range of abilities as are found in the population to be measured. The first step in improving targeting of items to patients was to consider whether the response scale categories were used appropriately across the whole test. If this was not the case, the second step was to investigate the effect of merging response scale categories. The third step was to consider the value of removing items from the questionnaire if they were not effective in contributing to the measurement of the abilities of persons of interest.
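The item-level screening described above (skew, kurtosis, ceiling effect, and missing data) can be sketched as follows. This is a minimal illustration and not the SPSS procedure used in the study: it uses simple moment-based skew and excess kurtosis, whereas SPSS applies small-sample adjustments, so values will differ slightly. The function name and the dictionary keys are assumptions.

```python
from typing import Dict, Optional, Sequence


def describe_item(responses: Sequence[Optional[int]],
                  top_category: int = 5) -> Dict[str, float]:
    """Summarize one item; None marks a missing response."""
    valid = [r for r in responses if r is not None]
    if not valid:
        raise ValueError("need at least one non-missing response")
    n = len(valid)
    mean = sum(valid) / n
    # Central moments of the observed responses.
    m2 = sum((r - mean) ** 2 for r in valid) / n
    m3 = sum((r - mean) ** 3 for r in valid) / n
    m4 = sum((r - mean) ** 4 for r in valid) / n
    if m2 == 0:
        # Every response is identical, so shape statistics are undefined.
        skew = kurtosis = float("nan")
    else:
        skew = m3 / m2 ** 1.5
        kurtosis = m4 / m2 ** 2 - 3.0  # excess kurtosis (normal = 0)
    return {
        "missing_pct": 100.0 * (len(responses) - n) / len(responses),
        # Ceiling effect: share of responses in the most able category.
        "ceiling_pct": 100.0 * sum(r == top_category for r in valid) / n,
        "skew": skew,
        "kurtosis": kurtosis,
    }
```

An item would then be flagged as nonnormal when either shape statistic falls outside the chosen range.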
| Results |
|---|
Descriptive Statistics
The 22 items of the ADVS are listed in Table 1 along with several criteria for assessing the quality of these items: compliance with normality (skew and kurtosis), ceiling effect (percentage of responses in the most able end category of the response scale), and the percentage of cases with missing data. The data from these 43 patients suggest that 12 items do not provide normally distributed data, if normality is defined as skew and kurtosis within the range -2.00 to +2.00.
Rasch Analysis
Figure 1 shows a patient ability/item difficulty map determined by Rasch analysis for the original 22-item ADVS. Patients (Xs on the left) appear in ascending order of ability from the bottom of the map to the top, and items (item names on the right) appear in ascending order of difficulty from the bottom to the top. Both patients and items appear along the same scale, which in this case is a linear transformation of the Rasch logit scale to fit a 0 to 100 scale (Winsteps Uscale = 8.10). In this data set, the items are, on the whole, too easy for the abilities of the patients, which is represented by the Xs located higher and item names located lower. This illustrates poor targeting of item difficulty to patient ability. It may appear that there are no items to discriminate between the more able patients (Xs not opposed by any items) at the top of the map. However, Figure 1 presents the item calibrations averaged across the 5-point rating scale. If each step of the rating scale were illustrated, some of these more able patients would be shown to be targeted by the thresholds between no difficulty and a little difficulty for the more challenging items. However, four patients were beyond all steps for all items, and therefore differences in their visual abilities are poorly discriminated by the ADVS. There is a floor effect at the bottom of the map where there are many items targeting few patients (items are too easy). If the items were well targeted to the patients, the means of the two distributions, denoted in Figure 1 by M, would be close to each other. We attempted to address the problems highlighted in Figure 1 through response-scale reduction and item reduction.
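The rescaling mentioned above is a linear transformation of the logit measures. A minimal sketch follows, assuming the reported multiplier of 8.10 units per logit; the offset is not given in the text, so the value of 50.0 below is a hypothetical placeholder, as is the function name.

```python
def to_user_scale(logit: float, scale: float = 8.10, offset: float = 50.0) -> float:
    """Map a Rasch measure in logits onto a rescaled reporting scale.

    `scale` is the number of reporting units per logit (8.10 in the text).
    `offset` is the reporting value assigned to 0 logits; 50.0 is a
    hypothetical placeholder because the text does not state it.
    """
    return offset + scale * logit
```

Because the transformation is linear, distances between patients and items on the map keep the same proportions as on the logit scale.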
Item Reduction
Rasch analysis was used to improve the ADVS by improving internal consistency (item fit to the model) and improving targeting of the test to the population (reducing redundant or underutilized items). The Rasch model fit statistics, infit and outfit mean square, which compare the predicted responses to those observed, were used to monitor the compatibility of the data with the model. Outfit (outlier-sensitive fit) mean square is the conventional sum of squared standardized residuals and is sensitive to occasional responses that are very different from the expected response. For infit (information-weighted fit) mean square, each squared standardized residual value is first weighted by its variance and then summed. In this way, infit takes less notice of extreme responses, as it is weighted to be sensitive to responses that are close to a respondent's level of (in this case) visual function. Both infit and outfit mean squares have an expected value of 1, with those less than 0.80 representing items that overfit the model and are too predictable (they have at least 20% less variation than was expected). Overfitting items may be redundant or noncontributory, because they lack variance. Mean squares greater than 1.20 represent misfit (at least 20% more variance than was expected). A high item infit or outfit suggests that the item measures something different from the overall scale.44 A high item outfit may also indicate that an item is influenced by, in this case, visual disability in some patients with cataract but not in all cases. Such items would be acceptable, as long as they are not too extreme. Based on this rationale, we determined that infit mean square should drive item reduction, so more stringent criteria were used for infit and more lenient for outfit.
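The two fit statistics can be sketched for a single item as follows. This is a minimal illustration under the usual definitions, not a reproduction of the Winsteps computation: it assumes that the model-expected score and model variance of each response have already been estimated, and it averages the squared standardized residuals for outfit. The function and argument names are assumptions.

```python
from typing import Sequence, Tuple


def fit_mean_squares(observed: Sequence[float],
                     expected: Sequence[float],
                     variance: Sequence[float]) -> Tuple[float, float]:
    """Return (infit, outfit) mean squares for one item.

    observed: the responses actually given by each patient
    expected: the model-expected response for each patient
    variance: the model variance of each response (must be > 0)
    """
    squared_residuals = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Outfit: unweighted mean of squared standardized residuals, so one
    # very unexpected response can dominate the statistic.
    outfit = sum(r / w for r, w in zip(squared_residuals, variance)) / len(squared_residuals)
    # Infit: each squared standardized residual is weighted by its variance,
    # which reduces to sum of squared residuals over sum of variances.
    infit = sum(squared_residuals) / sum(variance)
    return infit, outfit
```

With these definitions, a surprising response from a patient far from the item's difficulty (small variance) inflates outfit much more than infit, which is why outfit is described as outlier sensitive.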
The criteria (in order of priority) used to identify candidate items for removal were: (1) infit mean square outside 0.80 to 1.20; (2) outfit mean square outside 0.70 to 1.30; (3) high proportion of missing data; (4) ceiling effect, a high proportion reporting no difficulty; and (5) skew and kurtosis outside -2.00 to +2.00.
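The five criteria can be expressed as a simple screening function. The sketch below is illustrative only: the fit and normality limits are those listed above, but the text does not give numerical cutoffs for a "high proportion" of missing data or ceiling effect, so the 50% defaults are hypothetical, as are the class and function names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ItemStats:
    name: str
    infit: float
    outfit: float
    missing_pct: float
    ceiling_pct: float
    skew: float
    kurtosis: float


def removal_criteria(item: ItemStats,
                     missing_cutoff: float = 50.0,   # hypothetical cutoff
                     ceiling_cutoff: float = 50.0    # hypothetical cutoff
                     ) -> List[str]:
    """List the criteria an item meets, in the stated order of priority."""
    flags = []
    if not 0.80 <= item.infit <= 1.20:
        flags.append("infit")
    if not 0.70 <= item.outfit <= 1.30:
        flags.append("outfit")
    if item.missing_pct > missing_cutoff:
        flags.append("missing data")
    if item.ceiling_pct > ceiling_cutoff:
        flags.append("ceiling effect")
    if abs(item.skew) > 2.00 or abs(item.kurtosis) > 2.00:
        flags.append("skew/kurtosis")
    return flags
```

Because the returned list preserves priority order, items can be compared both by how many criteria they meet and by which criterion comes first.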
Patient fit to the model was checked before considering items for removal to identify whether abnormally fitting patients may be contributing to abnormally fitting items. High item infit and outfit responses could be due to rogue responses from a small number of patients (unlike low values, i.e., responses that are too predictable, which are produced by most of the patients). Six patients had infit greater than 1.40. All patients with high infit tended to have no trouble with most items and so much trouble with one item that they could not perform it. This was not a typical response pattern, as most patients who had difficulty with one item had difficulty with multiple items. None of the items that caused these patients difficulty were items with high infit. Therefore, none of these cases of unusual response patterns were driving items to misfit, and all patients were therefore retained in the analysis for the evaluation of items. The adequacy of the sample size was shown by the average SE of the persons (5.0 units) being about half the average size of a rating scale step (9.9 units).
Item reduction was an iterative process, with one item removed at a time and fit to the model reestimated after each removal (fit is relative, so removal of items leads to changes in fit). The item with the highest number of candidate criteria, ordered by priority, was removed first. The infit and outfit mean squares for all 22 items are shown in Table 1. The four items at the bottom of Table 1 (19a-c, play cards; 18a-c, prepare meals; 17a-c, use a screwdriver; and 16a-c, use a ruler, yardstick, or tape measure) have low infit mean squares, high skew, kurtosis, and ceiling effect and are seen at the bottom of the map in Figure 1 in a group of six items with mean values that aligned with just one patient. These four items were so easy that most patients could perform them without difficulty, and they were the first four items removed. The next item with the poorest fit statistics was 5a-c, use public transport (infit 0.74, outfit 0.39), which had misfitted initially (infit 1.78) before the removal of the first four items. This volatility was probably related to the 72% missing data, and so this item was removed also. The removal of items with low infit improved the fit of several of the items that started with high infit: for example, 2a-c, daytime driving (infit became 1.02), and 11a-c, reading newspapers (infit became 0.96). However, two items still had high infit mean squares (and 50% missing data) and so were removed (6a-c, walk downstairs in daylight, infit 1.38; 7a-c, walk downstairs in dim light, infit 1.34). This resulted in a 15-item ADVS with a person separation of 2.22. Person separation is an indication of the precision with which the variability present in the patients is captured by the test, expressed as the ratio of the adjusted SD to the root mean square error. A higher person separation (>2.0) therefore indicates that patients differ significantly in ability across the measurement distribution.
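The person separation ratio defined above can be sketched from a set of person measures and their standard errors. This is a minimal illustration under stated assumptions, not the Winsteps output routine: it uses the population variance of the measures, and it also reports the separation reliability from the standard relation R = G²/(1 + G²). The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def person_separation(measures: Sequence[float],
                      standard_errors: Sequence[float]) -> Tuple[float, float]:
    """Return (separation G, reliability R) for a set of person measures."""
    n = len(measures)
    mean = sum(measures) / n
    observed_variance = sum((m - mean) ** 2 for m in measures) / n
    # Mean square error: average squared standard error of the persons.
    error_variance = sum(se ** 2 for se in standard_errors) / n
    # Adjusted (true) variance: observed variance with error removed.
    adjusted_variance = max(observed_variance - error_variance, 0.0)
    separation = math.sqrt(adjusted_variance) / math.sqrt(error_variance)
    reliability = separation ** 2 / (1.0 + separation ** 2)
    return separation, reliability
```

Under this relation, a separation of 2.22 corresponds to a reliability of about 0.83, and a separation of 1.61 to about 0.72, which gives a sense of how much precision would be lost by reducing the scale to 11 items.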
However, this questionnaire still contains a poorly fitting item (14a-c, writing checks: infit 0.77, outfit 0.26) with a large ceiling effect, skew, and kurtosis. Its removal led to another poorly fitting item. Item removal could be continued until all items fit well, and this occurred with 11 items remaining. However, person separation was decreased to 1.61, so it was decided to retain a poorly fitting item in a 15-item questionnaire, rather than lose person separation (precision). These items could be removed and replaced with other items of a difficulty level that better targets patient ability if such items were available. The original validation of the ADVS found a Cronbach α of between 0.91 and 0.94;3 in this data set it was 0.92. Cronbach α was unchanged at 0.91 for the shortened version (Table 2).
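For reference, Cronbach α can be computed from a complete persons-by-items score matrix as sketched below. This is an illustration of the standard formula, not the SPSS routine used in the study; it assumes no missing responses, and the function name is hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach alpha for a complete matrix: one row per person, one column per item."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across persons.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of the summed (Likert) score across persons.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)
```

The statistic depends only on the item variances and the variance of the summed score, so it says nothing about whether the items are well targeted to the patients.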
Criterion Validity
The Spearman rank correlations of VA and CS with the original 22-item ADVS score were -0.43 and 0.45, respectively. These were not changed by Rasch scaling alone or by response scale or item reduction (Table 2).
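The criterion validity check can be sketched as a Spearman rank correlation between a vision measure and the disability score. The code below is a self-contained illustration (average ranks for ties, then a Pearson correlation of the ranks) and not the SPSS procedure used in the study; the function names are assumptions.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

Because higher logMAR values indicate worse acuity while higher ADVS scores indicate better ability, a negative correlation with VA and a positive correlation with CS point in the same direction.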
| Discussion |
|---|
The linear scale allows easy comparison of this relative difficulty of items and relative ability of patients. Moreover, because items and persons are measured on the same scale, the targeting of item difficulty to patient ability is readily illustrated. Reading food cans is found at the calibration point for the mean of the item group (Fig. 1), whereas the mean of the patient group is a long way farther up the scale. This illustrates poor targeting of item difficulty to patient ability. Many of the items are too easy for this cataract population to be troubled by (e.g., preparing a meal, playing cards, or using public transport), and so the visual disability of these patients is poorly measured. Combining the response categories that cater to the most disabled cases improved the targeting of item difficulty to patient ability. This weights the scale toward the less disabled end, thus giving it more power to discriminate between the more able patients. This appears to be a clinically sensible change,45 with the shortened version of the ADVS just having to discriminate between patients with "no difficulty," "a little difficulty," and "at least a moderate amount of difficulty." Response scale reduction has also decreased skew and kurtosis for most of the items.46 Although normal data are not a prerequisite for Rasch analysis,45 47 items that are normally distributed are more likely to contribute to person separation than skewed or kurtotic items. Person separation increases, although not significantly, from 2.37 with a 5-point response scale to 2.53 with a 3-point response scale.
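The response scale reduction discussed above amounts to recoding the five original categories into three. A minimal sketch follows, assuming the three lowest categories (unable, extreme, and moderate difficulty) are the ones merged, as the description of the shortened scale implies; the mapping table and function name are illustrative.

```python
from typing import List, Optional, Sequence

# Original 5-point score -> collapsed 3-point score.
# 5 = no difficulty, 4 = a little difficulty,
# 3, 2, 1 = at least a moderate amount of difficulty.
COLLAPSE_MAP = {5: 3, 4: 2, 3: 1, 2: 1, 1: 1}


def collapse_responses(scores: Sequence[Optional[int]]) -> List[Optional[int]]:
    """Recode 1-5 item scores to the 3-category scale; None stays missing."""
    return [None if s is None else COLLAPSE_MAP[s] for s in scores]
```

The recoding leaves the ordering of responses intact and only removes distinctions among the sparsely used categories at the most disabled end.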
Some of the 22 items in the original ADVS provided poor data due to missing data, ceiling effects, or poor Rasch fit statistics. The misfit for the two walking-downstairs questions may be due to high amounts of missing data (Table 1) and/or may be because these were the only mobility questions. Although mobility is an important component of visual ability, Stelmack et al. have shown that, in a low-vision population, mobility tasks do not tend to fit well with reading tasks in a Rasch model (Stelmack J, Szlyk J, Stelmack T, Ardickas Z, Massof R, ARVO Abstract 3816, 2002). Given that the fit to a Rasch model is in part a function of the items sharing similar content, which in the ADVS is chiefly driving and reading, it is not surprising that two mobility items fit poorly. The 15-item version has one poorly fitting item (14a-c, writing checks), although when this is removed another fits poorly. Although these items could be removed, any benefit is counteracted by a decrease in person separation. This could be overcome if new items that were relevant to most patients with cataract and of greater visual difficulty were added.
Our results are similar to those of Velozo et al.,32 who looked at the validity of the VF-14, another traditionally validated visual disability questionnaire, in terms of a Rasch model. They too found some poorly fitting items, poor discrimination, redundancy, and underutilization of response categories. They found that the VF-14's five-category response scale could be reduced to a three-category response scale. Again, it was the categories representing the greatest difficulty that were sparingly used and could be combined. They found that the redundancy could be reduced through item removal without loss of internal consistency, but the VF-14 discrimination and redundancy problems could not be solved by response scale and item reduction alone. Velozo et al. added 10 extra items to try to improve discrimination. This was only partly effective, but they produced a VF-10 with psychometric properties superior to those of the VF-14. As all other visual disability questionnaires (e.g., VFI, VAQ, VPQ, and VDA) that have been developed use Likert scoring methods,23 it is likely that they all have the same problems, at least when used to assess visual disability in patients undergoing cataract surgery. Although Rasch scaling could be used to examine the data collected on existing visual disability questionnaires, it is likely that optimal validity will be achieved through further modification of these questionnaires or the development of new questionnaires.23 Rasch analysis has only been used in the development of three visual disability questionnaires, and these have been for low-vision populations.28 29 30 It is unlikely that these would be suitable for cataract outcomes research, because cataract populations are likely to be much less impaired.
The poor targeting of item difficulty to patient ability in the ADVS raises the question of whether our cataract population was typically impaired or less impaired than average. The mean VA in the surgically treated eyes in this study is comparable to that in many other current series,18 48 including unilateral cataract series.49 50 51 The binocular VA is similar to that in series with mixed first eye and second eye surgeries,9 11 12 33 52 but better than that seen in bilateral cataract only series.18 Similarly, the VA in this series is better than that with comorbidity,17 52 53 and that in British patients on waiting lists in the United Kingdom.54 VA is also better in this series than in older series.55 56 It is well known that there have been changing indications for cataract surgery, due to the increased efficiency and safety of the procedure, so that it is now offered at a lower level of impairment.57 This suggests that whereas the ADVS may have been ideal when it was being developed in the late 1980s and early 1990s, it is no longer suited to the more visually able patients who undergo surgery today. It also suggests that the ADVS may be more suited to measuring disability in bilateral cataract and perhaps in cases with comorbidity. To look at the importance of first eye surgery and second eye surgery in the ranking of ability by Rasch analysis, Figure 2 shows which patients were to undergo first eye surgery and which were to have second eye surgery. It can be seen that most of the second eye patients were more able and most of the first eye patients were less able than the average patient. This suggests that the ADVS may be more suitable for patients with bilateral cataracts and less for those needing second eye surgery. However, this is a problem for outcome studies, because after first eye cataract surgery, patients become prospective preoperative second eye surgery cases.
Moreover, the ADVS and the VF-14 have been extensively used to look at the relative benefit of first and second eye cataract surgery.9 12 16 52 Also, after second eye surgery, poor targeting of items to patients is an even greater problem. Therefore, new disability scales are needed that can accurately measure visual disability in these groups. Perhaps questionnaires for patients with cataract should also contain items that tap issues of relevance to patients with unilateral visual loss (e.g., stereopsis, anisometropia, and inhibition16 50 56) and possibly should include domains of quality of life other than visual disability. This raises the possibility that separate questionnaires may be needed, because the items relevant to patients with binocular visual loss may be different from those relevant to patients with unilateral loss.
of 0.92, which is comparable to the original validation.3 This is considered to be exceptionally high and may be indicative of redundancy.58 Indeed, Rasch analysis highlights redundancy within our ADVS data, which was eliminated through removal of items for the 15-item version. Redundancy is a problem if the process of creating the overall score for the questionnaire involves just adding all the item scores together. In such a case, the overall score overweights the importance of the issue that is served by redundant items.59 It is possible to have a high Cronbach α through inclusion of items that are highly correlated. Similarly, because Cronbach α is not independent of the number of items, it may be elevated by including many items. Furthermore, Cortina60 has shown that any test of 20 items would have a high Cronbach α, and its use in this case is therefore not particularly helpful. For these reasons, Cronbach α should probably be considered to be more of a traditional measure than a useful measure.23
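The dependence of Cronbach α on test length can be seen from the standardized form of the coefficient, given here as an illustration. The symbols $k$ (number of items) and $\bar{r}$ (mean inter-item correlation) are notation introduced for this example, and the value $\bar{r} = 0.30$ is purely illustrative rather than an estimate from the ADVS data.

```latex
\alpha_{\text{std}} = \frac{k\,\bar{r}}{1 + (k - 1)\,\bar{r}},
\qquad
k = 20,\ \bar{r} = 0.30
\;\Rightarrow\;
\alpha_{\text{std}} = \frac{6}{6.7} \approx 0.90
```

Even a modest average correlation between items therefore yields a coefficient near 0.90 once the scale reaches about 20 items.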
Criterion Validity
All four reduced versions also show correlations with VA and CS that are at least as good as those found with the original 22-item version. For the 15-item version, the failure of the removal of 7 items to damage the relationship between vision and ability confirms the criterion validity of the shortened version. Moreover, it is not possible to achieve high correlations with VA if measures fall across only a narrow range, as ours did.61
| Conclusions |
|---|
|
|
|---|
| Footnotes |
|---|
Submitted for publication October 21, 2002; revised December 17, 2002, and January 26 and February 26, 2003; accepted February 28, 2003.
Disclosure: K. Pesudovs, None; E. Garamendi, None; J.P. Keeves, None; D.B. Elliott, None
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Corresponding author: Konrad Pesudovs, Department of Optometry, University of Bradford, Richmond Road, Bradford, West Yorkshire BD7 1DP, UK; k.pesudovs{at}bradford.ac.uk.
| References |
|---|