From the ¹Centre for Eye Research Australia, The University of Melbourne, Melbourne, Victoria, Australia; the ²Swinburne University of Technology, Melbourne, Victoria, Australia; the ³National Health and Medical Research Council Centre for Clinical Eye Research, Flinders University and Flinders Medical Centre, Adelaide, South Australia, Australia; and the ⁴Vision Cooperative Research Center, Sydney, New South Wales, Australia.
| Abstract |
|---|
METHODS. Three hundred fourteen first-time referrals to low-vision clinics completed the 32-item IVI. The data were Rasch-analyzed with a partial credit model using RUMM2020 software (RUMM Laboratory, Perth, WA, Australia). The overall fit of the model, response scale, individual item fit, differential item functioning, unidimensionality, and person-separation reliability were assessed.
RESULTS. Initially, 26 items displayed disordered thresholds. However, collapsing the response scale to three categories (4 items) and four categories (28 items) produced ordered response thresholds for all items. Four items with high proportions of missing responses, poor spread, high skewness, and deviation between observed and expected model curves were then removed. This adjustment produced overall fit to the Rasch model (item-trait interaction χ² = 118.3; P = 0.32). The final mean (SD) person and item fit residuals were 0.06 (0.85) and 0.20 (1.45), respectively. The person-separation reliability was 0.9, indicating that the scale was able to discriminate between several different groups of participants. The revised scale was well targeted to the participants, with similar mean locations for items (0.00) and persons (0.16). A significant difference between participants of mild, moderate, and severe visual impairment (ANOVA; P < 0.001) supported the criterion validity of the Rasch-scaled IVI.
CONCLUSIONS. The results support the measurement properties of the Rasch-scaled 28-item version of the IVI and its potential for assessing outcomes of low-vision rehabilitation. A raw score-to-Rasch person measure conversion is supplied.
The IVI questionnaire provides six response categories for each item (ranging from "not at all" to "can't do because of eyesight") and employs Likert scoring. Although it is implied that Likert values are monotonic with the latent trait they are endeavoring to assess, it is difficult to confirm that they possess an interval measurement component. The validity of the Likert scale, as representing an interval scale, has been questioned by proponents of Item Response Theory, in particular, the application of Rasch analysis.10 11 12 13 14 Rasch analysis offers an elegant approach to addressing several important methodological characteristics associated with scale development and construct validation, as well as providing a transformation of the ordinal raw scores to a linear interval scale permitting the use of parametric statistical techniques.15 Rasch analysis also calculates item difficulty in relation to person difficulty and assesses the scale validity, in particular, the item and person fit to the overall construct.16
Although there are currently several visual functioning questionnaires available, only a few have been developed using Rasch analysis.17 18 19 20 21 22 23 24 Others have been Rasch assessed, allowing for improvements to be made to their structure.16 25 26 27 With the exception of scales such as the Veterans Affairs Low Vision Visual Functioning Questionnaire (LV VFQ-48),18 few questionnaires have the range of items to assess difficulty with daily activities and a demonstrated capacity to evaluate activities subsequent to low-vision rehabilitation. The IVI has been designed to assess the restriction of participation in daily living as well as the effectiveness of rehabilitation outcomes in low vision, unlike most vision-specific questionnaires that typically assess visual functioning. To determine whether the IVI possesses the measurement characteristics (interval scale, validity, and reliability) and is an accurate and sensitive evaluation instrument for vision rehabilitation, we used Rasch analysis on the IVI, considering item reduction if necessary.
| Methods |
|---|
18 years of age, and the ability to converse in English. Participants signed a consent form, and low-vision rehabilitation files were accessed to obtain clinical data. Ethical approval was obtained from the Royal Victorian Eye and Ear Hospital's Human Research and Ethics Committee. This research adhered to the tenets of the Declaration of Helsinki. Sociodemographic and clinical data were collected.
IVI Questionnaire
A detailed description of the IVI questionnaire has been published elsewhere5 and is summarized herein. The questionnaire was developed in three stages. Initially, focus groups, comprising individuals with the most common causes of impaired vision, identified activities causing restriction of participation in daily living.28 In the second stage, issues identified in focus groups were operationalized into a bank of 76 items. Existing instruments, namely, the Activities of Daily Vision Scale,29 the Visual Function Questionnaire,30 the National Eye Institute-Visual Function Questionnaire (NEI-VFQ),31 and the Bristol Vision-Related Quality of Life (VQOL),32 were reviewed for content and scaling relevant to the issues identified in the focus groups. Items pertinent to ocular symptoms and a person's limitation in activities (for example, in seeing small objects) were not included in the IVI. In the third stage, the IVI was trialed in two consecutive versions before the final 32-item version was derived. All versions of the IVI retained the core 10 questions of the VQOL, and all retained questions in each of the five provisional domains: mobility; household; personal care; consumer and social interactions; and leisure and work and emotional reaction to vision loss.
In this study, the 32-item IVI instrument was either self- or interviewer-administered, as high levels of consistency have been recorded in both methods.5 Proxy answers were not solicited from caregivers or relatives, to avoid biasing the IVI responses toward another person's opinion of the participant's ability. Responses to the IVI items were rated with a six-category Likert scale: "not at all" (0), "hardly at all" (1), "a little" (2), "a fair amount" (3), "a lot" (4), and "can't do because of eyesight" (5), with an additional response category, "don't do because of other reasons," for 19 items. The latter response was not included in computing the average overall or domain score. The wording preceding these items was: "In the past month, how much has your eyesight interfered with the following activities." For the remaining 13 items, the rating scale used was: "not at all" (0), "very rarely" (1), "a little of the time" (2), "a fair amount of the time" (3), "a lot of the time" (4), and "all the time" (5). The wording preceding these items was: "In the past month, how often has your eyesight made you concerned or worried about the following." Data for this study represent information collected at baseline as part of an intervention study between 2001 and 2002.
Rasch Analysis
The Rasch model was named after the Danish mathematician Georg Rasch.33 The model specifies the pattern of responses to items that should be expected if measurement (at the interval level) is to be achieved. For the Rasch model, dichotomous33 34 and polytomous34 versions are available. The response patterns achieved are tested against what is expected (a probabilistic form of Guttman scaling35), and a variety of statistics are used to assess fit to the model.36
The Rasch model assumes that the probability of a given respondent's affirming an item is a logistic function of the relative distance between the item's location and the respondent's location on a linear scale. If a person's ability in performing a particular activity is lower than the required ability for that particular task, the probability of the person's rating the task in the highest scoring category (in this case: "can't do because of eyesight") is high. Conversely, if a person's level of ability is greater than the ability required for a particular task, the probability of the person's rating the task in the low scoring category (e.g., "not at all") is high. Hence, it is expected that the probability of using any particular rating category will increase monotonically with the difference between the person's level of difficulty in performing daily activities and the level of difficulty required for the particular task.
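The logistic relationship described above can be sketched in a few lines. This is a minimal illustration of the textbook partial credit formulation, not the RUMM2020 implementation; the function name and the threshold values used with it are hypothetical. With a single threshold it reduces to the dichotomous Rasch model.

```python
import math

def pcm_category_probabilities(theta, thresholds):
    """Category probabilities for one polytomous item (partial credit model).

    theta: person location in logits.
    thresholds: the item's threshold locations in logits, one per step
        between adjacent categories; m thresholds give m + 1 categories.
    """
    # Log-numerator of category k is the sum of (theta - threshold_j) for j <= k;
    # category 0 has an empty sum, i.e. 0.0.
    log_numerators = [0.0]
    for tau in thresholds:
        log_numerators.append(log_numerators[-1] + (theta - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_numerators)
    weights = [math.exp(value - peak) for value in log_numerators]
    total = sum(weights)
    return [weight / total for weight in weights]

# Hypothetical four-category item with ordered thresholds (illustrative values only).
probabilities = pcm_category_probabilities(theta=0.3, thresholds=[-1.0, 0.0, 1.5])
```

A person located exactly at a dichotomous item's location (theta equal to the single threshold) has a probability of 0.5 for each of the two categories, which is the defining property of the logistic form.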
For ease of interpretation of scores, the IVI rating scale scoring was reversed for the Rasch analysis (0 as 5, 1 as 4, 2 as 3, 3 as 2, 4 as 1, and 5 as 0). A positive item location, measured in logits (the unit of measure used by Rasch for calibrating items and measuring persons) on the Rasch scale, indicates that the item requires a higher level of participation than the mean of the items, whereas a negative item logit suggests that the item requires a lower level of participation than the average. A positive person-logit score suggests that the person's level of participation is higher than the mean required level of difficulty for the items. Conversely, if a person-logit score is negative, the person's perceived level of participation is lower than the average required level of difficulty.
The data were evaluated for fit to the Rasch model using the RUMM2020 (Rasch unidimensional measurement models; RUMM Laboratory, Perth, WA, Australia) software,37 with the goal of assessing how well the observed data fit the expectations of the measurement model. The partial-credit approach38 (which allows each item to have its own threshold parameters) was used because the likelihood-ratio test was statistically significant (P < 0.001), indicating that the rating scale model (which requires equivalent thresholds across all items) was not appropriate. The likelihood-ratio test was still statistically significant (P < 0.001) when applied to the two subsets of items (19 and 13 items), suggesting that the partial-credit approach was more suitable. Three overall fit statistics were considered. Two were fit residual statistics, which represent the residuals between the expected estimate and actual values for each person-item, summed over all items for each person and over all persons for each item. The residuals are transformed to approximate a z-score and represent a standardized normal distribution where perfect fit to the model would have a mean of approximately 0 and an SD of 1. An item-trait interaction score reported as a χ² statistic, which reflects the property of invariance across the trait, was also provided. A statistically nonsignificant probability value (P > 0.05) indicates no substantial deviation from the model. Individual item or person statistics with fit residual values >2.5 or probability values below the Bonferroni-adjusted α value (i.e., 0.05/32 = 0.001) are also used to indicate misfit to the model. In addition to these overall fit statistics, the RUMM2020 program provides an indication of person-separation reliability using the person-separation index (PSI; range, 0-1), which indicates how well the items of the instrument separate, or spread out, the subjects in the sample. A person-separation reliability value from RUMM of 0.7 is the equivalent of a G value of 1.5, representing the ability to distinguish two distinct strata of person ability.39 40 A value of 0.9 is equivalent to a G value of 3, with the ability to distinguish four strata of person ability.
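The correspondence between reliability, G, and strata quoted above matches the conversions commonly used in the Rasch literature, G = sqrt(R / (1 - R)) and strata = (4G + 1) / 3. The sketch below assumes these are the formulas behind the cited values; the text itself does not state them, and the function names are illustrative.

```python
import math

def separation_ratio(reliability):
    """Convert a separation reliability R (0 <= R < 1) to the separation ratio G."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must lie in [0, 1)")
    return math.sqrt(reliability / (1.0 - reliability))

def distinct_strata(reliability):
    """Number of statistically distinct strata, using (4G + 1) / 3."""
    return (4.0 * separation_ratio(reliability) + 1.0) / 3.0

# R = 0.7 gives G of roughly 1.5 (about two strata);
# R = 0.9 gives G of roughly 3 (about four strata).
for r in (0.7, 0.9):
    print(r, round(separation_ratio(r), 2), round(distinct_strata(r), 2))
```

Under these formulas the strata counts are not integers, so the "two" and "four" strata in the text are best read as rounded-down values.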
Misfit of items indicates a lack of the expected probabilistic relationship between the item and other items in the scale. This introduces noise into the measurement, diminishing the instrument's quality. In the event of item misfit, two strategies were undertaken to improve the scale. First, a lack of ordered responses (disordered thresholds) was determined. Disordered thresholds occur when a response category is selected by participants with a wide range of abilities on the underlying trait being measured, so that a person located between that category's boundaries does not have that category as the most probable response. This can occur when there are too many response options, or when the labels of the options are similar to one another, potentially confusing, or open to misinterpretation (e.g., "not at all," "hardly at all," and "a little"). Collapsing the categories where disordered thresholds occur can often improve overall fit to the model. Initially, the RUMM2020 software identifies disordered thresholds. Thereafter, decisions are made on how best to collapse categories (e.g., rescoring a five-point scale into a four-point scale). A visual examination of the way in which categories are working indicates possible ways to collapse categories. For example, if a category is less likely to be chosen or is not appropriately used across the whole scale, it can be collapsed with the adjacent category. Where alternative collapsing strategies seem possible, the pattern that produces the best fit for the item is chosen.
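The two steps described above, flagging disordered thresholds and rescoring with a collapsing map, can be sketched as follows. This is an illustration only: the helper names and the threshold estimates are hypothetical, and in the study the thresholds were estimated and inspected in RUMM2020. The six-to-four map is the one given in the scoring section of this article for the reversed scores (5, 4, 3, 2, 1, 0) to (3, 2, 2, 1, 1, 0).

```python
def has_disordered_thresholds(thresholds):
    """Return True if the threshold estimates are not strictly increasing."""
    return any(later <= earlier
               for earlier, later in zip(thresholds, thresholds[1:]))

def collapse_scores(scores, mapping):
    """Rescore category codes with a collapsing map; None (missing) passes through."""
    return [None if score is None else mapping[score] for score in scores]

# Collapsing map for reversed six-category scores into four categories.
SIX_TO_FOUR = {5: 3, 4: 2, 3: 2, 2: 1, 1: 1, 0: 0}

# Hypothetical threshold estimates for one item: the third lies below the second.
example_thresholds = [-1.2, 0.4, 0.1, 1.8, 2.5]
if has_disordered_thresholds(example_thresholds):
    rescored = collapse_scores([5, 4, 3, 2, 1, 0, None], SIX_TO_FOUR)
```

After rescoring, the model would be re-estimated and the thresholds checked again; the sketch covers only the bookkeeping, not the estimation.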
Once disordered thresholds are removed, fit of data to the Rasch model is assessed by examining deviations from model expectations, including DIF (differential item functioning). DIF occurs when different groups within the sample (levels of a person factor, e.g., degree of visual impairment) respond in a different manner to an individual item despite equal levels of the underlying characteristic being measured (participation in daily living). DIF can be detected both graphically, by inspection of the item characteristic curves, and statistically, by using analysis of variance comparing scores across each level of the person factor and across different levels of the trait (referred to as class intervals).
Once fit of the data to the Rasch model is determined by the appropriate range of statistics at the model, individual item and person levels as described earlier, it is necessary to confirm that the scale is appropriately targeted to the population being assessed. It is also important to confirm the unidimensionality of the questionnaire using principal components analysis (PCA) of the residuals available in RUMM. Unidimensionality is important because it provides further evidence that the instrument is measuring the underlying trait that it is believed to measure. This is demonstrated when there are no associations in the residuals derived from the difference between observed values and model expectations (local independence). Unidimensionality is formally tested by allowing the pattern of factor loadings on the first residual to determine subsets of items. If person estimates derived from these subsets of items differ significantly (using the t-test) from the estimates derived from the full scale, a breach of the assumption of local independence is indicated.41
| Results |
|---|
χ² = 246, P < 0.001), suggesting misfit between the data and model. The mean (SD) fit residual values were 0.42 (1.16) for items and 0.27 (1.68) for persons. Ideally, the mean and SD would be close to 0 and 1, respectively; the observed values suggest some misfit to the model by items and respondents. The person-separation reliability was 0.95. The pattern of item thresholds was first examined for disordering, which suggests that participants cannot reliably discriminate between the categories of difficulty. A disordered threshold is a violation of the measurement construct, in that there is discordance between the category probabilities and the underlying trait. Twenty-six items were found with disordered thresholds. An examination of these items indicated that not all response categories had a point along the ability continuum where they were the most likely response. For example, for item mob22, "safety outside of home" (Fig. 1), response categories 1 and 4 do not have a range along the ability scale where they are the category most likely to be chosen.
> 0.001, indicating no significant deviation from the model (Table 2). Individual person-fit statistics showed that 12 (3.8%) participants had fit residuals outside the acceptable range (>2.5). Further analysis of the misfitting participants showed inconsistent patterns in the items where extreme responses were observed. On removal of these persons, the item-trait interaction statistics improved further (χ² = 179; P = 0.002). The person fit residuals also improved in mean (0.27 to 0.20) and SD (1.68 to 1.47).
for each of the person factors assessed.
Overall Item-Trait Interaction
Despite effective rescoring and person and item fit residual mean and SD scores approximating 0 and 1, respectively, the item-trait interaction total value remained statistically significant (χ² = 179; P = 0.002), suggesting some remaining misfit to the model. Further removal of persons did not improve the total χ² and probability values. A minimalist approach for item removal was therefore considered based on several additional criteria, including a high level of the irrelevant response category (i.e., "don't do because of other reasons"), ceiling effect (the percentage in the least-able end of the response scale), and skewness.16 The item-trait interaction was used to assess scale functioning, rather than the person-separation reliability value. In addition, all items were viewed graphically to determine how well the observed values tended to fit the expected model curve in groups of responders across the trait (called class intervals). Items with good fit tend to show each of the group plots lying on the curve. Those with plots that were steeper than the curve would be considered to be overdiscriminating and those flatter than the curve, underdiscriminating.15 The items "paid or voluntary work" and "going out to sporting events" had the highest proportions of irrelevant responses (55.4% and 41.4%). The items "favorite pastimes or hobbies" and "reading a sign across the street" had a ceiling effect (70%-75%).
In addition, these four items showed deviations from the model curves compared with the remaining items (see Fig. 3, showing items "paid or voluntary work" and "going out to sporting events"). Item reduction was an iterative procedure, with one item removed at a time and fit re-estimated accordingly. The item with the highest number of candidate criteria (irrelevancy, spread, skewness, and poor fit to the expected curve), ordered by priority, was removed first. Consequently, the items were removed individually in the following order: Lei1, "paid or voluntary work"; Lei5, "going out to sports, movies, or plays"; Lei2, "favorite pastimes or hobbies"; and mob15, "reading a sign across the street."
χ² = 118, P = 0.32). The final mean person and item fit residual values were 0.068 (SD 0.85) and 0.203 (SD 1.45), respectively. The person-separation reliability score was 0.95, which indicates that the scale is able to discriminate between several different groups of participants.
Person-Item Map
The person-item map shown in Figure 4 displays the participants' scores on the Rasch-calibrated scale on the left-hand side and the relative difficulty levels of each of the IVI items on the right-hand side. Participants having the highest level of participation and the most difficult items are at the top of the diagram. Conversely, the participants having the lowest level of participation and the least difficult items are at the bottom. Estimates of the participants' perceived level of participation (in logits) were not significantly different from a normal distribution (Kolmogorov-Smirnov z-test score = 0.57; P = 0.9). There was an even spread of items across the full range of respondents' scores, suggesting effective targeting of the IVI items. In addition, the mean person location logit value (0.18) indicates that, overall, the questionnaire was well targeted, with participants on average at a marginally higher level of ability than the average of the scale items (which would be 0 logits). The five most difficult items in the revised IVI were "reading ordinary size print," "reading labels or instructions on medicine," "feeling frustrated or annoyed," "worried about eyesight getting worse," and "shopping," with logit scores of 2.12, 1.19, 0.75, 0.74, and 0.66, respectively. Conversely, the five least difficult items were "general safety at home," "spilling or breaking things," "feeling lonely and isolated," "feeling embarrassed," and "visiting friends or family," with logit scores of -1.47, -1.39, -1.08, -0.92, and -0.87, respectively.
Scoring of the IVI Questionnaire
Other investigators wanting to use the IVI questionnaire can use our validation data to convert raw scores into Rasch person measures without having to perform Rasch analysis. This conversion mainly holds for patients with complete data. Raw scores are calculated by, first, reversing scores (0, 1, 2, 3, 4, 5) to (5, 4, 3, 2, 1, 0) to give the better IVI score to the less impaired, as described in the Methods section. Second, categories are collapsed to four (3, 2, 2, 1, 1, 0) or three categories (2, 1, 1, 1, 1, 0), as described earlier in the Results section. The average of the 28 items gives the IVI raw score. This score is related to the IVI Rasch person measure, as illustrated in Figure 5. The relationship is double asymptotic because the average raw rating has a floor and a ceiling (at 0 and 3). The relationship can be described by the double-asymptotic nonlinear regression44: $$\text{IVI}_{\text{person measure}} = 19.72\,\log\!\left(\frac{\text{IVI}_{\text{raw score}}}{3 - \text{IVI}_{\text{raw score}}}\right) + 48.29$$ This equation can be used to convert raw scores to Rasch person measures.
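The scoring steps above can be sketched as a short script. It reads the regression as measure = 19.72 × log(raw / (3 − raw)) + 48.29, which is consistent with the stated floor and ceiling at 0 and 3. Several things here are assumptions rather than facts from the article: the base of the logarithm is not stated in this text, so it is left as a required argument; the set of items scored with three categories is not listed here, so it must be supplied by the user; and the function names and item identifiers are hypothetical.

```python
import math

# Collapsing maps applied to the reversed scores (5, 4, 3, 2, 1, 0).
FOUR_CATEGORY = {5: 3, 4: 2, 3: 2, 2: 1, 1: 1, 0: 0}
THREE_CATEGORY = {5: 2, 4: 1, 3: 1, 2: 1, 1: 1, 0: 0}

def ivi_raw_score(responses, three_category_items=frozenset()):
    """Average collapsed score over the answered items.

    responses: dict mapping item id -> original response code (0..5).
    three_category_items: ids of items scored with three categories
        (assumption: the caller knows which items these are).
    """
    if not responses:
        raise ValueError("no responses supplied")
    collapsed = []
    for item, answer in responses.items():
        reversed_score = 5 - answer  # 0 -> 5, 1 -> 4, ..., 5 -> 0
        mapping = THREE_CATEGORY if item in three_category_items else FOUR_CATEGORY
        collapsed.append(mapping[reversed_score])
    return sum(collapsed) / len(collapsed)

def ivi_person_measure(raw_score, log_base):
    """Convert a raw score to a person measure with the published regression.

    log_base is an assumption the caller must make explicit (e.g. 10 or math.e).
    """
    if not 0.0 < raw_score < 3.0:
        raise ValueError("raw score must lie strictly between 0 and 3")
    return 19.72 * math.log(raw_score / (3.0 - raw_score), log_base) + 48.29

# Hypothetical two-item example: item ids "a" and "b" are placeholders.
raw = ivi_raw_score({"a": 0, "b": 5})
measure = ivi_person_measure(raw, log_base=10)
```

Raw scores of exactly 0 or 3 are rejected because the logarithm is undefined at the floor and ceiling, which is the practical meaning of the double-asymptotic shape.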
| Discussion |
|---|
The use of Rasch analysis has enabled a detailed examination of the operation of the IVI scale. The partial-credit model38 was used to evaluate the ordering of categories (threshold ordering), and the evidence suggests that the response scale of the original version of the IVI was not optimal. The original 32-item IVI used six response categories ranging from "not at all" to "can't do because of eyesight." Analyses indicated significant overlapping between response categories, suggesting that our participants had difficulty consistently discriminating between response options. This was a problem at the very mild end of the response scale, where "not at all" overlapped with "hardly at all," and similarly at the severe end of the scale, where "a lot" overlapped with "can't do because of eyesight." After the combination of overlapping categories, further analyses showed that a four-category rating scale (which could be labeled "not at all," "a little," "a fair amount," and "can't do because of eyesight") was effective for 28 items. A three-category rating scale was used for the remaining four items. The reduction to a three- or four-category response scale in the measurement of visual disability is consistent with findings from other studies that have investigated response category utilization.16 17 19 20 26
The four items that were removed to achieve fit to the overall model recorded high levels of missing data, poor spread, and considerable skewness and showed deviation from the expected model curves. The inadequate fit to the expected model could be due to variability in the visual ability needed to perform specific activities, such as hobbies, or nonvisual factors like relative interest in sport (i.e., going out to sporting events) or the inherent difficulty of the activity (i.e., reading a sign across the street). It has been suggested that the variability in such items generates a substantial level of noise which contributes little to the measurement characteristics of the scale.18
Evidence of substantial construct validity of the Rasch-scaled IVI is supported by the absence of DIF for gender, degree of visual impairment, comorbidity, and effect of comorbidity on daily living. Considering the multicultural composition of the Australian population, future studies could provide a cross-cultural validation of the IVI using the DIF function. The test of local independence revealed no evidence of multidimensionality, which provides support for the unidimensionality of the revised IVI. The criterion validity of the IVI was demonstrated by its ability to discriminate significantly between participants with mild, moderate, and severe visual impairment.
The person-item map of the Rasch-scaled IVI shows good targeting of the scale, with no apparent floor or ceiling effect. We found only a few participants who did not have difficulty performing even the most difficult items and others who had substantial difficulty performing the easiest activities, consistent with a sample of visually impaired people attending low-vision rehabilitation. In addition, the good targeting of item difficulty to patients' level of participation suggests that the revised IVI is suitable to assess difficulty in performing daily activities across the spectrum of visual disability in individuals living in the community. The person-item map also reveals one of the critical weaknesses of Likert scoring, which assumes that all items are similar in difficulty and that all scores are of the same worth. Such maps can also be used in questionnaire development to ensure accurate targeting.16 23 For example, "reading ordinary size print" was identified as a more difficult item to endorse than "reading labels or instructions."
The item map also revealed several items representing the same level of difficulty along the ability continuum, suggesting that some items could be removed. However, the revised 28-item IVI is a relatively short questionnaire and, with a reasonable administration time (mean, 12 minutes), it is unlikely to represent a substantial respondent burden. In addition, it has been argued that low-vision rehabilitation enhances remaining vision for specific activities, and patient-specific information about the effectiveness of the intervention may be lost if items are eliminated.18 19 For example, although activities like "traveling or using transport" and "going down steps and stairs" have the same level of difficulty (0.42 and 0.41 logits, respectively), they are likely to require specific rehabilitation strategies. Although this is not important for the overall IVI score (because this represents a broad underlying construct), the outcome of low-vision rehabilitation with the IVI questionnaire could also be assessed on a question-by-question basis, in which case individual question content would be important. For example, the items relating to traveling and using transport could be used to guide the development of an intervention designed to improve orientation and mobility skills, as well as strategies to deal with a changing environment. Items relating to going down steps and stairs, especially in the home environment, could be used to assess the efficacy of interventions designed to improve safety in and around the house and obstacle-negotiation strategies. Considering that the sensitivity of the IVI items to change after low-vision rehabilitation has not yet been established, it would be premature to edit the revised questionnaire further.
Our emotional well-being items fitted on the same scale as items measuring difficulty of performing vision-specific tasks, which differs from previous findings with instruments such as the NEI-VFQ and warrants discussion. The content of a questionnaire determines the latent trait being sampled. If the content is dominated by visual disability items, then the latent trait is visual disability and items particular to other domains are unlikely to fit. This content-determined fit occurs with the NEI-VFQ, which is predominantly a visual disability questionnaire. However, if a questionnaire samples many aspects of quality of life, without the contents being dominated by a particular domain, then the underlying trait is "quality of life." Other quality-of-life questionnaires have been shown to include visual disability items that fit with the overall concept, quality of life, because they are not a dominant domain (e.g., the Quality of Life Impact of Refractive Correction (QIRC) questionnaire).45 Items from disability and other domains also fit to the IVI for similar reasons. Although more than half of the items in the IVI involve "task ability" items, these are worded to sample participation and so are not strictly visual disability items. It is this complexity of sampling the impact of visual impairment that makes the IVI a global quality-of-life measure assessing participation of the visually impaired.
In conclusion, this study demonstrated that the application of the Rasch measurement model supports the revised 28-item IVI as a valid scale for measuring perceived restriction of participation associated with daily living activities, making it suitable for use in assessing the outcomes of low-vision rehabilitation programs. A raw score-to-Rasch person measure conversion is provided to allow other investigators to use the revised IVI without needing to use Rasch analysis. The revised 28-item IVI questionnaire is available on request.
| Footnotes |
|---|
Submitted for publication February 28, 2006; revised June 11 and 23, 2006; accepted September 19, 2006.
Disclosure: E.L. Lamoureux, None; J.F. Pallant, None; K. Pesudovs, None; J.B. Hassell, None; J.E. Keeffe, None
The publication costs of this article were defrayed in part by page charge payment. This article must therefore be marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
Corresponding author: Ecosse L. Lamoureux, Centre for Eye Research Australia, Department of Ophthalmology, University of Melbourne, 32 Gisborne Street, East Melbourne, Victoria, 3002, Australia; ecosse{at}unimelb.edu.au.