Investigating the minimal important difference in ambulation in multiple sclerosis: A disconnect between performance-based and patient-reported outcomes?

Journal of the Neurological Sciences, Volume 347, Issues 1–2, pages 268–274

Abstract

Objective

We sought to estimate the Minimally Important Difference (MID) on two patient-reported outcome (PRO) measures that are frequently used in multiple sclerosis (MS) clinical research: the MS Walking Scale and the MS Impact Scale-29. We anchored the MIDs with an objective measure of ambulation, the accelerometer.

Methods

This secondary analysis used longitudinal data from an observational study of symptoms and physical activity in 269 people with Relapsing–Remitting Multiple Sclerosis. Participants completed a battery of PRO questionnaires, and then wore an accelerometer for seven days at each data collection time point every six months for 2.5 years. Statistical analysis first defined Change Groups on the basis of the performance-based accelerometer scores, anchored to 0.5 standard deviation change; then change was defined on the basis of published and linked MIDs for the PROs.

Results

The performance-based (accelerometer) and PRO-based change distributions were stable over time. Raw scores among the accelerometer and PRO measures were associated with large effect sizes, and PRO change scores were associated with each other but not with accelerometer change scores.

Conclusions

These findings contradict a central assumption that may underlie clinical research studies: that a cross-sectional correlation implies that change in PROs will correspond with change in behavior/performance. Possible explanations related to accuracy of the performance-based measure, as well as response shift effects on the PROs are discussed.

Highlights

 

  • We estimated the Minimally Important Difference (MID) on patient-reported outcomes (PRO).
  • We anchored the MID with an objective measure of ambulation, the accelerometer.
  • We found that cross-sectionally, accelerometer and PROs were correlated.
  • Change scores over time for accelerometer and PROs were, however, not associated.
  • These findings contradict a central assumption of clinical research studies.

Keywords: Multiple sclerosis, Ambulation, Patient-reported outcomes, Performance measure, Longitudinal construct validity.

1. Introduction

The use of patient-reported outcomes (PROs) in medical outcome research has grown in prominence and sophistication in the past two decades. Increasingly recognized as a source of important information that is not redundant with information reported by clinicians [1] or family-member caregivers [2], PROs provide the patient's perspective on symptom experience, symptom impact, and quality of life. Often using evaluative measurement tools which emphasize the subjective and idiographic nature of the variable human experience of health and illness, PRO tools face an increasingly rigorous validation process that characterizes and quantifies their reliability, validity, and responsiveness [3] and [4]. Technological advances in statistical software have facilitated these psychometric analyses, enabling the implementation of both classical- and item-response theory-based analyses that quantify aspects of reliability, validity and responsiveness in highly specific ways [5] and [6].

With this growth in technological prowess, the field of PRO research has developed thoughtful methods for evaluating the responsiveness of measurement tools to facilitate the interpretation of these measures [7]. Responsiveness is a key aspect of validity, and recent guidelines for assessing responsiveness are useful in distinguishing types of responsiveness and how to evaluate them [8]. This growing research base has suggested that responsiveness is a highly contextual characteristic, affected by who is being measured, for what outcomes, in what research or clinical context (where), using what mode of data collection (how), and at what stage of the disease trajectory (when) [9]. Work has focused on understanding how much change is large enough to be discernible and regarded as important [10]. Referred to as the Minimally Important Difference (MID), this has been defined as “the smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in the patient's management” [11]. The MID may be estimated by taking an initial or baseline assessment and a follow-up assessment, and at follow-up asking the patient how much their condition has changed (i.e., a transition rating or global rating of change) [10]. Using this transition rating as an anchor, one can estimate the mean change in the assessment that corresponds to getting worse or getting better. The methodological challenge of using such patient-reported transition ratings is the potential for bias due to response shift, recall bias, and implicit theories of change [12], [13], and [14].

These potential biases have perhaps alerted investigators to examine the consistency of MIDs across studies and to note variability and inconsistency in meaningful-change metrics. Even in measures of relatively concrete behaviors, such as ambulation, there seems to be variability in the amount of change that corresponds to a person's impression of clinically important change [15]. For example, past research on the MID of the Multiple Sclerosis Walking Scale-12 (MSWS) [16] has yielded varying MID estimates, ranging from 4 to 10 points on a 100-point scale [15], [17], and [18]. Differences between patient groups or studies in what constitutes an important change could impair the comparability of PRO data on the same instrument(s) across studies [19].

In response to the challenge of ‘moving goal posts’ [20] and [21], we sought to estimate the MID on two PRO measures that are frequently used in multiple sclerosis (MS) clinical research: the MSWS [16] and the MS Impact Scale-29 (MSIS) [22]. We anchored the MIDs with an objective measure of ambulation, the accelerometer [23]. We used one-half of the standard deviation of the accelerometer change score, a well-documented and robust benchmark for clinically important change [24], to estimate the MID of the MSWS and the MSIS. We then investigated relationships between accelerometer change and PRO change over time, and examined self-efficacy as a psychosocial factor that may explain discrepancies between objective and patient-reported change.

2. Methods

2.1. Sample

This secondary analysis used data from an observational study of symptoms and physical activity over 2.5 years in people with Relapsing–Remitting Multiple Sclerosis (RRMS) [25] . The procedures were approved by an Institutional Review Board and all participants who volunteered provided written informed consent. The sample was recruited through a research advertisement posted on the National MS Society (NMSS) website and distributed through 12 mid-western chapters of the NMSS. Those who were interested in the study contacted the research team by either e-mail or a toll-free telephone call. This contact was followed by a scripted conversation with the project coordinator, who described the study procedures and undertook screening for inclusion criteria. The inclusion criteria were: (1) diagnosis of RRMS confirmed by a physician; (2) relapse-free in the previous 30 days; (3) ambulatory with or without assistance (i.e., walk independently or walk with a cane or crutch or walker or rollator); and (4) willingness to complete the study materials every 6 months over 2.5 years. Those who did not satisfy the inclusion criteria were excluded from participation.

We successfully contacted 375 of the 463 people who expressed interest in the study, and 6 were uninterested in participation after the description of the study procedures. The remaining 369 people underwent screening, 44 did not satisfy the inclusion criteria, and 5 declined voluntary participation. We sent an informed consent document (completed by the participant) and RRMS verification form (completed by the participant's treating physician) to the remaining 320 people, and 41 did not return the documents despite 3 attempts for follow-up contact. We sent study materials to the remaining 279 people, and 10 subsequently declined further participation; this distribution of materials occurred in 12 waves of about 25 participants per wave beginning in March of 2008 (wave 1) and ending in February of 2009 (wave 12). There were 269 people with RRMS who provided baseline data. Of the initial 269 people, there were 258, 253, 245, 244, and 238 who provided follow-up data 6, 12, 18, 24, and 30 months later (i.e., 88%–96% of the initial sample). This attrition involved either a change in the participant's residential address or loss of materials through the US Postal Service.

2.2. Procedure

Participants were sent an accelerometer and battery of questionnaires through the U.S. Postal Service. We further provided pre-stamped and pre-addressed envelopes for return postal service. The project coordinator called to make sure the participants received the materials and understood the instructions. The participants then completed the battery of PRO questionnaires and wore the accelerometer for seven days. After completing the measures and wearing the accelerometer, participants returned the study materials through the U.S. Postal Service. We contacted participants by telephone and e-mail up to 3 times as a reminder to return the study materials. We further collected any missing questionnaire data through follow-up telephone calls. This same procedure was completed every six months over a 2.5-year period. All participants received $120 remuneration; this was prorated to be $20 per completion and return of the study materials.

2.3. Measures

2.3.1. PROs

For the purpose of this secondary analysis, we focused our attention on the responsiveness of the MSWS [16] and the MSIS [22]. The MSWS is a 12-item PRO measure of the impact of MS on walking. Scores range from 0 to 100, with higher scores reflecting greater impact of MS on walking. The MSIS is a 29-item PRO that assesses the physical (20 items) and psychological (9 items) impact of MS. Scores range from 0 to 100, with higher scores reflecting greater impact of MS on functioning.

Demographic data and the Patient-Determined Disease Steps (PDDS) [26] PRO were included to describe the sample. The PDDS is a self-report measure that was modeled after and correlates highly with the Expanded Disability Status Scale (EDSS) [27]. This measure classifies patient disability level into 1 of 9 steps (0, normal; 1, mild disability; 2, moderate disability; 3, gait disability; 4, early cane; 5, late cane; 6, bilateral support; 7, wheelchair or scooter; 8, bedridden) [26].

2.3.2. Performance-based measure

Community ambulation monitoring was done using the ActiGraph model 7164 accelerometer (ActiGraph, Pensacola, FL). This tool samples walking in the context of daily life where it naturally occurs to obtain ecologically ‘valid’ information (i.e., information that is generalizable to real-world, real-life experiences) [28]. This motion sensor is typically worn on a belt around the waist during the waking hours of everyday life over a period of 7 days. Recognized as a possible ‘gold standard’ measure of ambulation in MS [29], this device captures the overall ambulatory activity undertaken in one's usual environment and across the usual range of activities [29].

This brand of motion sensor further has acceptable accuracy across the disability spectrum (i.e., EDSS of 0–6.5 or bilateral device for ambulation) and a range of walking speeds (i.e., slow through comfortable and fast) in persons with MS [30] . The ActiGraph model 7164 accelerometer contains a single, vertical axis piezoelectric bender element that generates an electrical signal that is proportional to the force acting on it during ambulation. The acceleration/deceleration signal is digitized by an analog-to-digital converter and numerically integrated over a pre-programmed epoch interval. At the end of each interval, the integrated value of movement counts is stored in random access memory and the integrator is reset.

The monitor is programmed for start time and data collection interval, and data are retrieved for analysis via a personal computer interface and software provided with the unit. The data that are downloaded from the accelerometer are then entered into Microsoft Excel for data processing. The epoch was 1 min in this study, and the accelerometers were worn on an elastic belt around the waist at the non-dominant hip during the waking hours, except while showering, bathing, and swimming, for a 7-day period. Waking hours were defined as the period from getting out of bed in the morning to getting into bed in the evening. The participants recorded the time that the accelerometer was worn on a log, and this was verified by inspection of the minute-by-minute accelerometer data.

Regarding data processing, we checked the validity of each day's data based on the criterion of 10 or more hours of wear time without periods of 60 min of continuous zeros (i.e., compliance) and then summed the minute-by-minute counts across each of the valid days and averaged the total daily movement counts across the valid days. This yielded accelerometer data in total movement counts per day averaged over a week, and can theoretically range between 0 and infinity. Higher scores represent more community ambulation.
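The day-level validity check and averaging described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the processing actually used in the study (which was done in Microsoft Excel): the function names are hypothetical, and the sketch assumes that any run of 60 or more consecutive zero-count minutes is treated as non-wear, with all remaining minutes counted as wear time.

```python
from statistics import mean

MIN_WEAR_MINUTES = 600  # 10 hours of wear time (criterion from the text)
NONWEAR_RUN = 60        # assumed: 60+ consecutive zero minutes = non-wear


def wear_minutes(counts):
    """Count minutes that are not inside a run of >= NONWEAR_RUN zeros.

    `counts` is one day's list of 1-min movement counts.
    """
    worn = 0
    zero_run = 0
    for c in counts:
        if c == 0:
            zero_run += 1
            continue
        # A short zero run (e.g., sitting still) still counts as wear.
        if zero_run < NONWEAR_RUN:
            worn += zero_run
        zero_run = 0
        worn += 1
    if zero_run < NONWEAR_RUN:
        worn += zero_run
    return worn


def average_daily_counts(days):
    """Mean of total daily counts over valid days, or None if no day is valid."""
    totals = [sum(day) for day in days if wear_minutes(day) >= MIN_WEAR_MINUTES]
    return mean(totals) if totals else None


# Hypothetical data: one valid day (700 worn minutes) and one invalid day.
valid_day = [10] * 700 + [0] * 740
short_day = [5] * 300 + [0] * 1140
print(average_daily_counts([valid_day, short_day]))
```

Only the first day meets the 10-hour criterion in this made-up example, so the weekly score is that day's total. Other non-wear rules would change which days qualify.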

2.4. Statistical analysis

All analyses were implemented using Stata 13 [31] . This set of analyses focused on understanding change over time on the MSWS and MSIS. The first set of analyses defined Change Groups on the basis of the performance-based accelerometer scores, whereas the second set defined change on the basis of the PROs.

2.4.1. Defining performance-based change

Using change on the accelerometers as the standard for clinically relevant change, we began by examining and comparing the distributions of the accelerometer change scores between each consecutive pair of time points using the Kolmogorov–Smirnov test; by comparing mean change scores using linear regression; and by comparing the standard deviations of the change scores using the F-test for the homogeneity of variances. We then calculated the standard deviation of the overall change-score distribution, and operationalized clinically important change to be one-half of the standard deviation of the overall accelerometer change distribution.

We then created three Change Groups of patients on the basis of this clinically important overall accelerometer change score: those who worsened over time; those who remained stable, and those who improved. Linear regression modeling tested whether there were mean differences on the MSWS, MSIS Physical and MSIS Psychological by Change Group.
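The half-standard-deviation rule for forming the three Change Groups can be sketched as follows. This is an illustration only: the function name is hypothetical, the sample standard deviation (n − 1 denominator) is assumed, and treating a change exactly equal to the threshold as a change is also an assumption. Because higher accelerometer scores represent more community ambulation, a positive change is labeled as improvement.

```python
from statistics import stdev


def accelerometer_change_groups(changes):
    """Label each change score as worsened, stable, or improved.

    The threshold is one-half of the SD of the overall change distribution.
    """
    threshold = 0.5 * stdev(changes)  # assumed: sample SD
    labels = []
    for delta in changes:
        if delta <= -threshold:
            labels.append("worsened")
        elif delta >= threshold:
            labels.append("improved")
        else:
            labels.append("stable")
    return threshold, labels


# Hypothetical change scores (arbitrary units), for illustration only.
threshold, labels = accelerometer_change_groups([-4, -1, 0, 1, 4])
print(round(threshold, 3), labels)
```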

We utilized Fayers and Hays' [10] and [32] linking approach to calculate an MID for MSIS-Physical and MSWS, rather than estimating MID values by regression techniques, which can shrink estimates of minimally important differences. We computed correlations among measures to determine which measures were appropriate for the linking approach. The equation considers the correlation among raw scores [33] with respect to the hypothesized effect size [10], such that measures must have a correlation of at least 0.371 to measure a large effect size in linking analysis, and a correlation of at least 0.24 to measure a medium effect size in linking analysis. Consequently, the linkage approach was not applied to calculate an MID for MSIS-Psychological because the accelerometer and MSIS-Psychological scores were only weakly correlated (r = −0.10).
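The eligibility screen described above can be expressed as a short check. This sketch covers only the correlation thresholds, not the linking equation itself; the function name is hypothetical, and comparing the absolute value of the correlation against the thresholds is an assumption made because the accelerometer correlates negatively with the PROs. The correlations are the raw-score values reported in this paper.

```python
LARGE_MIN_R = 0.371   # minimum correlation for a large effect size
MEDIUM_MIN_R = 0.24   # minimum correlation for a medium effect size


def linking_eligibility(r):
    """Return the effect-size tier a raw-score correlation supports."""
    magnitude = abs(r)  # assumed: the sign of the correlation is ignored
    if magnitude >= LARGE_MIN_R:
        return "large"
    if magnitude >= MEDIUM_MIN_R:
        return "medium"
    return "not eligible"


# Raw-score correlations with the accelerometer, as reported in the Results.
correlations = {
    "MSWS": -0.42,
    "MSIS-Physical": -0.29,
    "MSIS-Psychological": -0.10,
}
for measure, r in correlations.items():
    print(measure, linking_eligibility(r))
```

Under these assumptions the check returns "large" for the MSWS, "medium" for MSIS-Physical, and "not eligible" for MSIS-Psychological, which matches the decision to exclude the psychological subscale from linking.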

We operationalized clinically important change to be one-half of the standard deviation on the overall accelerometer distribution [34] . Thus, the “anchor change” was one-half of the standard deviation on the overall accelerometer distribution. We tested whether the linkage approach to calculate MID change in MSWS and MSIS-Physical translated to a statistically significant difference in accelerometer change. A Pearson's chi-squared test evaluated whether the proportion of individuals meeting MID was the same among those whose accelerometer change score reflected worsening, remaining stable, and improving.
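As an illustration of the final step, the sketch below computes a Pearson chi-squared statistic for a table of Change Group (rows) by whether the linked MID was met (columns). It is not the Stata code used in the study; the function names and the counts are hypothetical. With three groups and two outcomes the test has 2 degrees of freedom, for which the p-value has the closed form exp(−χ²/2).

```python
import math


def pearson_chi2(table):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


def p_value_df2(stat):
    """Upper-tail p-value of a chi-squared statistic with exactly 2 df."""
    return math.exp(-stat / 2)


# Hypothetical counts: rows = worsened, stable, improved;
# columns = met MID, did not meet MID.
counts = [[12, 38], [30, 120], [10, 50]]
stat, df = pearson_chi2(counts)
print(round(stat, 3), df, round(p_value_df2(stat), 3))
```

For tables with other dimensions the closed-form p-value does not apply and a chi-squared distribution function from a statistics library would be needed.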

2.4.2. Defining PRO-based change

Using published MID estimates to define change on the PROs as the standard for clinically relevant change, we created Change Groups (worsened, stable, improved) on the basis of 4-point, 6-point [17] and [18], and 10-point [35] changes on the MSWS, and 8-point changes on the MSIS-Physical and MSIS-Psychological subscales [36]. We compared mean accelerometer change across these groups using linear regression.
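A minimal sketch of the PRO-based grouping follows, assuming that a change equal to the MID counts as a change (the text does not state how ties were handled). The function name is hypothetical. Because higher MSWS and MSIS scores reflect greater impact of MS, an increase is labeled as worsening.

```python
MSWS_MIDS = (4, 6, 10)  # published MSWS MID estimates tested in this analysis
MSIS_MID = 8            # published MID for the MSIS subscales


def pro_change_group(change, mid):
    """Classify a PRO change score against a published MID."""
    if change >= mid:
        return "worsened"   # higher PRO score = greater impact of MS
    if change <= -mid:
        return "improved"
    return "stable"


# A hypothetical 5-point increase on the MSWS is classified differently
# depending on which published MID is applied.
print({mid: pro_change_group(5, mid) for mid in MSWS_MIDS})
# → {4: 'worsened', 6: 'stable', 10: 'stable'}
```

The example shows why the choice among published MIDs matters: the same change score moves between groups as the threshold changes.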

3. Results

3.1. Sample

The baseline sample consisted of 223 women and 46 men. The participants were mostly Caucasian (91%) and well educated (83% had some college education or were college graduates), and most (68%) reported a household income that exceeded $40,000/year. The mean age was 45.9 years (standard deviation [SD] 9.6), and the mean MS disease duration was 8.8 years (SD 7.0). The median PDDS score was 2 (interquartile range 3.0), and the mean MSWS score was 36.0 (SD 28.2). Those scores indicated that the sample, on average, had minimal walking impairment [16] and [37]. Information based on the PDDS suggested that over the six time points, the following numbers of patients utilized an assistive device to walk: 57, 52, 60, 58, 60, and 56. These counts represent approximately 20% of the sample at each time point. There were 42 patients whose assistive device status changed over time (i.e., using an assistive device at one time point and not the next, or vice versa). There were 223 people who reported being treated with a disease-modifying therapy; interferon β-1a (50%), glatiramer acetate (31%), and interferon β-1b (13%) represented the most common types of therapy. All 269 participants had a diagnosis of RRMS (see Table 1).

Table 1 Sample demographic characteristics.

Variable                                N = 269 (% or SD)
Age, years: mean (SD)                   45.85 (9.73)
Age, years: range                       19–64

Gender: N (%)
  Female                                223 (82.90)

Race: N (%)
  American Indian                       1 (0.37)
  Asian                                 2 (0.74)
  Black or African American             11 (4.09)
  Caucasian                             247 (91.82)
  Hispanic/Latino                       5 (1.86)
  Other                                 3 (1.12)

Education: N (%)
  Junior high or some high school       4 (1.48)
  High school graduate                  42 (15.61)
  1–3 years of college                  67 (24.91)
  College or university graduate        90 (33.46)
  Master's degree                       53 (19.70)
  PhD or equivalent                     13 (4.83)

Employment status: N (%)
  Unemployed                            112 (41.64)
  Employed                              157 (58.36)

Marital status: N (%)
  Married                               187 (69.52)
  Single                                41 (15.24)
  Divorced/single                       38 (14.13)
  Widow/widower                         3 (1.12)

BMI continuous: mean (SD)               27.19 (6.96)

BMI: N (%)
  Underweight; < 18.5                   5 (1.86)
  Normal; 18.5–24.9                     121 (44.98)
  Overweight; 25–29.9                   67 (24.91)
  Obesity; > 30.0                       76 (28.25)

3.2. Accelerometer distributions

None of the accelerometer distributions differed significantly from one another (combined D = 0.0792, p = 0.42; combined D = 0.0618, p = 0.72; combined D = 0.393, p = 0.99; combined D = 0.0529, p = 0.91; combined D = 0.415, p = 0.98) (Fig. 1). Whereas the change-score means across time points were similar (F = 1.04, p = 0.38), there were differences in the standard deviation of accelerometer change across time points (Fig. 2). The SD of accelerometer change between time points 4 and 5 was significantly different from that of the other change scores (F = 1.50, p = 0.0040; F = 0.5755, p = 0.0001).

Fig. 1 Mean accelerometer scores with 95% confidence intervals at each time point.

Fig. 2 Mean accelerometer change scores with 95% confidence intervals at each time point.

Accelerometer raw scores were correlated with MSWS at −0.42 (large effect size), and with MSIS-Physical at −0.29 (medium effect size) (Table 2).

Table 2 Correlations among PRO raw scores.

Measure                                             1                           2                           3                           4
1. Multiple sclerosis impact scale, physical        1
2. Multiple sclerosis impact scale, psychological   0.73*** (0.71–0.75)         1
3. Multiple sclerosis walking scale                 0.79*** (0.77–0.81)         0.43*** (0.39–0.47)         1
4. Accelerometer score                              −0.29*** (−0.34 to −0.24)   −0.10** (−0.15 to −0.049)   −0.42*** (−0.46 to −0.37)   1

** p < 0.01.

*** p < 0.001.

3.3. Relationship between change scores

The mean changes on the MSWS (F = 0.51, p = 0.73), MSIS-Physical (F = 1.40, p = 0.2328), and MSIS-Psychological (F = 0.65, p = 0.63) were stable over time. Accelerometer change scores were not significantly related to MSWS change scores or to MSIS-Physical change scores (r = −0.04 in both cases, ns) (Table 3).
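The correlation of change scores can be sketched as follows. The helper names and the data are hypothetical, and the sketch does not reproduce how change scores were pooled across participants and time points in the study; it only shows change scores between consecutive time points and a Pearson correlation between two such series.

```python
from math import sqrt


def consecutive_changes(scores):
    """Change between each consecutive pair of time points."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for one participant across six time points.
msws = [30, 34, 31, 36, 35, 40]
accelerometer = [210000, 205000, 220000, 200000, 215000, 208000]

msws_change = consecutive_changes(msws)
accel_change = consecutive_changes(accelerometer)
print(round(pearson_r(msws_change, accel_change), 2))
```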

Table 3 Correlations among PRO change scores.

Measure                                                       1                        2                        3                        4
1. Change in multiple sclerosis impact scale, physical        1
2. Change in multiple sclerosis impact scale, psychological   0.58*** (0.54–0.61)      1
3. Change in multiple sclerosis walking scale                 0.53*** (0.49–0.57)      0.37*** (0.33–0.42)      1
4. Change in accelerometer score                              −0.04 (−0.10 to 0.018)   0.01 (−0.050 to 0.071)   −0.04 (−0.10 to 0.019)   1

*** p < 0.001.

3.3.1. Performance-based MIDs

There were no statistically significant differences in mean change of the MSWS (F = 0.24, p = 0.79), MSIS-Physical (F = 1.96, p = 0.14), and MSIS-Psychological (F = 0.03, p = 0.97) by Accelerometer Change Group (see Fig. 3 ).

Fig. 3 Mean change on PROs by accelerometer Change Group with 95% confidence interval.

3.3.2. PRO-based MIDs

There were no differences in mean accelerometer change by Change Group for any of the three published MIDs tested for the MSWS (F = 0.43, 0.36, and 0.21; p = 0.65, 0.70, and 0.81, respectively), or for the MSIS-Physical and MSIS-Psychological (F = 1.56 and 0.41, p = 0.21 and 0.66, respectively) (Fig. 4).

Fig. 4 Proportion of patients in each accelerometer Change Group whose change on MSWS or MSIS physical was greater than the linked MIDs.

3.3.3. MID based on linking

The linkage approach led to a calculated MID of 8.72 for MSIS-Physical and 14.90 for MSWS. The MSWS and MSIS-Physical change scores based on the linking-MIDs were not associated with accelerometer change (χ² = 0.0171 and 1.74, respectively; p = 0.99 and 0.42, respectively).

Correlation analysis of the change scores revealed a large effect-size association between changes on all three PRO measures, with the highest associations found between change on the MSIS-Physical and MSIS-Psychological subscales, and between MSIS-Physical and the MSWS (r = 0.58 and 0.53, respectively; p < 0.001 in both cases). The PRO change scores were not associated with the accelerometer change scores (r range = −0.04 to 0.01, ns), suggesting that the PROs are assessing altogether independent constructs from the performance-based measures. An analysis of correlations among change scores by assistive device (AD) Use Change Group showed that among those patients who were stable, the correlations among change scores remained close to zero, but among those whose AD Use status changed, the correlation was small (r = −0.138, p = 0.08).

4. Discussion

This study documents a lack of correlation between changes in PROs and changes in a performance-based measure that is a standard outcome in clinical trials related to exercise interventions with documented ecological validity [23]. Despite large effect sizes in the cross-sectional correlations of this performance-based measure and the physical- and walking-related PROs, the relationships between changes in these measures over time were close to zero. Although there was a trend suggesting that, among those whose assistive device use status changed over follow-up, the relationship between accelerometer and MSIS-Physical change scores was small and negative, these results should be interpreted with caution due to the number of comparisons and the associated risk of Type I error. Overall, our findings contradict a central assumption that may underlie clinical research studies: that a cross-sectional correlation implies that change in PROs will correspond with change in behavior/performance. This is clearly not the case.

This study also documents that the published MIDs are much smaller than MIDs derived by linking with this performance-based measure, and that all MID-based change scores – published or derived by linking – were not associated with accelerometer change scores. These findings force one to question the longitudinal construct validity [38] of the measures used in this study. Although similar constructs are measured within PROs both cross-sectionally and with regard to change scores over time, they are not responsive to changes in the performance-based measure.

These findings lead to more questions than answers. If the accelerometer is a meaningful measure of ambulation and change in ambulation, then why does it not relate more strongly to changes in PRO measures of ambulation? What are some possible explanations of this disconnection? One possible explanation is that the accelerometer does not count movements that are specific enough to truly capture or map to the PRO focus of ambulation impact on quality of life. Available accelerometers merely count accelerations, presumably during stepping/walking. Some investigators believe that they are not accurate for slow walkers (< 0.7 m/s), so they miss important information [39]. Any adventitious leg movement may be interpreted as a stride. Some investigators recommend using bilateral ankle triaxial accelerometers with gyroscopes that reveal with high accuracy what the subject is doing (walk or run at any speed, cycle, etc.), reveal the speed and distance of each walk, and show any leg asymmetry in stance and swing times [40].

A second possible explanation is that the PROs are reflecting more than symptom impact. They may be reflecting adaptation, or response shift [41]. When individuals experience changes in health, they may change their internal standards, their values, or their conceptualization of the target construct [42] and [43]. Past research has documented that response shift effects can obfuscate change [44], or show effects that are paradoxical [45]. In the present work, it is notable that there are larger error bars on PROs in the Worse and Better accelerometer Change Groups (Fig. 3). This heterogeneity of variance may reflect response shift effects, as well as sample size differences between groups. These figures could be interpreted as reflecting negative and positive response shifts; that is, situations where the PROs are more negative or more positive than performance-based measures, respectively. Future research should include appraisal assessments (e.g., Quality of Life Appraisal Profile [46] and [47] and cognitive interviews [48]) to test competing explanations of these findings. For example, in addition to response shift effects, this disconnection between performance-based and PRO change measures could reflect implicit theories of change [12] and [21], divergent expectations [20], or some unmeasured third variable (e.g., life event, medical or physical therapy intervention, etc.) [49].

In summary, our study suggests that there is a fundamental discrepancy between performance-based and PRO-based assessment of ambulation over time, although cross-sectional comparisons support some underlying relationship among the constructs measured. These findings undermine key assumptions relating cross-sectional and longitudinal construct validity, and merit further investigation and replication.

Conflict of interest

The authors have no conflict of interest to disclose related to this scientific work.

Acknowledgments

This work was funded in part by a grant from the National Multiple Sclerosis Society (PI: RW Motl; RG 3926A2/1). We are grateful for helpful input from Bruce H. Dobkin, MD, FRCP.

References

  • [1] M.A. Neben-Wittich, P.J. Atherton, D.J. Schwartz, J.A. Sloan, P.C. Griffin, R.L. Deming, et al. Comparison of provider-assessed and patient-reported outcome measures of acute skin toxicity during a Phase III trial of mometasone cream versus placebo during breast radiotherapy: the North Central Cancer Treatment Group (N06C4). Int J Radiat Oncol Biol Phys. Oct 1 2011;81(2):397-402
  • [2] J.M. Sonder, L.J. Balk, L.V. Bosma, C.H. Polman, B.M. Uitdehaag. Do patient and proxy agree? Long-term changes in multiple sclerosis physical impact and walking ability on patient-reported outcome scales. Mult Scler. Apr 7 2014;
  • [3] FDA. Guidance for Industry. Administration FaD (Ed.) Patient-reported outcome measures: use in medical product development to support labeling claims (U.S. Dept. of Health and Human Services, 2006)
  • [4] B.B. Reeve, K.W. Wyrwich, A.W. Wu, G. Velikova, C.B. Terwee, C.F. Snyder, et al. ISOQOL recommends minimum standards for patient-reported outcome measures used in patient-centered outcomes and comparative effectiveness research. Qual Life Res. Oct 2013;22(8):1889-1905
  • [5] S.E. Embretson, S.P. Reise. Item response theory for psychologists. (Lawrence Erlbaum Associates, Mahwah, N.J., 2000)
  • [6] R.K. Hambleton. Principles and selected applications of item response theory. R. Linn (Ed.) Educational measurement 3rd ed. (American Council on Education, New York, 1989) 147-200
  • [7] D.A. Revicki, P.A. Erickson, J.A. Sloan, A. Dueck, H. Guess, N.C. Santanello. Interpreting and reporting results based on patient-reported outcomes. Value Health. Nov–Dec 2007;10(Suppl. 2):S116-S124
  • [8] C.B. Terwee, F.W. Dekker, W.M. Wiersinga, M.F. Prummel, P.M. Bossuyt. On assessing responsiveness of health-related quality of life instruments: guidelines for instrument evaluation. Qual Life Res. 2003;12(4):349-362
  • [9] D.E. Beaton, C. Bombardier, J.N. Katz, J.G. Wright. A taxonomy for responsiveness. J Clin Epidemiol. 2001;54(12):1204-1217
  • [10] P.M. Fayers, R.D. Hays. Don't middle your MIDs: regression to the mean shrinks estimates of minimally important differences. Qual Life Res. Feb 2014;23(1):1-4
  • [11] R. Jaeschke, J. Singer, G.H. Guyatt. Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials. 1989;10(4):407-415
  • [12] G. Norman. Hi! How are you? Response shift, implicit theories and differing epistemologies. Qual Life Res. 2003;12(3):239-249
  • [13] A.K. Kvam, F. Wisloff, P.M. Fayers. Minimal important differences and response shift in health-related quality of life; a longitudinal study in patients with multiple myeloma. Health Qual Life Outcomes. 2010;8:79
  • [14] N. Schwartz, S. Sudman. Autobiographical memory and the validity of retrospective reports. (Springer, New York, 1994)
  • [15] R.W. Motl, Y.C. Learmonth, L.A. Pilutti, D. Dlugonski, R. Klaren. Validity of minimal clinically important difference values for the multiple sclerosis walking scale-12? Eur Neurol. 2014;71(3–4):196-202
  • [16] J.C. Hobart, A. Riazi, D.L. Lamping, R. Fitzpatrick, A. Thompson. Measuring the impact of MS on walking ability: the 12-item MS Walking Scale (MSWS-12). Neurology. 2003;60:31-36
  • [17] J. Hobart. Prolonged-release fampridine for multiple sclerosis: was the effect on walking ability clinically significant. Mult Scler. 2010;16:S172
  • [18] R.H. Dworkin, D.C. Turk, K.W. Wyrwich, D. Beaton, C.S. Cleeland, J.T. Farrar, et al. Interpreting the clinical importance of treatment outcomes in chronic pain clinical trials: IMMPACT recommendations. J Pain. Feb 2008;9(2):105-121
  • [19] H.J. Schunemann, G.H. Guyatt. Commentary—goodbye M(C)ID! Hello MID, where do you come from?. Health Serv Res. Apr 2005;40(2):593-597 [Comment]
  • [20] J.A. Finkelstein, H. Razmjou, C.E. Schwartz. Response shift and outcome assessment in orthopedic surgery: is there a difference between complete vs. partial treatment? J Clin Epidemiol. 2009;62:1189-1190
  • [21] J.A. Finkelstein, B.R. Quaranto, C.E. Schwartz. Threats to the internal validity of spinal surgery outcome assessment: recalibration response shift or implicit theories of change? Appl Qual Life Res. 2013;22(9):2255-2264
  • [22] J.C. Hobart, D.L. Lamping, R. Fitzpatrick, A. Riazi, A.J. Thompson. The multiple sclerosis impact scale (MSIS-29): a new patient-based outcome measure. Brain. 2001;124:962-973
  • [23] R.W. Motl, B.M. Sandroff, J.J. Sosnoff. Commercially available accelerometry as an ecologically valid measure of ambulation in individuals with multiple sclerosis. Expert Rev Neurother. 2012;12(9):1079-1088
  • [24] G.R. Norman, J.A. Sloan, K.W. Wyrwich. The truly remarkable universality of half a standard deviation: confirmation through another look. Expert Rev Pharmacoecon Outcomes Res. 2004;4(5):581-585
  • [25] R.W. Motl, E. McAuley, B.M. Sandroff. Longitudinal change in physical activity and its correlates in relapsing–remitting multiple sclerosis. Phys Ther. 2013;93(8):1037-1048
  • [26] M.J. Hohol, E.J. Orav, H.L. Weiner. Disease steps in multiple sclerosis: a simple approach to evaluate disease progression. Neurology. 1995;45:251-255
  • [27] C.E. Schwartz, T. Vollmer, H. Lee. Reliability and validity of two self-report measures of impairment and disability for MS. North American Research Consortium on Multiple Sclerosis Outcomes Study Group. Neurology. 1999;52(1):63-70
  • [28] R.W. Motl, E. McAuley, D. Wynn, Y. Suh, M. Weikert, D. Dlugonski. Symptoms and physical activity among adults with relapsing–remitting multiple sclerosis. J Nerv Ment Dis. 2010;198:213-219
  • [29] R.W. Motl, E.M. Snook, E. McAuley, R. Gliottoni. Symptoms, self-efficacy, and physical activity among individuals with multiple sclerosis. Res Nurs Health. 2006;29:597-606
  • [30] B.M. Sandroff, R.E. Klaren, L.A. Pilutti, D. Dlugonski, R.H. Benedict, R.W. Motl. Randomized controlled trial of physical activity, cognition, and walking in multiple sclerosis. J Neurol. 2014;261(2):363-372
  • [31] Stata 13. (StataCorp LP, College Station, TX, 2013)
  • [32] P.M. Fayers, R.D. Hays. Should linking replace regression when mapping from profile-based measures to preference-based measures? Value Health. 2014;17(2):261-265
  • [33] A. Feingold. Effect sizes for growth-modeling analysis for controlled clinical trials in the same metric as for classical analysis. Psychol Methods. 2009;14(1):43-53
  • [34] G.R. Norman, J.A. Sloan, K.W. Wyrwich. Interpretation of changes in health-related quality of life: the remarkable universality of half a standard deviation. Med Care. 2003;41(5):582-592
  • [35] R.W. Motl, L.A. Pilutti, Y.C. Learmonth, M.D. Goldman, T. Brown. Clinical importance of steps taken per day among persons with multiple sclerosis. PLoS One. 2013;8(9):e73247
  • [36] L. Costelloe, K. O'Rourke, H. Kearney, C. McGuigan, L. Gribbin, M. Duggan, et al. The patient knows best: significant change in the physical component of the Multiple Sclerosis Impact Scale (MSIS-29 physical). J Neurol Neurosurg Psychiatry. 2007;78(8):841-844
  • [37] O. Hadjimichael, R.D. Kerns, M.A. Rizzo, G. Cutter, T. Vollmer. Persistent pain and uncomfortable sensations in persons with multiple sclerosis. Pain. 2007;127:35-41
  • [38] M.H. Liang. Longitudinal construct validity: establishment of clinical meaning in patient evaluative instruments. Med Care. 2000;38(9 Suppl.):II84-II90
  • [39] B.H. Dobkin, A. Dorsch. The promise of mHealth: daily activity monitoring and outcome assessments by wearable sensors. Neurorehabil Neural Repair. 2011;25(9):788-798
  • [40] B.H. Dobkin, X. Xu, M. Batalin, S. Thomas, W. Kaiser. Reliability and validity of bilateral ankle accelerometer algorithms for activity recognition and walking speed after stroke. Stroke. 2011;42(8):2246-2250
  • [41] C.E. Schwartz, E. Andresen, M. Nosek, G. Krahn, RRTC Expert Panel on Health Status Measurement. Response shift theory: important implications for measuring quality of life in individuals with disability. Arch Phys Med Rehabil. 2007;88:529-536
  • [42] M.A. Sprangers, C.E. Schwartz. Integrating response shift into health-related quality of life research: a theoretical model. Soc Sci Med. 1999;48(11):1507-1515
  • [43] C.E. Schwartz, M.A. Sprangers. Methodological approaches for assessing response shift in longitudinal health-related quality-of-life research. Soc Sci Med. 1999;48(11):1531-1548
  • [44] F.J. Oort, M.R.M. Visser, M.A.G. Sprangers. An application of structural equation modeling to detect response shifts and true change in quality of life data from cancer patients undergoing invasive surgery. Qual Life Res. 2005;14:599-609
  • [45] C.E. Schwartz, R.G. Feinberg, E. Jilinskaia, J.C. Applegate. An evaluation of a psychosocial intervention for survivors of childhood cancer: paradoxical effects of response shift over time. Psychooncology. 1999;8(4):344-354
  • [46] B.D. Rapkin, C.E. Schwartz. Toward a theoretical model of quality-of-life appraisal: implications of findings from studies of response shift. Health Qual Life Outcomes. 2004;2(1):14
  • [47] Y. Li, B. Rapkin. Classification and regression tree analysis to identify complex cognitive paths underlying quality of life response shifts: a study of individuals living with HIV/AIDS. J Clin Epidemiol. 2009;62:1138-1147
  • [48] E.F. Bloem, F.J. van Zuuren, M.A. Koeneman, B.D. Rapkin, M.R.M. Visser, C.C.E. Koning, et al. Clarifying quality of life assessment: do theoretical models capture the underlying cognitive processes? Qual Life Res. 2008;17:1093-1102
  • [49] C.E. Schwartz, J. Finkelstein, N.E. Mayo, S. Ahmed, T.T. Sajobi (Eds.) Response shift detection in secondary data analysis: findings and implementation guidelines in: International Society for Quality of Life Research; 2012. (Springer, Budapest, Hungary, 2012)

Footnotes

a DeltaQuest Foundation, Inc., Concord, MA, USA

b Department of Medicine, Tufts University Medical School, Boston, MA, USA

c Department of Orthopaedic Surgery, Tufts University Medical School, Boston, MA, USA

d Department of Kinesiology and Community Health, University of Illinois at Urbana-Champaign, Urbana, IL, USA

* Corresponding author at: DeltaQuest Foundation, 31 Mitchell Road, Concord, MA 01742, USA. Tel.: +1 978 758 1553.