Short-term suboptimal response criteria for predicting long-term non-response to first-line disease modifying therapies in multiple sclerosis: A systematic review and meta-analysis
Journal of the Neurological Sciences, February 2016, Pages 158 - 167
There is no consensus about short-term suboptimal response to first-line treatments in relapsing-remitting multiple sclerosis.
We searched studies with interferon beta or glatiramer acetate in which a long-term (≥ 2 years (y)) outcome could be predicted using short-term (≤ 1 y) suboptimal response criteria (EDSS-, imaging- and/or relapse-based). We obtained pooled diagnostic accuracy parameters for the 1-y criteria used to predict disability progression between 2–5 y.
We selected 45 articles. Eight studies allowed calculating pooled estimates of 16 criteria. The three criteria with best accuracy were: new or enlarging T2-weighted lesions (newT2) ≥ 1 (pooled sensitivity: 85.5%; specificity:70.2%; positive predictive value:48.0%; negative predictive value:93.8%), newT2 ≥ 2 (62.4%, 83.6%, 55.0% and 87.3%, respectively) and RIO score ≥ 2 (55.8%, 84.4%, 47.8% and 88.2%). Pooled percentages of suboptimal responders were 43.3%, 27.6% and 23.7%, respectively. Pooled diagnostic odds ratios were 14.6 (95% confidence interval: 1.4–155), 9.2 (1.4–59.0) and 8.2 (3.5–19.2).
All criteria had a limited predictive value. RIO score ≥ 2 at 1-y combined fair accuracy and consistency, limiting the probability of disability progression in the next years to 1 in 8 optimal responders. NewT2 ≥ 1 at 1-y had similar positive predictive value, but diminished the false negatives to 1 in 16 patients. More sensitive measures of treatment failure at short term are needed.
- All suboptimal response criteria have limited negative predictive value (74%–94%).
- The RIO score ≥ 2 criterion at 1-year combined fair accuracy and consistency.
- NewT2 ≥ 1 at 1-year had similar predictive value and diminished the false negatives.
- More sensitive measures of treatment failure at short term are needed.
Keywords: Multiple sclerosis, Interferon beta, Glatiramer acetate, Suboptimal response, Disability, Prediction.
Multiple sclerosis (MS) is a chronic disease of the central nervous system (CNS) characterized by inflammation and destruction of myelin and axons. In most patients, the disease follows a relapsing-remitting course during the first years, with repeated relapses. Within 10 years, approximately 50% of patients progress to secondary progressive MS (SPMS). Because of the variability in clinical presentation and the heterogeneity in response to disease-modifying therapies (DMTs), an accurate long-term prognosis for the individual patient is not yet feasible.
Several DMTs have demonstrated sustained reduction of relapse rate and delayed disability progression versus placebo in relapsing–remitting multiple sclerosis (RRMS). Currently authorized first-line treatments are considered equally effective and include interferon beta (IFN-β) and glatiramer acetate (GA). In clinical trials with these therapies, outcomes of non-response (24 months after treatment initiation) have been defined based on several criteria, such as disability progression, relapse rate, increased burden or activity detected by magnetic resonance imaging (MRI), or further neurologic or cognitive impairment.
Within the first months after DMT initiation, most patients already show persistent clinical activity, which may be considered a suboptimal response. In these cases, possible strategies include switching to another first-line DMT or to a second-line DMT. Natalizumab and fingolimod have demonstrated high efficacy and effectiveness in patients who previously did not respond to IFN-β or GA. An accurate and timely assessment of suboptimal response in this period would allow an early switch before neurological damage progresses further.
Since the commercialization of the first IFN-β in 1993, many different criteria based on relapses, disability progression, MRI or combinations of these have been proposed for defining suboptimal response. In 2004, the Canadian Multiple Sclerosis Working Group developed recommendations based on monitoring relapses, neurological progression and MRI activity, which were subsequently evaluated in several cohorts. More recently, the Rio score, based on disability progression, relapses and MRI, was proposed and then tested (and further modified and refined) in other cohorts. Other criteria have also been defined by the European Medicines Agency in the label specifications of second-line drugs or by the drug agencies of several countries, such as Italy. The absence of an international consensus definition is probably due to several causes: the lack of a criterion with high predictive value and/or validated in a sufficient number of patients, different follow-up procedures among centers, different regulatory criteria among countries, etc.
In the selection of an optimal predictive criterion, there is usually a trade-off between the measures of performance. The best criterion should be characterized by the lowest possible number of “false positives” (i.e. patients in whom treatment is unnecessarily switched) and by the lowest possible number of “false negatives” (i.e. patients in whom treatment is not switched despite having suboptimal response). Since both categories are inversely correlated and strongly associated with the degree of restriction imposed by the selected definition (Fig. 1), the more restrictive criteria fail to detect a significant number of suboptimal responders, whereas the less restrictive ones lead to an unacceptable rate of false positives.
The objectives of the present systematic review were: to describe all criteria that have been used in the literature to define long-term (≥ 2 years since treatment initiation) and short-term (≤ 2 years since treatment initiation) non-response to IFN-β or GA; to describe the predictive value of the short-term suboptimal response criteria for long-term non-response and to calculate the pooled diagnostic accuracy parameters in those criteria used in more than one cohort with similar definition of long-term non-response (increase in EDSS ≥ 1 between 2 and 5 years after treatment initiation).
See extended version in Annex 1 (online).
2.1. Literature search and eligibility criteria
Studies were searched in PubMed, SCOPUS (MEDLINE, EMBASE), Web of Science and the reference lists of articles. The search was done on July 2nd, 2014 (Fig. 2).
We limited the search to articles published between 1993 and 2014 and written in English (see Annex 2 for detailed search strategy). The retrieved manuscripts were further selected according to the following additional criteria: adults aged 18 years and over; RRMS diagnosis; treatment with IFN-β or glatiramer; one or more short-term suboptimal response criteria (including at least EDSS and/or MRI parameters and/or relapse rate) measured post-treatment initiation (at a maximum of 24 months after treatment initiation) and at least one long-term efficacy outcome (measured at least 24 months from treatment initiation). Only conventional MRI parameters (gadolinium-enhancing (Gd +) lesions, T2-weighted lesions and T1 hypointense lesions) were considered as short-term suboptimal response criteria.
2.2. Data extraction
The extracted data included the absolute numbers of true positives, false positives, true negatives and false negatives, and the diagnostic accuracy parameters: specificity (proportion of long-term responders classified as optimal responders at short term), sensitivity (proportion of long-term non-responders classified as suboptimal responders at short term), positive predictive value (PPV; proportion of short-term suboptimal responders who will be true non-responders at long term), negative predictive value (NPV; proportion of short-term optimal responders who will be true responders at long term) and diagnostic odds ratio (DOR). When these values were not reported, they were requested from the authors of the publication or calculated as previously described. The data were abstracted by NV and AE.
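The parameter definitions above can be illustrated with a minimal sketch that derives them from a single 2×2 table. The class and function names are hypothetical (they are not from the article), and the example counts are those listed in Table 1 for the RIO score ≥ 2 criterion in one cohort (TP = 20, FP = 21, FN = 29, TN = 152).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of short-term classification versus long-term outcome."""
    tp: int  # suboptimal responders who became long-term non-responders
    fp: int  # suboptimal responders who remained long-term responders
    fn: int  # optimal responders who became long-term non-responders
    tn: int  # optimal responders who remained long-term responders


def accuracy_parameters(t: TwoByTwo) -> dict:
    """Return the diagnostic accuracy parameters as fractions (DOR as a ratio).

    Assumes no cell is zero; a continuity correction would be needed otherwise.
    """
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "dor": (t.tp * t.tn) / (t.fp * t.fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Counts taken from the RIO score >= 2 row of Table 1 (N = 222).
rio2 = accuracy_parameters(TwoByTwo(tp=20, fp=21, fn=29, tn=152))
for name, value in rio2.items():
    print(f"{name}: {value:.3f}")
```

Rounded to one decimal, these values reproduce the sensitivity, specificity, PPV, NPV and DOR reported in that row of Table 1.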
2.3. Data analysis
For the meta-analysis, we first selected the subgroup of manuscripts that: 1) had evaluated short-term response at 1 year and 2) had used EDSS progression between 2 and 5 years after treatment initiation, defined as an increase in EDSS ≥ 1 (or EDSS ≥ 1.5 for baseline EDSS = 0 and/or ≥ 0.5 for baseline EDSS > 5.0), as long-term non-response. This led to the selection of 13 studies evaluating 29 short-term suboptimal response criteria. Second, we performed meta-analyses to obtain pooled values of the suboptimal response rate and the diagnostic accuracy for all the criteria evaluated in more than one cohort (16 criteria). The Meta-DiSc 1.4 software was used for the analyses. The pooled parameters were calculated using a random effects model (DerSimonian–Laird) with correction for overdispersion. Heterogeneity was evaluated by Cochran's Q-statistic and the I²-statistic; a p value < 0.10 in the Q-statistic or I² > 50% indicated heterogeneity.
Diagnostic accuracy was assessed with the combined results of the positive and negative likelihood ratios (LR + and LR −): very good (LR + > 10, LR − < 0.1); good (LR + 5–10, LR − 0.1–0.2); fair (LR + 2–5, LR − 0.2–0.5); or poor (LR + 1–2, LR − 0.5–1). Balanced accuracy (the arithmetic mean of sensitivity and specificity) was also used.
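As a rough illustration of the pooling step, the following sketch applies the standard DerSimonian–Laird random effects estimator to log diagnostic odds ratios and reports Cochran's Q and I². It is not the Meta-DiSc implementation and omits the overdispersion correction mentioned above; the function names and the second cohort's counts are hypothetical.

```python
import math


def log_dor(tp, fp, fn, tn):
    """Log diagnostic odds ratio and its approximate variance for one cohort.

    Assumes no zero cells; otherwise a continuity correction would be needed.
    """
    return math.log((tp * tn) / (fp * fn)), 1 / tp + 1 / fp + 1 / fn + 1 / tn


def dersimonian_laird(effects, variances):
    """Pool study effects with the DerSimonian-Laird random effects model."""
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {"pooled": pooled, "se": se, "tau2": tau2, "q": q, "i2": i2}


# First cohort: RIO score >= 2 counts from Table 1; second cohort: hypothetical.
cohorts = [(20, 21, 29, 152), (30, 25, 20, 125)]
effects, variances = zip(*(log_dor(*c) for c in cohorts))
result = dersimonian_laird(effects, variances)
low = math.exp(result["pooled"] - 1.96 * result["se"])
high = math.exp(result["pooled"] + 1.96 * result["se"])
print(f"pooled DOR {math.exp(result['pooled']):.1f} ({low:.1f}-{high:.1f}), "
      f"Q = {result['q']:.2f}, I2 = {result['i2']:.0f}%")
```

Pooling is done on the log scale because the log odds ratio is approximately normally distributed; the estimate and its confidence limits are exponentiated only for reporting.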
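The grading scheme above can be sketched as a pair of small helpers. The function names are hypothetical, and the handling of values that fall exactly on a boundary (for example LR + = 5) is an assumption, since the text gives only the ranges.

```python
def grade_lr_positive(lr_pos: float) -> str:
    """Grade a positive likelihood ratio using the ranges stated in the text."""
    if lr_pos > 10:
        return "very good"
    if lr_pos >= 5:  # boundary handling is an assumption
        return "good"
    if lr_pos >= 2:
        return "fair"
    return "poor"


def grade_lr_negative(lr_neg: float) -> str:
    """Grade a negative likelihood ratio using the ranges stated in the text."""
    if lr_neg < 0.1:
        return "very good"
    if lr_neg <= 0.2:  # boundary handling is an assumption
        return "good"
    if lr_neg <= 0.5:
        return "fair"
    return "poor"


def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Arithmetic mean of sensitivity and specificity (same units as inputs)."""
    return (sensitivity + specificity) / 2


# Pooled estimates reported for RIO score >= 2 in Table 1.
print(grade_lr_positive(3.6), grade_lr_negative(0.38))
print(round(balanced_accuracy(55.8, 84.4), 1))
```

With the pooled RIO score ≥ 2 estimates, both likelihood ratios fall in the "fair" range, which matches how the criterion is described in the article.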
3.2. Criteria for long-term non-response
There was wide variability in long-term non-response criteria (34 definitions, online Fig. 1). An increase in EDSS ≥ 1 point and the absolute EDSS value were the most common.
3.3. Increase in long-term non-response over time
Percentages of non-responder patients defined as those with EDSS increase ≥ 1 point tended to increase over time (online Fig. 2), following an approximately linear relationship, although the percentage of variance explained by time was only 19%. At 2-years, percentages ranged from 9% to 31%.
3.4. Criteria and timing used to evaluate short term suboptimal response
There was also wide variability in the criteria used to define suboptimal response (online Figs. 3 and 4). Radiological criteria were the most commonly used, followed by clinical criteria and combined clinical-radiological criteria. Among MRI criteria, there was a similar use of Gd-enhancing lesions (Gd +), new or enlarging T2-weighted lesions (newT2) or both (online Fig. 3). Overall, there were 75 different criteria (online Fig. 4).
3.5. Frequency of suboptimal responders at 1 year
Pooled percentages of suboptimal responders at 1 year in the 16 criteria that had been evaluated in more than one cohort (Table 1) ranged from 5.8 to 52.5%. There was an inverse relationship between number of lesions considered in MRI and obtained percentages of suboptimal responders (online Fig. 5). For each one more lesion required, the suboptimal response rate decreased by approximately 10 percentage points. There was a higher variability in the suboptimal responder rates observed according to new T2 lesions than in those described in studies evaluating Gd-enhancing lesions.
| Ref. | Suboptimal response criteria (1 year) | N | % of sub. resp. | Spe. | Sens. | PPV | NPV | LR + | LR − | DOR | BAC | TP | FP | FN | TN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Gd + ≥ 1 | 394 | 20.8 | 88.3 | 41.7 | 70.0 | 77.6 | 3.6 | 0.66 | 5.4 | 65.0 | 50 | 32 | 70 | 242 |
| | Pooled estimates (95% CI) | | 22.5 (19.1–26.0) | 83.7 (75.2–92.3) | 41.9 (41.2–42.6) | 45.3 (16.1–74.6) | 81.8 (73.1–90.4) | 2.7 (1.6–4.6) | 0.68 (0.60–0.77) | 4.0 (2.1–7.4) | 62.8 | | | | |
| | Gd + ≥ 2 | 370 | 10.3 | 91.4 | 18.2 | 31.6 | 83.7 | 2.1 | 0.89 | 2.4 | 54.8 | 12 | 26 | 54 | 278 |
| | T2 (for each additional lesion) | 560 | NR | NR | NR | NR | NR | NR | NR | 1.04 | NR | NR | NR | NR | NR |
| | New T2 ≥ 1 | 394 | 39.3 | 83.2 | 91.1 | 71.1 | 95.4 | 5.4 | 0.11 | 49.1 | 87.2 | 109 | 46 | 11 | 228 |
| | Pooled estimates (95% CI) | | 43.3 (35.3–51.4) | 70.2 (46.1–94.4) | 85.5 (71.3–99.6) | 48.0 (7.0–89.0) | 93.8 (90.2–97.3) | 3.1 (1.0–9.8) | 0.21 (0.06–0.84) | 14.6 (1.4–155) | 77.8 | | | | |
| | New T2 ≥ 2 | 394 | 26.1 | 92.0 | 67.5 | 78.6 | 86.6 | 8.4 | 0.35 | 23.8 | 79.7 | 81 | 22 | 39 | 252 |
| | Pooled estimates (95% CI) | | 27.6 (24.6–30.6) | 83.6 (67.9–99.2) | 62.4 (48.8–75.9) | 55.0 (9.7–100) | 87.3 (85.8–88.9) | 4.3 (1.1–17.0) | 0.47 (0.27–0.81) | 9.2 (1.4–59.0) | 73.0 | | | | |
| | New T2 ≥ 3 | 394 | 14.7 | 96.4 | 40.0 | 82.8 | 78.6 | 11.0 | 0.62 | 17.6 | 68.2 | 48 | 10 | 72 | 264 |
| | Pooled estimates (95% CI) | | 16.5 (12.9–20.1) | 90.5 (79.6–100) | 38.2 (33.3–43.0) | 56.3 (8.5–100) | 82.0 (74.9–89.0) | 5.0 (1.0–24.3) | 0.68 (0.56–0.84) | 7.3 (1.3–40.6) | 64.4 | | | | |
| | MRI > 2 | 370 | 26.2 | 78.0 | 45.5 | 30.9 | 86.8 | 2.1 | 0.70 | 2.9 | 61.7 | 30 | 67 | 36 | 237 |
| | Pooled estimates (95% CI) | | 31.6 (23.9–39.3) | 73.8 (66.6–80.9) | 54.5 (41.2–67.9) | 32.9 (23.2–42.6) | 87.3 (82.4–92.2) | 2.2 (1.8–2.6) | 0.64 (0.54–0.77) | 3.4 (2.3–5.1) | 63.0 | | | | |
| | R (for each additional relapse) | 560 | NR | NR | NR | NR | NR | NR | NR | 1.5 | NR | NR | NR | NR | NR |
| | Disabling R ≥ 2 | 175 | 20 | 81.7 | 23.3 | 40.0 | 67.1 | 1.3 | 0.94 | 1.4 | 52.5 | 14 | 21 | 46 | 94 |
| | R ≥ 1 | 394 | 27.7 | 78.8 | 42.7 | 47.8 | 75.3 | 2.0 | 0.73 | 2.8 | 60.8 | 51 | 58 | 69 | 216 |
| | Pooled estimates (95% CI) | | 25.6 (20.4–30.9) | 79.2 (76.8–81.6) | 41.8 (30.7–52.9) | 37.5 (26.1–49.0) | 82.0 (76.8–87.2) | 2.0 (1.6–2.4) | 0.76 (0.66–0.88) | 2.7 (1.9–3.7) | 61.5 | | | | |
| | ∆ EDSS ≥ 1/1.5 | 73 | 35.6 | 78.4 | 68.2 | 57.7 | 85.1 | 3.2 | 0.41 | 7.8 | 73.3 | 15 | 11 | 7 | 40 |
| | Pooled estimates (95% CI) | | 17.9 (5.8–30.1) | 87.4 (81.2–93.6) | 39.0 (11.7–66.2) | 44.1 (29.3–59.0) | 84.9 (75.9–93.9) | 3.0 (1.7–5.3) | 0.61 (0.33–1.11) | 5.0 (1.8–13.8) | 63.2 | | | | |
| | ∆ EDSS ≥ 1/1.5 + R ≥ 1 | 73 | 9.60 | 96.1 | 22.7 | 71.4 | 74.2 | 5.8 | 0.80 | 7.2 | 59.4 | 5 | 2 | 17 | 49 |
| | Pooled estimates (95% CI) | | 6.6 (3.2–10.0) | 96.0 (94.8–97.2) | 16.9 (9.0–24.7) | 52.0 (27.0–77.0) | 81.9 (73.2–90.7) | 3.9 (1.8–8.3) | 0.88 (0.80–0.97) | 4.5 (1.9–10.5) | 56.4 | | | | |
| | Clinical and radiological | | | | | | | | | | | | | | |
| | R ≥ 1/Gd + ≥ 1/new T2 ≥ 2 | 370 | 45.9 | 59.5 | 71.2 | 27.6 | 90.5 | 1.8 | 0.48 | 3.6 | 65.4 | 47 | 123 | 19 | 181 |
| | R ≥ 1 + [Gd + ≥ 1/new T2 ≥ 2] | 370 | 13.8 | 88.5 | 24.2 | 31.4 | 84.3 | 2.1 | 0.86 | 2.5 | 56.4 | 16 | 35 | 50 | 269 |
| | R ≥ 1 + Gd + ≥ 1 | 370 | 10.3 | 91.8 | 19.7 | 34.2 | 84.0 | 2.4 | 0.87 | 2.7 | 55.7 | 13 | 25 | 53 | 279 |
| | R ≥ 1 + T2 ≥ 9 | 370 | 18.9 | 83.2 | 28.8 | 27.1 | 84.3 | 1.7 | 0.86 | 2.0 | 56.0 | 19 | 51 | 47 | 253 |
| | MRI > 2 + R ≥ 1 | 73 | 15.1 | 86.3 | 18.2 | 36.4 | 71.0 | 1.3 | 0.95 | 1.4 | 52.2 | 4 | 7 | 18 | 44 |
| | Pooled estimates (95% CI) | | 14.0 (12.5–15.5) | 89.1 (85.4–92.7) | 26.0 (18.9–33.0) | 37.7 (19.0–56.5) | 82.5 (72.1–92.9) | 2.6 (1.5–4.3) | 0.84 (0.73–0.96) | 3.1 (1.6–6.0) | 57.6 | | | | |
| | ModRIO ≥ 1 | 370 | 25.7 | 77.6 | 40.9 | 28.4 | 85.8 | 1.8 | 0.76 | 2.4 | 59.3 | 27 | 68 | 39 | 236 |
| | ModRIO ≥ 2 | 370 | 5.4 | 95.4 | 9.1 | 30.0 | 82.9 | 2.0 | 0.95 | 2.1 | 52.2 | 6 | 14 | 60 | 290 |
| | Pooled estimates (95% CI) | | 15.5 (0.8–30.2) | 89.1 (76.7–100) | 26.4 (11.0–41.8) | 50.0 (36.8–63.2) | 74.5 (63–85.9) | 2.2 (1.1–4.2) | 0.88 (0.78–0.99) | 2.5 (1.3–4.8) | 57.5 | | | | |
| | Refined-ModRIO ≥ 2 | 365 | NR | 66.5 | 54.9 | 51.6 | 69.3 | 1.6 | 0.68 | 2.4 | 60.7 | 79 | 74 | 65 | 147 |
| | ∆ EDSS ≥ 1/1.5 + MRI > 2 | 73 | 6.8 | 96.1 | 13.6 | 60.0 | 72.1 | 3.5 | 0.90 | 3.9 | 54.9 | 3 | 2 | 19 | 49 |
| | Pooled estimates (95% CI) | | 7.7 (6.9–8.4) | 95.7 (95.2–96.2) | 20.8 (8.3–33.3) | 55.2 (50.9–59.5) | 82.6 (71.5–93.7) | 5.6 (2.7–11.5) | 0.85 (0.75–0.97) | 6.6 (2.7–16.3) | 58.2 | | | | |
| | MRI > 2 + [∆ EDSS ≥ 1/1.5/R ≥ 1] | 222 | 16.2 | 89.6 | 36.7 | 50.0 | 83.3 | 3.5 | 0.71 | 5.0 | 63.2 | 18 | 18 | 31 | 155 |
| | Pooled estimates (95% CI) | | 20.6 (11.1–30.1) | 86.1 (79.4–92.8) | 46.8 (26.6–66.9) | 46.2 (33.8–58.5) | 86.4 (77.3–95.4) | 3.6 (2.3–5.6) | 0.61 (0.41–0.90) | 5.6 (2.9–10.6) | 66.4 | | | | |
| | Pooled estimates (95% CI) | | 39.0 (38.3–39.8) | 64.9 (63.8–66.1) | 56.7 (56.2–57.2) | 20.9 (5.3–36.4) | 90.2 (81.8–98.6) | 1.6 (1.2–2.1) | 0.66 (0.49–0.90) | 2.4 (1.4–4.3) | 60.8 | | | | |
| | RIO score ≥ 1 | 222 | 46.4 | 58.4 | 63.3 | 30.1 | 84.9 | 1.5 | 0.63 | 2.4 | 60.8 | 31 | 72 | 18 | 101 |
| | Pooled estimates (95% CI) | | 52.5 (41.3–63.7) | 53.6 (46.0–61.3) | 76.6 (52.1–100) | 29.6 (14.2–45.1) | 90.0 (80.1–99.9) | 1.7 (1.5–2.0) | 0.22 (0.03–1.61) | 7.1 (1.0–49.8) | 65.1 | | | | |
| | RIO score ≥ 2 | 222 | 18.5 | 87.9 | 40.8 | 48.8 | 84.0 | 3.4 | 0.67 | 5.0 | 64.3 | 20 | 21 | 29 | 152 |
| | Pooled estimates (95% CI) | | 23.7 (10.1–37.4) | 84.4 (76.8–92.1) | 55.8 (28.3–83.4) | 47.8 (34.7–60.9) | 88.2 (79.9–96.6) | 3.6 (2.6–5.1) | 0.38 (0.14–1.01) | 8.2 (3.5–19.2) | 70.1 | | | | |
| | RIO score = 3 | 222 | 4.9 | 97.1 | 12.2 | 54.5 | 79.6 | 4.2 | 0.90 | 4.7 | 54.7 | 6 | 5 | 43 | 168 |
| | Pooled estimates (95% CI) | | 5.8 (1.2–10.4) | 96.7 (95.1–98.3) | 15.6 (4.4–26.8) | 54.5 (29.3–79.7) | 81.8 (73.2–90.4) | 4.2 (1.8–9.5) | 0.90 (0.82–0.98) | 4.8 (1.9–12.1) | 56.1 | | | | |
a Ref.  used a slightly modified version of original mTOR criteria.
Note: when the criterion is displayed without the symbols “Δ” (= increase) or “↓” (= decrease), it means that the absolute value was considered; in EDSS criteria, when two number are displayed with a “/” symbol, it means that the required increase differed according to baseline EDSS; in combined criteria, the “/” symbol means “or” and the “+” symbol means “and”.
% of sub. resp. = percentage of suboptimal responders; BAC = balanced accuracy; CI = confidence interval; DOR = diagnostic odds ratio; EDSS = Expanded Disability Status Scale; FN = false negatives; FP = false positives; Gd = number of gadolinium-enhancing lesions; LR + = likelihood ratio positive; LR- = likelihood ratio negative; m = months; MRI = magnetic resonance imaging; n.s. = not significant; new T2 = number of new or enlarging T2-weighted lesions; NPV = negative predictive value; NR = not reported; PPV = positive predictive value; R = relapse; Sens. = sensitivity; Spe. = specificity; T1 = volume of T1-weighted lesions; T2 = volume of T2-weighted lesions; TN = true negatives; TP = true positives; y = years; Canadian TOR: [∆ EDSS ≥ 0.5 + R = 1-mild]/[R ≥ 1-mod/sev]/[R > 1-mild]/[∆ EDSS ≥ 2/1]; EMA: R ≥ 1 + [Gd + ≥ 1/T2 ≥ 9]; AIFA: R ≥ 2/[R ≥ 1 + [Gd + ≥ 1/T2 ≥ 9] + EDSS ≥ 2]; RIO score: 1 point per each of the following criteria: [∆ EDSS ≥ 1/MRI > 2/R ≥ 1]; ModRIO: 1 [[1 relapse + ≤ 5 new T2] or [> 5 new T2 + 0 relapses]]/2 [[≤ 5 new T2 + ≥ 2 relapses] or [> 5 new T2 + 1relapse]]/3 [> 5 new T2 + ≥ 2 relapses]; RefinedModRIO: 1 [[[1 relapse + ≤ 5 new T2] or [> 5 new T2 + 0 relapses]] at 1 y]/2 [[[≤ 5 new T2 + ≥ 2 relapses] or [> 5 new T2 + 1relapse]] at 1 year] or [≥ 1 relapse or ≥ 2 new T2 between 1 and 1.5 y]/3 [> 5 new T2 + ≥ 2 relapses]).
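The composite scores defined in the table notes can be expressed as short functions. This is a minimal sketch based only on the abbreviated definitions above: the function and parameter names are hypothetical, the baseline-dependent EDSS thresholds (the "1/1.5" variants) are not modelled, and "MRI > 2" is taken to mean more than two active lesions, which is an assumption.

```python
def rio_score(edss_increase: float, active_mri_lesions: int, relapses: int) -> int:
    """RIO score: one point for each of EDSS increase >= 1, MRI > 2, relapses >= 1."""
    return (
        int(edss_increase >= 1)
        + int(active_mri_lesions > 2)
        + int(relapses >= 1)
    )


def modified_rio_score(new_t2: int, relapses: int) -> int:
    """Modified RIO score (0-3) from new T2 lesions and relapses at 1 year.

    Mirrors the note: 1 = [1 relapse + <= 5 new T2] or [> 5 new T2 + 0 relapses];
    2 = [<= 5 new T2 + >= 2 relapses] or [> 5 new T2 + 1 relapse];
    3 = [> 5 new T2 + >= 2 relapses]; 0 otherwise.
    """
    relapse_points = min(relapses, 2)  # 0, 1, or 2 for >= 2 relapses
    mri_point = int(new_t2 > 5)
    return relapse_points + mri_point


def is_suboptimal(score: int, threshold: int = 2) -> bool:
    """Classify a patient as a suboptimal responder at a chosen score cut-off."""
    return score >= threshold


# Hypothetical patient: EDSS up by 1 point, 3 active lesions, no relapses.
score = rio_score(edss_increase=1.0, active_mri_lesions=3, relapses=0)
print(score, is_suboptimal(score))
```

Writing the modified score as a sum of a relapse component and an MRI component is a design choice that happens to reproduce every branch of the definition; a lookup table keyed on the two categories would work equally well.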
3.6. Predictive value of suboptimal response criteria at 1 year
Table 1 and Fig. 3 summarize the diagnostic accuracy parameters of the 29 suboptimal response criteria evaluated at 1 year and the pooled estimates for the 16 criteria that could be assessed in more than 1 cohort.
Overall, the most predictive criteria were those based only on MRI (pooled DOR 5.9) or on combined clinical and radiological measures (pooled DOR 3.3), whereas the criteria based only on clinical measures had, in general, lower diagnostic accuracy (pooled DOR 2.8). Among the MRI-based criteria, those based on new or enlarging T2 lesions had higher diagnostic accuracy than those based on Gd enhancing lesions: new T2 ≥ 2 had a balanced accuracy of 73% and ≥ 1 of 77.8%, whereas Gd + ≥ 2 had a value 54.8% and Gd + ≥ 1, 62.8%.
The three criteria with a balanced accuracy higher than 70% were newT2 ≥ 1, newT2 ≥ 2 and RIO score ≥ 2. The criterion of newT2 ≥ 1 had the highest balanced accuracy (77.8%), with a pooled sensitivity of 85.5% and a pooled specificity of 70.2%, and classified 43.3% of patients as suboptimal responders. The criteria of newT2 ≥ 2 and RIO score ≥ 2 were more restrictive (suboptimal responder rates: 27.6% and 23.7%, respectively), which increased specificity to 83.6% and 84.4%, respectively, although sensitivity decreased considerably in both cases (62.4% and 55.8%). Pooled DOR was highest for the newT2 ≥ 1 criterion and similar between the other two, but heterogeneity was much higher for the MRI-based ones, which resulted in much wider 95% CIs and less consistent results (see Table 1).
Fig. 4 shows the relationship between diagnostic accuracy and the percentage of suboptimal responders. A lower restrictiveness of the criterion was strongly and positively associated with sensitivity (R² = 0.87, p < 0.001), and negatively associated with specificity (R² = 0.95, p < 0.001). There was also a trend towards a positive association with DOR, but it did not achieve statistical significance (R² = 0.12, p = 0.094).
Table 2 summarizes the diagnostic odds ratios according to different cut-off points (at least 1, 2 or 3 events) for the main criteria used in the clinical practice (Gd + lesions, newT2 lesions, relapses and RIO score). The presence of at least one newT2 lesion was the most predictive event, followed by at least two newT2 lesions or two points in the RIO score. The appearance of 1 or 2 isolated relapses or 1 or 2 isolated Gd + lesions had the lowest predictive value.
|Number of lesions/relapses/points (as applicable)||Diagnostic odds ratio|
|Gd +||New T2||Relapses||Rio score (increase in EDSS ≥ 1 and/or MRI > 2 and/or R ≥ 1)|
a ≥ 2 “disabling” relapses;
NR = not reported.
3.7. Sensitivity analyses excluding early events
When data was available, we repeated the calculations of the diagnostic accuracy parameters excluding events (EDSS progressions or relapses) that occurred in the first 3 or 6 months of therapy (no information was available regarding MRI lesions in the first 3 or 6 months). In general, there were minor increases in specificity and decreases in sensitivity, without relevant changes in overall accuracy.
To our knowledge, this is the first systematic review providing pooled estimates of diagnostic accuracy parameters for the most common clinical and radiological short-term suboptimal response criteria used in the literature. Recently, Dobson et al.  performed a meta-analysis including only MRI-based criteria, and concluded that either ≥ 2 new T2 lesions or ≥ 1 new Gd + lesions within the first year had a significant predictive value on disability progression in the next years. Our results agree with their findings.
There was wide variability in both long-term non-response and short-term suboptimal response definitions, reflecting the lack of an international consensus in this field. An increase in EDSS ≥ 1 point, the most commonly used long-term non-response criterion, displayed a high between-study variability in 2-year values, which could be due to several reasons: different methodology in assessments, different efficacy of treatments (dosage, compliance, etc.) or different baseline severity (higher baseline EDSS is related to faster progression). Since PPV and NPV are influenced by the prevalence of the predicted event, this variability introduced a source of heterogeneity in the pooled estimates.
When we selected the subgroup of studies that had evaluated increase in EDSS ≥ 1 between 2 and 5 years after treatment initiation, we noticed that the criteria based only on MRI or on combined measures had higher predictive value than clinical measures alone, confirming the usefulness of MRI for assessing suboptimal response to DMTs. The reason is that MRI is able to reveal subclinical inflammatory events that occur more often than clinical events.
Among the MRI-based criteria, those based on new or enlarging T2-weighted lesions had a higher diagnostic accuracy than those based on Gd-enhancing lesions. However, they also had a higher variability in suboptimal responder rates. Possible reasons could be the difficulty of accurately measuring changes in T2 lesion burden in clinical practice due to suboptimal repositioning and, moreover, the appearance of new T2 lesions before the beginning of the treatment effect, potentially reducing the predictive power of MRI. Accordingly, a reference scan could be obtained 3 to 6 months after initiating therapy, so that new T2 lesions on subsequent follow-up scans may be interpreted without the uncertainty of whether they developed before the drug became effective.
Although sensitivity and specificity are the parameters of most interest from a statistical point of view, PPV and NPV are the parameters of most interest from a clinical point of view. In this case, NPV should be prioritized since it maximizes the “true negatives” (i.e. “true” optimal responders), and minimizes the “false negatives” (i.e. suboptimal responders in whom treatment is not switched and who therefore do not receive therapy that might be helpful). Overall, the predictive value of all criteria was limited, since the pooled PPVs did not exceed 60% in any case and NPVs were lower than 90%, except for newT2 ≥ 1 (94%), Canadian TOR (90.2%) and RIO score ≥ 1 (90%), which would allow correct classification of 9 in 10 optimal responders at short term. With these limitations, the three criteria with the best balanced accuracy were newT2 ≥ 1, newT2 ≥ 2 and RIO score ≥ 2. The criterion of newT2 ≥ 1, which classifies 43% of subjects as suboptimal responders, had the highest sensitivity (85%), but at the expense of lower specificity than newT2 ≥ 2 and RIO score ≥ 2 (70% vs 84% in both cases). In addition, one of the cohorts that evaluated this criterion and also newT2 ≥ 2 included patients who were followed up for more than 5 years (up to 13 years), which probably led to an overestimation of the long-term non-response rate and the corresponding predictive values. As a consequence, the meta-analysis of these two criteria showed high heterogeneity and the pooled estimates were less reliable than those obtained with the RIO score ≥ 2.
Considering a long-term non-response rate around 20% (as observed in the included studies) the PPVs were 48% (newT2 ≥ 1), 55% (newT2 ≥ 2) and 48% (RIO score ≥ 2), and NPVs 94%, 87% and 88%, respectively. Thus, with the use of these three criteria, only 1 in 2 patients classified as suboptimal responders would be long-term non-responders and 1 in 8 (newT2 ≥ 2 and RIO score ≥ 2) or 1 in 16 (newT2 ≥ 1) patients considered optimal responders would develop disability progression in the next years.
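The "1 in N" figures follow directly from the predictive values: the share of optimal responders who nevertheless progress is 100% minus the NPV. A minimal sketch, using the pooled NPVs reported in Table 1 and a hypothetical helper name:

```python
def one_in_n_false_negatives(npv_percent: float) -> float:
    """Return N such that about 1 in N optimal responders will progress."""
    return 100 / (100 - npv_percent)


# Pooled negative predictive values (in %) taken from Table 1.
pooled_npv = {"newT2 >= 1": 93.8, "newT2 >= 2": 87.3, "RIO score >= 2": 88.2}
for criterion, npv in pooled_npv.items():
    print(f"{criterion}: about 1 in {round(one_in_n_false_negatives(npv))}")
```

The same arithmetic applied to a PPV of about 50% gives the "1 in 2" figure for suboptimal responders who turn out to be long-term non-responders.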
From a practical point of view, the three criteria pose some difficulties in clinical practice, since they require both a baseline and a 1-year MRI, and many centers do not include MRI in the routine follow-up of patients. In addition, there may be some inter-center variability in EDSS and relapse assessments that may influence the predictive value of the RIO score. Although the minimum time needed to confirm EDSS increases is unclear, a 6-month period seems to minimize the chance of including patients with false or transitory increases of disability. Patients also need to be evaluated regularly (every 3 to 6 months) in order to avoid underestimating the number of relapses. Since the exclusion of early events (first 3 months of therapy) does not significantly change the predictive value of the considered criteria, and is associated with slight decreases in sensitivity, we would not recommend it. In fact, Sormani et al. found that new T2 lesions over the first 6 months mediated 44% of the treatment effect on relapses in the subsequent year, which also supports the need for taking into account the information from the first months.
The diagnostic accuracy values presented here reflect one more time how relevant the presence of new T2 lesions on MRI is in monitoring patients on treatment (with IFN-β or GA), since they have the highest predictive value along with the Rio Score.
Unfortunately, the specific characteristics of MS hinder obtaining criteria with acceptable predictive accuracy. The typical EDSS increase per year in the range EDSS 0–6 is estimated to be 0.2 for DMT-treated patients. However, progression rates differ between patients. Approximately one quarter have “aggressive” MS and two thirds have “mild” MS. This finding has two relevant implications. First, the significance of the same amount of activity during the first year after DMT onset may differ substantially depending on whether the patient has aggressive or mild MS. In the former, there may still be a treatment effect that slows down the progression, but in the latter the disease activity is much more likely to be associated with real suboptimal response. This explains the absence of high PPVs across all the criteria. Second, considering that all the cohorts have included approximately the same proportion of “aggressive” patients and that the treatment reduces the progression rate by 50%, there will still be one quarter of patients progressing by 1 EDSS point within the first 2 years. This estimation is consistent with the observed percentages in the included studies (7 of 8 within the range 16 to 31%).
Therefore, new and specific markers of “aggressive” disease, feasible to use in clinical practice and, preferably, assessable in a continuous way, are needed. Some examples could include biomarkers from the immunological system or from the CNS tissue (such as neutralizing antibodies (NAb), or serum or gene expression levels of some cytokines such as interleukin (IL)-7, IL-9, IL-17 and IL-27). However, current immunological and pharmacogenomic studies have several limitations in design, such as the lack of a placebo group. These markers of “aggressive” disease, if present, could allow intensifying treatment before any CNS damage occurs and, if absent, identifying the subgroup of patients with non-aggressive disease in whom the sensitivity of the traditional markers would be greater. Nevertheless, the successful identification of reliable response biomarkers will strongly depend on reaching a consensus on the definitions of treatment response. At present, studies do not allow for discrimination between the natural history of the disease and true response to treatment.
There are some limitations in our review. The selection of studies for meta-analysis was based on a relatively homogeneous definition of disability progression, but the measurement of the EDSS increase was slightly different between cohorts and, in some cases, the follow-up time was not uniform. Considerable heterogeneity was noticed in almost all comparisons, which may be related to the diverse populations evaluated and to differences in treatment dosing, schedule or compliance rates. The low number of studies using each suboptimal response criterion did not allow us to perform meta-regression for evaluating the influence of covariates of interest (such as baseline EDSS, prior relapse rate or time since diagnosis), which could have explained some of the observed heterogeneity.
In conclusion, there is a large variability in the clinical and radiological suboptimal response criteria used in the literature to predict long-term response to IFN-β or glatiramer acetate. All current criteria have a limited predictive value, with none of them achieving the 100% NPV that would allow the identification of all long-term non-responders within the first year of therapy. The RIO score ≥ 2 criterion at 1 year combines fair accuracy and consistency across cohorts, identifying around one quarter of patients as suboptimal responders and limiting the probability of disability progression in the next years to 1 in 8 of those patients classified as optimal responders in the first year. The presence of a new T2 lesion on the MRI at 1 year has a PPV similar to that of RIO score ≥ 2 (nearly 50%) and diminishes the false negatives to 1 in 16 patients (highest NPV among all evaluated criteria), which underscores the relevance of persistent activity on MRI despite current treatment for identifying those patients at risk of developing disability progression in the next years. However, this measure has a poor specificity and accuracy. The fact that, with both criteria, 1 in 2 patients with suboptimal response will not develop disability progression after two or more years underscores the need for investigating new, more sensitive measures of disease course and treatment failure at short term.
J Rio has received speaking honoraria and personal compensation for participating on Advisory Boards from: Almirall; Bayer-Schering Healthcare; Biogen-Idec; Genzyme; Merck-Serono; Novartis; Teva and Sanofi-Aventis.
The writing of this manuscript was funded by Novartis S.A.
The manuscript does not contain clinical studies or patient data.
We would like to acknowledge Dr. Prosperini and Dr. Rojas for sending us additional data from their previously published manuscripts. David Rigau, from the Iberoamerican Cochrane Center (Hospital de la Santa Creu i Sant Pau, Barcelona) contributed to the interpretation of data and review of the manuscript. Neus Valveny, PhD, from TFS Develop, S.A., analyzed the data and provided medical writing support.
a Centre d'Esclerosi Múltiple de Catalunya (CEM-Cat), Servei de Neurologia-Neuroimmunologia, Hospital Universitari Vall d'Hebron, Psg. Vall d'Hebron 119–120, Barcelona 08035, Spain
b Unidad de Esclerosis Múltiple, Hospital Universitario Virgen Macarena, Avd. Dr Fedriani, 3, Sevilla 41071, Spain
⁎ Corresponding author at: Unitat de Neuroimmunología Clínica, 2ª planta antiga EUI, Hospital Universitari Vall d'Hebron, Psg. Vall d'Hebron 119–120, Barcelona 08035, Spain.
© 2015 Elsevier B.V., All rights reserved.