Bifactor structure of clinical disability in relapsing multiple sclerosis

Multiple Sclerosis and Related Disorders, 2, 3, pages 176 - 185



Multiple sclerosis (MS) can affect virtually every neurological function, which complicates the conceptualization and assessment of disability. Similar challenges are encountered in other medical fields, including child cognitive development and psychiatry. In these disciplines, progress in diagnosis and outcome measurement has recently been achieved by capitalizing on the concept of the bifactor model.


To present in accessible terms an application of bifactor confirmatory factor analysis to study the clinical disability outcomes in MS.


Data included 480 assessments on 301 patients with relapsing–remitting MS who participated in the North American interferon beta-1a clinical trial (Avonex). Measures consisted of the Expanded Disability Status Scale (EDSS), the three components of the Multiple Sclerosis Functional Composite (MSFC), and five other clinical measures of neurological functions. We determined which of three confirmatory factor analysis models (unidimensional, multidimensional, and bifactor) best described the structure of the data.


EDSS scores ranged from 0 to 8 (94% between 0 and 4). The final bifactor model fitted the data well, explained 59.2% of total variance, and provided the most useful representation of the data. In this model, the nine measures defined a scoring dimension of global neurological function (63.1% of total composite score variance) and two auxiliary dimensions of extra variability in leg and cognitive function (17.1% and 9.0% of total composite score variance).


Bifactor modeling is a promising approach to further understanding of the structure of disability in MS and for refining composite measures of global disability.



  • Disability assessed with the EDSS and 8 clinical measures of performance had a bifactor structure.
  • Three scoring dimensions emerged: global disability, and residual leg- and cognitive disability.
  • The concept of bifactor modeling is introduced in simple terms.
  • This approach is promising to refine clinical outcome measures of global disability in MS.

Keywords: Multiple sclerosis, Disability evaluation, Outcome measure, Multiple Sclerosis Functional Composite (MSFC), Factor analysis, Bifactor model.

1. Introduction

Multiple sclerosis (MS) can affect nearly every neurological system. This characteristic of the disease considerably complicates the conceptualization and measurement of disability (Rudick et al., 1996). The Expanded Disability Status Scale (EDSS; Kurtzke, 1983) and the Multiple Sclerosis Functional Composite (MSFC; Cutter et al., 1999) are two of the disability outcome measures most widely used in clinical trials of MS, but both have well-known limitations and lack a simple interpretation (Hobart et al, 2000, Fox et al, 2007, and Cohen et al,). In its current form, the MSFC gives equal weight to the different domains of disability and thus does not attempt to account for the variable degree of association among disability domains. Part of the confusion and disagreement about what EDSS and MSFC scores represent could be alleviated with a composite measure of global neurological disability that does not over- or under-represent specific domains of disability.

The measurement problems associated with multi-faceted, not directly observable constructs such as disability in MS are encountered in other medical fields where rapid progress has recently been made by capitalizing on applications of the bifactor model in factor analysis (FA) and item response theory (IRT) analysis (e.g., in psychiatry, Simms et al., 2012 ; pediatric cognitive development, Willoughby et al., 2010 ; quality of life, Reeve et al., 2007 ). In this manuscript, we present an intuitive introduction to the fundamentals of confirmatory factor analysis (CFA) with special emphasis on bifactor CFA. We also present a substantive example of how these methods can be used as a starting point for the development of unconfounded composite outcome measures of global neurological disability in MS.

2. Materials and methods

In Table 1 we provide a list of (crude) rules of thumb to guide interpretation of key assumptions, methodological options, and study results.

Table 1 Guide to interpretation of assumptions, methods, and results.

Topic Rule of thumb
Correlations among indicator measures for FA to be worth performing
 Proportion of correlations greater than 0.30 ≥50%
 Correlations suggesting indicator redundancy (to be avoided) >0.80
Normality of indicator measures
 Acceptable skew ≤2.0
 Acceptable kurtosis ≤7.0
Essential unidimensionality versus multidimensionality
 EFA: Ratio of the variance explained (i.e., “eigenvalue”) by the first factor extracted to that explained by the second factor extracted >4.0
 Bifactor CFA: Correlations between the general factor and the indicator measures are >0.30 and substantially greater than their counterparts on the secondary factors  
CFA model estimation
 Continuous indicators Maximum likelihood
 At least one ordered categorical indicator Robust weighted least squares
Criteria of CFA model fit
 Chi-square goodness of fit test P>0.05
 Root Mean Square Error of Approximation (RMSEA) ≤0.05–0.08
 Comparative Fit Index (CFI) ≥0.90–0.95
 Tucker–Lewis Index (TLI) ≥0.90–0.94
 Mean absolute residual correlation <0.10
Interpretation of standardized loadings (indicator-factor correlations)
 Trivial-to-weak 0.20–0.39
 Moderate 0.40–0.60
 Strong >0.60
Interpretation of between-factor correlations
 Weak <0.40
 Moderate 0.40–0.60
 Strong >0.60
Interpretation of indicator residual variance
 Large >50%

Note: The above guidelines are not fully settled and do not apply to all situations. Alternative methods or more sophisticated criteria are sometimes needed, in particular to determine the degree of unidimensionality of a set of indicator measures.

References: Beauducel and Herzberg (2006) , Bollen (1989) , Cook et al. (2009) , Jackson et al. (2009) , Norman and Streiner (2008) , Reeve et al. (2007) , Reise et al. (2007) , Reise et al. (2013) , and Schmitt (2011) .

2.1. Introduction to confirmatory factor analysis (CFA) and bifactor CFA

FA can be defined as a modeling technique used to study latent variables or factors, such as global neurological disability, executive function, or depression, that are conceptualized as existing on a continuum but cannot be directly observed or measured. In both its exploratory (EFA) and confirmatory (CFA) forms, FA assumes that patients' scores on a pool of observable proxy measures, or indicators, reflect patients' positions on one or several underlying factors. FA's purpose is thus to infer the characteristics of the factors and their scales of measurement from the variances and covariances among the observable measures. EFA is primarily descriptive, whereas in CFA pre-specified hypotheses about the factors and their connections to the indicators are tested against the data. We first describe CFA and then bifactor CFA (most specifically as implemented in version 6 of the Mplus software package; Muthén and Muthén, 1998–2011).

CFA has close ties with simple linear regression. Fig. 1a and Eq. (1) below illustrate this relationship for the simplest unidimensional case, where each of a set of hypothetical indicator measures (e.g., a battery of cognitive neuropsychological tests) is influenced by only one common factor that one is trying to estimate (e.g., global cognitive function). For each indicator, a linear regression equation is specified so that


$$Y_i = \text{intercept}_i + \text{loading}_i \times G + \text{residual error}_i \qquad (1)$$

Fig. 1 Comparison of unidimensional, multidimensional, and bifactor confirmatory factor analysis models. Model 1a, unidimensional: $Y_i = \text{intercept}_i + \text{loading}_i \times G + \text{residual error}_i$. Model 1b, multidimensional uncorrelated: $Y_i = \text{intercept}_i + \text{loading}_{F1,i} \times F1 + \text{loading}_{F2,i} \times F2 + \text{loading}_{F3,i} \times F3 + \text{residual error}_i$. The variance that could be explained by a general factor is very small. As a result, there are no meaningful correlations among the factors F1, F2 and F3. Model 1c, multidimensional correlated: $Y_i = \text{intercept}_i + \text{loading}_{F1,i} \times F1 + \text{loading}_{F2,i} \times F2 + \text{loading}_{F3,i} \times F3 + \text{residual error}_i$. The variance that could be explained by a general factor is substantial. This translates into noticeable correlations among factors (double-headed, curved arrows). Model 1d, bifactor (restricted): $Y_i = \text{intercept}_i + \text{loading}_{G,i} \times G + \text{loading}_{F1',i} \times F1' + \text{loading}_{F2',i} \times F2' + \text{loading}_{F3',i} \times F3' + \text{residual error}_i$. This represents an alternative way to model the data represented in model 1c. A general factor G is introduced in the model. Note: Each horizontal bar represents the variance of one of six indicator measures Y1–Y6. For convenience, the measures are assumed to have been standardized to have equal variance. Variables enclosed in circles are factors (i.e., latent variables). Arrows indicate correlations (standardized loadings) between factors and indicator measures. Shading distinguishes the variance explained by the general factor G, by the factors F1, F2, and F3, and the residual/unexplained variance. F1, F2 and F3 are usual common factors; F1′, F2′, and F3′ are the specific/auxiliary factors corresponding to F1, F2 and F3 in the bifactor model after the variance explained by the general factor G is parsed out. See Section 2.1 for further details.

“Loading” is a usual regression coefficient defining the metric of measurement; that is, by how much the score on the indicator is expected to change for a one-unit change in factor score. When factor scores are assumed to be normally distributed with a mean of zero and a variance of one, as is generally the case, patients located at the mean on the factor have an average rating on an indicator equal to the intercept for this indicator. Finally, when all the indicators are standardized (i.e., converted to z-scores), the resulting “standardized loadings” represent correlations between the indicators and the common factor.
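
The statement that standardized loadings are indicator-factor correlations can be checked with a small simulation. The following is a minimal sketch rather than the analysis performed in the study: the three loadings, the sample size, and the random seed are hypothetical values chosen for illustration, and intercepts are set to zero because the indicators are generated directly as z-scores.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation written out by hand to avoid extra dependencies."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_one_factor(loadings, n=20000, seed=1):
    """Draw n patients from Y_i = loading_i * G + residual_i (zero intercepts)."""
    rng = random.Random(seed)
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]  # factor scores: mean 0, variance 1
    indicators = []
    for lam in loadings:
        # Residual variance is 1 - loading^2, so each indicator has unit variance.
        resid_sd = math.sqrt(1.0 - lam ** 2)
        indicators.append([lam * gi + rng.gauss(0.0, resid_sd) for gi in g])
    return g, indicators


# Hypothetical standardized loadings, for illustration only.
LOADINGS = [0.8, 0.6, 0.4]
factor, ys = simulate_one_factor(LOADINGS)
recovered = [pearson(factor, y) for y in ys]
for lam, r in zip(LOADINGS, recovered):
    print(f"loading {lam:.2f} -> correlation with G {r:.3f}")
```

Under this model, the correlation between any two indicators is the product of their loadings, which is why a single factor can account for a whole matrix of correlations.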

In Fig. 1a–d, data are assumed to be available for six observable measures (Y1–Y6), each represented by a horizontal bar with length equal to the variance of the measure (identical for all the measures, for simplicity). In Fig. 1a (unidimensional case), the common factor G explains a large fraction (black) of the variance of each measure. Since the variance of a measure is generally interpreted as the amount of information that it provides, there is little information in each measure that is not explained by the factor (residual variance, white). Under these and other assumptions of classical test theory, an “MSFC-like” composite of the six indicators would have the properties of a unidimensional rating scale measuring G with greater accuracy and reliability than any of the indicator measures individually. Group differences in composite scores would also have a meaningful interpretation once the metric of the composite is defined.

Fig. 1b presents one of several possible forms of multidimensional CFA models, in which most of the variance of the measures Y1 and Y2 is explained by factor F1, whereas most of the variance of the measures Y3–Y4 and Y5–Y6 is explained by the factors F2 and F3, respectively. In contrast with Fig. 1a, only a negligible fraction of the information in each measure could have been explained by a factor common to all six measures (black). As a result, the correlations among the three factors are negligible and a composite of the six measures would amount to mixing apples, carrots, and milk in variable proportions to obtain a single score of “thick or thin puree”. In a randomized clinical trial, a difference in mean score between the treatment groups would indicate that there is more “puree” in one group than the other, but there would be no clear answer to the question of what constitutes a minimal important difference score. Furthermore, because the components of the composite are confounded with each other, inconsistent conclusions would be reached across studies about which patient characteristics independently predict higher scores.

The structure of measurement described in Fig. 1 c combines elements of Fig. 1 a and b. Like the structure of disability in MS as previously described, it is neither nearly unidimensional nor strictly multidimensional. CFA suggests that three factors are needed to explain the variances and covariances among the six measures, but the model does not explicitly recognize that a substantial fraction of the variance of each measure (black) could be explained by a common factor; instead, the model allows for noticeable correlations among the factors (represented by the double-headed, curved arrows joining each pair of factors). In this case, the three components (F1, F2, and F3) of the multidimensional composite of the six measures would still be confounded with each other, though to a lesser degree than in Fig. 1 b owing to the correlations among factors.

In Fig. 1d, we assume that a bifactor CFA model is fitted to the data that yielded model 1c (Gibbons and Hedeker, 1992). The most important difference between models 1c and 1d is that a general factor G is specified in model 1d to account for the important fraction of the variance of each measure represented in black in model 1c. In the bifactor model 1d, three specific or auxiliary factors F1′, F2′, and F3′ each explain an additional fraction of the variance of the measures in the doublets Y1–Y2, Y3–Y4, and Y5–Y6. In Fig. 1d, the bifactor model is “restricted”, which means that correlations among the four factors are constrained to be zero (no curved arrows joining the factors). In other words, the factors F1′, F2′, and F3′ in Fig. 1d represent the part of the reliable variances of F1, F2 and F3 in Fig. 1c that is not explained by the general factor G.
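
Because the factors in a restricted bifactor model are uncorrelated, the variance of each standardized indicator and the correlation between any two indicators follow directly from the loadings. The sketch below illustrates this with made-up loadings (0.6 on the general factor, 0.5 on a specific factor). The `Indicator` class and both helper functions are hypothetical names introduced here and are not part of Mplus or of the study's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Indicator:
    name: str
    general: float                  # standardized loading on the general factor G
    specific_factor: Optional[str]  # the single auxiliary factor it loads on, if any
    specific: float = 0.0           # standardized loading on that auxiliary factor


def variance_parts(ind):
    """Split the unit variance of an indicator into general, specific, residual."""
    general = ind.general ** 2
    specific = ind.specific ** 2
    return general, specific, 1.0 - general - specific


def implied_correlation(a, b):
    """Model-implied correlation between two standardized indicators.

    With uncorrelated unit-variance factors, two indicators share variance
    through G and, only if they load on the same auxiliary factor, through it.
    """
    r = a.general * b.general
    if a.specific_factor is not None and a.specific_factor == b.specific_factor:
        r += a.specific * b.specific
    return r


# Hypothetical loadings: Y1 and Y2 form one doublet, Y3 belongs to another.
Y1 = Indicator("Y1", 0.6, "F1'", 0.5)
Y2 = Indicator("Y2", 0.6, "F1'", 0.5)
Y3 = Indicator("Y3", 0.6, "F2'", 0.5)

print(variance_parts(Y1))           # general, specific, residual shares
print(implied_correlation(Y1, Y2))  # same doublet: G plus shared specific factor
print(implied_correlation(Y1, Y3))  # different doublets: G only
```

Indicators in the same doublet correlate more strongly than indicators in different doublets, which is the pattern the specific factors are meant to absorb.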

The concept of bifactor model was conceived in the 1930s ( Holzinger and Swineford, 1937 ), but it is only in the last few years that its unique potential for the study and measurement of broad, multi-faceted constructs has been fully recognized ( Reise, 2012 ). In psychiatry, the bifactor approach has been used to investigate the lack of separation among DSM-IV mental disorders ( Regier et al., 2009 ). Simms et al. (2012) have shown, for example, that the substantial overlap in symptom ratings among patients with syndromes of depression, anxiety, and somatization could be advantageously represented with a restricted bifactor model similar to that of Fig. 1 d. The main appeal of the model was its capacity to decompose the variance in indicator measures of depression, anxiety and somatization into independent and easily interpretable components: the general factor captured the overall severity of general psychological distress, whereas the specific factors estimated uncorrelated residual variations in depressive, anxiety and somatic symptoms. These results suggested that it might be possible to compare patients on an unambiguous scale of overall psychological distress, using a well-defined, unidimensional definition of minimal important score difference, and to simultaneously classify them into diagnostic categories based on their profiles of residual variability in psychiatric symptoms. The study also showed that predictors of general psychological distress severity could be isolated from predictors of excess severity for specific symptom domains.

In line with these results, we hypothesized that neurologic disability in MS can be decomposed into a general component of global disability and uncorrelated components of extra-variability in specific domains of neurologic dysfunction.

2.2. Sample and observed measures

Data for this proof-of-concept study included baseline and 2-year follow-up assessments of 301 men and women with relapsing–remitting MS (RRMS) recruited in 1990–1993 in the North American interferon beta-1a clinical trial (a subset of the original pooled database assembled by the Task Force on Clinical Outcomes Assessments for the development of the MSFC; Jacobs et al, 1996 and Fischer et al, 1999). The rationale for choosing this dataset was that participants were evaluated with the EDSS and multiple clinical measures of cognitive, hand, and leg function. These included the three components of the MSFC, namely the timed 25-foot walk test (T25FW), the nine hole peg test (9-HPT), and the 3-second paced auditory serial addition test (PASAT), as well as the ambulatory index (AI), the gait speed test (GST), the box and blocks test (BBT), the symbol digit modality test (SDMT), and total score on the controlled oral word association test (COWAT). All patients had EDSS scores of 1 to 3.5 at baseline. Two-year follow-up observations were recorded for 179 subjects (60%).

Approval was obtained from the institutional review board of the University of Alabama at Birmingham (as opposed to the board for the original trial).

2.3. Preliminary analyses

We used EFA and hierarchical item cluster analysis (Revelle, 1979) to obtain a preliminary description of data structure and degree of multidimensionality (Reeve et al, 2007 and Lai et al, 2006). Providing an introduction to these exploratory techniques is beyond the scope of this paper (see Norman and Streiner, 2008 and Studerus et al, 2010 for an elementary description). Analyses were conducted with the psych package (Revelle, 2010) of the statistical software R (R Development Core Team, 2011).

2.4. A priori models

We compared the following three CFA models in terms of fit to the data: (1) a unidimensional model where all nine observed measures defined a single factor of neurological disability (similar to Fig. 1 a); (2) a multidimensional model including three correlated factors of arm, leg, and cognitive dysfunction (similar to Fig. 1 c); and (3) a restricted bifactor model where all nine measures contributed to a general factor of global disability and subgroups of the same nine measures also contributed nontrivially to specific (auxiliary) factors of leg, hand, and cognitive dysfunction (similar to Fig. 1 d).

Composite measures are rarely strictly unidimensional, because they virtually always include observed measures influenced by more than one underlying factor. Should a unidimensional model fail to fit the observed data, it is important to evaluate whether the data are to be considered as essentially unidimensional (i.e., unidimensional in first approximation; Reise et al., 2007). In this study, this amounted to judging whether the domain-specific factors not accounted for in the unidimensional model were sufficiently weak so as not to introduce clinically important bias in the single factor.

The multidimensional model of neurological disability assumed that the nine observed measures had too little in common to be the manifestation of a single factor. Instead we assumed that three correlated factors (arm, leg, and cognitive) were necessary to account for the heterogeneity in levels of correlation among the observed measures.

Finally, the bifactor model was seen as an alternative to both the essentially unidimensional and the multidimensional model ( Reise et al., 2007 ).

2.5. Model estimation

A robust weighted least squares procedure is generally recommended when the indicators include ordered-categorical measures such as the EDSS and the AI. We used the weighted least squares with mean and variance adjustment (WLSMV) estimator in Mplus. All CFA models were adjusted for possible correlations of residual errors between baseline and follow-up measurements (i.e., parameters for these correlations were freely estimated instead of being fixed to zero by default).

2.6. Assessment of model fit and degree of unidimensionality

Many indicators of CFA model fit have been proposed, but all have limitations; therefore, we examined several of them as recommended (Jackson et al., 2009). We used the chi-square goodness-of-fit test corrected for WLSMV estimation (a non-significant result suggests acceptable fit), the Root Mean Square Error of Approximation (RMSEA), the Tucker–Lewis Index (TLI), the Comparative Fit Index (CFI), and the mean absolute residual correlation criterion (Reeve et al., 2007).

Several complementary criteria have also been proposed to inform decisions about the degree of unidimensionality versus multidimensionality of a set of measures. The criteria reported in this article are described in Table 1 .
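
As an illustration of how the fit criteria of Table 1 can be applied, the sketch below screens the fit statistics reported later in Sections 3.3 to 3.5 against the more lenient end of each rule of thumb. It is a convenience check under stated assumptions, not part of the original analysis: the `check_fit` helper and the dictionary layout are hypothetical, and the unidimensional P value, reported only as P<0.001, is stored as its upper bound.

```python
# Lenient ends of the Table 1 rules of thumb (e.g., RMSEA <= 0.05-0.08 becomes 0.08).
CUTOFFS = {"chi_square_p": 0.05, "rmsea": 0.08, "cfi": 0.90, "tli": 0.90}

# Fit statistics as reported in the Results section.
REPORTED_FIT = {
    # P was reported as <0.001; 0.001 is used here as an upper bound (assumption).
    "unidimensional": {"chi_square_p": 0.001, "rmsea": 0.144, "cfi": 0.720, "tli": 0.694},
    "three_factor": {"chi_square_p": 0.21, "rmsea": 0.020, "cfi": 0.995, "tli": 0.994},
    "bifactor": {"chi_square_p": 0.15, "rmsea": 0.024, "cfi": 0.994, "tli": 0.991},
}


def check_fit(fit, cutoffs=CUTOFFS):
    """Return, for each criterion, whether the rule of thumb is met."""
    return {
        "chi_square_p": fit["chi_square_p"] > cutoffs["chi_square_p"],
        "rmsea": fit["rmsea"] <= cutoffs["rmsea"],
        "cfi": fit["cfi"] >= cutoffs["cfi"],
        "tli": fit["tli"] >= cutoffs["tli"],
    }


for model, fit in REPORTED_FIT.items():
    verdicts = check_fit(fit)
    met = sum(verdicts.values())
    print(f"{model}: {met} of {len(verdicts)} criteria met")
```

Because the cutoffs are not fully settled, a screen of this kind is best read alongside the residual correlations and the substantive interpretability of each model.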

3. Results

3.1. Sample characteristics and data transformations

In the sample, 73.4% of patients were women. At baseline, mean age was 37.7 years (SD, 7.4) and mean disease duration 6.5 years (SD, 5.8). The sample size was relatively small; therefore we used all the baseline and follow-up data available in the analyses. Baseline distributions of patient characteristics were comparable in groups with and without follow-up data (sex ratio: 0.36 versus 0.29, P=0.30; disease duration: 6.3 versus 6.9 years, P=0.39; other characteristics: not shown). Because of the sparse numbers of high EDSS and AI scores, we collapsed EDSS scores of 4–4.5 and scores ≥5 into two categories. Similarly, AI scores ≥4 were collapsed into a single category. Count measures (e.g., PASAT) were treated as continuous. T25FW and 9-HPT scores were inverse transformed, and GST scores log transformed, to achieve acceptable levels of normality (Beauducel and Herzberg, 2006; Table 1). Before CFA, all continuous scores were standardized (mean of zero, unit variance) and coded so that a higher score indicates a higher disability level.
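
The transformations described above can be sketched as follows. This is a minimal illustration on invented walk times, not the study's preprocessing code: the function names are hypothetical, the category labels 4.0 and 5.0 for the collapsed EDSS groups are an arbitrary choice, and the population standard deviation is used so that the rescaled scores have variance exactly one.

```python
import statistics


def standardize(values):
    """Rescale to mean zero and unit variance (population SD)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def inverse_then_standardize(times):
    """Inverse-transform timed scores, standardize, and recode.

    1 / time is a speed, so larger values mean better function; the sign is
    flipped so that a higher final score indicates a higher disability level.
    """
    speeds = [1.0 / t for t in times]
    return [-z for z in standardize(speeds)]


def collapse_edss(score):
    """Collapse sparse high EDSS scores into two categories.

    Scores of 4-4.5 share one category and scores >= 5 share another; the
    labels 4.0 and 5.0 are arbitrary (assumption).
    """
    if score >= 5.0:
        return 5.0
    if score >= 4.0:
        return 4.0
    return score


# Hypothetical timed 25-foot walk results in seconds.
walk_times = [4.0, 5.0, 6.5, 8.0, 12.0]
walk_z = inverse_then_standardize(walk_times)
print([round(z, 2) for z in walk_z])
print([collapse_edss(s) for s in (3.5, 4.0, 4.5, 6.0, 8.0)])
```

The slowest walker ends up with the highest standardized score, consistent with the coding convention that higher values indicate greater disability.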

After transformations, most correlations among observed measures were low to moderate, but 21 out of 36 were greater than 0.30 (Table 1 and Table 2). The highest correlations were among measures assessing the same neurological function (cognitive, hand, or leg). EDSS scores correlated moderately with the measures of hand and leg function, but only marginally with the measures of cognitive function. The lowest correlations were between measures of cognitive and leg functions.
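
The first two rules of thumb in Table 1 can be checked directly against the correlations reported in Table 2. The sketch below simply re-enters the lower triangle of that table row by row; the helper name `summarize` is hypothetical.

```python
# Lower triangle of Table 2, keyed by row label (values copied from the table).
TABLE_2 = {
    "SDMT": [0.557],
    "COWAT": [0.467, 0.415],
    "BBT": [0.400, 0.453, 0.363],
    "9-HPT": [0.302, 0.424, 0.245, 0.741],
    "EDSS": [0.183, 0.270, 0.091, 0.425, 0.479],
    "AI": [0.164, 0.226, 0.061, 0.455, 0.450, 0.643],
    "T25FW": [0.145, 0.171, 0.023, 0.342, 0.290, 0.483, 0.657],
    "GST": [0.113, 0.134, 0.032, 0.329, 0.292, 0.399, 0.599, 0.404],
}


def summarize(rows, worthwhile=0.30, redundant=0.80):
    """Count correlations above the 'worth performing' and 'redundancy' cutoffs."""
    values = [r for row in rows.values() for r in row]
    return {
        "total": len(values),
        "above_worthwhile": sum(r > worthwhile for r in values),
        "above_redundant": sum(r > redundant for r in values),
        "largest": max(values),
    }


summary = summarize(TABLE_2)
share = summary["above_worthwhile"] / summary["total"]
print(summary)
print(f"share above 0.30: {share:.1%}")
```

The share of correlations above 0.30 exceeds the 50% guideline, and no pair of measures reaches the 0.80 level that would suggest redundancy.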

Table 2 Correlations among clinical outcome measures.

        PASAT   SDMT    COWAT   BBT     9-HPT   EDSS    AI      T25FW
SDMT    0.557
COWAT   0.467   0.415
BBT     0.400   0.453   0.363
9-HPT   0.302   0.424   0.245   0.741
EDSS    0.183   0.270   0.091   0.425   0.479
AI      0.164   0.226   0.061   0.455   0.450   0.643
T25FW   0.145   0.171   0.023   0.342   0.290   0.483   0.657
GST     0.113   0.134   0.032   0.329   0.292   0.399   0.599   0.404

Note: Differences with previously published tables (Fischer et al, 1999 and Cutter et al, 1999) are due to data transformations and differences in methods of correlation calculation. For maximum model accuracy, the weighted least squares with mean and variance adjustment (WLSMV) estimation procedure (see Section 2.5) calculates correlations between pairs of variables differently depending on whether both measures are continuous (Pearson's correlation), one measure is continuous and the other categorical (polyserial correlation), or both measures are categorical (polychoric correlation).
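
The rule in the note above, together with the estimator guideline of Table 1, amounts to a simple lookup on the measurement level of each variable. The sketch below only names the appropriate coefficient and estimator and does not compute them; the function names are hypothetical, and the set of ordered-categorical measures follows Section 2.5 (EDSS and AI).

```python
# Measures treated as ordered categorical in this study (Section 2.5).
ORDERED_CATEGORICAL = frozenset({"EDSS", "AI"})


def correlation_type(measure_a, measure_b, categorical=ORDERED_CATEGORICAL):
    """Name the correlation coefficient appropriate for a pair of measures."""
    n_categorical = (measure_a in categorical) + (measure_b in categorical)
    # 0 categorical -> Pearson, 1 -> polyserial, 2 -> polychoric
    return ("pearson", "polyserial", "polychoric")[n_categorical]


def estimator(measures, categorical=ORDERED_CATEGORICAL):
    """Apply the Table 1 rule of thumb for choosing a CFA estimator."""
    if any(m in categorical for m in measures):
        return "robust weighted least squares"
    return "maximum likelihood"


print(correlation_type("PASAT", "SDMT"))  # both continuous
print(correlation_type("EDSS", "T25FW"))  # one categorical, one continuous
print(correlation_type("EDSS", "AI"))     # both categorical
print(estimator(["PASAT", "SDMT", "EDSS"]))
```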

3.2. Preliminary analyses

In EFA, 42.4% of total variability was on the first factor extracted. The ratio of the variability explained by the first factor extracted to that explained by the second factor extracted was 2.2—a result suggesting that data were unlikely to be strictly, or even essentially unidimensional (see Table 1 and Norman and Streiner, 2008 for interpretation). Instead, all preliminary analyses supported a multidimensional model with two or more correlated factors, or a bifactor model.

3.3. Unidimensional CFA model

As expected, the one-factor CFA model (Fig. 2) provided poor fit (chi-square test, P<0.001; RMSEA, 0.144; CFI, 0.720; TLI, 0.694). Only BBT, 9-HPT, EDSS, and AI had high correlations (i.e., “standardized loadings” in FA parlance) of about 0.70 or higher. The factor explained 36.9% of total variance in clinical measures. The pattern of residual correlations indicated that a substantial fraction of the covariations between measures of cognitive function and arm function was not accounted for by the model (mean absolute residual correlation, 0.122).


Fig. 2 Unidimensional CFA model of neurological disability in RRMS. Note: S.E. stands for standard error. All the performance measures were scaled to have variance one; therefore, the variance of a performance measure explained by the factor equals one minus the residual variance for this measure. The standardized loadings are interpreted as correlations between the factor and each of the performance measures. By design, no intercept was significantly different from zero.

3.4. Three-factor CFA model

Fig. 3 displays the solution for the three-factor CFA model. Model fit was excellent (chi-square test, P=0.21; RMSEA, 0.020; CFI, 0.995; TLI, 0.994). As hypothesized, PASAT, SDMT and COWAT had significant correlations with the first factor (cognitive), BBT and 9-HPT with the second factor (arm), and AI, T25FW and GST with the third factor (leg). EDSS correlated moderately with the leg factor (0.52; P<0.001), weakly with the arm factor (0.25; P<0.001), and not at all with the cognitive factor (−0.02; P=0.82). We observed moderate-to-strong correlations between the arm factor and the other two factors (arm–cognitive, 0.61; arm–leg, 0.55), and only a weak correlation between the leg and cognitive factors (0.25). In combination, the number of large measure-factor correlations (i.e., ≥0.70), the relatively high correlations of the arm factor with the other two factors, and the correlation of EDSS with arm and leg factors suggested that ratings on the nine observed measures were not strictly multidimensional. The three factors explained 57.8% of total variance.


Fig. 3 Three-factor CFA model of neurological disability in RRMS. Note: see Fig. 1 note for interpretation. In Cutter et al. (1999), the EDSS correlated 0.52 with the T25FW, 0.33 with the 9-HPT, and 0.23 with the PASAT.

3.5. Bifactor CFA model

The restricted bifactor model with three auxiliary factors of cognitive, hand and leg function (i.e., similar to Fig. 1d) failed to converge. However, an excellent solution was obtained with a general factor of global neurological function and only two auxiliary factors of cognitive and leg function (Fig. 4). This bifactor representation of the nine observed measures fitted the data significantly better than the unidimensional model (P<0.001) and as well as the three-factor model (chi-square test of model fit, P=0.15; RMSEA, 0.024; TLI, 0.991; CFI, 0.994). The two measures of arm performance were very strongly correlated with the general factor (BBT, 0.91; 9-HPT, 0.86) and, thus, did not constitute a third specific factor. Correlations of observed measures with the general factor ranged from 0.29 to 0.91 (median, 0.49), whereas correlations ranged from 0.51 to 0.70 (median, 0.54) with the cognitive factor, and from 0.47 to 0.83 (median, 0.53) with the leg factor. Only EDSS had a stronger correlation with the general factor than with its secondary factor (Table 1); the reverse was true for PASAT, SDMT, COWAT, AI, GST and T25FW.


Fig. 4 Bifactor CFA model of neurological disability in RRMS. Note: see Fig. 1 note for interpretation.

The model explained 59.2% of total variance in the data and 89.2% of the variance of the composite sum score of the nine measures. More specifically, 63.1% of composite sum score variance was accounted for by the general factor, 17.1% by the leg factor, 9.0% by the cognitive factor, and 10.8% was unexplained. In other words, the composite score, as an estimate of global neurological disability, was “70.7% unidimensional” (63.1 divided by 63.1+17.1+9.0; Zinbarg et al., 2005) and “29.3% confounded by residual variations in leg and cognitive dysfunctions” (Reise et al., 2013).
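
The partition of composite score variance described above is simple arithmetic on the four reported percentages, and can be reproduced as follows. The variable names are hypothetical; the inputs are the values reported in this section.

```python
# Shares of composite sum score variance reported for the bifactor model (%).
GENERAL = 63.1
LEG = 17.1
COGNITIVE = 9.0
UNEXPLAINED = 10.8

# Variance of the composite explained by the model as a whole.
explained = GENERAL + LEG + COGNITIVE

# Share of the explained composite variance due to the general factor
# (the "unidimensional" share) versus the two auxiliary factors.
general_share = GENERAL / explained
confounded_share = (LEG + COGNITIVE) / explained

print(f"explained by the model: {explained:.1f}%")
print(f"explained plus unexplained: {explained + UNEXPLAINED:.1f}%")
print(f"general-factor share of explained variance: {general_share:.1%}")
print(f"auxiliary-factor share of explained variance: {confounded_share:.1%}")
```

The two shares sum to one by construction, so the degree of confounding is fully determined once the general-factor share is known.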

3.6. Correlations among individual score estimates

Because of the substantial proportion of patients lost to follow-up, correlations among the different individual score estimates were calculated at baseline only. As expected, individual scores on the specific dimensions of the bifactor model were not significantly correlated with each other (the observed weak correlations are in part random, in part an artifact of the method of estimation; DiStefano et al., 2009). Individual scores of global neurologic dysfunction estimated using the bifactor model were less strongly correlated with standard MSFC scores (r=0.77; P<0.001) than with individual scores estimated under the unidimensional model (r=0.95; P<0.001; Table 3), which suggests that the MSFC is not entirely unidimensional. Similarly, the cognitive sub-scores were moderately correlated with the MSFC scores (r=0.35; P<0.001), suggesting that the contribution of PASAT to the standard MSFC composite is weighted too high for MSFC to be considered an unconfounded measure of global neurological disability.

Table 3 Baseline correlations among factor scores and MSFC scores. a

                         Global b (bifactor)  Global (unidimensional)  MSFC b      Cognitive (bifactor)  Leg (bifactor)
Global (bifactor)        1.00
Global (unidimensional)  0.95 ***             1.00
MSFC                     0.77 ***             0.85 ***                 1.00
Cognitive (bifactor)     0.08                 0.17 *                   0.35 *** c  1.00
Leg (bifactor)           −0.10                0.16 *                   0.14 c      −0.10                 1.00

a Factor scores are regression scores (i.e., expected a posteriori; DiStefano et al., 2009).

b Coded so that a higher score indicates a higher level of disability.

c In Fischer and Rudick (1999) , the overall MSFC score correlated 0.68 with the PASAT and 0.67 with T25FW.

* Sidak-adjusted significance level: P<0.05.

** Sidak-adjusted significance level: P<0.01.

*** Sidak-adjusted significance level: P<0.001.

3.7. Sensitivity analysis

Because our sample size was relatively small, we used all the baseline and follow-up data available in the main analyses. Excluding baseline data for the 122 subjects with incomplete follow-up had no substantive effect on study results.

4. Discussion

This study differs in two important ways from previous studies of clinical outcome measures in MS. First, it relied on bifactor analysis to assess the dimensional structure of neurological disability in RRMS as described by the nine outcome measures considered; second, it incorporated the EDSS among the indicator measures of disability to be modeled, instead of treating it as an external “gold standard” of disability.

Our main finding is that, in this sample of patients with RRMS, neurological disability was best represented by a general factor of global neurological dysfunction and two uncorrelated auxiliary factors of leg and cognitive dysfunction. This factor structure shares similarities with that recently described for Guy's neurological disability scale (GNDS; Mokkink et al., 2011). The GNDS exhibited a bifactor structure where each of the 12 scale domains correlated with a general factor of disability, while 10 domains also defined a spinal factor (lower limb, upper limb, bladder, bowel, and sexual domains), a mental factor (cognition, mood and fatigue domains), and a bulbar factor (speech and swallow domains). Furthermore, the upper limb domain had a strong correlation with the general factor (0.740), as in our study, and only a negligible correlation with the spinal factor (0.184).

Together, our results and those of Mokkink et al. (2011) support the notion that MS disability can be well represented by a global disability factor and uncorrelated domain-specific disability factors. These results make sense from neuroanatomic and clinical standpoints, since the structures subserving “leg function”, for instance, are neuroanatomically distinct from those subserving cognition. It is also understandable that the T25FW, AI, and GST would be strongly correlated with each other, as all purport to measure mobility.

In this research, neurological disability in MS exhibited bifactor structure ( Figs. 1 d and 4 ) and did not satisfy basic criteria of essential unidimensionality. This suggests that functional performance tests in MS should be conceptualized as markers of both global and domain-specific disability (possibly with the exception of measures such as those of arm functioning which loaded only on the general factor in this study). Therefore, an association between external variables (e.g., disease duration or age) and a standard composite measure such as the MSFC could be due to a relation between the variable and global disability, domain-specific disability, or both. The associations of the composite outcome with the external variables might also vary across patient samples, and possibly underestimate, or overestimate the associations of the true level of global neurological disability with these variables.

A related concern with the current MSFC scoring scheme is that equal nominal weights are assigned to the T25FW, 9-HPT, and PASAT. However, in our CFA models, the measures of cognitive function provided less information toward assessing the level of global neurological function than the measures of leg and especially arm function. Moreover, factor scores specific to the cognitive domain correlated substantially and significantly with the MSFC scores. Together, these results suggest that the current scoring scheme is a source of confounding in investigations of the relations between the overall level of neurological dysfunction and patient characteristics or treatment.
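The effect of equal nominal weights can be illustrated with a small sketch. It assumes the three component z-scores are already oriented so that higher values mean better function, and it compares an equal-weight average with a reweighted average. The patient values and the alternative weights are hypothetical and chosen only for illustration; they are not derived from the models in this study.

```python
def composite(z_scores, weights):
    """Weighted average of component z-scores (higher = better function)."""
    total_weight = sum(weights.values())
    return sum(weights[k] * z_scores[k] for k in weights) / total_weight


# Hypothetical patient with a mostly cognitive deficit.
patient = {"leg": -0.2, "arm": -0.4, "cognition": -1.5}

# Equal nominal weights, as in an MSFC-like average of three z-scores.
equal_weights = {"leg": 1.0, "arm": 1.0, "cognition": 1.0}

# Hypothetical weights that down-weight the cognitive component.
alternative_weights = {"leg": 1.0, "arm": 1.2, "cognition": 0.5}

msfc_like = composite(patient, equal_weights)
reweighted = composite(patient, alternative_weights)
print(f"equal weights: {msfc_like:.2f}, alternative weights: {reweighted:.2f}")
```

For this hypothetical patient the equal-weight score is pulled down mainly by the cognitive component, which is the kind of domain-specific influence on the composite discussed above.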

Our proof-of-concept study suffered from several limitations. The study sample included only RRMS patients with baseline EDSS ≤ 3.5. Restricting participant selection to those scoring within a narrow range of an observed measure is discouraged in FA (Bollen, 1989). A consequence of the restriction in EDSS range at baseline is that correlations among observed measures, and between observed measures and factors, are likely to have been weaker than those that would have been observed in a more heterogeneous sample. No primary measure was available for several known dimensions of disability (e.g., vision, sphincter). Therefore, there might be more than two auxiliary factors of neurological dysfunction, and the measures of arm function might contribute to one of these auxiliary factors, as was the case in the GNDS study (Mokkink et al., 2011). Finally, we did not assess the reliability, the measurement invariance, or the concurrent and predictive validity of the scores generated.
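The attenuation caused by range restriction can be demonstrated with a short simulation. This is a sketch under simple assumptions (two standard normal variables with a chosen population correlation, and selection of cases on the lower half of one of them); it uses simulated numbers only and does not reproduce the trial data or its selection rule.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is reproducible
rho = 0.6               # assumed population correlation (illustrative)

pairs = []
for _ in range(20000):
    x = rng.gauss(0.0, 1.0)
    y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    pairs.append((x, y))

full_r = pearson(*zip(*pairs))

# Keep only cases at or below the mean of x, mimicking an inclusion cutoff.
restricted = [(x, y) for x, y in pairs if x <= 0.0]
restricted_r = pearson(*zip(*restricted))

print(f"full sample r = {full_r:.2f}, restricted sample r = {restricted_r:.2f}")
```

The correlation in the restricted subsample is lower than in the full sample even though the underlying relation between the two variables is unchanged.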

5. Conclusion

Our results suggest that it might be possible to apply bifactor FA/IRT methods to other existing datasets that include a broad range of clinical outcome measures. Such efforts might help refine the conceptualization of the clinical measurement of global disability in MS and, ultimately, develop a bifactor model of neurological disability in MS that is invariant across samples of MS patients and situations (Cook and Petersen, 1987; McHorney and Cohen, 2000; Reeve et al., 2007). Once validated, such a pre-calibrated model would allow one to calculate unbiased scores of global neurological disability with only a subset of the performance measures employed for model development (conceivably just three or four measures; Lord, 1980). This feature of IRT models is relied upon, for instance, in applications of item banks and computer adaptive testing for patient-reported outcomes (Reeve et al., 2007). The methods of score linking might also be used to compare scores of different outcome measures of neurological disability on a common metric (Yen and Fitzpatrick, 2006). Then, various nomograms or conversion tables could be developed to facilitate the interpretation of the least intuitive measures (e.g., MSFC-like composites) by establishing stable connections between these and easier-to-interpret measures such as the EDSS.
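As a rough illustration of placing two measures on a common metric, the sketch below uses linear (mean and standard deviation) linking, one of the simplest linking methods. It assumes both measures were administered to the same group; it is not the IRT-based linking that a calibrated bifactor model would support, and the two sets of scores are hypothetical.

```python
from statistics import mean, pstdev


def linear_link(score, from_scores, to_scores):
    """Map a score from one scale to another by matching means and SDs."""
    slope = pstdev(to_scores) / pstdev(from_scores)
    return mean(to_scores) + slope * (score - mean(from_scores))


# Hypothetical scores of the same five patients on two different scales.
scale_a = [-1.2, -0.5, 0.0, 0.4, 1.3]      # e.g., a z-score composite
scale_b = [35.0, 44.0, 50.0, 55.0, 66.0]   # e.g., a T-score-like metric

# A small conversion table from scale A to scale B.
for a in (-1.0, 0.0, 1.0):
    print(f"scale A {a:+.1f} -> scale B {linear_link(a, scale_a, scale_b):.1f}")
```

A conversion table of this kind is only as stable as the linking relation itself, which is why invariance of the underlying model across samples matters.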

Funding acknowledgement

This work was supported by the National Multiple Sclerosis Society [award number HCO127]. The funding agency played no role in study design, analysis, results interpretation, or report writing.

Conflict of interest statement

EC serves on the NMSS Task Force on Clinical Disability Measures. IK has no conflict of interest to declare. GRC served on scientific advisory boards for sanofi-aventis, Cleveland Clinic, Daiichi Sankyo, GlaxoSmithKline, Genmab A/S, Eli Lilly and Company, Medivation, Inc., Modigenetech, Ono Pharmaceutical Co. Ltd., PTC Therapeutics, Inc., Teva Pharmaceutical Industries Ltd., Vivus Inc., University of Penn, the NIH (NHLBI, NINDS, NICHD) and NMSS; serves on the editorial board of Multiple Sclerosis; has received speaker and consulting honoraria from Alexion Pharmaceuticals, Inc., Bayhill Therapeutics, Bayer Schering Pharma, Novartis, Genzyme Corporation, Nuron Biotech, Peptimmune, Somnus Pharmaceuticals, Sandoz, Teva Pharmaceutical Industries Ltd., UT Southwestern, and Visioneering Technologies, Inc.; has received funding for travel and speaker honoraria from Consortium of MS Centers and Bayer Schering Pharma; and has received research support from the NIH (NICHHD, NIDDK, NINDS, NHLBI, NIAIDS), NMSS, the Consortium of MS Centers, and Klein-Buendel Incorporated.


We thank Dr. R.A. Rudick and the National Multiple Sclerosis Society (NMSS) for their permission to use the database assembled by the NMSS Task Force on Clinical Outcomes Assessments for secondary analyses pertaining to the development of new outcome measures in MS.


  • Beauducel and Herzberg, 2006 A. Beauducel, Y.P. Herzberg. On the performance of maximum likelihood versus means and variance adjusted least squares estimation in CFA. Structural Equation Modeling. 2006;13:186-203
  • Bollen, 1989 K.A. Bollen. Structural equations with latent variables. (John Wiley & Sons, New York, 1989)
  • Cohen et al., 2012 J.A. Cohen, S.C. Reingold, C.H. Polman, J.S. Wolinsky, et al., for the International Committee on Clinical Trials in Multiple Sclerosis. Disability outcome measures in multiple sclerosis clinical trials: current status and future prospects. Lancet Neurology. 2012;11:467-476
  • Cook and Petersen, 1987 L. Cook, N. Petersen. Problems related to the use of conventional and item response theory equating methods in less than optimal circumstances. Applied Psychological Measurement. 1987;11:225-244
  • Cook et al., 2009 K.F. Cook, M.A. Kallen, D. Amtmann. Having a fit: impact of number of items and distribution of data on traditional criteria for assessing IRT's unidimensionality assumption. Quality of Life Research. 2009;18(4):447-460
  • Cutter et al., 1999 G.R. Cutter, M.L. Baier, R.A. Rudick, D.L. Cookfair, J.S. Fischer, J. Petkau, et al. Development of a multiple sclerosis functional composite as a clinical trial outcome measure. Brain. 1999;122:871-882
  • DiStefano et al., 2009 C. DiStefano, M. Zhu, D. Mindrila. Understanding and using factor scores: Considerations for the applied researcher. Pract Assess Res Eval. 2009;14(20):1-11
  • Fischer et al., 1999 J.S. Fischer, R.A. Rudick, G.R. Cutter, S.C. Reingold. The Multiple Sclerosis Functional Composite (MSFC): an integrated approach to MS clinical outcome assessment. The National MS Society Clinical Outcomes Assessment Task Force. Multiple Sclerosis. 1999;5:244-250
  • Fox et al., 2007 R.J. Fox, J.C. Lee, R.A. Rudick. Optimal reference population for the multiple sclerosis functional composite. Multiple Sclerosis. 2007;13:909-914
  • Gibbons and Hedeker, 1992 R.D. Gibbons, D.R. Hedeker. Full-information item bi-factor analysis. Psychometrika. 1992;57:423-436
  • Hobart et al., 2000 J. Hobart, J. Freeman, A. Thompson. Kurtzke scales revisited: the application of psychometric methods to clinical intuition. Brain. 2000;123:1027-1040
  • Holzinger and Swineford, 1937 M.G. Holzinger, F. Swineford. The bi-factor model. Psychometrika. 1937;2:40-54
  • Jackson et al., 2009 D.L. Jackson, J.A. Gillaspy, R. Purc-Stephenson. Reporting practices in confirmatory factor analysis: an overview and some recommendations. Psychological Methods. 2009;14:6-23
  • Jacobs et al., 1996 L.D. Jacobs, D.L. Cookfair, R.A. Rudick, R.M. Herndon, J.R. Richert, A.M. Salazar, et al. Intramuscular interferon beta-1a for disease progression in relapsing multiple sclerosis. Annals of Neurology. 1996;39:285-294
  • Kurtzke, 1983 J.F. Kurtzke. Rating neurologic impairment in multiple sclerosis: an expanded disability status scale (EDSS). Neurology. 1983;33:1444-1452
  • Lai et al., 2006 J.S. Lai, P.K. Crane, D. Cella. Factor analysis techniques for assessing sufficient unidimensionality of cancer related fatigue. Quality of Life Research. 2006;15:1179-1190
  • Lord, 1980 F.M. Lord. Applications of item response theory to practical testing problems. (Erlbaum, Hillsdale, NJ, 1980)
  • McHorney and Cohen, 2000 C.A. McHorney, A.S. Cohen. Equating health status measures with item response theory: illustrations with functional status items. Medical Care. 2000;38(Suppl. II):II-43-II-59
  • Mokkink et al., 2011 L.B. Mokkink, D.L. Knol, B.M. Uitdehaag. Factor structure of Guy's Neurological Disability Scale in a sample of Dutch patients with multiple sclerosis. Multiple Sclerosis. 2011;17:1498-1503
  • Muthén and Muthén, 1998–2011 L.K. Muthén, B.O. Muthén. Mplus user's guide. 6th ed. (Muthén & Muthén, Los Angeles, 1998–2011) [computer software]
  • Norman and Streiner, 2008 G.R. Norman, D.L. Streiner. Biostatistics: the bare essentials. 3rd ed. (BC Decker Inc, Hamilton, 2008)
  • R Development Core Team, 2011 R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2011. ISBN 3-900051-07-0.
  • Reeve et al., 2007 B.B. Reeve, R.D. Hays, J.B. Bjorner, K.F. Cook, P.K. Crane, J.A. Teresi, et al. Psychometric evaluation and calibration of health-related quality of life item banks: plans for the Patient-Reported Outcomes Measurement Information System (PROMIS). Medical Care. 2007;45(Suppl 1):S22-S31
  • Regier et al., 2009 D.A. Regier, W.E. Narrow, E.A. Kuhl, D.J. Kupfer. The conceptual development of DSM-V. American Journal of Psychiatry. 2009;166:645-650
  • Reise et al., 2007 S.P. Reise, J. Morizot, R.D. Hays. The role of the bifactor model in resolving dimensionality issues in health outcomes measures. Quality of Life Research. 2007;16:19-31
  • Reise, 2012 S.P. Reise. The rediscovery of bifactor measurement models. Multivariate Behavioral Research. 2012;47:667-696
  • Reise et al., 2012 S.P. Reise, W.E. Bonifay, M.G. Haviland. Scoring and modeling psychological measures in the presence of multidimensionality. Journal of Personality Assessment. 2013;95:129-140
  • Revelle, 1979 W. Revelle. Hierarchical cluster analysis and the internal structure of tests. Multivariate Behavioral Research. 1979;14:57-74
  • Revelle, 2010 W. Revelle. Psych: procedures for psychological, psychometric, and personality research. Evanston: Northwestern University. R package version 1.1.12; 2010 [computer program].
  • Rudick et al., 1996 R. Rudick, J. Fischer, J. Antel, C. Confavreux, G. Cutter, G. Ellison, et al. Clinical outcomes assessment in multiple sclerosis. Annals of Neurology. 1996;40:469-479
  • Schmitt, 2011 T.A. Schmitt. Current methodological considerations in exploratory and confirmatory factor analysis. Journal of Psychoeducational Assessment. 2011;29(4):304-321
  • Simms et al., 2012 L.J. Simms, J.J. Prisciandaro, R.F. Krueger, D.P. Goldberg. The structure of depression, anxiety and somatic symptoms in primary care. Psychological Medicine. 2012;42:15-28
  • Studerus et al., 2010 E. Studerus, A. Gamma, F.X. Vollenweider. Psychometric evaluation of the Altered States of Consciousness Rating Scale (OAV). PLoS One. 2010;5(8):e12412
  • Willoughby et al., 2010 M.T. Willoughby, C.B. Blair, R.J. Wirth, M. Greenberg. The measurement of executive function at age 3: psychometric properties and criterion validity of a new battery of tasks. Psychological Assessment. 2010;22:306-317
  • Yen and Fitzpatrick, 2006 W.M. Yen, A.R. Fitzpatrick. Item response theory. R.L. Brennan (Ed.) Educational measurement 4th ed. (American Council on Education and Praeger Publishers, Westport, CT, 2006)
  • Zinbarg et al., 2005 R.E. Zinbarg, W. Revelle, I. Yovel, W. Li. Cronbach's α, Revelle's β, and McDonald's ΩH: their relations with each other and two alternative conceptualizations of reliability. Psychometrika. 2005;70(1):123-133


a Department of Epidemiology, University of Alabama at Birmingham School of Public Health, 1665 University Blvd, Suite 217H, Birmingham, AL 35294-0022, USA

b NYU-Multiple Sclerosis Care Center, Department of Neurology, NYU School of Medicine, 240 East 38th Street, NY 10016, USA

c Department of Biostatistics, UAB School of Public Health, 1665 University Blvd, Suite 410B, Birmingham, AL 35294-0022, USA

∗ Corresponding author. Tel.: +1 205 934 7176; fax: +1 205 934 8665.

1 Tel.: +1 212 598 6305.

2 Tel.: +1 205 975 5048.