A comparison of multiple patient reported outcome measures in identifying major depressive disorder in people with multiple sclerosis
Journal of Psychosomatic Research, In Press, Corrected Proof, Available online 1 September 2015
Depression is one of the most prominent and debilitating symptoms in individuals with multiple sclerosis (MS), yet there is currently no consensus on the best instruments for depression screening in MS. More head to head comparisons of available screening instruments are needed to advise MS researchers and clinicians.
A cross-sectional comparison of the effectiveness of screening for MDD using multiple patient reported outcome (PRO) screeners against a modified SCID telephone interview was completed in 164 individuals with MS. Stratum goals were set for depression levels to ensure participation by people with borderline and higher levels of depression. Criterion standard was a modified SCID MDD module. PRO measures included the PHQ-9, BDI-FS, PROMIS depression, Neuro-QOL depression, M-PHQ-2, PHQ-2, and CESD.
48 (29%) individuals met the modified SCID criteria for MDD. The sensitivity of the PRO measures ranged from 60% to 100% while specificity ranged from 46% to 86%. The ROC area for the PRO measures ranged from 0.79 to 0.83. Revised (higher) cutoff scores were suggested by the ROC analyses for most self-reported screeners.
Enrollment was stopped early because of difficulties with recruitment. Several SCID recordings could not be reviewed, so those diagnoses could not be confirmed.
CESD-10 and PHQ-9 had the best diagnostic performance using optimal cutoffs, but no single PRO measure stood out as significantly better than any other. Even when revised cutoff scores were used, none of the self-reported screeners identified people with MDD with adequate accuracy. More accurate self-reported screeners would facilitate diagnosis of MDD for both research and clinical purposes.
- We compare the screening accuracy for major depressive disorder in people with MS.
- 7 self-reported screeners were compared to the SCID interview.
- No screener reached adequate diagnostic accuracy.
- Self-reported screeners performed with similar accuracy.
- Higher cutoff scores were suggested for most screeners.
Keywords: Depression, PHQ, CESD, PROMIS, Neuro-QOL, Beck depression, SCID, Diagnostic accuracy.
Multiple sclerosis (MS) is a chronic neurologic disease characterized by inflammation, demyelination, and neurodegeneration in the central nervous system. People with MS often report physical, cognitive and psychological symptoms, of which depression is one of the most prominent and debilitating. The lifetime prevalence of major depressive disorder (MDD) in individuals with MS has been estimated at over 50%. In addition, the 12-month prevalence of MDD is approximately twice that of the general population at 15.7%. Depression is associated with a host of poor outcomes in people with MS, including poorer overall health, non-adherence to disease modifying medications, loss of employment or reduction in work hours, an increased risk of suicidal ideation and completed suicide, greater cognitive dysfunction, and an overall reduction in quality of life. Despite these poor outcomes in MS, MDD remains under-recognized and undertreated.
Numerous instruments are available to assess depressive symptoms in people with MS, and they can also be used to identify depression cases in need of treatment. In the MS literature these measures are commonly referred to as depression “screening” instruments or measures. However, only a few published studies have compared the agreement between depression measures and structured diagnostic interviews for MDD in people with MS. In a series of newly diagnosed individuals with MS, the original Beck Depression Inventory (BDI) (cutoff 13) produced 71% sensitivity and 79% specificity when compared to the Diagnostic Interview Schedule for DSM-III disorders. A more recent study of the BDI-II in a clinical MS population produced similar results at 85% sensitivity and 76% specificity compared to the Schedules for Clinical Assessment in Neuropsychiatry (SCAN). A two-item measure adapted from the Primary Care Evaluation of Mental Disorders (PRIME-MD) was compared to MDD diagnoses derived from the Structured Clinical Interview for DSM-IV Disorders (SCID) and showed 99% sensitivity and 87% specificity in one study and 80% sensitivity and 93% specificity in a second. The depression subscale of the Hospital Anxiety and Depression Scale (HADS) has also been validated against the SCID or the SCAN. At a cutoff of eight, two studies found 90% sensitivity and 87.3% specificity, and 85% sensitivity and 82% specificity, respectively, while a third study found 77% sensitivity and 81% specificity at a cutoff of 11. Recently a study of the Center for Epidemiologic Studies Depression scale (CESD) found that it provided 95% sensitivity and 73% specificity compared to the SCID. These findings provide important information regarding the utility of each measure; however, many instruments have only been examined once and in small clinical samples, and still more depression instruments are available that have not been examined.
There is currently no consensus on best practices for what instruments to use to assess depression in people with MS. As noted by the American Academy of Neurology in their recent review of the evidence for depression screening measures in MS, a number of depression measures used in the MS field lack strong evidence for their utility in identifying cases of depression, particularly relative to other commonly used measures. More head to head comparisons of measures are needed to advise researchers and clinicians. Among the many common measures whose utility for case identification in MS has not been well studied are the Beck Depression Inventory-Fast Screen (BDI-FS), the Patient Health Questionnaire-9 (PHQ-9), and the CESD. The BDI-FS is made up of a subset of items from the Beck scale proposed for use in MS, while the CESD and the PHQ-9 are two of the most commonly used instruments in the literature. Additionally, new instruments were recently developed with modern psychometric methods, and their clinical utility as MDD screeners in MS has yet to be examined. These include the Patient Reported Outcomes Measurement Information System (PROMIS®) and Neurological Quality of Life (Neuro-QOL) depression item banks. Both instruments provide population norms and have the added benefit of administration through computerized adaptive testing (CAT). The Neuro-QOL measure was specifically developed for use in neurological populations, including MS, and the psychometric properties of the PROMIS Depression short form have been examined in people with MS and found to be acceptable.
A recently published evidence-based guideline on the assessment and management of psychiatric disorders in individuals with MS emphasized the need for research comparing different self-report and diagnostic instruments for identifying psychiatric disorders, including MDD. Therefore, the purpose of this study was to: (1) examine the correspondence between the standard diagnostic interview (SCID) and multiple self-report depression measures which are commonly used as tools for identifying MDD in MS; and (2) examine the published cutoffs for each measure and potentially identify optimal cutoffs for identifying MDD in people with MS.
Between September 2011 and March 2012 individuals with MS were recruited through invitation letters, print advertisements, and referrals from active research studies at the University of Washington (UW) in Seattle, WA. Individuals were sent invitation letters if they had participated in past research studies at the UW and indicated interest in future studies or were members of the UW disability registry. Individuals were required to be 18 years or older, self-report a definitive MS diagnosis by a physician, be able to read and understand English, and have access to a telephone from which they could answer sensitive questions. In order to ensure the participant pool represented all levels of depressive symptoms and included more participants with borderline depression, enrollment targets were stratified by PHQ-9 scores at the time of screening. This was done in order to ensure that the performance of the instruments was examined in participants across the whole depression continuum and individuals with no or low depressive symptoms, who are the most likely to volunteer for research studies, were not overrepresented. Initial recruitment targets were set at 200 total with 10% having no or minimal depressive symptoms (PHQ-9 score < 5), 20% mild (PHQ-9 score 5–9), 30% moderate (PHQ-9 score 10–14), 30% moderate-severe (PHQ-9 score 15–19), and 10% severe (PHQ-9 score ≥ 20). Initial mailings to past research participants were done through random selection. However, individuals reporting a moderate to high level of depressive symptoms in past survey studies were specifically targeted in later recruitment mailings in order to try to meet recruitment goals for those strata. If a participant did not respond to the invitation recruitment mailing an attempt was made to call them approximately two weeks after the mailing. Once recruitment goals were met for a stratum, individuals scoring within that range upon screening were determined ineligible for study enrollment. 
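The stratified enrollment rule described above can be illustrated with a minimal Python sketch. The function and variable names are hypothetical and are not taken from the study; the percentages and PHQ-9 ranges are those stated in the text, and the upper bound of 27 for the severe stratum assumes the standard PHQ-9 maximum.

```python
# Initial recruitment goal and stratum shares, as stated in the text.
TOTAL_TARGET = 200
STRATA = [
    # (label, lowest PHQ-9 score, highest PHQ-9 score, share of total in percent)
    ("none/minimal", 0, 4, 10),
    ("mild", 5, 9, 20),
    ("moderate", 10, 14, 30),
    ("moderate-severe", 15, 19, 30),
    ("severe", 20, 27, 10),  # 27 is assumed as the PHQ-9 maximum
]


def stratum_targets(total=TOTAL_TARGET):
    """Return {label: target n}; integer arithmetic avoids rounding surprises."""
    return {label: total * pct // 100 for label, _, _, pct in STRATA}


def assign_stratum(phq9_score):
    """Map a PHQ-9 screening total to its recruitment stratum label."""
    for label, low, high, _ in STRATA:
        if low <= phq9_score <= high:
            return label
    raise ValueError(f"PHQ-9 total out of range: {phq9_score}")


def is_eligible(phq9_score, enrolled_counts, targets):
    """A screened person is eligible only while their stratum is below its target."""
    label = assign_stratum(phq9_score)
    return enrolled_counts.get(label, 0) < targets[label]


targets = stratum_targets()
print(targets)
# A full lowest stratum turns away a new low scorer.
print(is_eligible(3, {"none/minimal": 20}, targets))  # False
```

Under these assumptions the lowest stratum closes at 20 participants, which matches the 20 participants with screening scores below 5 reported in Table 1.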
All procedures, including written informed consent, were approved by the UW Human Subjects Division. Participants were paid $25 upon study completion.
At screening, potential participants completed the PHQ-9 on the telephone in order to determine study eligibility. Those who were eligible and interested were mailed a packet containing the six self-report depression measures described below. They opened and completed the self-report measures at the scheduled time of their telephone interview. The average time between initial screening into the study and the interview (that included both the SCID and responding to self-report measures) was 10 days (range: three to 29 days). The research interviewer was available for questions but did not otherwise participate or interact with the participants while they responded to the self-reported measures. Upon completion of the self-report measures, the researcher conducted the SCID MDD module with the participant  . Thus, all self-report measures were completed within minutes of the SCID as they were done while the interviewer waited on the phone prior to administering the interview. Participants were then instructed to mail the self-report packet back to research staff.
2.3.1. Disease and demographic characteristics
Participants were asked to provide their age, gender, race and ethnicity, education level, and employment status. In addition, participants reported the year of their MS diagnosis and completed a self-report version of the mobility section of the Expanded Disability Status Scale (EDSS)  in order to estimate MS severity level.
2.3.2. Self-report depression measures
Participants completed six different paper and pencil self-report measures of depressive symptoms. The following measures were selected for inclusion: PROMIS Depression short form (SF), Neuro-QOL depression SF, PHQ-9, modified PHQ-2, BDI-FS, and the CESD. Order of administration was counterbalanced to address potential order effects  .
2.3.2.1. PROMIS depression (PROMIS-D) SF
The Patient Reported Outcomes Measurement Information System (PROMIS®, www.nihpromis.org) depression SF version 1.0 includes eight items that were selected from the PROMIS-D item bank using CAT simulation results, item information, and content. The PROMIS-D item bank was developed using item response theory (IRT), and scores on the SF are directly comparable to CAT scores or other SF scores. Unlike most depression measures, the PROMIS-D item bank does not measure behavioral and somatic indicators. All eight questions are rated on a five point Likert scale (1 = never; 5 = always), and respondents are asked to recall how they felt over the past seven days. Scores are reported on a T-score metric [mean = 50; standard deviation (SD) = 10] that is centered on a sample that is representative of the United States general population by age, gender, and race/ethnicity. The short form was scored following the PROMIS instructions using the raw score/scale score look-up tables. Higher scores indicate higher levels of depression. The PROMIS-D SF has been used in MS. No studies to date have identified an optimal cutoff on the PROMIS-D for identifying MDD. However, prior research linking the PROMIS-D to the PHQ-9 and CESD suggests that a score of 59.9 is equivalent to a PHQ-9 score of 10, and a score of 56.2 is equivalent to a CESD score of 16. These two cutoffs have been recommended for use in identifying those with at least moderate depression.
2.3.2.2. Neuro-QOL depression (Neuro-QOL-D) SF
Similar to PROMIS, the Neuro-QOL measurement system used IRT to develop item banks and SFs to measure depression (and other symptoms and quality of life indicators) in neurological research. Five neurological populations, including MS, were involved in the development of Neuro-QOL instruments. The Neuro-QOL-D version 1.0 SF was included in this study. This SF consists of eight items on a five point Likert scale (1 = never; 5 = always) and asks participants to recall how they have felt in the last seven days. Like PROMIS, the item bank does not include somatic indicators, and IRT-based scores are converted to T-scores that are centered on a general population age, gender, and race/ethnicity census matched calibration sample [mean = 50; SD = 10]. Higher scores indicate greater levels of depression. Because of some item overlap with the PROMIS-D SF, Neuro-QOL and PROMIS-D items were administered together as one “scale.” This was possible as the banks were designed with exactly the same response categories, instructions, and item formatting. The short form was scored following the Neuro-QOL instructions using the raw score to T-score lookup tables. No published cutoffs for identifying MDD are currently available for the Neuro-QOL depression measure.
On the Patient Health Questionnaire-9 (PHQ-9), respondents are asked to rate how often they have been bothered by each depressive symptom over the last two weeks on a four point scale. The nine items parallel the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) criteria for MDD. Items are summed, and total scores range from zero to 27, with scores of 10 or greater suggesting a Major Depressive Episode (MDE). A scoring algorithm paralleling the SCID DSM-IV diagnosis of MDD has also been recommended for identifying those with MDD. The PHQ-9 has been used to measure depression in multiple medical populations, including MS.
The Patient Health Questionnaire-2 (PHQ-2) was developed in primary care populations to quickly screen for possible depression. The PHQ-2 consists of the first two questions from the PHQ-9, and when summed the total score can range from zero to six. The items also can be dichotomized into yes/no at a threshold of one or more (some days) or at a threshold of two or more (more than half the days). In this study, analyses used the cutoff of scoring two or more on either item. The PHQ-2 was not administered separately from the PHQ-9; rather, the first two PHQ-9 items were scored separately to generate the PHQ-2 score.
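The two PHQ scoring rules described above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, item responses are assumed to be coded 0 to 3 (consistent with the 0 to 27 total), and the PHQ-9 MDD scoring algorithm is not implemented because its rules are not given in the text.

```python
def phq9_total(responses):
    """Sum the nine PHQ-9 items (each coded 0-3); the total ranges from 0 to 27."""
    if len(responses) != 9 or any(r not in (0, 1, 2, 3) for r in responses):
        raise ValueError("expected nine responses, each coded 0, 1, 2, or 3")
    return sum(responses)


def phq9_screen_positive(responses, cutoff=10):
    """Apply the summed-score rule: total at or above the cutoff."""
    return phq9_total(responses) >= cutoff


def phq2_total(responses):
    """Score the PHQ-2 from the first two PHQ-9 items; the total ranges from 0 to 6."""
    phq9_total(responses)  # reuse the validation above
    return responses[0] + responses[1]


def phq2_screen_positive(responses, item_threshold=2):
    """Apply the 'two or more on either item' rule used in this study."""
    phq9_total(responses)  # reuse the validation above
    return responses[0] >= item_threshold or responses[1] >= item_threshold


# Hypothetical respondent: one of the first two items is elevated, total is low.
example = [2, 1, 1, 1, 1, 1, 1, 1, 0]
print(phq9_total(example))            # 9
print(phq9_screen_positive(example))  # False
print(phq2_total(example))            # 3
print(phq2_screen_positive(example))  # True
```

The hypothetical respondent shows that the two rules can disagree for the same set of answers, which is one reason the PHQ-9 and PHQ-2 are evaluated separately.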
2.3.2.5. Modified PHQ-2 (M-PHQ-2)
This screener is based on the first two questions of the Primary Care Evaluation of Mental Disorders (PRIME-MD) depression section, and captures depressed mood and anhedonia, the two essential criteria of DSM-IV MDD. The two questions are identical to the PHQ-2, with changes to the response options only. Rather than the four response options available on the PHQ-2, patients were asked to answer with either a “yes” or “no.” Answering yes to either question has been used to indicate probable MDD in primary care patients, urgent care veteran patients, and people with MS. We included this modified measure specifically to try to replicate the results found by Mohr et al. in their 2007 study in MS patients.
The Beck Depression Inventory-Fast Screen has seven groups of statements, and respondents are asked to choose the statement that most reflects how they were feeling within the last two weeks  . Each statement set has scores ranging from zero to three, and scores are summed with a final score range of zero to 21. A cutoff of four or greater is recommended for identifying those with at least mild depression  . The BDI-FS was specifically designed to be used in medical settings and, therefore, does not include somatic symptoms. Preliminary evidence supports the use of the BDI-FS in persons with MS  .
2.3.2.7. CESD-20 and CESD-10
The full version of the Center for Epidemiologic Studies Depression Scale has 20 items, and respondents are asked to rate the frequency of both positive and negative feelings within the “past week” on a four-point scale. Scores are summed and can range from zero to 60. A cutoff of 16 has been recommended for identifying those with clinically significant levels of depressive symptoms. The CESD was developed for use in primary care settings but evidence supports its use in individuals with MS. The CESD-10 is a SF that includes half of the original CESD scale items. As with the full version, scores are summed across all items. The score ranges from zero to 30 and a cutoff of 10 or higher on the SF has been recommended to identify those with clinically significant depressive symptoms. In this study, the CESD-10 was not administered separately; rather, the subset of ten items from the full CESD was scored separately in order to create a CESD-10 score.
2.3.3. MDD diagnosis
The criterion standard in this study was a diagnosis of MDD during the past month based on telephone administration of the SCID. The SCID has been validated for use over the telephone, and the interview questions and decision trees are directly comparable to in-person administration. Bachelor's level research assistants underwent training in SCID administration by observing videos and completing in-person training with a doctoral level Advanced Registered Nurse Practitioner and a Masters level research therapist. The questions from the MDD module of the SCID were asked using the specific wording and established guidelines for administration and coding of SCID responses. All recorded SCIDs were reviewed in the early weeks of the study to confirm that each administrator was achieving 90% inter-rater reliability (IRR) with the research therapist. Ongoing feedback was provided to research staff in order to improve interview technique and accuracy. The trained research assistants conducted the SCID interviews and, based on the published protocols, rendered a preliminary diagnosis of MDD. All telephone interviews were recorded for quality assurance, but 14 recordings were unintelligible. One of two clinicians with extensive experience in SCID administration listened to all intelligible interview recordings to confirm or revise the MDD status assigned by research assistants. This final diagnosis of MDD was used in the analyses because the authors consider the diagnosis by clinicians more accurate due to their greater clinical training and experience with diagnosing depression in people with MS. Because the diagnosis of MDD was made in most cases by the clinicians, rather than the person who administered the SCID, we refer to our procedure as the “modified SCID” in the results. Both the research assistants and clinicians were unaware of the participants' self-reported depression scale scores when conducting or listening to the SCID interviews.
In order to evaluate the performance of each self-report depression measure we compared previously recommended published cutoffs or scoring algorithms with the results of the modified SCID MDD diagnosis. Criterion validity was evaluated by calculating sensitivity, specificity, positive and negative likelihood ratios, and Cohen's kappa. Sensitivity is defined as the probability of screening positive for MDD among those who have MDD (using the modified SCID interview as the gold standard). Specificity is similarly defined but is the probability of screening negative on the PRO measure among those who do not have MDD according to their modified SCID interview. Positive likelihood ratios are defined as sensitivity divided by one minus specificity (LR + = sensitivity/(1 − specificity)) and negative likelihood ratios are defined as one minus sensitivity divided by specificity (LR − = (1 − sensitivity)/specificity). A positive likelihood ratio (LR +) of greater than 1 indicates the test result is associated with the disease while a negative likelihood ratio (LR −) less than 1 indicates that the result is associated with absence of the disease. A LR + greater than 10 has been suggested to indicate a conclusive increase in likelihood of disease while a LR − of < 0.1 suggests a large and often conclusive decrease in likelihood of disease. In addition, receiver operating characteristic (ROC) curves were generated for each measure and the area under the ROC curve was calculated. Optimal cutoff scores for each measure (with the exception of the PHQ-2 and the M-PHQ-2) were calculated by maximizing the Youden Index (sensitivity + specificity − 1). For each of the newly calculated optimal cutoffs, performance was again evaluated by examining sensitivity, specificity, positive and negative likelihood ratios, and Cohen's kappa.
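The criterion validity statistics defined above can all be computed from a single two-by-two table. The following Python sketch is illustrative rather than a reproduction of the Stata analysis; the class and method names are hypothetical. The example counts are back-calculated from the PROMIS Depression SF row at the ≥ 59.9 cutoff in Table 2 (48 participants with MDD, 116 without, 69 screening positive) and should be read as an assumption rather than as reported data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Screener result cross-classified against the criterion standard."""
    tp: int  # screen positive, MDD present
    fn: int  # screen negative, MDD present
    fp: int  # screen positive, MDD absent
    tn: int  # screen negative, MDD absent

    @property
    def n(self):
        return self.tp + self.fn + self.fp + self.tn

    def sensitivity(self):
        return self.tp / (self.tp + self.fn)

    def specificity(self):
        return self.tn / (self.tn + self.fp)

    def lr_positive(self):
        # Undefined (division by zero) when specificity is exactly 1.
        return self.sensitivity() / (1 - self.specificity())

    def lr_negative(self):
        return (1 - self.sensitivity()) / self.specificity()

    def youden(self):
        return self.sensitivity() + self.specificity() - 1

    def kappa(self):
        """Cohen's kappa: agreement beyond what the marginals alone would give."""
        observed = (self.tp + self.tn) / self.n
        expected = ((self.tp + self.fp) * (self.tp + self.fn)
                    + (self.fn + self.tn) * (self.fp + self.tn)) / self.n ** 2
        return (observed - expected) / (1 - expected)


# Counts back-calculated from Table 2 (PROMIS Depression SF, cutoff >= 59.9).
promis = TwoByTwo(tp=38, fn=10, fp=31, tn=85)
print(round(promis.sensitivity(), 3))  # 0.792
print(round(promis.specificity(), 3))  # 0.733
print(round(promis.lr_positive(), 2))  # 2.96
print(round(promis.lr_negative(), 2))  # 0.28
print(round(promis.kappa(), 2))        # 0.46
```

With these counts the sketch reproduces the sensitivity, specificity, likelihood ratios, and kappa shown for that row. The Youden Index from unrounded proportions is about 0.524, which differs slightly from the tabulated 0.525 because the latter appears to be computed from rounded percentages.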
A sensitivity analysis evaluating optimal cutoffs only in individuals with clinician confirmed diagnoses was conducted to evaluate the potential impact of using the research assistants' initial evaluation in the 14 cases without intelligible recordings. Sensitivity analyses using the research assistants' initial evaluation of MDD on the SCID interview for all participants were also conducted. All analyses were conducted using Stata 12.1 software  .
A total of 372 individuals from past research studies were mailed invitation letters, 28 were referred from active research studies, and 11 responded to print advertisements. Of these, 50 declined to participate, 72 were determined ineligible at the time of enrollment (70 due to strata reaching recruitment capacity), three were deceased, one was unable to complete the interview due to severe comorbid conditions, and 119 did not respond or were lost to follow-up. A total of 166 individuals completed the study, but two cases were excluded from the main analyses after clinician reviews determined that these SCID interviews lacked the detail required to make a diagnosis. The mean age of the sample was 53.2 years, and mean MS duration was 15 years. Consistent with the distribution of MS in the population, the majority of participants were female (77%), white (90%), married or living with a partner (64%), and unemployed (79%; see Table 1 for more sample characteristics). Of the 164 participants, 14 had missing or inaudible audio recordings of their SCID interviews. For these 14 participants the research assistant's initial assessment of MDD status was used as the criterion standard as they could not be re-evaluated by the research clinicians. Of the 150 SCID interviews that were re-evaluated by the research clinicians, 20 had their MDD status changed from the initial research assistant's assessment (87% agreement). Analyses to identify optimal cutoffs were repeated after dropping these 14 individuals and all identified optimal cutoffs remained consistent with initial analyses inclusive of these individuals. Results of sensitivity analyses using the research assistants' initial MDD assessment for all 166 completed interviews are provided in Appendix 1 . Recruitment was ceased prior to reaching the initial goal of 200 participants due to difficulties in recruitment and resource limitations. 
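The participant flow reported above can be checked with simple arithmetic. The short Python sketch below uses only the counts given in this paragraph and assumes that the three recruitment sources do not overlap; the dictionary keys are illustrative labels.

```python
# Recruitment sources and reasons for non-completion, as reported in the text.
approached = {"mailed": 372, "referred": 28, "print_advertisement": 11}
not_completed = {
    "declined": 50,
    "ineligible": 72,
    "deceased": 3,
    "unable_to_complete": 1,
    "no_response_or_lost": 119,
}

total_approached = sum(approached.values())
completed = total_approached - sum(not_completed.values())
analyzed = completed - 2   # two interviews lacked the detail needed for a diagnosis
reviewed = analyzed - 14   # 14 recordings were missing or inaudible
agreement = (reviewed - 20) / reviewed  # 20 diagnoses were changed on review

print(total_approached, completed, analyzed, reviewed)  # 411 166 164 150
print(round(agreement * 100))                           # 87
```

The totals are consistent with the 166 completed interviews, the 164 analyzed participants, and the 87% agreement between research assistants and clinicians.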
The only screening stratum that reached capacity during recruitment was the lowest PHQ-9 stratum (PHQ-9 score < 5). The distribution of PHQ-9 intake screening scores for the final 164 participants is provided in Table 1 .
|Characteristic||Mean ± SD or n (%)|
|Age (years)||53.2 ± 11.1|
|MS duration (years)||15.0 ± 9.2|
|Male||37 (23%)|
|Married/living with partner||105 (64%)|
|Currently employed||34 (21%)|
|White non-Hispanic||147 (90%)|
|High School (or less)||25 (15%)|
|Some College/Associates Degree||80 (49%)|
|Bachelor's Degree||35 (21%)|
|Advanced Degree||24 (15%)|
|EDSS Group (n = 163)|
|Mild (≤ 4.0)||40 (24%)|
|Moderate (4.5–6.5)||89 (55%)|
|Severe (≥ 7.0)||34 (21%)|
|PHQ-9 Screening Intake Group|
|None (< 5)||20 (12%)|
|Mild (5–9)||40 (24%)|
|Moderate (10–14)||52 (32%)|
|Moderate–severe (15–19)||40 (24%)|
|Severe (≥ 20)||12 (7%)|
SD: Standard deviation; EDSS: Expanded disability status scale; PHQ: Patient health questionnaire.
3.2. Depression measures comparison
Of the 164 participants included in the primary analyses, 48 (29%) were identified as meeting criteria for a major depressive disorder during the past month using the modified SCID criterion standard. The sensitivity, specificity, likelihood ratios, and kappa for all previously defined or published cutoffs for each of the PRO depression measures are presented in Table 2. The percentage of people identified as having significant depression by the PRO measures ranged from 27.4% (PHQ-9 MDD scoring algorithm) to 67.7% (M-PHQ-2). Sensitivity values (true positive rates) ranged from 60.4% (PHQ-9 MDD scoring algorithm) to 100% (M-PHQ-2 and CESD) while specificity values (true negative rates) ranged from 45.7% (M-PHQ-2) to 86.2% (PHQ-9 MDD scoring algorithm). Only 13% of the sample was incorrectly classified as not being depressed (compared to the modified SCID) by any one of the eight cutoffs presented in Table 2. Alternatively, 50% of individuals were incorrectly identified as having significant depression (compared to the modified SCID) by at least one cutoff in Table 2, and 31% were incorrectly categorized by four or more of the measures in Table 2. Kappa values ranged from 0.33 (M-PHQ-2 and BDI-FS) to 0.49 (PHQ-2). With the exception of the PHQ-2 and M-PHQ-2 (binary outcomes), the ROC curves for each PRO measure are presented in Fig. 1, and the underlying sensitivity and specificity at all cutoffs for each measure are provided in Appendix 2. The area under the ROC curves is also provided for comparison in Table 3.
|Outcome measure||Cutoff or scoring method||Percent at or above cutoff (n)||Sensitivity (95% CI)||Specificity (95% CI)||Youden Index||LR + (95% CI)||LR − (95% CI)||Kappa (SE)|
|PROMIS Depression SF||≥ 59.9||42.1% (69)||79.2% (65.0–89.5)||73.3% (64.3–81.1)||0.525||2.96 (2.12–4.14)||0.28 (0.16–0.50)||0.46 (0.08)|
|PROMIS Depression SF||≥ 56.2||64.0% (105)||97.9% (88.9–99.9)||50.0% (40.6–59.4)||0.479||1.96 (1.6–2.4)||0.04 (0.01–0.29)||0.36 (0.06)|
|PHQ-9||MDD algorithm||27.4% (45)||60.4% (45.3–74.2)||86.2% (78.6–91.9)||0.466||4.38 (2.63–7.29)||0.46 (0.32–0.66)||0.47 (0.08)|
|PHQ-9||≥ 10||54.8% (90)||93.8% (82.8–98.7)||61.2% (51.7–70.1)||0.55||2.42 (1.90–3.07)||0.10 (0.03–0.31)||0.43 (0.07)|
|PHQ-2||≥ 2 on either item||33.7% (56)||70.8% (55.9–83.0)||81% (72.7–87.7)||0.518||3.73 (2.46–5.67)||0.36 (0.23–0.56)||0.49 (0.08)|
|M-PHQ-2||“Yes” to either item||67.7% (111)||100.0% (92.6–100.0)||45.7% (36.4–55.2)||0.457||1.84 (1.56–2.18)||0.0 (N/A)||0.33 (0.06)|
|BDI-FS||≥ 4||66.3% (108)||97.9% (88.9–99.9)||47.0% (37.6–56.5)||0.449||1.85 (1.55–2.20)||0.04 (0.01–0.31)||0.33 (0.06)|
|CESD-20||≥ 16||62.8% (103)||100.0% (92.6–100.0)||52.6% (43.1–61.9)||0.526||2.11 (1.74–2.55)||0.0 (N/A)||0.39 (0.06)|
|CESD-10||≥ 10||67.1% (110)||100.0% (92.6–100.0)||46.6% (37.2–56.0)||0.466||1.87 (1.58–2.22)||0.0 (N/A)||0.34 (0.06)|
BDI-FS: Beck depression inventory fast screen; CESD: Center for epidemiologic studies depression scale; CI: Confidence interval; LR: Likelihood ratio; MDD: Major depressive disorder; PHQ: Patient health questionnaire; PROMIS: Patient reported outcomes measurement information systems; SE: Standard error; SF: Short form.
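|Outcome measure||Newly identified optimal cutoff||ROC area (95% CI)||Youden Index||Sensitivity (95% CI)||Specificity (95% CI)||LR + (95% CI)||LR − (95% CI)||Kappa (SE)||Percent at or above new optimal cutoff threshold (n)|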
|PROMIS Depression SF||≥ 58.8||0.86 (0.80–0.91)||0.60||91.7% (80.0–97.7)||68.1% (58.8–76.4)||2.87 (2.17–3.80)||0.12 (0.05–0.32)||0.50 (0.07)||49.4% (81)|
|Neuro-QOL Depression SF||≥ 53.6||0.88 (0.82–0.93)||0.57||87.5% (74.8–95.3)||69.8% (60.6–78.0)||2.90 (2.16–3.90)||0.18 (0.08–0.38)||0.48 (0.07)||47.0% (77)|
|PHQ-9||≥ 12||0.89 (0.83–0.94)||0.65||91.7% (80.0–97.7)||73.3% (64.3–81.1)||3.43 (2.51–4.69)||0.11 (0.04–0.29)||0.56 (0.07)||45.7% (75)|
|BDI-FS||≥ 7||0.86 (0.80–0.92)||0.60||81.3% (67.4–91.1)||78.3% (69.6–85.4)||3.74 (2.58–5.42)||0.24 (0.13–0.44)||0.54 (0.08)||39.3% (64)|
|CESD-20||≥ 22||0.89 (0.84–0.94)||0.63||93.8% (82.8–98.7)||69.0% (59.7–77.2)||3.02 (2.28–4.00)||0.09 (0.03–0.27)||0.52 (0.07)||49.4% (81)|
|CESD-10||≥ 17||0.89 (0.84–0.94)||0.66||83.3% (69.8–92.5)||82.8% (74.6–89.1)||4.83 (3.18–7.34)||0.20 (0.11–0.38)||0.54 (0.08)||36.6% (60)|
BDI-FS: Beck depression inventory fast screen; CESD: Center for epidemiologic studies depression scale; CI: Confidence interval; LR: Likelihood ratio; NeuroQOL: Quality of life in neurological disorders depression short form; PHQ: Patient health questionnaire; PROMIS: Patient reported outcomes measurement information systems; ROC: Receiver operating characteristic; SF: Short form.
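The ROC area column can be read as the probability that a randomly chosen participant with MDD scores higher on the screener than a randomly chosen participant without MDD, with ties counted as one half. A minimal Python sketch of that pairwise definition follows. The function name and the scores are hypothetical, and the study itself used Stata, so this is not the authors' computation.

```python
def roc_area(scores_mdd, scores_no_mdd):
    """Area under the ROC curve via pairwise comparison of the two groups."""
    if not scores_mdd or not scores_no_mdd:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for s in scores_mdd:
        for t in scores_no_mdd:
            if s > t:
                wins += 1.0   # MDD participant scored higher
            elif s == t:
                wins += 0.5   # tie counts as half
    return wins / (len(scores_mdd) * len(scores_no_mdd))


# Hypothetical screener totals, not study data.
with_mdd = [12, 15, 18, 9, 21]
without_mdd = [3, 7, 12, 5, 10, 14]
print(round(roc_area(with_mdd, without_mdd), 2))  # 0.85
```

An area of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.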
Optimal cutoffs for each PRO measure in Fig. 1 were identified by choosing the score on each that maximized the Youden Index (i.e., maximized the sum of the sensitivity and specificity). With the exception of one of the PROMIS-D cutoffs, in all cases the identified optimal cutoff for people with MS was higher than that suggested by the scale developers. In addition, none of the identified cutoffs (optimal or developer suggested) met the Youden Index threshold (sensitivity + specificity − 1 ≥ 0.8) suggested for an adequate diagnostic test. The sensitivity and specificity values for the identified optimal cutoffs ranged from 81% to 94% for sensitivity and 68% to 83% for specificity (Table 3). Kappa values for the optimal cutoffs ranged from 0.48 to 0.56 (Table 3).
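The cutoff search described above can be sketched as a scan over every observed score, treating each as a "score at or above cutoff" rule and keeping the one with the largest Youden Index. The Python below is a minimal illustration with hypothetical names and made-up scores. Resolving ties in favor of the lowest cutoff is an assumption of this sketch, since the text does not say how ties were handled.

```python
def youden_optimal_cutoff(scores, has_mdd):
    """Return (cutoff, youden_index) maximizing sensitivity + specificity - 1.

    scores:  screener totals, one per participant
    has_mdd: criterion-standard diagnosis (True/False), same order as scores
    """
    n_pos = sum(1 for d in has_mdd if d)
    n_neg = len(has_mdd) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one participant with and without MDD")
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, d in zip(scores, has_mdd) if d and s >= cutoff)
        tn = sum(1 for s, d in zip(scores, has_mdd) if not d and s < cutoff)
        j = tp / n_pos + tn / n_neg - 1
        if best is None or j > best[1]:  # strict '>' keeps the lowest tied cutoff
            best = (cutoff, j)
    return best


# Hypothetical data: six participants without MDD, five with MDD.
scores = [3, 5, 7, 10, 12, 14, 9, 12, 15, 18, 21]
has_mdd = [False] * 6 + [True] * 5
cutoff, j = youden_optimal_cutoff(scores, has_mdd)
print(cutoff, round(j, 2))  # 15 0.6
print(j >= 0.8)             # False: below the adequacy threshold cited in the text
```

For comparison, the largest Youden Index in Table 3 is 0.66 (CESD-10), which is also below the 0.8 threshold.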
The purpose of this study was to examine the correspondence between the diagnosis of MDD based on the SCID and six self-report instruments used to identify people with MDD. Our results suggest that none of the self-report instruments identified people with MDD with adequate accuracy, suggesting a need for development of a more accurate screening instrument or scoring algorithm that results in more accurate diagnosis in persons with MS. Depending on their purpose, the researchers/clinicians may choose one measure over another based on the measure's other characteristics, such as whether somatic indicators should be included, use of the specific instrument in previous studies for comparability, ability to administer via CAT, availability of norms, or the length of questionnaire. This finding is generally consistent with the recent evidence-based guidelines on the assessment and management of psychiatric disorders in individuals with MS  , which found insufficient evidence to support or refute any of the depression screening measures (BDI, M-PHQ-2, CES-D) examined in their review. It was also notable that the BDI-FS did not outperform many of the other measures, despite preliminary evidence for it in MS  and its inclusion in the widely used MACFIMS protocol  . Using the Youden index with previously published cutoffs, the PHQ-9 was slightly more effective in identifying individuals with MDD in the current study than the other instruments. However, if using the optimal cutoffs in Table 3 , the CESD-10 and PHQ-9 performed slightly better than the other measures based solely on the Youden Index. All measures except for the PHQ-9 MDD scoring algorithm overestimated the percentage of people with MDD diagnosis, many by over 30%. Of those who were mis-categorized as significantly depressed by at least four of the measures in Table 2 , scores on the PHQ-9 ranged from four to 24 with a mean of 12, and scores on the CESD-20 ranged from nine to 50 with a mean of 27. 
This indicates that some individuals can score very high on the self-report measures, yet still not have significant depression when interviewed by a clinician.
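The overestimation described above follows from simple arithmetic: the share of people who screen positive is the sum of true positives and false positives, so low specificity inflates it well beyond the true proportion with MDD. The sketch below uses the proportion with MDD in this sample (48 of 164) together with hypothetical sensitivity and specificity values chosen only for illustration; they are not the results for any particular instrument.

```python
def screen_positive_rate(sensitivity, specificity, prevalence):
    """Expected proportion screening positive = true positives + false positives."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos + false_pos


prevalence = 48 / 164  # proportion meeting modified SCID criteria in this sample

# Hypothetical operating point, for illustration only.
rate = screen_positive_rate(sensitivity=0.90, specificity=0.50, prevalence=prevalence)
excess = rate - prevalence

print(f"screen-positive rate: {rate:.1%}")
print(f"overestimate vs. SCID: {excess:.1%} (percentage points)")
```

Under these assumed values, most of the positive screens come from the much larger group without MDD, which is why raising specificity (for example, through a higher cutoff) reduces the overestimate.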
Our results are directly comparable to previously published results on the PHQ-9, M-PHQ-2 and CESD-20. For the rest of the measures we could not directly compare our results to previous studies because either the information was not available in people with MS (PROMIS), the measure had never been studied before (Neuro-QOL), or a different version or scoring of the instrument was used (BDI, PHQ-2). For the M-PHQ-2 we did not replicate the previously published results by Mohr et al. In our sample sensitivity was similar, but specificity was considerably lower, indicating that the instrument overestimated the number of people with MDD by almost 40%. For the CESD-20, our sensitivity at the published cutoff of 16 was similar to that of Pandya et al. and slightly higher than that of Patten et al. However, our specificity is substantially lower than that of Patten et al. at that cutoff (85.4% versus 52.6%). Interestingly, in both Patten et al. and this study, optimal cutoffs on the CESD were substantially higher (21 and 22 respectively). In addition, both studies found that raising the cutoff also increased the performance of the PHQ-9. As with the CESD, PHQ-9 sensitivity in this study was similar to Patten et al. at the recommended cutoff but specificity was substantially lower (88% vs 61%). In addition, it is important to note that though we could not directly compare our results with other measures as described above, the ranges of specificity and sensitivity found in the current study are very similar to those of other published studies which examined other measures in MS.
Our results also suggest that higher cutoffs for all measures except PROMIS improve diagnostic effectiveness in people with MS. However, even the new cutoffs do not reach the recommended effectiveness suggested for an adequate diagnostic test. The sensitivity and specificity achieved using the different cutoffs can be helpful for researchers and clinicians in selecting the instrument and the cutoff that best accomplishes their goals. In addition, although the measures may not reach the recommended effectiveness for a diagnostic test, this does not undermine their use as tools for monitoring depressive symptoms in epidemiological studies.
Finally, it is interesting to note that the relatively recent IRT-based Neuro-QOL instrument, which was developed specifically for people with neurologic conditions (including people with MS), does not seem to provide any advantage over the IRT-based PROMIS-D, which was developed to be used across different populations. This suggests that IRT-based instruments are population invariant provided the calibrating samples are representative of the population. It also suggests that disease-specific instruments do not necessarily result in greater precision in measuring universal constructs, such as depression, that appear to be the same in people with MS as in otherwise healthy people or people with other health conditions.
This study had several limitations. First, we did not conduct a sample size calculation prior to completing the study, but rather chose our estimated target for enrollment with the aim of examining functioning across the depression continuum and based on available study resources. When compared to similar studies in MS, such as Patten et al., our sample size is sufficient to estimate sensitivity and specificity with adequate precision. We believe that it was not the sample size, but the sampling strategy of recruiting more people with moderate levels of depressive symptoms that affected the confidence intervals around our estimates. We also did not reach our target of 200 participants due to difficulties with recruitment, and this reduced sample size may have resulted in wider confidence intervals around our accuracy estimates. Second, bachelor's level research staff conducted the SCIDs, which resulted in two cases being dropped because the SCID was stopped prematurely. In addition, in 14 cases where the recording failed, the experienced clinicians could not review the SCID, and the diagnosis assigned by the research staff was used in the analyses. To examine the impact of these cases we re-ran all the analyses without the 14 cases and found that the identified optimal cutoffs for all measures were identical, suggesting these 14 cases had minimal (if any) impact on our results. In addition, in the standard administration of the SCID the diagnosis of MDD is made by the person who administered the SCID. In this study we changed some of the initial diagnoses by the research assistants based on review by more experienced clinicians, resulting in non-standard administration. In our experience, many studies use expert oversight of raters, and this procedure resulted in more accurate data. However, for full disclosure we also provided the analyses using the preliminary diagnosis in Appendix 1.
Third, we used the inclusive method to score the SCID and the self-report measures. That is, we scored potentially transdiagnostic symptoms (e.g., fatigue, poor concentration) as contributing to a diagnosis of depression, if present, and did not attempt to ascertain whether transdiagnostic symptoms were more likely to be attributable to the primary effects of MS than to MDD. Some research suggests transdiagnostic symptoms such as fatigue should be down-weighted to account for this overlap. On the other hand, depression screening measures constructed to exclude transdiagnostic symptoms, such as the BDI-FS, PROMIS and Neuro-QOL, did not perform significantly better than measures that included these symptoms, such as the PHQ-9. Whether the Hospital Anxiety and Depression Scale (HADS), another commonly used measure developed to exclude such symptoms, would have had a different outcome is unknown and worthy of future research. Fourth, the PHQ-9 instrument was administered twice: first by phone at the initial screening to assign participants to an intake severity category, and second on paper along with all remaining self-report instruments, which were administered only once. It is possible that the repeated administration of the PHQ-9 (on average 10 days later) biased the results, although test-retest reliability of the PHQ-9 has been reported to be excellent (ICC = 0.96) and we do not believe this has affected its diagnostic effectiveness. Nevertheless, perhaps selecting a different instrument, one not examined in this study, would have been preferable. Fifth, we did not collect information about prior or current treatment of depression, nor did we exclude individuals based on those criteria. This may have led to an overestimation of the accuracy of the screening instruments, as exclusion of these individuals in a sensitivity analysis by Patten et al. resulted in decreased predictive value of the instruments.
Finally, because we set enrollment targets related to the level of depression, in this study we could not examine the prevalence of MDD in MS or the positive and negative predictive values of the measures. The stratified enrollment targets may also explain why the self-report instruments did not perform as highly in our study as in some other studies and why the confidence intervals around our estimates are wider than in other similar studies with fewer participants. While this strategy was a disadvantage in some respects, it can also be considered a strength because sufficient numbers of people along the whole continuum of depression are unlikely to be achieved in clinic-based studies. In addition, if these self-report measures are to be used as case-finding instruments in borderline cases, our stratification approach should provide more accurate estimates of sensitivity and specificity than studies that enroll clinic-based samples that include either mostly people with low depressive symptoms (such as those found in general medical practice) or mostly those with high depressive symptoms (such as those found in mental health clinics).
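The reason predictive values cannot be taken from a sample with stratified enrollment can be shown with a short sketch. For fixed sensitivity and specificity, positive and negative predictive values change with the proportion of people who truly have MDD, so a proportion set by enrollment targets does not yield predictive values that apply to a clinic or community population. The sensitivity, specificity, and prevalence values below are hypothetical and chosen only for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    tp = sensitivity * prevalence
    fn = (1.0 - sensitivity) * prevalence
    tn = specificity * (1.0 - prevalence)
    fp = (1.0 - specificity) * (1.0 - prevalence)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Hypothetical operating point for a screener, for illustration only.
sens, spec = 0.85, 0.75

results = {}
for prev in (0.10, 0.29, 0.50):
    results[prev] = predictive_values(sens, spec, prev)
    ppv, npv = results[prev]
    print(f"prevalence={prev:.0%}  PPV={ppv:.1%}  NPV={npv:.1%}")
```

With the same assumed sensitivity and specificity, the positive predictive value rises and the negative predictive value falls as prevalence increases, which is why predictive values need to be computed for the prevalence of the setting in which a screener will be used.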
In summary, our results as well as the Minden et al. evidence-based review suggest that there is much work to be done to improve accurate identification of MDD using screening measures in this population. It is possible that, because of the complexity of diagnosing MDD, self-reported screening instruments are limited in their ability to accurately identify MDD in people with MS. It may be useful to examine whether a set of questions drawn from all the instruments would result in more accuracy; however, development of such an instrument would involve potentially difficult-to-resolve intellectual property issues. While the measures examined in this study are commonly used in MS clinical care and research and have published cutoffs, no instrument clearly outperformed others or could be recommended as a gold standard. Future research may want to examine the use of these measures in specific contexts (e.g., outpatient MS specialty care center, epidemiologic study), including determining acceptable/desired specificity and sensitivity for that specific context a priori. Because the current literature is equivocal with regard to the importance of including or excluding transdiagnostic symptoms (e.g., fatigue, concentration difficulties), it will be important for future investigators to examine the relationship of individual items to MDD diagnosis (i.e., which of the items are the best predictors of MDD diagnosis) to better understand the role of such symptoms in diagnosing depression.
The following are the supplementary data related to this article.
The contents of this manuscript were developed under a grant from the Department of Education, NIDRR grant number H133B080025. However, those contents do not necessarily represent the policy of the Department of Education, and you should not assume endorsement by the Federal Government. This review was also supported in part by a grant from the National Multiple Sclerosis Society, grant number MB 0008. Neither funding agency played any role in the study design or interpretation, or the preparation or decision to submit the manuscript for publication.
- American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (Text Revision). (American Psychiatric Association, Washington, DC, 2000)
- D. Amtmann, J. Kim, H. Chung, A.M. Bamer, R.L. Askew, S. Wu, K.F. Cook, K.L. Johnson. Comparing CESD-10, PHQ-9, and PROMIS depression instruments in individuals with multiple sclerosis. Rehabil. Psychol. 2014;59:220-229
- E.M. Andresen, J.A. Malmgren, W.B. Carter, D.L. Patrick. Screening for depression in well older adults: evaluation of a short form of the CES-D. Am. J. Prev. Med. 1994;10:77-84
- P.A. Arnett, C.I. Higginson, W.D. Voss, B. Wright, W.I. Bender, J.M. Wurst, J.M. Tippin. Depressed mood in multiple sclerosis: relationship to capacity-demanding memory and attentional functioning. Neuropsychology. 1999;13:434-446
- A.T. Beck, R.A. Steer, G.K. Brown. BDI-Fast Screen for Medical Patients: Manual. (Pearson, San Antonio, TX, 2000)
- A.T. Beck, C.H. Ward, M. Mendelson, J. Mock, J. Erbaugh. An inventory for measuring depression. Arch. Gen. Psychiatry. 1961;4:561
- R.H. Benedict, I. Fishman, M.M. McClellan, R. Bakshi, B. Weinstock-Guttman. Validity of the Beck depression inventory-fast screen in multiple sclerosis. Mult. Scler. 2003;9:393-396
- R.H. Benedict, D. Cookfair, R. Gavett, M. Gunther, F. Munschauer, N. Garg, et al. Validity of the minimal assessment of cognitive function in multiple sclerosis (MACFIMS). J. Int. Neuropsychol. Soc. 2006;12:549-558
- J. Bowen, L. Gibbons, A. Gianas, G.H. Kraft. Self-administered Expanded Disability Status Scale with functional system scores correlates well with a physician-administered test. Mult. Scler. 2001;7:201-206
- D. Cella, J.S. Lai, C.J. Nowinski, D. Victorson, A. Peterman, D. Miller, F. Bethoux, A. Heinemann, S. Rubin, J.E. Cavazos, A.T. Reder, R. Sufit, T. Simuni, G.L. Holmes, A. Siderowf, V. Wojna, R. Bode, N. McKinney, T. Podrabsky, K. Wortman, S. Choi, R. Gershon, N. Rothrock, C. Moy. Neuro-QOL: brief measures of health-related quality of life for clinical research in neurology. Neurology. 2012;78:1860-1867
- D. Cella, W. Riley, A. Stone, N. Rothrock, B. Reeve, S. Yount, D. Amtmann, R. Bode, D. Buysse, S. Choi, K. Cook, R. DeVellis, D. DeWalt, J.F. Fries, R. Gershon, E.A. Hahn, J.-S. Lai, P. Pilkonis, D. Revicki, M. Rose, K. Weinfurt, R. Hays. The Patient-Reported Outcomes Measurement Information System (PROMIS) developed and tested its first wave of adult self-reported health outcome item banks: 2005–2008. J. Clin. Epidemiol. 2010;63:1179-1194
- S.W. Choi, T. Podrabsky, N. McKinney, B.D. Schalet, K.F. Cook, D. Cella. PROsetta Stone Analysis Report: A Rosetta Stone for Patient Reported Outcomes. (Northwestern University, 2013) Available at: http://www.prosettastone.org/AnalysisReport/Documents/PROsettaStoneAnalysisReportVol1.pdf
- D.S. Conway, D.M. Miller, R.G. O'Brien, J.A. Cohen. Long term benefit of multiple sclerosis treatment: an investigation using a novel data collection technique. Mult. Scler. 2012;18:1617-1624
- P.C. Cozby. Methods of Behavioral Research. Tenth edition (McGraw-Hill, New York, NY, 2009)
- S. D'Alisa, G. Miscio, S. Baudo, A. Simone, L. Tesio, A. Mauro. Depression is the main determinant of quality of life in multiple sclerosis: a classification-regression (CART) study. Disabil. Rehabil. 2006;28:307-314
- D.M. Ehde, C.H. Bombardier. Depression in persons with multiple sclerosis. Phys. Med. Rehabil. Clin. N. Am. 2005;16:437-448 (ix)
- A. Feinstein. Multiple sclerosis, depression, and suicide. BMJ. 1997;315:691-692
- A. Feinstein. An examination of suicidal intent in patients with multiple sclerosis. Neurology. 2002;59:674-678
- A. Feinstein. Multiple sclerosis and depression. Mult. Scler. 2011;17:1276-1281
- A. Feinstein, S. Magalhaes, J.F. Richard, B. Audet, C. Moore. The link between multiple sclerosis and depression. Nat. Rev. Neurol. 2014;10:507-517
- S.J. Ferrando, J. Samton, N. Mor, S. Nicora, M. Findler, B. Apatoff. Patient Health Questionnaire-9 to screen for depression in outpatients with multiple sclerosis. Int. J. MS Care. 2007;9:99-103
- M. First, R. Spitzer, M. Gibbon, J. Williams. Structured Clinical Interview for DSM-IV-TR Axis I Disorders — Patient Edition (SCID-I/P, 4/2005 Revision). (Biometrics Research Department, New York State Psychiatric Institute, New York, 2005)
- M. First, R. Spitzer, J. Williams, M. Gibbon. Structured Clinical Interview for DSM-IV (SCID). (American Psychiatric Association, Washington, DC, 1995)
- R.C. Gershon, J.S. Lai, R. Bode, S. Choi, C. Moy, T. Bleck, D. Miller, A. Peterman, D. Cella. Neuro-QOL: quality of life item banks for adults with neurological disorders: item development and calibrations based upon clinical and general population testing. Qual. Life Res. 2012;21:475-486
- Goldman Consensus Group. The Goldman Consensus statement on depression in multiple sclerosis. Mult. Scler. 2005;11:328-337
- D.A. Grimes, K.F. Schulz. Refining clinical diagnosis with likelihood ratios. Lancet. 2005;365:1500-1505
- D.D. Gunzler, A. Perzynski, N. Morris, R. Bermel, S. Lewis, D. Miller. Disentangling multiple sclerosis and depression: an adjusted depression screening score for patient-centered care. J. Behav. Med. 2014; (Epub 2014/06/02)
- K. Honarmand, A. Feinstein. Validation of the Hospital Anxiety and Depression Scale for use with multiple sclerosis patients. Mult. Scler. 2009;15:1518-1524
- K. Kroenke, R.L. Spitzer, J.B. Williams. The Patient Health Questionnaire-2: validity of a two-item depression screener. Med. Care. 2003;41:1284-1292
- K. Kroenke, R.L. Spitzer, J.B.W. Williams. The PHQ-9. J. Gen. Intern. Med. 2001;16:606-613
- K. Kroenke, R.L. Spitzer, J.B.W. Williams, B. Löwe. The Patient Health Questionnaire somatic, anxiety, and depressive symptom scales: a systematic review. Gen. Hosp. Psychiatry. 2010;32:345-359
- H. Liu, D. Cella, R. Gershon, J. Shen, L.S. Morales, W. Riley, R.D. Hays. Representativeness of the Patient-Reported Outcomes Measurement Information System Internet panel. J. Clin. Epidemiol. 2010;63:1169-1178
- B. Lowe, J. Unutzer, C.M. Callahan, A.J. Perkins, K. Kroenke. Monitoring depression treatment outcomes with the Patient Health Questionnaire-9. Med. Care. 2004;42:1194-1201
- S.L. Minden, A. Feinstein, R.C. Kalb, D. Miller, D.C. Mohr, S.B. Patten, C. Bever Jr., R.B. Schiffer, G.S. Gronseth, P. Narayanaswami. Evidence-based guideline: assessment and management of psychiatric disorders in individuals with MS: report of the Guideline Development Subcommittee of the American Academy of Neurology. Neurology. 2014;82:174-181
- D.C. Mohr, S.L. Hart, L. Julian, E.S. Tasch. Screening for depression among patients with multiple sclerosis: two questions may be enough. Mult. Scler. 2007;13:215-219
- National Institute of Neurological Disorders and Stroke. User manual for the Quality of Life in Neurological Disorders (Neuro-QOL) measures. Version 1.0. (2010) Available at: http://www.neuroqol.org/HowDoI/UserManual/User%20Manual/Neuro-QOL_UserManual.pdf
- R. Pandya, L. Metz, S.B. Patten. Predictive value of the CES-D in detecting depression among candidates for disease-modifying multiple sclerosis treatment. Psychosomatics. 2005;46:131-134
- S.B. Patten, C.A. Beck, J.V. Williams, C. Barbui, L.M. Metz. Major depression in multiple sclerosis: a population-based perspective. Neurology. 2003;61:1524-1527
- S.B. Patten, J.V. Williams, D.H. Lavorato, M. Koch, L.M. Metz. Depression as a predictor of occupational transition in a multiple sclerosis cohort. Funct. Neurol. 2013;28:275-280
- S.B. Patten, J.M. Burton, K.M. Fiest, S. Wiebe, A.G. Bulloch, M. Koch, et al. Validity of four screening scales for major depression in MS. Mult. Scler. 2015; (Epub ahead of print)
- P.A. Pilkonis, S.W. Choi, S.P. Reise, A.M. Stover, W.T. Riley, D. Cella. Item banks for measuring emotional distress from the Patient-Reported Outcomes Measurement Information System (PROMIS®): depression, anxiety, and anger. Assessment. 2011;18:263-283
- M. Pompili, A. Forte, M. Palermo, H. Stefani, D. Lamis, G. Serafini, M. Amore, P. Girardi. Suicide risk in multiple sclerosis: a systematic review of current literature. J. Psychosom. Res. 2012;73:411-417
- PROMIS. Depression: A Brief Guide to the PROMIS Depression Instruments. Patient Reported Outcomes Measurement Information System (2014) Available at: https://www.assessmentcenter.net/documents/PROMIS%20Depression%20Scoring%20Manual.pdf
- L.S. Radloff. The CES-D scale: A self-report depression scale for research in the general population. Appl. Psychol. Meas. 1977;1:385-401
- P. Rohde, P.M. Lewinsohn, J.R. Seeley. Comparability of telephone and face-to-face interviews in assessing axis I and II disorders. Am. J. Psychiatry. 1997;154:1593-1598
- A. Senders, D. Hanes, D. Bourdette, R. Whitham, L. Shinto. Reducing survey burden: feasibility and validity of PROMIS measures in multiple sclerosis. Mult. Scler. 2014;20:1102-1111
- G.E. Simon, D. Revicki, M. VonKorff. Telephone assessment of depression severity. J. Psychiatr. Res. 1993;27:247-252
- K. Sjonnesen, S. Berzins, K.M. Fiest, A.G.M. Bulloch, L.M. Metz, B.D. Thombs, S.B. Patten. Evaluation of the 9-item Patient Health Questionnaire (PHQ-9) as an assessment instrument for symptoms of depression in patients with multiple sclerosis. Postgrad. Med. 2012;124:69-77
- M. Skokou, E. Soubasi, P. Gourzis. Depression in multiple sclerosis: a review of assessment and treatment approaches in adult and pediatric populations. ISRN Neurol. 2012;2012:427102
- R.L. Spitzer, K. Kroenke, J.B.W. Williams. Validation and utility of a self-report version of PRIME-MD. JAMA. 1999;282:1737-1744
- R.L. Spitzer, J.B. Williams, K. Kroenke, M. Linzer, F.V. deGruy III, S.R. Hahn, D. Brody, J.G. Johnson. Utility of a new procedure for diagnosing mental disorders in primary care. The PRIME-MD 1000 study. JAMA. 1994;272:1749-1756
- StataCorp. Stata Statistical Software: Release 12. (StataCorp LP, College Station, TX, 2011)
- M.J. Sullivan, B. Weinshenker, S. Mikail, S.R. Bishop. Screening for major depression in the early stages of multiple sclerosis. Can. J. Neurol. Sci. 1995;22:228-231
- M. Tarrants, M. Oleen-Burkey, J. Castelli-Haley, M.J. Lage. The impact of comorbid depression on adherence to therapy for multiple sclerosis. Mult. Scler. Int. 2011;2011:271321
- M.H. Verdier-Taillefer, V. Gourlet, R. Fuhrer, A. Alperovitch. Psychometric properties of the Center for Epidemiologic Studies-Depression scale in multiple sclerosis. Neuroepidemiology. 2001;20:262-267
- T.M. Watson, E. Ford, E. Worthington, N.B. Lincoln. Validation of mood measures for people with multiple sclerosis. Int. J. MS Care. 2014;16:105-109
- M.M. Weissman, D. Sholomskas, M. Pottenger, B.A. Prusoff, B.Z. Locke. Assessing depressive symptoms in five psychiatric populations: a validation study. Am. J. Epidemiol. 1977;106:203-214
- M.A. Whooley, A.L. Avins, J. Miranda, W.S. Browner. Case-finding instruments for depression. Two questions are as good as many. J. Gen. Intern. Med. 1997;12:439-445
- J.W. Williams Jr., P.H. Noel, J.A. Cordes, G. Ramirez, M. Pignone. Is this patient clinically depressed? JAMA. 2002;287:1160-1170
- W.J. Youden. Index for rating diagnostic tests. Cancer. 1950;3:32-35
- A.S. Zigmond, R.P. Snaith. The hospital anxiety and depression scale. Acta Psychiatr. Scand. 1983;67:361-370
Department of Rehabilitation Medicine, University of Washington, Seattle, WA, USA
⁎ Corresponding author at: University of Washington, Box 354237, Seattle, WA 98195, USA.
© 2015 Elsevier Inc., All rights reserved.