Department of Geriatric Psychiatry, Institute of Mental Health, Singapore; Psychotherapy Service, Institute of Mental Health, Singapore; Saw Swee Hock School of Public Health, National University of Singapore, Singapore
Brief screening scales for caregiver burden are much needed in routine dementia services to efficiently identify caregivers of persons with dementia (PWD) for further intervention. Although the 22-item Zarit Burden Interview (ZBI) is often used, its available screening versions have not performed as well as the full version in distinguishing significant burden. We developed a brief screening scale that is valid and comparable to ZBI in distinguishing caregiver burden.
Design and setting
Baseline data of an ongoing cohort study.
Family caregivers of community-dwelling PWD (n = 394).
Participants completed questionnaires containing ZBI and other caregiving scales. Initially, we split the study samples into 2—the derivation sample (n = 215) was used to develop a brief scale that best distinguishes significant burden (using the best-subset approach with 10-fold cross-validation), whereas the validation sample (n = 179) verified its actual performance in distinguishing significant burden. We then evaluated the derived scale in its internal consistency reliability, factorial validity, known group validity, and construct validity, and mapped the scores between the brief scale and ZBI using the equipercentile equating method.
We derived a 3-item scale which had comparable performance to ZBI in distinguishing significant burden (area under the receiver operating characteristic curve 0.86, 95% confidence interval 0.81-0.92). It had a single dimension in exploratory factor analysis and maintained good psychometric properties similar to those of ZBI. It also explained 77.8% of the variability in ZBI, and had scores that could be mapped to ZBI with reasonable precision.
Conclusions and Implications
We have derived a highly accessible tool to screen for caregiver burden, which can have a wider health system effect of expanding the reach of caregiver-focused interventions to services involved in the care of PWD. Notably, this screening tool was developed using rigorous methods and demonstrated comparability to ZBI in its validity, reliability, and total scores.
which is understandable because persons with dementia (PWD) become increasingly dependent on their family members as they lose the ability to care for themselves. The costs are only expected to rise further as the number of PWD is projected to triple from 46.8 million in 2015 to 131.5 million in 2050.
However, despite being well evidenced in research settings, the scale is relatively lengthy and can increase the burden of administration to the caregivers. Several shorter versions of ZBI have been developed in an attempt to address this shortcoming.
This may raise concerns about the validity of these screening scales in distinguishing the caregiver burden, considering that burden and depression are commonly conceptualized as 2 ends of the same spectrum (under the diathesis stress model), with depression being the manifestation of high and significant burden.
To address the challenges, we sought (as the primary aim) to develop a brief burden scale with the fewest items possible, and evaluate its performance in distinguishing caregiver burden. Additionally, we had 2 secondary aims to demonstrate that this new brief scale has (1) properties of reliability and validity that are similar to those expected of ZBI and (2) scores that can be accurately mapped to the total scores of ZBI to demonstrate comparability.
Participants and Procedures
This study was based on the baseline data of an ongoing cohort study, where we consecutively sampled caregivers who accompanied the PWD to the dementia services of 2 tertiary hospitals in Singapore (Institute of Mental Health and Khoo Teck Puat Hospital). Our inclusion criteria comprised (1) spouses or children of PWD, (2) caring for PWD who is residing in the community, and (3) age ≥21 years. At the point of recruitment, the participants completed on site a set of self-administered questionnaires which included ZBI and a depression scale [Center for Epidemiologic Studies–Depression Scale (CES-D)].
Participants from one of the recruitment sites (Khoo Teck Puat Hospital) also completed an additional scale assessing caregiving gains [Gain in Alzheimer care INstrument (GAIN)].
A total of 394 participants were recruited—215 (54.6%) from Hospital 1 and 179 (45.4%) from Hospital 2—with a total response rate of 88%. Ethics approval was granted by the Domain Specific Review Board of Singapore.
ZBI is a 22-item scale that assesses the perceived burden experienced by caregivers of older persons.
The items are self-administered by the caregivers on 5-point Likert-type scales and summed to generate a total score ranging from 0 to 88. CES-D comprises 20 items that measure the frequency of depressive symptoms over the past week using 4-point Likert-type scales. The total score ranges from 0 to 60, with scores ≥16 suggestive of significant depression.
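The scoring rules described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, item responses are assumed to be coded from 0 upward (0 to 4 for ZBI, 0 to 3 for CES-D), and any reverse-keyed items are assumed to have been recoded already.

```python
from typing import Sequence

ZBI_ITEMS = 22      # items in the full Zarit Burden Interview
CESD_ITEMS = 20     # items in the CES-D
CESD_CUTOFF = 16    # total scores >= 16 suggest significant depression


def total_score(responses: Sequence[int], n_items: int, max_per_item: int) -> int:
    """Sum item responses after checking the item count and response range."""
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} responses, got {len(responses)}")
    for r in responses:
        if not 0 <= r <= max_per_item:
            raise ValueError(f"response {r} is outside 0..{max_per_item}")
    return sum(responses)


def zbi_total(responses: Sequence[int]) -> int:
    """ZBI total: 22 items scored 0-4, so the total ranges from 0 to 88."""
    return total_score(responses, ZBI_ITEMS, 4)


def cesd_total(responses: Sequence[int]) -> int:
    """CES-D total: 20 items scored 0-3, so the total ranges from 0 to 60."""
    return total_score(responses, CESD_ITEMS, 3)


def has_significant_depression(cesd_responses: Sequence[int]) -> bool:
    """Apply the CES-D >= 16 threshold used as the external criterion."""
    return cesd_total(cesd_responses) >= CESD_CUTOFF
```

For example, a caregiver who answers 1 on sixteen CES-D items and 0 on the remaining four has a total of 16 and would be flagged under this threshold.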
Participants chose the description that best matched the PWD: still capable of independent living (mild stage), needs some assistance with daily living (moderate stage), or needs round-the-clock supervision (severe stage). This brief measure was previously shown to have adequate agreement with Clinical Dementia Rating Scale (kappa 0.56-0.6).
The presence of severe behavioral problems was indirectly measured through the need for admission to a geriatric psychiatry ward, indicating behavioral problems that were too severe to be managed in the community setting.
For the primary aim (scale development), we split the study samples into 2 (derivation sample and validation sample)—the derivation sample (based on participants from Institute of Mental Health, n = 215) was used to develop a brief scale that can best distinguish significant burden, whereas the validation sample (based on participants from Khoo Teck Puat Hospital, n = 179) was used to evaluate the actual performance of this brief scale in distinguishing significant burden.
In the derivation sample (n = 215), we employed the best-subset approach with 10-fold cross-validation to select the scale items in ZBI that can best distinguish significant burden. Significant burden was indirectly identified by the presence of significant depression in caregivers (CES-D ≥ 16). The best-subset approach uses logistic regression to exhaustively evaluate all possible combinations of the 22 items from ZBI, and narrows down to a list of top models that have the lowest prediction errors. It then selects the best model using 10-fold cross-validation; this is done by randomly dividing the sample into 10 folds of equal size, cross-validating the prediction error within the 10 folds, and selecting the least complex model that is within 1 standard error of the best model (commonly described as the “1 standard error” rule, which ensures that the selected model remains stable and can be consistently replicated even in other independent samples).
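The 1 standard error rule can be illustrated with a short Python sketch. It covers only the final selection step, not the exhaustive logistic-regression search, and it is not the software used in this study. The function name and the fold-level error values are hypothetical, and the sketch assumes that "within 1 standard error" means a mean cross-validated error no larger than the best mean error plus that best model's standard error.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Dict, List


def one_se_rule(fold_errors: Dict[int, List[float]]) -> int:
    """Return the smallest model size whose mean cross-validated error is
    within one standard error of the lowest mean cross-validated error."""
    summary = {}
    for size, errs in fold_errors.items():
        # Standard error of the mean error across folds
        summary[size] = (mean(errs), stdev(errs) / sqrt(len(errs)))
    best_size = min(summary, key=lambda s: summary[s][0])
    best_mean, best_se = summary[best_size]
    threshold = best_mean + best_se
    return min(s for s, (m, _) in summary.items() if m <= threshold)


# Hypothetical prediction errors from 10 folds, keyed by number of items
hypothetical_errors = {
    1: [0.28, 0.32] * 5,   # mean 0.30
    2: [0.23, 0.27] * 5,   # mean 0.25
    3: [0.20, 0.24] * 5,   # mean 0.22
    5: [0.16, 0.26] * 5,   # mean 0.21 (lowest error)
}
selected_size = one_se_rule(hypothetical_errors)
```

In this made-up example the 5-item model has the lowest mean error, but the 3-item model falls inside the 1 standard error band and is therefore preferred as the simpler model.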
The selected model would then constitute the new, brief scale [henceforth referred to as the ZBI–Screening version (ZBI-S)]. In the validation sample (n = 179), we evaluated the actual performance of ZBI-S in distinguishing significant burden, by computing the area under the receiver operating characteristic curve (AUROC). In general, an AUROC of >0.8 is considered excellent performance.
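As a reminder of what the AUROC measures, the sketch below computes it directly from its pairwise definition: the probability that a randomly chosen caregiver with the criterion (label 1) scores higher than a randomly chosen caregiver without it (label 0), with ties counted as half. The function name and example values are hypothetical, and this is not the routine used for the analyses reported here.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve from pairwise comparisons of positive and
    negative cases (equivalent to the Mann-Whitney U statistic, rescaled)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive case ranked above negative case
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical scale scores and criterion labels (1 = significant burden)
example_auroc = auroc([2, 5, 4, 6], [0, 0, 1, 1])
```

The pairwise loop is quadratic in the sample size, which is acceptable for samples of a few hundred participants; rank-based formulas give the same value more efficiently.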
As part of the secondary aims, we evaluated whether ZBI-S maintained psychometric properties that were expected of a caregiver burden scale. Specifically, we assessed its psychometric properties in the full sample (n = 394) with respect to internal consistency reliability, factorial validity, known group validity, and construct validity. Internal consistency reliability was assessed with Cronbach alpha and McDonald omega.
Factorial validity was assessed with exploratory factor analysis using maximum-likelihood estimation methods and oblique rotation (oblimin), with the number of factors in exploratory factor analysis identified using Horn parallel analysis.
Known group validity was assessed by comparing the mean scores based on variables that have been known to differ in the levels of caregiver burden, including primary caregiving role, coresidence with the PWD, severity of dementia, and presence of behavioral problems in the PWD.
Construct validity was assessed using the Pearson correlation coefficient (r), with the hypotheses regarding construct validity further described in Appendix 1.
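Of the reliability indices named above, Cronbach alpha has a simple closed form, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The sketch below applies that formula to hypothetical responses; the function name and data are illustrative only, and McDonald omega (which requires fitting a factor model) is not shown.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(item_scores: Sequence[Sequence[float]]) -> float:
    """Cronbach alpha for a respondents-by-items table of scores."""
    n_items = len(item_scores[0])
    if n_items < 2 or len(item_scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents
    item_vars: List[float] = [
        variance([row[i] for row in item_scores]) for i in range(n_items)
    ]
    # Sample variance of the summed total score
    total_var = variance([sum(row) for row in item_scores])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses of 5 caregivers to a 3-item scale (each item 0-4)
hypothetical_responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [0, 1, 1],
    [4, 3, 4],
]
alpha = cronbach_alpha(hypothetical_responses)
```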
As part of the other secondary aims, we also evaluated whether the scores of ZBI-S can be accurately mapped to those of ZBI. Using the full sample (n = 394), we mapped ZBI-S scores to the original 22-item ZBI using the equipercentile equating method with log-linear smoothing.
This method does not require any assumption on the score distribution, and can ensure that the mapped scores always fall within the range of the intended scale. Log-linear smoothing was applied to avoid an irregular distribution of the scores. The 95% confidence intervals (CIs) of the mapped scores were computed using 1000 bootstrap samples.
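The core idea of equipercentile equating, mapping a score on one scale to the score on the other scale that has the same percentile rank, can be sketched as follows. This is a simplified illustration and not the procedure used in this study: it omits the log-linear smoothing and the bootstrap confidence intervals, it takes the smallest target score whose cumulative proportion exceeds the percentile rank, and all names and data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def _rel_freq(scores: Sequence[int], lo: int, hi: int) -> List[float]:
    """Relative frequency of every integer score from lo to hi."""
    counts = Counter(scores)
    n = len(scores)
    return [counts.get(v, 0) / n for v in range(lo, hi + 1)]


def equipercentile_map(
    x_scores: Sequence[int],
    y_scores: Sequence[int],
    x_range: Tuple[int, int],
    y_range: Tuple[int, int],
) -> Dict[int, float]:
    """Map each integer X score to the Y score with the same percentile rank
    (unsmoothed; results are clamped to the Y score range)."""
    x_lo, x_hi = x_range
    y_lo, y_hi = y_range
    fx = _rel_freq(x_scores, x_lo, x_hi)
    fy = _rel_freq(y_scores, y_lo, y_hi)
    mapping: Dict[int, float] = {}
    cum_x = 0.0
    for i, f in enumerate(fx):
        # Percentile rank (as a proportion) at the midpoint of this X score
        p = cum_x + f / 2.0
        cum_x += f
        equated = float(y_hi)   # fallback if p reaches the top of Y
        cum_y = 0.0
        for j, g in enumerate(fy):
            if cum_y + g > p:
                # Interpolate within the interval [y - 0.5, y + 0.5)
                equated = (y_lo + j) - 0.5 + (p - cum_y) / g
                break
            cum_y += g
        mapping[x_lo + i] = min(max(equated, float(y_lo)), float(y_hi))
    return mapping


# Hypothetical paired totals on a 0-12 brief scale and a 0-88 full scale
brief = [0, 2, 4, 6, 8, 10, 12]
full = [5, 15, 25, 40, 50, 65, 80]
brief_to_full = equipercentile_map(brief, full, (0, 12), (0, 88))
```

Because the mapping works on percentile ranks rather than on a fitted line, the equated values cannot leave the range of the target scale, which is the property highlighted in the text.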
were performed in R (version 3.5.1). The other analyses were conducted in Stata (version 14).
Demographic information of the 394 participants is presented in Appendix 2. The participants had a mean age of 53.0 years (SD 10.7), with the majority being Ethnic 1 (86.6%), children caregivers (86.3%), and primary caregivers (70.8%).
In the derivation sample (n = 215), the exhaustive search method identified a list of top models as presented in Appendix 3. As shown in Figure 1, the 10-fold cross-validation then selected the 3-item model as the most parsimonious model based on the established 1-standard error rule.
In the validation sample (n = 179), the selected 3-item scale (ZBI-S) had excellent performance in distinguishing significant burden, with an AUROC of 0.86 (95% CI 0.80-0.91), and a sensitivity of 0.86 and a specificity of 0.73 at the optimal cut-off score of ≥4 (Appendix 4). Notably, the AUROC of ZBI-S was not significantly different from that of the 22-item ZBI (P = .710) (Table 1). In contrast, the previously known 1- and 4-item screening versions of ZBI had significantly worse AUROC than the original ZBI (P = .008 and .001 respectively) when they were evaluated in our sample (Table 1). The 3 screening versions of ZBI (1-, 4-, and 3-item variants) are separately shown in Appendix 5 for reference purposes.
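Sensitivity and specificity at a given cut-off follow directly from a 2-by-2 classification of scale scores against the criterion. The sketch below shows the calculation for a "score ≥ cut-off" rule; the function name and the data are hypothetical and do not reproduce the study results.

```python
from typing import Sequence, Tuple


def sens_spec(
    scores: Sequence[float], labels: Sequence[int], cutoff: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= cutoff screen positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= cutoff)  # true positives
    fn = sum(1 for s, y in pairs if y == 1 and s < cutoff)   # false negatives
    tn = sum(1 for s, y in pairs if y == 0 and s < cutoff)   # true negatives
    fp = sum(1 for s, y in pairs if y == 0 and s >= cutoff)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical 3-item totals (0-12) and criterion labels (1 = CES-D >= 16)
hypothetical_scores = [0, 2, 3, 4, 5, 8, 3, 6]
hypothetical_labels = [0, 0, 0, 0, 1, 1, 1, 1]
sensitivity, specificity = sens_spec(hypothetical_scores, hypothetical_labels, 4)
```

Repeating the calculation over every possible cut-off gives the table from which an optimal cut-off can be chosen, for instance by favoring slightly higher sensitivity in a screening context.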
Table 1. Performance of the Screening Versions of ZBI in Identifying Significant Burden∗ in the Validation Sample (n = 179), and a Comparison With the Performance of the Original 22-Item Version
∗ Significant burden was indirectly identified by the presence of significant depression (Center for Epidemiologic Studies-Depression score ≥16) in caregivers, which indicated the definitive need for further interventions.
† P values represent the statistical significance of the difference in AUROC between a ZBI variant and the original 22-item version.
‡ The optimal cut-off score is based on a balance between sensitivity and specificity, with a preference for slightly higher sensitivity to reduce the false negative rates in screening scales.
We then evaluated the psychometric properties of ZBI-S in the full sample (n = 394). Despite being much briefer, the 3-item ZBI-S had an acceptable internal consistency reliability (alpha = 0.78, 95% CI 0.74-0.82; omega = 0.80, 95% CI 0.76-0.83). In exploratory factor analysis, ZBI-S demonstrated only 1 dimension among its scale items (the scree plot and the factor loading are shown in Appendix 6 and 7, respectively). The results for known group and construct validities were consistent with the characteristics expected of a caregiver burden scale (Appendices 8-12).
The 3-item ZBI-S explained 77.8% of variance in ZBI (based on the results from R-squared). As shown in Table 2, its scores can be mapped to those of ZBI, with a precision of approximately ±3 in their 95% CI.
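The text does not spell out how the R-squared was obtained. Assuming it came from a simple linear regression of ZBI totals on ZBI-S totals, R-squared equals the squared Pearson correlation between the two totals, so 77.8% would correspond to a correlation of about 0.88. The sketch below shows that relationship on hypothetical data; the function names are illustrative.

```python
from math import sqrt
from statistics import mean
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def variance_explained(x: Sequence[float], y: Sequence[float]) -> float:
    """R-squared of a simple (single-predictor) linear regression of y on x."""
    return pearson_r(x, y) ** 2


# Hypothetical brief-scale and full-scale totals for 4 caregivers
r_squared = variance_explained([1, 2, 3, 4], [2, 1, 4, 3])
```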
Table 2. Equivalent ZBI Scores for a Given ZBI-S Score
This study developed a 3-item screening scale (ZBI-S) for caregiver burden in dementia caregiving, using a rigorous method of item selection (through exhaustive search of all possible combinations of items), and following the well-established processes of derivation, cross-validation, and independent validation. Unlike the previous screening versions of ZBI, the new ZBI-S was developed with reference to an external criterion and, consequently, demonstrated excellent performance in distinguishing significant burden (similar to that of ZBI, and comparatively better than the previously known 1- or 4-item variant of ZBI). Notably, ZBI-S also inherited the key properties of the original ZBI (the best-known scale to date for caregiver burden), having properties of reliability and validity that are consistent with those of ZBI, explaining most of the variance in ZBI, and having scores that could be mapped back to ZBI with reasonable precision.
As demonstrated in this study, the 3-item ZBI-S can be as useful as the original 22-item ZBI (and better than the previous screening versions of the ZBI) in identifying caregivers with significant burden who might benefit from further intervention. The demonstrable score mapping offers the brief scale as a viable alternative when the original ZBI cannot be feasibly administered, as well as affords comparability across clinical sites that administer the 2 different versions. With only 3 items, ZBI-S can make screening of caregiver burden more accessible to clinical practitioners. It may thus have a wider health system effect of promoting caregiver-focused evaluations in clinical services that are involved in the care of PWD (including those in primary care and social care settings) and potentially expand the reach of caregiver-focused interventions beyond specialized dementia services.
Although there can be other alternative methods to derive a brief scale, the best-subset approach that we adopted has the strength of producing the most efficient brief scale, which has the least number of items yet remains comparable to ZBI. Although brief scales are not uncommonly fraught with deficiencies in psychometric properties, our approach (to exhaustively search for the best subset in ZBI) ensures that ZBI-S inherits the characteristics of the original scale and minimizes the possibility of altering the psychometric properties of the brief scale in relation to the original scale.
Several limitations should be considered. First, the participants were recruited only from tertiary dementia services. However, they should largely still represent those in the community, because most of the PWD in Singapore receive their dementia care from tertiary centers, and the 2 recruitment centers in this study are the only 2 dementia services that serve the population in the North-East of Singapore. Second, the proportion of spousal caregivers in this study was relatively lower than that of children caregivers. However, this probably is not due to sampling bias considering that our proportion of spousal caregivers (13.7%) is not dissimilar to the 16.0% reported in a separate study based on a nationally representative sample.
Third, one may argue that the external criterion in this study (CES-D) may not reflect a definitive diagnosis of depression to indicate the caregivers' need for further intervention. However, a recent meta-analysis demonstrated the excellent performance of CES-D ≥16 in identifying clinical depression, with a pooled AUROC of 0.87.
In conclusion, this study has produced a highly accessible tool to screen for caregiver burden, which can have a wider health system effect of expanding the reach of caregiver-focused interventions to clinical and social services that are involved in the care of PWD. Notably, this 3-item tool was developed using rigorous methods and demonstrated comparability to ZBI in its validity, reliability, and total scores.
Appendix 1. Our Predefined Hypotheses Regarding the Construct Validity of the New ZBI-S (Zarit Burden Interview–Screening Version)
Construct validity was assessed using the Pearson correlation coefficient (r), with correlation coefficient of >0.50 considered strong while values ≤ 0.50 are considered weak or moderate.
We expected ZBI-S to demonstrate the following 4 characteristics consistent with those of a caregiver burden scale:
It should correlate strongly (r > 0.50) with constructs related to caregiver depression, such as with the Depressed Affect subscale of the CES-D and the Somatic Symptoms subscale of the CES-D. This is because burden and depression have been viewed as 2 related constructs within the same spectrum, with more severe burden manifesting as depression.
It should correlate less strongly (r ≤ 0.50) with the Positive Affect and Interpersonal Problems subscales of the CES-D because caregiver burden is expected to differ from constructs such as positive feelings, or the feeling that others are being critical.
It should correlate less strongly (r ≤ 0.50) with Gain in Alzheimer care INstrument (GAIN). This is because GAIN, which measures positive outcomes in caregiving, is a different construct from caregiver burden and has only been shown to correlate weakly with ZBI.
AUROC, area under the receiver operating characteristics curve; CI, confidence interval, ZBI-S, Zarit Burden Interview–Screening version.
The optimal cut-off score is highlighted in bold.
∗ Significant burden was indirectly identified by the presence of significant depression (Center for Epidemiologic Studies–Depression score ≥16) in caregivers, which indicated the definitive need for further interventions.
† The optimal cut-off score is based on a balance between sensitivity and specificity, with a preference for slightly higher sensitivity to reduce the false-negative rates in the screening scale.
Appendix 5. The Scale Items in the 3 Screening Versions of Zarit Burden Interview (ZBI)
The numbers to the left of the scale items correspond to the item numbers in the original ZBI. Each item is rated on a 5-point Likert scale based on how often a caregiver experiences a specific feeling when providing care (0 = never; 1 = rarely; 2 = sometimes; 3 = quite frequently; 4 = nearly always).
This research was supported by the Singapore Ministry of Health's National Medical Research Council under the Centre Grant Program (grant no. NMRC/CG/004/2013). It also received pilot funding from the National University of Singapore. Separately, the first author (T.M.L.) was supported by research grants under the Singapore Ministry of Health's National Medical Research Council (NMRC) (grant no. NMRC/Fellowship/0030/2016 and NMRC/CSSSP/0014/2017). The funding sources had no involvement in any part of the project.