



D.A. Loewenstein, R.E. Curiel Cid, M. Kitaigorodsky, E.A. Crocco, D.D. Zheng, K.L. Gorman


Center for Cognitive Neuroscience and Aging, Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1695 NW 9th Avenue, Miami, Florida, U.S.A.

Corresponding Author: David A. Loewenstein, PhD, ABPP-CN; Director, Center for Cognitive Neuroscience and Aging; Professor of Psychiatry and Behavioral Sciences; Professor of Neurology; University of Miami, 1695 NW 9th Ave, Suite 3202, Miami, FL 33136; Phone: (305) 355-7016; Fax: (305) 255-9076

J Prev Alz Dis 2021;
Published online January 18, 2021,



Background: Difficulties in inhibition and self-monitoring are early features of incipient Alzheimer’s disease and may manifest as susceptibility to proactive semantic interference. However, due to limitations of traditional memory assessment paradigms, recovery from interference effects following repeated learning opportunities has not been explored.
Objective: This study employed a novel computerized list learning test consisting of repeated learning trials to assess recovery from proactive and retroactive semantic interference.
Design: The design was cross-sectional.
Setting: Participants were recruited from the community as part of a longitudinal study on normal and abnormal aging.
Participants: The sample consisted of 46 cognitively normal individuals and 30 participants with amnestic mild cognitive impairment.
Measurements: Participants were administered the Cognitive Stress Test and traditional neuropsychological measures. Step-wise logistic regression was applied to determine which Cognitive Stress Test measures best discriminated between diagnostic groups. This was followed by receiver operating characteristic analyses.
Results: Cued A3 recall, Cued B3 recall and Cued B2 intrusions were all independent predictors of diagnostic status. The overall predictive utility of the model yielded 75.9% sensitivity, 91.1% specificity, and an overall correct classification rate of 85.1%. When these variables were jointly entered into receiver operating characteristic analyses, the area under the curve was .923 (p<.001).
Conclusions: This novel paradigm’s use of repeated learning trials offers a unique opportunity to assess recovery from proactive and retroactive semantic interference. Participants with mild cognitive impairment exhibited a continued failure to recover from proactive interference that could not be explained by mere learning deficits.

Key words: Proactive semantic interference, retroactive semantic interference, prodromal Alzheimer’s disease, mild cognitive impairment, intrusions.



Hasher and Zacks (1988) first described age-related changes in inhibitory processes that diminished the ability to ignore distracting information (1). This was confirmed in subsequent studies (2-4). Difficulties in inhibitory processes and self-monitoring have also been seen as early features of incipient Alzheimer’s disease (AD; 5-8). Loewenstein and colleagues (2004) posited that learning deficits are related to deficiencies in the semantic network and found that measures of proactive interference between competing to-be-remembered lists of semantically related targets were especially sensitive to the mild cognitive impairment (MCI) stages of AD (6). Curiel et al. (2013) employed a novel paradigm (9), the Loewenstein-Acevedo Scales for Semantic Interference and Learning (LASSI-L), that required learning a list of 15 target items representing three semantic categories (fruits, musical instruments, and articles of clothing). Maximal learning was facilitated by category cues at both acquisition and recall. Proactive semantic interference (PSI) and the failure to recover from PSI (frPSI) were assessed by having the examinee attempt to learn 15 new targets on List B (representing the identical semantic categories used for List A targets) over two additional learning trials while using these identical category cues during both acquisition and retrieval.
Subsequent studies conducted in independent cohorts in the United States and other countries have found that performance on the LASSI-L was superior to several traditionally used memory tests (e.g., list learning measures, delayed paragraph recall) in distinguishing between cognitively normal older adults and those with preclinical AD or early and late stage MCI. Various studies on the LASSI-L have related these early cognitive changes to biological markers of AD such as in-vivo amyloid imaging (10-12) and neurodegeneration measured by magnetic resonance imaging (MRI; 13-14), functional MRI (15), and fluorodeoxyglucose positron emission tomography (PET/CT; 16). In a majority of these studies, AD pathology was more strongly associated with deficits in frPSI than with impairments in initial PSI. Using receiver operating characteristic (ROC) curve analyses, Matias-Guiu and colleagues found that the LASSI-L was superior to the Free and Cued Selective Reminding Test (FCSRT) in detecting MCI patients with suspected AD (16) and in differentiating both early and late stage MCI individuals from cognitively normal older adults.
It has been proposed that both PSI and frPSI can be assessed in different manners (11). These include the number of correct responses on List B relative to List A or the number of semantic intrusions rendered on List B recall trials. In one recent study, MCI patients that were amyloid positive and had presumptive AD evidenced significantly more intrusion errors than MCI participants who had a clinical history consistent with AD but were amyloid negative, or MCI participants diagnosed with other neurological and neuropsychiatric conditions who were also amyloid negative (11).
The finding that frPSI is particularly sensitive to incipient AD raises an interesting theoretical as well as empirical question. Will deficits in frPSI continue in the presence of additional opportunities to learn two competing semantic word lists? That is, could providing additional opportunities to learn both List A and List B offer deeper insights into initial learning deficits in amnestic MCI (aMCI) participants at risk for AD, as well as their ability to completely recover from PSI deficits over time? An additional question is whether the failure to recover from retroactive semantic interference (frRSI) is an issue in persons with aMCI. These issues have not been addressed by the LASSI-L and other paradigms.
To address these questions, we employed the Cognitive Stress Test (CST). The CST required learning of 18 target words, all of which belonged to one of three semantic categories: occupations, household items and types of transportation. Identical category cues were provided during each of the three learning trials as well as during each of the three cued recall trials for each list. This provided a unique opportunity to directly assess the immediate and persistent effects of semantic interference over multiple trials. In addition, we assessed the ability to recover from retroactive semantic interference, which has not been previously examined in aMCI and AD research. We hypothesized that failure to recover from proactive semantic interference would continue to be problematic for individuals with aMCI despite multiple trials that would allow the recovery from these deficits.



Participants were part of an NIH-funded longitudinal study on normal and abnormal aging. All participants provided informed consent for this IRB-approved study. In this investigation, we carefully selected 46 individuals classified as cognitively normal (CN) and 30 participants with amnestic mild cognitive impairment (aMCI). Inclusion and exclusion criteria are as follows.

Cognitively normal group (n=46)

Participants were classified as CN if there were: a) no subjective cognitive complaints made by the participant and/or a collateral informant; b) no evidence of memory or other cognitive decline after an extensive interview with the participant and an informant; c) a Global Clinical Dementia Rating (CDR) scale score of 0 (17); and d) scores on all memory measures (e.g., Hopkins Verbal Learning Test-Revised (HVLT-R; 18) or delayed paragraph recall from the National Alzheimer’s Coordinating Center Uniform Data Set (NACC UDS; 19)) and non-memory measures (e.g., Category Fluency (20), Trails A and B (21), WAIS-IV Block Design subtest (22)) that were less than 1.0 standard deviation below normal limits for age, education, and language group.

Amnestic MCI group (n=30)

Participants were classified as aMCI if: a) they fulfilled Petersen’s criteria (23) for MCI; b) subjective cognitive complaints were reported by the participant and/or collateral informant; c) Global CDR scale score was 0.5; and d) delayed recall was impaired (i.e., 1.5 standard deviations or more below the mean, accounting for age, education, and language of testing) on either the HVLT-R or delayed paragraph recall from the NACC UDS.

Exclusion Criteria for all study groups

Exclusion criteria included: 1) significant sensory or motor deficits (e.g., visual or hearing impairment, paralysis) or literacy lower than the 6th grade level on the WRAT-4 (24), evidenced during the clinical evaluation by Drs. Loewenstein or Curiel and judged to preclude completion of the study measures; and 2) a DSM-5 diagnosis of major depressive disorder, bipolar disorder, current psychotic disorder, substance use disorder or any DSM-5 Axis 1 diagnosis after an extensive interview by the study clinicians using the SCID (25). Individuals with major depressive disorder were excluded from the study given that this condition often results in attention and/or concentration difficulties and psychomotor slowing that may adversely affect performance on neuropsychological measures. Individuals with major neurocognitive disorder were not included in this sample.

Cognitive Stress Test (CST)

We employed a novel computerized measure called the Cognitive Stress Test (CST) that expands upon our previous work with the widely studied Loewenstein-Acevedo Scales for Semantic Interference and Learning (LASSI-L), including the computerized version of the LASSI-L, which has evidenced high test-retest reliability and discriminative validity (Curiel et al., in press). The CST employs the following: 1) semantic cuing at both acquisition and retrieval of 18 List A targets representing three semantic categories (occupations, household items, or types of transportation) over three initial learning trials, 2) three consecutive presentations of a second list of 18 new targets (List B) representing the same categories as the first list to examine PSI and frPSI, and 3) use of category cues to elicit recall of List A targets to assess retroactive semantic interference (RSI), with an additional learning trial to examine failure to recover from retroactive semantic interference (frRSI). The CST represents an exciting approach to preclinical AD assessment in that it builds upon our previous work and is a fully computer-administered web-based task, which facilitates remote deliverability, reduces the need for a skilled psychometrist, and allows automatic recording of correct responses, intrusions and other errors.

Statistical Analyses

Statistical analyses were conducted using SPSS Version 26. First, age, gender, education, language of testing, and global cognitive function were compared between diagnostic groups using one-way ANOVAs and chi-square analyses with Yates’ correction for continuity. CST cued recall and intrusion scores were compared using ANOVA while adjusting for factors that were distributed differently between diagnostic groups. The alpha cutoff value was adjusted using the Bonferroni correction for multiple comparisons. Step-wise logistic regression models were employed to determine the best independent classification using CST variables. These were followed by ROC analyses examining significant independent predictors with regard to the area under the ROC curve.
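As an illustration of two quantities used in these analyses, the following Python sketch computes a Bonferroni-adjusted alpha and a rank-based estimate of the area under the ROC curve. It is not the SPSS procedure used in the study. The function names and the example probabilities are hypothetical, and the divisor of 8 is taken from the footnote to Table 2.

```python
from itertools import product


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold under the Bonferroni correction."""
    return alpha / n_comparisons


def auc_from_scores(positive_scores, negative_scores):
    """Area under the ROC curve, estimated as the proportion of
    (positive, negative) pairs in which the positive case has the higher
    score. Tied pairs count as one half."""
    wins = 0.0
    for pos, neg in product(positive_scores, negative_scores):
        if pos > neg:
            wins += 1.0
        elif pos == neg:
            wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical predicted probabilities of aMCI (not study data).
amci_probs = [0.9, 0.8, 0.6, 0.4]
cn_probs = [0.5, 0.3, 0.2, 0.1]

print(bonferroni_alpha(0.05, 8))
print(auc_from_scores(amci_probs, cn_probs))
```

With these made-up values, 15 of the 16 aMCI-CN pairs are ordered correctly, so the estimated area is 15/16. An area of 0.5 would correspond to chance-level discrimination.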



As depicted in Table 1, there were no statistically significant differences between aMCI and CN groups with regard to mean age, education and language of testing. Participants in the aMCI group had lower mean Mini-Mental State Examination (MMSE) scores (26) and there were more males in the aMCI group than in the CN group.
Table 2 indicates that individuals with aMCI had lower scores on all CST trials. After adjusting for baseline differences in MMSE scores and using sex as a covariate, aMCI participants scored lower on all three List A initial learning trials and all three List B trials susceptible to PSI and frPSI. After covariate adjustment, there were no differences between aMCI and CN groups on recall trials susceptible to retroactive semantic interference (RSI) or on the ability to recover from RSI (frRSI). Table 3 denotes intrusion errors across the different CST trials. With and without adjustment for covariates, the only measures that differentiated groups were semantic intrusions on List B1 (which measures PSI), List B2 (which measures frPSI) and List B3 (which measures persistent frPSI after repeated learning trials).

Table 1. Demographics by Diagnostic Group

Table 2. CST Cued Recall Scores by Diagnostic Group

*Values survived Bonferroni Correction at 0.05/8=0.00625

Table 3. CST Intrusion Errors by Diagnostic Group

*Values survived Bonferroni Correction at p<.05

We calculated PSI, the initial failure to recover from proactive semantic interference after 1 additional learning trial (frPSI1), and the persistence of proactive semantic interference after 2 additional learning trials (frPSI2). PSI was calculated using the ratio of Cued B1 Recall to Cued A1 Recall. FrPSI1 was calculated using the ratio of Cued B2 Recall to Cued A2 Recall. FrPSI2 was calculated using the ratio of Cued B3 Recall to Cued A3 Recall.
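The three ratios defined above can be sketched in Python as follows. This is a minimal illustration, assuming each participant contributes a count of correctly recalled targets on Cued A1-A3 and Cued B1-B3. The function name and the example counts are hypothetical and are not study data.

```python
def interference_ratios(cued_a, cued_b):
    """Return PSI, frPSI1 and frPSI2 as ratios of List B to List A cued recall.

    cued_a: correct responses on Cued A1, A2, A3 (in that order)
    cued_b: correct responses on Cued B1, B2, B3 (in that order)
    """
    if len(cued_a) != 3 or len(cued_b) != 3:
        raise ValueError("expected three trials per list")
    if any(a <= 0 for a in cued_a):
        raise ValueError("List A recall must be positive to form a ratio")
    names = ("PSI", "frPSI1", "frPSI2")
    # Each ratio pairs the List B trial with the same-numbered List A trial.
    return {name: b / a for name, a, b in zip(names, cued_a, cued_b)}


# Hypothetical participant: recall out of 18 targets on each trial.
ratios = interference_ratios(cued_a=(10, 14, 16), cued_b=(6, 10, 12))
for name, value in ratios.items():
    print(f"{name}: {value:.3f}")
```

Because each List B score is divided by the List A score from the corresponding trial, a lower ratio reflects poorer List B recall relative to what the same person achieved on List A, rather than a lower overall level of learning.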
There were no aMCI versus CN differences in the Cued B1/Cued A1 ratio (F(1,74) = 1.59; p = .211). However, aMCI participants demonstrated more frPSI1 (F(1,74) = 8.25; p = .005) and frPSI2 (F(1,74) = 19.45; p < .001). As depicted in Table 4, on the Cued B3 recall trial, which followed two additional learning trials of List B items, CN participants were able to recover so that they could recall an average of 88.6% of the items that they recalled during Cued A3 recall. In contrast, participants with aMCI were only able to recover an average of 67.4% of the items that they recalled during Cued A3 recall.
Step-wise logistic regression was employed to determine which of the initial learning and PSI measures best discriminated between aMCI and CN groups. As indicated in Table 5, Cued A3 recall, Cued B3 recall and Cued B2 intrusions were predictors of diagnostic status. The overall predictive utility of the model yielded 75.9% sensitivity, 91.1% specificity, and an overall correct classification rate of 85.1%. When these variables were jointly entered into ROC analyses, the area under the ROC curve was .923 (p<.001).
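For readers less familiar with these indices, the Python sketch below shows how sensitivity, specificity and overall classification rate follow from a 2x2 classification table. The cell counts are an assumption: they are not reported in the text, and are simply one set of integers (22 of 29 aMCI and 41 of 45 CN correctly classified) that reproduces the reported percentages. The function name is hypothetical.

```python
def classification_summary(tp, fn, tn, fp):
    """Sensitivity, specificity and overall accuracy from a 2x2 table.

    tp: impaired cases classified as impaired
    fn: impaired cases classified as normal
    tn: normal cases classified as normal
    fp: normal cases classified as impaired
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Assumed counts, chosen only because they match the reported percentages.
sens, spec, acc = classification_summary(tp=22, fn=7, tn=41, fp=4)
print(f"sensitivity={sens:.1%} specificity={spec:.1%} overall={acc:.1%}")
```

Under these assumed counts the three values round to 75.9%, 91.1% and 85.1%.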

Table 4. Proactive Interference and Failure to Recover from Proactive Interference Ratios

Table 5. Step-Wise Logistic Regression Using Measures of Initial Learning and Susceptibility to Proactive Interference to Distinguish Amnestic Mild Cognitive Impairment and Cognitively Normal Groups

*Model at step 3 yielded 75.9% sensitivity and 91.1% specificity (overall classification 85.1%)


The current investigation used a novel computerized paradigm with semantically competing target word lists, the Cognitive Stress Test, to investigate whether the effects of proactive semantic interference (PSI) and the initial failure to recover from PSI (frPSI) would persist or diminish with additional learning trials. The obtained pattern of results indicated that, despite repeated administrations of the second list, participants with amnestic MCI had a persistent failure to recover from proactive semantic interference (frPSI). This cannot be explained by learning deficits alone, since the proactive semantic interference ratios adjusted for initial learning on the corresponding trial of List A targets. The unique nature of proactive semantic interference deficits was also evidenced by increased intrusion errors, which were produced by aMCI participants on Cued B1, Cued B2 and Cued B3 trials but not on additional trials of List A susceptible to retroactive interference. In fact, no measure of retroactive interference reached statistical significance, which is consistent with the notion that PSI is uniquely related to early cognitive function in older adults with aMCI at risk for AD (17, 27). Previous studies have suggested that PSI effects may be more associated with MCI and early AD than RSI (27-28). In contrast, Ricci and colleagues (2012) found RSI effects but no PSI effects using the Rey Auditory Verbal Learning Test (RAVLT) (29). It should be noted, however, that the RAVLT list-learning task did not specifically elicit semantic interference, which is the focus of the current investigation.
Unlike previous studies, the current investigation incorporated multiple trials of two sets of 18 different targets, each belonging to one of three semantic categories. The current findings suggest that even after repeated learning trials, aMCI participants are not able to overcome the effects of semantic interference. Our finding of a combined area under the ROC curve exceeding .92 for Cued A3 recall, Cued B3 recall and Cued B2 intrusions indicates that aMCI participants have deficits in initial learning as well as a failure to recover from proactive interference. The latter is evidenced by increasing deficits in recall of List B relative to List A targets over time (percentage of correct responses), as well as intrusion errors on measures susceptible to proactive interference and the failure to recover from proactive interference. This suggests that different measures of failure to recover from proactive semantic interference may have different biological underpinnings. Indeed, using the LASSI-L, which only affords one opportunity to recover from proactive semantic interference, Cued B2 recall was correlated with atrophy in AD-prone regions (13-14). In contrast, Loewenstein et al. (2018) showed that it was not Cued B2 recall but Cued B2 semantic intrusions that could differentiate between MCI groups who were amyloid positive versus other MCI groups who were amyloid negative (11), suggesting the potential specificity of intrusion errors as a cognitive breakdown associated with AD brain pathology. Similarly, Sanchez and colleagues (2017) found that among clinically asymptomatic middle-aged offspring of AD parents, Cued B2 intrusions were highly related to abnormal limbic connectivity on fMRI (15).
Torres et al. (2019) conducted a qualitative analysis of List B intrusion errors and found that the vast majority were incorrect recall of List B targets followed by semantic errors related to the List B target but not explicitly derived from List A (30). This is consistent with the cortical-limbic disruptions observed by others (13) and suggests that semantic intrusions represent potentially greater deficits in the executive inhibitory processes that allow the individual to access source memory and inhibit previously learned responses.
Strengths of the current paradigm include computerized and uniform administration of three learning trials of 18 targets (representing three different categories) to assess maximum learning using cues at both the encoding and retrieval stages. When applied to three additional trials of 18 different targets (representing identical semantic categories), there was a unique opportunity to study proactive interference and failure to recover from proactive interference (as assessed by the ratio of correct recall on List B to correct recall of List A on the same trial) and semantic intrusions. Participants were comprehensively assessed by both clinical and neuropsychological assessment and compared to older adults of equivalent age with similar educational attainment. There did not appear to be any issues with ceiling or floor effects using 18 to-be-remembered targets, which may have been related to adequate category cues provided at acquisition and retrieval. Finally, the CST was not used in diagnostic formulation to avoid potential issues with circularity.
Potential limitations of the study involve relatively modest numbers of participants and lack of longitudinal follow-up. We intend to keep recruiting and following these participants and obtaining both structural MRI as well as amyloid and tau PET. Future work with fMRI may further elucidate the mechanisms underlying the inability of aMCI participants to break free from the effects of semantic interference when provided with additional learning opportunities. Normal controls appear to be able to increasingly recover from proactive semantic interference effects over time, but this does not hold true with individuals with aMCI. Further exploration into this phenomenon has significant theoretical and clinical implications.


Funding: R01AG061106-02 Loewenstein, David, PI; Florida Department of Health Ed and Ethel Moore Grant #8AZ23. The sponsors had no role in the design and conduct of the study; in the collection, analysis, and interpretation of data; in the preparation of the manuscript; or in the review or approval of the manuscript.

Conflict of interest: This study was supported by the National Institute on Aging (NIA). The CST measure was developed by and is intellectual property held by Drs. Loewenstein and Curiel at the University of Miami.

Ethical standards: This study was IRB approved and met all national and international standards for the protection of human subjects.



1. Hasher L, Zacks RT. Working memory, comprehension, and aging: A review and a new view. In: Bower GH (ed) Psychol Learn Motiv 1988;22:193–225.
2. Amieva H, Phillips LH, Della Sala S, et al. Inhibitory functioning in Alzheimer’s disease. Brain 2004;127:949-64.
3. Collette F, Amieva H, Adam S, et al. Comparison of inhibitory functioning in mild Alzheimer’s disease and frontotemporal dementia. Cortex 2007;43(7):866-874.
4. Clapp WC, Gazzaley A. Distinct mechanisms for the impact of distraction and interruption on working memory in aging. Neurobiol Aging 2012;33(1):134-148.
5. Belleville S, Bherer L, Lepage E, et al. Task switching capacities in persons with Alzheimer’s disease and mild cognitive impairment. Neuropsychologia 2008;46(8):2225-2233.
6. Loewenstein DA, Acevedo A, Luis C, et al. Semantic interference deficits and the detection of mild Alzheimer’s disease and mild cognitive impairment without dementia. J Int Neuropsychol Soc 2004;10(1):91-100.
7. Dewar M, Pesallaccia M, Cowan N, et al. Insights into spared memory capacity in amnestic MCI and Alzheimer’s disease via minimal interference. Brain Cogn 2012;78(3):189-199.
8. Aurtenetxe S, García-Pacios J, Del Río D, et al. Interference impacts working memory in mild cognitive impairment. Front Neurosci 2016;10:443.
9. Curiel RE, Crocco E, Acevedo A, et al. A new scale for the evaluation of proactive and retroactive interference in mild cognitive impairment and early Alzheimer’s disease. J Aging Sci 2013;1(1):1-5.
10. Loewenstein DA, Curiel RE, Greig MT, et al. A novel cognitive stress test for the detection of preclinical Alzheimer disease: discriminative properties and relation to amyloid load. Am J Geriatr Psychiatry 2016;24(10):804-813.
11. Loewenstein DA, Curiel RE, DeKosky S, et al. Utilizing semantic intrusions to identify amyloid positivity in mild cognitive impairment. Neurology 2018;91(10):e976-e984.
12. Curiel Cid RE, Crocco EA, Duara R, et al. A novel method of evaluating semantic intrusion errors to distinguish between amyloid positive and negative groups on the Alzheimer’s disease continuum. J Psychiatr Res 2020;124:131-136.
13. Loewenstein D, Curiel RE, DeKosky S, et al. Recovery from proactive semantic interference and MRI volume: A replication and extension study. J Alzheimers Dis 2017a;59(1):131-139.
14. Loewenstein DA, Curiel RE, Wright C, et al. Recovery from proactive semantic interference in mild cognitive impairment and normal aging: Relationship to atrophy in brain regions vulnerable to Alzheimer’s disease. J Alzheimers Dis 2017b;56(3):1119-1126.
15. Sánchez SM, Abulafia C, Duarte-Abritta B, et al. Failure to recover from proactive semantic interference and abnormal limbic connectivity in asymptomatic, middle-aged offspring of patients with late-onset Alzheimer’s disease. J Alzheimers Dis 2017;60(3):1183-1193.
16. Matias-Guiu JA, Cabrera-Martín MN, Curiel RE, et al. Comparison between FCSRT and LASSI-L to detect early stage Alzheimer’s disease. J Alzheimers Dis 2018;61(1):103-111.
17. Morris JC. Clinical dementia rating: a reliable and valid diagnostic and staging measure for dementia of the Alzheimer type. Int Psychogeriatr 1997;9:173-176.
18. Hogervorst E, Combrinck M, Lapuerta P, et al. The Hopkins Verbal Learning Test and screening for dementia. Dement Geriatr Cogn Disord 2002;13:13–20.
19. Monsell SE, Dodge HH, Zhou XH, et al. Results from the NACC Uniform Data Set Neuropsychological Battery Crosswalk Study. Alzheimer Dis Assoc Disord 2016;30:134–139.
20. Malek-Ahmadi M, Small BJ, Raj A. The diagnostic value of controlled oral word association test-FAS and category fluency in single-domain amnestic mild cognitive impairment. Dement Geriatr Cogn Disord 2011;32:235–240.
21. Reitan RM. Validity of the Trail Making Test as an indicator of organic brain damage. Percept Mot Skills 1958;8:271-276.
22. Wechsler D. Wechsler Adult Intelligence Scale–Fourth Edition (WAIS–IV). 2014. Psychological Corporation, Texas.
23. Petersen RC, Caracciolo B, Brayne C, et al. Mild cognitive impairment: a concept in evolution. J Intern Med 2014;275(3):214-228.
24. Wilkinson GS, Robertson GJ. WRAT 4: Wide Range Achievement Test. 2006. Psychological Assessment Resources, Florida.
25. First MB, Williams JBW, Karg RS, et al. Structured Clinical Interview for DSM-5—Research Version (SCID-5 for DSM-5, Research Version; SCID-5-RV). 2015. American Psychiatric Association, Virginia.
26. Folstein MF, Folstein SE, McHugh PR. «Mini-mental state». A practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res 1975;12(3):189-198.
27. Ebert PL, Anderson ND. Proactive and retroactive interference in young adults, healthy older adults, and older adults with amnestic mild cognitive impairment. J Int Neuropsychol Soc 2009;15(1):83-93.
28. Wilson KE, Abulafia CA, Loewenstein DA, et al. Individual cognitive and depressive traits associated with maternal versus paternal family history of late-onset Alzheimer’s disease: proactive semantic interference versus standard neuropsychological assessments. J Pers Med Psychiatry 2018;11:1-6.
29. Ricci M, Graef S, Blundo C, et al. Using the Rey Auditory Verbal Learning Test (RAVLT) to differentiate Alzheimer’s dementia and behavioural variant fronto-temporal dementia. Clin Neuropsychol 2012;26(6):926-41.
30. Torres VL, Rosselli M, Loewenstein DA, et al. Types of errors on a semantic interference task in mild cognitive impairment and dementia. Neuropsychol 2019;33(5):670-684.



V. Bloniecki1,2, G. Hagman1,3, M. Ryden3, M. Kivipelto1,3,4,5,6


1. Department of Neurobiology, Caring Sciences and Society (NVS), Division of Clinical Geriatrics, Center for Alzheimer Research, Karolinska Institute, Stockholm, Sweden; 2. Dermato-Venereology Clinic, Karolinska University Hospital, Stockholm, Sweden; 3. Theme Aging, Karolinska University Hospital, Stockholm, Sweden; 4. Ageing and Epidemiology (AGE) Research Unit, School of Public Health, Imperial College London, London, UK; 5. Institute of Public Health and Clinical Nutrition, University of Eastern Finland, Kuopio, Finland; 6. Research and Development Unit, Stockholms Sjukhem, Stockholm, Sweden.

Corresponding Author: Victor Bloniecki, Karolinska Institute, Karolinska University Hospital, Eugeniavägen 3, SE-17176, Stockholm, Sweden. Tel.: +46 70-726 82 20; Email:

J Prev Alz Dis 2021;
Published online January 18, 2021,



Background: Due to an ageing demographic and rapid increase of cognitive impairment and dementia, combined with potential disease-modifying drugs and other interventions in the pipeline, there is a need for the development of accurate, accessible and efficient cognitive screening instruments, focused on early-stage detection of neurodegenerative disorders.
Objective: In this proof-of-concept report, we examine the validity of a newly developed digital cognitive test, the Geras Solutions Cognitive Test (GSCT), and compare its accuracy against the Montreal Cognitive Assessment (MoCA).
Methods: A total of 106 patients referred to the memory clinic at Karolinska University Hospital because of memory complaints were included. All patients were assessed for the presence of a neurodegenerative disorder in accordance with standard investigative procedures. 66% were diagnosed with subjective cognitive impairment (SCI), 25% with mild cognitive impairment (MCI) and 9% fulfilled criteria for dementia. All patients were administered both MoCA and GSCT. Descriptive statistics, specificity, sensitivity and ROC curves were established for both tests.
Results: Mean scores differed significantly between all diagnostic subgroups for both GSCT and MoCA (p<0.05). GSCT total test time differed significantly between all diagnostic subgroups (p<0.05). Overall, MoCA showed a sensitivity of 0.88 and specificity of 0.54 at a cut-off of <=26 while GSCT displayed 0.91 and 0.55 in sensitivity and specificity respectively at a cut-off of <=45.
Conclusion: This report suggests that GSCT is a viable cognitive screening instrument for both MCI and dementia.

Key words: Dementia, MCI, cognitive test, MoCA, e-medicine.


Dementia is currently a global driver of health care costs, and with an ageing demographic, the disease burden of neurodegenerative disorders will increase exponentially in the future. The prevalence is estimated to double every two decades, reaching approximately 80 million affected patients worldwide in 2030 (1). In 2016, the global costs associated with dementia were 948 billion US dollars and are currently projected to increase to 2 trillion US dollars by 2030, corresponding to roughly 2% of the world’s total current gross domestic product (GDP) (2, 3).
Dementia, or major neurocognitive disorder (MCD), is an umbrella term for neurodegenerative disorders typically characterized by memory dysfunction, with Alzheimer’s disease (AD) constituting approximately 60% of all cases. Other common forms of dementia include vascular dementia, Lewy body dementia and frontotemporal dementia. Modern diagnostic tools, such as various imaging modalities and cerebrospinal fluid biomarkers (4, 5), have improved our diagnostic accuracy substantially. These methods have also provided key insights into the pathological mechanisms associated with neurodegeneration and contributed to the development of concepts such as mild cognitive impairment (MCI) and “preclinical AD” (6, 7). Preclinical AD is defined by the presence of cerebral amyloid or tau pathology, identified by positron emission tomography (PET) imaging or cerebrospinal fluid (CSF) biomarkers, before the onset of clinical symptoms (8).
Nevertheless, assessment of cognitive functions, the primary clinical outcome of interest, still largely relies on analogue “pen and paper” tests administered to patients by health care providers (9). Although some regional differences exist, two of the best-known and most widely used cognitive tests are the Montreal Cognitive Assessment (MoCA) and the Mini-Mental State Examination (MMSE) (10, 11). Both tests assess various cognitive domains, with some inter-test differences, including orientation, memory, concentration, executive functions, language, and visuospatial abilities (9), with scores ranging from 0 to 30 points. Compared with the MMSE, which is mostly focused on memory deficits, the MoCA assesses more cognitive domains, thus increasing its diagnostic accuracy. Although optimal cut-off points vary somewhat between studies, scores lower than 26 on the MoCA and 24 on the MMSE are considered indicative of dementia (12–15). In a previous meta-analysis, the MoCA showed a sensitivity of 0.94 and a specificity of 0.60 at a cut-off of 26 points (16). This indicates a good ability to detect dementia, but at the cost of a high number of false positives. The MMSE has, in a meta-analysis, demonstrated a sensitivity of 0.85 and a specificity of 0.9 (14). However, the MMSE has limited value in distinguishing MCI and prodromal AD patients from healthy controls (17). In the setting of cognitive screening, a trade-off between sensitivity and specificity is nevertheless necessary, and screening instruments should favor sensitivity over specificity.
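The trade-off between sensitivity and specificity described above can be illustrated with a short Python sketch that applies two different cut-offs to the same scores. The scores are hypothetical values on a 0 to 30 scale and are not patient data. The function name and the convention that a score strictly below the cut-off counts as a positive screen are assumptions made for the example.

```python
def screen_metrics(impaired_scores, healthy_scores, cutoff):
    """Sensitivity and specificity when scores below `cutoff` screen positive."""
    true_positives = sum(score < cutoff for score in impaired_scores)
    true_negatives = sum(score >= cutoff for score in healthy_scores)
    sensitivity = true_positives / len(impaired_scores)
    specificity = true_negatives / len(healthy_scores)
    return sensitivity, specificity


# Hypothetical test scores (0-30 scale), invented for illustration only.
impaired = [18, 21, 23, 25, 27]
healthy = [24, 26, 27, 28, 29, 30]

for cutoff in (24, 26):
    sens, spec = screen_metrics(impaired, healthy, cutoff)
    print(f"cut-off {cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this toy example, raising the cut-off from 24 to 26 identifies one more impaired case but also flags one healthy individual, which is the pattern a screening instrument accepts when it favors sensitivity.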
Given the current scientific consensus that potential future disease-modifying drugs for AD need to be administered early on in the disease continuum, there is a clear need to develop accurate and widely available cognitive screening tests in order to facilitate early diagnosis of MCI patients in the future. In the European Union, there are currently approximately 20 million individuals over the age of 55 with MCI, most of whom have not undergone screening for cognitive impairment (18). A previous study investigating the treatment and diagnostic capacity of six European countries (France, Germany, Italy, Spain, Sweden, United Kingdom) estimated that over 1 million patients would progress from MCI to AD due to capacity constraints within current health care systems if a disease modifying treatment were to be available in 2020 (18). As such, digital cognitive screening instruments are likely to be a part of the diagnostic process in the future, especially when considering the advancement of digitalized health care in multiple facets of modern medicine (19).
Cognitive assessment instruments are available in different settings, including clinic-based and at-home testing (20, 21). Current cognitive evaluation methods include both pen-and-paper screening tools, the conventional method administered by a clinical neuropsychologist, and computerized cognitive tests (20, 21). Advances in technology have led clinical trials to move away from the conventional methods and adopt validated digital cognitive tools that are sensitive to cognitive changes in early prevention stages (20, 22). Computerized cognitive assessment tools offer several benefits over traditional instruments: they record accuracy and speed of response precisely, minimize floor and ceiling effects, and eliminate examiner bias by offering a standardized format (20–22). Computerized cognitive assessments may also save time and cost, as the test can be administered by the patient or by health care professionals other than a neuropsychologist, as long as an appropriate professional is responsible for test interpretation and diagnosis (20, 22). Thus, unmonitored digital tools provide the practical advantages of a reduced need for trained professionals, self-administration, automated test scoring and reporting, and ease of repeat adjustments, which enable large-scale screening (22, 23). On the other hand, cognitive assessment tools are typically administered to elderly people who might lack familiarity with digital tools, which can negatively affect their performance (22, 24). However, the attitudes and perceptions of patients using computerized cognitive assessments have been investigated in the elderly population, and individuals expressed a growing acceptance of computerized cognitive assessments and rated them as understandable, easy to use and more acceptable than pen-and-paper tests (20, 22). They also perceived them as having the potential to improve the quality of patient care and the relationship between patient and clinician when human intervention is involved (20).
Currently, a number of computerized screening instruments are available; they are either digital versions of existing standardized tests or new computerized tests and batteries for cognitive function assessment (25). The pen-and-paper version of the MoCA test was recently converted to an electronic version (eMoCA) (24). eMoCA was tested on a group of adults to compare its performance to MoCA, and most of the subjects performed comparably (24). For the detection of MCI, eMoCA (24, 25) and CogState (26) showed promising psychometric properties (25). The computerized test of Inoue (27), CogState (26) and CANS-MCI (28) showed good sensitivity in detecting AD (25). Unlike the other computerized cognitive screening tools, Geras Solutions is a comprehensive tool that provides, besides the cognitive test, a medical history questionnaire completed by the patient and a symptom survey completed by the patient’s relatives. Thus, it has the potential to save more time and cost than other digital assessment instruments by providing a more complete clinical evaluation.
The primary objective of this study is to investigate the accuracy and validity of a newly developed digital cognitive test (Geras Solutions Cognitive Test [GSCT]). The GSCT is a self-administered cognitive screening test provided by Geras Solutions predominantly based on MoCA. In this study, we intend to investigate the validity of GSCT, including psychometric properties, agreement with MoCA and diagnostic accuracy by establishing sensitivity, specificity, receiver operating characteristics (ROC), area under the curve values (AUC) and optimal cut-off levels, as well as compare performance with MoCA.


Materials and methods

Geras Solutions cognitive test

The GSCT is a newly developed digital screening tool for cognitive impairment and is included in the Geras Solutions APP (GSA). The screening tool was developed in collaboration with the research and clinical team at Theme Aging, Karolinska University Hospital, Solna memory clinic and Karolinska Institutet. GSCT is based on existing cognitive assessment methods (MoCA and MMSE) and includes additional proprietary tests developed at the memory clinic, Karolinska University Hospital, Stockholm, Sweden. The test is suitable for digital administration on devices running iOS and Android.
The test is composed of 16 different items assessing various aspects of cognition, developed in order to screen for cognitive deterioration in the setting of dementia and to ensure suitability for administration via mobile devices. The GSCT is scored from 0 to 59 points in total and has six main subdomains: memory (0-10 points), visuospatial abilities (0-11 points), executive functions (0-13 points), working memory (0-19 points), language (0-1 point) and orientation (0-5 points). Additionally, the time needed to complete the individual tasks is registered and presented as total test time and subdomain test time. The GSCT is automatically scored using a computer algorithm, and results are presented as the total score as well as subdomain scores. A detailed description of the GSCT test items and scoring is provided in supplement 1.
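The score structure described above can be sketched as a simple range check and sum. This is only an illustration of how the six subdomain ranges add up to the 59-point total; it is not the proprietary GSCT scoring algorithm, and the function name and the example scores are hypothetical.

```python
# Maximum points per GSCT subdomain, as listed in the text.
SUBDOMAIN_MAX = {
    "memory": 10,
    "visuospatial abilities": 11,
    "executive functions": 13,
    "working memory": 19,
    "language": 1,
    "orientation": 5,
}


def total_score(subdomain_scores):
    """Sum subdomain scores after checking each lies in its allowed range."""
    if set(subdomain_scores) != set(SUBDOMAIN_MAX):
        raise ValueError("expected exactly the six GSCT subdomains")
    for name, score in subdomain_scores.items():
        if not 0 <= score <= SUBDOMAIN_MAX[name]:
            raise ValueError(f"{name}: {score} outside 0-{SUBDOMAIN_MAX[name]}")
    return sum(subdomain_scores.values())


# The six maxima add up to the 59-point total stated in the text.
max_total = sum(SUBDOMAIN_MAX.values())

# Hypothetical subdomain scores for one test taker (not study data).
example = {
    "memory": 8,
    "visuospatial abilities": 9,
    "executive functions": 10,
    "working memory": 15,
    "language": 1,
    "orientation": 5,
}
print(f"total: {total_score(example)} of {max_total}")
```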


The included study population consisted of 106 patients referred to the memory clinic at Karolinska University Hospital, Solna, predominantly by primary care practitioners due to memory complaints and suspicion of cognitive decline. All patients referred to the clinic between January 2019 and January 2020 were asked to participate in the study. No exclusion criteria were established a priori. Patients who fulfilled the criteria for inclusion (i.e. referred for investigation of suspected dementia at the memory clinic and provided informed consent) were included in the study. A total of 106 patients accepted participation in the study. Five patients did not complete GSCT (two with MCI, two with subjective cognitive impairment [SCI] and one with dementia) and three patients displayed test scores with evident irregularities (one with MCI, one with SCI and one with dementia); these eight patients were excluded from the final analysis, leaving 98 complete cases. Irregularities included two patients who started the test multiple times and one patient with a congenital cognitive deficiency resulting in test scores more than 2 SD below the mean on both MoCA and GSCT.
All patients included in the study underwent the standard investigative procedure for dementia assessment as conducted at Karolinska University Hospital Memory Clinic. The investigative process is completed in its entirety in one week and includes brain imaging, lumbar puncture for analysis of CSF biomarkers and neuropsychological assessment with administration of several cognitive rating scales, including MoCA. Patients received a dementia or MCI diagnosis according to the ICD-10 classification, with the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) criteria used as clinical support (29). If no evidence of neurodegeneration was observed, patients were given an SCI diagnosis based on the ICD-10 classification (30). The final diagnosis was determined by a specialist in geriatric medicine. In parallel, patients who accepted inclusion in the study completed the self-administered GSCT during the investigative process. GSCT was, in all cases, administered after MoCA, but not on the same day. Patients were provided with a tablet and completed GSCT alone, with a health care provider nearby in case technical difficulties arose. The GSCT is self-administered and all test instructions are provided by the platform. The test is intended to be performed in a home environment without any assistance. Information regarding patients’ GSCT scores, MoCA scores, age, gender and final diagnosis was collected for statistical analysis. All included patients provided informed consent, and the study was approved by the Regional Ethics Committee of Karolinska Institute, Stockholm, Sweden. Registration number: 2018/998-31/1.
The mean age of the included population (n=98) was 58 years. Five patients were below 50 years of age (5%), 58 patients were between 50 and 60 years of age (59%), 34 patients were between 61 and 70 years of age (35%) and one patient was over 70 years of age (1%). Altogether, 67% (n=65) of the patients were assessed as having no signs of a neurodegenerative disorder and were diagnosed with SCI; 24% were diagnosed with MCI, and 9% received a dementia diagnosis. The dementia group consisted of 8 patients with AD and 1 patient with vascular dementia.


All statistical analyses were done using Statistica software (version 13). Baseline descriptive characteristics were calculated and are provided in Table 1. The rating scales (GSCT and MoCA) were treated as both continuous and dichotomous variables when identifying optimal cut-off levels based on sensitivity and specificity analysis. Both parametric and non-parametric tests were used to validate findings, and discrepancies between them are reported where seen. Agreement between test measures was analyzed using the standardized concordance correlation coefficient and a Bland-Altman plot. Association between GSCT and MoCA was assessed using Pearson correlation. The internal consistency of GSCT was analyzed using Cronbach’s alpha index.
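As a rough illustration of the agreement analysis described above, the sketch below computes Lin's concordance correlation coefficient on z-standardized scores and Bland-Altman limits (mean difference ± 2 SD). The score lists are hypothetical, the function names are ours, and the exact formulas used by Statistica may differ in detail (for example in the variance denominator).

```python
from statistics import mean, pstdev, stdev


def zscores(values):
    """Standardize values to mean 0 and (population) SD 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def concordance_cc(x, y):
    """Lin's concordance correlation coefficient using population moments."""
    mx, my = mean(x), mean(y)
    sxy = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    sx2 = mean([(a - mx) ** 2 for a in x])
    sy2 = mean([(b - my) ** 2 for b in y])
    return 2 * sxy / (sx2 + sy2 + (mx - my) ** 2)


def bland_altman_limits(x, y, k=2.0):
    """Return (lower limit, mean difference, upper limit) for x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    centre, spread = mean(diffs), stdev(diffs)
    return centre - k * spread, centre, centre + k * spread


# Hypothetical paired scores, for illustration only (not study data).
moca = [27, 25, 22, 29, 18, 24, 26, 20]
gsct = [47, 42, 36, 50, 27, 40, 44, 33]

z_moca, z_gsct = zscores(moca), zscores(gsct)
ccc = concordance_cc(z_moca, z_gsct)
lower, centre, upper = bland_altman_limits(z_moca, z_gsct)
inside = sum(lower <= a - b <= upper for a, b in zip(z_moca, z_gsct))
print(f"CCC = {ccc:.2f}; {inside}/{len(moca)} differences within limits")
```

Once both scales are standardized to mean 0 and variance 1, the location and scale terms in the concordance coefficient drop out and it reduces to the Pearson correlation. Standardization is needed here because the two tests use different score ranges (0-59 and 0-30).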
ANOVA was used to assess differences in cognitive test scores between diagnostic subgroups. Post-hoc analysis was conducted using Fisher’s Least Significant Difference (LSD) method. Logistic regression on total test scores was performed in order to compare odds ratios between the tests.
Validation of GSCT total score against MoCA required the following to be established or calculated: ROC curves, area under the curve (AUC) values with 95% confidence intervals and sensitivity/specificity levels. Analyses were performed to estimate optimal cut-off values based on the best-compiled outcome from a range of sensitivity and specificity levels when testing the continuous scale against a dichotomous reference classification (SCI vs dementia and SCI vs MCI). Adjustment for multiple comparisons was done using the false discovery rate (FDR) method, and the presented p-values are FDR-adjusted. An adjusted p-value of <0.05 was defined as statistically significant.
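The text does not state which FDR procedure was applied. As a sketch, assuming the commonly used Benjamini-Hochberg step-up procedure, adjusted p-values can be computed as follows; the raw p-values are hypothetical and the function name is ours.

```python
def fdr_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from five comparisons (not study data).
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adj = fdr_adjust(raw)
significant = [p < 0.05 for p in adj]
print([round(p, 5) for p in adj], significant)
```

In this made-up example, the two raw p-values just under 0.05 no longer pass the threshold after adjustment, which illustrates why the adjusted rather than the raw values are the ones to compare against 0.05.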



Baseline data, psychometric properties and normative data

Baseline patient characteristics, including cognitive test scores, are provided in Table 1. The mean score for GSCT was 45 points in the SCI group, 36 points in the MCI group and 28 points in patients with dementia.

Table 1. Descriptive statistics

Descriptive data and test scores. Values are shown as means, standard deviations and minimum/maximum; A. p<0.05 compared to SCI. B. p<0.05 compared to MCI. C. p<0.05 compared to dementia; SCI = Subjective cognitive impairment. MCI = Mild cognitive impairment. GSCT = Geras Solutions cognitive test. MoCA = Montreal Cognitive Assessment.

Figure 1. Bland-Altman plot of standardized test scores

X-axis = mean of MoCA and GSCT. Y-axis = difference between MoCA and GSCT.

GSCT and MoCA scores were strongly correlated (r(96) = 0.82, p < 0.01). The standardized concordance correlation coefficient between GSCT and MoCA was 0.82, indicating a high level of agreement. Agreement between the GSCT and MoCA was also analyzed using a Bland-Altman plot with standardized values, showing that 97% of data points lie within ±2 SD of the mean difference (Figure 1). Estimation of the internal consistency of GSCT showed a standardized Cronbach’s alpha of α = 0.87.
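To give a feel for what a standardized alpha of 0.87 implies, the sketch below uses the textbook formula for standardized Cronbach's alpha, α = k·r̄ / (1 + (k − 1)·r̄), where k is the number of items and r̄ the mean inter-item correlation. It assumes that alpha was computed over the 16 test items, which the text does not state explicitly; the function names are ours.

```python
def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha from k items and mean inter-item correlation."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def mean_r_for_alpha(k, alpha):
    """Invert the formula: mean inter-item correlation implied by a given alpha."""
    return alpha / (k - alpha * (k - 1))


K_ITEMS = 16           # number of GSCT items (assumed unit of analysis)
REPORTED_ALPHA = 0.87  # standardized alpha reported in the text

implied_r = mean_r_for_alpha(K_ITEMS, REPORTED_ALPHA)
print(f"implied mean inter-item correlation: {implied_r:.2f}")
```

Under that assumption, the reported alpha corresponds to a mean inter-item correlation of roughly 0.29. Alpha rises with the number of items, so a high alpha does not by itself mean that individual items are strongly correlated.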
Age was not significantly correlated with GSCT scores (r = -0.16, p = 0.1). Diagnostic subgroup was significantly associated with age (F(2, 95) = 4.8, p = 0.02), with post-hoc tests showing a significant difference between dementia and SCI patients (mean 63 vs 57 years, p = 0.01) but not between SCI and MCI patients (mean 57 vs 60 years, p = 0.08) or MCI and dementia patients (mean 60 vs 63 years, p = 0.2). No differences in GSCT scores were observed between genders (t(96) = -0.3, p = 0.74), with males having a mean score of 41 points and females 40.4. Finally, age, gender and education were included in an ANCOVA, showing that education (F(1, 93) = 5.4, p = 0.03) was significantly associated with GSCT scores, in contrast to age (F(1, 93) = 2.9, p = 0.1) and gender (F(1, 93) = 0.74, p = 0.4). Patients with more than 12 years of education showed higher mean test scores than patients with 12 years or less (mean 42.2 vs 37.6 points, p = 0.05). GSCT total test time differed significantly between diagnostic subgroups (F(2, 95) = 36.4, p < 0.01) (Figure 2). Post-hoc tests showed that the differences in mean test time were significant between all three subgroups, with SCI patients showing a mean test time of 1057 seconds compared to 1296 and 2065 seconds for MCI and dementia patients, respectively (SCI vs MCI, p < 0.01; SCI vs dementia, p < 0.01; MCI vs dementia, p < 0.01).

Figure 2. Differences in GSCT test time depending on diagnosis

Mean GSCT total test time and a 95% confidence interval for patients with SCI, MCI and Dementia. p<0.05 between all subgroups.


Between-group differences in GSCT and MoCA

Average GSCT scores differed significantly depending on diagnostic subgroup (F(2, 95) = 20.3, p < 0.01). Post-hoc tests showed that the differences in mean scores were significant between all three subgroups (SCI vs MCI, p < 0.01) (SCI vs dementia, p < 0.01) (MCI vs dementia, p = 0.02).
Mean MoCA scores were also significantly different depending on diagnosis (F(2, 95) = 29.5, p < 0.01) and the mean scores were significantly different for all three subgroups (SCI vs MCI, p < 0.01) (SCI vs dementia, p < 0.01) (MCI vs dementia, p < 0.01) (Table 1).

Box plots of test scores for both GSCT and MoCA categorized by diagnosis are shown in Figure 3. Odds ratios were calculated, showing that a one-point increase on the GSCT multiplied the odds of being healthy by 1.15 (95% CI 1.07-1.22), while a one-point increase on MoCA multiplied the odds by 1.47 (95% CI 1.22-1.76).
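Because the two tests are scored on different ranges (0-59 for GSCT, 0-30 for MoCA), their per-point odds ratios are not directly comparable. The sketch below shows how a per-point odds ratio from a logistic model scales to a larger score difference (the odds ratio is raised to the power of the difference). The rescaling by score range is only an illustrative assumption of ours, not an analysis from the study; a per-SD comparison would be the more usual choice but requires the standard deviations in Table 1.

```python
import math


def scale_odds_ratio(or_per_point, points):
    """Odds ratio implied by a change of `points` on the same scale."""
    return math.exp(points * math.log(or_per_point))


GSCT_OR, MOCA_OR = 1.15, 1.47  # per-point odds ratios reported in the text
GSCT_MAX, MOCA_MAX = 59, 30    # maximum total scores of the two scales

# Odds ratio for a five-point difference on the GSCT.
gsct_5pt = scale_odds_ratio(GSCT_OR, 5)

# Illustrative assumption: treat one MoCA point as the same fraction of the
# scale as 59/30 GSCT points, and express the GSCT odds ratio on that basis.
gsct_per_moca_point = scale_odds_ratio(GSCT_OR, GSCT_MAX / MOCA_MAX)

print(f"GSCT, 5 points: {gsct_5pt:.2f}")
print(f"GSCT per MoCA-equivalent point: {gsct_per_moca_point:.2f} vs MoCA {MOCA_OR}")
```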

Figure 3. Boxplots showing differences in test scores depending on diagnosis

Median GSCT and MoCA scores are represented by small squares. Larger squares represent interquartile range while whiskers show non-outlier range.


Accuracy and comparison with MoCA

GSCT showed very good to excellent discriminative properties at a wide range of cut-off values. When all patients were included, with MCI and dementia patients coded together in a binary classification of healthy versus cognitively impaired, GSCT total score displayed an AUC value of 0.80 (95% CI [0.70-0.90]), and MoCA likewise showed an AUC value of 0.80 (95% CI [0.70-0.90]). MoCA showed a sensitivity of 0.88 and a specificity of 0.54 at a cut-off of <=26, while GSCT total score displayed a sensitivity of 0.91 and a specificity of 0.55 at a cut-off of <=45. Figure 4 shows the respective ROC curves and Table 2 presents the summary statistics.
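The quantities reported here can be sketched in a few lines. The example below computes the AUC as the probability that a healthy person outscores an impaired one, sensitivity and specificity at a given cut-off (scores at or below the cut-off count as impaired), and a cut-off chosen by Youden's J. Youden's J is our stand-in for the study's "best-compiled outcome" criterion, which the text does not define precisely; the scores are hypothetical and the function names are ours.

```python
def auc(impaired, healthy):
    """AUC as P(healthy score > impaired score), ties counted as one half."""
    wins = 0.0
    for h in healthy:
        for i in impaired:
            if h > i:
                wins += 1.0
            elif h == i:
                wins += 0.5
    return wins / (len(healthy) * len(impaired))


def sens_spec(impaired, healthy, cutoff):
    """Scores <= cutoff are classified as impaired."""
    sens = sum(s <= cutoff for s in impaired) / len(impaired)
    spec = sum(s > cutoff for s in healthy) / len(healthy)
    return sens, spec


def best_cutoff(impaired, healthy):
    """Cut-off maximising Youden's J = sensitivity + specificity - 1."""
    candidates = sorted(set(impaired) | set(healthy))
    return max(candidates, key=lambda c: sum(sens_spec(impaired, healthy, c)) - 1)


# Hypothetical total scores, for illustration only (not study data).
impaired_scores = [28, 33, 36, 38, 41, 44, 46]
healthy_scores = [39, 43, 45, 47, 48, 50, 52, 54]

area = auc(impaired_scores, healthy_scores)
sens, spec = sens_spec(impaired_scores, healthy_scores, cutoff=45)
chosen = best_cutoff(impaired_scores, healthy_scores)
print(f"AUC = {area:.2f}; at <=45: sens = {sens:.2f}, spec = {spec:.2f}; Youden cut-off = {chosen}")
```

A screening tool may deliberately use a cut-off with higher sensitivity than the Youden optimum, accepting lower specificity in exchange.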

Figure 4. Comparison of ROC curves between cognitive tests

Receiver operating characteristic curves for GSCT and MoCA. Top left: SCI vs (MCI + dementia). Top right: SCI vs MCI. Bottom left: SCI vs dementia.

When assessing the accuracy in discriminating between SCI and MCI patients, GSCT showed an AUC value of 0.74 with 95% CI [0.62-0.85], whereas MoCA showed an AUC value of 0.74 with 95% CI [0.61-0.85]. Sensitivity and specificity at a cut-off level of <=45 were 0.88 and 0.55, respectively, for GSCT total score, whereas MoCA, at the traditional cut-off of <=26, displayed a sensitivity of 0.83 and a specificity of 0.54 (Table 2). Both tests were excellent at discriminating dementia patients from SCI patients. GSCT showed an AUC of 0.96 with 95% CI [0.92-1.0], while MoCA had an AUC of 0.98 with 95% CI [0.95-1.0]. At the traditional MoCA cut-off of <=26, sensitivity and specificity were 1 and 0.54, respectively, whereas GSCT at a cut-off of <=35.5 showed a sensitivity of 1 and a specificity of 0.9. As seen in Figure 5, both tests show good capabilities in discriminating between the diagnostic subgroups in this sample, although some overlap between MCI and SCI patients existed for both tests. GSCT was marginally better at discriminating MCI from SCI patients than MoCA. No patients with dementia scored within the normal range on either test.

Figure 5. Scatterplot of cognitive test scores depending on diagnosis

Scatter plot of GSCT and MoCA categorized by diagnosis. Marked lines represent cut-off points.

Table 2. Summary of accuracy for both tests

Summary statistics ROC



In this study, we present the first results on a newly developed digital cognitive test provided by Geras Solutions. GSCT displayed good agreement with MoCA based on concordance correlation analysis and a Bland-Altman plot, indicating that both tests measure similar cognitive domains. Additionally, the influence of age, gender and education was analyzed, showing that education, but not age or gender, affected test scores. Individuals with more than 12 years of education had higher mean GSCT scores than individuals with 12 years or less of education, which provides valuable information for interpreting scores in different demographic groups. GSCT showed discriminative properties equal to those of the MoCA test. Both tests were excellent at discriminating dementia patients from SCI patients, with a sensitivity of 1 for both GSCT and MoCA and a specificity of 0.9 and 0.54, respectively. This result is similar to the discriminative capabilities of other digital cognitive tests, which show sensitivity and specificity scores ranging from 0.85-1 and 0.81-1, respectively (31–33). Both tests also showed similar capabilities when discriminating between SCI and MCI patients, with AUC scores of 0.74. In this study, GSCT was slightly better at correctly identifying cognitive deterioration in MCI patients, with a sensitivity of 0.88 compared to 0.83 for MoCA, while both tests showed similar specificity of 0.55 and 0.54, respectively. The GSCT showed somewhat better sensitivity in detecting MCI patients than other digital screening tools, such as CogState, for which sensitivity scores between 0.63 and 0.84 have previously been reported, although those tests demonstrated higher specificity (31, 33, 34). Since GSCT is intended as a screening tool used early in the diagnostic process, we believe that high sensitivity is the more important property, even at the cost of lower specificity.
Both tests demonstrated significant differences in mean test scores between all diagnostic subgroups. Additionally, the total GSCT time was also significantly different between all subgroups providing further valuable clinical information as compared to current paper and pen based cognitive screening instruments. GSCT showed very good internal consistency (α = 0.87). Based on this study, we suggest a cut-off level of <=45 for detection of MCI while values <=35.5 indicate manifest dementia.

Overall, GSCT performed at least as well as a currently available screening tool for dementia disorders (MoCA) while simultaneously providing several advantages. First, the test is administered via a digital device, thus eliminating the time-consuming need for testing by health care practitioners while also increasing the availability of cognitive screening. Given the estimated increase in dementia prevalence combined with possible disease-modifying drugs, there is an urgent need for increased accessibility. Additionally, the digital set-up of the test eliminates administration bias from health care providers and creates a more homogeneous diagnostic tool. However, future studies are needed to test the device in a setting without health care providers nearby. Furthermore, the possibility of registering total and domain-specific test time may provide valuable clinical information, potentially increasing the diagnostic capabilities, a hypothesis that needs further testing in future research. Given current trends, the development of an effective and accurate digital screening tool for cognitive impairment is of the utmost importance. Given a sufficiently accurate test, patients scoring in the normal range would not need to undergo further examination in the hospital setting. Instead, this digital screening instrument could identify the individuals in need of expanded testing, e.g. MRI, CSF analysis and detailed neuropsychological testing, thus saving resources for the health care system and directing interventions to those in need.


In this initial study we were not able to include healthy subjects. Instead, SCI patients were used as “healthy controls”. Although these patients have a self-reported presence of cognitive dysfunction, no objective findings for the presence of an ongoing neurodegenerative process could be identified. Future studies should include healthy patients without any subjective symptoms. Additionally, future larger normative studies are required to investigate how factors such as age, gender and education affect GSCT performance in order to increase validity and diagnostic accuracy. Another limitation of the test is the lack of information regarding test-retest reliability. In this preliminary trial, we were unable to obtain longitudinal data thus hindering such analysis. Future studies must include longitudinal measurements in order to determine the test-retest reliability of GSCT.
Another limitation of this study is the small sample size, especially in the MCI and dementia subgroups. These findings should be interpreted with caution, and future studies including more patients with MCI and dementia disorders are necessary to improve the accuracy of the test. Although the small sample size increases the risk of type 2 errors, we found significant differences between all groups in mean GSCT scores, further supporting the robustness of the findings. Continuous collection of data from new individuals will improve test performance and provide normative information. Another limitation is that patients were administered GSCT during the same week as MoCA, which could potentially generate practice effects. Furthermore, all testing in the study was conducted in Swedish and all included patients were living in close proximity to Stockholm, Sweden. Thus, there may be a bias in the selection of the study population, and future studies should investigate whether GSCT scores are affected by regional differences as well as examine the suitability of different language versions in order to improve accessibility.



Overall, the Geras Solutions Cognitive Test performed very well with diagnostic capabilities equal to MoCA when tested on this study population.
This report suggests that GSCT could be a viable cognitive screening instrument for both MCI and dementia. Continued testing, the collection of normative data and test-retest reliability analysis are needed to improve the validity and diagnostic accuracy of the test. Additionally, future studies should explore the diagnostic value of total test time as well as item-specific test time.

Funding: Theme Aging Research Unit had a research collaboration with Geras Solutions during the study, and a grant from Geras Solutions was provided to support conducting the study. The study was conducted independently at the memory clinic, Karolinska University Hospital, and the funding organizations were not involved in the analyses or writing. Other research support: Joint Program of Neurodegenerative Disorders, IMI, Knut and Alice Wallenberg Foundation, Center for Innovative Medicine (CIMED), Stiftelsen Stockholms sjukhem, Konung Gustaf V:s och Drottning Victorias Frimurarstiftelse, Alzheimer’s Research and Prevention Foundation, Alzheimerfonden, Region Stockholm (ALF and NSV grants). Advisory board (MK): Geras Solutions, Combinostics, Roche. GH: Advisory board: Geras Solutions. VB: Consultant for Geras Solutions.

Conflict of Interest: MK: Advisory board: Combinostics, Roche; GH: Advisory board: Geras Solutions; VB: Consultant for Geras Solutions.

Ethical Standards: The study was approved by the Regional Ethics Committee of Karolinska Institute, Stockholm, Sweden. Registration number: 2018/998-31/1.

Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, duplication, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.



1. Prince M, Wimo A, Guerchet M, et al. World Alzheimer Report. The global impact of dementia. An analysis of prevalence, incidence, cost and trends. Alzheimer’s Disease International, London; 2015.
2. Wimo A, Guerchet M, Ali GC, Wu YT, Prina AM, Winblad B, et al. The worldwide costs of dementia 2015 and comparisons with 2010. Alzheimer’s Dement. 2017;13(1):1–7.
3. Xu J, Zhang Y, Qiu C, Cheng F. Global and regional economic costs of dementia: a systematic review. Lancet. 2017;390:S47.
4. Blennow K, Zetterberg H. Biomarkers for Alzheimer’s disease: current status and prospects for the future. J Intern Med. 2018;284(6):643–63.
5. Jack CR, Bennett DA, Blennow K, Carrillo MC, Dunn B, Haeberlein SB, et al. NIA-AA Research Framework: Toward a biological definition of Alzheimer’s disease. Alzheimer’s Dement. 2018;14(1):535–62.
6. Lopez OL. Mild cognitive impairment. Continuum (Minneap Minn). 2013;19(2 Dementia):411–24.
7. Sperling RA, Aisen PS, Beckett LA, Bennett DA, Craft S, Fagan AM, et al. Toward defining the preclinical stages of Alzheimer’s disease: Recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimer’s Dement. 2011;7(3):280–92.
8. Dubois B, Hampel H, Feldman HH, Scheltens P, Aisen P, Andrieu S, et al. Preclinical Alzheimer’s disease: Definition, natural history, and diagnostic criteria. Vol. 12, Alzheimer’s and Dementia. Elsevier Inc.; 2016. p. 292–323.
9. Sheehan B. Assessment scales in dementia. Ther Adv Neurol Disord. 2012;5(6):349–58.
10. Nasreddine ZS, Phillips NA, Bédirian V, Charbonneau S, Whitehead V, Collin I, et al. The Montreal Cognitive Assessment, MoCA: A Brief Screening Tool For Mild Cognitive Impairment. J Am Geriatr Soc. 2005;53(4):695–9.
11. Folstein MF, Folstein SE, McHugh PR. “Mini-mental state”. A practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res. 1975;12(3):189–98.
12. Carson N, Leach L, Murphy KJ. A re-examination of Montreal Cognitive Assessment (MoCA) cutoff scores. Int J Geriatr Psychiatry. 2018;33(2):379–88.
13. Milani SA, Marsiske M, Cottler LB, Chen X, Striley CW. Optimal cutoffs for the Montreal Cognitive Assessment vary by race and ethnicity. Alzheimer’s Dement Diagnosis, Assess Dis Monit. 2018;10:773–81.
14. Creavin ST, Wisniewski S, Noel-Storr AH, Trevelyan CM, Hampton T, Rayment D, et al. Mini-Mental State Examination (MMSE) for the detection of dementia in clinically unevaluated people aged 65 and over in community and primary care populations. Cochrane Database Syst Rev. 2016;2016(1):CD011145.
15. O’Bryant SE, Humphreys JD, Smith GE, Ivnik RJ, Graff-Radford NR, Petersen RC, et al. Detecting dementia with the mini-mental state examination in highly educated individuals. Arch Neurol. 2008;65(7):963–7.
16. Davis DH, Creavin ST, Yip JL, Noel-Storr AH, Brayne C, Cullum S. Montreal Cognitive Assessment for the diagnosis of Alzheimer’s disease and other dementias. Cochrane Database Syst Rev. 2015;
17. Mitchell AJ. A meta-analysis of the accuracy of the mini-mental state examination in the detection of dementia and mild cognitive impairment. J Psychiatr Res. 2009;43(4):411–31.
18. Hlavka JP, Mattke S, Liu JL. Assessing the Preparedness of the Health Care System Infrastructure in Six European Countries for an Alzheimer’s Treatment. Rand Heal Q. 2019;8(3):2.
19. Meskó B, Drobni Z, Bényei É, Gergely B, Győrffy Z. Digital health is a cultural transformation of traditional healthcare. mHealth. 2017;3:38–38.
20. Robillard JM, Lai JA, Wu JM, Feng TL, Hayden S. Patient perspectives of the experience of a computerized cognitive assessment in a clinical setting. Alzheimer’s Dement Transl Res Clin Interv. 2018;
21. Kim H, Hsiao CP, Do EYL. Home-based computerized cognitive assessment tool for dementia screening. J Ambient Intell Smart Environ. 2012;
22. Wild K, Howieson D, Webbe F, Seelye A, Kaye J. Status of computerized cognitive testing in aging: A systematic review. Alzheimer’s and Dementia. 2008.
23. Morrison RL, Pei H, Novak G, Kaufer DI, Welsh-Bohmer KA, Ruhmel S, et al. A computerized, self-administered test of verbal episodic memory in elderly patients with mild cognitive impairment and healthy participants: A randomized, crossover, validation study. Alzheimer’s Dement Diagnosis, Assess Dis Monit. 2018;
24. Berg JL, Durant J, Léger GC, Cummings JL, Nasreddine Z, Miller JB. Comparing the Electronic and Standard Versions of the Montreal Cognitive Assessment in an Outpatient Memory Disorders Clinic: A Validation Study. J Alzheimer’s Dis. 2018;
25. De Roeck EE, De Deyn PP, Dierckx E, Engelborghs S. Brief cognitive screening instruments for early detection of Alzheimer’s disease: A systematic review. Alzheimer’s Research and Therapy. 2019.
26. Maruff P, Lim YY, Darby D, Ellis KA, Pietrzak RH, Snyder PJ, et al. Clinical utility of the cogstate brief battery in identifying cognitive impairment in mild cognitive impairment and Alzheimer’s disease. BMC Psychol. 2013;
27. Inoue M, Jinbo D, Nakamura Y, Taniguchi M, Urakami K. Development and evaluation of a computerized test battery for Alzheimer’s disease screening in community-based settings. Am J Alzheimers Dis Other Demen. 2009;
28. Memória CM, Yassuda MS, Nakano EY, Forlenza O V. Contributions of the Computer-Administered Neuropsychological Screen for Mild Cognitive Impairment (CANS-MCI) for the diagnosis of MCI in Brazil. Int Psychogeriatrics. 2014;
29. Diagnostic and statistical manual of mental disorders : DSM-5 [Internet]. Fifth edition. Arlington, VA : American Psychiatric Publishing, [2013] ©2013;
30. The ICD-10 Classification of Mental and Behavioural Disorders Clinical descriptions and diagnostic guidelines World Health Organization.
31. Scharre DW, Chang SI, Nagaraja HN, Vrettos NE, Bornstein RA. Digitally translated Self-Administered Gerocognitive Examination (eSAGE): Relationship with its validated paper version, neuropsychological evaluations, and clinical assessments. Alzheimer’s Res Ther. 2017;9(1).
32. Onoda K, Yamaguchi S. Revision of the cognitive assessment for dementia, iPad version (CADi2). PLoS One. 2014;9(10).
33. Possin KL, Moskowitz T, Erlhoff SJ, Rogers KM, Johnson ET, Steele NZR, et al. The Brain Health Assessment for Detecting and Diagnosing Neurocognitive Disorders. J Am Geriatr Soc. 2018;66(1):150–6.
34. de Jager CA, Schrijnemaekers ACMC, Honey TEM, Budge MM. Detection of MCI in the clinic: Evaluation of the sensitivity and specificity of a computerised test battery, the Hopkins Verbal Learning Test and the MMSE. Age Ageing. 2009;38(4):455–60.


R.E. Curiel Cid1, E.A. Crocco1, M. Kitaigorodsky1, L. Beaufils2, P.A. Peña2, G. Grau1, U. Visser2, D.A. Loewenstein1

1. Center for Cognitive Neuroscience and Aging, Department of Psychiatry and Behavioral Sciences, University of Miami Miller School of Medicine, 1695 NW 9th Avenue, Miami, Florida, 33136. U.S.A; 2. Department of Computer Science, University of Miami, 1365 Memorial Drive, Coral Gables, Florida 33146, U.S.A.

Corresponding Author: Rosie E. Curiel, Psy.D., Associate Professor of Psychiatry & Behavioral Sciences, University of Miami Miller School of Medicine, 1695 NW 9th Avenue, Suite 3202, Miami, FL 33136.

J Prev Alz Dis 2021;
Published online January 19, 2021,



Background: The Loewenstein-Acevedo Scales for Semantic Interference and Learning (LASSI-L) is a novel and increasingly employed instrument that has outperformed widely used cognitive measures as an early correlate of elevated brain amyloid and neurodegeneration in prodromal Alzheimer’s Disease (AD). The LASSI-L has distinguished those with amnestic mild cognitive impairment (aMCI) and high amyloid load from aMCI attributable to other non-AD conditions. The authors designed and implemented a web-based brief computerized version of the instrument, the LASSI-BC, to improve standardized administration, facilitate scoring accuracy and real-time data entry, and increase the accessibility of the measure.
Objective: The psychometric properties and clinical utility of the brief computerized version of the LASSI-L were evaluated, together with its ability to differentiate older adults who are cognitively normal (CN) from those with amnestic Mild Cognitive Impairment (aMCI).
Methods: After undergoing a comprehensive uniform clinical and neuropsychological evaluation using traditional measures, older adults were classified as cognitively normal or diagnosed with aMCI. All participants were administered the LASSI-BC, a computerized version of the LASSI-L. Test-retest reliability and discriminant validity were assessed for each LASSI-BC subscale.
Results: LASSI-BC subscales demonstrated high test-retest reliability, and discriminant validity was attained.
Conclusions: The LASSI-BC, a brief computerized version of the LASSI-L, is a valid and useful cognitive tool for the detection of aMCI among older adults.

Key words: Computerized test, mild cognitive impairment, Alzheimer’s disease, semantic intrusion errors, semantic interference, clinical trials.



Alzheimer’s disease (AD) is a devastating condition that is expected to significantly impact the rapidly aging population. Important advancements have been made to identify novel candidate biomarkers of AD, and a research framework to stage the disease from its preclinical stage onward has been proposed, with the aim of establishing a biological definition of the disease (1). Despite these formidable advances, neuropsychological assessment remains an essential component of the evaluative process because cognitive impairment is a fundamental defining symptom of AD that emerges early, at a certain point in the transition from the preclinical to clinically symptomatic stages of the disease. Further, cognitive changes are used to detect and track disease progression over time, and a measurable change in cognitive ability represents a potentially meaningful clinical outcome (2). Thus, the identification of cognitive markers that are sensitive to early disease states and converge with biological markers of AD pathology has become increasingly necessary in terms of identifying individuals at risk, monitoring disease progression, and ascertaining treatment efficacy (3).
Traditional paper-and-pencil cognitive measures employed for the detection of AD-related Mild Cognitive Impairment (MCI) are often insensitive to the subtle cognitive changes that occur during preclinical or prodromal disease states (5, 6). There is a developing body of literature, however, showing that cognitive stress paradigms can measure subtle deficiencies that are highly implicated in early AD disease states among older adults. One such paradigm that measures semantic interference in memory, the Loewenstein-Acevedo Scales for Semantic Interference and Learning (LASSI-L), was sensitive enough to differentiate older adults who are cognitively unimpaired from those with subjective memory complaints and early amnestic MCI (7, 8). On this memory measure, proactive semantic interference (PSI) deficits and, particularly, the inability to recover from PSI (frPSI) were also highly associated with brain amyloid load in older adults with otherwise normal performance on a traditional battery of cognitive tests (9). The LASSI-L has outperformed other widely used memory measures in detecting prodromal AD in both English and Spanish (10, 11), and has been found to be useful in different cultural/language groups (7, 11, 12). In addition to measuring the total number of correct targets recalled on individual LASSI-L subscales, there is evidence that semantic intrusion errors may have specific utility in the assessment of prodromal AD. Loewenstein and colleagues (4) found that semantic intrusion errors sensitive to PSI and frPSI on the LASSI-L could differentiate amyloid positive aMCI groups from amyloid negative aMCI groups with non-AD diagnoses.
While it is recognized that intrusion errors represent early manifestations of neurodegenerative brain disease, a potential limitation of previous approaches is that the number of intrusion errors is often highly dependent on an individual’s total responses on a particular trial. For example, an individual may make a minimal number of intrusion errors on a given trial, which may appear to be clinically insignificant. However, if the number of total responses is low, even a modest number of intrusion errors may indicate impaired inhibitory processes and underlying brain pathology. As a result, we recently developed a novel method to evaluate semantic intrusion errors utilizing the percentage of intrusion errors (PIE) in relation to total responses, that is, intrusion errors plus correct responses (13). PIE demonstrated high levels of sensitivity and specificity in differentiating CN from amyloid positive persons with preclinical AD, and preliminary work suggests that it is a novel and sensitive index of early memory dysfunction (11, 13).
Traditional paper-and-pencil neuropsychological assessments are lengthy, require a skilled examiner, are vulnerable to human error in administration and scoring, and are associated with practice effects. Moreover, some of these measures have been found to be biased among diverse ethnic/cultural and language groups. To address some of these concerns, computerized testing batteries have been developed to mitigate some of the abovementioned limitations (14-17). However, these too have limitations in early detection of AD-associated cognitive impairment. For example, many of these computerized batteries are relatively successful at distinguishing between older adults with normal cognition and those with dementia or late stage MCI, but lack the predictive power needed to move the field forward, namely to correctly classify individuals with MCI and/or earlier on the disease continuum, and to do so in a manner that is validated for use among different ethnic/cultural and language groups. This highlights a major problem with many traditional computerized batteries: they are automated versions of traditional paper-and-pencil cognitive assessment paradigms that lack sensitivity to detect AD-associated cognitive decline, and employ the same paradigms originally developed for the assessment of dementia or traumatic brain injury (17).
Recent work by Curiel and associates (5-12) led to the development of a brief computerized version of the LASSI-L, the LASSI-BC, which incorporates all the elements of this well-established cognitive stress test. The LASSI-BC is currently being studied extensively in a longitudinal study of at-risk aging adults. This novel computerized version of the instrument does not require a skilled examiner, is web-based, and can run remotely on most browser-capable devices. Moreover, it is intuitive and appropriate for use among older adults who are either predominantly English- or Spanish-speaking and who have varying ethnic/cultural backgrounds, including Hispanics and African Americans.
In this first validation study, we examine the psychometric properties of the LASSI-BC. We also evaluate the clinical utility of several LASSI-BC subscales in differentiating older adults with normal cognition from those with aMCI on measures of: i) proactive semantic interference, ii) the failure to recover from proactive semantic interference, iii) retroactive semantic interference and iv) the percentage of intrusion errors in relation to total cued recall responses by the participant. These specific subscales were selected a priori because, as noted above, our previous work using the paper-and-pencil LASSI-L has robustly demonstrated that they are the most sensitive to cognitive breakdowns associated with MCI due to preclinical and prodromal AD.



This study included 64 older adults that were evaluated as part of an IRB-approved longitudinal investigation funded by the National Institute on Aging. An experienced clinician administered a standard clinical assessment protocol, which included the Clinical Dementia Rating Scale (CDR) (18), and the Mini-Mental State Examination (MMSE) (19). Subsequently, a neuropsychological battery was independently administered in either Spanish or English dependent on the participant’s dominant and preferred language. Spanish language evaluations were completed with equivalent standardized neuropsychological tests and appropriate age, education, and cultural/language normative data (20-23). Proficient bilingual (Spanish/English) psychometricians performed all the testing.
Diagnostic groups were classified using the following criteria:

Amnestic MCI group (aMCI) (n=25)

Participants met Petersen’s criteria (24) for MCI and evidenced all of the following: a) subjective cognitive complaints by the participant and/or collateral informant; b) evidence by clinical evaluation or history of memory or other cognitive decline; c) Global Clinical Dementia Rating scale of 0.5 (18); d) below expected performance on delayed recall of the HVLT-R (23) or delayed paragraph recall from the National Alzheimer’s Coordinating Center Uniform Data Set (NACC-UDS) (25), as measured by a score that is 1.5 SD or more below the mean using age, education, and language-related norms.

Cognitively Normal Group (n=39)

Participants were classified as cognitively normal if all of the following criteria were met: a) no subjective cognitive complaints made by the participant and a collateral informant; b) no evidence by clinical evaluation or history of memory or other cognitive decline after an extensive interview with the participant and an informant; c) Global CDR score of 0; d) performance on all traditional neuropsychological tests (e.g., Category Fluency (26), Trails A and B (27), WAIS-IV Block Design subtest (28)) was not more than 1.0 SD below normal limits for age, education, and language group.

Loewenstein-Acevedo Scales for Semantic Interference and Learning, Brief Computerized Version (LASSI-BC)

The LASSI-BC was not used for diagnostic determination in this study. This computerized cognitive stress test is briefer than the paper-and-pencil LASSI-L, taking approximately 10 to 12 minutes to complete. The LASSI-BC contains the elements of the original LASSI-L that demonstrated the greatest differentiation between aMCI, PreMCI and CN older adults in previous studies. For example, free recall preceding the cued recall trials of the LASSI-L added time to the administration but was never as effective as cued recall in distinguishing among diagnostic groups. Developed in collaboration with the University of Miami Department of Computer Science, the LASSI-BC is a remotely accessible test available in both English and Spanish. As a web application, it can be run on devices that can run Google Chrome, including desktop computers, laptops, tablets, or even smartphones. While the LASSI-BC is a fully self-administered test with all verbal responses recorded and scored by the computer, for the purposes of this validation study, a trained study team member was present for each administration to systematically record responses, which provided a double check on the accuracy of data. The LASSI-BC utilizes the Google Cloud Speech API, which has been successfully implemented for use with older adults. The test leverages Google Cloud’s Speech-to-Text software in conjunction with a backup lexicon for understanding the participants’ spoken words. The lexicon is designed to account for variations in participants’ pronunciation by allowing words that the computer “mishears” to serve as alternatives to the actual word being spoken. Lexicon entries were chosen based on observations from participants during the test.
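The backup-lexicon idea described above can be sketched as follows. This is a minimal illustration only, not the LASSI-BC implementation: the function names, the target words, and the "misheard" alternatives are all hypothetical, and no speech-recognition call is made (the transcript is assumed to be already available as a list of words).

```python
def normalize_response(heard, targets, lexicon):
    """Map one transcribed word to a target word, or None if unrecognized.

    `lexicon` maps known mis-transcriptions to the intended target word
    (hypothetical entries; the actual LASSI-BC lexicon is not published here).
    """
    word = heard.strip().lower()
    if word in targets:
        return word
    return lexicon.get(word)


def score_cued_recall(transcript, targets, lexicon):
    """Count distinct target words recalled in a transcript."""
    recalled = set()
    for heard in transcript:
        match = normalize_response(heard, targets, lexicon)
        if match is not None:
            recalled.add(match)  # a set ignores repeated mentions of a target
    return len(recalled)


# Hypothetical targets and mis-hearings, for illustration only.
example_targets = {"pear", "flute"}
example_lexicon = {"pair": "pear", "flout": "flute"}
example_score = score_cued_recall(
    ["Pair", "pear", "flute", "drum"], example_targets, example_lexicon
)
```

Counting distinct targets rather than raw matches is a design choice of this sketch, so that a word credited through both its exact form and a lexicon alternative is scored once.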
Upon initiating the examination, the participant is instructed in both audio and visual formats. They will see 15 words, each belonging to one of three semantic categories: fruits, musical instruments, or articles of clothing (five words per category). Each word is then presented individually, on screen and in audio, for a 6-second interval. This presentation facilitates optimal encoding and storage of the to-be-remembered information. Further, this instruction style has been easily understood and accepted by older adults during pilot studies in the course of developing the LASSI-BC. After the computer presents all 15 words, participants are presented with each category cue (e.g., fruits) and asked to recall the words that belonged to that category. Participants are then presented with the same target stimuli for a second learning trial with subsequent cued recall to strengthen the acquisition and recall of the List A targets. The exposure to the semantically related list (i.e., List B) is then conducted in the same manner as exposure to List A. List B consists of 15 words different from List A, each of which belongs to one of the three categories used in List A (i.e., fruits, musical instruments, and articles of clothing). Following the presentation of the List B words, the person is asked to recall each of the List B words that belonged to each of the categories. List B words are presented again, followed by a second category-cued recall trial. Finally, to assess retroactive semantic interference, participants are asked to free recall the original List A words. Primary measures used in this study are the second cued recall score for List A (maximum learning), first cued recall score for List B (susceptibility to proactive semantic interference), second cued recall of List B (failure to recover from proactive semantic interference), and the third cued recall of List A (retroactive semantic interference).
In addition, we evaluated the novel ratio used with the LASSI-L, that takes into account the percentage of intrusion errors (PIE) as a function of total responses on subscales that measure proactive semantic interference and the failure to recover from proactive semantic interference. Specifically, the ratio is denoted as follows: Total Intrusion Errors/ (Total Intrusion Errors + Total Correct Responses) for LASSI-BC Cued B1 (a measure of susceptibility to proactive semantic interference) and LASSI-BC Cued B2 recall (a measure of recovery from proactive semantic interference).
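The PIE ratio defined above can be written as a short function. This is a sketch under stated assumptions: the function name is illustrative, and returning NaN when a participant gives no responses at all is a convention chosen here, not one stated in the text.

```python
import math


def percentage_of_intrusion_errors(intrusions, correct):
    """PIE = intrusions / (intrusions + correct) for one cued recall trial."""
    if intrusions < 0 or correct < 0:
        raise ValueError("counts must be non-negative")
    total = intrusions + correct
    if total == 0:
        # Assumption: the ratio is undefined when no responses were given.
        return math.nan
    return intrusions / total


# Hypothetical counts: 3 intrusion errors and 6 correct responses on Cued B1.
pie_b1 = percentage_of_intrusion_errors(intrusions=3, correct=6)
```

With these hypothetical counts the ratio is one third, which shows how a seemingly small number of intrusions becomes a sizable proportion when few targets are recalled.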



The LASSI-BC had psychometric properties that compared favorably to the test-retest reliabilities obtained on the original paper-and-pencil LASSI-L (7). As depicted in Table 1, CN (n=39) and aMCI (n=25) groups did not differ in terms of age, sex, or language of evaluation. Individuals diagnosed as aMCI, although well educated (Mean=14.26; SD=3.5), had less educational attainment relative to their cognitively normal counterparts (Mean=16.32; SD=2.3). As expected, aMCI participants also had lower mean MMSE scores (Mean=26.04; SD=2.3).

Table 1. Demographic Characteristics and Computerized LASSI-BC Scores among Participants who are Cognitively Normal and with Amnestic Mild Cognitive Impairment


Test-retest reliability

Test-retest reliability data were obtained on a subset of 15 older adults diagnosed with aMCI using Petersen’s criteria (24) for each of the LASSI-BC subscales. The mean age was 73.4 (SD=6.3); education 15.4 (SD=3.6); and the mean MMSE score for this group was 26.6 (SD=2.2). These individuals (60% primary English-speakers and 60% female) were administered the LASSI-BC on two occasions, within a 4 to 39-week interval (Mean=13.9; SD=10.6 weeks). In our pilot work, we found robust test-retest correlations ranging from 0.55 to 0.721 on the subscales that have been shown to be the most sensitive measures of cognitive decline in the original paper-and-pencil version. In this study, test-retest comparisons were conducted for Cued Recall A2 (measures maximum learning), Cued Recall B1 (measures proactive semantic interference), and Cued Recall B2 (measures the failure to recover from proactive semantic interference). One-tailed Pearson Product Moment Correlation Coefficients were obtained given the directional hypotheses concerning test-retest relationships. High, statistically significant test-retest reliabilities were obtained for Cued Recall A2 (r=.726; p<.001), Cued Recall B1 (r=.529; p=0.021), and Cued Recall B2 (r=.555; p=0.016).
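For readers who want to reproduce this kind of test-retest analysis, the sketch below computes a Pearson product-moment correlation and the associated t statistic with n-2 degrees of freedom. It uses only the standard library; the function names and the score lists are hypothetical, and the one-tailed p-value itself would still need a t-distribution table or a statistics package.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant list")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r ** 2))


# Hypothetical scores for five participants at the first and second session.
session_1 = [4, 6, 7, 9, 10]
session_2 = [5, 5, 8, 9, 11]
retest_r = pearson_r(session_1, session_2)
retest_t = t_statistic(retest_r, len(session_1))
```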

Discriminant validity

As depicted in Table 1, LASSI-BC scales sensitive to maximum learning (Cued A2), vulnerability to proactive semantic interference (Cued B1) and the failure to recover from proactive semantic interference (Cued B2) were statistically significant in discriminating between older adults with amnestic MCI and cognitively normal counterparts. These results were identical when demographic variables such as education were entered in the model as covariates.
We then calculated areas under the Receiver Operating Characteristic (ROC) curve for LASSI-BC correct responses as well as the PIE indices for Cued B1 and Cued B2 subscales. We selected these measures a priori given that performance on these specific subscales have traditionally been the most discriminant measures on the paper-and-pencil form of the LASSI-L.
As shown in Table 2, an optimal cut-point of 5 by Youden’s criteria on correct responses for Cued Recall B1 yielded a sensitivity of 84.6% and a specificity of 86.8%; the optimal cut-point by Youden’s criteria on correct responses for Cued Recall B2 was 9. Cued Recall B1 and Cued Recall B2 yielded statistically significant areas under the ROC curve of .868 (SE=0.88) and .824 (SE=.051), respectively.
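Youden’s criterion selects the cut-point that maximizes J = sensitivity + specificity - 1. The sketch below illustrates the search on hypothetical scores; the function name and data are illustrative, and treating scores at or below the cut-point as impaired is an assumption of this example (the text reports the cut-points but not the direction of the inequality).

```python
def youden_cut_point(impaired_scores, normal_scores):
    """Return (cut, sensitivity, specificity, J) for the cut maximizing J.

    Assumption: a score <= cut is classified as impaired, since lower
    correct-response counts indicate poorer recall.
    """
    best = None
    for cut in sorted(set(impaired_scores) | set(normal_scores)):
        sensitivity = sum(s <= cut for s in impaired_scores) / len(impaired_scores)
        specificity = sum(s > cut for s in normal_scores) / len(normal_scores)
        j = sensitivity + specificity - 1.0
        if best is None or j > best[3]:  # ties keep the lowest cut-point
            best = (cut, sensitivity, specificity, j)
    return best


# Hypothetical correct-response counts, for illustration only.
impaired = [2, 3, 4, 5, 5, 9]
normal = [6, 8, 9, 9, 10, 11]
cut, sens, spec, j = youden_cut_point(impaired, normal)
```

For a ratio such as PIE, where higher values indicate worse performance, the two inequalities would be reversed.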

Table 2. Classification of aMCI versus Cognitively Normal Participants on the LASSI-BC


We subsequently examined an optimal cut-point for PIE on the Cued Recall B1 and the Cued Recall B2 subscales. For PIE on Cued Recall B1, the area under the ROC was .879 (SE=.06), with a sensitivity of 92.9% and specificity of 80% using an optimal cut-point of .2540. For PIE on Cued Recall B2, the area under the ROC was .801 (SE=.07), using an optimal cut-point of .2159, which yielded a sensitivity of 78.6% and specificity of 68.0%. We selected these specific subscales because they have been shown to be the strongest predictors of aMCI in the paper-and-pencil form of the LASSI-L.
We subsequently entered the statistically significant LASSI-BC subscales (Cued Recall B1 and Cued Recall B2) into a stepwise logistic regression. As seen in Table 3, the first variable to enter the logistic regression model was PIE on Cued B1 [B=6.86 (SE=1.67), Wald=17.07, p<0.001]. On the second step of the logistic regression model, correct responses on Cued Recall B2 entered the model [B=-.34 (SE=.128), Wald=17.1, p=.008]. Combining PIE Cued Recall B1 and correct responses on Cued Recall B2 yielded an overall sensitivity of 80% and specificity of 89.7%. It should be noted that logistic regression weighs overall classification in a manner that favors the largest diagnostic group (in this case CN participants). Nonetheless, ROC and stepwise regression models yielded similar results indicating excellent discriminative ability.
In sum, our findings support that the LASSI-BC has equal or better psychometric properties than the original paper-and-pencil LASSI-L and demonstrate that computerized administration is feasible, is well accepted, and has excellent discriminant properties.
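As background for reading the regression coefficients, the Wald statistic for a single logistic regression coefficient is (B/SE)^2, which is compared against a chi-square distribution with 1 degree of freedom. The sketch below is illustrative and uses only the standard library; the function names are not from the study, and values recomputed from rounded coefficients need not match reported statistics exactly.

```python
import math


def wald_statistic(b, se):
    """Wald chi-square for one coefficient: (B / SE) squared."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return (b / se) ** 2


def wald_p_value(w):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom.

    Uses the identity P(X > w) = erfc(sqrt(w / 2)) for 1 df.
    """
    if w < 0:
        raise ValueError("Wald statistic must be non-negative")
    return math.erfc(math.sqrt(w / 2.0))


# Coefficient and standard error as reported (rounded) for PIE on Cued B1.
w_pie_b1 = wald_statistic(6.86, 1.67)
p_pie_b1 = wald_p_value(w_pie_b1)
```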

Table 3. Step-wise Logistic Regression Using Proactive Semantic Interference Measures on the Computerized LASSI-BC



The present study was designed to examine the psychometric properties of the LASSI-BC, the brief computerized version of the LASSI-L, a cognitive stress test that utilizes a novel cognitive assessment paradigm based on semantic interference in memory. In studies conducted in the United States and abroad, the LASSI-L has shown great utility in detecting cognitive changes among individuals during the preclinical and prodromal stages of AD (4, 29) and has been found to be appropriate for use among diverse ethnic/cultural and language groups (11, 30, 12). The paradigm that this measure employs is unique in that it explicitly and from the outset organizes the examinee’s learning around specific semantic categories, which promotes active encoding, reduces the use of individualized learning strategies that can help or hinder performance, increases depth of initial learning, and is designed to tap an individual’s vulnerability to semantic interference.
The current investigation examined all salient subscales of the LASSI-BC, which were selected based on previous work with the paper-and-pencil versions. The computerized version evidenced good test-retest reliability for participants diagnosed with aMCI. Scores on all LASSI-BC subscales were higher for cognitively normal older adults, relative to aMCI participants. In addition, high levels of discriminant validity were obtained in differentiating aMCI from cognitively normal groups based on ROC analyses and logistic regression.
A potential limitation of this first validation study is that we employed modest numbers of participants who were tested in either English or Spanish on the LASSI-BC. Although our overall findings were highly significant and the paper-and-pencil LASSI-L has been validated in different languages (e.g., Spanish speakers in Argentina, Spanish speakers in Spain, Spanish speakers from Mexico) and with different ethnic/cultural groups (European Americans, Hispanics and African Americans), such comparisons should be made with the LASSI-BC in the future. Further, additional studies with the LASSI-BC will include evaluating the diagnostic utility of this computerized cognitive stress test to differentiate older adults earlier on the preclinical continuum of AD, relating performance to biomarkers of AD pathology, and comparing it to other traditional and widely used cognitive measures in the field.
There has been an increase in the number of computerized tests developed, including the CogState (31) and the Cognition Battery from the NIH Toolbox (16), but limitations exist. For example, one of the most widely used computerized cognitive batteries for the assessment of MCI is the CogState. As part of the Mayo Clinic Study on Aging, Mielke and associates (32) administered the CogState to eighty-six participants diagnosed with MCI who were found to have worse performance than cognitively healthy individuals; however, it is likely that individuals classified as MCI ranged from early states of MCI to late MCI, the latter of which is more cognitively similar to early dementia in terms of neuropsychological test performance, limiting evidence that this measure is sensitive to preclinical cognitive change. Further, the authors noted that their results are not generalizable to other ethnicities due to the demographic makeup of the region (Minnesota, USA). Another study conducted by Mielke and colleagues (33) aimed to examine performance on the CogState with neuroimaging biomarkers (MRI, FDG PET, and amyloid PET) among cognitively normal participants aged 51-71; however, only weak associations were found between CogState subtests and biomarkers of neurodegeneration.
With the rapidly aging population, early detection of cognitive decline in individuals at risk for AD and related disorders has become a global priority. Accurately identifying at risk individuals through the detection and monitoring of subtle, albeit sensitive cognitive changes that transpire early in the disease course is an important initiative and computerized cognitive outcome measures have the potential to greatly reduce burden for participants, clinical researchers and clinicians.
The development of computerized cognitive tests for older adults has significantly increased during the past decade. In fact, available systematic reviews have identified more than a dozen computerized measures designed to detect dementia or MCI (34, 35, 36). Moreover, the use of computerized assessments with older adults has been found to be feasible and reliable (37, 38). A recent meta-analysis has shown relatively good diagnostic accuracy, and the authors further concluded that the performance of these measures in distinguishing individuals with MCI and dementia is comparable with traditional paper-and-pencil neuropsychological measures (35). It is anticipated that as technology advances, clinical trials will include validated computerized testing to sensitively capture cognitive performance, particularly in large-scale secondary prevention efforts (39). This technological advancement in computerized, web-based cognitive testing has the potential to facilitate remote deliverability, allow for real-time data entry, improve standardization, and reduce administration and scoring errors. Moreover, computerized assessment can more readily monitor longitudinal cognitive changes for each individual, facilitating a precision-based approach. It is critical, however, that emerging cognitive tests move beyond simply computerizing outdated, insensitive cognitive paradigms and instead invest in the development and validation of cognitive paradigms that are sensitive and specific to early cognitive breakdowns that occur during the preclinical stages of AD. These too should exhibit sensitivity to biomarkers of AD (e.g., amyloid load, tau deposition, and neurodegeneration in AD-prone regions). Doing so may address some of the most critical challenges facing clinical trials, including proper selection of at-risk participants and monitoring meaningful cognitive change over time.

Funding: This research was funded by the National Institute on Aging Grant 1 R01 AG047649-01A1 (David Loewenstein, PI), 1 R01 AG047649-01A1 (Rosie Curiel Cid, PI) 5 P50 AG047726602 1Florida Alzheimer’s Disease Research Center (Todd Golde, PI), 8AZ. The sponsors had no role in the design and conduct of the study; in the collection, analysis, and interpretation of data; in the preparation of the manuscript; or in the review or approval of the manuscript.

Ethical standard: This research study was conducted in alignment with the Declaration of Helsinki and through the approval of the University of Miami Institutional Review Board.
Conflict of interest: Drs. Curiel and Loewenstein have intellectual property used in this study.


1. Jack Jr CR, Bennett DA, Blennow K, et al. NIA-AA research framework: toward a biological definition of Alzheimer’s disease. Alzheimer’s & Dementia. 2018 Apr;14(4):535-62.
2. Harvey PD, Cosentino S, Curiel R, et al. Performance-based and observational assessments in clinical trials across the Alzheimer’s disease spectrum. Innovations in clinical neuroscience. 2017 Jan; 14(1-2):30.
3. Edmonds EC, Delano-Wood L, Galasko DR, et al. Subtle cognitive decline and biomarker staging in preclinical Alzheimer’s disease. Journal of Alzheimer’s disease. 2015 Jan 1;47(1):231-42.
4. Loewenstein DA, Curiel RE, Duara R, et al. Novel cognitive paradigms for the detection of memory impairment in preclinical Alzheimer’s disease. Assessment. 2018 Apr;25(3):348-59.
5. Brooks L, Loewenstein D. Assessing the progression of mild cognitive impairment to Alzheimer’s disease: current trends and future directions. Alzheimer’s Research & Therapy. 2010;2(28):28-28.
6. Crocco E, Curiel RE, Acevedo A, et al. An evaluation of deficits in semantic cueing and proactive and retroactive interference as early features of Alzheimer’s disease. The American Journal of Geriatric Psychiatry. 2014 Sep 1;22(9):889-97.
7. Curiel RE, Crocco E, Acevedo A, et al. A new scale for the evaluation of proactive and retroactive interference in mild cognitive impairment and early Alzheimer’s disease. Aging. 2013;1(1):1000102.
8. Loewenstein DA, Curiel RE, Greig MT, et al. A novel cognitive stress test for the detection of preclinical Alzheimer disease: discriminative properties and relation to amyloid load. The American Journal of Geriatric Psychiatry. 2016 Oct 1;24(10):804-13.
9. Matías-Guiu JA, Curiel RE, Rognoni T, Valles-Salgado M, Fernández-Matarrubia M, Hariramani R, Fernández-Castro A, Moreno-Ramos T, Loewenstein DA, Matías-Guiu J. Validation of the Spanish version of the LASSI-L for diagnosing mild cognitive impairment and Alzheimer’s disease. Journal of Alzheimer’s Disease. 2017 Jan 1;56(2):733-42.
10. Rosselli M, Loewenstein DA, Curiel RE, et al. Effects of bilingualism on verbal and nonverbal memory measures in mild cognitive impairment. Journal of the International Neuropsychological Society. 2019 Jan;25(1):15-28.
11. Capp KE, Curiel Cid RE, Crocco EA, et al. Semantic Intrusion Error Ratio Distinguishes Between Cognitively Impaired and Cognitively Intact African American Older Adults. Journal of Alzheimer’s Disease. 2019 Dec 23(Preprint):1-6.
12. Matias-Guiu JA, Cabrera-Martín MN, Curiel RE, et al. Comparison between FCSRT and LASSI-L to detect early stage Alzheimer’s disease. Journal of Alzheimer’s Disease. 2018 Jan 1;61(1):103-11.
13. Crocco E, Curiel RE, Grau G. Percentage of intrusion errors predicts patterns of cognitive change in older adults. Journal of Alzheimer’s Disease. (Under review).
14. Beaumont JL, Havlik R, Cook KF, et al. Norming plans for the NIH Toolbox. Neurology. 2013;80(11 Suppl 3):S87–S92.
15. Saxton J, Morrow L, Eschman A, et al. Computer assessment of mild cognitive impairment. Postgraduate medicine. 2009 Mar 1;121(2):177-85.
16. Weintraub S, Dikmen SS, Heaton RK, et al. Cognition assessment using the NIH Toolbox. Neurology. 2013 Mar 12;80(11 Supplement 3):S54-64.
17. Parsons TD, Courtney CG, Arizmendi BJ, et al. Virtual reality stroop task for neurocognitive assessment. In: MMVR 2011 Feb 16 (pp. 433-439).
17. Morris JC. Clinical dementia rating: a reliable and valid diagnostic and staging measure for dementia of the Alzheimer type. International psychogeriatrics. 1997 Dec;9(S1):173-6.
18. Folstein MF, Folstein SE, McHugh PR. “Mini-mental state”: a practical method for grading the cognitive state of patients for the clinician. Journal of psychiatric research. 1975 Nov 1;12(3):189-98.
19. Arango-Lasprilla JC, Rivera D, Aguayo A, et al. Trail making test: Normative data for the Latin American Spanish speaking adult population. NeuroRehabilitation. 2015 Jan 1;37(4):639-61.
20. Arango-Lasprilla JC, Rivera D, Garza MT, et al. Hopkins verbal learning test–revised: Normative data for the Latin American Spanish speaking adult population. NeuroRehabilitation. 2015 Jan 1;37(4):699-718.
21. Benson G, de Felipe J, Sano M. Performance of Spanish-speaking community-dwelling elders in the United States on the Uniform Data Set. Alzheimer’s & Dementia. 2014 Oct;10:S338-43.
22. Peña-Casanova J, Quinones-Ubeda S, Gramunt-Fombuena N,et al. Spanish Multicenter Normative Studies (NEURONORMA Project): norms for verbal fluency tests. Archives of Clinical Neuropsychology. 2009 Jun 1;24(4):395-411.
23. Brandt J. The Hopkins Verbal Learning Test: Development of a new memory test with six equivalent forms. The Clinical Neuropsychologist. 1991 Apr 1;5(2):125-42.
24. Petersen RC. Mild cognitive impairment as a diagnostic entity. Journal of internal medicine. 2004 Sep;256(3):183-94.
25. Beekly DL, Ramos EM, Lee WW, Deitrich WD, Jacka ME, Wu J, Hubbard JL, Koepsell TD, Morris JC, Kukull WA. The National Alzheimer’s Coordinating Center (NACC) database: the uniform data set. Alzheimer Disease & Associated Disorders. 2007 Jul 1;21(3):249-58.
26. Binetti G, Magni E, Cappa SF, et al. Semantic memory in Alzheimer’s disease: an analysis of category fluency. Journal of Clinical and Experimental Neuropsychology. 1995 Feb 1;17(1):82-9.
27. Reitan RM. Validity of the Trail Making Test as an indicator of organic brain damage. Perceptual and motor skills. 1958 Dec;8(3):271-6.
28. Wechsler D. Wechsler adult intelligence scale–Fourth Edition (WAIS–IV). San Antonio, TX: NCS Pearson. 2008;22(498):816-27.
29. Crocco EA, Loewenstein DA, Curiel RE, et al. A novel cognitive assessment paradigm to detect Pre-mild cognitive impairment (PreMCI) and the relationship to biological markers of Alzheimer’s disease. Journal of psychiatric research. 2018 Jan 1;96:33-8.
30. Curiel Cid RE, Loewenstein DA, Rosselli M, et al. A cognitive stress test for prodromal Alzheimer’s disease: Multiethnic generalizability. Alzheimer’s & Dementia: Diagnosis, Assessment & Disease Monitoring. 2019 Dec;11(C):550-9.
31. Darby D, Collie A, McStephen M, Maruff P. Reliable detection of asymptomatic longitudinal cognitive decline in healthy community dwelling volunteers. Journal of the American Geriatrics Society. 2004 Apr;52.
32. Mielke MM, Machulda MM, Hagen CE, Edwards KK, Roberts RO, Pankratz VS, Knopman DS, Jack Jr CR, Petersen RC. Performance of the CogState computerized battery in the Mayo Clinic Study on Aging. Alzheimer’s & Dementia. 2015 Nov 1;11(11):1367-76.
33. Mielke MM, Weigand SD, Wiste HJ, Vemuri P, Machulda MM, Knopman DS, Lowe V, Roberts RO, Kantarci K, Rocca WA, Jack Jr CR. Independent comparison of CogState computerized testing and a standard cognitive battery with neuroimaging. Alzheimer’s & Dementia. 2014 Nov 1;10(6):779-89.
34. Zygouris S, Tsolaki M. Computerized cognitive testing for older adults: a review. American Journal of Alzheimer’s Disease & Other Dementias®. 2015 Feb;30(1):13-28.
35. Chan JY, Kwong JS, Wong A, Kwok TC, Tsoi KK. Comparison of computerized and paper-and-pencil memory tests in detection of mild cognitive impairment and dementia: A systematic review and meta-analysis of diagnostic studies. Journal of the American Medical Directors Association. 2018 Sep 1;19(9):748-56.
36. De Roeck EE, De Deyn PP, Dierckx E, Engelborghs S. Brief cognitive screening instruments for early detection of Alzheimer’s disease: a systematic review. Alzheimer’s research & therapy. 2019 Dec 1;11(1):21.
37. Wild K, Howieson D, Webbe F, Seelye A, Kaye J. Status of computerized cognitive testing in aging: a systematic review. Alzheimer’s & Dementia. 2008 Nov 1;4(6):428-37.
38. Pankratz VS, Roberts RO, Mielke MM, Knopman DS, Jack CR, Geda YE, Rocca WA, Petersen RC. Predicting the risk of mild cognitive impairment in the Mayo Clinic Study of Aging. Neurology. 2015 Apr 7;84(14):1433-42.
39. Buckley RF, Sparks KP, Papp KV, Dekhtyar M, Martin C, Burnham S, Sperling RA, Rentz DM. Computerized cognitive testing for use in clinical trials: a comparison of the NIH Toolbox and Cogstate C3 batteries. The journal of prevention of Alzheimer’s disease. 2017;4(1):3.


X. Fu1,*, W. Yu2,*, M. Ke2, X. Wang1, J. Zhang1, T. Luo1, P.J. Massman3,4, R.S. Doody3, Y. Lü1,*

1. Department of Geriatrics, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, China; 2. Institute of Neuroscience, Chongqing Medical University, Chongqing 400016, China; 3. Department of Neurology, Baylor College of Medicine, Houston, TX USA at the time this work was done. Now Genentech/Roche, Basel, Switzerland; 4. Department of Psychology, University of Houston, Houston, TX USA; *Authors contributed equally and are co-first authors of the study.

Corresponding Authors: Prof. Yang Lü, 1 Youyi Road, Yuzhong District, Chongqing 400016, China, Tel: +86-23-89011622, Fax: +86-23-68811487, E-mail:

J Prev Alz Dis 2020;
Published online December 21, 2020,



Background: A specialized instrument for assessing the cognition of patients with severe Alzheimer’s disease (AD) is needed in China.
Objectives: To validate the Chinese version of the Baylor Profound Mental Status Examination (BPMSE-Ch).
Design: The BPMSE is a simplified scale that has proved to be a reliable and valid tool for evaluating patients with moderate to severe AD; it is therefore worthwhile to extend its use to Chinese patients with AD.
Setting: Patients were assessed at the outpatient Memory Clinic.
Participants: All participants were diagnosed as having probable AD by assessment.
Measurements: The BPMSE was translated into Chinese and back-translated. The BPMSE-Ch was administered to 102 AD patients with a Mini-Mental State Examination (MMSE) score below 17. We assessed internal consistency and reliability, and evaluated construct validity against the MMSE, Severe Impairment Battery (SIB), Global Deterioration Scale (GDS-1), Geriatric Depression Scale (GDS-2), Instrumental Activities of Daily Living (IADL), Physical Self-Maintenance Scale (PSMS), Neuropsychiatric Inventory (NPI) and Clinical Dementia Rating (CDR).
Results: The BPMSE-Ch showed good internal consistency (α = 0.87); inter-rater and test-retest reliability were both excellent, ranging from 0.91 to 0.99. The construct validity of the measure was also supported by significant correlations with the MMSE and SIB. Moreover, as expected, the BPMSE-Ch had a lower floor effect than the MMSE, but a ceiling effect existed for patients with MMSE scores above 11.
Conclusions: The BPMSE-Ch is a reliable and valid tool for evaluating cognitive function in Chinese patients with severe AD.

Key words: Alzheimer’s disease, Baylor Profound Mental Status Examination, Chinese version, severe dementia, validation.

Abbreviations: AD: Alzheimer’s disease; ADAS-cog: Alzheimer’s Disease Assessment Scale-Cognitive section; ANOVA: analysis of variance; BPMSE: Baylor Profound Mental Status Examination; BPMSE-Ch: Chinese version of the Baylor Profound Mental Status Examination; BPMSE-Ch-cog: Cognition subscale of the Chinese version of the Baylor Profound Mental Status Examination; BPMSE-Ch-behav: Behavior subscale of the Chinese version of the Baylor Profound Mental Status Examination; CDR: Clinical Dementia Rating; FAST: Functional Assessment Staging; GDS-1: Global Deterioration Scale; GDS-2: Geriatric Depression Scale; IADL: Instrumental Activities of Daily Living; MMSE: Mini-Mental State Examination; NPI: Neuropsychiatric Inventory; PSMS: Physical Self-Maintenance Scale; SIB: Severe Impairment Battery.



Alzheimer’s disease (AD) is a common neurodegenerative disorder that mainly affects elderly persons worldwide. The manifestations of AD include deterioration in cognition, memory and activities of daily living. It is usually accompanied by behavioral and psychological symptoms (1).
Currently, China faces serious challenges related to its aging population. Persons aged 60 or older account for 17.3% of the total population (2). The prevalence of all-cause dementia over age 65 is about 6% in China, and AD makes up about 65% of all cases (3, 4). The reported prevalence of AD in China ranges roughly from 7 to 66 per 1000 individuals (5). In a population-based cross-sectional survey, 10276 residents aged 65 years or older were drawn from Beijing (northeastern), Zhengzhou (north-central), Guiyang (southwestern) and Guangzhou (southeastern); the prevalence of AD in this sample was 3.21% (6). Although the prevalence of AD in China is relatively high, few studies have examined methods for diagnosing and evaluating AD.
AD is gradually evolving into a crucial social problem and presents a major challenge for health care in China. However, awareness of AD and of dementia in general is inadequate in China, leading to delayed diagnosis and initiation of treatment (7, 8). Therefore, many patients are not evaluated until the moderate to severe stages of the disease (9, 10). Moreover, once these patients present for an evaluation, tools to assess them are limited (11). Hence, better instruments are needed for the accurate assessment of patients with advanced AD.
A variety of neuropsychological and functional measures have been used to assess mental status and dementia severity both cross-sectionally and longitudinally. Frequently used instruments include the Mini-Mental State Examination (MMSE) (12), Severe Impairment Battery (SIB) (13), Alzheimer’s Disease Assessment Scale-Cognitive section (ADAS-cog) (14), Global Deterioration Scale (GDS-1) (15), Functional Assessment Staging Tool (FAST) (16) and Clinical Dementia Rating (CDR) (17). However, these scales show some limitations in patients with moderate to severe AD. The MMSE and ADAS-cog are not optimal for evaluating patients with severe AD because both contain a lot of verbal information; therefore, the results may be confounded by language disorders and/or a low level of education. The SIB is a suitable tool to evaluate patients with severe dementia. However, this test takes more than 30 minutes to administer, which often exceeds the attention capacity of most patients with severe AD (18). The NPI is usually used to evaluate neuropsychiatric symptoms, but it depends largely on descriptions from caregivers (19). Overall, a convenient and effective instrument for measuring cognitive function in patients with severe AD is clearly needed.
The Baylor Profound Mental Status Examination (BPMSE), developed by Doody et al., is a simplified scale that has proved to be a reliable and valid tool for evaluating patients with moderate to severe AD (20). In Doody’s study, European Americans accounted for about 82% of the original population. Thus, it is worthwhile to extend the use of the BPMSE to severely demented patients from different cultural backgrounds. To date, there have been three translated versions of the BPMSE: Korean, Danish and Spanish (21-23). A study of the Korean version showed that the BPMSE is a rapid, easy and valid scale for measuring cognitive function in patients with moderate to severe AD, particularly in patients with MMSE scores below 12. Similarly, a study using the Danish version indicated that the BPMSE is a stable and strong instrument, and recommended it as an appropriate measure of dementia severity in patients with more severe impairment. Adaptation of the Spanish version revealed that the BPMSE is a useful tool for assessing cognitive function, even in daily medical practice focusing on patients with severe AD.
In China, there is no applicable scale for assessing patients with severe AD. Therefore, the aim of our study was to develop a Chinese version of the BPMSE (BPMSE-Ch) and to evaluate the psychometric properties of this version in Chinese patients with AD.




The original version of the BPMSE consists of three parts: a cognition subscale of 25 questions, a behavior subscale of 10 items that rate the presence or absence of behavioral problems, and 2 qualitative observations of language and social interaction. The cognition subscale assesses four areas: language, orientation, attention, and motor skills. The total cognition subscale score ranges from 0 to 25, with a maximum of 5 points for orientation, 11 for language, 4 for attention and 5 for motor skills. The behavior subscale score ranges from 0 (no behavioral disturbances) to 10 (all behavioral disturbances). In the present study, we did not examine the 2 qualitative observations of communication and social interaction.
First, the original version of the BPMSE was translated into Mandarin Chinese by two bilingual translators whose mother tongue was Chinese. Then, the two Chinese versions were discussed by our team with gerontologists, a neurologist, a psychologist and an English expert, and the final Chinese version was formulated based on this input. Finally, two other translators with a background in English philology back-translated the final Chinese version into English to confirm consistency with the original version.
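As a minimal sketch of the scoring structure described above, the following Python snippet checks each domain score against its stated maximum and sums the cognition subscale. The function name, dictionary layout and example patient are hypothetical illustrations, not part of the BPMSE itself.

```python
# Maximum points per cognition-subscale domain, as stated in the text.
DOMAIN_MAX = {"orientation": 5, "language": 11, "attention": 4, "motor": 5}
# Behavior subscale: 0 (no behavioral disturbances) to 10 (all disturbances).
BEHAVIOR_MAX = 10


def cognition_total(domain_scores):
    """Sum the four domain scores after checking each against its maximum."""
    if set(domain_scores) != set(DOMAIN_MAX):
        raise ValueError("expected exactly the domains: " + ", ".join(DOMAIN_MAX))
    for domain, score in domain_scores.items():
        if not 0 <= score <= DOMAIN_MAX[domain]:
            raise ValueError(f"{domain} score {score} outside 0-{DOMAIN_MAX[domain]}")
    return sum(domain_scores.values())


# Hypothetical patient, for illustration only (not study data).
example = {"orientation": 2, "language": 8, "attention": 3, "motor": 4}
print(cognition_total(example))  # 17
```

The four domain maxima add up to the 25-point ceiling of the cognition subscale, which the check on each domain enforces implicitly.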


Patients were recruited from the Memory Clinic, Department of Geriatrics, The First Affiliated Hospital of Chongqing Medical University.

Enrollment criteria

(a) All participants were diagnosed as having probable AD according to the National Institute of Neurological and Communicative Diseases and Stroke/Alzheimer’s Disease and Related Disorders Association (NINCDS-ADRDA) criteria (24); (b) Patients with MMSE <17 were included; (c) This study was approved by the Ethical Committee on human research of The First Affiliated Hospital of Chongqing Medical University; (d) Informed consent was obtained from all participants or their family members.

Exclusion criteria

(a) Patients were excluded if they had other neurological or psychiatric disorders or clinically significant medical conditions (e.g., acute infections, cancer, organ failure, etc.); (b) Patients had severely impaired communication abilities (e.g., global aphasia, deafness, blindness, muteness, etc.); (c) Patients had a history of head trauma, sedative drug use or substance abuse.


The following measures were administered to all enrolled patients: BPMSE-Ch, MMSE, SIB, GDS-1, GDS-2, IADL, PSMS, NPI, and CDR. All tests were given on the same day. Two trained physicians in our clinic administered the BPMSE-Ch to a subset of enrolled patients consecutively and independently in order to examine inter-rater reliability. Finally, to investigate test-retest reliability, some patients were randomly chosen to be given the BPMSE-Ch a second time within 30 days of the first administration. The BPMSE-Ch took 5 minutes on average to administer.

Statistical analyses

Internal consistency was assessed by computing coefficient α. Inter-rater reliability was assessed by correlation and paired t-test of the two scores obtained by different professionals on the same day. Test-retest reliability was likewise assessed with correlation and paired t-test analyses, using scores obtained from the same patient within 30 days. To evaluate construct validity, Pearson correlations were calculated between the BPMSE-Ch and the other measures (SIB, MMSE, GDS-1, GDS-2, IADL, PSMS, NPI and CDR). In addition, patients were divided into dementia severity groups using the MMSE and CDR, and differences between those groups were analyzed with a one-way analysis of variance (ANOVA) and Scheffé’s test. Statistical analyses were performed with SPSS 20.0 for Windows.
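The two core statistics named above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SPSS procedure used in the study: coefficient α is taken to be the standard Cronbach formula based on sample variances, and the item matrix below is hypothetical data invented for the example.

```python
from statistics import mean, variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of per-patient rows, one score per item."""
    k = len(item_scores[0])  # number of items
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical pass/fail (1/0) responses of 5 patients on 4 items.
rows = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
alpha = cronbach_alpha(rows)
print(round(alpha, 2))
```

In practice each row would hold one patient's 25 cognition item scores, and `pearson_r` would be applied to paired total scores (for example, BPMSE-Ch-cog and SIB totals).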



Demographic characteristics and test performances

A total of 102 patients (35 male, 67 female) were included in our study. The mean age was 77.76 years (range 64 to 93), and the mean education was 7.95 years (range 0 to 16). Details are shown in Table 1.

Table 1. Demographic characteristics and Scores on Instruments

Abbreviations: MMSE, Mini-Mental State Examination; BPMSE-Ch-cog, Cognition subscale of the Chinese version of the Baylor Profound Mental Status Examination; BPMSE-Ch-behav, Behavior subscale of the Chinese version of the Baylor Profound Mental Status Examination; SIB, Severe Impairment Battery; NPI, Neuropsychiatric Inventory; SD, standard deviation.



In our study, coefficient α, which reflects the inter-correlations among items on the BPMSE-Ch cognition (BPMSE-Ch-cog) subscale, was 0.87. Furthermore, significant correlations were found among all the BPMSE-Ch-cog components, as seen in Table 2. Inter-rater and test-retest reliability are shown in Table 3.

Table 2. Correlations among BPMSE-Ch-cog subscales

Correlation coefficients by Pearson correlation. * p< 0.001.


Table 3. Inter-rater and test-retest reliability

Abbreviations: BPMSE-Ch-cog, Cognition subscale of the Chinese version of the Baylor Profound Mental Status Examination; BPMSE-Ch-behav, Behavior subscale of the Chinese version of the Baylor Profound Mental Status Examination.
Correlation coefficients by Pearson correlation. n = Number of patients. All p values <0.001.


Fifty-two patients were each assessed by two trained doctors, simultaneously and independently, to determine inter-rater reliability. The correlation between the two total cognition subscale scores was 0.99 (p < 0.001), and there was no significant difference between the two scores (paired t(51) = 1.84, p > 0.05; Mean = 0.17, SD = 0.68). The correlation between the two behavior subscale scores was 0.92 (p < 0.001).
Forty-two patients were tested twice by the same doctor within a 30-day interval to determine test-retest reliability. The test-retest correlation between the two total cognition scores was 0.99 (p < 0.001). Similarly, there was no significant difference between the scores obtained at the two time points (paired t(41) = 1.18, p > 0.05; Mean = 0.14, SD = 0.78). The test-retest correlation between the two behavior scores was 0.94 (p < 0.001).
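The paired t statistics above can be approximately reproduced from the reported summary values. The sketch below assumes that the reported "Mean" and "SD" refer to the paired score differences, which the text does not state explicitly; the function name is hypothetical.

```python
from math import sqrt


def paired_t_from_summary(mean_diff, sd_diff, n):
    """Paired t statistic: mean difference divided by its standard error."""
    return mean_diff / (sd_diff / sqrt(n))


# Inter-rater comparison: 52 patients, reported Mean = 0.17, SD = 0.68.
t_inter = paired_t_from_summary(0.17, 0.68, 52)
# Test-retest comparison: 42 patients, reported Mean = 0.14, SD = 0.78.
t_retest = paired_t_from_summary(0.14, 0.78, 42)

print(round(t_inter, 2), round(t_retest, 2))
```

Under this assumption the values come out close to, but slightly below, the reported 1.84 and 1.18. The small gaps are consistent with the mean and SD having been rounded to two decimals before publication. Both statistics fall below the two-sided 5% critical value of roughly 2.0 for these degrees of freedom, in line with the reported p > 0.05.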


The construct validity of the BPMSE-Ch is shown in Table 4. Pearson correlations were calculated between the BPMSE-Ch-cog and the MMSE (0.76), SIB (0.78), GDS-1 (-0.26), GDS-2 (0.16), PSMS (-0.26), IADL (-0.36), NPI (-0.41) and CDR (-0.54). The construct validity of the BPMSE-Ch-cog was very good with respect to the SIB (r = 0.78) and good with respect to the MMSE (r = 0.76). In addition, the BPMSE-Ch behavior subscale (BPMSE-Ch-behav) correlated with the NPI (r = 0.54, p < 0.001, Table 4).

Table 4. Concurrent validity of BPMSE-Ch

Abbreviations: BPMSE-Ch, Chinese version of the Baylor Profound Mental Status Examination; BPMSE-Ch-cog, Cognition subscale of the Chinese version of the Baylor Profound Mental Status Examination; BPMSE-Ch-behav, Behavior subscale of the Chinese version of the Baylor Profound Mental Status Examination; MMSE, Mini-Mental State Examination; SIB, Severe Impairment Battery; GDS1, Global Deterioration Scale; GDS2, Geriatric Depression Scale; IADL, Instrumental Activities of Daily Living; PSMS, Physical Self-Maintenance Scale; CDR, Clinical Dementia Rating; NPI, Neuropsychiatric Inventory.


Ceiling and floor effects

The relationship between the BPMSE-Ch-cog and MMSE is shown on a scatterplot (Supplementary Figure 1A). MMSE scores of 0 to 5 corresponded to a substantial range of BPMSE-Ch-cog scores (2 to 24), indicating that the BPMSE-Ch had no floor effect. In addition, patients scoring 12 to 16 on the MMSE had BPMSE-Ch-cog scores ranging from 20 to 25 (Mean: 23.08, SD: 1.08, Table 5). This demonstrated that the BPMSE-Ch showed a ceiling effect among patients at a relatively moderate level of dementia.


The relationship between BPMSE-Ch-cog and SIB scores is displayed in Supplementary Figure 1B. The relatively high R² of 0.61 indicated that the BPMSE-Ch-cog showed a strong association with the SIB, which demonstrated that the BPMSE-Ch was a sensitive tool for assessing patients with severe AD.

BPMSE-Ch-cog score stratified by MMSE levels

Table 5 shows that the BPMSE-Ch-cog differentiated the enrolled patients across severity groups defined by MMSE score (F = 56.7, p < 0.001). Patients in MMSE Group 1 (range 12-16) had a BPMSE-Ch-cog score of 23.08 ± 1.08, patients in MMSE Group 2 (range 7-11) had a score of 21.25 ± 3.53, and patients in MMSE Group 3 (range 0-6) had a further reduced score of 12.50 ± 6.69. The differences between Group 2 and Group 3 in the total BPMSE-Ch-cog score, as well as in its four subcomponent scores, were significant (p < 0.001).

Table 5. Three severity groups according to the MMSE

Abbreviations: MMSE, Mini-Mental State Examination; BPMSE-Ch-cog, Cognition subscale of the Chinese version of the Baylor Profound Mental Status Examination; SD, standard deviation; n = Number of patients. One-way ANOVA test. NS = Nonsignificant; 1. By Scheffé’s analysis


BPMSE-Ch-cog score stratified by CDR levels

The BPMSE-Ch-cog differentiated patients into groups according to CDR stage (F = 16.0, p < 0.001) (Supplementary Table 1). Mean BPMSE-Ch-cog and subcomponent scores declined as the CDR stage increased (Supplementary Table 1). In Group 1 (CDR = 0.5), the total BPMSE-Ch-cog score ranged from 23 to 25 (Mean = 24.33, SD = 1.15); in Group 2 (CDR = 1), from 19 to 25 (Mean = 22.86, SD = 1.42); in Group 3 (CDR = 2), from 2 to 25 (Mean = 19.82, SD = 5.59); and in Group 4 (CDR = 3), from 2 to 24 (Mean = 12.50, SD = 7.29). Thus, the range of BPMSE-Ch-cog scores widened as the CDR stage increased. Moreover, total BPMSE-Ch-cog and subcomponent scores differed significantly between Group 3 and Group 4 (Supplementary Table 1). Together, these results suggest that the BPMSE-Ch measures cognition differently from the CDR and can differentiate levels of cognition at high CDR stages.

Discussion

The present study shows that the BPMSE-Ch is a reliable, stable and valid instrument for assessing cognition in patients with severe AD. Internal consistency is robust, inter-rater reliability is near-perfect for both the BPMSE-Ch-cog and BPMSE-Ch-behav subscales, and test-retest reliability is also excellent. Furthermore, excellent construct validity was indicated by significant correlations with the SIB (r = 0.78) and MMSE (r = 0.76). These findings are consistent with the results of the previous Korean, Spanish, and Danish adaptations of the BPMSE. BPMSE-Ch-cog scores were strongly associated with MMSE and SIB scores, indicating that the BPMSE-Ch-cog can differentiate well among patients with AD who have differing degrees of cognitive impairment, particularly at the more severe end of the dementia spectrum, which is its primary intended use.
In this regard, the BPMSE-Ch-cog does not display floor effects in severely demented patients, as measured by the MMSE. Also, BPMSE-Ch-cog scores are strongly associated with SIB scores (while displaying a lower floor than the SIB), and its administration time is much shorter (only 5 minutes on average versus 30 minutes for the SIB). This further suggests that the BPMSE-Ch is an efficient tool. Relatively low correlations were found between BPMSE-Ch-cog scores and the PSMS and IADL functional scores, indicating that the BPMSE-Ch only partly measures the cognitive abilities needed to function in daily life. A possible reason is that most enrolled patients had already reached maximum impairment in activities of daily living, so that the IADL and PSMS were subject to a degree of ceiling effect. Although the GDS-1 evaluates not only cognition but also the ability to maintain daily life and participation in adverse activities, and is useful in severe AD (25-27), it is a composite staging tool. Its forced-choice format would place most enrolled patients into high stages, which might explain the low correlation between the BPMSE-Ch and GDS-1. Behavioral and psychological symptoms of dementia (BPSD) in patients with Alzheimer’s disease correlate strongly with cognitive impairment and with impairment in activities of daily living. The NPI is a common tool for evaluating BPSD, whereas the BPMSE-Ch-behav focuses selectively on disruptive behaviors. In this study, there was a moderate correlation between the BPMSE-Ch-behav and NPI. However, the NPI is obtained by questioning the primary caregiver and is complex and time-consuming. The BPMSE-Ch is therefore also a relatively practicable instrument for evaluating behavioral and psychological symptoms in patients with severe dementia.
The correlation between the BPMSE-Ch-cog and GDS-2 is not significant (r = 0.16, p > 0.001). There are two possible reasons. First, the BPMSE-Ch-cog does not include questions directed at depressive symptoms and is not intended to evaluate depression. Second, it has been reported that patients with moderate-severe AD have relatively low GDS-2 scores (28), which is similar to our findings. This suggests that patients with moderate-severe AD have no obvious depressive symptoms. In our study, the highest GDS-2 score observed was 24; the GDS-2 can therefore serve as a useful complementary assessment for depression in some patients. Because the BPMSE measures clinical features distinct from those measured by the GDS-2, the absence of correlation is not surprising.
Regarding its suitability for use with severely impaired patients, the BPMSE-Ch-cog differentiates well between patients with MMSE scores of 0-6 and those with scores of 7-11, but not as well between patients with scores of 12-16 and those with scores of 7-11. This indicates that the BPMSE-Ch, like its versions in other languages, is most appropriate for patients who are more severely impaired (MMSE score of 11 or below). Similarly, analyses of patients in different CDR stages reveal that the total BPMSE-Ch score and subcomponent scores differ significantly between patients in CDR stage 2 and those in CDR stage 3, and patients in both of these more severely impaired CDR stages exhibited a wide range of scores, with substantial variability. These results lend further support to the use of the BPMSE-Ch with severely impaired patients.
In conclusion, the BPMSE-Ch is a convenient, stable, reliable and valid scale for assessing cognition in patients with moderate-severe AD, and is most appropriately used with patients who have MMSE scores of 11 or below. In future work, the BPMSE-Ch should be applied in other areas of China, including rural areas, to further study its properties. We believe that it would be beneficial for this instrument to be widely used for evaluating the cognitive functioning of patients with severe AD in China.


Acknowledgments: Funding Information: This study was supported by grants from the National Key R&D Program of China (2018YFC2001700), the General Project of Technological Innovation and Application Development of Chongqing Science & Technology Bureau (cstc2019jscx-msxmX0239), the Key Project of Social Undertakings and People’s Livelihood Security of Chongqing Science & Technology Commission (cstc2017shms-zdyfX0009) and the Postgraduate Research Innovation Project of Chongqing (CYS16122). In particular, we greatly thank Dr. Sergio Salmerón (Department of Geriatrics, Hospital General de Villarrobledo, Albacete, Spain) for his assistance with the translation of the BPMSE.

Conflict of Interest: The authors declare that they have no potential competing interests.

Ethics approval and consent to participate: The study was approved by the Ethics Committee of The First Affiliated Hospital of Chongqing Medical University and has been performed in accordance with the ethical standards laid down in the Declaration of Helsinki and its later amendments.

Authors’ Contributions: Rachelle S. Doody and Yang Lü designed the study. Xue Fu, Weihua Yu and Yang Lü collected the data and wrote the paper. Yang Lü, Paul J. Massman and Rachelle S. Doody revised the manuscript. Xia Wang, Jia Zhang and Tao Luo analyzed data and assisted with writing the article.




1. J.C. Morris, K. Blennow, L. Froelich, et al. Harmonized diagnostic criteria for Alzheimer’s disease: recommendations, Journal of internal medicine 2014; 275(3): 204-13.
2. F. Li, S. Chen, C. Wei, J. Jia. Monetary costs of Alzheimer’s disease in China: protocol for a cluster-randomised observational study, BMC neurology 2017; 17(1):15.
3. J. Jia, A. Zhou, C. Wei, et al. The prevalence of mild cognitive impairment and its etiological subtypes in elderly Chinese, Alzheimer’s & dementia : the journal of the Alzheimer’s Association 2014; 10(4): 439-47.
4. Y. Zhang, Y. Xu, H. Nie, et al. Prevalence of dementia and major dementia subtypes in the Chinese populations: a meta-analysis of dementia prevalence surveys, 1980-2010, Journal of clinical neuroscience : official journal of the Neurosurgical Society of Australasia 2012;19(10): 1333-7.
5. K.Y. Chan, W. Wang, J.J. Wu, et al. Epidemiology of Alzheimer’s disease and other forms of dementia in China, 1990-2010: a systematic review and analysis, Lancet (London, England) 2013; 381(9882): 2016-23.
6. J. Jia, F. Wang, C. Wei, et al. The prevalence of dementia in urban and rural areas of China, Alzheimer’s & dementia : the journal of the Alzheimer’s Association 2014; 10(1): 1-9.
7. D. Liu, G. Cheng, L. An, et al. Public Knowledge about Dementia in China: A National WeChat-Based Survey, International journal of environmental research and public health 2019; 16(21).
8. X. Li, W. Fang, N. Su, Y. Liu, S. Xiao, Z. Xiao, Survey in Shanghai communities: the public awareness of and attitude towards dementia, Psychogeriatrics : the official journal of the Japanese Psychogeriatric Society 2011; 11(2): 83-9.
9. M. Zhao, X. Lv, M. Tuerxun, et al. Delayed help seeking behavior in dementia care: preliminary findings from the Clinical Pathway for Alzheimer’s Disease in China (CPAD) study, International psychogeriatrics 2016; 28(2): 211-9.
10. D. Peng, Z. Shi, J. Xu, et al. Demographic and clinical characteristics related to cognitive decline in Alzheimer disease in China: A multicenter survey from 2011 to 2014, Medicine 2016; 95(26): e3727.
11. F.A. Schmitt, W. Ashford, C. Ernesto, et al. The severe impairment battery: concurrent validity and the assessment of longitudinal change in Alzheimer’s disease. The Alzheimer’s Disease Cooperative Study, Alzheimer disease and associated disorders 1997; 11 Suppl 2: S51-6.
12. M.F. Folstein, S.E. Folstein, P.R. McHugh, «Mini-mental state». A practical method for grading the cognitive state of patients for the clinician, Journal of psychiatric research 1975; 12(3): 189-98.
13. J. Saxton, A.A. Swihart, Neuropsychological assessment of the severely impaired elderly patient, Clinics in geriatric medicine 1989; 5(3): 531-43.
14. S.J. Cano, H.B. Posner, M.L. Moline, et al. The ADAS-cog in Alzheimer’s disease clinical trials: psychometric evaluation of the sum and its parts, Journal of neurology, neurosurgery, and psychiatry 2010; 81(12): 1363-8.
15. B. Reisberg, S.H. Ferris, M.J. de Leon, T. Crook. The Global Deterioration Scale for assessment of primary degenerative dementia, The American journal of psychiatry 1982; 139(9): 1136-9.
16. B. Reisberg. Functional assessment staging (FAST), Psychopharmacology bulletin 1988; 24(4): 653-9.
17. C.P. Hughes, L. Berg, W.L. Danziger, L.A. Coben, R.L. Martin, A new clinical scale for the staging of dementia, The British journal of psychiatry : the journal of mental science 1982; 140: 566-72.
18. G.M. Peavy, D.P. Salmon, V.A. Rice, et al. Neuropsychological assessment of severely demented elderly: the severe cognitive impairment profile, Archives of neurology 1996; 53(4): 367-72.
19. K.L. Lanctot, J. Amatniek, S. Ancoli-Israel, et al. Neuropsychiatric signs and symptoms of Alzheimer’s disease: New treatment paradigms, Alzheimer’s & dementia (New York, N. Y.) 2017; 3(3): 440-449.
20. R.S. Doody, S.L. Strehlow, P.J. Massman, E.P. Feher, C. Clark, J.R. Roy, Baylor profound mental status examination: a brief staging measure for profoundly demented Alzheimer disease patients, Alzheimer disease and associated disorders 1999; 13(1): 53-9.
21. A. Korner, A. Brogaard, I. Wissum, U. Petersen, The Danish version of the Baylor Profound Mental State Examination, Nordic journal of psychiatry 2012; 66(3): 198-202.
22. H.R. Na, S.H. Lee, J.S. Lee, R.S. Doody, S.Y. Kim, Korean version of the Baylor Profound Mental Status Examination: a brief staging measure for patients with severe Alzheimer’s disease, Dementia and geriatric cognitive disorders 2009; 27(1): 69-75.
23. S. Salmeron, I. Huedo, M. Lopez-Utiel, et al. Validation of the Spanish version of the Baylor Profound Mental Status Examination, Journal of Alzheimer’s disease 2016; 49(1): 73-8.
24. G. McKhann, D. Drachman, M. Folstein, R. Katzman, D. Price, E.M. Stadlan, Clinical diagnosis of Alzheimer’s disease: report of the NINCDS-ADRDA Work Group under the auspices of Department of Health and Human Services Task Force on Alzheimer’s Disease, Neurology 1984; 34(7): 939-44.
25. R.H. Paul, R.A. Cohen, D.J. Moser, et al. The global deterioration scale: relationships to neuropsychological performance and activities of daily living in patients with vascular dementia, Journal of geriatric psychiatry and neurology 2002; 15(1): 50-4.
26. J.S. Kim, C.W. Won, B.S. Kim, H.R. Choi, Predictability of various serial subtractions on global deterioration scale according to education level, Korean journal of family medicine 2013; 34(5): 327-33.
27. S.H. Choi, B.H. Lee, S. Kim, et al. Interchanging scores between clinical dementia rating scale and global deterioration scale, Alzheimer disease and associated disorders 2003; 17(2): 98-105.
28. A.J. Midden, B.T. Mast, Differential item functioning analysis of items on the Geriatric Depression Scale-15 based on the presence or absence of cognitive impairment, Aging & mental health 2017; 1-7.


I. McRae1, L. Zheng2,4, S. Bourke3, N. Cherbuin1, K.J. Anstey2,4

1. Centre for Research on Ageing Health and Wellbeing, Research School of Population Health, The Australian National University, Canberra, ACT, Australia; 2. Neuroscience Research Australia, Margarete Ainsworth Building, Barker Street, Randwick, Sydney NSW, Australia; 3. Department of Health Services Research and Policy, Research School of Population Health, The Australian National University, Canberra, ACT, Australia; 4. Ageing Futures Institute, School of Psychology, University of New South Wales, Sydney, NSW, Australia

Corresponding Author: Dr Ian McRae, Centre for Research on Ageing Health and Wellbeing, Research School of Population Health, The Australian National University, Canberra, ACT 2600, Australia, Email:, Ph: +61 431 929 750

J Prev Alz Dis 2020;
Published online December 15, 2020,



BACKGROUND: Assessment of cost-effectiveness of interventions to address modifiable risk factors associated with dementia requires estimates of long-term impacts of these interventions which are rarely directly available and must be estimated using a range of assumptions.
OBJECTIVES: To test the cost-effectiveness of dementia prevention measures using a methodology which transparently addresses the many assumptions required to use data from short-term studies, and which readily incorporates sensitivity analyses.
DESIGN: We explore an approach to estimating cost-effective prices which uses aggregate data including estimated lifetime costs of dementia, both financial and quality of life, and incorporates a range of assumptions regarding sustainability of short-term gains and other parameters.
SETTING: The approach is addressed in the context of the theoretical reduction in a range of risk factors, and in the context of a specific small-scale trial of an internet-based intervention augmented with diet and physical activity consultations.
MEASUREMENTS: The principal outcomes were prices per unit of interventions at which interventions were cost-effective or cost-saving.
RESULTS: Taking a societal perspective, a notional intervention reducing a range of dementia risk-factors by 5% was cost-effective at $A460 per person with higher risk groups at $2,148 per person. The on-line program costing $825 per person was cost-effective at $1,850 per person even if program effect diminished by 75% over time.
CONCLUSIONS: Interventions to address risk factors for dementia are likely to be cost-effective if appropriately designed, but confirmation of this conclusion requires longer term follow-up of trials to measure the impact and sustainability of short-term gains.

Key words: Dementia, risk factors, cost-effectiveness, interventions, sustainability.



While many studies have addressed the association of lifestyle and vascular factors with dementia, few have addressed whether interventions designed to reduce risk factors are cost-effective (1). This is in part because dementia risk reduction programs are implemented well before the usual age of dementia onset. This means that economic evaluation using simulation modelling requires parameters relevant to a long-term time frame. As most intervention studies to date have 5 years or less of follow-up (1) (exceptions include the planned trials of multi-domain interventions (2)), cost-effectiveness studies require model parameters to be extrapolated well beyond the data observed. Reviews of model-based economic evaluations of dementia interventions (1, 3) have identified very few methods which assess prevention strategies. The types of non-pharmaceutical interventions identified in these reviews mainly focused on early assessment of dementia, screening or diagnosis rather than reduction in risk factors (3).
Short-term cost-effectiveness studies (4) and methodologies have been published which address cost-effectiveness of transitions from mild cognitive impairment (MCI) to dementia (5). However, assessing cost-effectiveness of programs which reduce or treat risk factors (many of which occur in mid-life) requires modelling the impact of interventions over longer time frames (1, 3, 5-9) and requires assumptions on how trial results are sustained over the longer term. In the absence of robust estimates of many of the parameters needed for full Markov or other simulation models, we suggest an alternate approach to estimating the price at which programs are cost-effective. This approach makes transparent how sensitive these prices are to highly uncertain parameter estimates.
A 2019 review of health economic evaluations of primary prevention programs for dementia (1) identified three analyses of prevention strategies (6, 8, 9) which modelled dementia progress and costs over the long term. Noting the range of uncertainties, the review recommended that “extensive sensitivity analysis to examine the impact of assumptions” be implemented. This included assumptions regarding long-term vs short-term outcomes of interventions, the impact of optimal program targeting, and discounting (1). Two of the analyses were partial evaluations which addressed potential cost savings from reductions in dementia levels, but did not address health benefits (usually measured by Quality Adjusted Life Years (QALYs)), so cost-effectiveness was not testable (6, 8). While it included an extensive sensitivity analysis, the study which addressed cost-effectiveness (9) required a range of assumptions to estimate parameters including annual risk rates, mortality rates for those with and without dementia, and QALY levels by age for people with dementia.
Estimated age/gender specific incidence rates (10) for dementia are available for Australia, but the impact of interventions on incidence of dementia at each age is not known, nor are age-specific costs or QALY estimates. Hence, there is value in exploring non-simulation approaches to estimate the cost-effectiveness of interventions which address dementia risk factors using aggregate data. We use an approach based on average lifetime costs of dementia and losses in quality of life per individual who develops dementia. Until long-term parameters can be obtained with confidence, this approach avoids the need for transition probabilities and cost and QALY measures by age. It also gives a direct means of linking costs and benefits and provides a transparent means of undertaking sensitivity analyses of all factors, including parameters reflecting the sustainability of improvements in risk factors, program targeting and discounting.
To demonstrate the proposed approach we draw on two examples (11, 12): (1) a study that estimated the effects of risk reduction through population attributable risk (PAR) and (2) a recent randomized controlled trial (RCT) which assessed the impact of an on-line dementia prevention program. The RCT has a relatively short follow-up (15 months), so to estimate the long-term cost-effective and cost-saving prices we apply a range of different assumptions, including the degree to which gains in risk reduction are sustained and how well the program is targeted to people with high likelihoods of progressing to dementia.


Methods and Data


We used available estimates of the proportion of adults aged 65 and over who are expected to develop dementia and then estimated the reduction in prevalence of dementia for a target population from the two example interventions. Savings in costs and QALYs per person generated by the interventions were estimated using the average per person life-time costs of dementia and loss of QALY due to dementia. This enabled us to estimate the maximum price per person for an intervention to be cost saving or cost effective.
The standard measure of cost-effectiveness (technically cost-utility) is the incremental cost per QALY gained (i.e. the Incremental Cost-Effectiveness Ratio or ICER). For the purposes of this study, an intervention with an ICER below $50,000 is considered cost-effective. While Australia has no formal ICER thresholds, this is the level most commonly quoted and is consistent with UK, Australian (13) and American (14) literature.
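The threshold logic can be written compactly. The notation below is introduced here for illustration and is not the authors': p is the program price per targeted person, ΔC the discounted lifetime cost saving per targeted person, ΔQ the discounted QALYs gained per targeted person (assumed positive), and λ the ICER threshold of $50,000 per QALY.

```latex
\[
\mathrm{ICER} \;=\; \frac{p - \Delta C}{\Delta Q} \;\le\; \lambda
\quad\Longleftrightarrow\quad
p \;\le\; \Delta C + \lambda\,\Delta Q
\qquad\text{(cost-effective)},
\]
\[
p \;\le\; \Delta C
\qquad\text{(cost saving)}.
\]
```

Under this reading, the maximum cost-effective price is the cost saving plus the QALY gain valued at the threshold, which is the same structure as the row formula given in the note to Table 1.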
Apart from sensitivity analysis for uncertain parameters, we examined: (1) the impact of program targeting, as an intervention targeted at the highest risk groups has a greater opportunity to reduce dementia prevalence, (2) the impact of “decay” which reflects reduction in the gains from an intervention over time, and (3) the impact of different levels of discounting. Discounting is a means of “valuing down” (1) future financial and health costs as people may prefer to save money (or gain health benefits) now rather than in the future.

Lifetime Costs of Developing Dementia

Lifetime costs for people with dementia are the product of average annual costs of treatment/care and the duration of care. While estimates of duration of dementia vary widely depending mainly on age at diagnosis, international evidence and reviews suggest that a mean of 5 years is appropriate for the duration of care for dementia (15, 16), noting that this may not be the same as the actual duration of dementia (17).
The available estimates of costs of dementia take several perspectives. An American study (18) including direct healthcare costs and costs of informal care estimated $260,000 per person in 2015 (all costs in Australian dollars), while a 2016 Australian analysis found average annual costs of $35,550 per person including indirect costs such as loss of productivity of both patients and carers (10). With 5 years of life with dementia, this becomes a lifetime cost of $177,750 per person. A later Australian study (19) of people with dementia in residential care, using a markedly different methodology, estimated higher annual costs of $88,000 per year for residential care compared to $55,000 from the earlier study (10). Given the varying results from these studies, we used a figure of $200,000 as baseline, with a range from $150,000 to $300,000 used for sensitivity analyses.
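The lifetime cost arithmetic above is a simple product of annual cost and duration of care. A minimal sketch follows; the function and constant names are hypothetical and chosen only for illustration.

```python
def lifetime_cost(annual_cost: float, years_of_care: float) -> float:
    """Lifetime cost of dementia as annual cost times duration of care."""
    return annual_cost * years_of_care


# 2016 Australian estimate quoted in the text: $35,550 per year for 5 years
australian_estimate = lifetime_cost(35_550, 5)
print(australian_estimate)  # 177750

# Values the authors adopt for the analysis (Australian dollars)
BASELINE_LIFETIME_COST = 200_000
SENSITIVITY_RANGE = (150_000, 300_000)

# The Australian estimate sits inside the sensitivity range
print(SENSITIVITY_RANGE[0] <= australian_estimate <= SENSITIVITY_RANGE[1])
```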

Loss of Quality Adjusted Life Years by People Developing Dementia

The lifetime loss of QALYs for people with dementia includes the loss due to poorer quality of life and the loss due to premature mortality. A conservative median estimated years of life lost to dementia used here as a baseline is 5 years. This is consistent with previous studies (20) and an Australian systematic review (15). Some estimates as high as 9 years of life have been found (7, 21); we use this as the upper limit for sensitivity analysis purposes.
Few generally applicable estimates of QALY values for people with dementia are available (22). Most studies deriving QALYs in a dementia context relate to specific RCTs with specific populations rather than comparing average people with and without dementia. We draw on estimates of average QALYs for people with dementia and the wider aged population (7, 23). With 5 years of life with dementia and 5 years of life lost due to dementia, there is an average loss of 1.5 QALYs while alive and 4.2 QALYs due to premature mortality, giving a lifetime loss of 5.7 QALYs from dementia. A previous estimate (7) based on 6 years with dementia and 9 years of life lost led to an estimated 9.4 QALYs lost, which we use as an upper level for sensitivity testing.
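As a small check on the QALY figures, the sketch below adds the two components and values the total at the $50,000 threshold used in this study. Combining the monetised QALY loss with the $200,000 baseline lifetime cost is an illustration of how the two inputs enter the later calculations, not a figure reported by the authors; all names are hypothetical.

```python
QALY_LOSS_WHILE_ALIVE = 1.5      # over 5 years lived with dementia
QALY_LOSS_FROM_MORTALITY = 4.2   # from 5 years of life lost
UPPER_QALY_LOSS = 9.4            # upper level used for sensitivity testing

ICER_THRESHOLD = 50_000          # dollars per QALY
BASELINE_LIFETIME_COST = 200_000

lifetime_qaly_loss = QALY_LOSS_WHILE_ALIVE + QALY_LOSS_FROM_MORTALITY
print(round(lifetime_qaly_loss, 1))  # 5.7

# Undiscounted value of avoiding one case at the threshold (illustrative)
value_per_case = BASELINE_LIFETIME_COST + ICER_THRESHOLD * lifetime_qaly_loss
print(round(value_per_case))  # 485000
```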

Prevalence of dementia

Population prevalence data is not required for Example 1 as the predicted outcomes are explicitly in prevalence terms, although it is required for Example 2. While “Australian data on dementia prevalence are lacking” (AIHW 2018 p138 (24)), we use an estimate of 10% for people aged 65 or over from a study using Australian data (10), which is marginally above estimates combining Australian and international data (10, 24). For Example 2, we assume that any reduction in risk will lead to an equivalent reduction in prevalence when the cohort reaches age 65 or over, and that this reduction will apply to the estimated 10% prevalence of dementia in this age group.


Discounting

People generally value future costs and effects less than current costs and effects, and the value diminishes the further into the future they are expected to occur (25). Hence, economic evaluations adjust the value of costs and benefits for the time at which they occur, using discounting (25). Discounting over long periods has major impacts on results of cost-effectiveness studies (1), particularly when comparing program costs at midlife to medical and other savings in later life (26). A range of discount levels are used by different organisations including: a) the use of 3% for both costs and QALYs (9), b) discount rates of 4% for costs and 1.5% for QALYs (1), c) the use of 5% for both costs and QALYs in Australia by the Medicare Services Advisory Committee (25), and d) a UK recommendation that 3.5% be applied to both costs and QALYs (25).
In the light of extremely low interest rates in Australia and many other countries at present, and the long durations of discounting in this study, we use baseline discount rates of 3% for both costs and QALYs. For sensitivity analysis we include the Australian standard of 5% for both costs and QALYs, and the 4%/1.5% applied in Holland (1).
Simulation approaches apply discounting each year. However, assuming on average no differences between treated and untreated groups before onset of dementia, the discounting will have no material impact on the differences between treatment groups prior to diagnosis (note that while in principle costs change at onset, they are only measured from diagnosis). We, therefore, discount from the average age of commencement of the intervention to approximately the mid-point of the dementia period. To establish the period of discounting we take an average age of diagnosis as being in the early 80s (27-29). Most studies addressing average age at diagnosis show averages from the high 70s to mid 80s, but most commence with aged populations which may lead to some upward bias. We, therefore, include some alternate discounting periods for sensitivity analysis.
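A single discounting step of this kind can be sketched as follows. This is a minimal illustration assuming the standard present-value factor 1/(1+r)^t applied once over the whole period, with separate rates allowed for costs and QALYs; the function names and the 20-year horizon in the loop are illustrative choices, not values fixed by the paragraph above.

```python
def discount_factor(rate: float, years: float) -> float:
    """Present-value multiplier for an amount arising `years` from now."""
    return 1.0 / (1.0 + rate) ** years


def discounted_value(cost_saving: float, qalys: float, cost_rate: float,
                     qaly_rate: float, years: float,
                     threshold: float = 50_000) -> float:
    """Present value of cost savings plus QALYs valued at the ICER threshold."""
    return (cost_saving * discount_factor(cost_rate, years)
            + threshold * qalys * discount_factor(qaly_rate, years))


# Rate pairs named in the text: (cost rate, QALY rate)
RATE_PAIRS = {
    "baseline 3%/3%": (0.03, 0.03),
    "Australian 5%/5%": (0.05, 0.05),
    "4%/1.5%": (0.04, 0.015),
}

for label, (cost_rate, qaly_rate) in RATE_PAIRS.items():
    # 20 years is an illustrative discounting period
    print(label,
          round(discount_factor(cost_rate, 20), 3),
          round(discount_factor(qaly_rate, 20), 3))
```

Because the factor falls geometrically with the period, both the rate and the assumed age at diagnosis matter considerably when the intervention occurs decades before onset.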

Example 1 – Estimates of Dementia Prevention using Population Attributable Risk

Ashby-Mitchell et al. (2017) (11) explored the aggregate Population Attributable Risk (PAR) from a set of known correlates of dementia (midlife obesity, physical inactivity, smoking, low educational attainment, diabetes mellitus, midlife hypertension, depression). They used PAR values to estimate the impact of uniform reductions in these correlates on dementia prevalence. They concluded that a uniform 5% improvement across all risks would, over 20 years, lead to a reduction in the prevalence of dementia of 3.2% or 17,454 people in Australia.
Any intervention which aimed to reduce the risk factors addressed in Example 1 would need to improve obesity levels and hypertension in mid-life so we assume an intervention targeted at the population aged 45 years and over with an average age of around 65 years. Consistent with the modelling in Example 1 this gives a 20-year period from average age at intervention to average age of dementia diagnosis (early 80s) which we use for discounting (15 years used for sensitivity testing).

Example 2 – BBL-GP Intervention

The Body-Brain-Life in General Practice program (BBL-GP) aims to reduce known dementia risk factors using a mixture of on-line training and face-to-face consultations with dietitians and exercise physiologists (12). Results are assessed using an aggregate measure combining a range of known risk factors (the ANU-ADRI (30)) with program participants compared to an active control group. After 62 weeks the BBL-GP participants showed a decline in ANU-ADRI scores of 4.62 units more than the active controls (12). For a population of Australians aged 60-64 years at baseline, a difference in baseline values of 1 point of ANU-ADRI is associated with a difference of 8% in people developing mild cognitive impairment (MCI) or dementia after 12 years (31). This suggests a BBL-GP effect of 37% if the 4.62-unit improvement is sustained.
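The 37% figure follows from scaling the per-point association linearly by the observed difference in ANU-ADRI scores. The sketch below reproduces that arithmetic and, using the 10% prevalence assumed earlier for people aged 65 or over, converts it to cases avoided in a notional cohort of 10,000; the cohort size and variable names are illustrative assumptions.

```python
ADRI_DIFFERENCE = 4.62        # units, BBL-GP relative to active control
RISK_PER_ADRI_POINT = 0.08    # 8% difference in MCI/dementia per point
PREVALENCE_65_PLUS = 0.10     # assumed prevalence of dementia at age 65+
COHORT = 10_000               # notional cohort size (illustrative)

# Linear scaling, as in the text: 4.62 x 8% is roughly 37%
relative_effect = ADRI_DIFFERENCE * RISK_PER_ADRI_POINT
print(round(relative_effect * 100))  # 37

# Expected cases without the program, and cases avoided if the effect holds
expected_cases = COHORT * PREVALENCE_65_PLUS
cases_avoided = expected_cases * relative_effect
print(round(cases_avoided))  # 370
```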
This is an upper limit. Firstly, it is unlikely all the gains in risk factors will be sustained (e.g. maintaining weight loss). Secondly, the evidence of the impact of one point of ANU-ADRI on MCI and dementia may be the same as the long-term impact on dementia, but need not be, as there is likely to be a bias towards reducing MCI in those who are least likely to go forward to dementia. In this case the 8% impact of one ADRI point would be an overstatement. Finally, it is not clear if differences in the index obtained from an intervention have the same effect as differences brought about by lifetime experiences. The size of “decay” for any particular intervention is, therefore, driven by a range of factors including the time period between the intervention and the age at which dementia diagnosis is likely. For sensitivity analysis we test a range of different levels of reduction in impact of the BBL-GP program on actual dementia risk, beginning with a 50% reduction and increasing to a 95% reduction. We term this “decay” to reflect both the difficulty in sustaining the intervention’s short-term gains and the other issues described.
The trial population in Example 2 had an average age of 51 years (12), so for discounting purposes there is approximately 30 years to the average age of dementia diagnosis (20 years used for sensitivity testing). The average cost per participant in the BBL-GP trial relative to an active control was $2,700 including set-up costs. The number of participants in this trial was small, and while there are fixed costs of around $200 per person, other expenditures were almost independent of participant numbers. If more fully implemented, the program would be expected to be at least quadrupled in size and costs would become $825 per person. We use this figure to assess cost-effectiveness. With a larger implementation, average costs would be further reduced.
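The $825 figure is consistent with a simple split of the trial cost into a per-person component and a shared component spread over more participants. The sketch below assumes, as a simplification of "almost independent", that everything other than the $200 per-person cost is fully independent of participant numbers; the function name is hypothetical.

```python
def cost_per_participant(trial_cost_pp: float, per_person_cost: float,
                         scale_factor: float) -> float:
    """Average cost per participant after scaling up enrolment.

    The shared component (trial cost minus per-person cost) is assumed
    not to grow with participant numbers, so it is divided by the scale.
    """
    shared_component = trial_cost_pp - per_person_cost
    return per_person_cost + shared_component / scale_factor


print(cost_per_participant(2_700, 200, 4))   # 825.0
# A larger roll-out lowers the average further, e.g. ten times the trial size
print(cost_per_participant(2_700, 200, 10))  # 450.0
```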



Results


Table 1 shows baseline estimates for Example 1 with a target population of all people aged 45 years or over. This suggests that, ignoring program costs and discounting, over the lifetimes of the people protected from dementia by the lifestyle changes there would be savings of $3.5b and 99,488 QALYs. While these savings are large, with targeting across the whole population the savings per targeted person are only $342. After allowing for discounting, the maximum cost per targeted person which could lead to a cost-saving program is $189, while a cost less than $460 would achieve a cost-effective incremental cost per QALY gained (the ICER) of less than $50,000.
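The baseline figures for Example 1 can be approximately reproduced from the inputs given in the text. The sketch below is not the authors' worksheet: it assumes a single discount factor 1/(1+r)^t over 20 years at 3%, and it starts from the rounded $342 saving per targeted person, so the cost-effective price comes out near, rather than exactly at, the reported $460.

```python
CASES_AVOIDED = 17_454
LIFETIME_COST = 200_000     # baseline lifetime cost per case
QALY_LOSS = 5.7             # baseline lifetime QALY loss per case
THRESHOLD = 50_000          # ICER threshold, dollars per QALY

# Undiscounted totals quoted in the text
print(CASES_AVOIDED * LIFETIME_COST)      # 3490800000, about $3.5b
print(round(CASES_AVOIDED * QALY_LOSS))   # 99488 QALYs


def max_prices(saving_pp: float, rate: float = 0.03, years: float = 20):
    """Return (cost-saving price, cost-effective price) per targeted person."""
    factor = 1.0 / (1.0 + rate) ** years
    # Cases avoided per targeted person, implied by the saving per person
    cases_pp = saving_pp / LIFETIME_COST
    qaly_value_pp = THRESHOLD * QALY_LOSS * cases_pp
    return saving_pp * factor, (saving_pp + qaly_value_pp) * factor


cost_saving_price, cost_effective_price = max_prices(342.0)
print(round(cost_saving_price))       # 189
print(round(cost_effective_price))    # close to the reported 460
```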

Table 1. Example 1 – PAR – Baseline costing

1. (10)= ((4) + (9)*(5))/(6)
Table 2 provides estimates of maximum costs per person for a program to be cost saving or cost-effective under different assumptions on target size, lifetime costs, QALY losses and discount rates. Tests 1-3 show relatively little sensitivity in cost-effective or cost-saving prices to changes in estimated lifetime costs and lifetime QALY losses to dementia, with greater effects of QALY increases than cost increases on the cost-effective price. Test 4 assumes the intervention targets only half the population aged 45 and over and assumes the targeting is so well focused on those at higher risk that the number of people avoiding dementia is unchanged. This generates a much greater change in the maximum acceptable costs than shown in Tests 1-3. Test 5 assumes an intervention targeted at a population of only 10,000 who are at very high risk of developing dementia (25% prevalence rate), and again with 3.2% of the anticipated cases “saved” from dementia (11). The cost-effective price increases to $2,069 (after discounting), more than 4 times the baseline estimate. With such precise targeting the percentage saved would probably be greater than 3.2%, and any increase in this parameter would increase the cost-effective prices proportionately. Table 2 also shows the impact of different discounting rates, with the 4%/1.5% levels having broadly similar results to baseline, but the 5%/5% showing acceptable prices around 60% of 3%/3% meaning interventions are considerably less likely to be cost-effective. Should the duration of discounting (the period from the intervention to average age of diagnosis) be reduced, for the 3%/3% calculation the maximum cost-effective price would increase by 15% meaning more expensive interventions would be cost-effective.
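The direction and rough size of the discounting and targeting effects described above can be explored with a short sketch. It assumes the same simplified model as the baseline (one discount factor per rate over the whole period, inputs taken from the rounded $342 saving per targeted person), so its outputs are indicative only and need not match Table 2 exactly; all names are hypothetical.

```python
def cost_effective_price(saving_pp: float, qaly_value_pp: float,
                         cost_rate: float, qaly_rate: float,
                         years: float) -> float:
    """Maximum cost-effective price with separate cost and QALY discount rates."""
    return (saving_pp / (1.0 + cost_rate) ** years
            + qaly_value_pp / (1.0 + qaly_rate) ** years)


SAVING_PP = 342.0                                   # undiscounted saving per person
QALY_VALUE_PP = SAVING_PP / 200_000 * 5.7 * 50_000  # QALYs valued at the threshold

baseline = cost_effective_price(SAVING_PP, QALY_VALUE_PP, 0.03, 0.03, 20)
australian = cost_effective_price(SAVING_PP, QALY_VALUE_PP, 0.05, 0.05, 20)
split_rates = cost_effective_price(SAVING_PP, QALY_VALUE_PP, 0.04, 0.015, 20)
shorter = cost_effective_price(SAVING_PP, QALY_VALUE_PP, 0.03, 0.03, 15)
# Test 4 logic: half the target population, same number of cases avoided,
# so savings and QALY value per targeted person both double
half_target = cost_effective_price(2 * SAVING_PP, 2 * QALY_VALUE_PP,
                                   0.03, 0.03, 20)

print(round(australian / baseline, 2))   # 5%/5% relative to 3%/3%
print(round(split_rates / baseline, 2))  # 4%/1.5% relative to 3%/3%
print(round(shorter / baseline, 2))      # 15-year vs 20-year discounting
print(round(half_target / baseline, 2))  # 2.0
```

In this simplified form the price scales in direct proportion to cases avoided per targeted person, which is why targeting changes the result more than the cost and QALY ranges do.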

Table 2. Example 1 – PAR Costings – Sensitivity analyses

NOTE: * shows variation from baseline
Table 3 provides baseline estimates for Example 2. For presentation purposes the assumed population is 10,000 but results are independent of this number. The discounted program prices at baseline of $3,052 per person to be cost saving and $7,401 per person for the program to be cost-effective are well above the average price per participant of $825 relative to the active control.

Table 3. Example 2 – BBL-GP – Baseline Costing

Table 4 provides sensitivity testing which in addition to the factors tested for Example 1 tests levels of “decay”, and shows that the targeting, decay and discounting assumptions have the greatest impact on the overall outcomes. The targeting level of 60% was chosen as the trial participants were mainly people with obesity, with the relative risk of developing dementia of 1.6 (11). With an average price of $825, results discounted at 3% and all other factors at baseline level, a decay of up to 88% would be cost-effective, although not 95% (Test 3). With a 60% loading for targeting and the maximum levels of cost savings from preventing dementia and QALY lost to dementia, the intervention would be cost-effective at 95% decay (Test 7). Test 8 shows that with the 60% loading for targeting and other factors at baseline, even at 93% decay from the short term results the program would be cost-effective.
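The decay thresholds quoted above can be checked with a break-even calculation. The sketch assumes that the maximum cost-effective price scales linearly with both the retained effect (1 minus decay) and the targeting loading, and it interprets the 60% loading as a multiplier of 1.6; the function names are hypothetical, and the $7,401 baseline is taken from the text rather than recomputed.

```python
BASELINE_CE_PRICE = 7_401.0   # discounted cost-effective price, Example 2
PROGRAM_COST = 825.0          # estimated cost per participant


def price_after_decay(decay: float, targeting_loading: float = 0.0) -> float:
    """Maximum cost-effective price once part of the effect has decayed."""
    return BASELINE_CE_PRICE * (1.0 + targeting_loading) * (1.0 - decay)


def break_even_decay(targeting_loading: float = 0.0) -> float:
    """Largest decay at which the program still costs no more than that price."""
    return 1.0 - PROGRAM_COST / (BASELINE_CE_PRICE * (1.0 + targeting_loading))


print(round(price_after_decay(0.75)))          # about 1850, as in the abstract
print(price_after_decay(0.95) < PROGRAM_COST)  # True: not cost-effective at 95%
print(round(break_even_decay(), 3))            # just under 0.89
print(round(break_even_decay(0.6), 3))         # just over 0.93
```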

Table 4. Example 2 – BBL-GP Costing – Sensitivity Analyses

NOTE: * shows variation from baseline
The patterns in these tables show that results are linear with respect to both targeting and “decay”, and less than linear with respect to estimated lifetime costs and QALYs lost to dementia. As for Example 1, discounting has a major impact on the results, although even with relatively high levels of “decay” (80% with all other factors at baseline) the intervention is likely to remain cost-effective with 5%/5% discounting. Should the duration of discounting be reduced, the maximum cost-effective price would increase by 34% for the 3%/3% discounting, although this does not lift any of these prices above $825 for the examples in Table 4.


Discussion



Our results suggest that multi-domain programs such as the BBL-GP in Example 2 are likely to be cost-effective (unless program impacts decay almost completely over time), while the more generic approach of Example 1 requires tight targeting to at-risk populations to be cost-effective. These results are consistent with prior studies (1, 32) in showing the importance of targeting and sustainability of observed results beyond the period of study follow-up.
The estimated cost of $825 per person in Example 2 would be reduced with wider implementation. Previous studies have estimated cost of dementia risk reduction programs of $200 to $500 per person (9, 33). If Example 2 could be conducted at these lower costs it is more likely to be cost-effective even at high levels of “decay”. Should the duration from intervention to diagnosis of dementia be less than the assumed levels, the effect of discounting would be reduced, and maximum cost-effective program prices increased.
Recalling that “decay” includes other factors as well as the need for participants to maintain lifestyle changes over many years, high levels of decay are possible. Studies with long follow-up are needed to assess actual program effects. Programs which continue to interact with the participants continuously over time are likely to improve effects but increase costs. We also note that improving dementia risk factors would improve a range of other health outcomes (e.g. cardiovascular health, diabetes, mild cognitive impairment), in addition to dementia related outcomes. If the total benefits of risk reduction programs were included, they would be even more likely to be cost-effective.


Limitations


The main limitation in this and any other analysis of cost-effectiveness of dementia prevention interventions is the uncertainty in many parameters, which has required extensive sensitivity analysis to assess a reasonable range of outcomes. However, the approach taken here integrates sensitivity analysis and facilitates estimation of outcomes under varied assumptions.
The study assumed binary outcomes of dementia against no dementia and did not address the benefit of delay in onset of dementia, which also reduced the likelihood of finding cost-effective outcomes. Dementia related QALY losses prior to diagnosis were not included in the study, leading to a further conservative bias in estimates.
Like all approaches to cost-effectiveness modelling for dementia prevention interventions, this study is limited by having only short-term program outcomes (1). The baseline calculations assume (1) in the case of Example 1, that well-established associations between risk factors and dementia are causative; (2) for both examples, that changes in risk factors driven by interventions have the same effect as if the level of the risk factor was achieved “naturally” (e.g. reversing midlife obesity with an intervention has the same effect as achieving a normal weight at midlife without intervention); and (3) that changes in risk from a short-term intervention are sustained over time (e.g. weight does not revert to previous levels). The approach used here, however, provides a simple and transparent way to test the impact of these ongoing concerns.


Conclusions



To explore the cost-effectiveness of interventions aimed at dementia risk reduction requires a means of extrapolating outcomes from what, to date, have been relatively short-term trials. We examined lifetime costs (in both dollar and QALY terms) of dementia and applied these to projected changes in risks of dementia from two example studies. The results suggest that the multi-domain approach of BBL-GP is highly likely to be cost-effective.
The approach shows further the importance of targeting programs to “at risk” portions of the population and the sensitivity to the sustainability or otherwise of trial results. While these factors are well-known, the approach provides a means of estimating the orders of magnitude of program impacts and reinforces the need for longer-term studies to measure all relevant factors to enable assessment of cost-effectiveness with greater confidence.


Funding Sources: This research was undertaken as part of the Centre for Research Excellence in Cognitive Health, which was funded by the National Health and Medical Research Council grant #1100579. Anstey is funded by NHMRC Fellowship #1102694, Zheng is part supported by the NHMRC Dementia Centre for Research Collaboration. The funders had no role in the design and conduct of this study; in the analysis and interpretation of the data; in the preparation of the manuscript; or in the review or approval of the manuscript.
Acknowledgements: We acknowledge the ARC Centre of Excellence in Population Ageing Research.

Conflict of Interest: Dr McRae, Dr Zheng, Dr Bourke, and Professor Cherbuin declare that they have no conflict of interest. Professor Anstey reports personal fees from StaySharp, outside the submitted work.

Ethical standards: The authors followed the ethical guidelines of the Journal for this manuscript.



1. Handels R, Wimo A. Challenges and recommendations for the health-economic evaluation of primary prevention programmes for dementia. Aging & mental health. 2019 Jan;23(1):53-9.
2. Kivipelto M, Mangialasche F, Ngandu T. World Wide Fingers will advance dementia prevention. Lancet Neurol. 2018 Jan;17(1):27.
3. Nguyen KH, Comans TA, Green C. Where are we at with model-based economic evaluations of interventions for dementia? a systematic review and quality assessment. International psychogeriatrics. 2018 Nov;30(11):1593-605.
4. Meeuwsen E, Melis R, van der Aa G, Golüke-Willemse G, de Leest B, van Raak F, et al. Cost-effectiveness of one year dementia follow-up care by memory clinics or general practitioners: economic evaluation of a randomised controlled trial. PloS one. 2013;8(11):e79797.
5. Green C, Handels R, Gustavsson A, Wimo A, Winblad B, Skoldunger A, et al. Assessing cost-effectiveness of early intervention in Alzheimer’s disease: An open-source modeling framework. Alzheimer’s & dementia : the journal of the Alzheimer’s Association. 2019 Oct;15(10):1309-21.
6. Lin PJ, Yang Z, Fillit HM, Cohen JT, Neumann PJ. Unintended benefits: the potential economic impact of addressing risk factors to prevent Alzheimer’s disease. Health affairs (Project Hope). 2014 Apr;33(4):547-54.
7. Tsiachristas A, Smith AD. B-vitamins are potentially a cost-effective population health strategy to tackle dementia: Too good to be true? Alzheimer’s & Dementia: Translational Research & Clinical Interventions. 2016;2(3):156-61.
8. van Baal PH, Hoogendoorn M, Fischer A. Preventing dementia by promoting physical activity and the long-term impact on health and social care expenditures. Preventive medicine. 2016 Apr;85:78-83.
9. Zhang Y, Kivipelto M, Solomon A, Wimo A. Cost-effectiveness of a health intervention program with risk reductions for getting demented: results of a Markov model in a Swedish/Finnish setting. Journal of Alzheimer’s disease : JAD. 2011;26(4):735-44.
10. Brown L, Hansnata E, La HA. Economic Cost of Dementia in Australia 2016-2056, Report Prepared for Alzheimer’s Australia. Canberra: NATSEM at the Institute for Governance and Policy Analysis, University of Canberra; 2017.
11. Ashby-Mitchell K, Burns R, Shaw J, Anstey KJ. Proportion of dementia in Australia explained by common modifiable risk factors. Alzheimer’s Research & Therapy. 2017 Feb 17;9(1):11.
12. Anstey KJ, Kim S, Pond CD, Cherbuin N, McMaster M, Lautenschlager N, et al. Internet-based Intervention Augmented with Diet and Physical Activity Consultation to Decrease Risk of Dementia in At-risk Adults in a Primary Care Setting: Pragmatic Randomized Controlled Trial. Journal of Medical Internet Research. 2020;Accepted for Publication.
13. Wang S, Gum D, Merlin T. Comparing the ICERs in Medicine Reimbursement Submissions to NICE and PBAC—Does the Presence of an Explicit Threshold Affect the ICER Proposed? Value in Health. 2018 2018/08/01/;21(8):938-43.
14. Neumann PJ, Cohen JT, Weinstein MC. Updating Cost-Effectiveness — The Curious Resilience of the $50,000-per-QALY Threshold. The New England Journal of Medicine. 2014 2014 Aug 28;371(9):796-7.
15. Brodaty H, Seeher K, Gibson L. Dementia time to death: a systematic literature review on survival time and years of life lost in people with dementia. International psychogeriatrics. 2012 Jul;24(7):1034-45.
16. Sachs GA, Carter R, Holtz LR, Smith F, Stump TE, Tu W, et al. Cognitive impairment: an independent predictor of excess mortality: a cohort study. Annals of internal medicine. 2011 Sep 6;155(5):300-8.
17. Savva GM, Arthur A. Who has undiagnosed dementia? A cross-sectional analysis of participants of the Aging, Demographics and Memory Study. Age and Ageing. 2015;44(4):642-7.
18. Jutkowitz E, Kane RL, Gaugler JE, MacLehose RF, Dowd B, Kuntz KM. Societal and Family Lifetime Cost of Dementia: Implications for Policy. J Am Geriatr Soc. 2017 Oct;65(10):2169-75.
19. Gnanamanickam ES, Dyer SM, Milte R, Harrison SL, Liu E, Easton T, et al. Direct health and residential care costs of people living with dementia in Australian residential aged care. International journal of geriatric psychiatry. 2018;33(7):859-66.
20. Haaksma ML, Eriksdotter M, Rizzuto D, Leoutsakos J-MS, Olde Rikkert MGM, Melis RJF, et al. Survival time tool to guide care planning in people with dementia. Neurology. 2020;94(5):e538-e48.
21. Strand BH, Knapskog AB, Persson K, Edwin TH, Amland R, Mjorud M, et al. Survival and years of life lost in various aetiologies of dementia, mild cognitive impairment (MCI) and subjective cognitive decline (SCD) in Norway. PloS one. 2018;13(9):e0204436.
22. Prince MJ, Wimo A, Guerchet MM, et al. World Alzheimer Report 2015: The Global Impact of Dementia. London: Alzheimer’s Disease International; 2015.
23. Orgeta V, Edwards RT, Hounsome B, Orrell M, Woods B. The use of the EQ-5D as a measure of health-related quality of life in people with dementia and their carers. Qual Life Res. 2015;24(2):315-24.
24. Australian Institute of Health and Welfare. Australia’s health 2018. Canberra: Australian Institute of Health and Welfare; 2018.
25. Attema AE, Brouwer WBF, Claxton K. Discounting in Economic Evaluations. Pharmacoeconomics. 2018;36(7):745-58.
26. Devlin N, Scuffham P. Health today versus health tomorrow: does Australia really care less about its future health than other countries do? Aust Health Rev. 2020 Jun;44(3):337-9.
27. Plassman BL, Langa KM, McCammon RJ, Fisher GG, Potter GG, Burke JR, et al. Incidence of dementia and cognitive impairment, not dementia in the United States. Ann Neurol. 2011;70(3):418-26.
28. Brinks R, Landwehr S, Waldeyer R. Age of onset in chronic diseases: new method and application to dementia in Germany. Population Health Metrics. 2013 May 2;11(1):6.
29. Wolters FJ, Tinga LM, Dhana K, Koudstaal PJ, Hofman A, Bos D, et al. Life Expectancy With and Without Dementia: A Population-Based Study of Dementia Burden and Preventive Potential. Am J Epidemiol. 2019 Feb 1;188(2):372-81.
30. Anstey KJ, Cherbuin N, Herath PM, Qiu C, Kuller LH, Lopez OL, et al. A self-report risk index to predict occurrence of dementia in three independent cohorts of older adults: the ANU-ADRI. PloS one. 2014;9(1):e86141.
31. Cherbuin N, Shaw ME, Walsh E, Sachdev P, Anstey KJ. Validated Alzheimer’s Disease Risk Index (ANU-ADRI) is associated with smaller volumes in the default mode network in the early 60s. Brain imaging and behavior. 2019 Feb;13(1):65-74.
32. Richard E, Andrieu S, Solomon A, Mangialasche F, Ahtiluoto S, Moll van Charante EP, et al. Methodological challenges in designing dementia prevention trials – the European Dementia Prevention Initiative (EDPI). Journal of the neurological sciences. 2012 Nov 15;322(1-2):64-70.
33. Clare L, Nelis SM, Jones IR, Hindle JV, Thom JM, Nixon JA, et al. The Agewell trial: a pilot randomised controlled trial of a behaviour change intervention to promote healthy ageing and reduce risk of dementia in later life. BMC Psychiatry. 2015 Feb 19;15(1):25.


P. de Souto Barreto1,2, K. Pothier3, G. Soriano1,2, M. Lussier4,5, L. Bherer4,5,6, S. Guyonnet1,2, A. Piau1,2, P.-J. Ousset1, B. Vellas1,2

1. Gerontopole of Toulouse, Institute of Ageing, Toulouse University Hospital (CHU Toulouse), Toulouse, France; 2. UPS/Inserm UMR1027, University of Toulouse III, Toulouse, France; 3. University of Tours, EA 2114, PAVEA Laboratory, Tours, France; 4. Département de Médecine, Université de Montréal, Montréal, Canada; 5. Centre de recherche de l’Institut universitaire de gériatrie de Montréal, Montréal, Canada; 6. Centre de recherche de l’Institut de cardiologie de Montréal. Montréal, Canada

Corresponding Author: Professor Philipe de Souto Barreto, Gérontopôle de Toulouse, Institut du Vieillissement, 37 Allées Jules Guesde, F-31000 Toulouse, France, Phone: (+33) 561 145 668, Fax: (+33) 561 145 640, e-mail:

J Prev Alz Dis 2020;
Published online December 4, 2020,



Importance/Objective: To describe the feasibility and acceptability of a 6-month web-based multidomain lifestyle training intervention for community-dwelling older people and to test the effects of the intervention on both function- and lifestyle-related outcomes.
Design: 6-month, parallel-group, randomized controlled trial (RCT).
Setting: Toulouse area, South-West, France.
Participants: Community-dwelling men and women, ≥ 65 years-old, presenting subjective memory complaint, without dementia.
Intervention: The web-based multidomain intervention group (MIG) received a tablet to access the multidomain platform and a wrist-worn accelerometer measuring step counts; the control group (CG) received only the wrist-worn accelerometer. The multidomain platform comprised nutritional advice, personalized exercise training, and cognitive training.
Main outcomes and measures: Feasibility, defined as the proportion of people connecting to ≥75% of the prescribed sessions, and acceptability, investigated through content analysis from recorded semi-structured interviews. Secondary outcomes included clinical (eg, cognitive function, mobility, health-related quality of life (HRQOL)) and lifestyle (eg, step count, food intake) measurements.
Results: Among the 120 subjects (74.2 ±5.6 years-old; 57.5% women), 109 completed the study (n=54, MIG; n=55, CG). 58 MIG subjects connected to the multidomain platform at least once; among them, adherers of ≥75% of sessions varied across multidomain components: 37 people (63.8% of 58 participants) for cognitive training, 35 (60.3%) for nutrition, and three (5.2%) for exercise; these three persons adhered to all multidomain components. Participants considered study procedures and multidomain content in a positive way; the most cited weaknesses were related to exercise: too easy, repetitive, and slow progression. Compared to controls, the intervention had a positive effect on HRQOL; no significant effects were observed across the other clinical and lifestyle outcomes.
Conclusions and Relevance: Providing multidomain lifestyle training through a web-platform is feasible and well-accepted, but the training should be challenging enough and adequately progress according to participants’ capabilities to increase adherence. Recommendations for a larger on-line multidomain lifestyle training RCT are provided.

Key words: Multimodal lifestyle intervention, exercise, cognitive stimulation, nutritional advice, web-based intervention.



In recent years, the multidomain strategy, an approach that combines several lifestyle interventions, has received increasing attention, with large randomized controlled trials (RCTs) developed by our team (1) and others (2, 3). The rationale behind the multidomain lifestyle strategy is simple: if interventions that have already proven effective for improving or maintaining health during aging, such as physical exercise, healthy nutrition, and cognitive stimulation, are combined, their benefits should be potentiated.
Although several studies have investigated the effects of multidomain lifestyle strategies on function-related outcomes during aging (4), findings have been mixed. The interventions tested so far are difficult to transpose into clinical practice, mainly because they are burdensome (eg, participants must visit research facilities several times a week or month) and expensive (eg, they require specialized professionals, such as exercise instructors). Several multidomain lifestyle platforms exist (5); however, to the best of our knowledge, none has tested a training intervention (as opposed to health information, counselling, or motivation alone) dedicated to older adults using an RCT design. An online multidomain lifestyle training platform whose content and procedures have proven feasible, acceptable, and efficacious would have multiple advantages: participants could access it at any time, fitting it to their own time constraints, and neither participants nor specialized professionals would need to be physically present. These advantages are even more important in the current context of social isolation and population containment caused by the COVID-19 pandemic.
The main purpose of the present article was to describe the feasibility and acceptability of a 6-month RCT of a web-based multidomain lifestyle training intervention for community-dwelling older people with spontaneous memory complaint. Secondary objectives were to test the effects of the intervention on both function-related (eg, cognition, mobility) and lifestyle-related (eg, physical activity (PA), food intake) outcomes.



A detailed description of the eMIND trial has been published elsewhere (6). eMIND is a 6-month pilot, parallel-arm RCT of a multidomain lifestyle intervention composed of cognitive training, exercise training, and nutritional advice among community-dwelling older adults from the Toulouse area, South-West France. The protocol was approved by the ethics committee of Tours (CPP Tours; 2017T2-10); the first participant was recruited in December 2017 and the last study visit took place in September 2019. The trial was registered in a publicly accessible registry (NCT03336320). All participants signed an informed consent form before undertaking study procedures.


Inclusion criteria were: age ≥ 65 years; Mini-Mental State Examination (MMSE) (7) score ≥ 24; subjective memory complaints; and easy access to the internet (at least twice a week). Exclusion criteria were: illness with a life expectancy of less than six months; diagnosis of dementia according to DSM-V; diagnosis of a neurodegenerative disease, particularly Parkinson’s disease; major depression; any health condition that could be worsened by exercise; dependency in ≥ 1 basic activity of daily living (ADL); and participation in exercise or cognitive training ≥ 2 times/week in the last 2 months.
One hundred and twenty participants were enrolled in eMIND (74.2 ± 5.6 years old; 57.5% women).

Randomization and masking

Participants were randomized after baseline assessments in a 1:1 ratio to either the multidomain intervention group (MIG) or the control group (CG) using a dedicated remote website. Concealment of group allocation was ensured by using opaque envelopes, stored in a safe and locked place. Outcome assessors were blinded to group assignment.


Figure 1 displays the main procedures of eMIND. Participants randomized to MIG received a tablet (model: HP x2 210 G2 – 10.1) providing access to a secured, password-encrypted platform that complied with all applicable laws and regulations in France. Data were stored in a database approved for health data. MIG participants also received a commercial wrist-worn accelerometer (model: FitBit Flex 2) that provides objectively measured step counts. The accelerometer was synchronized with the tablet. Participants could access the platform as many times as they wanted.

Figure 1. Flow of the eMIND study procedures

After baseline assessments and randomization, MIG received a tablet with access to the multidomain platform for six months and a wrist-worn accelerometer; CG received the wrist-worn accelerometer and information on multidomain activities available on the website of the Toulouse University Hospital. The lowest part of the figure (MIG side) displays examples of how participants received the multidomain lifestyle training on their tablets, with cognitive training (eg, different Stroop tasks and N-back), and videos for both exercise (eg, chair rise; for illustration purposes, the sitting and upright positions are shown together, but participants had videos) and nutritional advice (eg, proteins).


The web multidomain platform focused on three lifestyle components: nutritional advice, exercise training, and cognitive training. The platform was equipped with a chat to facilitate communication between participants and the research team, a personalized agenda showing the day-by-day activities (eg, exercise and cognitive training to be done, nutritional advice), and a library area where the content of the interventions and educational material on lifestyles were available. Participants were requested to follow both exercise and cognitive training twice a week, and nutritional advice every fifteen days; to do so, they only had to click on the activities displayed in their personal agenda. The content of each component is briefly described below.

Exercise training

Three different 6-month exercise programs, with increasingly challenging exercises, were proposed according to the individual’s baseline physical function (assessed with the short physical performance battery (SPPB)). Several videos with different exercises were produced: eg, chair rise, walking in a line, knee flexion and extension, walking backwards, tiptoe standing, etc. Each exercise video had subtitles in large letters to facilitate understanding. The exercise program was developed by an experienced exercise scientist on the basis of exercise principles (eg, frequency, intensity, load progression), focusing on the lower body. A typical exercise session was composed of three different types of strength exercises, three balance exercises (the number of both sets and repetitions per set varied according to the individual’s physical function), specific advice on aerobic exercise (type, session duration, mode (continuous or in bouts), intensity, and how to subjectively reach the target self-perceived exertion), and a set of flexibility exercises for warming up and cooling down.

Cognitive training

Computerized cognitive training was provided using Neuropeak, a password-encrypted platform developed by researchers from the University of Montreal, Canada. Participants trained on cognitive tasks mainly related to executive functions, following a pre-established, personalized 6-month cognitive training program with progressive difficulty; participants received active feedback encouraging them to perform beyond their baseline performance. Three types of tasks were performed: the dual task, the Stroop task, and the N-back task. For the dual task (divided attention training), participants had to identify vehicles and fruits, both separately and concurrently (8). For the Stroop task (inhibition and switching training), participants were presented with four different conditions: concordant, counting, discordant, and switching. For the N-back task (updating training), digits were presented sequentially and the participant had to indicate whether the current digit was the same as the digit presented “n” steps earlier.
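The N-back rule described above can be illustrated with a minimal sketch. It is not the Neuropeak implementation; the function name and the example digit sequence are hypothetical and only show which positions would count as targets for a given "n".

```python
def nback_targets(digits, n):
    """Flag each position whose digit equals the digit shown n steps earlier.

    The first n positions can never be targets because no digit was
    presented n steps before them.
    """
    return [i >= n and digits[i] == digits[i - n] for i in range(len(digits))]


# Hypothetical 2-back sequence: positions 2, 3 and 6 repeat the digit
# presented two steps earlier.
sequence = [3, 7, 3, 7, 7, 1, 7]
print(nback_targets(sequence, 2))
```

Raising "n" increases the load on working-memory updating, which is one way such a task could be made progressively harder.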

Healthy nutrition advice

Twelve videos on nutritional advice (about five to eight minutes per video, two videos per month for the 6-month intervention) were produced for this study by an experienced hospital dietitian from the department of geriatrics (Toulouse University Hospital). They were organized under important nutritional topics for older adults, such as proteins, fat, hydration, fruits and vegetables, calcium and osteoporosis, etc. Advice was based on the French recommendations from the “Programme National Nutrition Santé” (PNNS). A quiz with three questions was used to facilitate the retention of the key messages of each nutritional video. A personalized approach to nutritional advice had been planned for people at risk of nutritional deficiency (mini-nutritional assessment (MNA) (9) ≤ 23.5).

Control group

All controls received the same accelerometer as MIG, but did not have access to the multidomain web-platform. They received a link to information on multidomain activities produced by the research team. CG participants were asked to bring their own smartphone/tablet to the baseline visit, and the research team helped them to synchronize the accelerometer with the smartphone/tablet.

Main outcome measures

For this pilot RCT, feasibility and acceptability of study procedures and tools were the main outcomes. Feasibility was assessed through adherence to the multidomain protocol. Participants accessing all three interventions (clicking on the multidomain contents in their personal agenda in the web-platform) for at least 75% of the requirements were considered adherers, confirming the feasibility of the study procedures. Acceptability was assessed through content analysis of recorded semi-structured interviews performed at the post-intervention assessments. The main element of acceptability was defined by the question “In your opinion, can this web-platform be used in its current state?”, which was anchored by the following responses: No, it requires major changes for improving its ease of use; No, but it requires only minor adaptations; Yes, but a few modifications could render it easier to use; Yes, the platform is easy to use in its current state. The main modifications proposed by participants to improve both intervention content and technological aspects were recorded and explored.
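The adherence rule above (at least 75% of the prescribed sessions for every component) can be sketched as follows. The twelve nutrition videos are stated in the text; the 52 exercise and 52 cognitive sessions assume twice-weekly training over 26 weeks and are an illustrative assumption, as are the function names and the example participants.

```python
ADHERENCE_THRESHOLD = 0.75  # proportion of prescribed sessions required

# Prescribed sessions per component over the 6-month intervention.
# 12 nutrition videos is taken from the text; 52 sessions (2 per week
# over an assumed 26 weeks) is an illustrative assumption.
PRESCRIBED = {"exercise": 52, "cognitive": 52, "nutrition": 12}


def component_adherence(completed, prescribed=PRESCRIBED,
                        threshold=ADHERENCE_THRESHOLD):
    """Return, per component, whether the participant reached the threshold."""
    return {
        name: completed.get(name, 0) / total >= threshold
        for name, total in prescribed.items()
    }


def is_adherer(completed, prescribed=PRESCRIBED,
               threshold=ADHERENCE_THRESHOLD):
    """A participant is an adherer only if every component reaches the threshold."""
    return all(component_adherence(completed, prescribed, threshold).values())


# Hypothetical participant: adherent to cognitive training and nutrition,
# but not to exercise, so not an adherer overall.
participant = {"exercise": 20, "cognitive": 45, "nutrition": 10}
print(component_adherence(participant))
print(is_adherer(participant))
```

Reporting adherence per component as well as overall, as in this sketch, makes it visible when a single component (here exercise) drives the overall figure.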

Secondary endpoints

Cognitive function

Assessed using the mean of a cognitive composite score (10) combining the Z-scores of four scales: MMSE (7) 10-items orientation, Digit Symbol Substitution Test of Wechsler Adult Intelligence Scale-Revised (DSST, WAIS-R) (11), total recall (up to 48 points) of the Free and Cued Selective Reminding test (FCSRT) (12), and Category Naming test (CNT). The following original scales were investigated separately: MMSE, DSST WAIS-R, total recall of FCSRT, CNT, and Controlled Oral Word Association Test (COWAT) (13).
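As a minimal sketch of how such a composite could be computed, the example below standardizes each of the four tests and averages the resulting Z-scores per participant. The reference distribution used for standardization (here the sample passed in, eg, baseline data), the field names, and the example scores are all assumptions made for illustration; the source does not specify them.

```python
from statistics import mean, stdev

# Hypothetical field names for the four tests entering the composite.
TESTS = ("mmse_orientation", "dsst", "fcsrt_total", "cnt")


def composite_scores(rows, reference=None):
    """Average the Z-scores of the four tests for each participant.

    Z-scores are computed against the mean and SD of `reference`
    (assumed here to be the baseline sample); if no reference is given,
    `rows` itself is used.
    """
    reference = rows if reference is None else reference
    params = {
        test: (mean(r[test] for r in reference), stdev(r[test] for r in reference))
        for test in TESTS
    }
    return [
        mean((row[test] - params[test][0]) / params[test][1] for test in TESTS)
        for row in rows
    ]


# Made-up scores for three participants (not study data).
sample = [
    {"mmse_orientation": 10, "dsst": 40, "fcsrt_total": 46, "cnt": 28},
    {"mmse_orientation": 9, "dsst": 35, "fcsrt_total": 44, "cnt": 25},
    {"mmse_orientation": 8, "dsst": 30, "fcsrt_total": 42, "cnt": 22},
]
print(composite_scores(sample))
```

Standardizing before averaging puts tests with very different ranges (eg, 10 orientation items versus up to 48 recall points) on a common scale, so that no single test dominates the composite.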

Physical function

Measured using the SPPB (14), composed of three tests (4-meter usual pace gait speed, 5-repetition chair rise, and balance tests; score range from 0 to 12, higher is better), and the 4-meter gait speed (m/sec).

Depressive symptoms

Measured by the 15-item Geriatric Depression Scale (GDS-15)(15); scores vary from zero to 15, higher is worse.

Nutritional status

Assessed by the MNA (9); scores vary from zero to 30, higher is better.

Health-related quality of life (HRQOL)

Assessed using the Euro-QoL 5D-5L (16). We used two variables: the index value for the French population (17) (continuous variable varying from -1 to 1, higher is better), calculated using the EuroQol calculator website, and the visual analogue scale (VAS, varying from zero to 100, higher is better).

Physical Activity (PA)

This was assessed subjectively by a self-reported questionnaire (QAPPA (18, 19); continuous values in metabolic equivalent of task minutes per week (MET-min/week), higher means higher PA) and objectively using step counts from the accelerometers. We used the mean steps/day from four waves of data collection: first week (baseline measure); first month (excluding the first week); month three; and month six. Step count data were considered valid if captured during ≥60% of the exposure period (ie, ≥ 18 days/month), ignoring values ≤1000 steps (20–22).
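One possible reading of this validity rule is sketched below: days with no data or with ≤1000 steps are discarded, and a wave is kept only if the remaining days cover at least 60% of the exposure period. This interpretation, the function name, and the example values are assumptions for illustration, not the authors' processing code.

```python
from statistics import mean

MIN_STEPS = 1000       # daily values at or below this are ignored
MIN_COVERAGE_PCT = 60  # minimum share of exposure days with usable data


def mean_daily_steps(daily_steps, exposure_days):
    """Return mean steps/day for one wave, or None if the wave is invalid.

    `daily_steps` holds one value per recorded day (None for missing days).
    A wave is valid when usable days reach 60% of `exposure_days`
    (eg, 18 of 30 days).
    """
    usable = [s for s in daily_steps if s is not None and s > MIN_STEPS]
    # Integer arithmetic avoids floating-point issues at the exact cut-off.
    if len(usable) * 100 < MIN_COVERAGE_PCT * exposure_days:
        return None
    return mean(usable)


# Hypothetical month: 18 usable days and 12 low-count days is just valid.
month = [5000] * 18 + [500] * 12
print(mean_daily_steps(month, exposure_days=30))
```

Under this reading, low-count days are treated like non-wear days, so they neither contribute to the mean nor count towards the 18-day requirement.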

Leisure-time cognitive activities

Assessed through a 14-item self-reported questionnaire asking for the frequency of cognitive stimulating activities (eg, crosswords, cultural outings); scores range from 0 to 56, higher is better.

Food intake

Measured by a 21-item food frequency questionnaire (FFQ; scores vary from 0 to 21, higher is better) used in clinical practice at the frailty day-hospital of the Toulouse University Hospital.

Statistical Analysis

The present pilot study was designed to inform the development of a future, well-powered RCT of an information and communication technology (ICT) multidomain lifestyle intervention; we estimated that 60 people per study arm, taking into account low adherence to multidomain interventions and 6% dropout, would provide the data needed for a proper sample size calculation on clinical outcomes.
Analyses were performed on an intention-to-treat (ITT) basis. Descriptive statistics (mean (SD) and absolute numbers (%), as appropriate) were used. Content analysis of recorded semi-structured interviews performed at post-intervention was used to identify aspects to be improved in study procedures and tools. Baseline differences between MIG and CG were tested using Student’s t-test for independent samples and the chi-square test, as appropriate. The effects of the intervention on the secondary outcome measures were assessed using mixed-effect linear regressions, with group, time, and group-by-time interaction as fixed terms, a random effect at the participant level, and a random slope on time; participant’s age was included as a confounder in all analyses due to imbalance between groups.
Statistical significance was set at p < 0.05. All analyses were performed using Stata (v.14.0, Texas, USA).
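For readers who prefer notation, the mixed-effect model described above can be written as follows. This is a sketch under stated assumptions: the symbols, the coding of time, and the covariance structure of the random effects are not given in the source and are chosen here only for illustration.

```latex
% y_{ij}: outcome of participant i at assessment j
% G_i: group indicator (1 = MIG, 0 = CG); t_{ij}: time; Age_i: baseline age
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 G_i + \beta_2 t_{ij} + \beta_3 \,(G_i \times t_{ij})
          + \beta_4\,\mathrm{Age}_i + u_{0i} + u_{1i}\, t_{ij} + \varepsilon_{ij},\\
(u_{0i}, u_{1i})^{\top} &\sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

In this formulation, $\beta_3$ is the group-by-time interaction, that is, the between-group difference in change over time; $u_{0i}$ and $u_{1i}$ are the participant-level random intercept and random slope on time.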



Figure 2 shows the flow of study participants. Of the 120 participants randomized, 109 were assessed at 6 months (MIG, n=54; CG, n=55). Among the 58 adverse events registered, 12 were serious (eg, heart infarct): four in MIG (n=3 subjects) and eight in CG (n=6 subjects). Among the 46 non-serious events (eg, intermittent dizziness), 32 occurred in MIG (n=24 subjects) and 14 in CG (n=10 subjects). According to the study physicians, no adverse event (serious or non-serious) was related to study procedures.
Table 1 shows participants’ characteristics. Study groups were well balanced, except that MIG participants were 2 years older than CG participants (p=0.052).

Figure 2. Flowchart of study participants


Table 1. Participants’ characteristics


Regarding feasibility, 58 (out of 60) participants in MIG connected to the multidomain platform at least once during the 6-month trial. Among them, the number of adherers (≥75% of prescribed sessions) varied across the different components of the multidomain intervention: 37 individuals (63.8%) for cognitive training, 35 (60.3%) for nutrition, and only three (5.2%) for exercise; these same three persons were adherers in all three multidomain components. Regarding exercise, 31 individuals (53.4%) followed ≥50% of the requested frequency (ie, they connected once a week); for cognitive training and nutritional advice, the corresponding proportions were 75.8% and 81%, respectively.
Regarding acceptability, we interviewed 53 participants from MIG: four (7.5%) said the multidomain web-platform was not ready to be used and needed major changes; three (5.7%) indicated it required minor changes; 18 (34%) said it was ready to be used, but minor modifications could render it easier to use; and 28 (52.8%) indicated the platform was ready to be used without any change. Regarding the strengths and weaknesses of the platform, although most participants interviewed (n=50, 94.3%) indicated the technical/technological aspects were simple, they reported technical interruptions during the cognitive training; one person indicated the letters were too small. Although all individuals interviewed indicated the content was clear, they identified weaknesses. The most cited were that the physical exercises were too easy (not challenging enough), repetitive (the exercises proposed should vary more), sometimes difficult to perform due to space limitations at home, and slow to progress. To a lesser extent, other weaknesses highlighted were that the nutritional advice lacked novelty and that cognitive training was sometimes repetitive and too long. Some participants suggested that having more contact with the research team would be helpful, and a few indicated that gamification of the intervention (eg, a motivating reward system, points) could increase interest and adherence.
The effects of the intervention on clinical and lifestyle outcomes are displayed in Table 2. No statistically significant effects were found, except for the two variables (ie, index value and VAS) of HRQOL, showing MIG had an improved HRQOL compared to CG.
A set of recommendations for developing a large RCT on multidomain lifestyle training intervention is provided in Box 1. These recommendations were developed according to the lessons learnt from eMIND.

Table 2. Effects of the web-based multidomain lifestyle training intervention on both clinical and lifestyle secondary outcomes using mixed-effect linear regression

a. For all variables, except GDS-15, positive values in the within-group adjusted mean difference indicate improvement over time. For GDS-15, negative values indicate improvement; b. For all variables, except GDS-15, positive values in the between-group adjusted mean difference favor the multidomain intervention. For GDS-15, negative values favor the multidomain intervention

Box 1. Recommendations for developing a large RCT on multidomain lifestyle training: lessons from eMINDa

a. These recommendations should be seen as additional elements to the eMIND contents and procedures.



This is the first RCT using a web-based multidomain lifestyle-training intervention for community-dwelling older adults. We showed that study procedures were well-accepted, but expectations regarding participant’s adherence to the intervention were not met, in particular for exercise. Moreover, compared to controls, participants of MIG have improved their HRQOL at the end of the 6-month intervention.
Although most participants were not compliant to the protocol to the expected extent, raising questions about the feasibility of a larger trial, a few considerations are worth discussing. We arbitrarily defined good adherence as connecting to ≥75% of intervention sessions, which is a high rate, in particular for demanding activities, such as exercise (23). Indeed, the FINGER study, a previous multidomain RCT (2) with supervised exercise training, showed that less than 60% of participants performed at least half of the exercise sessions (24), which is a similar rate than ours (53.4% of MIG connected to half of expected sessions). FINGER also had part of the cognitive intervention performed at home through a web-based training system; they found that 47.2% of participants did at least half of the training sessions (24), which is much less than 75.8% observed in eMIND. In the multidomain MAPT trial our team conducted (1, 24), 53.5% of participants attended ≥75% of multidomain sessions (PA and nutritional counselling and cognitive training performed in the same sessions), which is similar to what we found for nutritional advices and cognitive training in eMIND, but not for exercise. Several methodological differences of FINGER and MAPT, compared to eMIND, must be mentioned: FINGER and MAPT were long-term (2- and 3-year, respectively) trials, with larger samples, and different modalities and frequency for providing the multidomain interventions; eMIND and MAPT have similar population, but were much older than FINGER. In the HATICE (25) web-based multidomain trial (focused on counselling and motivation through a coaching system), participants in the intervention group connected in average 1.8 times per month, during almost 18 months; HATICE’s participants were not required to connect on a regular frequency as in eMIND. The fact the study was well-accepted by participants, with clear and, most of time, useful contents is a major result. 
Therefore, considering the good acceptability and the acceptable adherence rates for cognitive training and nutritional advice, it is plausible to suggest the eMIND findings are promising. Although adherence to exercise was low, compliance of once/week exercise training was acceptable; this is important because doing PA once a week may already bring several health benefits during aging (26, 27).
However, weaknesses must be mentioned and corrected if a larger, well-powered trial should be developed. Indeed, the exercise program should comprise a more diversified set of exercises, with different levels of difficulty in the execution and adapted progression. For this, one possibility is to use a coaching system, like in the HATICE study (25), which would allow participants to interact with the research team through the web-platform in a shorter time interval and adapt the intervention program according to participant’s current status; this might lead to providing the most appropriate content to participants, reducing loss of motivation due to repetitive and not challenging enough exercises. Providing participants with inexpensive exercise materials (therabands, calf weights) may facilitate exercise load progression. Adapting the intervention content to avoid demotivation would also improve cognitive training. The coaching system would further help in making more interactive the nutritional advices component. If connected devices are to be used, an important workload should be foreseen to solve technical problems. Problems with the synchronization between accelerometer and tablets/smartphones in eMIND were frequent, being most of time related with participants’ low digital literacy; sometimes, problems were related to technical issues (requiring the support of IT professionals).
The absence of noticeable differences between groups for clinical outcomes was expected, since eMIND has not been powered to test the effectiveness of the intervention. Why MIG participants had a better HRQOL than CG, even if no statistically significant between-group differences were found in any of the clinical and lifestyle outcomes, deserves further investigation. Woo and colleagues (28) showed that a 24-week exercise and nutritional supplementation, compared to controls, improved self-reported health in middle-aged and young older adults. The effects of the isolated components of multidomain are better known, especially exercise, which has shown to improve HRQOL in older adults (29-33). Nutritional support was shown to have a positive effect on quality of life in older hospitalized patients (34). It is possible that alterations in patient-reported outcomes, such as HRQOL, which are subjective and then potentially more sensitive to change in a short-time interval, precede changes in more robust clinical outcomes, such as mobility. However, since the effects of eMIND intervention on HRQOL is of arguable clinical meaningfulness, it is also possible that the HRQOL improvement found would be temporary. A longer and well-powered trial would shed light on this topic.
The main strengths of our study were: the mixed method approach, composed of a RCT design and qualitative semi-structured interviews (for MIG); the use of a wearable step count tracker; and the fact that this is the first web-based multidomain lifestyle-training intervention for older adults. Although other web-based multidomain trials (5) have been performed, they focused on counselling and motivation or were developed in middle-aged/young older populations; other trials are on-going (35, 36), in particular the large and long-term MYB trial (36), which will provide a training platform for people 55-77 years old, but not all components of the multidomain intervention (performed in short blocks of 10 weeks) will be available to all participants. eMIND’s weaknesses were the small sample and the short intervention length.
Implementing a web-based multidomain lifestyle-training platform accessible to a large number of older people could have a major positive impact from a public health perspective. This is even more relevant in periods of prolonged social isolation and containment, such as during the COVID-19 pandemic. The preliminary findings of this pilot RCT support the development of a larger, well-powered, long-term RCT to test the effectiveness of an adapted version (in particular, for exercise) of the eMIND platform on clinical outcomes.


Funding: This study is supported by the Fondation pour la Recherche Médicale (FRM DOC20161136208) in the context of the call “Evaluation of the impact of connected devices on health 2016”. The Centre Hospitalier Universitaire of Toulouse (CHU-Toulouse) is the sponsor of the study (protocol ID: 17 0071).

Role of Funding Source: None.

Conflict of interest: All authors declare no conflicts of interest related to this article.



1. Andrieu S, Guyonnet S, Coley N, Cantet C, Bonnefoy M, Bordes S, et al. Effect of long-term omega 3 polyunsaturated fatty acid supplementation with or without multidomain intervention on cognitive function in elderly adults with memory complaints (MAPT): a randomised, placebo-controlled trial. Lancet Neurol. 2017 May;16(5):377–89.
2. Ngandu T, Lehtisalo J, Solomon A, Levälahti E, Ahtiluoto S, Antikainen R, et al. A 2 year multidomain intervention of diet, exercise, cognitive training, and vascular risk monitoring versus control to prevent cognitive decline in at-risk elderly people (FINGER): a randomised controlled trial. Lancet Lond Engl. 2015 Jun 6;385(9984):2255–63.
3. Moll van Charante EP, Richard E, Eurelings LS, van Dalen J-W, Ligthart SA, van Bussel EF, et al. Effectiveness of a 6-year multidomain vascular care intervention to prevent dementia (preDIVA): a cluster-randomised controlled trial. Lancet Lond Engl. 2016 Aug 20;388(10046):797–805.
4. Fourteau P, Virecoulon Giudici K, Rolland Y, Vellas B, de Souto Barreto P. Associations between Multidomain lifestyle Interventions and Intrinsic Capacity domains during Aging: A Narrative Review. 2020;In Press.
5. Wesselman LM, Hooghiemstra AM, Schoonmade LJ, de Wit MC, van der Flier WM, Sikkes SA. Web-Based Multidomain Lifestyle Programs for Brain Health: Comprehensive Overview and Meta-Analysis. JMIR Ment Health [Internet]. 2019 Apr 9 [cited 2020 Sep 8];6(4). Available from:
6. Pothier K, Soriano G, Lussier M, Naudin A, Costa N, Guyonnet S, et al. A web-based multidomain lifestyle intervention with connected devices for older adults: research protocol of the eMIND pilot randomized controlled trial. Aging Clin Exp Res. 2018 Sep;30(9):1127–35.
7. Folstein MF, Folstein SE, McHugh PR. ‘Mini-mental state’. A practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res. 1975 Nov;12(3):189–98.
8. Bherer L, Gagnon C, Langeard A, Lussier M, Desjardins-Crépeau L, Berryman N, et al. Synergistic effects of cognitive training and physical exercise on dual-task performance in older adults. J Gerontol B Psychol Sci Soc Sci. 2020 Aug 17;
9. Guigoz Y, Vellas B, Garry PJ. Assessing the nutritional status of the elderly: The Mini Nutritional Assessment as part of the geriatric evaluation. Nutr Rev. 1996 Jan;54(1 Pt 2):S59-65.
10. Donohue MC, Sperling RA, Salmon DP, Rentz DM, Raman R, Thomas RG, et al. The preclinical Alzheimer cognitive composite: measuring amyloid-related decline. JAMA Neurol. 2014 Aug;71(8):961–70.
11. Wechsler D. Wechsler adult intelligence scale—revised. New York: Psychological Corp; 1981.
12. Grober E, Buschke H, Crystal H, Bang S, Dresner R. Screening for dementia by memory testing. Neurology. 1988 Jun;38(6):900–3.
13. Cardebat D, Doyon B, Puel M, Goulet P, Joanette Y. [Formal and semantic lexical evocation in normal subjects. Performance and dynamics of production as a function of sex, age and educational level]. Acta Neurol Belg. 1990;90(4):207–17.
14. Guralnik JM, Simonsick EM, Ferrucci L, Glynn RJ, Berkman LF, Blazer DG, et al. A short physical performance battery assessing lower extremity function: association with self-reported disability and prediction of mortality and nursing home admission. J Gerontol. 1994 Mar;49(2):M85-94.
15. Brink TL, Yesavage JA, Lum O, Heersema PH, Adey M, Rose TL. Screening Tests for Geriatric Depression. Clin Gerontol. 1982 Oct 14;1(1):37–43.
16. Herdman M, Gudex C, Lloyd A, Janssen MF, Kind P, Parkin D, et al. Development and preliminary testing of the new five-level version of EQ-5D (EQ-5D-5L). Qual Life Res. 2011 Dec;20(10):1727–36.
17. Andrade LF, Ludwig K, Goni JMR, Oppe M, de Pouvourville G. A French Value Set for the EQ-5D-5L. PharmacoEconomics. 2020;38(4):413–25.
18. Barreto P de S, Ferrandez A-M, Saliba-Serre B. Questionnaire d’activité physique pour les personnes âgées (QAPPA) : validation d’un nouvel instrument de mesure en langue française. Sci Sports. 2011;26(1):11–8.
19. de Souto Barreto P. Construct and convergent validity and repeatability of the Questionnaire d’Activité Physique pour les Personnes Âgées (QAPPA), a physical activity questionnaire for the elderly. Public Health. 2013 Sep;127(9):844–53.
20. Patel MS, Benjamin EJ, Volpp KG, Fox CS, Small DS, Massaro JM, et al. Effect of a Game-Based Intervention Designed to Enhance Social Incentives to Increase Physical Activity Among Families: The BE FIT Randomized Clinical Trial. JAMA Intern Med. 2017 01;177(11):1586–93.
21. Patel MS, Small DS, Harrison JD, Fortunato MP, Oon AL, Rareshide CAL, et al. Effectiveness of Behaviorally Designed Gamification Interventions With Social Incentives for Increasing Physical Activity Among Overweight and Obese Adults Across the United States: The STEP UP Randomized Clinical Trial. JAMA Intern Med. 2019 Sep 9;1–9.
22. Patel MS, Asch DA, Rosin R, Small DS, Bellamy SL, Heuer J, et al. Framing Financial Incentives to Increase Physical Activity Among Overweight and Obese Adults: A Randomized, Controlled Trial. Ann Intern Med. 2016 Mar 15;164(6):385–94.
23. de Souto Barreto P, Rolland Y, Vellas B, Maltais M. Association of Long-term Exercise Training With Risk of Falls, Fractures, Hospitalizations, and Mortality in Older Adults: A Systematic Review and Meta-analysis. JAMA Intern Med. 2019 Mar 1;179(3):394–405.
24. Coley N, Ngandu T, Lehtisalo J, Soininen H, Vellas B, Richard E, et al. Adherence to multidomain interventions for dementia prevention: Data from the FINGER and MAPT trials. Alzheimers Dement J Alzheimers Assoc. 2019;15(6):729–41.
25. Richard E, Moll van Charante EP, Hoevenaar-Blom MP, Coley N, Barbera M, van der Groep A, et al. Healthy ageing through internet counselling in the elderly (HATICE): a multinational, randomised controlled trial. Lancet Digit Health. 2019 Dec 1;1(8):e424–34.
26. de Souto Barreto P, Cesari M, Andrieu S, Vellas B, Rolland Y. Physical Activity and Incident Chronic Diseases: A Longitudinal Observational Study in 16 European Countries. Am J Prev Med. 2017 Mar;52(3):373–8.
27. de Souto Barreto P, Delrieu J, Andrieu S, Vellas B, Rolland Y. Physical Activity and Cognitive Function in Middle-Aged and Older Adults: An Analysis of 104,909 People From 20 Countries. Mayo Clin Proc. 2016 Nov;91(11):1515–24.
28. Woo J, Chan R, Ong S, Bragt M, Bos R, Parikh P, et al. Randomized Controlled Trial of Exercise and Nutrition Supplementation on Physical and Cognitive Function in Older Chinese Adults Aged 50 Years and Older. J Am Med Dir Assoc. 2020 Mar;21(3):395–403.
29. Raafs BM, Karssemeijer EGA, Van der Horst L, Aaronson JA, Olde Rikkert MGM, Kessels RPC. Physical Exercise Training Improves Quality of Life in Healthy Older Adults: A Meta-Analysis. J Aging Phys Act. 2020 01;28(1):81–93.
30. Tulloch A, Bombell H, Dean C, Tiedemann A. Yoga-based exercise improves health-related quality of life and mental well-being in older people: a systematic review of randomised controlled trials. Age Ageing. 2018 01;47(4):537–44.
31. Fukuta H, Goto T, Wakami K, Ohte N. Effects of drug and exercise intervention on functional capacity and quality of life in heart failure with preserved ejection fraction: A meta-analysis of randomized controlled trials. Eur J Prev Cardiol. 2016 Jan;23(1):78–85.
32. Zampogna B, Papalia R, Papalia GF, Campi S, Vasta S, Vorini F, et al. The Role of Physical Activity as Conservative Treatment for Hip and Knee Osteoarthritis in Older People: A Systematic Review and Meta-Analysis. J Clin Med. 2020 Apr 18;9(4).
33. Oguchi H, Tsujita M, Yazawa M, Kawaguchi T, Hoshino J, Kohzuki M, et al. The efficacy of exercise training in kidney transplant recipients: a meta-analysis and systematic review. Clin Exp Nephrol. 2019 Feb;23(2):275–84.
34. Rasmussen NML, Belqaid K, Lugnet K, Nielsen AL, Rasmussen HH, Beck AM. Effectiveness of multidisciplinary nutritional support in older hospitalised patients: A systematic review and meta-analyses. Clin Nutr ESPEN. 2018;27:44–52.
35. Belleville S, Cuesta M, Bieler-Aeschlimann M, Giacomino K, Widmer A, Mittaz Hager AG, et al. Rationale and protocol of the StayFitLonger study: a multicentre trial to measure efficacy and adherence of a home-based computerised multidomain intervention in healthy older adults. BMC Geriatr. 2020 Aug 28;20(1):315.
36. Heffernan M, Andrews G, Fiatarone Singh MA, Valenzuela M, Anstey KJ, Maeder AJ, et al. Maintain Your Brain: Protocol of a 3-Year Randomized Controlled Trial of a Personalized Multi-Modal Digital Health Intervention to Prevent Cognitive Decline Among Community Dwelling 55 to 77 Year Olds. J Alzheimers Dis JAD. 2019;70(s1):S221–37.



M. Moline1, S. Thein2, M. Bsharat1, N. Rabbee1, M. Kemethofer-Waliczky3, G. Filippov1, N. Kubota4, S. Dhadda1


1. Eisai Inc., Woodcliff Lake, NJ, USA; 2. Pacific Research Network – an ERG Portfolio Company, San Diego, CA, USA; 3. The Siesta Group, Vienna, Austria; 4. Eisai Co. Ltd., Tokyo, Japan.

Corresponding Authors: Margaret Moline, PhD, Clinical Research, Eisai, Inc., 100 Tice Boulevard, Woodcliff Lake, NJ 07677, USA, Phone: +1 (201) 949-4226, Fax: +1 (201) 949-4595, E-mail:

J Prev Alz Dis 2021;1(8):7-18
Published online December 3, 2020,



BACKGROUND: Irregular sleep-wake rhythm disorder (ISWRD) is a common sleep disorder in individuals with Alzheimer’s disease dementia (AD-D).
OBJECTIVES: This exploratory phase 2 proof-of-concept and dose-finding clinical trial evaluated the effects of lemborexant compared with placebo on circadian rhythm parameters, nighttime sleep, daytime wakefulness and other clinical measures of ISWRD in individuals with ISWRD and mild to moderate AD-D.
DESIGN: Multicenter, randomized, double-blind, placebo-controlled, parallel-group study.
SETTING: Sites in the United States, Japan and the United Kingdom.
PARTICIPANTS: Men and women 60 to 90 years of age with documentation of diagnosis with AD-D and Mini-Mental State Exam (MMSE) score 10 to 26.
INTERVENTION: Subjects were randomized to placebo or one of four lemborexant treatment arms (2.5 mg, 5 mg, 10 mg or 15 mg) once nightly at bedtime for 4 weeks.
MEASUREMENTS: An actigraph was used to collect subject rest-activity data, which were used to calculate sleep-related, wake-related and circadian rhythm–related parameters. These parameters included least active 5 hours (L5), relative amplitude of the rest-activity rhythm (RA) and mean duration of sleep bouts (MDSB) during the daytime. The MMSE and the Alzheimer’s Disease Assessment Scale-Cognitive Subscale (ADAS-Cog) were used to assess for changes in cognitive function.
RESULTS: Sixty-two subjects were randomized and provided data for circadian, daytime and nighttime parameters (placebo, n = 12; lemborexant 2.5 mg [LEM2.5], n = 12; lemborexant 5 mg [LEM5], n = 13; lemborexant 10 mg [LEM10], n = 13; and lemborexant 15 mg [LEM15], n = 12). Mean L5 showed a decrease from baseline to week 4 for LEM2.5, LEM5 and LEM15 that was significantly greater than with placebo (all p < 0.05), suggesting a reduction in restlessness. For RA, LS mean change from baseline to week 4 versus placebo indicated greater distinction between night and day with all dose levels of lemborexant, with significant improvements seen with LEM5 and LEM15 compared with placebo (both p < 0.05). The median percentage change from baseline to week 4 in MDSB during the daytime indicated a numerical decrease in duration for LEM5, LEM10 and LEM15, which was significantly different from placebo for LEM5 and LEM15 (p < 0.01 and p = 0.002, respectively).
There were no serious treatment-emergent adverse events or worsening of cognitive function, as assessed by the MMSE and ADAS-Cog. Lemborexant was well tolerated. No subjects discontinued treatment.
CONCLUSIONS: This study provides preliminary evidence of the potential utility of lemborexant as a treatment to address both nighttime and daytime symptoms in patients with ISWRD and AD-D.

Key words: Irregular sleep-wake rhythm disorder, Alzheimer’s disease, lemborexant.



Individuals with Alzheimer’s disease dementia (AD-D) commonly exhibit sleep disorders, particularly irregular sleep-wake rhythm disorder (ISWRD) (1, 2). ISWRD is a circadian rhythm sleep disorder, distinct from insomnia, which is characterized by the irregular distribution of sleep bouts across the 24-hour period rather than consolidated sleep at night (3). The most common symptoms of ISWRD are chronic sleep maintenance problems during the nighttime and a high level of daytime sleepiness (3). The pathology of ISWRD includes neuronal activity loss in the suprachiasmatic nucleus, a structure within the hypothalamus that controls circadian rhythms, and the pineal gland (3, 4).
The lack of a well-defined circadian pattern of sleep-wake behavior in patients with AD-D can present a challenge for caregivers (5). There are no pharmacologic treatments currently approved for ISWRD. The American Academy of Sleep Medicine (AASM) strongly recommends against the use of sedative-hypnotics in these patients owing to safety concerns, including increased risk of falls (6). Melatonin has not demonstrated efficacy in improving sleep in individuals with Alzheimer’s disease in clinical studies (7, 8), and the AASM does not recommend its use in elderly patients with dementia (6). Light therapy has been investigated as a potential nonpharmacologic treatment to improve sleep quality in patients with Alzheimer’s disease and related dementias (9). The AASM recommends its use versus no treatment in elderly patients with dementia (6), as some improvements in behavioral disorders have been reported (10). However, this recommendation was given a “strength value” of “Weak For,” as the quality of evidence was considered very low, as evaluated by the GRADE approach (6).
Consolidation of nighttime sleep and daytime wakefulness are the main goals of treatment for ISWRD (3). Recent evidence suggests that a dysfunctional orexin system may play a role in the neuropathology of ISWRD (11, 12). Elevated orexin levels have been associated with both disturbed sleep and impaired cognition in patients with Alzheimer’s disease (11). Therapies targeting the orexin system, such as a dual orexin receptor antagonist (DORA), may improve sleep in individuals with Alzheimer’s disease (2, 13).
Lemborexant is a DORA that has been approved recently in the United States (14), Canada, and Japan for the treatment of insomnia in adults. In the pivotal phase 3 studies E2006-G000-304 (Study 304; SUNRISE-1; identifier NCT02783729) and E2006-G000-303 (Study 303; SUNRISE-2; identifier NCT02952820), lemborexant treatment provided significant benefit compared with placebo on polysomnogram-based and self-reported sleep onset and sleep maintenance outcomes over 1 month (Study 304), and patient-reported sleep onset and sleep maintenance outcomes over 6 months (Study 303), in subjects with insomnia disorder (15, 16). In both phase 3 clinical studies, lemborexant was well tolerated.
Here we describe results from an exploratory phase 2 proof-of-concept and dose-finding clinical trial (E2006-G000-202 [Study 202]; identifier NCT03001557) that evaluated the effects of lemborexant compared with placebo on circadian rhythm parameters, nighttime sleep, daytime wakefulness and other clinical measures of ISWRD, in individuals with ISWRD and mild to moderate AD-D.



Study participants

This study enrolled men and women 60 to 90 years of age with documentation of diagnosis with AD-D on the basis of the National Institute on Aging/Alzheimer’s Association Diagnostic Guidelines and Mini-Mental State Exam (MMSE) (17) score 10 to 26. Subjects met criteria for circadian rhythm sleep disorder, irregular sleep-wake type (Diagnostic and Statistical Manual of Mental Disorders [5th edition]), and the International Statistical Classification of Diseases, Tenth Revision, as follows: complaint by the subject or caregiver of difficulty sleeping during the night and/or excessive daytime sleepiness associated with multiple irregular sleep bouts during a 24-hour period. Subjects also had frequency of complaint of sleep and wake fragmentation ≥ 3 days per week; duration of complaint of sleep and wake fragmentation ≥ 3 months; and mean sleep efficiency (SE) < 87.5% in the nocturnal sleep period and mean wake efficiency (WE) < 87.5% during the wake period, as measured by actigraphy during the screening period; and, as confirmed by actigraphy, a combination of sleep bouts of > 10 minutes during the wake period plus wake bouts of > 10 minutes during the sleep period, totaling at least 4 bouts per 24-hour period, ≥ 3 days per week. Subjects could also have no more than mild sleep apnea and be able to tolerate wearing an actigraphy device. Individuals with dementia other than AD-D and sleep disorders other than ISWRD were excluded. Additional details of major exclusion criteria are provided in the supplementary material.
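The actigraphy-based part of these inclusion criteria can be illustrated with a minimal Python sketch. It covers only the numeric thresholds stated above (mean SE and mean WE below 87.5%, and at least 4 qualifying bouts per 24-hour period on at least 3 days per week). The function name, its parameters and the idea of passing one week of daily bout counts are assumptions made for illustration; they are not taken from the study protocol.

```python
def meets_actigraphy_criteria(mean_se, mean_we, bouts_per_day,
                              efficiency_cutoff=87.5, min_bouts=4, min_days=3):
    """Check the numeric actigraphy thresholds for one screening week.

    mean_se, mean_we: mean sleep and wake efficiency, in percent.
    bouts_per_day: for each day of the week, the number of sleep bouts
        > 10 minutes in the wake period plus wake bouts > 10 minutes in
        the sleep period (hypothetical input; counting is done elsewhere).
    """
    # Days on which the combined bout count reaches the per-day minimum.
    qualifying_days = sum(1 for n in bouts_per_day if n >= min_bouts)
    return (mean_se < efficiency_cutoff
            and mean_we < efficiency_cutoff
            and qualifying_days >= min_days)
```

For example, a subject with mean SE of 80%, mean WE of 85% and daily bout counts of 4, 5, 0, 6, 1, 2 and 3 would satisfy these thresholds, because three days reach at least 4 bouts.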

Ethical Standards

This study received approval from the relevant Institutional Review Boards and Independent Ethics Committees and was conducted in adherence to Good Clinical Practice guidelines as required by the principles of the Declaration of Helsinki and the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use. All protocol amendments were reviewed and approved by the institutional review board or independent ethics committee before study implementation. Details of protocol amendments are available on
Subjects or their legal representative signed the informed consent form. Caregivers signed a separate consent form. For the subject to enroll, there had to be one or more persons responsible to provide the required information for assessments, complete the sleep log for actigraphy and ensure that the subject was dosed at the appropriate time.

Study design

This multicenter, randomized, double-blind, placebo-controlled, parallel-group study was conducted at 57 sites: 47 in the United States, 9 in Japan and 1 in the United Kingdom, and started December 20, 2016. This study had three phases: the prerandomization phase, the randomization (core) phase and the extension phase (figure 1a). Here we present results from the prerandomization and randomization phases only, which completed July 26, 2018 (primary completion date); the extension phase completed on April 17, 2020.

Figure 1. (a) Study design; (b) Subject disposition. Visit 1 = Screening. Visit 2 = Caregiver visit; download actigraphy data. Visit 3 = Confirm eligibility and dispense study drug. Visit 4 = Subject and caregiver visit; download actigraphy data and perform safety assessments. Visit 5 = End-of-treatment assessments; download actigraphy data. Visit 6 = End-of-study assessments; download actigraphy data. *Sleep study: before randomization, the investigator was required to review a report detailing the potential subject’s Apnea-Hypopnea Index or equivalent. †Includes 14 subjects who were rescreened once and one subject who was rescreened twice. ‡Subjects were allowed to rescreen. Seven subjects rescreened and failed the second screening. Therefore, there were 151 individuals who were screen failures and 158 primary reasons for screen failure. BL, baseline; R, randomization; V, visit.



The prerandomization phase comprised a screening period and a baseline period. Eligible subjects were provided with an actigraph to wear continuously for at least the first 14 days of screening. During the screening period, subjects underwent a polysomnogram either at home or in the clinic to rule out moderate to severe sleep apnea (≥ 15 events per hour of sleep). Subjects who met eligibility criteria after at least 2 weeks of actigraphy could enter the randomization phase, in which they were randomized (1:1:1:1:1) to placebo or one of four lemborexant treatment arms (2.5 mg [LEM2.5], 5 mg [LEM5], 10 mg [LEM10] or 15 mg [LEM15]), stratified by country, for 4 weeks. Randomization was based on a computer-generated randomization scheme that was reviewed and approved by an independent statistician. Subjects and all personnel involved with the conduct and interpretation of the study, including investigators, site personnel and sponsor staff, were blinded to the treatment codes. Study drug was dispensed to the caregiver and was administered within 5 minutes of bedtime during the treatment period. Following the 4-week treatment period, there was a 2-week follow-up period without study medication to assess for possible rebound ISWRD symptoms and for safety. Eligible participants could enter an open-label extension phase for up to 30 months, or until program discontinuation, after the 2-week follow-up period.

Subjects were asked to wear an actigraphy device (MotionWatch8, CamNtech, Boerne, TX) continuously on their nondominant wrist for at least 14 days to qualify and for 28 days during placebo or lemborexant treatment. Subjects also wore the actigraph during the follow-up period. Actigraphy data were collected in 30-second epochs and scored centrally using a customized algorithm. The in-bed intervals and times when the actigraphs were removed from the wrists were provided to the central reader based on the sleep logs completed by the caregivers. At a minimum, participants were required to wear the actigraph for 5 complete days out of 7 days’ data. A day was considered complete as long as data from 90% of the 24-hour period was able to be scored.
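The data-completeness rule can be made concrete with a short Python sketch. With 30-second epochs, a 24-hour day contains 2,880 epochs, so 90% of a day corresponds to 2,592 scorable epochs. The function names are hypothetical, and the sketch assumes that "90%" means at least 90% and that the rule is applied to one 7-day block at a time.

```python
EPOCH_SECONDS = 30
EPOCHS_PER_DAY = 24 * 60 * 60 // EPOCH_SECONDS  # 2880 epochs in 24 hours


def is_complete_day(scorable_epochs, min_percent=90):
    """A day counts as complete if at least min_percent of its epochs are scorable."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return scorable_epochs * 100 >= min_percent * EPOCHS_PER_DAY


def week_is_usable(scorable_epochs_per_day, min_complete_days=5):
    """Require a minimum number of complete days within a 7-day block."""
    complete_days = sum(1 for n in scorable_epochs_per_day if is_complete_day(n))
    return complete_days >= min_complete_days
```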


This study evaluated the efficacy of lemborexant compared with placebo on changes from baseline in circadian, nighttime and daytime endpoints. Mean changes from baseline were evaluated over each week of treatment with lemborexant versus placebo for the following endpoints. All actigraphy-derived parameters were calculated based on the logged time in bed (nighttime) or logged time out of bed (daytime) as reported in the sleep log.

Circadian rhythm–related endpoints

Circadian rhythm–related endpoints included the least active 5 hours (L5), L5 start time (L5ST), most active 10 hours (M10), relative amplitude of the rest-activity rhythm (RA), interdaily stability (IS) and intradaily variability (IV). L5 was defined as the average activity across the least active 5-hour period of 24-hour sleep-wake rhythm (higher values indicate restlessness). For L5ST, the numbers represent clock times, with the two digits after the decimal point representing percentage of 60 minutes. M10 was defined as the average activity during the most active 10-hour period per 24-hour period (low levels indicating inactivity). RA was calculated as the difference between M10 and L5 divided by M10 plus L5. RA standardizes for activity-level differences across subjects and reflects strength of circadian signal; values closer to 1 represent rhythms with higher relative amplitudes. IS was derived by the ratio between the variance of the average 24-hour pattern around the mean and the overall variance, and gives an indication of the stability of the sleep-wake rhythm across days, and varies from zero (low stability) to 1 (high stability). IV was derived by the ratio of the mean squares of the difference between all successive hours (first derivative) and the mean squares around the grand mean (overall variance). IV gives an indication of ISWRD by quantifying the number and strength of transitions between rest and activity bouts, with a higher number indicating more fragmentation.
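As an illustration of these definitions, the following Python sketch computes L5, M10, RA, IS and IV from hourly activity values. It is not the actigraphy vendor's algorithm. It assumes that the input covers whole days of hourly averages and that L5 and M10 are taken from the average 24-hour profile using a window that may wrap around midnight. The IS and IV expressions follow the commonly used nonparametric formulas, which match the verbal definitions above.

```python
from statistics import fmean


def hourly_profile(x, period=24):
    """Average activity for each hour of the day across all recorded days."""
    days = len(x) // period
    return [fmean([x[h + period * d] for d in range(days)]) for h in range(period)]


def window_means(profile, width):
    """Mean of every circular window of `width` consecutive hours."""
    p = len(profile)
    return [fmean([profile[(s + k) % p] for k in range(width)]) for s in range(p)]


def circadian_metrics(x, period=24):
    """Return L5, M10, RA, IS and IV for hourly activity values x."""
    n = len(x)
    if n < 2 * period or n % period != 0:
        raise ValueError("expected at least two whole days of hourly values")
    profile = hourly_profile(x, period)
    l5 = min(window_means(profile, 5))     # least active 5 hours
    m10 = max(window_means(profile, 10))   # most active 10 hours
    grand_mean = fmean(x)
    total_ss = sum((v - grand_mean) ** 2 for v in x)
    if total_ss == 0 or m10 + l5 == 0:
        raise ValueError("activity series has no variation")
    ra = (m10 - l5) / (m10 + l5)
    # IS: variance of the average 24-hour pattern relative to overall variance.
    profile_ss = sum((m - grand_mean) ** 2 for m in profile)
    interdaily_stability = (n * profile_ss) / (period * total_ss)
    # IV: mean squared hour-to-hour difference relative to overall variance.
    diff_ss = sum((x[i] - x[i - 1]) ** 2 for i in range(1, n))
    intradaily_variability = (n * diff_ss) / ((n - 1) * total_ss)
    return {"L5": l5, "M10": m10, "RA": ra,
            "IS": interdaily_stability, "IV": intradaily_variability}
```

For two identical days with 8 inactive hours followed by 16 active hours, this sketch gives RA and IS close to 1, reflecting a strong and perfectly repeatable rhythm.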

Daytime wake endpoints

Endpoints related to daytime wake included WE, wake fragmentation index (WFI) and mean number and mean duration of sleep bouts during the daytime. These endpoints were derived by actigraphy. WE was defined as wake time per daytime hours and calculated as 100% × the total duration of wake epochs during the wake period (ie, the time outside of the sleep period) divided by the duration of the daytime hours. WFI, which characterizes transitions between wake and sleep throughout the day, was calculated as the sum of an immobility index and a fragmentation index, with immobility index equal to epochs of immobility outside of the defined sleep period × 100, and fragmentation index equal to the number of ≤ 1-minute periods of mobility/total number of periods of mobility outside of the sleep period × 100. The mean number and mean duration of sleep bouts that occurred during the hours outside of the nocturnal sleep period were assessed, where a sleep bout was defined as continuous sleep of 10 minutes or longer. Lastly, total sleep time (TST) during the daytime, defined as minutes of sleep during the day, was also assessed.
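Two of these daytime endpoints, WE and the sleep bouts, can be sketched in Python from epoch-level sleep/wake labels. The sketch assumes 30-second epochs (as used for data collection in this study) and a list of booleans covering the wake period, where True marks an epoch scored as sleep. The function names and the input representation are illustrative assumptions rather than the central reader's scoring algorithm.

```python
from itertools import groupby

EPOCH_MINUTES = 0.5  # 30-second epochs


def wake_efficiency(is_sleep):
    """WE = 100 x wake epochs / all epochs in the wake period."""
    if not is_sleep:
        raise ValueError("no epochs supplied")
    wake_epochs = sum(1 for s in is_sleep if not s)
    return 100.0 * wake_epochs / len(is_sleep)


def sleep_bout_durations(is_sleep, min_minutes=10.0):
    """Durations (minutes) of continuous sleep lasting min_minutes or longer."""
    durations = []
    for state, run in groupby(is_sleep):
        minutes = sum(1 for _ in run) * EPOCH_MINUTES
        if state and minutes >= min_minutes:
            durations.append(minutes)
    return durations
```

The number of bouts on a given day is the length of the returned list, and the mean duration is the average of its entries; both would then be averaged across days to obtain the reported means.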

Nighttime sleep endpoints

Endpoints related to nighttime sleep included actigraphy-derived SE, actigraphy-derived sleep fragmentation index (SFI) and TST during the nighttime. SE was calculated as 100% times the total duration of sleep epochs during the nocturnal sleep period. SFI was calculated as the sum of a movement index and a fragmentation index, with movement index = (epochs of wake per time in bed) × 100 and fragmentation index = (number of ≤ 1-minute periods of immobility/total number of periods of immobility of all durations during the nocturnal sleep period) × 100. This outcome measures the transitions between sleep and wake throughout the night; higher values indicate fragmented sleep. TST during the night was defined as minutes of sleep during the nighttime. The mean number and duration of wake bouts that occurred during the nocturnal sleep period, where a wake bout was defined as continuous wake of 10 minutes or longer, were also assessed.
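The SFI definition can be sketched in Python as follows. The sketch assumes 30-second epochs, treats each epoch scored as wake as a mobile epoch, and interprets "per time in bed" as division by the number of epochs in bed. It also returns a fragmentation index of 0 when there are no immobile periods, which is a choice made for this illustration only. None of these details are confirmed by the study's scoring algorithm.

```python
from itertools import groupby


def sleep_fragmentation_index(is_wake, epoch_seconds=30):
    """SFI = movement index + fragmentation index for one night in bed.

    is_wake: one boolean per epoch in bed, True if the epoch is scored as wake.
    """
    if not is_wake:
        raise ValueError("no epochs supplied")
    # Movement index: percentage of in-bed epochs scored as wake.
    movement_index = 100.0 * sum(1 for w in is_wake if w) / len(is_wake)
    # Lengths (in epochs) of every uninterrupted period of immobility.
    immobile_runs = [sum(1 for _ in run)
                     for state, run in groupby(is_wake) if not state]
    if immobile_runs:
        short_runs = sum(1 for n in immobile_runs if n * epoch_seconds <= 60)
        fragmentation_index = 100.0 * short_runs / len(immobile_runs)
    else:
        fragmentation_index = 0.0  # assumption: undefined case treated as 0
    return movement_index + fragmentation_index
```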

Additional assessments

The MMSE (17) and the Alzheimer’s Disease Assessment Scale-Cognitive Subscale (ADAS-Cog) (18) were administered prior to and at the end of treatment to assess for changes in cognitive function. The Clinician’s Global Impression of Change–ISWRD version, the Neuropsychiatric Inventory (19) and the Sleep Disorders Inventory (18) were also assessed in this study, but these data will be reported separately.

Statistical analyses

The study objectives reflect the exploratory nature of this phase 2 study and were not categorized as primary or secondary, following a protocol amendment (Protocol Amendment 6; June 20, 2018).

Sample size

The sample size of this proof-of-concept study was approximately 60 subjects, reduced from approximately 125 subjects following a protocol amendment (Protocol Amendment 6; June 20, 2018). Sample size was reduced following the amending of the objectives and endpoints to reflect the exploratory nature of the proof-of-concept study. All statistical tests were based on the 5% level of significance (two-sided), unless otherwise stated. No multiplicity adjustments were made. Statistical analyses were performed using SAS version 9.4 (SAS Institute, Inc., Cary, NC). Responder analyses, network analyses and corresponding visualizations were created using the R statistical software package (20).

Populations analyzed

Efficacy analyses were performed on the Full Analysis Set (FAS) unless otherwise specified. The FAS was defined as the group of randomized subjects who received at least one dose of randomized study drug and had at least one post-dose efficacy measurement. The Safety Analysis Set (SAS) was defined as the group of randomized subjects who received at least one dose of randomized study drug and had at least one post-dose safety assessment.
Demographic and other baseline characteristics for the SAS were summarized for each treatment group using descriptive statistics. For all actigraphy parameters, baseline was defined as the average value during the designated days of screening. For L5, M10, RA, IS and IV parameters, the weekly averages were calculated by the actigraphy vendor. For these variables, the last record of the screening period (generally the average of the last 7 days) was considered the baseline. Efficacy evaluations in this study mainly focused on numerical changes for summary statistics and their clinical significance based on the limited number of subjects.
The change from baseline to week 4 of the following endpoints was analyzed using mixed models for repeated measures (MMRM) analysis on the FAS for lemborexant versus placebo: L5, M10, RA, IS, IV, mean WE, mean WFI, TST during the day, mean number and mean duration of sleep bouts during the daytime, mean SE, mean SFI, TST during the night, mean number and mean duration of wake bouts during the nighttime. The MMRM model included all data and was adjusted for the corresponding baseline value, country, treatment, visit (week 1, week 2, week 3 and week 4) and treatment-by-visit interaction. The MMRM model accounted for any missing data, and assumed that missing data were missing at random. An unstructured covariance matrix was used and, if the model failed to converge, an autoregressive matrix was used. Where data were normally distributed, least squares (LS) means, difference in LS means of each lemborexant dose compared with placebo, 95% confidence intervals and p values at the appropriate time point were presented.
To identify relevant efficacy variables, a Gaussian graphical model was developed post hoc using the R statistical software package. A regularization method was applied to infer a sparse network topology of interconnectedness among the efficacy variables.
Mean change from baseline in L5, average L5ST, mean duration of sleep bouts and average number of wake bouts were analyzed post hoc for LEM5 versus placebo at week 1, week 2, week 3 and week 4 using an MMRM model, adjusted for region and baseline value of the variable. Mean and standard error were plotted from the model at each time point to represent any longitudinal trends graphically.
The mean duration of sleep bouts during the daytime, one of the network analysis–identified variables, was analyzed separately post hoc to determine the treatment effect. Boxplots were produced, and percentage change from baseline at week 4 was compared for each dose group versus placebo. The Wilcoxon test was performed to compare pairwise means of each treatment dose with placebo.
Changes from baseline in the MMSE and ADAS-Cog were analyzed using analysis of covariance, adjusted for baseline value and country.

Responder analyses

Responder analyses were also conducted, in which responders were defined separately as:
• Subjects whose mean activity level dropped from baseline at week 4 during L5 (sleep) and whose mean duration of sleep bouts during the wake period decreased from baseline at week 4. A nominal threshold of 5% (rather than 0) was applied for the definition.
• Subjects whose mean duration of sleep bouts during the wake period decreased from baseline at week 4, whose mean RA of sleep-wake cycle improved from baseline at week 4 and whose mean IS of sleep-wake cycle improved.

In responder analyses, the percentage change from baseline at week 4 was used as the metric for change for each variable.
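The two responder definitions above can be sketched in code as follows. This is a minimal illustration, not the study's analysis program: the function names and the example values are hypothetical, baseline values are assumed to be positive, and "improved" RA and IS are taken to mean a percentage change greater than 0, in line with the criteria reported in the responder results.

```python
def percent_change(baseline: float, week4: float) -> float:
    """Percentage change from baseline; negative values indicate a decrease."""
    if baseline <= 0:
        # Assumption: actigraphy measures used here are strictly positive.
        raise ValueError("baseline must be positive")
    return 100.0 * (week4 - baseline) / baseline


def is_responder_l5_sleep_bouts(l5_pct: float, sleep_bout_pct: float,
                                threshold: float = 5.0) -> bool:
    """First definition: both L5 and mean duration of daytime sleep bouts
    decreased from baseline by more than the nominal threshold (5%)."""
    return l5_pct < -threshold and sleep_bout_pct < -threshold


def is_responder_ra_is_sleep_bouts(sleep_bout_pct: float, ra_pct: float,
                                   is_pct: float) -> bool:
    """Second definition: mean duration of daytime sleep bouts decreased (< 0%)
    while both RA and IS increased (> 0%)."""
    return sleep_bout_pct < 0 and ra_pct > 0 and is_pct > 0


# Hypothetical subject: L5 falls from 200 to 180, sleep bouts from 50 to 46 minutes.
l5_change = percent_change(200.0, 180.0)
bout_change = percent_change(50.0, 46.0)
print(is_responder_l5_sleep_bouts(l5_change, bout_change))
```

Using percentage change rather than absolute change puts variables measured on different scales (activity counts, minutes, unitless indices) on a common footing before the thresholds are applied.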


All subjects underwent routine safety assessments at specified visits, including questioning regarding treatment-emergent adverse events (TEAEs) and serious adverse events (SAEs); suicidality (assessed using an electronic version of the Columbia–Suicide Severity Rating Scale) (21); electrocardiograms; vital signs, weight; hematology and blood chemistry analysis; and urinalysis.



In total, 214 subjects were screened, 63 were randomized and 62 completed the randomization phase of this study and comprised the FAS and SAS (figure 1b). Fifty subjects randomized to lemborexant (12, 13, 13 and 12 subjects in the LEM2.5, LEM5, LEM10 and LEM15 groups, respectively) and 12 subjects randomized to placebo received at least one dose of study drug. All 62 subjects received study drug for the entire treatment period. Treatment groups were generally balanced with respect to most demographic variables across the five groups; however, the number of males versus females was not fully balanced across all groups (table 1). Baseline actigraphy characteristics were consistent with the presence of ISWRD (table 2). Mean baseline MMSE score was comparable across the five treatment groups and indicated mild to moderate Alzheimer’s disease (supplementary table S1).

Table 1. Baseline demographics

BMI, body mass index; SD, standard deviation.

Table 2. Summary of change from baseline to week 4 for circadian rhythm–related, daytime and nighttime outcomes

*Based on a mixed model for repeated measures analysis adjusted for baseline value, country, visit and treatment-by-visit interaction. †Numbers represent clock times, with the two digits after the decimal point representing percentage of 60 minutes. ‡Sleep fragmentation index was calculated based on the logged time in bed. CI, confidence interval; L5, least active 5-hour period per 24-hour period; LS, least squares; PBO, placebo; SD, standard deviation.


Efficacy outcomes

Network analysis of efficacy variables

As efficacy variables are interrelated, an advanced network analysis was performed to elucidate the relational structure of circadian rhythm variables and treatment (supplementary figure S1). The main efficacy variables identified from the network analysis were the mean duration of sleep bouts during the daytime, activity level during L5, start time of the L5 period and number of wake bouts at night.

Circadian rhythm–related outcomes

At week 4, mean L5 showed a significantly greater decrease from baseline versus placebo for LEM2.5, LEM5 and LEM15 (table 2), indicating a quieter and more restful nighttime sleep. When examined longitudinally over 4 weeks for the LEM5 dose, consistent improvements (decreases) from baseline in L5 were observed after each week of treatment (figure 2a).
Mean baseline L5ST ranged from 24.08 to 25.24 hours (corresponding to ~12:05 am to ~1:15 am) across all groups, meaning that L5 was occurring during the nighttime (table 2). Numerical LS mean decreases from baseline in L5ST were observed at week 4 with LEM5 and LEM15, which were not significantly different from placebo (table 2). Over the 4 weeks of treatment with LEM5, there was no consistent change in L5ST, suggesting no phase shift in the timing of the L5 of the circadian sleep-wake rhythm (figure 2b).
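The conversion from the decimal L5ST values to clock times follows the convention in the Table 2 footnote, where the digits after the decimal point are a fraction of 60 minutes and values above 24 run past midnight. A small sketch of that arithmetic is shown below; the function name is hypothetical and rounding to the nearest minute is an assumption.

```python
def decimal_hours_to_clock(value: float) -> str:
    """Convert decimal hours (e.g. 24.08) to a 24-hour clock string (HH:MM).

    The fractional part is interpreted as a fraction of 60 minutes, and hours
    of 24 or more wrap around past midnight.
    """
    total_minutes = round(value * 60)          # nearest whole minute
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours % 24:02d}:{minutes:02d}"


print(decimal_hours_to_clock(24.08))  # 00:05
print(decimal_hours_to_clock(25.24))  # 01:14
```

These outputs agree with the approximate clock times quoted in the text (~12:05 am and ~1:15 am).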
Only LEM5 demonstrated a numerical improvement versus placebo in the LS mean change from baseline in M10, but this treatment difference was not statistically significant (table 2). LS mean treatment difference in change from baseline indicated higher RAs with all dose levels of lemborexant compared with placebo, with significant increases seen with LEM5 and LEM15 (table 2). LEM5 demonstrated improvements in IS and IV versus placebo at week 4, but these improvements did not reach statistical significance compared with placebo (table 2).

Figure 2. Longitudinal plots of mean change from baseline in circadian, daytime and nighttime efficacy variables over 4 weeks of treatment for LEM5 versus placebo analyzed by mixed effects repeated measures analysis. (a) L5; (b) L5ST; (c) MDSB during the daytime; (d) WB during the night. Error bars represent SE. Mean and SEs were plotted from mixed models for repeated measures analyses. L5, mean least active 5-hour period per 24-hour period; L5ST, mean start hour of L5 (HH); LEM5, lemborexant 5 mg; MDSB, mean duration of sleep bouts (minutes); SE, standard error; WB, mean number of wake bouts


Daytime endpoints

Of the LEM doses, only LEM5 demonstrated a numerical increase from baseline in LS mean WE during the daytime, a numerical reduction from baseline in LS mean WFI (lower values indicate more consolidated wake during the daytime) and a numerical reduction from baseline in LS mean TST during the daytime at week 4, although these changes were not significantly different from placebo (table 2).
In the longitudinal analysis, greater numerical decreases from baseline in mean duration of sleep bouts during the daytime were observed in the LEM5 group compared with placebo across each study week (figure 2c), but the week 4 analysis showed no statistically significant treatment difference versus placebo (table 2).
Median percentage change from baseline to week 4 in mean duration of sleep bouts during the daytime indicated a decrease in duration with LEM5, LEM10 and LEM15 (supplementary figure S2). The greatest decreases occurred in the LEM5 and LEM15 treatment groups, and these changes were statistically significantly different versus placebo (p < 0.01 and p = 0.002, respectively).

Nighttime endpoints

LS mean changes from baseline to week 4 in nighttime endpoints indicated numerical increases in SE for LEM2.5 and LEM5, numerical improvements in SFI with LEM2.5, LEM5 and LEM15, indicating more consolidated (ie, less fragmented) sleep, and numerical improvements in mean TST during the night with LEM5 and LEM15; none of these changes were statistically significantly different versus placebo (table 2). Decreases from baseline to week 4 in LS mean number of wake bouts during the night were observed in the LEM2.5 and LEM5 groups which were significantly greater than with placebo (table 2). The LS mean duration of wake bouts during the night increased for the LEM2.5, LEM5 and LEM15 groups, but these differences were not statistically significant compared with placebo.
When analyzed by treatment week, consistent decreases (improvements) from baseline in the mean number of wake bouts were observed at each time point for the LEM5 group, whereas increases from baseline were observed in the placebo group at Weeks 1, 2 and 4 (figure 2d).

Responder analyses

After 4 weeks, a greater percentage of subjects in each lemborexant treatment group met post hoc responder criteria, defined as > 5% decreases from baseline in both L5 and mean duration of sleep bouts during the daytime, compared with placebo (supplementary figure S3a). Additionally, after 4 weeks, a greater percentage of subjects in each lemborexant treatment group, versus placebo, met the more restrictive post hoc responder criteria, defined as changes from baseline at 4 weeks of > 0% for mean RA and IS, and < 0% for mean duration of sleep bouts during wake (supplementary figure S3b).

Cognitive assessments and safety outcomes

In this study, no significant worsening of cognition, as assessed by MMSE and ADAS-Cog, was observed by the end of the treatment period (supplementary table S1). The incidence of TEAEs was slightly higher for the highest dose, LEM15 (50.0%), compared with placebo (33.3%), and similar to placebo in the other lemborexant groups (23.1–30.8%) (table 3). Across the treatment groups, four subjects reported TEAEs of moderate severity; one subject in the LEM15 group reported somnolence of moderate severity. One severe TEAE, arthralgia, was reported by one subject in the LEM15 group. There were no deaths, no treatment-emergent SAEs and no TEAEs leading to study drug discontinuation reported (table 3). The most common TEAEs (reported in two or more subjects in any lemborexant group) were constipation, somnolence, arthralgia, headache and nightmare, and those events were not reported for placebo, LEM2.5 or LEM5. No falls or confusion were observed and no suicidality was reported in any lemborexant-treated subjects.

Table 3. Summary of treatment-emergent adverse events*†

*A TEAE is defined as an AE with onset date on or after the first dose of study drug up to 14 days after the last dose of study drug. †For each row category, a subject with two or more TEAEs in that category is counted only once. ‡If a subject had a single incident of an AE (Preferred Term) with a missing severity, the subject was counted in the ‘Missing’ category for that Preferred Term. If a subject had two or more AEs in the same system organ class (or with the same Preferred Term) with different severities, then the event with the maximum severity was used for that subject. Subjects with missing AE severity are counted under the ‘Missing’ category unless the subject already has another AE with severe intensity, in which case the subject is counted in the ‘Severe’ category. AE, adverse event; PBO, placebo; TEAE, treatment-emergent adverse event.



This exploratory randomized clinical study is the first to investigate the use of a drug affecting orexin neurotransmission in a patient population with ISWRD. Treatment with lemborexant improved 24-hour circadian rhythm variables, as demonstrated by increased RA, and helped to consolidate nighttime sleep by decreasing L5. Subjects were able to have longer, more restful and less fragmented sleep, a key goal in the treatment of ISWRD (3). Lemborexant exhibited a treatment benefit on circadian rhythm in patients with ISWRD, as detected by the interconnected efficacy variables. Results of this study provide preliminary evidence that treatment with lemborexant may improve both 24-hour circadian rhythm variables and nocturnal sleep variables and impact the duration of daytime unplanned naps in subjects with ISWRD and AD-D. Additionally, these results suggest that proof-of-concept was established, as objective endpoints were identified that both characterized ISWRD in this patient population and were clinically relevant.
LEM5 appeared to be the most consistently effective dose in improving circadian rhythm–related, wake-related and sleep-related actigraphy variables in this study. LEM5 demonstrated significant treatment differences versus placebo at week 4 in improving L5, RA and mean number of wake bouts during the night. LEM5 also resulted in less daytime sleep, as demonstrated by the greater numerical decreases from baseline in mean duration of sleep bouts during the day compared with placebo during each study week. Importantly, numerically higher RAs in circadian sleep-wake rhythms (ie, more distinction between night and day) were seen with all lemborexant dose levels.
Lemborexant was generally well tolerated in this population of individuals with Alzheimer’s disease and ISWRD. The rate of TEAEs was low, no treatment-emergent SAEs were reported and no new safety concerns were identified in this study. The safety profile in this study population was consistent with that observed in adult subjects with insomnia (15, 16). Additionally, treatment with lemborexant did not worsen the cognitive functions of this population of subjects with ISWRD and AD-D.
Dysregulation of the sleep-wake cycle is a common problem in patients with Alzheimer’s disease (22). One potential consequence for patients with Alzheimer’s disease who suffer from sleep disorders is an increased likelihood of institutionalization (23). However, at this time, the lack of approved pharmacologic treatments for patients with ISWRD and AD-D represents an unmet medical need. Some evidence is available to support the use of nonpharmacologic interventions, such as light therapy, behavioral techniques and increased social and physical activity during the daytime, to improve sleep in patients with Alzheimer’s disease (9, 10, 24). Both the American Geriatrics Society and the AASM discourage the use of benzodiazepines in older adults (6, 25), as this drug class has been shown to be significantly associated with falls in the elderly population (26).
DORAs, which block the orexin system, may have the potential to improve sleep in patients with AD-D. Data regarding the treatment of insomnia (not ISWRD) in patients with mild to moderate Alzheimer’s disease have recently been added to the prescribing information for the DORA suvorexant (27).
Strengths of this study include the use of actigraphy, which can capture the full 24-hour sleep-wake pattern in treatment trials and has been a common method for assessing sleep in individuals with Alzheimer’s disease (28). Study limitations include the small sample size, which was, in part, due to slow recruitment. Additionally, the study duration was only 1 month.
These results provide important new information regarding the potential utility of lemborexant to address both nighttime and daytime symptoms that affect sleep-related quality of life of patients with ISWRD and AD-D, as well as reduce the burden of patients’ sleep disturbances on their caregivers and families. Further evaluation in future clinical trials is warranted to confirm the value of lemborexant in this patient population.


Funding: This study was sponsored by Eisai Inc. The sponsor participated in the design and conduct of the study; the collection, analysis and interpretation of data; and the preparation, review and approval of the manuscript.

Acknowledgement: Medical writing assistance was provided by Rebecca Jarvis, PhD, of ProScribe – part of the Envision Pharma Group and was funded by Eisai Inc. Envision Pharma Group’s services complied with international guidelines for Good Publication Practice (GPP3).

Declaration of conflicting interests: Drs Moline, Rabbee, Filippov and Dhadda are employees of Eisai Inc. Dr Bsharat is formerly an employee of Eisai Inc. Mr Kubota is an employee of Eisai Co. Ltd. Dr Thein is the director and founder of Pacific Research Network, which received funding from the study sponsor, Eisai Inc., for the conduct of this study. Mr Kemethofer is an employee of The Siesta Group, the central actigraphy scoring vendor.

Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, duplication, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.



1. Brzecka A, Leszek J, Ashraf GM, et al. Sleep disorders associated with Alzheimer’s disease: a perspective. Front Neurosci 2018;12:330.
2. Musiek ES, Xiong DD, Holtzman DM. Sleep, circadian rhythms, and the pathogenesis of Alzheimer disease. Exp Mol Med 2015;47:e148.
3. Zee PC, Vitiello MV. Circadian rhythm sleep disorder: irregular sleep wake rhythm type. Sleep Med Clin 2009;4:213-218.
4. Zhu L, Zee PC. Circadian rhythm sleep disorders. Neurol Clin 2012;30:1167-1191.
5. Gehrman P, Gooneratne NS, Brewster GS, Richards KC, Karlawish J. Impact of Alzheimer disease patients’ sleep disturbances on their caregivers. Geriatr Nurs 2018;39:60-65.
6. Auger RR, Burgess HJ, Emens JS, Deriy LV, Thomas SM, Sharkey KM. Clinical practice guideline for the treatment of intrinsic circadian rhythm sleep-wake disorders: advanced sleep-wake phase disorder (ASWPD), delayed sleep-wake phase disorder (DSWPD), non-24-hour sleep-wake rhythm disorder (N24SWD), and irregular sleep-wake rhythm disorder (ISWRD). An update for 2015: an American Academy of Sleep Medicine clinical practice guideline. J Clin Sleep Med 2015;11:1199-1236.
7. Serfaty M, Kennell-Webb S, Warner J, Blizard R, Raven P. Double blind randomised placebo controlled trial of low dose melatonin for sleep disorders in dementia. Int J Geriatr Psychiatry 2002;17:1120-1127.
8. Singer C, Tractenberg RE, Kaye J, et al. A multicenter, placebo-controlled trial of melatonin for sleep disturbance in Alzheimer’s disease. Sleep 2003;26:893-901.
9. Figueiro MG, Plitnick BA, Lok A, et al. Tailored lighting intervention improves measures of sleep, depression, and agitation in persons with Alzheimer’s disease and related dementia living in long-term care facilities. Clin Interv Aging 2014;9:1527-1537.
10. Mishima K, Okawa M, Hishikawa Y, Hozumi S, Hori H, Takahashi K. Morning bright light therapy for sleep and behavior disorders in elderly patients with dementia. Acta Psychiatr Scand 1994;89:1-7.
11. Liguori C, Nuccetelli M, Izzi F, et al. Rapid eye movement sleep disruption and sleep fragmentation are associated with increased orexin-A cerebrospinal-fluid levels in mild cognitive impairment due to Alzheimer’s disease. Neurobiol Aging 2016;40:120-126.
12. Liguori C, Romigi A, Nuccetelli M, et al. Orexinergic system dysregulation, sleep impairment, and cognitive decline in Alzheimer disease. JAMA Neurol 2014;71:1498-1505.
13. Duncan MJ, Farlow H, Tirumalaraju C, et al. Effects of the dual orexin receptor antagonist DORA-22 on sleep in 5XFAD mice. Alzheimers Dement 2019;5:70-80.
14. Dayvigo (lemborexant) tablets [package insert]. Woodcliff Lake, NJ: Eisai Inc.; 2019.
15. Rosenberg R, Murphy P, Zammit G, et al. Comparison of lemborexant with placebo and zolpidem tartrate extended release for the treatment of older adults with insomnia disorder: a phase 3 randomized clinical trial. JAMA Netw Open 2019;2:e1918254.
16. Karppa M, Yardley J, Pinner K, et al. Long-term efficacy and tolerability of lemborexant compared with placebo in adults with insomnia disorder: results from the phase 3 randomized clinical trial SUNRISE 2. Sleep 2020.
17. Folstein MF, Folstein SE, McHugh PR. “Mini-mental state”. A practical method for grading the cognitive state of patients for the clinician. J Psychiatr Res 1975;12:189-198.
18. Rosen WG, Mohs RC, Davis KL. A new rating scale for Alzheimer’s disease. Am J Psychiatry 1984;141:1356-1364.
19. Trzepacz PT, Saykin A, Yu P, et al. Subscale validation of the neuropsychiatric inventory questionnaire: comparison of Alzheimer’s disease neuroimaging initiative and national Alzheimer’s coordinating center cohorts. Am J Geriatr Psychiatry 2013;21:607-622.
20. R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing; 2013.
21. Posner K, Brown GK, Stanley B, et al. The Columbia-Suicide Severity Rating Scale: initial validity and internal consistency findings from three multisite studies with adolescents and adults. Am J Psychiatry 2011;168:1266-1277.
22. Liguori C, Spanetta M, Izzi F, et al. Sleep-wake cycle in Alzheimer’s disease is associated with tau pathology and orexin dysregulation. J Alzheimers Dis 2020;74:501-508.
23. Pollak CP, Perlick D. Sleep problems and institutionalization of the elderly. J Geriatr Psychiatry Neurol 1991;4:204-210.
24. McCurry SM, Gibbons LE, Logsdon RG, Vitiello MV, Teri L. Nighttime insomnia treatment and education for Alzheimer’s disease: a randomized, controlled trial. J Am Geriatr Soc 2005;53:793-802.
25. American Geriatrics Society 2019 Beers Criteria Update Expert Panel. American Geriatrics Society 2019 Updated AGS Beers Criteria(R) for Potentially Inappropriate Medication Use in Older Adults. J Am Geriatr Soc 2019;67:674-694.
26. Woolcott JC, Richardson KJ, Wiens MO, et al. Meta-analysis of the impact of 9 medication classes on falls in elderly persons. Arch Intern Med 2009;169:1952-1960.
27. Belsomra (suvorexant) tablets [package insert]. Whitehouse Station, NJ: Merck & Co., Inc.; 2020.
28. Camargos EF, Louzada FM, Nóbrega OT. Wrist actigraphy for measuring sleep in intervention studies with Alzheimer’s disease patients: application, usefulness, and challenges. Sleep Med Rev 2013;17:475-488.



G. Klein1, P. Delmar1, G.A. Kerchner1, C. Hofmann1, D. Abi-Saab1, A. Davis2, N. Voyle2, M. Baudler1,3, P. Fontoura1, R. Doody1,3


1. F. Hoffmann-La Roche Ltd, Basel, Switzerland; 2. Roche Products Ltd, Welwyn Garden City, UK; 3. Genentech Inc., South San Francisco, CA, USA.

Corresponding Authors: Gregory Klein, Biomarkers and Translational Technology, Neuroscience and Rare Diseases, Basel, Switzerland. Phone: (+41) 616820759

J Prev Alz Dis 2021;1(8):3-6
Published online December 3, 2020,



Previous findings from the positron emission tomography (PET) substudy of the SCarlet RoAD and Marguerite RoAD open-label extension (OLE) showed gantenerumab doses up to 1200 mg every 4 weeks administered subcutaneously resulted in robust beta-amyloid (Aβ) plaque removal over 24 months in people with prodromal-to-moderate Alzheimer’s disease (AD). In this 36-month update, we demonstrate continued reduction, with mean (standard error) centiloid values at 36 months of -4.3 (7.5), 0.8 (6.7), and 4.7 (8.0) in the SCarlet RoAD (double-blind pooled placebo and active groups), Marguerite RoAD double-blind placebo, and Marguerite RoAD double-blind active groups respectively, representing a change of -57.0 (10.3), -90.3 (9.0), and -74.9 (10.5) centiloids respectively. These results demonstrate that prolonged gantenerumab treatment, at doses up to 1200 mg, reduces amyloid plaque levels below the amyloid positivity threshold. The ongoing GRADUATE Phase III trials will evaluate potential clinical benefits associated with gantenerumab-induced amyloid-lowering in people with early (prodromal-to-mild) AD.

Key words: Gantenerumab, Alzheimer’s disease, positron emission tomography, amyloid.




Alzheimer’s disease (AD) accounts for 60–80% of all cases of dementia globally (1). Currently, the available treatments for AD offer only limited benefits and there is an urgent need for disease-modifying therapies that reverse neuropathologic changes, or slow or stop neurodegeneration (1, 2).
AD pathogenesis is driven by the gradual accumulation of beta-amyloid (Aβ) plaques and neurofibrillary tangles (NFTs) in the brain (1, 3). In vitro and in vivo evidence suggests that soluble Aβ oligomers and insoluble Aβ plaques contribute to cognitive failure by causing neuronal loss, synaptic dysfunction and disconnection syndromes (4, 5). The recognition of Aβ accumulation as the earliest identifiable marker of AD has led to the development of amyloid positron emission tomography (PET), a neuroimaging technique that can be utilized to visualize Aβ accumulation that helps improve diagnostic accuracy and may also facilitate appropriate participant selection in clinical trials (6).
Gantenerumab is a fully humanized, anti-Aβ immunoglobulin (Ig) G1 that binds to Aβ species with high affinity for aggregated forms, including oligomers and plaques, and is thought to remove Aβ via microglia-mediated phagocytosis (7, 8). The long-term, pharmacodynamic effect of gantenerumab-induced Aβ plaque removal in participants with prodromal-to-mild and mild-to-moderate AD is currently being investigated in the PET substudies of the Phase III SCarlet RoAD (SR; NCT01224106) and Marguerite RoAD (MR; NCT02051608) open-label extension (OLE) studies, respectively (7). Interim results showed robust Aβ plaque removal with gantenerumab doses up to 1200 mg administered subcutaneously, with mean amyloid reductions of 59 centiloids and 51% of participants below the Aβ positivity threshold after 24 months (7). Here, we tested whether amyloid signal plateaus or continues to decline with continued therapy in the 36-month results of the ongoing OLE PET substudy.



Participants and study design

Complete details of the study designs and methodologies of SR and MR, the associated OLE studies, and the OLE PET substudies have been previously reported (7-9). Briefly, participants in the SR trial who received double-blind treatment and had ≥1 follow-up visit and those who were currently enrolled in the MR trial were eligible for participation in the OLE. Various titration schemes were used to allow OLE participants to gradually reach the target dose of gantenerumab 1200 mg per month while decreasing the risk of amyloid-related imaging abnormality (ARIA)-related adverse events. The target gantenerumab dose was reached within 6 to 10 months for SR OLE participants, and 2 to 6 months in MR OLE participants.
Participants of the OLE substudy were divided into three cohorts based on their prior exposure to gantenerumab and their stage of AD. The SR cohort included SR participants with all SR treatment arms pooled together (received gantenerumab 105 mg or 225 mg or placebo every 4 weeks during the double-blind phase); all participants in the SR cohort were off treatment for 16 to 19 months prior to OLE higher dosing. The MR double-blind placebo cohort (MR-DBP) included participants in the MR trial who received placebo during the double-blind phase, and the MR double-blind active cohort (MR-DBA) included participants of the MR trial who received either 105 or 225 mg gantenerumab during the double-blind phase.

Amyloid-β plaque PET imaging and quantification

Amyloid PET scans were obtained at baseline and at 12, 24, and 36 months after baseline using intravenous 370 MBq 18F-florbetapir, with each 15-minute scan obtained 50 minutes after 18F-florbetapir injection. Participants who received a PET scan during the double-blind phase, within 9 to 12 months of OLE dosing, were not scanned at OLE baseline to minimize participant burden.
Volume-weighted, gray matter-masked standard uptake value ratios (SUVR) were calculated for six bilateral cortical regions using the Automated Anatomical Labeling (AAL) template, normalized by a cerebellar cortex reference region (10, 11). SUVR values were then converted to centiloid values as previously described, using the following linear transformation: Centiloid = SUVR*184.12 – 233.72 (7, 12). The threshold for amyloid positivity has been previously established as 24 centiloids, which corresponds to 1.40 SUVR units. The amyloid positivity threshold represents the quantitative threshold that best discriminates pathologically verified absence of plaques or sparse plaques from moderate-to-frequent plaques (13).
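The linear transformation and positivity threshold described above can be checked with a short sketch. The constants are taken from the text; the function names are hypothetical, and treating a value exactly at the threshold as amyloid positive is an assumption made here for illustration.

```python
SLOPE = 184.12                    # centiloids per SUVR unit (from the text)
INTERCEPT = -233.72               # centiloid offset (from the text)
POSITIVITY_THRESHOLD_CL = 24.0    # amyloid positivity threshold in centiloids


def suvr_to_centiloid(suvr: float) -> float:
    """Centiloid = SUVR * 184.12 - 233.72."""
    return SLOPE * suvr + INTERCEPT


def centiloid_to_suvr(centiloid: float) -> float:
    """Inverse of the linear transformation above."""
    return (centiloid - INTERCEPT) / SLOPE


def is_amyloid_positive(centiloid: float) -> bool:
    """Assumption: values at or above the threshold count as positive."""
    return centiloid >= POSITIVITY_THRESHOLD_CL


print(round(suvr_to_centiloid(1.40), 1))   # 24.0
print(round(centiloid_to_suvr(24.0), 2))   # 1.4
```

The round trip confirms that the 24-centiloid threshold and the 1.40 SUVR value quoted in the text are consistent with the stated transformation.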

Statistical analysis

This analysis included all study participants who had a PET scan at OLE baseline (or 9–12 months prior to OLE dosing) and received ≥1 follow-up scan. PET centiloid values were analyzed using a mixed model for repeated measures (MMRM), with treatment visit, treatment group, and the interaction for treatment group by visit as independent variables. An unstructured covariance matrix was used to capture within-participant correlation.



Participant characteristics

A total of 67 participants with at least 1 post-baseline scan were enrolled in the OLE PET substudy (SR, n = 19; MR-DBP, n = 27; MR-DBA, n = 21). A total of 30 participants completed the 36-month scan (SR, n = 10; MR-DBP, n = 12; MR-DBA, n = 8). The baseline characteristics for both the overall population and the 36-month completers are shown in Table 1. More than half of the participants in each cohort were Apolipoprotein E (APOE) ε4 carriers (SR, 89%; MR-DBP, 67%; MR-DBA, 52%). Across all three cohorts, the mean [SE] baseline amyloid burden in centiloids was above the positivity threshold (SR, 49.6 [12.1]; MR-DBP, 91.1 [9.6]; MR-DBA, 79.6 [10.9]).

Table 1. Baseline characteristics of participants enrolled in the SR, MR-DBP, and MR-DBA cohorts, including 36-month completers

APOEε4, Apolipoprotein E; IQR, Interquartile range; MMSE, Mini-Mental State Examination; MR-DBA, Marguerite RoAD double-blind active; MR-DBP, Marguerite RoAD double-blind placebo; OLE, open-label extension; SE, standard error; SD, standard deviation; SR, SCarlet RoAD.


Amyloid PET results

Consistent with our previous report, reductions in mean amyloid burden were observed across cohorts after 12 and 24 months of open-label therapy, with 37% and 52% of participants, respectively, reaching levels below the amyloid positivity threshold (Figure 1) (7). Continued reductions beyond 24 months were observed after 36 months, with mean amyloid levels approaching zero centiloids across all cohorts. The absolute mean (SE) amyloid burden values after 36 months were -4.3 (7.5), 0.8 (6.7), and 4.7 (8.0) centiloids for the SR, MR-DBP and MR-DBA cohorts, respectively, representing changes of -57.0 (10.3), -90.3 (9.0), and -74.9 (10.5) centiloids, respectively. Furthermore, 24 of 30 participants (80%) were below the amyloid positivity threshold at 36 months (Figure 1).

Figure 1. Reduction of amyloid burden towards zero centiloids after 36 months of open-label therapy

*LS mean (SE); †Analyzed using an MMRM; LS, least-squares; MMRM, mixed model for repeated measures; MR-DBA, Marguerite RoAD double-blind active; MR-DBP, Marguerite RoAD double-blind placebo; SE, standard error; SR, SCarlet RoAD; SUVR, standard uptake value ratio.



This 36-month OLE PET substudy investigated the effect of gantenerumab on Aβ plaque removal in participants with prodromal-to-moderate AD. Prior results have shown that while the three cohorts began with considerably different mean baseline centiloid values, all three cohorts demonstrated a mean centiloid value just below the amyloid positivity threshold after 24 months of treatment with gantenerumab 1200 mg every 4 weeks. The latest results showed continued Aβ reduction with gantenerumab treatment below the amyloid positivity threshold, without plateau, with 80% of completers below the amyloid positivity threshold after 36 months of open-label therapy. Mean centiloid values of all three cohorts at this time are near a value of zero, which represents the mean amyloid burden expected in a healthy control group (12). Given that the SR and MR-DBA groups may have experienced some amyloid reduction due to low-dose gantenerumab treatment during the double-blind period of the SR and MR studies, the 90-centiloid reduction seen in the MR-DBP group represents the amyloid reduction that could be expected in a treatment-naïve population. The consistent reduction in Aβ suggests that gantenerumab is able to remove Aβ species successfully.
These findings may translate to clinical benefit in people with prodromal-to-mild AD as other studies with aducanumab and lecanemab (BAN2401) have observed amyloid PET reduction as well as clinical efficacy (7, 14, 15). Specifically, in a Phase Ib placebo-controlled study, aducanumab demonstrated reduced brain amyloid plaque levels after 24 months with a reduction in clinical decline as measured by the Clinical Dementia Rating–Sum of Boxes (CDR-SB) and Mini-Mental State Examination (MMSE) (14). In a Phase II placebo-controlled study, lecanemab produced a dose-dependent reduction in amyloid plaque levels after 18 months and a reduction in clinical decline as measured by AD Composite Score (15). In light of these studies, the current PET results suggest that the process of Aβ reduction at the gantenerumab dose of 1200 mg every 4 weeks has the potential to produce clinical benefits. The precise relation between amyloid reduction and clinical benefit is still an open question, including the question of whether a reduction to below amyloid positivity or to centiloid zero makes a difference in the clinical outcome and management of patients with early AD. The ongoing GRADUATE Phase III program evaluates the safety and efficacy of gantenerumab, subcutaneously administered, in participants with early AD. This program includes two global, double-blind, placebo-controlled trials in people with early AD, designed to maximize exposure to gantenerumab and to prospectively examine the correlation between amyloid-lowering and clinical outcomes.


Funding: This study was sponsored by F. Hoffmann-La Roche Ltd, Basel, Switzerland.

Acknowledgments: We would like to thank all the participants and their families, the investigators and site staff, and the entire study team for their time and commitment to the SCarlet RoAD and Marguerite RoAD OLE studies. Medical writing support was provided by Joshua Quartey, BSc, of Health Interactions and was funded by F. Hoffmann-La Roche Ltd.

Conflict of interest disclosures: GK, PD, GAK, CH, DA-S and PF were full-time employees of F. Hoffmann-La Roche Ltd during the conduct of the study. GK, PD, GAK, CH, DA-S, NV and PF are shareholders in F. Hoffmann-La Roche Ltd. AD and NV were full-time employees of Roche Products Ltd during the conduct of the study. AD is currently employed at the MRC Clinical Trials Unit at UCL. MB and RD are full-time employees and shareholders in F. Hoffmann-La Roche Ltd and Genentech Inc. CH has an Alzheimer’s disease-related patent planned which is relevant to this study.

Ethical standards: Institutional Review Boards (IRBs) approved the SCarlet RoAD and Marguerite RoAD studies, and all participants gave informed consent before participating.

Data sharing statement: Qualified researchers may request access to individual patient-level data through the clinical study data request platform: https://vivli.org. Further details on Roche’s criteria for eligible studies are available here: For further details on Roche’s Global Policy on the Sharing of Clinical Information and how to request access to related clinical study documents, see here: development/who_we_are_how_we_work/clinical_trials/our_commitment_to_ data_sharing.htm

Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, duplication, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.


1. Alzheimer’s Association. 2019 Alzheimer’s disease facts and figures. Alzheimers Dement 2019; 15:321-387.
2. Wang S, Mims PN, Roman RJ, et al. Is beta-amyloid accumulation a cause or consequence of Alzheimer’s disease? J Alzheimers Parkinsonism Dement 2016; 1:007.
3. Lee JC, Kim SJ, Hong S, et al. Diagnosis of Alzheimer’s disease utilizing amyloid and tau as fluid biomarkers. Exp Mol Med 2019; 51:1-10.
4. Mucke L & Selkoe D. Neurotoxicity of amyloid β-protein: Synaptic and network dysfunction. Cold Spring Harb Perspect Med 2012; 2:a006338.
5. Delbeuck X, Van der Linden M, Collette F. Alzheimer’s disease as a disconnection syndrome? Neuropsychol Rev 2003; 13:79-92.
6. Ishii K. Amyloid positron emission tomography in the therapeutic strategies for Alzheimer’s disease. Brain Nerve 2017; 69:809-818.
7. Klein G, Delmar P, Voyle N, et al. Gantenerumab reduces amyloid-β plaques in patients with prodromal to moderate Alzheimer’s disease: A PET substudy interim analysis. Alzheimers Res Ther 2019; 11:101.
8. Ostrowitzki S, Lasser RA, Dorflinger E, et al. A phase III randomized trial of gantenerumab in prodromal Alzheimer’s disease. Alzheimers Res Ther 2017; 9:95.
9. Abi-Saab D, Andjelkovic M, Pross N, et al. MRI findings in the open label extension of the Marguerite RoAD study in patients with mild Alzheimer’s disease. J Prev Alz Dis 2017; 4:339 (P336).
10. Fleisher AS, Chen K, Liu X, et al. Using positron emission tomography and florbetapir F18 to image cortical amyloid in patients with mild cognitive impairment or dementia due to Alzheimer disease. Arch Neurol 2011; 68:1404-1411.
11. Barthel H, Gertz HJ, Dresel S, et al. Cerebral amyloid-β PET with florbetaben (18F) in patients with Alzheimer’s disease and healthy controls: A multicentre phase 2 diagnostic study. Lancet Neurol 2011; 10:424-435.
12. Klunk WE, Koeppe RA, Price JC, et al. The Centiloid Project: Standardizing quantitative amyloid plaque estimation by PET. Alzheimers Dement 2015; 11:1-15.e11-14.
13. Navitsky M, Joshi AD, Kennedy I, et al. Standardization of amyloid quantitation with florbetapir standardized uptake value ratios to the Centiloid scale. Alzheimers Dement 2018; 14:1565-1571.
14. von Rosenstiel P, Gheuens S, Chen T, et al. Aducanumab titration dosing regimen: 24-month analysis from PRIME, a randomized, double-blind, placebo-controlled Phase 1b study in patients with prodromal or mild Alzheimer’s disease. Neurology 2018; 90:Abstract S2.003.
15. Swanson C, Zhang Y, Dhadda S, et al. Treatment of early AD subjects with BAN2401, an anti-Aβ protofibril monoclonal antibody, significantly clears amyloid plaque and reduces clinical decline. Alzheimers Dement 2018; 14 (Suppl):P1668. DT-1601-1607.


J. Wang1, Q. Wei1, X. Wan2

1. School of Management, Wuhan Institute of Technology, Wuhan, China; 2. Institute of Income Distribution and Public Finance, School of Taxation and Public Finance, Zhongnan University of Economics and Law, Wuhan, China

Corresponding Authors: Xin Wan, Associate Professor, Institute of Income Distribution and Public Finance, School of Public Finance and Taxation, Zhongnan University of Economics and Law, Wuhan, China, 430073, Email:

J Prev Alz Dis 2020;
Published online November 28, 2020,



Objective: This study uses health indicators of older adults to analyze the impact of tea drinking on health.
Design: This is a panel data study.
Setting: This study uses data from the China Health and Nutrition Survey (CHNS), which covers nine provinces and ten waves, between 1997 and 2015.
Participants: A total of 706 older adults were consistently surveyed in six surveys on issues such as health and nutrition.
Measurements: The health of older adults is assessed by self-reported health (SRH). Tea drinking is a 0-1 dummy variable, and the frequency of tea drinking is also analyzed. An ordered probit model is used to analyze the influence of tea drinking on SRH.
Results: The findings reveal a significant negative correlation between tea drinking and SRH of older adults. A significant positive correlation exists between tea drinking frequency and SRH, but the quadratic term of tea frequency shows a significant negative correlation. This means that drinking tea benefits older adults in terms of improved health, but excessive consumption of tea is not healthy for them. The heterogeneity analyses reveal no significant geographic, tea-drinking pattern or gender differences in the conclusion that tea drinking is good for older adults’ health.
Conclusion: This study finds a correlation between tea drinking and SRH of older adults. Tea drinking is beneficial to SRH, but drinking tea in excess is not good for older adults’ health.

Key words: Tea drinking, self-reported health, frequency of tea drinking, heterogeneity, older adults.



The proposal for an “International Tea Day”, adopted at the 41st session of the Food and Agriculture Organization of the United Nations in 2019, has had a far-reaching impact on the development of the global tea industry and the promotion of tea culture. In China, as residents’ living standards have improved, tea has become a widely consumed everyday and social beverage. The health benefits of tea have gradually been recognized by academics through multidisciplinary research (1). However, a detailed review of the domestic and international literature shows no consensus on the health effects of tea drinking among older adults, despite the wide range of individual experimental data. The reported effect of tea drinking on individual health varies with the country, the disease, and the research method involved.
Since 2006, China has been the country with the largest total tea consumption. At the same time, population ageing is becoming more serious, and the health of older adults is of great concern. The effect of tea drinking on the health of older adults in China is therefore a meaningful and valuable topic that complements current research on the role of tea drinking. This study selects health indicators of older adults to analyze the impact of tea drinking on health. Its contributions are as follows. First, the research sample is representative, rich, and effective. No large-scale epidemiological investigation of the influence of tea drinking has been carried out in China, and the conclusions of existing small-scale studies are inconclusive. Data in this study are derived from the CHNS, which covered cities and towns across all regions of China from 1989 to 2015. The CHNS database includes demographic characteristics, economy, public resources, and health indicators, which meet the needs of this research. Second, individual SRH is used as the dependent variable to study the effect of tea drinking. SRH is an effective and popular measure of individual health, although other undiscovered or proven effects of tea drinking could also serve as outcomes. Third, the analytical method is rigorous. An ordered probit model is used for analysis. Considering possible endogeneity between tea drinking and SRH, robustness tests are conducted using methods such as lagged variables. This study also analyzes the heterogeneity of tea drinking effects across gender groups to enrich the research content.


Data and model

Model settings

Because the dependent variable takes four ordered discrete values, an ordered probit model, which is widely used for this kind of discrete data, is adopted. The empirical model is set as follows:
$$\mathrm{health}_{i,t} = \alpha + \beta\,\mathrm{tea}_{i,t} + \theta X_{i,t} + \gamma \phi_t + \eta \delta_i + \mu_{i,t} \tag{1}$$
where $\mathrm{health}_{i,t}$ denotes the SRH of individual $i$ surveyed in wave $t$, which is measured by ordered qualitative labels: “very good” (value 1), “good” (value 2), “fair” (value 3), and “poor” (value 4). SRH is a commonly used health measurement index that reflects the combined subjective and objective condition of personal health and can, to an extent, provide supportive information for decision makers in disease prevention (2). The SRH measure is inexpensive, easy to collect, and synthetic by construction. $\mathrm{tea}_{i,t}$ is the key variable, a dummy indicating whether individual $i$ drinks tea in wave $t$. This study also examines the effect of the frequency of tea drinking on health. $\mathrm{tea\,frequency}_{i,t}$ denotes the tea drinking frequency of individual $i$ in wave $t$, which is measured by ordered qualitative labels, such as “almost every day” and “four to five times a week.” The values are 1−7 in order.
$X_{i,t}$ represents a series of control variables, including personal basic information, chronic disease, individual habits, dietary and activity preferences, and cognition of healthy exercise and nutritional diet. Personal basic information comprises the control variables commonly used in health economics (2, 3): age, gender, height, weight, marital status, education, medical insurance, household registration type, income and household sanitation. Medical insurance and household registration are dummy variables. Income is measured by personal annual income adjusted by a deflator. Household sanitation is measured by two dummy variables for having running water and a flush toilet. Chronic diseases are important factors affecting older adults’ health and can harm the brain, heart, kidney, and other important organs. Five dummy variables indicate hypertension, diabetes, myocardial infarction, stroke, and fracture. Individual habits are measured by two dummy variables for smoking and drinking. Diet and activity preferences, which have indirect effects on health, are measured by six dummy variables. Cognitions of healthy exercise and diet, which also affect health indirectly, are dummy variables. $\phi_t$ denotes wave fixed effects, $\delta_i$ denotes individual fixed effects, and $\mu_{i,t}$ is the error term.
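To illustrate how a coefficient is read under this coding (1 = “very good” to 4 = “poor”), the following minimal Python sketch computes ordered probit category probabilities from a latent index and a set of cutpoints. The cutpoints, the tea coefficient, and the function names are hypothetical values chosen for illustration; they are not estimates or code from this study.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(index, cutpoints):
    """Return P(y = k) for k = 1..K, given a latent index and increasing cutpoints."""
    cdf_at_cuts = [normal_cdf(c - index) for c in cutpoints]
    bounds = [0.0] + cdf_at_cuts + [1.0]
    return [bounds[k + 1] - bounds[k] for k in range(len(bounds) - 1)]


# Hypothetical cutpoints separating the four SRH categories.
CUTPOINTS = [-1.0, 0.0, 1.0]
# Hypothetical negative coefficient on the tea dummy (illustration only).
BETA_TEA = -0.2

non_drinker = ordered_probit_probs(0.0, CUTPOINTS)
drinker = ordered_probit_probs(BETA_TEA, CUTPOINTS)

for label, p0, p1 in zip(["very good", "good", "fair", "poor"], non_drinker, drinker):
    print(f"{label:>9}: non-drinker {p0:.3f}, tea drinker {p1:.3f}")
```

With a negative coefficient, probability mass shifts toward category 1 (“very good”) and away from category 4 (“poor”), which is why a negative estimate is read as better SRH under this coding.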

Data description

The data used in this study were obtained from the CHNS, which is an ongoing international collaborative project among the Carolina Population Center at the University of North Carolina at Chapel Hill, the National Institute of Nutrition and Food Safety, and the Chinese Center for Disease Control and Prevention. The survey covered nine provinces (Liaoning, Heilongjiang, Shandong, Jiangsu, Henan, Hubei, Hunan, Guangxi and Guizhou) in the waves 1989, 1991, 1993, 1995, 1997, 2000, 2004, 2006, 2009, and 2015. A multistage, random cluster process was used to draw the sample surveyed in each province. Approximately 4,400 households were listed in the overall tracking survey, covering 19,000 individuals. The personal and family microdata on adults aged 65 years and older provided rich and detailed data for this research. The study constructed panel data and omitted the waves before 1997 and the 2015 wave because tea drinking data were missing. Tea drinking was derived from the question “Do you normally drink tea?” SRH was obtained from the question “Right now, how would you describe your health compared to that of other people your age?” Individuals who answered “unknown” were excluded from the sample. The descriptive statistics of the variables are presented in Table 1.
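The sample construction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the record layout (`id`, `wave`, `srh`, `tea`), the toy waves, and the rule that an individual must have a usable answer in every retained wave are all hypothetical, since the text does not spell out the exact filtering steps.

```python
def build_balanced_panel(records, waves):
    """Keep individuals with a known SRH answer in every requested wave."""
    wanted = set(waves)
    # Drop rows outside the retained waves and rows with an "unknown" SRH (None).
    usable = [r for r in records if r["wave"] in wanted and r["srh"] is not None]
    waves_seen = {}
    for r in usable:
        waves_seen.setdefault(r["id"], set()).add(r["wave"])
    complete_ids = {pid for pid, seen in waves_seen.items() if seen == wanted}
    return [r for r in usable if r["id"] in complete_ids]


# Toy records (hypothetical): person 2 answered "unknown" in one wave.
toy_records = [
    {"id": 1, "wave": 2004, "srh": 2, "tea": 1},
    {"id": 1, "wave": 2006, "srh": 3, "tea": 1},
    {"id": 2, "wave": 2004, "srh": None, "tea": 0},
    {"id": 2, "wave": 2006, "srh": 1, "tea": 0},
]

panel = build_balanced_panel(toy_records, waves=[2004, 2006])
print(sorted({r["id"] for r in panel}))
```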

Table 1. Descriptive statistics


Results and related tests

Basic regression results

Columns (1) and (2) in Table 2 show the regression results of the ordered probit model with wave fixed effects, individual fixed effects, and different control variables added successively. No matter how the control variables change, the significance and sign of the estimated coefficient of the tea drinking dummy variable remain unchanged. The estimated coefficient is significantly negative. Because lower SRH values denote better health, this means that tea drinkers report significantly better SRH than non-tea drinkers.
Considering the possible influence of chronic diseases, individual habits, dietary and activity preferences, and cognitions of healthy exercise and diet on individual health status, this study further extends the basic regression and adds each of these groups of variables in turn. The results are shown in Columns (3), (4), (5) and (6) of Table 2. Regardless of which variables are added, the estimated coefficient of the tea drinking dummy variable is significant and negative in all regressions in Table 2. Therefore, tea drinking is beneficial to the SRH of older adults. The reliability of this conclusion is further verified by various robustness tests.

Table 2. Tea drinking and SRH

Note: 1. ***, **, and * indicate 1%, 5%, and 10% significance levels. 2. p-value in parentheses.


Robustness test

To verify the robustness of the basic conclusion, the following analyses are considered. First, the income variable is changed. In some families, older adults may not earn income but still consume total family income. In many rural families, personal income cannot be separated from family income. Deaton (2003) shows that equalized family income can measure well the utilization of family resources by family members (4). Therefore, this study re-measures individual income as per capita family income adjusted by a deflator. The regression results are shown in Column (1) in Table 3. The beneficial effect of tea drinking on the SRH of older adults remains unchanged after changing the income variable.

Table 3. Robustness test

Note: 1. ***, **, and * indicate 1%, 5%, and 10% significance levels. 2. p-value in parentheses.


Second, possible endogeneity is addressed. Because tea drinking in an earlier period may affect health in a later period, the explanatory variables may be correlated with the random error term. To address this problem, and following existing research, explanatory variables lagged by two periods are used. The results are presented in Column (2) in Table 3. After adding the control variables, the estimated coefficient of tea drinking lagged by two periods remains significantly negative.
Third, health dynamics are considered. Individual health is dynamic, so past health status may affect current health status. Individual SRH lagged by one period is therefore added to the model as a new control variable. The results are shown in Column (3) in Table 3. They again confirm that the estimated coefficient of tea drinking is significant and negative.
Fourth, the dependent variable is changed. Etilé and Milcent (2010) study health samples in France and find that people with high income are likely to underestimate their health status because they have high health expectations (5), whereas people with low income overestimate their health status because they lack relevant health information and have a low level of education. They argue that a two-category SRH indicator (“good” and “bad”) is meaningful. Accordingly, SRH answers of “very good” and “good” are grouped into one category, and a new variable, health2, takes the value 1 for this “good” category and 0 otherwise. The regression results are displayed in Column (4) in Table 3. SRH lagged by one period and per capita household income are also included in this regression. The results reveal a significant positive correlation between the tea drinking dummy variable and the binary SRH variable. This finding is consistent with the basic conclusion: because the dependent variable equals 1 when SRH is good, a positive coefficient indicates that drinking tea helps improve SRH. All the robustness test results further verify the beneficial effect of drinking tea on the SRH of older adults.
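The lagged variables and the binary recoding used in these robustness tests can be sketched as follows. This is an illustrative Python sketch only: the record layout, the helper names `add_lag` and `to_binary_srh`, and the reading of “period” as one survey wave (rather than one calendar year) are assumptions not stated in the text.

```python
def add_lag(panel, column, periods):
    """Attach `column` lagged by `periods` waves within each individual.

    Rows without enough earlier waves are dropped.
    """
    by_person = {}
    for row in panel:
        by_person.setdefault(row["id"], []).append(row)
    lagged_rows = []
    for rows in by_person.values():
        rows = sorted(rows, key=lambda r: r["wave"])
        for pos, row in enumerate(rows):
            if pos >= periods:
                new_row = dict(row)
                new_row[f"{column}_lag{periods}"] = rows[pos - periods][column]
                lagged_rows.append(new_row)
    return lagged_rows


def to_binary_srh(srh):
    """health2: 1 if SRH is "very good" (1) or "good" (2), otherwise 0."""
    return 1 if srh in (1, 2) else 0


# Toy panel for one hypothetical individual observed in three waves.
toy_panel = [
    {"id": 1, "wave": 2000, "srh": 3, "tea": 0},
    {"id": 1, "wave": 2004, "srh": 2, "tea": 1},
    {"id": 1, "wave": 2006, "srh": 1, "tea": 1},
]

tea_lag2 = add_lag(toy_panel, "tea", periods=2)
srh_lag1 = add_lag(toy_panel, "srh", periods=1)
health2 = [to_binary_srh(r["srh"]) for r in toy_panel]
print(tea_lag2, srh_lag1, health2)
```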

Table 4. Results of the extended analysis

Note: 1. ***, **, and * indicate 1%, 5%, and 10% significance levels. 2. p-value in parentheses.



Effect of drinking tea frequency

Tea polyphenols have beneficial effects at a certain level in the body. Heavy but irregular tea drinking leads to fluctuating levels of components such as polyphenols, which offers far fewer health benefits than continuous tea drinking. Different ways of drinking tea can directly affect the curative and preventive effects of the active substances in tea, which is also one of the important reasons for the difference in the anti-cancer effect of tea drinking between eastern and western countries. On the basis of large-scale data from China, the study examines whether sustained tea drinking is conducive to improving the SRH of older adults.
Columns (1) and (2) in Table 4 report the effect of tea drinking frequency on the SRH of older adults. The results show that, no matter how the control variables are adjusted, a significant positive correlation exists between tea drinking frequency and SRH. However, when the chronic disease variables are added to the model, the quadratic term of tea frequency shows a significant negative correlation with SRH. The results indicate that tea needs to be consumed frequently enough to reach some cumulative amount in order to contribute to the health of older adults, but excessive consumption of tea is not healthy for them. The increase in tea drinking frequency indicates the stable presence of healthy components in the body, which is beneficial for improving human body function and immunity.
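When a linear and a quadratic term enter a model together, the point at which the combined effect changes direction can be located with a one-line formula. The Python sketch below uses hypothetical coefficients, not values from Table 4. How the turning point maps onto “too little” or “too much” tea depends on the coding direction of the 1−7 frequency scale and of SRH, which this sketch does not settle.

```python
def turning_point(b_linear, b_quadratic):
    """Value of f at which b_linear * f + b_quadratic * f**2 changes direction."""
    if b_quadratic == 0:
        raise ValueError("no turning point without a quadratic term")
    return -b_linear / (2.0 * b_quadratic)


# Hypothetical coefficients: positive linear term, negative quadratic term.
B_FREQ = 0.30
B_FREQ_SQ = -0.04

f_star = turning_point(B_FREQ, B_FREQ_SQ)
within_scale = 1 <= f_star <= 7  # does the turning point fall inside the 1-7 scale?
print(round(f_star, 2), within_scale)
```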

Heterogeneity analysis

Locational factors

China has a vast territory, and its northern and southern regions are divided by the Qinling Mountains-Huaihe River line. Northern regions include Liaoning, Heilongjiang, and Shandong; the rest are southern regions. The geographical location, climate, history and culture, and political economy of the north and south differ markedly. Generally speaking, southerners are described as graceful and restrained, and they pay attention to the process of producing tea and to tea-making utensils. Northerners are described as bold and unconstrained and as indifferent to the cultural attributes of tea drinking. Most northern regions do not produce tea, and the most popular kinds of Chinese tea come from the south. Southerners drink various kinds of tea, such as green, black, and white tea. Seasonal climate differences are large, so southerners prefer black tea to warm the stomach when the weather is cold and green tea in the summer. The bioavailability of the active components of tea, and species differences in their functions, may lead to different effects in the human body (1). Studies suggest that green tea is better than black tea in preventing certain diseases and improving health (6). The availability of tea polyphenols, catechins, and other components depends on the type of tea and how it is processed. The study therefore analyzes the heterogeneity of the effect of tea drinking on SRH between the northern and southern regions. A dummy variable is constructed that equals 1 for southern regions and 0 otherwise, and its interaction with the tea drinking variable is added to the model. The regression results suggest that the interaction term is insignificant when personal basic information, individual fixed effects and wave fixed effects are included.
Therefore, no difference is observed in the effect of the different ways and types of tea drinking on SRH between the north and south of China. The most direct reason for this result is that the data do not record the type of tea consumed.
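The regional dummy and its interaction with tea drinking can be constructed as in the short Python sketch below. The northern provinces follow the grouping given in the text; the record layout and the function name `add_region_interaction` are hypothetical.

```python
# Northern provinces as grouped in the text; all other surveyed provinces count as south.
NORTHERN_PROVINCES = {"Liaoning", "Heilongjiang", "Shandong"}


def add_region_interaction(row):
    """Add a south dummy (1 = south, 0 = north) and its interaction with the tea dummy."""
    south = 0 if row["province"] in NORTHERN_PROVINCES else 1
    return {**row, "south": south, "south_x_tea": south * row["tea"]}


# Toy rows (hypothetical individuals).
toy_rows = [
    {"id": 1, "province": "Hunan", "tea": 1},
    {"id": 2, "province": "Liaoning", "tea": 1},
    {"id": 3, "province": "Guizhou", "tea": 0},
]

with_interaction = [add_region_interaction(r) for r in toy_rows]
print([(r["south"], r["south_x_tea"]) for r in with_interaction])
```

An insignificant coefficient on the interaction term is what the text interprets as no north-south difference in the tea effect.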

Gender difference

Studies reveal that tea drinking may reduce iron absorption and increase the risk of iron deficiency, especially in vulnerable groups, because tea polyphenols bind with iron (7, 8). The present study therefore examines whether the conclusion that tea drinking is beneficial to the SRH of older adults varies with gender in China. Following Model (1), the sample is divided into male and female groups for analysis. The regression results of the ordered probit model are presented in Columns (3) and (4) in Table 4. This analysis adopts per capita family income to measure income. In both the male and female groups, the estimated coefficient of the tea drinking dummy variable is significantly negative. Therefore, the conclusion that drinking tea improves the health status of older adults shows no significant gender difference. However, the coefficient values were greater in the male group than in the female group. This indicates that, holding other controlled factors constant, tea consumption promotes the health of older men slightly more than that of older women.



Existing studies have neither used a wide range of data on tea drinking and older adults’ health in China to verify the correlation between them nor conducted an in-depth analysis. In this context, the present study uses CHNS panel data, which include demographic characteristics, economy, public resources, health, and other indicators, and an ordered probit model to analyze the relationship between tea drinking and older adults’ SRH. The results are as follows. First, a significant negative correlation exists between tea drinking and SRH, indicating that tea drinking is beneficial to older adults’ SRH. In addition to personal basic information, certain external factors can affect human health. Thus, this study adds dummy control variables for chronic diseases, individual smoking or drinking habits, diet preferences, activity preferences, and cognitions of healthy exercise and diet, and re-estimates the regression. Adding each of these groups of control variables does not change the conclusion that tea drinking improves older adults’ SRH.
Second, the robustness tests change the control variables, address endogeneity, account for possible health dynamics, and recode the SRH index. The results of these tests also confirm that tea drinking is beneficial to older adults’ health. Third, an increase in the frequency of tea drinking is beneficial to older adults’ SRH. Such an increase also indicates the stable presence of healthy components in the body, which are beneficial for improving the function and immunity of the human body. However, excessive consumption of tea is detrimental to older adults’ health. Fourth, the heterogeneity analysis reveals no difference in the effect of different ways and types of tea drinking on SRH between the north and south of China, nor between genders.


Acknowledgement: J. Wang is supported by the National Social Science Fund Youth Project of China (Project Number: 18CJY021), and X. Wan is supported by the Fundamental Research Funds for the Central Universities (NO. 2722020JCG083).

Conflict of interest: The authors declared that they have no conflicts of interest to this work.

Ethical standards: The study used publicly available research data and did not violate ethical standards.



1. Chen L, Lee MJ, Li H, Yang CS. Absorption, distribution, elimination of tea polyphenols in rats. Drug Metab Dispos 1997;25(9):1045-1050
2. Courtemanche C, Marton J, Ukert B, Yelowitz A, Zapata D. Early effects of the affordable care act on health care access, risky health behaviors, and self-assessed health. South Econ J 2018;84(3):660-691
3. Lockwood LM. Incidental bequests and the choice to self-insure late-life risks. Am Econ Rev 2018;108(9):2513-50
4. Deaton A. Health, inequality, and economic development. J Econ Lit 2003;41(1):113-158
5. Etilé F, Milcent C. Income-related reporting heterogeneity in self-assessed health: evidence from France. Health Econ 2010; 5(9):965-981
6. Pathy NB, Peeters P, Gils CV, et al. Coffee and tea intake and risk of breast cancer. Breast Cancer Res and Tre 2010;121(2):461-467
7. Zijp IM, Korver O, Tijburg LBM. Effect of Tea and Other Dietary Factors on Iron Absorption. Cri Rev in Food Sci and Nut 2000;40(5):371-398
8. Li W, Yang J, Zhu XS, Li SC, Ho PC. Correlation between tea consumption and prevalence of hypertension among Singaporean Chinese residents aged 40 years. J Hum Hypertens 2016;30: 11-17


A. De Mauleon1, J. Delrieu1, C. Cantet1, B. Vellas1, S. Andrieu1, P.B. Rosenberg2, C.G. Lyketsos2, M. Soto Martin1

1. Gérontopole, INSERM U 1027, Alzheimer’s Disease Research and Clinical Center, Toulouse University Hospital, Toulouse, France; 2. Department of Psychiatry and Behavioral Sciences, Johns Hopkins Bayview, Johns Hopkins University, Baltimore, United States; On behalf of the A3C study group: L. BORIES, A. ROUSTAN, Y. GASNIER, S. BORDES, M.N. CUFI, F. DESCLAUX, Y. GASNIER, V. FELICELLI, N. GAITS, T. UGUEN, P. TESTE, M. PERE-SAUN, J.F. PUCHEU, S. BORDES, J.P. SALLES.

Corresponding Authors: Adelaide de Mauleon, MD, Gerontopôle de Toulouse, Department of Geriatric Medicine, Toulouse University Hospital, 224, avenue de Casselardit, 31059 TOULOUSE Cedex 9, France

J Prev Alz Dis 2020;
Published online November 26, 2020,



Background: To present methodology, baseline results and longitudinal course of the Agitation and Aggression in patients with Alzheimer’s Disease Cohort (A3C) study.
Objectives: The central objective of A3C was to study the course, over 12 months, of clinically significant Agitation and Aggression symptoms based on validated measures, and to assess relationships between symptoms and clinical significance based on global ratings.
Design: A3C is a longitudinal, prospective, multicenter observational cohort study performed at eight memory clinics in France, and their associated long-term care facilities.
Setting: Clinical visits were scheduled at baseline, monthly during the first 3 months, at 6 months, at 9 months and at 12 months. The first three months intended to simulate a classic randomized control trial 12-week treatment design.
Participants: Patients with Alzheimer’s Disease and clinically significant Agitation and Aggression symptoms, living at home or in long-term care facilities.
Measurements: Clinically significant Agitation and Aggression symptoms were rated on Neuropsychiatric Inventory (NPI), NPI-Clinician rating (NPI-C) Agitation and Aggression domains, and Cohen Mansfield Agitation Inventory. Global rating of agitation over time was based on the modified Alzheimer’s Disease Cooperative Study-Clinical Global Impression of Change. International Psychogeriatric Association “Provisional Diagnostic Criteria for Agitation”, socio-demographics, non-pharmacological approaches, psychotropic medication use, resource utilization, quality of life, cognitive and physical status were assessed.
Results: A3C enrolled 262 AD patients with a mean age of 82.4 years (SD ±7.2 years), 58.4% women, 69.9% at home. At baseline, mean MMSE score was 10.0 (SD±8.0), Cohen Mansfield Agitation Inventory score was 62.0 (SD±15.8) and NPI-C Agitation and Aggression clinician severity score was 15.8 (SD±10.8). According to the International Psychogeriatric Association agitation definition, more than 70% of participants showed excessive motor activity (n=199, 76.3%) and/or verbal aggression (n=199, 76.3%) while 115 (44.1%) displayed physical aggression. The change in the CMAI score and the NPI-C Agitation and Aggression score over the 1-year follow-up period was respectively -11.36 (Standard Error (SE)=1.32; p<0.001) and -6.72 (SE=0.77; p<0.001).
Conclusion: Little is known about the longitudinal course of clinically significant agitation symptoms in Alzheimer’s Disease, about the variability in different outcome measures over time, or about the definition of a clinically meaningful improvement. A3C may provide useful data to optimize future clinical trials and guide treatment development for Agitation and Aggression in Alzheimer’s Disease.

Key words: Agitation/aggression, dementia, cohort, validated measurements, trials.



The syndrome of agitation and aggression (A/A) in Alzheimer’s disease (AD) encompasses a range of affective, verbal or motor disturbances such as restlessness, cursing, aggression, hyperactivity, combativeness, wandering, repetitive calling out, irritability and disinhibition (1). A/A occurs in as many as 29% (2,3) of people with AD living at home, and up to 40-60% of those living in long term care facilities (LTCF) (4). A/A is among the most prevalent, persistent and disturbing neuropsychiatric syndrome (NPS). Its severity and frequency increase with disease progression (4). Moreover, A/A has major adverse consequences for patients, families and health care systems including worse quality of life for patients and their caregivers (5), greater disability with earlier institutionalization (6), accelerated transition from prodromal AD to dementia (7), accelerated transition from mild dementia to severe dementia or death (8) and higher health care costs (9). Thus, the management of A/A is a major priority in caring for patients with AD.
Currently the management of A/A remains a challenge for clinicians and caregivers due to the lack of safe and efficient medications as well as due to the difficulty of implementing best evidence non-pharmacological approaches in “real life” clinical setting (10). Although, medication development for treatment of A/A has seen advances in recent years, major methodological questions remain (11, 12). In part, this stems from limited natural history data about generalizable A/A cohorts: most data come from research whose main objective was to study cognitive and functional parameters (e.g., 13), in which patients with significant NPS were excluded or where NPS were inadequately quantified (2). Hence, for the most part these studies have been inadequate to describe the natural evolution of A/A over time or to determine associated clinical characteristics of NPS. Further, none investigated the variability of A/A measures, variances inherent in these measures, or factors influencing this variability, which are issues crucial to trial design and interpretation of this results.
We hypothesized that A/A in AD has a predictable course and associated factors, and that a longitudinal prospective observational survey specifically assessing A/A in patients with AD would provide useful data for treatment development. The overarching aim of the Agitation and Aggression in patients with Alzheimer’s Disease a Cohort (A3C) study was to assess the evolution and longitudinal course of A/A in patients with AD.
The central objective was to study the natural course of symptoms in patients with clinically significant A/A over 12 months of follow-up, with special attention to the first three months, which is a commonly used duration for NPS trials.
Secondary objectives included estimating the minimal clinically important differences (MCID) in outcomes and assessing the variance of A/A symptoms over time.




A3C is a longitudinal prospective multicenter observational cohort study performed at eight memory clinics from southwest France and their associated LTCF: Castres, Foix, Lannemezan, Lavaur, Lourdes, Montauban, Tarbes and Toulouse. Toulouse University Hospital was the coordinating center. Clinical visits (V) were scheduled at baseline (V1), monthly during the first 3 months of follow-up (V2 to V4), at 6 months (V5), at 9 months (V6) and at 12 months (V7) during a 1-year follow-up period. The first three months of A3C were designed to simulate a classic randomized controlled trial 12-week treatment design. Participants were recruited between December 2014 and August 2017. The last follow up visit took place in June 2018.


Participants were men and women, aged 60 years and older, with possible Alzheimer’s dementia according to NIA-AA’s criteria (14), with or without cerebrovascular components, and regardless of Mini Mental State Examination (MMSE) score. Participants had clinically significant agitation defined broadly by the presence of significant symptoms on at least one of the following NPS as rated on the Neuropsychiatric Inventory (NPI): A/A, disinhibition, aberrant motor behavior and/or irritability (15). Clinically significant was defined as NPI agitation/aggression domain score ≥ 4 with NPI frequency score ≥ 2 at entry. Participants also met the International Psychogeriatric Association (IPA) provisional definition of agitation in cognitive disorders (16).
Patients could live at home or in an LTCF. To be included, community-dwelling patients had to have an identified primary caregiver who visited at least three times a week for several hours, supervised the patient’s care, and was available to accompany the patient to study visits and to participate in the study. Patients living in an LTCF had to have lived in the facility for at least two months before inclusion. Patients were excluded if they had other brain diseases (e.g., extensive brain vascular disease, Parkinson’s disease, other dementias or traumatic brain injury), a major depressive episode according to DSM-IV(TR) criteria, or a serious illness that would impair their ability to perform study assessments; if the agitation or aggression was attributable to concomitant medications or to active medical or psychiatric conditions; if they had clinically significant psychosis, with an NPI domain score (hallucinations or delusions) ≥ 4; or if they were participating in a clinical trial.
Participants and their caregivers took part in the study voluntarily: written informed consent was obtained from all patients (or legal representatives) and caregivers (for the community dwelling population). Each participant’s capacity to give consent was assessed in clinical interviews by clinicians experienced in dementia research. Consent was personally provided if the participant was found to be capable. If the participant was not fully capable of consent, then it was obtained from an authorized legal representative. A3C had ethical approval and oversight from the local Institutional Review Board (Toulouse University Hospital).

Institutional long-term care facilities

In this study, an LTCF was defined as a place of communal living where care and accommodation are provided as a package by a public agency, nonprofit company or private company. LTCF included assisted living facilities, nursing homes and other long-term care facilities.

Data collection

At baseline and at every follow-up visit, data were collected by trained professionals with clinical experience during face-to-face interviews and recorded on a standardized case record form. All raters received standardized training in the scales used in the study. All clinician raters received additional standardized training for the primary outcomes: mADCS-CGIC, CMAI and NPI-C A/A. Visits took place in outpatient memory clinics for community-dwelling patients and their caregivers. For institutionalized patients, data were collected from the LTCF staff; in the majority of cases, the same “referent staff” member for each patient was interviewed at each rating. Table 1 shows the investigation schedule for participants and their primary caregivers, if applicable.

Table 1. A3C investigation schedule for participants and their primary caregivers if applicable

Abbreviations: V=Visit, M0=baseline, M1=1 month, M2=2 months, M3=3 months, M6=6 months, M9=9 months, M12=12 months, NPI=neuropsychiatric inventory, IPA=International Psychogeriatric Association, CGI-S=Clinical Global Impression of Severity, NPI-C=neuropsychiatric inventory clinician rating scale, CMAI=Cohen Mansfield agitation inventory, ADCS-CGIC=Alzheimer disease cooperative study clinical global impression of change, MMSE=mini mental state examination, ADL=activities of daily living, QoL-AD=quality of life of patient with Alzheimer’s disease (Logsdon scale), RUD=resource utilization in dementia instrument. *if patient living at home with an identified primary caregiver



Participant age, gender, education, living arrangement and community care services were recorded at baseline using a structured questionnaire directed to patients and/or their caregivers, as appropriate. For patients living at home with an identified primary caregiver, the socio-demographic characteristics of the primary caregiver were also recorded. Changes in living arrangement and community care services were noted at each visit. Whether the patient lived in an LTCF, and whether in a dementia special care unit, were both recorded.

Medical characteristics

Medical history of past and current conditions was recorded with a focus on cardio-vascular conditions, fractures, cancers, renal failure, sensory disabilities, and gastro-intestinal, neurologic and psychiatric diseases. At baseline, the caregiver’s current medical history and ongoing treatments were collected when appropriate. At each visit, a clinical examination of participants was performed; changes in pharmacological treatments, with a focus on anti-dementia treatments (Donepezil, Rivastigmine, Galantamine, Memantine) and other psychotropic drugs, and intercurrent events (hospitalizations, falls, undernutrition) since the last visit were collected.

NPS assessment

Agitation and aggression symptoms

Agitation severity was rated by validated measures such as the A/A domain from the Neuropsychiatric Inventory (NPI) (17), the Neuropsychiatric Inventory Clinician rating (NPI-C) (18) and the Cohen Mansfield Agitation Inventory (CMAI) (1).
The NPI A/A domain measures frequency and severity of A/A symptoms. The identified caregiver rated the A/A NPI domain for symptom frequency (on a 1-4 scale: occasionally [less than once per week], often [about once per week], frequently [several times per week but less than every day] or very frequently [more than once per day], respectively) and severity (on a 1-3 scale: mild, moderate and marked, respectively). The NPI’s scoring yields a composite (frequency × severity) score of 1-12 for the domain. The NPI A/A domain also quantifies caregiver distress on a 0-5 scale: none, minimal, mild, moderate, marked or extremely marked.
The NPI-C (18) measures the severity of A/A based on a combined domain score of distinct agitation (13 items) and aggression (8 items) domains (NPI-C-A/A). Each NPI-C domain measures: (1) item frequency on a 1-4 scale: less than once per week, about once a week, several times per week but less than every day or more than once per day, respectively; (2) item severity on a 1-3 scale: mild, moderate and marked, respectively; (3) caregiver distress on a 0-5 scale: none, minimal, mild, moderate, marked or extremely marked; and (4) item clinician severity on a 0-3 scale: none, mild, moderate and marked, based on clinician judgement. The combined clinician severity score of both domains (agitation and aggression) ranges from 0 to 63 and is the NPI-C rating of interest in A3C. The NPI-C is a clinician-rated questionnaire.
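As an illustration only, the composite scoring described above, together with the inclusion threshold stated in the participant criteria (domain score ≥ 4 with frequency score ≥ 2), might be sketched as follows. The function names are hypothetical and are not part of the NPI or of the A3C protocol.

```python
def npi_domain_score(frequency: int, severity: int) -> int:
    """Composite NPI domain score (frequency x severity), range 1-12.

    frequency: 1-4 (occasionally .. very frequently)
    severity:  1-3 (mild, moderate, marked)
    """
    if not 1 <= frequency <= 4:
        raise ValueError("frequency must be between 1 and 4")
    if not 1 <= severity <= 3:
        raise ValueError("severity must be between 1 and 3")
    return frequency * severity


def is_clinically_significant(frequency: int, severity: int) -> bool:
    """Entry threshold as stated in the text: score >= 4 and frequency >= 2."""
    return npi_domain_score(frequency, severity) >= 4 and frequency >= 2
```

For example, a symptom occurring about once per week (frequency 2) with moderate severity (2) gives a composite score of 4 and meets the threshold, whereas an occasional (frequency 1) but marked (3) symptom gives a score of 3 and does not.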
The Cohen Mansfield Agitation Inventory (CMAI) (1) is a caregiver-rated questionnaire. It quantifies the frequency of 29 behaviors exhibited by the patient on a 7-point scale from never (1), less than once a week (2), once or twice a week (3), several times a week (4), once or twice a day (5), several times in a day (6) to several times in an hour (7) throughout the preceding 2 weeks. Total score ranged from 29 to 203. A higher score indicated more severe NPS.
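The stated range of the CMAI total follows directly from summing 29 items rated 1 to 7. A minimal sketch, using a hypothetical helper name and assuming the total is the simple unweighted sum of item ratings:

```python
CMAI_ITEMS = 29          # number of behaviors rated
CMAI_MIN, CMAI_MAX = 1, 7  # per-item frequency rating (never .. several times an hour)


def cmai_total(item_ratings: list[int]) -> int:
    """Sum the 29 CMAI item ratings; the total ranges from 29 to 203."""
    if len(item_ratings) != CMAI_ITEMS:
        raise ValueError(f"expected {CMAI_ITEMS} item ratings")
    if any(not CMAI_MIN <= r <= CMAI_MAX for r in item_ratings):
        raise ValueError("each rating must be between 1 and 7")
    return sum(item_ratings)
```

A patient rated "never" on every item scores 29, and one rated "several times in an hour" on every item scores 203.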
The Alzheimer Disease Cooperative Study-Clinical Global Impression of Change (ADCS-CGIC) (19) is a global rating of change and was developed to assess clinically significant change in symptoms over time in AD clinical trials by experienced clinicians. The modified ADCS-CGIC agitation domain version (20) rates agitation globally across five areas: mood lability, emotional distress, physical agitation, verbal aggression and physical aggression. At baseline, it defines the severity of agitation from absent, not at all ill (1), to borderline ill (2), mildly ill (3), moderately ill (4), markedly ill (5), severely ill (6), or among the most extremely ill patients (7). During follow-up, the mADCS-CGIC agitation domain rated global clinical change in agitation as: very much improved (1), much improved (2), minimally improved (3), no change (4), minimally worse (5), much worse (6), very much worse (7) compared to baseline symptoms.
The Clinical Global Impression of Severity (CGI-S) is a clinician-rated, 7-point scale that is designed to rate the severity of the subject’s agitation symptoms at baseline using the investigator’s judgment and past experience with the subjects who have the same symptoms (21).
The International Psychogeriatric Association (IPA) “Provisional Diagnostic Criteria for Agitation” (16), identifies three groups of agitation symptoms:
– Excessive motor activity (moving continuously, swinging, gesturing, pointing, repetitive mannerism, restless).
– Verbal aggression (screaming, speaking aloud in an excessive way, coarseness, yelling, shouting, voice bursts).
– Physical aggression (tearing, pushing, resisting, hitting, kicking people or objects, scratching, biting, throwing objects, hitting oneself, slamming doors, tearing things apart, destroying property).

Other neuropsychiatric symptoms

The NPI-C measures a total of 12 individual domains besides agitation and aggression: delusions, hallucinations, depression/dysphoria, anxiety, elation/euphoria, apathy/indifference, disinhibition, irritability/lability, aberrant motor behavior, sleep, appetite and eating disorders, aberrant vocalizations. Each of these domains is rated in the same way as the agitation and aggression NPI-C domains. Each NPI-C domain is included in the NPI except for aberrant vocalizations.

Psychotropic medication

Psychotropic medications were differentiated according to ATC coding as antipsychotic, antidepressant, hypnotic, anxiolytic and other drugs. All medications were recorded at each visit based on the patient’s drug prescriptions, which were verified by the physician from the memory clinic or by the nurse from the LTCF.

Non-pharmacological approaches

The study team documented non-pharmacological treatments and/or approaches for A/A, which were classified into three groups according to their targets: (1) targeting the patient, (2) targeting the informal or professional caregiver and (3) targeting the environment. For example, caregiver supportive interventions were collected as binary variables (yes/no), such as caregiver training in education about dementia, communication skills, reducing the mismatch between the caregiver’s expectations and dementia severity and, finally, assessment of the informal caregiver’s burden or mood disorders. Concerning the environment, the following interventions were collected as binary variables (yes/no): improving excess/lack of stimulation, patient isolation, establishing an everyday structured routine, proposing meaningful activities adapted to the patient’s abilities and tastes (10). Intervention by different health professionals was also recorded in both settings.

Non behavioral and psychological assessment

Cognitive assessment

Time since diagnosis of AD was recorded. Cognitive impairment was rated on the Mini Mental State Examination (MMSE), which evaluates orientation, memory, attention, concentration, naming, repetition, comprehension, and the ability to formulate a whole sentence and to copy polygons (22). Disease severity at entry was defined as mild (≥21), moderate (20-15), moderately severe (14-10), or severe (<10). Whether AD biomarkers in cerebrospinal fluid had been measured to support the diagnosis was also noted.
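The severity bands above can be written as a simple lookup. The sketch below uses a hypothetical function name and assumes the standard 0-30 range of the MMSE total score; the cut-offs are those stated in the text.

```python
def mmse_severity(score: int) -> str:
    """Map an MMSE total score (0-30) to the severity bands used at entry."""
    if not 0 <= score <= 30:
        raise ValueError("MMSE total score must be between 0 and 30")
    if score >= 21:
        return "mild"
    if score >= 15:
        return "moderate"
    if score >= 10:
        return "moderately severe"
    return "severe"
```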

Functional evaluation

Physical impairment was based on Katz’s activities of daily living (ADL) scale (23). This is a 6-item scale with a total score ranging from 0 to 6. A higher score indicates less functional impairment. The one-leg balance test was performed to evaluate the risk of falls. The risk of falling is increased if the one-leg balance time is less than or equal to 5 seconds (24).

Quality of life

Quality of life was assessed with the QoL-AD (25). This scale evaluates 13 items by self- or caregiver-report: physical health, energy, mood, living situation, memory, family, marriage, friends, self, ability to do chores, ability to do things for fun, money and life as a whole. Each item is rated on a 1-4 scale: poor, fair, good, excellent, respectively. The total score ranges from 13 to 52. A higher score indicates a better quality of life.

Resource Utilization in Dementia

Health care resources consumed by patients with AD were assessed with the Resource Utilization in Dementia (RUD) instrument (26). This questionnaire collected data about medical resources (inpatient stays, outpatient visits and medication), community care services (district nurse, home help, day care, transportation, meals on wheels), and time spent by the caregiver on ADL and instrumental ADL.

Statistical analysis

To describe the characteristics at baseline, we presented frequencies and percentages for the qualitative variables, and the mean ± standard deviation (SD) for the quantitative variables. To compare the baseline characteristics of patients in LTCF vs patients at home, we used the Chi-square test or Fisher’s exact test (if theoretical frequency <5) for the qualitative variables. For the quantitative variables, we used Student’s t-test for Gaussian distributions and the Kruskal-Wallis non-parametric test for non-Gaussian distributions. To estimate the change from baseline of CMAI and NPI-C-A/A, we used a linear mixed model with time as a continuous variable. Mixed models included all available data (M0, M1, M2, M3, M6, M9 and M12). We included subject-specific random effects to take into account the intra-subject correlation: a random intercept to take into account the heterogeneity of the CMAI and NPI-C-A/A at baseline and a random slope to take into account the heterogeneity of the slopes between subjects. A centre-specific random intercept was not included because this term was not significant. All statistical analyses were performed using SAS software version 9.4 (SAS Institute Inc, Cary, NC).
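For readers less familiar with this class of model, one common way of writing a linear mixed model with a random intercept and a random slope for time is sketched below. The notation is assumed for illustration: the text does not report the exact parameterization, the covariance structure of the random effects, or whether terms other than linear time were included.

```latex
% Y_{ij}: CMAI or NPI-C-A/A score of subject i at visit j
% t_{ij}: time since baseline (continuous)
% b_{0i}, b_{1i}: subject-specific random intercept and random slope
Y_{ij} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, t_{ij} + \varepsilon_{ij},
\qquad
(b_{0i}, b_{1i})^{\top} \sim \mathcal{N}(0, G),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2})
```

In this sketch, \beta_0 is the mean score at baseline and \beta_1 the mean change per unit of time, so the expected change from baseline at time t is \beta_1 t; the matrix G captures between-subject heterogeneity in baseline level and in slope.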



Baseline characteristics of the A3C cohort

Table 2 shows the baseline characteristics of the participants. Study patients were very elderly, with a mean age above 80 years, and the majority were women. The great majority lived at home, alone or with an informal caregiver. Almost two-thirds had a cardio-vascular risk factor but only a minority had a psychiatric history. Based on mean MMSE, most had moderate or more severe dementia of several years’ duration. Fewer than half received an AD-specific medication, mostly anticholinesterases, but ~80% were receiving a psychotropic. The mean duration of follow-up for patients was 9.5 months (SD ± 4.3).

Table 2. Baseline characteristics of the A3C cohort (N=262)

*mean (standard deviation); **if patient living at home with an identified primary caregiver; Abbreviations: MMSE = Mini Mental State Examination; AD = Alzheimer’s disease; ADL = Katz’s activities of daily living scale; A/A = agitation/aggression.

Figure 1 presents subject disposition at follow-up in detail. Of the 262 patients enrolled, 86 (32.8%) dropped out over the 1-year follow-up. Attrition during the first three months, the critical period of the A3C study, was 13.0% (n=34).
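The attrition percentages reported above follow from the enrolment and dropout counts. A small arithmetic check, with hypothetical variable names, is:

```python
ENROLLED = 262         # patients enrolled in A3C
DROPPED_12_MONTHS = 86  # dropouts over the 1-year follow-up
DROPPED_3_MONTHS = 34   # dropouts during the first three months


def attrition_percent(dropped: int, enrolled: int) -> float:
    """Attrition as a percentage of enrolled patients, to one decimal place."""
    return round(100 * dropped / enrolled, 1)


attrition_12m = attrition_percent(DROPPED_12_MONTHS, ENROLLED)  # 32.8
attrition_3m = attrition_percent(DROPPED_3_MONTHS, ENROLLED)    # 13.0
```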

Figure 1. Flow chart of the A3C study

Abbreviations: V = Visit.

At baseline, study patients in LTCF were older (p=0.0003) and more physically (p<0.0001) and cognitively (p<0.0001) disabled than the home-based subgroup. All types of psychotropic medications were more frequent among patients living in LTCF: antipsychotics (p=0.004), antidepressants (p=0.0006), anxiolytics (p<0.0001) and hypnotics (p<0.0001). However, the total score of NPI and the total score of CMAI were not significantly different between the two subgroups at baseline (p=0.31 and p=0.25, respectively).

Baseline characteristics of neuropsychiatric symptoms (NPS)

By the IPA agitation definition, most participants had excessive motor activity (76.3%) and/or verbal aggression (76.3%), while 115 (44.1%) displayed physical aggression. Table 3 shows the characteristics of A/A ratings and other NPS in study participants.

Table 3. Baseline characteristics of neuropsychiatric symptoms of the A3C cohort (N=262)

*mean (standard deviation) ; Abbreviations: CGI-S = Clinician’s Global Impression of Severity; NPI = Neuropsychiatric Inventory; NPI-C = Neuropsychiatric Inventory Clinician rating scale; CMAI = Cohen Mansfield Agitation Inventory; AMB = aberrant motor behavior; A/A = agitation/aggression; IPA = International Psychogeriatric Association.


Longitudinal courses of agitation symptoms over 12 months of follow-up

The CMAI score decreased significantly from baseline (mean = 61.5; standard error [SE] ± 1.0) to 6 months of follow-up (V5) (mean = 50.5; SE ± 1.0) and 12 months (V7) (mean = 50.1; SE ± 1.2). The mean NPI-C-A/A clinician severity score was 15.5 (SE ± 0.7) at baseline, 8.8 (SE ± 0.6) at 6 months (V5) and 8.8 (SE ± 0.7) at 12 months (V7). The changes in the CMAI score and the NPI-C-A/A score over the 1-year follow-up period were -11.36 (SE=1.32; p<0.001) and -6.72 (SE=0.77; p<0.001), respectively. Figure 2 presents the change in the CMAI score and in the NPI-C-A/A score during the 1-year follow-up period. Figure 3 shows the variation of the modified ADCS-CGIC agitation domain version at each follow-up visit (V2 to V7) compared with the modified ADCS-CGIC at baseline.

Figure 2. Change of the total CMAI and the total NPI-C-A/A during the follow-up (results from mixed linear models)

Figure 3. Comparison of the modified ADCS-CGIC agitation domain version at each visit of follow-up (V2 to V7) with the baseline (V1)

Abbreviations: V=Visit, mADCS-CGIC=modified Alzheimer disease cooperative study clinical global impression of change agitation domain.



Two hundred sixty-two patients with clinically significant A/A were enrolled in the A3C study, most living at home, with moderate to severe dementia. At baseline, more than 70% showed excessive motor activity and/or verbal aggression, while fewer than half displayed physical aggression. At baseline, psychotropic medication was prescribed to 80%. Agitation symptoms decreased most during the first three months of follow-up, and A/A continued to improve through 12 months.
Concerning the specific design of the A3C study, a monthly visit schedule during the first 3 months was chosen to address the primary aim of the study. The first three months of A3C were intended to simulate a classic randomized controlled trial 12-week treatment design. Evolution, variability and associations between different outcome measures will be specifically studied during this time. Subsequently, visits every three months until the end of the year were chosen in order to detect changes in NPS over shorter periods of time. In fact, since NPS are characterized by frequent fluctuations as well as by differences in the concurrent presentation of different symptoms, shorter intervals between assessments are needed in order to describe their course better and more precisely. Further data from A3C will help to identify different A/A trajectories based on variations in change over time in the frequency and the severity of symptoms and their associated factors, to study the coexistence of other clinically significant NPS, and to analyze patterns and impact of pharmacological and non-pharmacological approaches in the management of clinically significant A/A.
A recent systematic review estimated the incidence of clinically significant agitation in nursing home patients to be 18.8% over 12 months and 36% over 4 years (27). Several studies evaluated the progression of agitation in AD: six studies reported an increase in severity/frequency of agitation over time, two studies presented mixed results and one showed a decrease (27). To our knowledge, no study to date has described the course of A/A over time in community-dwelling AD patients. Interestingly, in the A3C study, a major decrease in A/A symptoms was observed during the first 3 months of follow-up, and symptoms continued to decrease slowly until the end of follow-up. The A3C cohort did receive management of NPS that may have included medications and/or non-pharmacological approaches, but this can be considered “usual care” since no intervention was implemented in A3C; as a consequence, A3C still studies the natural history of the agitation syndrome in usual care settings. However, the specific design of A3C, with short intervals between follow-up visits, could itself be considered a form of intervention similar to that in clinical trials. This could explain the continued decrease in agitation symptoms over the entire follow-up.
Several organizations, including the Food and Drug Administration and the European Alzheimer’s Disease Consortium, have expressed interest in better characterizing NPS, such as psychosis or depression in AD, which would be highly relevant for treating NPS in AD (28). Consensus diagnostic criteria for agitation in AD were recently proposed (16). To our knowledge, the A3C study is the first cohort study using these criteria in a longitudinal observational study design. The whole A3C population met the criteria for the A/A syndrome based on the IPA definition: three quarters showed excessive motor activity and/or verbal aggression, and physical aggression occurred in fewer patients. Gaining clarity about the clinical entity of A/A is of great importance, since its different phenotypes may delineate underlying disruptions in specific neuronal regions and/or circuitry and shed light on etio-pathogenesis, enhancing the development of pharmacologic or non-pharmacologic treatments for specific A/A phenotypes. Aggressive behavior may respond to a drug differently than excessive motor activity. Moreover, improving knowledge about the pathogenic pathways of A/A may lead to the study of biomarkers and to their increased use to maximize the productivity of clinical trials for NPS (12).
Of the common NPS, the natural history of A/A (phenomenology, course and associated factors) in AD is the least well understood, resulting in the lack of a “gold standard” efficacy outcome in therapeutic research and development. In fact, little is known about the natural course of clinically significant A/A in AD patients, about factors influencing this course or about the variability of different outcome measures over time, such as the NPI or CMAI. This is even more evident for new scales such as the NPI-C. Findings from A3C will provide a better estimation of placebo group variability in trials, thus allowing for more precise power estimates. Moreover, knowledge of the impact of demographic and other variables, such as vascular diseases or other NPS, on the trajectory of A/A over time might improve the homogeneity of the sample population. In order to assess agitation response to treatment, three approaches have been used in clinical trials: 1) structured caregiver interviews (NPI-A/A, CMAI), 2) expert clinician scale ratings (NPI-C A/A), and 3) structured global ratings (modified ADCS-CGIC, CGI-C) based on the judgment of experienced clinicians (11, 29). Clinical global ratings are used to complement NPS ratings based on caregiver report, since their strength is that they are derived from experienced clinicians. A recent EU-US-CTAD Task Force (12) highlighted that choosing the best outcome measure for clinical trials was the key to treatment development for A/A and proposed using a combined clinician- and caregiver-derived outcome as the primary efficacy outcome measure in the absence of a gold standard (12). In the meantime, this Task Force encouraged using existing datasets to construct an evidence-based single novel measure of agitation by selecting item subsets of existing scales. Data from the A3C study will help in answering this question and in moving the field forward.
The main limitation of A3C is attrition of more than 30% during one year of follow-up. Attrition is common in cohorts of older adults, especially when patients are affected by a severe and progressive chronic disease such as AD (30). The AD patients in A3C were notably medically frail and presented a particularly severe and complex form of disease, with major complications such as distressing NPS and, as a consequence, a higher risk of adverse outcomes, which may explain the higher attrition compared with previous cohorts of AD patients. However, attrition during the first three months, the critical period of the A3C study, was much lower (13.0%). The attrition data will also help in calculating sample sizes for future trials. The second limitation is that the diagnosis of AD was based on clinical criteria and there was no requirement for biomarker confirmation. Therefore, our population has possible AD. Lumbar puncture was performed in only 23 patients (8.9%); neither pathophysiological biomarkers nor neuro-imaging biomarkers were assessed under a standardized protocol, since the A3C cohort was a usual care survey and, in general, it is not the clinical standard of care to assess AD biomarkers in a cohort with dementia as advanced as in A3C.
The A3C study addresses a clinically important population, older patients with AD and disruptive NPS, who are often under-assessed and excluded from research. Thus, this study gives the opportunity to develop research in this vulnerable population. In addition, data from the A3C study may improve clinical practice by better defining and measuring agitation and, consequently, by better targeting pharmacological and non-pharmacological treatments.
Little is known about the longitudinal course of clinically significant A/A in AD patients, about factors influencing this course or about the variability of different outcome measures over time, such as the NPI or CMAI, or the definition of a clinically meaningful improvement in these scales. This is even more evident for new scales such as the NPI-clinician rating. The A3C study may provide useful data to improve clinical practice and to optimize future clinical trials of treatments for agitation symptoms in AD.


Acknowledgments: L. Bories, A. Roustan, Y. Gasnier, S. Bordes, M.N. Cufi, F. Desclaux, N. Gaits, M. Péré-Saun.

Funding: The A3C cohort was supported by Ethypharm and Toulouse University Hospital.

Conflict of Interest: C. Lyketsos declares: 1) Grant support (research or CME) from NIH, Functional Neuromodulation, Bright Focus Foundation and 2) Payment as consultant or advisor from Avanir, Astellas, Roche, Karuna, SVB Leerink, Maplight, Axsome, Global Institute on Addictions. The other authors declare no conflict of interest.

Ethical Standards: A3C had ethical approval and oversight from the local Institutional Review Board (Toulouse University Hospital).

Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, duplication, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.



1. Cohen-Mansfield J. Conceptualization of agitation: results based on the Cohen-Mansfield agitation inventory and the agitation behavior mapping instrument. International Psychogeriatrics. 1996;8(Suppl 3):309–15.
2. Gonfrier S, Andrieu S, Renaud D, Vellas B, Robert PH. Course of neuropsychiatric symptoms during a 4-year follow up in the REAL-FR cohort. Journal of Nutrition Health Aging. 2012;16(2):134-7.
3. Kales HC, Lyketsos CG, Miller EM, Ballard C. Management of behavioral and psychological symptoms in people with Alzheimer’s disease: an international Delphi consensus. International Psychogeriatrics. 2018; 31 (1): 83-90.
4. Selbaek G, Engedal K, Benth JS, Bergh S. The course of neuropsychiatric symptoms in nursing home patients with dementia over a 53-month follow-up period. International Psychogeriatrics. 2014; 26 (1):81-91.
5. Gonzalez-Salvador MT, Arango C, Lyketsos CG, Barba AC. The stress and psychological morbidity of the Alzheimer patient caregiver. International Journal of geriatric psychiatry. 1999; 14(9): 701-10.
6. Okura T, Plassman BL, Steffens DC, Llewellyn DJ, Potter GG, Langa KM. Neuropsychiatric symptoms and the risk of institutionalization and death: the aging, demographics, and memory study. Journal of American Geriatric Society. 2011;59(3):473‑81.
7. Taragano FE, Allegri RF, Krupitzki H, et al. Mild behavioral impairment and risk of dementia: a prospective cohort study of 358 patients. Journal of Clinical Psychiatry. 2009;70(4):584-92.
8. Peters ME, Schwartz S, Han D, et al. Neuropsychiatric symptoms as predictors of progression to severe Alzheimer’s dementia and death: the Cache County Dementia Progression Study. American Journal of Psychiatry. 2015;172(5):460-5.
9. Costa N, Wubker A, de Mauleon A, et al. Costs of care of agitation associated with dementia in 8 European countries: results from the RighTimePlaceCare Study. Journal of American Medical Directors Association. 2018; 19 (1): 95 e1-e10.
10. Kales HC, Gitlin LN, Lyketsos CG. Assessment and management of behavioral and psychological symptoms of dementia. British Medical Journal. 2015; 350: h369.
11. Soto M, Andrieu S, Nourhashemi F, et al. Medication development for agitation and aggression in Alzheimer disease: Review and discussion of recent randomized clinical trial design. International Psychogeriatrics. 2014: 1-17.
12. Sano M, Soto M, Carillo M, et al. Identifying better outcome measures to improve treatment of agitation in dementia: a report from the EU/US/CTAD Task Force. Journal of Prevention of Alzheimer’s Disease. 2018; 5(2): 98-102.
13. Mueller SG, Weiner MW, Thal LJ, et al. Ways toward an early diagnosis in Alzheimer’s disease: The Alzheimer’s Disease Neuroimaging Initiative (ADNI). Alzheimer’s & Dementia: Journal of Alzheimer’s Association. July 2005;1(1):55 66.
14. McKhann GM, Knopman DS, Chertkow H, et al. The diagnosis of dementia due to Alzheimer’s disease: Recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimers Dementia. 2011;7(3):263–9.
15. Trzepacz PT, Saykin A, Yu P, et al. Alzheimer’s Disease Neuroimaging Initiative. Subscale validation of the neuropsychiatric inventory questionnaire: comparison of Alzheimer’s disease neuroimaging initiative and national Alzheimer’s coordinating center cohorts. American Journal of Geriatric Psychiatry. 2013; 21:607-22.
16. Cummings J, Mintzer J, Brodaty H, et al. Agitation in cognitive disorders: International Psychogeriatric Association provisional consensus clinical and research definition. International Psychogeriatrics. 2015; 27(1):7-17.
17. Cummings JL, Mega M, Gray K, Rosenberg-Thompson S, Carusi DA, Gornbein J. The neuropsychiatric inventory: comprehensive assessment of psychopathology in dementia. Neurology. 1994;44(12):2308-14.
18. de Medeiros K, Robert P, Gauthier S, et al. The Neuropsychiatric Inventory-Clinician rating scale (NPI-C): reliability and validity of a revised assessment of neuropsychiatric symptoms in dementia. International Psychogeriatrics. 2010; 22:984–994.
19. Schneider LS, Olin JT, Doody RS, et al. Validity and reliability of the Alzheimer’s Disease Cooperative Study-Clinical Global Impression of Change. The Alzheimer’s Disease Cooperative Study. Alzheimer Disease and Associated Disorders. 1997; 11 Suppl 2:S22-32.
20. Drye LT, Ismail Z, Porsteinsson AP, et al., CitAD Research Group. Citalopram for agitation in Alzheimer’s disease: design and methods. Alzheimers Dementia. 2012;8(2):121-30.
21. Guy W. ECDEU Assessment Manual for Psychopharmacology, revised. 1976. Bethesda, MD: US Department of Health, Education and Welfare.
22. Folstein MF, Folstein SE, McHugh PR. “Mini-Mental State”: a practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research. 1975;12:196–8.
23. Katz S, Ford AB, Moskowitz RW, Jackson BA, Jaffe MW. Studies of illness in the aged. The index of ADL. A standardized measure of biological and psychological function. Journal of American Medical Association. 1963; 185: 914-19.
24. Sourdet S, Van Kan GA, Soto ME, et al. Prognosis of an abnormal one-leg balance in community-dwelling patients with Alzheimer’s disease: a 2-year prospective study in 686 patients of the REAL.FR study. Journal of American Medical Directors Association. 2012; 13(4): 407.
25. Logsdon RG, Gibbons LE, McCurry SM, Teri L. Quality of life in Alzheimer’s disease: Patient and caregiver reports. Journal of Mental Health and Aging. 1999; 5(1):21-32.
26. Wimo A, Nordberg G. Validity and reliability of assessments of time. Comparisons of direct observations and estimates of time by the use of the Resource Utilisation in Dementia (RUD)-instrument. Archives of Gerontology and Geriatrics. 2007;44(1):71-81.
27. Anatchkova M, Brooks A, Swett L, et al. Agitation in patients with dementia: a systematic review of epidemiology and association with severity and course. International Psychogeriatrics. 2019; 11:1-14.
28. Lyketsos CG, Lee HB. Diagnosis and treatment of depression in Alzheimer’s disease. A practical update for the clinician. Dementia and Geriatric Cognitive Disorders. 2004; 17 (1-2): 55-64.
29. Soto M, Abushakra S, Cummings J, et al. Progression in treatment development for neuropsychiatric symptoms in Alzheimer’s disease: focus on agitation and aggression. A report from the EU/US/CTAD Task Force. The Journal of Prevention of Alzheimer’s Disease. 2015; 2(3): 184-188.
30. Coley N, Gardette V, Toulza O, et al. Predictive factors of attrition in a cohort of Alzheimer disease patients. The REAL.FR study. Neuroepidemiology. 2008;31(2):69‑79.