jpad journal


ALZHEIMER’S DISEASE COMPOSITE SCORE: A POST-HOC ANALYSIS USING DATA FROM THE LIPIDIDIET TRIAL IN PRODROMAL ALZHEIMER’S DISEASE

 

S.B. Hendrix1, H. Soininen2,3, A.M.J. van Hees4, N. Ellison1, P.J. Visser5,6, A. Solomon2,7,8, A. Attali4, K. Blennow9,10, M. Kivipelto2,7,8, T. Hartmann11,12

 

1. Pentara Corporation, Salt Lake City, UT, USA; 2. Department of Neurology, Institute of Clinical Medicine, University of Eastern Finland and Kuopio University Hospital, Kuopio, Finland; 3. Neurocenter, Department of Neurology, Kuopio University Hospital, Kuopio, Finland; 4. Danone Nutricia Research, Nutricia Advanced Medical Nutrition, Utrecht, the Netherlands; 5. Department of Psychiatry and Neuropsychology, Alzheimer Center Limburg, University of Maastricht, Maastricht, the Netherlands; 6. Department of Neurology, Alzheimer Center, VU University Medical Center, Amsterdam, the Netherlands; 7. Department of Clinical Geriatrics, NVS, Karolinska Institutet, Huddinge, Sweden; 8. Clinical Trials Unit, Department of Geriatric Medicine, Karolinska University Hospital, 14152 Huddinge, Sweden;
9. Department of Psychiatry and Neurochemistry, Institute of Neuroscience and Physiology, The Sahlgrenska Academy at University of Gothenburg, Mölndal, Sweden; 10. Clinical Neurochemistry Laboratory, Sahlgrenska University Hospital, Mölndal, Sweden; 11. Deutsches Institut für Demenz Prävention (DIDP), Medical Faculty, Saarland University, Homburg, Germany; 12. Department of Experimental Neurology, Saarland University, Homburg, Germany

Corresponding Author: Suzanne B Hendrix, Pentara Corporation, 2180 Claybourne Avenue, Salt Lake City, UT 84109 USA. Email: shendrix@pentaracorp.com; Phone: +1 (801) 898-7241

J Prev Alz Dis 2019;
Published online September 10, 2019, http://dx.doi.org/10.14283/jpad.2019.33

 


Abstract

As research evolves in prodromal AD, the need to validate sufficiently sensitive outcome measures, such as the Alzheimer’s Disease Composite Score (ADCOMS), is clear. In the LipiDiDiet randomized trial in prodromal AD, cognitive decline in the study population was much less than expected in the timeframe studied. While the primary composite endpoint was insufficiently sensitive to detect a difference in the modified intention-to-treat population, the per-protocol population showed less decline in the active than the control group, indicating better treatment effects with regular product intake. These results were further strengthened by significant benefits on secondary endpoints of cognition and function, and brain atrophy. The present post-hoc analysis investigated whether ADCOMS could detect a difference between groups in the LipiDiDiet population (138 active, 140 control). The estimated mean change in ADCOMS from baseline (standard error) was 0.085 (0.018) in the active and 0.133 (0.018) in the control group; the estimated mean treatment difference was -0.048 (95% confidence interval -0.090 to -0.007; p=0.023), or 36% less decline in the active group. This suggests ADCOMS identified the cognitive and functional benefits observed previously, confirming the sensitivity of this composite measure.

Key words: Alzheimer’s disease, prodromal, cognitive function, nutrients, Souvenaid, Fortasyn.


 

Introduction

Prodromal Alzheimer’s disease (AD) is characterized by mild cognitive and functional impairment with defined changes in specific biomarkers (1-3). The LipiDiDiet trial was one of the first randomized clinical trials conducted in subjects with prodromal AD who were selected using the clinical and biomarker-based criteria originally described by Dubois et al. (1). The trial investigated the effects of the specific nutrient combination Fortasyn Connect (Souvenaid) on cognitive, functional, and other disease related parameters in this population (4). We previously reported that the intervention had no significant effect in the primary analysis on the 2-year primary endpoint, a 5-item neuropsychological test battery (NTB), yet significant differences for this endpoint were found in the pre-defined secondary analysis of the per-protocol population and the pre-defined subgroup analysis (4). Of note, in this trial population, the rate of cognitive decline as measured by the NTB score was several times less than expected, which means that the primary endpoint was insufficiently sensitive to detect a difference between the interventional and control groups (4). While such an observation adds important information about the early clinical course of prodromal AD (5, 6), it clearly highlights the ongoing need for more sensitive tools to detect changes in cognitive performance in this population.
Evaluating the effects of interventions for mildly affected populations with only limited cognitive and functional decline and subtle impairment, such as subjects with prodromal AD, requires the use of sufficiently sensitive and informative composite outcome measures. The Clinical Dementia Rating – Sum of Boxes (CDR-SB) has been proposed as such a measure (7). More recently, the Alzheimer’s Disease Composite Score (ADCOMS) was developed as a broader composite clinical outcome measure for trials in prodromal and mild AD dementia (8). ADCOMS consists of cognitive and functional items from three commonly used scales in AD dementia trials: the Alzheimer’s Disease Assessment Scale – cognitive subscale (ADAS-cog), Mini-Mental State Examination (MMSE), and CDR-SB. In subjects with early AD, the combination of selected items from these scales was shown to have the highest sensitivity for measuring changes and intervention effects over time compared with the individual scales (8). Preliminary results from the first randomized controlled trial using ADCOMS as the primary outcome measure were interpreted as supporting the applicability of this composite score in subjects with early AD (9). However, more studies are needed to establish general applicability across different trial settings and the contribution of the different subdomains to the composite.
ADCOMS has been proposed as a new standard outcome measure for trials in prodromal AD; therefore, we did a post-hoc analysis of data from the LipiDiDiet trial primarily to compare Fortasyn Connect and control groups using ADCOMS and its subdomains as a potentially more sensitive measure of intervention effects than the NTB used in the primary analysis. An additional aim of the analysis was to use data from subjects with prodromal AD to provide broader knowledge of ADCOMS as a single clinical outcome measure in early AD trials.

 

Subjects and methods

Detailed methods for the LipiDiDiet trial (Netherlands Trial Registry NTR1705) were published previously (4). In summary, LipiDiDiet was a 24-month, double-blind, parallel-group, multi-center randomized controlled trial (11 sites in Finland, Germany, the Netherlands, and Sweden), with optional 12-month double-blind extensions. Eligible participants with prodromal AD, defined according to the International Working Group (IWG)-1 criteria (1), were randomly assigned (1:1) to active intervention (once-daily 125 mL drink containing the multinutrient combination Fortasyn Connect provided by Nutricia [Zoetermeer, the Netherlands]) or a same-taste iso-caloric control product. The primary outcome was the change in a cognitive function composite z-score based on five items of an NTB. CDR-SB was a secondary outcome while ADAS-cog-13 and MMSE were exploratory parameters. Participants provided written consent and the trial was approved by ethics committees of all sites and done in accordance with the Declaration of Helsinki and International Conference on Harmonization Good Clinical Practice guidelines.
We used the LipiDiDiet trial data to do a post-hoc analysis of outcomes included in the ADCOMS tool, which consists of four ADAS-Cog subscale items (delayed word recall, orientation, word recognition, and word finding difficulty), two MMSE items (orientation time and drawing), and all six CDR-SB items (personal care, community affairs, home and hobbies, judgement and problem solving, memory, and orientation), as described previously by Wang and colleagues (8).
In this analysis, ADCOMS scores were calculated using the selected 12 items and corresponding partial least squares coefficients. Composite scores range from 0.0 to a maximum of 1.97, where higher values indicate worse performance. The contribution of the separate subdomains (ADAS-cog, MMSE, and CDR-SB) to the total score was explored by calculating the separate domains based on the same items and coefficients. Total ADCOMS scores and subdomain scores were calculated only if subject data were available for all 12 items. Statistical analyses were performed as planned using linear mixed models for repeated measures with real measurement time as continuous variable (primary model) or planned visit time as categorical variable (planned sensitivity model) in a modified intention-to-treat (mITT) population of all participants randomly assigned, excluding data after the start of rescue medication (defined as use of active product or Alzheimer’s disease medication after dementia diagnosis). Further details about these statistical models were described previously (4). Additional sensitivity analyses using the primary and sensitivity models with baseline in the outcome vector, a 2-sided, independent t-test, and a non-parametric Mann-Whitney U test were performed to test the robustness of results. Effect sizes were reported using Cohen’s d standardized effect size calculated based on the mean treatment difference over 24 months, estimated in the mixed model and pooled SD based on the sample size at the 24-month visit. Similar analyses were also done on a per-protocol dataset excluding participants with major protocol deviations.
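The scoring rule described above (a weighted sum over 12 items, subdomain scores from the same items and coefficients, and no score unless all 12 items are present) can be sketched as follows. This is a minimal illustration, not the published scoring algorithm: the item names, the `composite_score` helper, and the uniform `PLACEHOLDER_WEIGHTS` are all hypothetical, and the actual partial least squares coefficients are those reported by Wang and colleagues (8). Item values are assumed to be already oriented so that higher means worse.

```python
from typing import Mapping, Optional

# The 12 ADCOMS items listed in the text; the identifiers are hypothetical.
ADCOMS_ITEMS = (
    "adas_delayed_word_recall", "adas_orientation",
    "adas_word_recognition", "adas_word_finding",
    "mmse_orientation_time", "mmse_drawing",
    "cdr_personal_care", "cdr_community_affairs", "cdr_home_hobbies",
    "cdr_judgement", "cdr_memory", "cdr_orientation",
)

# HYPOTHETICAL placeholder weights, NOT the published PLS coefficients.
PLACEHOLDER_WEIGHTS = {name: 0.01 for name in ADCOMS_ITEMS}


def composite_score(
    items: Mapping[str, Optional[float]],
    weights: Mapping[str, float],
    prefix: str = "",
) -> Optional[float]:
    """Weighted sum of item scores (higher = worse).

    Returns None unless all 12 items are available, mirroring the
    complete-case rule in the text. A prefix such as "cdr_" restricts
    the sum to one subdomain while keeping the same weights.
    """
    if any(items.get(name) is None for name in ADCOMS_ITEMS):
        return None
    return sum(
        weights[name] * items[name]
        for name in ADCOMS_ITEMS
        if name.startswith(prefix)
    )
```

Because the subdomain scores reuse the same items and weights, the three subdomain scores add up to the total score, which is what makes the subdomain contributions directly comparable.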

 

Results

This analysis includes data obtained from 311 participants with prodromal AD (153 active group and 158 control group) enrolled between April 20, 2009, and July 3, 2013. In the mITT population, data were available for the post-hoc ADCOMS analysis from 278 participants (138 active and 140 control) at baseline, 225 (109 active and 116 control) at month 12, and 164 (73 active and 91 control) at month 24, which is comparable to the data available for the mITT analysis of the NTB primary outcome in the original paper (4).
ADCOMS scores at baseline were 0.258 (standard deviation [SD] 0.143, n=138) in the active group and 0.247 (SD 0.140, n=140) in the control group (Table 1a). Figure 1 shows changes in ADCOMS scores and subdomain scores during the 24-month intervention period. While both groups showed higher ADCOMS scores over time, worsening was 36% less in the active group than in the control group (Figure 1A). The estimated mean change from baseline (standard error) was 0.085 (0.018) in the active group and 0.133 (0.018) in the control group; the corresponding estimated mean treatment difference was -0.048 (95% confidence interval -0.090 to -0.007; p=0.023). Analysis of the ADCOMS subdomains (Figures 1B-D) showed that the difference between active and control groups was greatest for the 6-item CDR-SB subdomain (34% less worsening) and the 2-item MMSE subdomain (63% less worsening). The estimated mean change from baseline (standard error) was 0.065 (0.016) in the active group and 0.099 (0.016) in the control group for the 6-item CDR-SB subdomain (p=0.033), and 0.007 (0.005) in the active group and 0.019 (0.005) in the control group for the 2-item MMSE subdomain (p=0.065). No differences between groups were observed for the 4-item ADAS-cog subdomain. The planned sensitivity analysis showed significant differences between groups over 24 months in worsening of ADCOMS scores (p=0.023) and worsening of 6-item CDR-SB (p=0.032), while there was a trend on the 2-item MMSE (p=0.068) and no difference on the 4-item ADAS-cog (p=0.499). The additional sensitivity analyses on ADCOMS and subdomains confirmed the results (ADCOMS: primary model with baseline in the outcome vector, p=0.038; t-test, p=0.059; Mann-Whitney U test, p=0.036).
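The "percent less worsening" figures follow from the estimated mean changes by simple arithmetic. The short check below recomputes them from the values reported in this paragraph; the function name `percent_less_worsening` is introduced here for illustration only.

```python
def percent_less_worsening(change_active: float, change_control: float) -> float:
    """Relative reduction in worsening, active vs control, in percent."""
    if change_control == 0:
        raise ValueError("control change must be non-zero")
    return 100.0 * (change_control - change_active) / change_control


# Estimated mean changes from baseline over 24 months, as reported in the text.
adcoms = percent_less_worsening(0.085, 0.133)
cdr_sb = percent_less_worsening(0.065, 0.099)
mmse = percent_less_worsening(0.007, 0.019)

print(round(adcoms), round(cdr_sb), round(mmse))
```

The rounded results agree with the 36%, 34%, and 63% quoted for ADCOMS, the CDR-SB subdomain, and the MMSE subdomain, respectively.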
Per-protocol analysis including baseline data from 257 participants (129 active and 128 control) confirmed the findings in the mITT analysis (Table 1b).

Table 1. Post-hoc analysis of ADCOMS and its subdomains using LipiDiDiet trial data. (a) mITT population; (b) per-protocol population


1. Higher scores indicate worse performance; 2. Data for active and control groups are presented as observed means and SD; 3. Difference is calculated as (active − control) based on least squares means for change from baseline over 24 months as estimated in the mixed model; 4. Percent less worsening active vs control based on least squares means for change from baseline over 24 months as estimated in the mixed model; 5. MM (mixed model): linear mixed model for longitudinal data with change from baseline as outcome, baseline score and baseline MMSE as covariates, and real measurement time as a continuous variable. P value for effect of intervention over 24 months; 6. MMs (planned sensitivity model): mixed model for repeated measures with change from baseline as outcome, baseline score and baseline MMSE as covariates, and planned visit time as a categorical variable. P value for effect of intervention over 24 months; 7. Cohen’s d standardized effect size calculated based on the mean treatment difference over 24 months as estimated in the mixed model and the pooled SD. Results are presented so that a positive effect size indicates improved performance in the active vs. control group and vice versa; mITT=modified intention-to-treat: all randomly assigned participants, excluding visit data after the start of rescue medication; PP=per-protocol: all participants from the modified intention-to-treat population, excluding the respective visits of participants with major protocol deviations defined during a data review of masked data; ADCOMS= Alzheimer’s disease composite score. ADAS-cog=Alzheimer’s disease assessment scale–cognitive subscale. MMSE=mini-mental state examination. CDR-SB=clinical dementia rating – sum of boxes. CI=confidence interval. SD=standard deviation.

 

Effect size analyses of changes from baseline over 24 months on ADCOMS score showed Cohen’s d values of 0.31 in the mITT population and 0.39 in the per-protocol population, indicating a small to medium effect in the active group (10). Effect sizes >0.2 were also observed for the MMSE and CDR-SB subdomains in the mITT (0.27 and 0.25, respectively) and per-protocol (0.25 and 0.33, respectively) analyses.
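For readers unfamiliar with the calculation, Cohen’s d as described in the methods is the model-estimated mean treatment difference divided by a pooled SD. The sketch below uses the standard pooled-SD formula; the SD values of 0.15 and 0.16 are hypothetical placeholders (the group SDs of change are not given in the text), whereas the treatment difference and the month-24 sample sizes are taken from the results above.

```python
import math


def pooled_sd(sd_a: float, n_a: int, sd_b: float, n_b: int) -> float:
    """Pooled standard deviation of two independent groups."""
    numerator = (n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2
    return math.sqrt(numerator / (n_a + n_b - 2))


def cohens_d(mean_difference: float, sd_pooled: float) -> float:
    """Standardized effect size: mean difference over pooled SD."""
    return mean_difference / sd_pooled


# Difference (active - control) and month-24 sample sizes from the text.
difference = -0.048
n_active, n_control = 73, 91

# HYPOTHETICAL SDs of change, for illustration only.
sd_pool = pooled_sd(0.15, n_active, 0.16, n_control)

# Sign flipped so that a positive d favors the active group, following
# the convention stated in the Table 1 footnotes.
d = cohens_d(-difference, sd_pool)
```

With these placeholder SDs the result falls in the small-to-medium range, but the value depends entirely on the assumed SDs and should not be read as a reproduction of the reported effect sizes.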

Figure 1. Changes in ADCOMS and its subdomains during the 24-month intervention


(A) Alzheimer’s Disease Composite Score. (B) Clinical Dementia Rating – Sum of Boxes 6-item subdomain. (C) Alzheimer’s Disease Assessment Scale–cognitive subscale 4-item subdomain. (D) Mini-Mental State Examination 2-item subdomain. Data are observed mean change from baseline; error bars are standard error.  * p<0.05 (mixed model, modified intention-to-treat).

 

Discussion

Research practice in subjects with prodromal AD is still evolving, and since the 24-month LipiDiDiet trial database was locked, there has been a growing recognition that combined cognitive-functional measurement tools may provide a more sensitive way to assess the efficacy of novel interventions than those currently available (7, 11). To reflect contemporary research practice, we used ADCOMS in a post-hoc analysis of the LipiDiDiet trial data and found a significant intervention effect for Fortasyn Connect over 24 months in subjects with prodromal AD. The active group showed significantly less clinical decline over 24 months as measured by ADCOMS, and this effect was driven largely by differences in the CDR-SB and MMSE subdomains. We previously reported a significant benefit for Fortasyn Connect using CDR-SB and showed that stabilization of CDR-SB scores was more pronounced with increasing baseline MMSE (4), which supports the notion that early rather than late treatment within the prodromal phase of dementia may lead to better outcomes when using CDR-SB as a cognitive-functional measure. ADCOMS data in this post-hoc analysis (data not shown) also suggest that earlier intervention is associated with better outcomes for Fortasyn Connect.
The ADCOMS score is weighted toward the CDR-SB which functions as the framework of the score, but only takes on values from 0.5 to 7 (in increments of 0.5) for the majority of participants. The MMSE and ADAS-cog items provide further discriminatory ability between these seven points, enhancing the performance of the scale, but not performing as reliably when isolated. The inclusion of multiple measures of important cognitive domains stabilizes estimates and protects against spurious results. The CDR-SB has historically been more sensitive to progression, but less sensitive to treatment effects due to low variability, contrasted with cognitive scales which have been more sensitive to treatment effects but also highly variable. The weighted combination was designed to combine changes between points on the CDR-SB with detailed changes in cognitive items, with the sensitive items potentially differing from one study to another. In this case, the CDR-SB items and the MMSE items were sensitive to changes, and the ADAS-cog items were less sensitive, allowing the ADCOMS scale to detect treatment related changes due to both functional and cognitive contributions.
The effect size analysis reported here indicates that the magnitude of the intervention effect measured using ADCOMS was large enough to be clinically detectable. The effect size for ADCOMS (Cohen’s d 0.31) was similar to the value previously reported for CDR-SB (0.33) (4). The magnitude of the intervention effects seen with ADCOMS and CDR-SB, both in this analysis and the original trial report (4), was more pronounced in the per-protocol analysis, possibly reflecting the importance of long-term protocol adherence.
These results should be interpreted with caution because of the post-hoc nature of the analysis with a relatively new cognitive-functional measurement tool. Nevertheless, ADCOMS was developed using robust methodology (8), and these analyses further contribute to the validation of ADCOMS in clinical trials in subjects with early AD and suggest applicability and sensitivity across different intervention strategies in the earliest stages of dementia. Our post-hoc ADCOMS analyses are consistent with the overall findings from the LipiDiDiet trial (4) and in combination with data from other authors (8), provide further evidence that ADCOMS, a broad measure of cognitive function, may be useful over a range of interventions and trial designs in early AD.
In conclusion, this analysis suggests that the cognitive and functional benefits observed in the LipiDiDiet trial were also identified using ADCOMS, adding to the accumulating evidence validating this sensitive and broad composite outcome measure in prodromal AD trials.

 

Funding: The research leading to these results was mainly funded by the European Commission under the 7th framework program of the European Union (grant agreement number 211696). Additional funding was provided by the EU Joint Program – Neurodegenerative Disease Research (MIND-AD grant); Kuopio University Hospital, Finland (EVO/VTR grant); and Academy of Finland (grant 287490). These funders had no role in the design and conduct of the study; in the collection, analysis, and interpretation of data; in the preparation of the manuscript; or in the review or approval of the manuscript. This post-hoc analysis was funded by Danone Nutricia Research and performed by Pentara Corporation. The corresponding author had final responsibility for the decision to submit for publication.

Acknowledgments: We thank all participants enrolled in the study and their families; all members of the LipiDiDiet clinical study group; all investigators and on-site study staff for their efforts in the conduct of the field work.

Conflict of interest: SBH and NNE report financial compensation for statistical analysis from Danone Nutricia Research. HS reports personal fees from ACImmune and MERCK, outside the submitted work. AMJH and AA are employees of Danone Nutricia Research. PJV reports grants from Innovative Medicine Initiative and ZonMw, during the conduct of the study, and non-financial support from GE Healthcare and grants from Biogen, outside the submitted work. AS reports grants from Academy of Finland, during the conduct of the study, and grants from Alzheimerfonden Sweden and Stockholm County Council (ALF), outside the submitted work. MK reports grants from EU Joint Program – Neurodegenerative Disease Research (MIND-AD), during the conduct of the study, and grants from Alzheimerfonden Sweden, Stockholm County Council (ALF), Academy of Finland, Swedish Research Council, Knut and Alice Wallenberg Foundation, Center for Innovative Medicine at Karolinska Institutet, Sweden, and Stiftelsen Stockholms Sjukhem, Sweden, outside the submitted work. TH reports grants from EU FP7 (LipiDiDiet), EU Joint Program – Neurodegenerative Disease Research (MIND-AD), and Danone Nutricia Research (LipiDiDiet Extension), during the conduct of the study. KB has nothing to disclose.

Ethical standards: The study was approved by ethics committees of all sites and done in accordance with the Declaration of Helsinki and International Conference on Harmonization Good Clinical Practice guidelines.

 

References

1. Dubois B, Feldman HH, Jacova C, et al. Research criteria for the diagnosis of Alzheimer’s disease: revising the NINCDS-ADRDA criteria. Lancet Neurol. 2007;6:734-746.
2. Dubois B, Feldman HH, Jacova C, et al. Advancing research diagnostic criteria for Alzheimer’s disease: the IWG-2 criteria. Lancet Neurol. 2014;13:614-629.
3. Dubois B, Hampel H, Feldman HH, et al. Preclinical Alzheimer’s disease: Definition, natural history, and diagnostic criteria. Alzheimers Dement. 2016;12:292-323.
4. Soininen H, Solomon A, Visser PJ, et al. 24-month intervention with a specific multinutrient in people with prodromal Alzheimer’s disease (LipiDiDiet): a randomised, double-blind, controlled trial. Lancet Neurol. 2017;16:965-975.
5. Hamel R, Kohler S, Sistermans N, et al. The trajectory of cognitive decline in the pre-dementia phase in memory clinic visitors: findings from the 4C-MCI study. Psychol Med. 2015;45:1509-1519.
6. Ellis KA, Szoeke C, Bush AI, et al. Rates of diagnostic transition and cognitive change at 18-month follow-up among 1,112 participants in the Australian Imaging, Biomarkers and Lifestyle Flagship Study of Ageing (AIBL). Int Psychogeriatr. 2014;26:543-554.
7. U.S. Department of Health and Human Services Food and Drug Administration. Early Alzheimer’s Disease: Developing Drugs for Treatment Guidance for Industry (Draft Guidance) 2018. Available from: https://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM596728.pdf.
8. Wang J, Logovinsky V, Hendrix SB, et al. ADCOMS: a composite clinical outcome for prodromal Alzheimer’s disease trials. J Neurol Neurosurg Psychiatry. 2016;87:993-999.
9. Swanson CJ, Zhang Y, Dhadda S, et al. Treatment of early AD subjects with BAN2401, an anti-Aβ protofibril monoclonal antibody, significantly clears amyloid plaque and reduces clinical decline. Alzheimer’s Association International Conference; 20–26 July 2018; Chicago, USA. Abstract ID: 27531.
10. Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Lawrence Erlbaum Associates; 1988.
11. Vellas B, Bateman R, Blennow K, et al. Endpoints for pre-dementia AD trials: A report from the EU/US/CTAD Task Force. J Prev Alzheimers Dis. 2015;2:128-135.

METHODOLOGICAL ASPECTS OF THE PHASE II STUDY AFF006 EVALUATING AMYLOID-BETA-TARGETING VACCINE AFFITOPE® AD02 IN EARLY ALZHEIMER’S DISEASE – PROSPECTIVE USE OF NOVEL COMPOSITE SCALES

 

S. Hendrix1, N. Ellison1, S. Stanworth1, L. Tierney2, F. Mattner2, W. Schmidt2, B. Dubois3, A. Schneeberger2

 

1. Pentara Corporation, Salt Lake City, UT, USA; 2. AFFiRiS AG, Vienna, Austria; 3. Cognitive and behavioral disease center and Alzheimer’s institute, Research INSERM Unit “Cognition, Neuroimaging, and brain diseases”, Salpêtrière University Hospital, Paris, France

Corresponding Author: Suzanne Hendrix, Pentara Corporation, Salt Lake City, UT, USA, Telephone: ++1-801-898-724, Fax: ++1-801-486-7467, shendrix@pentaracorp.com

J Prev Alz Dis 2015;2(2):91-102
Published online May 13, 2015, http://dx.doi.org/10.14283/jpad.2015.67

 


Abstract

BACKGROUND: Optimized scales and composite outcomes have been proposed as a way to more accurately measure Alzheimer’s disease related decline. AFFITOPE® AD02 is an amyloid-beta (Aβ)-targeting vaccine designed to elicit anti-Aβ antibodies. IMM-AD04, commonly known as Alum and originally designated as a control agent, appeared to have disease-modifying activity in a multicenter, parallel-group phase II study in early AD patients.

OBJECTIVES: To develop adapted outcomes for cognition, function and a composite scale with improved sensitivity to decline and treatment effects in early AD (mild plus prodromal AD) based on historical data and to assess these adapted outcomes in this phase II study.

DESIGN: Data from public datasets were analyzed using a partial least squares model to identify an optimally weighted cognitive outcome, Adapted ADAS-cog, and an optimally weighted ADL outcome, Adapted ADCS-ADL, which were prospectively defined as co-primary endpoints for the study and were also combined into a composite scale. Data from 162 patients in the placebo groups of ADCS studies and 156 mild patients in the ADNI I study were pooled for this analysis. The Adapted ADAS-cog scale considered 13 ADAS-cog items as well as several Neuropsychological test items and CogState items; the Adapted ADCS-ADL considered all ADCS-ADL items. After the pre-specified analyses were complete, additional adapted and composite scales were investigated in a post-hoc manner. Evaluation of the adapted and composite scales was performed on Phase II trial data for AFFITOPE® AD02 (AFF006, Clinical Trial Identifier: NCT01117818) and historic data in early AD. Least squares means, standard deviations, and least squares mean to standard deviation ratios were compared among adapted and composite scales and traditional scales for the 5 treatment groups in the phase II study and overall for the historic data. Treatment effect sizes and p-values were also compared for the phase II study.

RESULTS:  Cognitive items that were selected for the adapted cognitive scale (aADAS-cog) and had the highest weights were Word Recall, Word Recognition, and Orientation. Delayed Word Recall and Digit Cancellation were among the items excluded due to lack of improved sensitivity to decline. Highly weighted ADL items included in the adapted functional scale (aADCS-ADL) were using the telephone, traveling, preparing a meal/snack, selecting clothing, shopping and using appliances.  Excluded items were primarily basic ADLs such as eating, walking, toileting and bathing. Comparisons between traditional scales and primary outcome adapted scales show improved sensitivity to group differences with the adapted scales in the phase II trial. Most of the improvement in the sensitivity of the aADAS-cog and the aADCS-ADL is due to a larger treatment difference observed rather than the improved sensitivity to decline in the comparison groups.

CONCLUSION:  To our knowledge, this is the first study to prospectively use optimized scales as primary endpoints and to demonstrate the superior power of optimized scales and composites in early disease. Although it is possible that the treatment difference between randomized groups is due to a factor other than the treatment itself, for instance baseline imbalance, the improved power to detect these differences still argues in favor of the adapted scales. The issue of oversensitivity to detect treatment effects is controlled by selection of the alpha level for significance, and in our case will happen less than 5% of the time. Clinical relevance of the treatment difference should be assessed separately from statistical significance, and in this phase II study, is supported by significant or similar sizes of effect on function, behaviour and quality of life outcomes, which are important to patients and caregivers.

Key words: Aluminum, statistical methods, composite, prodromal, Alzheimer’s disease.


 

Introduction 

Recent years have witnessed numerous failures in the development of Alzheimer’s disease (AD) therapeutics (1, 2). Reasons include the inability to intervene early enough in the disease process as a result of the low specificity of the 1984 diagnostic criteria (3), the complex and thus far not entirely understood pathophysiology of the disease (i.e. Aβ, the focus of many failed trials (4, 5), might be the wrong target) and a lack of validated biomarkers, among others. Progress in the diagnosis of AD, along with a better understanding of its natural course, now allows for earlier interventions. Therapeutic interventions are therefore shifting to subjects in earlier disease stages, creating a need for scales that are more sensitive to change in early disease stages such as prodromal AD.

Currently, there is no single scale that can measure AD related decline at early stages. There are over 60 scales relevant for AD covering cognitive impairment, activities of daily living, behavior, and quality of life (2). Common tests, such as the ADAS-cog, were designed for patients with moderate AD. By definition, patients with pre-dementia AD have only subtle cognitive and functional deficits. Consequently, patients in the early stage of the disease perform near the ceiling of traditional scales (6, 7). Multiple studies have found that the ADAS-cog lacks adequate responsiveness in mild cognitive impairment (MCI) (8-11), highlighting the gap in our ability to discern disease progression and potential treatment effects at the earliest stages of the disease. At the same time, there is a growing level of understanding of early AD changes. For example, episodic memory and timed executive functioning are two of the most responsive early cognitive domains that change in the transition from healthy aging to the pre-dementia AD and MCI stage (12, 13), as well as language, word finding and orientation difficulties (14-16).

Several studies have proposed alternative composite scores, which combine multiple cognitive and functional items into a total score to improve scale sensitivity and to potentially allow for shorter or smaller studies (15, 17-19). Both the FDA and the EMA have additionally suggested that they would not exclude the use of a validated composite scale for the MCI disease stage as sufficient proof for market approval (20, 21).

Here, we develop adapted cognitive and functional scores as well as a combined composite score in order to measure AD related decline in a way that improves sensitivity to decline for patients with mild and prodromal AD. We analyzed pooled Alzheimer’s Disease Neuroimaging Initiative (ADNI) mild and Alzheimer’s Disease Cooperative Study (ADCS) study placebo data to identify optimal items for measuring disease progression in this early AD population, resulting in the creation of an adapted cognitive scale and an adapted functional scale that were then combined into a composite scale. We used a phase II clinical study in mild and prodromal AD patients to test the adapted and composite scales. This clinical study investigated AFFITOPE® AD02, an Aβ-targeting vaccine eliciting anti-Aβ antibodies. IMM-AD04, commonly known as Alum, which was originally designated as the control agent, appeared to have disease-modifying activity. Additionally, we used the phase II study data to investigate other questions about composites in a post-hoc manner.

Methods

An optimized scale for measuring cognitive decline, adapted ADAS-cog, and an optimized scale for measuring functional decline, adapted ADCS-ADL, were both developed using a partial least squares (PLS) regression model applied to historical data from the ADCS and ADNI I. A composite outcome was also created by summing the two adapted scales to measure overall decline. These adapted and composite scales were designated as primary outcomes in the AFF006 phase II study, providing prospective validation data.

Adapted ADAS-cog (aADAS-cog)

The Alzheimer Disease Assessment Scale-cognitive subscale (ADAS-cog) (22, 23) is a test battery that assesses performance on eleven cognitive tasks or items: orientation, three trials of a 10-word list learning task, three trials of a 12 word recognition task, recall of instructions, comprehension of commands, object and finger naming, word finding difficulty, expressive language, language comprehension, ideational praxis, and constructional praxis. The ADAS-cog scale extends from 0 to 70, with higher scores indicating greater cognitive impairment. Two additional items are included in the 13-item version of the ADAS-cog: Delayed Word Recall and Digit Cancellation.

The ADAS-cog was used as the basis for the adapted ADAS-cog (aADAS-cog) scale. The following items were considered for inclusion in the aADAS-cog scale: 1) All ADAS-cog13 items; 2) CogState (New Haven, CT, USA): Continuous paired and associate learning (CPAL), Identification Task (IDN), One Back Memory Task (ONB), Detection Task (DET), Go/No Go Task (GONG); 3) Verbal PAL Immediate/Delayed (Neuropsychological Test Battery, NTB); 4) NTB Category Fluency; and 5) NTB Digit Span forward and backward. All 3 historical studies measured the ADAS-cog13. The items from the ADAS-cog13 were included in the PLS model described below, and the additional items were then considered for inclusion based on individual sensitivity to decline from literature references as described below.

ADCS data are publicly available at http://www.adcs.org/. The ADCS cohort included the placebo group from the 3-arm NSAID study (24) and the placebo group from the Homocysteine study (25). The analysis included data from 162 patients in the pooled placebo group who had cognitive data at 18 months.

Additional data used in this analysis were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (http://www.loni.ucla.edu/ADNI/). The ADNI cohort included multiple diagnostic groups: patients with AD, subjects with MCI, and healthy elderly (cognitively normal) participants. We used the June 4, 2013 sample including data from 156 mild patients who had cognitive data at 18 months.

The following two methods were used to create the aADAS-cog adapted scale: 1) a PLS model using the Alzheimer’s Disease Cooperative Study (ADCS) nonsteroidal anti-inflammatory drug (NSAID) trial, the ADCS Homocysteine trial and the Alzheimer’s Disease Neuroimaging Initiative (ADNI) datasets, and 2) individual item sensitivity to decline as reported in the literature.

Adapted ADCS-ADL (aADCS-ADL)

The Alzheimer’s Disease Cooperative Study – Activities of Daily Living (ADCS-ADL) scale (26) is an inventory of informant-based items to assess activities of daily living and instrumental activities of daily living. Informants were asked whether patients attempt each of 23 items in the inventory and, if so, to comment on their levels of performance. Each criterion is graded on the level of dependence: patient performs independently (3 points), patient performs with assistance (1-2 points), or patient is unable to perform (0 points). The level of ADL-dependence is graded via the sum of these item scores with a total score of 0 indicating complete dependence and a total score of 78 indicating total independence.

The adapted ADCS-ADL (aADCS-ADL) included the complete set of items from ADCS-ADL in the PLS model, where items were removed or reweighted to improve sensitivity to decline; items from other ADL scales, such as the DAD which was measured only in the ADNI study, were not considered in this adapted scale, due to minimal availability of data.

The PLS model was applied to the pooled data from the ADCS NSAID study and ADCS Homocysteine study; however, the ADNI data was not used since the ADCS-ADL was not collected in that study.

Composite  

The aADAS-cog and aADCS-ADL were combined to create one prospectively defined global composite referred to as “Composite.” A simple sum of the two adapted scores, using the weights obtained prior to scaling them to 100 points, was calculated for the Composite. This score was then scaled to a 100 point range. This approach equates the point values of the cognitive and functional components of the scale rather than weighting them equally.

Partial Least Squares Regression Method  

The PLS model referred to above was fit using Proc PLS in SAS® v9.4 in order to identify the combination of either cognitive or functional items that correlated best with decline over time. The initial PLS models included change scores for all ADAS-cog13 or ADCS-ADL items as predictors and time since baseline as the response variable. The Variable Importance for Projection (VIP) statistic was calculated for each predictor in the model (27, 28). The VIP summarizes the contribution a variable makes to the model; a variable with a small VIP is therefore a prime candidate for deletion. The variable with the lowest VIP was dropped and the PLS model was run again (backward selection), continuing until all remaining variables had a VIP of 0.5 or greater (27, 28). The final weighted item combination was designated as the optimized combination of items.
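The backward selection on VIP described above can be illustrated with a minimal Python sketch. It is not the SAS Proc PLS procedure used in the study: it assumes a single latent component (for which the VIP of item j reduces to sqrt(p) times the absolute value of its normalized PLS weight), and the item names, data values and helper functions are hypothetical.

```python
import math

def standardize(column):
    """Center a column and scale it to unit length (constant columns become zeros)."""
    mean = sum(column) / len(column)
    centered = [v - mean for v in column]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered] if norm > 0 else centered

def vip_one_component(items, time):
    """VIP per item for a one-component PLS model with time since baseline as response."""
    y = standardize(time)
    # With one component, the PLS weight vector is proportional to X'y.
    cov = {name: sum(x * t for x, t in zip(standardize(col), y))
           for name, col in items.items()}
    norm = math.sqrt(sum(c * c for c in cov.values()))
    if norm == 0:
        return {name: 0.0 for name in items}
    # Single-component case: VIP_j = sqrt(p) * |w_j| for a unit-length weight vector w.
    return {name: math.sqrt(len(items)) * abs(c) / norm for name, c in cov.items()}

def backward_select(items, time, threshold=0.5):
    """Drop the lowest-VIP item and refit until every remaining VIP >= threshold."""
    kept = dict(items)
    while len(kept) > 1:
        vip = vip_one_component(kept, time)
        worst = min(vip, key=vip.get)
        if vip[worst] >= threshold:
            break
        del kept[worst]
    return list(kept)

# Hypothetical change scores for three items at 6, 12 and 18 months in two patients.
months = [6, 12, 18, 6, 12, 18]
changes = {
    "recall": [1, 2, 3, 1, 2, 3],       # tracks time closely
    "orientation": [0, 1, 1, 0, 1, 2],  # tracks time moderately
    "naming": [1, 0, 1, 1, 0, 1],       # unrelated to time
}
selected = backward_select(changes, months)
```

In this toy data set the item that does not change with time receives a VIP near zero and is removed, while the two items that track time are retained. A multi-component fit, as Proc PLS may select, would weight each component by the variance it explains.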

Test clinical data  

Adapted and composite scales were developed for use in a randomized, placebo-controlled, parallel group, double-blind, multicenter phase II trial to assess the clinical and immunological activity as well as the safety and tolerability of repeated s.c. administrations of AFFITOPE® AD02 (AFF006, Clinical Trial Identifier: NCT01117818). Complete details on the study design, patient population, and vaccination application can be found in Schneeberger et al (29).

Comparison of adapted to traditional rating scales

Comparisons of adapted and composite scales to traditional scales for the same domains were used to assess the performance of the novel scales in the phase II study. The aADAS-cog was compared to ADAS-cog11 for assessment of cognition, aADCS-ADL was compared to ADCS-ADL for assessment of function, and the Composite was compared to CDR-sb for a global measure of disease progression.

Any comparison of absolute means and standard deviations, or of means and standard deviations of change scores, can be misleading even if they are standardized to the total range of the scale, since the range is somewhat arbitrary and only part of the range is actually relevant to this stage of disease. For these reasons, we compare only the mean to standard deviation ratios (MSDRs) within each of the control groups to evaluate the ability of the scales to measure decline sensitively. The standardized treatment effects are also compared to test the assumption of proportionality of treatment effects, and to see whether scales that are more sensitive to decline are also more sensitive to treatment effects, which would result in non-proportionality of treatment effects.

LSMean to Standard Deviation Ratios (LSMSDRs, referred to below simply as MSDRs) for the 5 treatment groups from this phase II study were used to compare the sensitivity to decline of the scales. The expectation was that the MSDRs for the control group(s) or ineffective groups would be larger for the adapted scales than for the traditional scales, resulting in more power to see treatment effects based on the assumption of a proportional treatment effect. Additionally, the standardized treatment difference was compared between novel and traditional scales, using the decline in the comparison group(s) as the reference standard. A proportional treatment effect would be indicated by a similar percent slowing of progression in the active group versus the comparison group(s) on both types of scale.
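The two summary statistics used in these comparisons can be sketched in a few lines of Python. This is an illustration only: it uses unadjusted sample means rather than the least-squares means of the study, and the function names and change scores are hypothetical.

```python
from statistics import mean, stdev

def msdr(changes):
    """Mean to standard deviation ratio of change-from-baseline scores."""
    return mean(changes) / stdev(changes)

def percent_slowing(active_mean_change, control_mean_change):
    """Percent reduction in decline in the active group relative to the comparison group."""
    return 100.0 * (1.0 - active_mean_change / control_mean_change)

# Hypothetical 18-month change scores (higher = more decline).
control_changes = [2.0, 4.0, 6.0]
active_changes = [1.0, 2.5, 4.0]

control_msdr = msdr(control_changes)
slowing = percent_slowing(mean(active_changes), mean(control_changes))
```

A larger MSDR in the comparison group means the decline stands out more clearly from between-patient variability, and under a proportional treatment effect the percent slowing would be about the same whichever scale is used.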

Post-Hoc analysis and resulting scales

After analyzing the phase II study, there was a concern that the Composite was potentially overweighting cognition at the expense of function. In addition, the CogState and NTB items had not been assessed in a historical dataset, leading to the following questions, which were addressed post-hoc:

1- What effect did the CogState and NTB items have on the Composite and aADAS-cog?

2- What was the impact of functional (aADCS-ADL) versus cognitive (aADAS-cog) weighting on the Composite?

Additional composite and adapted scales were derived to answer these questions as described in the following sections.

Adapted ADAS-cog 2

 The Adapted ADAS-cog 2 (aADAS-cog 2) is a variation of the pre-specified aADAS-cog that excludes CogState and NTB items and rescales the remaining items so that the range of the score is 0-100. This scale was calculated to investigate the effect that CogState and NTB items had on the pre-specified aADAS-cog.

Composite 2 

Composite 2 is a variation of Composite that excludes CogState and NTB items while keeping the same individual item weights. This composite was created to investigate the effect of CogState and NTB items on the Composite.

Balanced Composite

The Balanced Composite is the sum of the aADAS-cog and aADCS-ADL after each is scaled to 50 points resulting in equal weight for cognition and function.

Balanced Composite 2

Balanced Composite 2 is the sum of aADAS-cog 2 and aADCS-ADL after each is scaled to 50 points resulting in equal weight for cognition and function.

Empirical Composite

A PLS model was fit to determine the optimal weighting of cognition (aADAS-cog 2) and function (aADCS-ADL).  The aADAS-cog 2 and aADCS-ADL were included in the PLS model and the derived weights were used to create the Empirical Composite.  The CogState and NTB items were excluded since these items were not available in the historical data used to derive the weighting. This analysis resulted in a 69% weighting of cognition and a 31% weighting of function.
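The balanced and empirical weightings can be expressed as a single weighted sum of the two subscale scores, as in the Python sketch below. It assumes that both subscales are already on a 0 to 100 range and oriented in the same direction; the function name and the example scores are hypothetical.

```python
def weighted_composite(cognition, function, cognition_share):
    """Combine two 0-100 subscale scores using a given share of weight on cognition."""
    if not 0.0 <= cognition_share <= 1.0:
        raise ValueError("cognition_share must lie between 0 and 1")
    return cognition_share * cognition + (1.0 - cognition_share) * function

# Hypothetical subscale scores for one patient visit.
cog_score, func_score = 40.0, 20.0

balanced = weighted_composite(cog_score, func_score, 0.50)   # 50 points each
empirical = weighted_composite(cog_score, func_score, 0.69)  # 69% / 31% from the PLS fit
```

Scaling each subscale to 50 points and summing is the same as giving each a share of 0.5, so the balanced and empirical composites differ only in the share assigned to cognition.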

Comparisons of Scales

The additional scales of aADAS-cog 2 and Composite 2 were calculated to determine what effect the CogState and NTB items had on the pre-specified aADAS-cog and Composite. All other post-hoc composites were derived to investigate how different weights on cognition and function affect the performance of a composite score. These post-hoc scales were compared to the pre-specified aADAS-cog, aADCS-ADL, Composite and traditional scales using the same methods and statistics as above applied to the pooled historical datasets as well as the 5 groups from the phase II study.

Results 

Weighted item combination for adapted ADAS-cog

The final PLS model identified six weighted items, out of the 13 possible cognitive items, that efficiently measure decline based on the VIP criterion. It was reached after seven iterations, each of which removed one item from the adapted scale and optimally reweighted the remaining items.

Based on this final model, it was determined that the best combination included the following ADAS-cog items: Word Recall, Orientation, Word Recognition, Recall Instructions, Spoken Language and Word Finding with weights shown in Table 1.   

CogState items and Verbal Paired Associates Learning Immediate and Delayed Recall (VPAL) from NTB were not available in historic datasets, although they were available in the phase II study. Based on literature references, some of the items were determined to be sensitive to change and therefore likely to improve the sensitivity of the ADAS-cog combination (30, 31). However, since there was no way to determine the weights from historic datasets, the weight for these items was an average of the weights selected for other items, scaled based on the range of the new item.  Weights of each item were then scaled, such that the range of the new adapted scale would be 0 to 100.

The aADAS-cog score is calculated by summing each item in the composite after the item has been multiplied by its associated weight. The weights for all ADAS-cog items (Table 1) resulted from the PLS iterations, and the other item weights were derived so as to give average weight to these items:

Adapted ADAS-cog (aADAS-cog) = 2.02 * Word Recall + 1.65 * Orientation + 1.74 * Word Recognition + 0.68 * Recall Instructions + 0.99 * Spoken Language + 1.24 * Word Finding + 6.62 * ONB + 0.19 * VPAL + 0.24 * Category Fluency.
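As an illustration, the weighted sum can be written as a short Python function using the weights listed in the formula. The function name and the example visit are hypothetical, and the sketch assumes each item score is supplied on the raw scale expected by its weight (the raw ranges of the CogState and NTB items are not given here).

```python
# Item weights as listed in the aADAS-cog formula.
AADAS_COG_WEIGHTS = {
    "Word Recall": 2.02,
    "Orientation": 1.65,
    "Word Recognition": 1.74,
    "Recall Instructions": 0.68,
    "Spoken Language": 0.99,
    "Word Finding": 1.24,
    "ONB": 6.62,
    "VPAL": 0.19,
    "Category Fluency": 0.24,
}

def adapted_adas_cog(item_scores):
    """Sum of item scores, each multiplied by its aADAS-cog weight."""
    missing = [name for name in AADAS_COG_WEIGHTS if name not in item_scores]
    if missing:
        raise ValueError("missing item scores: " + ", ".join(missing))
    return sum(weight * item_scores[name]
               for name, weight in AADAS_COG_WEIGHTS.items())

# Hypothetical visit in which every item has a raw score of 1.
example_visit = {name: 1.0 for name in AADAS_COG_WEIGHTS}
example_total = adapted_adas_cog(example_visit)
```

With every item set to 1 the total equals the sum of the weights, which is a quick way to check that all nine terms are included.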

Weighted Item results from PLS for Adapted ADCS-ADL scale 

The final model selected and assigned weights to 15 items from the 23 possible item options: 1) find his/her personal belongings; 2) go shopping; 3) perform hobbies/pastimes; 4) obtain a hot/cold beverage for him/herself; 5) make him/herself a meal or snack; 6) talk about current events; 7) watch television; 8) keep appointments; 9) get around (or travel) outside of his/her home; 10) be left alone; 11) use a household appliance; 12) select his/her clothes for the day/dressing; 13) read a magazine, newspaper or book; 14) use a telephone; and 15) write things down. The following items were excluded from the composite: 1) eating; 2) walking; 3) toileting; 4) cleaning/clearing dishes; 5) garbage/litter; 6) bathing; 7) grooming; and 8) conversation.

Weights of each item were scaled, such that the range of the composite is 0 to 100 (Table 1).  The items that were included in the composite were used to calculate the adapted ADCS-ADL by summing the items after individual weights have been applied:

aADCS-ADL = 1.54 * Belongings + 1.95 * Shopping + 1.24 * Hobbies + 2.10 * Beverage + 2.02 * Meal + 1.27 * Current events + 1.44 * TV + 1.83 * Keeping Appointments + 2.05 * Travel + 1.82 * Alone + 1.91 * Appliance + 2.72 * Clothes + 1.97 * Read + 3.39 * Telephone + 1.83 * Writing.

Composite scale as a global primary study outcome 

The composite primary outcome combines both the aADAS-cog and aADCS-ADL to create an outcome that is sensitive to decline in cognition and function.  The weights of the items were rescaled so that Composite ranged from 0 to 100 (Table 1).  The calculation for the Composite is as follows:

Composite = 1.66 * Word Recall + 1.35 * Orientation + 1.42 * Word Recognition + 0.55 * Recall Instructions + 0.81 * Spoken Language + 1.01 * Word Finding + 5.42 * ONB + 0.15 * VPAL + 0.19 * Category Fluency + 0.28 * Belongings + 0.35 * Shopping + 0.23 * Hobbies + 0.38 * Beverage + 0.37 * Meal + 0.23 * Current Events + 0.26 * TV + 0.33 * Keeping Appointments + 0.37 * Travel + 0.33 * Alone + 0.35 * Appliance + 0.49 * Clothes + 0.36 * Read + 0.62 * Telephone + 0.33 * Writing.

The percent contribution for each item as well as for cognitive and functional items combined is shown in Table 1. The composite score was weighted higher for cognition than for function, due to the points on the cognitive scale reflecting smaller changes in the course of the disease than the points on the functional scale. This was also based on the stage of disease which was expected to have a larger decline in cognition than in function.

Table 1. Item contribution to the adapted and composite scales. Max: maximum

 

Adapted scales showed minimal or no improvement in the control group MSDRs 

Since 4 of the treatment groups in the phase II test study performed similarly and one (2mg IMM-AD04) showed a decrease in the decline rate compared to the other 4, the 4 groups were then treated as “control” groups. This was supported by comparing the decline rates for the traditional scales in the historical pooled placebo mild data (ADCS placebo data from 2 studies pooled with ADNI mild data) to the decline rates in the 4 ineffective treatment groups (Figure 1) and noting that the historical groups declined faster than these 4 groups. The IMM-AD04 2mg group was assumed to have a positive treatment effect, and the effect sizes in the IMM-AD04 2mg group were calculated relative to the 4 “control” groups. The expectation is that a treatment difference would be more easily detected with the adapted scales relative to the traditional scales, primarily due to a larger MSDR in the “control” groups.

Figure 1. MSDR for Traditional and Composite Scales for AFF006 and Historic Data

MSDR change from baseline at 18 months. Composite 2 is Composite without CogState and NTB items. *Estimates for Pooled Historic are biased due to using same data set to derive and assess this scale.

MSDRs were similar between aADAS-cog and ADAS-cog11, and also between aADCS-ADL and ADCS-ADL, indicating minimal, if any, improvement in precision of measurement of decline, or possibly an improvement in precision of measurement that was counteracted by a milder patient population in the AFF006 study compared to the historical pooled mild patient population. This is supported by the observation that the MSDRs for the traditional scales were larger in the historical group compared to the AFF006 study. Alternatively, the reduced decline rate could be due to a small treatment effect in the “control” groups. The MSDR for the CDR-sb is consistently larger than for the composite score, indicating good precision in measurement of decline over time for the CDR-sb. This is consistent with historical studies that have shown that CDR-sb measures decline consistently and often more sensitively than other scales, even in a pre-dementia stage of disease.

Observed treatment differences were not proportional for adapted scales compared to traditional scales

Treatment effect sizes in the 2mg IMM-AD04 group compared to the “control” groups, as measured by the percentage slowing of decline, were larger for the adapted scales, with effects of 36% to 53% for aADAS-cog compared to 32% to 51% for ADAS-cog11; 36% to 44% for aADCS-ADL compared to 12% to 37% for ADCS-ADL; and 43% to 56% for the Composite compared to 19% to 37% for the CDR-sb (Table 2). The difference was especially large when comparing the Composite to the CDR-sb, indicating that the CDR-sb has minimal sensitivity to group differences, some of which may be due to treatment effects.

Table 2. MSDRs, Effect sizes and p-values at 18 Months

*Uses original weights from PLS model – excluding NTB and CogState items; 1. For functional and global scales ADNI was excluded from the Historical Pooled groups.

The 2mg IMM-AD04 group demonstrated substantially smaller MSDRs (0.2263) than the “control” groups for the aADAS-cog (1 mg IMM-AD04: 0.6843, 25µg 1mg: 0.936, 25µg 2mg: 0.523, and 75µg 2mg: 0.5595), the aADCS-ADL, Composite, the ADAS-cog11 and the ADCS-ADL (Table 2), consistent with a treatment effect or an unusually slowly declining group. The CDR-sb was the only scale that showed similar MSDR in the 2mg IMM-AD04 group (0.6452) and in the 4 comparison groups (1 mg IMM-AD04: 0.8051,  25µg 1mg:  0.6984, 25µg 2mg: 0.7364, and 75µg 2mg: 0.5283).

Adapted scales sensitivity in mildest AD versus worse mild AD

Adapted scales were tested for their sensitivity within different disease stages by assessing patients at “less mild” (MMSE<23) and “mildest” (MMSE 23+) stages from the phase II clinical study (Table 3).

Table 3. Statistics for Mildest (MMSE 23+) vs. Less Mild (MMSE <23)

The adapted cognitive scale showed similar sensitivity within the “control” groups as measured by MSDR to the traditional scales within the mildest AD group and also within the less mild group, with the exception that the CDR-sb showed more sensitivity to decline within the less mild group for all 4 “control” groups. Cognitive scales, both adapted and traditional, showed similar decline rates in the mildest and less mild patient populations for all 4 “control” groups, but ADL scales were generally more sensitive to decline within the less mild group compared to the mildest group. The composite scale performed similarly in the mildest and less mild populations, due to its cognitive emphasis, but the CDR-sb declined more in the less mild group, similar to the ADL scales.

Treatment effects for the IMM-AD04 2mg group in the mildest subjects were strong for both adapted and traditional cognitive and functional scales, but the Composite had a much larger treatment effect than the CDR-sb. In the less mild subjects, no cognitive effects were seen with either the aADAS-cog or ADAS-cog11, but the aADCS-ADL and the Composite had much larger treatment effects than the ADCS-ADL and the CDR-sb.

Post-Hoc Results

Additional scales of aADAS-cog 2 and Composite 2 were calculated to determine what effect the CogState and NTB items had on the pre-specified aADAS-cog and Composite. All other post-hoc composites, Balanced Composite, Balanced Composite 2 and Empirical Composite, were derived to investigate how different weights on cognition and function affect the performance of a composite score.

Effect of CogState and NTB Items 

Results were compared for the aADAS-cog versus the aADAS-cog 2 (which excluded CogState and NTB items). Effect sizes from the aADAS-cog ranged from 36 to 53%, with p-values between 0.022 and 0.254. Effect sizes from the aADAS-cog 2 ranged from 101 to 102%, with p-values between 0.013 and 0.134.

Similar results are seen when comparing the results from the Composite and Composite 2 (which excluded CogState and NTB items).  Effect sizes from the Composite ranged from 43 to 56%, with p-values between 0.032 and 0.227.  Effect sizes from Composite 2 ranged from 50 to 63%, with p-values between 0.005 and 0.108.

It appears that the presence of CogState and NTB items reduces the sensitivity of the adapted and composite scales to group differences, although it improves (increases) the MSDR within the “control” groups.

Impact of functional versus cognitive weighting on the composite outcome 

All post-hoc adapted and composite scales were assessed in the phase II data set and in the historic data (Table 4). Sensitivity to decline and to treatment effects was better for the Empirical Composite compared to Composite and Composite 2. The Balanced Composite and Balanced Composite 2 were not as sensitive to group differences as Composite and Composite 2, even though the MSDRs were higher for the balanced composites, suggesting better sensitivity to decline.

Table 4. Composites with Alternate Weighting of Cognition and Function

*Unadjusted Mean is used instead of LSMean; 1. For functional and global scales ADNI was excluded. 

Discussion 

We developed adapted cognitive (aADAS-cog) and functional (aADCS-ADL) scales, as well as a composite score (Composite) combining cognition and function, with the goal of establishing scales that are superior to existing ones in measuring decline and treatment effects in patients with early AD. The optimized scales showed only minimal improvement in the MSDR over the traditional scales for each domain, but a stronger signal separating the active treatment group from the “control” groups. The improvement in the MSDR was not as large as anticipated, partly because of bias: the anticipated improvement was based on deriving and testing the scales in the same data set. It may also be due to selection of a milder patient population for this study than the pooled mild population that was used for development of the adapted scales. Another possibility is that some of the treatment groups in this study that were assumed to have no effect may actually be demonstrating a slowing of clinical decline.

The increase in treatment effect that was seen was contrary to the usual assumption of proportionality in the treatment effect that is the basis of most sample size calculations. One possible explanation for this increased effect is the possibility that a treatment that only affects AD related decline would be better able to demonstrate a treatment effect on an outcome that is targeted to AD specific decline. The smaller effect seen on the CDR-sb could be due to the CDR-sb measuring non-disease related decline such as normal aging complaints. An AD specific treatment effect would not be expected to slow these types of decline.

Both function and cognition were seen to change, with more cognitive change in the mildest patients, and almost no cognitive change in the less mild patients. More functional change was seen in the less mild patients but was also evident in the mildest half of the patients. The Composite combined cognition and function, but weighted the two unequally, based on assigning cognitive and functional points the same weight. Additional weighting was performed based on the empirically estimated weights of cognition (69%) and function (31%) as well as “balanced” weighting with 50% weight on cognition and 50% weight on function. The disease progression as measured with the Composite in the mild population investigated in this phase II clinical study may be influenced by an overweighting of cognition; however, this did not result in more sensitivity to detect treatment group differences. Changes in function were seen, and weighting function equally with cognition resulted in similar, and somewhat stronger, detection of treatment effects, consistent with an AD-specific effect rather than a cognition-specific effect. The Empirical Composite detected treatment differences with the most sensitivity of any of the composites, presumably because it weights cognition and function according to the natural weightings of these domains in this stage of disease.

To our knowledge, this is the first study to prospectively use optimized composites as primary endpoints and to demonstrate the superior power of optimized composites in early disease. It was interesting to note that inclusion of the CogState and NTB items in the aADAS-cog scale actually substantially decreased its power to detect group differences, supporting a strictly empirical approach over one based on combining empirical results with literature or expert opinion.

The PLS method employed in this work improves power of the outcome by eliminating items that do not decline over time and optimally weighting declining items. It also incorporates principal component methodology to account for item correlation. Related methodologies to produce composite scores have been successfully applied, including the ADAS Tree (32) and ADAS-cog revisited (33).

Historically, many developers of composite scales have assumed proportionality in the treatment effect with the use of more sensitive scales to measure AD, primarily to support statements about increases in power or decreases in sample sizes that could be expected with a more sensitive endpoint. This assumption would imply that the effect sizes would be similar across traditional and adapted scales, but that the p-values for treatment differences would be more significant for the adapted scales due to the increased sensitivity (increased MSDR of the control group) of the adapted scales relative to the traditional scales. But this is not what is seen in this phase II study.

Based on the MSDRs of the “control” groups in this study, the adapted scores did not perform substantially better than the traditional scales, and in some cases performed worse, which was not too surprising due to the lack of correction for bias in the results of the historical data analysis. However, treatment effects measured were larger for adapted scales vs. traditional scales. This indicates that the impact of an optimized composite on the power of a study may depend more on the ability of the optimized scale to detect treatment effects than on the ability of the scale to measure disease related decline in the control or comparison group. The much larger group differences for the adapted scales and the composite compared to the traditional scales may be due to measuring more disease specific decline, calling into question the common assumption of proportional treatment effects.

These findings are consistent with the theories that led to the development of these adapted scales. For instance, if a treatment shows 20% slowing on a traditional scale that is composed of 50% relevant item points and 50% irrelevant item points in the particular disease stage, the slowing would be expected to increase to 40% on an adapted scale that only includes the relevant points. In other words, a treatment effect would not be expected to impact the points on the scale that represent noise.
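The arithmetic behind this example can be checked with a short Python sketch. It rests on a simplifying assumption that is not stated in the text: the irrelevant points contribute to the observed decline in proportion to their share of the scale and are entirely unaffected by treatment. The function name is hypothetical.

```python
def slowing_on_relevant_points(observed_slowing, relevant_share):
    """Slowing expected on a scale restricted to the relevant points.

    observed_slowing: fractional slowing seen on the full traditional scale.
    relevant_share: fraction of the decline that comes from relevant points.
    """
    if not 0.0 < relevant_share <= 1.0:
        raise ValueError("relevant_share must be in (0, 1]")
    # Only the relevant points respond to treatment, so the observed slowing
    # is the true slowing diluted by the share of relevant points.
    return observed_slowing / relevant_share

adapted_slowing = slowing_on_relevant_points(0.20, 0.50)
```

Under this assumption a 20% slowing observed on the full scale corresponds to a 40% slowing on the relevant points alone, and the dilution shrinks as the share of relevant points approaches 100%.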

Carefully designed outcome measures for AD can make a big difference in the ability of a clinical study to detect true treatment effects. Current scales leave room for improvement even in a mild AD population, and would be even less effective in earlier stages of disease. Cognition and function are both changing at this stage of the disease, but appear to change at differing rates, bringing into question the idea that they should be similarly sensitive to change. Careful attention to measurement issues in clinical trials will result in improved power for detecting true treatment effects and, at the same time, more confidence in negative results.

Acknowledgments: We would like to thank all the investigators of AFF006 and the DSMB for their significant contribution to the study. We would also like to thank the Alzheimer’s Disease Cooperative Study (ADCS) for data collection and sharing. Additional data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen Idec Inc.; Bristol-Myers Squibb Company; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Medpace, Inc.; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Synarc Inc.; and Takeda Pharmaceutical Company. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.

Competing interests: AS, FM, WS, and LT are employees of AFFiRiS, the company that commercializes the AFFITOPE® technology described in the manuscript.  SH, NE, and SS are consultants for AFFiRiS through Pentara Corporation. BD declares no conflict of interest.     

Data and materials availability: The Phase II clinical trial described in this study  is registered at www.clinicaltrials.gov, Identifier: NCT01117818.  

Ethical standards: The clinical study described here was performed in compliance with Good Clinical Practice (GCP), the Declaration of Helsinki (2013), and local legal and regulatory requirements and applicable international regulations.

References

1. Brooks, L.G. & Loewenstein, D.A. Assessing the progression of mild cognitive impairment to Alzheimer’s disease: current trends and future directions. Alzheimers Res Ther 2010; 2, 28.

2. Robert, P. et al. Review of Alzheimer’s disease scales: is there a need for a new multi-domain scale for therapy evaluation in medical practice? Alzheimers Res Ther 2010; 2, 24.

3. Lopez, O.L., McDade, E., Riverol, M. & Becker, J.T. Evolution of the diagnostic criteria for degenerative and cognitive disorders. Curr Opin Neurol 2011; 24, 532-41.

4. Doody, R.S. et al. Phase 3 trials of solanezumab for mild-to-moderate Alzheimer’s disease. N Engl J Med 2014; 370, 311-21.

5. Salloway, S. et al. Two phase 3 trials of bapineuzumab in mild-to-moderate Alzheimer’s disease. N Engl J Med 2014; 370, 322-33.

6. Cano, S.J. et al. The ADAS-cog in Alzheimer’s disease clinical trials: psychometric evaluation of the sum and its parts. J Neurol Neurosurg Psychiatry 2010; 81, 1363-8.

7. Karin, A. et al. Psychometric evaluation of ADAS-Cog and NTB for measuring drug response. Acta Neurol Scand 2014; 129, 114-22.

8. Doraiswamy, P.M., Kaiser, L., Bieber, F. & Garman, R.L. The Alzheimer’s Disease Assessment Scale: evaluation of psychometric properties and patterns of cognitive decline in multicenter clinical trials of mild to moderate Alzheimer’s disease. Alzheimer Dis Assoc Disord 2001; 15, 174-83.

9. Hobart, J. et al. Putting the Alzheimer’s cognitive test to the test II: Rasch Measurement Theory. Alzheimers Dement 2013; 9, S10-20.

10. Samtani, M.N. et al. An improved model for disease progression in patients from the Alzheimer’s disease neuroimaging initiative. J Clin Pharmacol 2012; 52, 629-44.

11. Samtani, M.N. et al. Disease progression model in subjects with mild cognitive impairment from the Alzheimer’s disease neuroimaging initiative: CSF biomarkers predict population subtypes. Br J Clin Pharmacol 2013; 75, 146-61.

12. Caselli, R.J. et al. The neuropsychology of normal aging and preclinical Alzheimer’s disease. Alzheimers Dement 2014; 10, 84-92.

13. Howieson, D.B. et al. Trajectory of mild cognitive impairment onset. J Int Neuropsychol Soc 2008; 14, 192-8.

14. Albert, M.S. et al. The diagnosis of mild cognitive impairment due to Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimers Dement 2011; 7, 270-9.

15. Hort, J. et al. Spatial navigation deficit in amnestic mild cognitive impairment. Proc Natl Acad Sci U S A 2007; 104, 4042-7.

16. Ostberg, P., Fernaeus, S.E., Hellstrom, K., Bogdanovic, N. & Wahlund, L.O. Impaired verb fluency: a sign of mild cognitive impairment. Brain Lang 2005; 95, 273-9.

17. Hendrix, S., Wells B. Time Course of Cognitive Decline in Subjects With Mild Alzheimer’s Disease Based on ADAS-cog Subscales and Neuropsychological Tests Measured in ADNI. Abstract P4-096. Alzheimer Dement 2010; 6: e50.

18. Hendrix, S. et al. A new tool for optimizing responsiveness to decline in early AD. Abstract OC12. J Nutr Health Aging 2012; 16: 805.

19. Monteiro, I.M. et al. Addition of a frequency-weighted score to the Behavioral Pathology in Alzheimer’s Disease Rating Scale: the BEHAVE-AD-FW: methodology and reliability. Eur Psychiatry 2001; 16 Suppl 1, 5s-24s.

20. European Medicines Agency. Concept paper on need for revision of the guideline on medicinal products for the treatment of Alzheimer’s disease and other dementias. 24 October 2013. EMA/CHMP/617734; 2013. http://www.ema.europa.eu/docs/en_GB/document_library/Scientific_guideline/2013/10/WC500153464.pdf. Accessed on 10 March 2015.

21. U.S. Department of Health and Human Services, Food and Drug Administration. Draft Guidance for Industry. Alzheimer’s disease: Developing drugs for the treatment of early stage disease. February 2013. http://www.fda.gov/downloads/drugs/guidancecomplianceregulatoryinformation/guidances/ucm338287.pdf. Accessed on 10 March 2015.

22. Mohs, R.C. et al. Development of cognitive instruments for use in clinical trials of antidementia drugs: additions to the Alzheimer’s Disease Assessment Scale that broaden its scope. The Alzheimer’s Disease Cooperative Study. Alzheimer Dis Assoc Disord 1997; 11 Suppl 2, S13-21.

23. Rosen, W.G., Mohs, R.C. & Davis, K.L. A new rating scale for Alzheimer’s disease. Am J Psychiatry 1984; 141, 1356-64. 

24. Aisen PS, Schafer KA, Grundman M, Pfeiffer E, Sano M, et al. Effects of rofecoxib or naproxen vs placebo on Alzheimer disease progression: a randomized controlled trial. JAMA 2003; 289: 2819-2826.

25. Aisen PS, Schneider LS, Sano M, Diaz-Arrastia R, van Dyck CH, et al. High-dose B vitamin supplementation and cognitive decline in Alzheimer disease: a randomized controlled trial. JAMA 2008; 300: 1774-1783.

26. Galasko, D. et al. An inventory to assess activities of daily living for clinical trials in Alzheimer’s disease. The Alzheimer’s Disease Cooperative Study. Alzheimer Dis Assoc Disord 1997; 11 Suppl 2, S33-9.

27. Wold, H. Estimation of Principal Components and Related Models by Iterative Least Squares. In P. R. Krishnaiah, ed., Multivariate Analysis. New York: Academic Press, 1966.

28. Wold, S. PLS for Multivariate Linear Modeling, QSAR: Chemometric Methods in Molecular Design. Methods and Principles in Medicinal Chemistry, 1994.

29. Schneeberger A, et al. Results from a phase II study to assess the clinical and immunological activity of AFFITOPE® AD02 in patients with early Alzheimer’s disease. J Prev Alz Dis 2015; 2(2): 103-114.

30. Lowndes, G.J. et al. Recall and recognition of verbal paired associates in early Alzheimer’s disease. J Int Neuropsychol Soc 2008; 14, 591-600.

31. Maruff, P. et al. Clinical utility of the cogstate brief battery in identifying cognitive impairment in mild cognitive impairment and Alzheimer’s disease. BMC Psychol 2013; 1, 30.

32. Llano, D.A., Laforet, G. & Devanarayan, V. Derivation of a new ADAS-cog composite using tree-based multivariate analysis: prediction of conversion from mild cognitive impairment to Alzheimer disease. Alzheimer Dis Assoc Disord 2011; 25, 73-84.

33. Raghavan, N. et al. The ADAS-Cog revisited: novel composite scales based on ADAS-Cog to improve efficiency in MCI and early AD trials. Alzheimers Dement 2013; 9, S21-31.