
COHORT-SPECIFIC OPTIMIZATION OF MODELS PREDICTING PRECLINICAL ALZHEIMER’S DISEASE, TO ENHANCE SCREENING PERFORMANCE IN THE MIDDLE OF PRECLINICAL ALZHEIMER’S DISEASE CLINICAL STUDIES

 

K. Sato1,2, T. Mano2, R. Ihara3, K. Suzuki4, Y. Niimi5, T. Toda2, T. Iwatsubo1, A. Iwata3, for Alzheimer’s disease Neuroimaging Initiative, Japanese Alzheimer’s disease Neuroimaging Initiative, and The A4 Study Team

 

1. Department of Neuropathology, Graduate School of Medicine, The University of Tokyo, Japan; 2. Department of Neurology, The University of Tokyo Hospital, Japan; 3. Department of Neurology, Tokyo Metropolitan Geriatric Medical Center Hospital, Japan; 4. Division of Neurology, Internal Medicine, National Defense Medical College, Japan; 5. Unit for Early and Exploratory Clinical Development, The University of Tokyo Hospital, Japan

Corresponding Author: Dr. Atsushi Iwata, Department of Neurology, Tokyo Metropolitan Geriatric Medical Center Hospital, 35-2 Sakaecho Itabashi-ku, Tokyo 173-0015, Japan, Phone: 81-3-3964-1141, FAX: 81-3-3964-2963, E-mail: iwata@m.u-tokyo.ac.jp

J Prev Alz Dis 2021;
Published online July 5, 2021, http://dx.doi.org/10.14283/jpad.2021.39

 


Abstract

Background: Models that accurately predict brain amyloid beta (Aβ) status are needed to identify participants for clinical trials of preclinical Alzheimer’s disease (AD). However, potential heterogeneity between cohorts and limited cohort sizes have prevented the development of reliable models applicable to Asian populations, including Japan.
Objectives: We propose a novel approach to predicting preclinical AD that overcomes these constraints by building models specifically optimized for the ADNI or J-ADNI cohorts, based on the larger sample of the A4 study data.
Design & Participants: This is a retrospective study including cognitively normal participants (CDR-global = 0) from the A4 study, Alzheimer’s Disease Neuroimaging Initiative (ADNI), and Japanese ADNI (J-ADNI) cohorts.
Measurements: The models use age, sex, education years, parental history of AD, Clinical Dementia Rating-Sum of Boxes, Preclinical Alzheimer Cognitive Composite score, and APOE genotype to predict amyloid accumulation status on amyloid PET, defined by the Standardized Uptake Value ratio (SUVr). Models were first built from the A4 data, and we selected the SUVr threshold configuration at which the A4-based model achieved the best area under the curve (AUC) when applied to a randomly split half of the ADNI or J-ADNI cohort. We then evaluated whether the selected model also achieved better performance in the remaining ADNI or J-ADNI subset.
Results: Compared with the results without optimization, this procedure improved the AUC by up to approximately 0.10 when applied to the models “without APOE”; the degree of AUC improvement was larger in the ADNI cohort than in the J-ADNI cohort.
Conclusions: The obtained AUC improved mildly compared with the AUC obtained using a literature-based, predetermined SUVr threshold configuration. In other words, our procedure allowed us to predict preclinical AD in the remaining half of the ADNI or J-ADNI samples with slightly better performance. Our optimization method may be practically useful in the middle of an ongoing clinical study of preclinical AD, as a screening step to further increase the prior probability of preclinical AD before amyloid testing.

Key words: Amyloid beta, preclinical Alzheimer’s disease, machine learning, predictive model.


 

 

Introduction

Preclinical Alzheimer’s disease (AD), which corresponds to positive brain amyloid beta (Aβ) accumulation in healthy individuals without evidence of cognitive decline (1-3), has attracted attention as the target of clinical trials aiming to develop disease-modifying therapies for AD (4). Positive amyloid accumulation on amyloid positron emission tomography (PET) or a lowered level of Aβ42 in the cerebrospinal fluid (CSF) is used as the gold standard for including participants in clinical trials for preclinical AD (1).
It is estimated that approximately one-third of cognitively normal elderly individuals are Aβ-positive (5), which means that if participants are selected randomly, roughly three times as many clinically eligible participants must be screened by amyloid PET imaging or CSF lumbar puncture to determine whether they are actually amyloid positive. Indeed, in the A4 study, in which 1,000 participants were included in a double-blinded randomized clinical trial of solanezumab versus placebo (6), more than 10,000 clinically normal individuals were initially screened, and the 3,300 eligible participants were then further screened by amyloid PET imaging.
If we had a predictive index that could increase the prior probability of positive Aβ accumulation, the above costly and labor-intensive screening process could become more efficient, with a smaller number of participants requiring PET screening (7, 8). For example, an earlier study that predicted the Aβ status of cognitively normal participants in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) cohort (9) used the demographic features of age, sex, education, APOE ε4 status, and cognitive scores, increasing the positive predictive value to 0.65 compared with the reference prevalence of 0.41 (7).
Meanwhile, for a Japanese cohort such as the Japanese Alzheimer’s Disease Neuroimaging Initiative (J-ADNI) (10-12), deriving similar predictive models is a concern because of the limited number of eligible cognitively normal participants. Fewer than 100 J-ADNI participants have complete data for the necessary variables (10, 12), so it has so far been difficult to construct statistically robust models trained and validated within the Japanese cohort alone.
On the other hand, it may also be unsatisfactory to directly apply models derived from populations outside the Japanese cohort, because of the potential heterogeneity of study participants among cohorts. In other words, models derived from the Anti-Amyloid Treatment in Asymptomatic Alzheimer’s Disease (A4), ADNI, or Australian Imaging, Biomarkers and Lifestyle Study of Ageing (AIBL) cohort data (13) may not always be applicable to the J-ADNI cohort as they are, since the importance of each feature in a model can differ between cohorts owing to differences in the distribution of participants’ basic demographics. For example, baseline age, education, and even the proportion of Aβ-positive participants differ significantly between the ADNI and J-ADNI cohorts (10). These problems may have prevented the development of clinical models that effectively predict preclinical AD in a Japanese cohort.
As one solution to overcome these constraints, here we propose to utilize models trained on the A4 cohort data, a large dataset with more than 3,000 participants as of late 2019. Since the data characteristics of A4 participants and Japanese cohort (i.e., J-ADNI) participants can differ, as mentioned above, we optimized the A4-based models to make them more suitable for the J-ADNI cohort. Our proposed procedure is composed of two stages: the first is to generate numerous prediction models based on the A4 data with varying standardized uptake value ratio (SUVr) thresholds, and the second is to find the SUVr threshold configuration whose model performs best in a random half of the J-ADNI (or ADNI) dataset. The SUVr threshold is the critical cut-off that determines whether there is amyloid accumulation on PET (14), but it is not strictly established in the A4 study cohort, so adjusting the SUVr threshold changes the allocation of the amyloid positive/negative binary status for each case in the original A4 data. This is an operational procedure made solely for the purpose of identifying the best-performing models for other cohort data; we then evaluate whether the model based on the selected SUVr threshold also performs better in the remaining J-ADNI (or ADNI) subset. Such an ‘optimization’ procedure may allow us to build more flexible models, thereby enhancing their applicability to external cohorts such as J-ADNI or ADNI. Practically, our proposed method could serve as a predictive index available in actual clinical study settings for preclinical AD, e.g., as a screening step to increase the prior probability of preclinical AD in the middle of an ongoing preclinical AD study.

 

Methods

Data acquisition and preprocessing

This study was approved by the University of Tokyo Graduate School of Medicine institutional ethics committee (ID: 11628-(3)). Informed consent was not required because this was an observational study using publicly available data. We used the datasets of the A4 study and ADNI obtained from the Laboratory of Neuro Imaging (LONI) (https://ida.loni.usc.edu) in October 2019 and the J-ADNI dataset obtained from the National Bioscience Database Center (NBDC) (https://humandbs.biosciencedbc.jp/en/hum0043-v1) in June 2018, with the approval of the data access committee. The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), PET, other biological markers, and clinical and neuropsychological assessments can be combined to measure the progression of mild cognitive impairment (MCI) and early AD. For up-to-date information, see www.adni-info.org.
In this study, we used the data of cognitively normal participants. General inclusion criteria were determined in reference to an earlier study on the Preclinical Alzheimer Cognitive Composite (PACC) (3) and were defined as follows: age 65 to 85 years (60 to 84 years for cases from the J-ADNI cohort) at the time of screening, a global Clinical Dementia Rating (CDR-global) score of 0, an MMSE score of 27-30 and a Delayed Recall score of 8-15 on the Logical Memory IIa subtest for participants with 13 or more years of education, or an MMSE score of 25-30 and a Delayed Recall score of 6-13 for participants with 12 or fewer years of education.
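For illustration, these eligibility criteria can be expressed as a simple filter. The following is a minimal R sketch under the assumption of a data frame with hypothetical column names (age, cdr_global, mmse, lm_delayed, education); it is not the actual study code.

# Minimal sketch of the eligibility filter; column names are hypothetical.
is_eligible <- function(df, min_age = 65, max_age = 85) {
  high_edu <- df$education >= 13
  mmse_ok <- ifelse(high_edu,
                    df$mmse >= 27 & df$mmse <= 30,
                    df$mmse >= 25 & df$mmse <= 30)
  lm_ok <- ifelse(high_edu,
                  df$lm_delayed >= 8 & df$lm_delayed <= 15,
                  df$lm_delayed >= 6 & df$lm_delayed <= 13)
  df$age >= min_age & df$age <= max_age & df$cdr_global == 0 & mmse_ok & lm_ok
}
# For J-ADNI cases, the age window becomes min_age = 60, max_age = 84.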
To determine Aβ accumulation status in the A4 study cohort, the binary positive/negative Aβ-PET (florbetapir) status at a varying threshold level of the Standardized Uptake Value ratio (SUVr) (corresponding to the ‘Composite_Summary’ value in the ‘A4_PETSUVR.csv’ file) was used (Supplemental Table 1). Meanwhile, in the ADNI data, because of the limited number of eligible participants with complete amyloid PET data, we used CSF Aβ42 < 192 pg/mL (median-batch values in the ‘UPENNBIOMK_MASTER.csv’ file) as the criterion for positive Aβ accumulation (15). In the J-ADNI cohort, cases with CSF Aβ42 < 333 pg/mL (values in the ‘pub_csf_apoe.tsv’ file) (10) or with positive findings on visual assessment of PiB-PET (as listed in the ‘pub_petqc.tsv’ file) (16) were classified as Aβ-positive.
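As a rough illustration of these cohort-specific criteria, the following R sketch shows one way the binary Aβ labels could be derived; the function and argument names are hypothetical, and only the cut-off values and file/column names quoted above come from the study.

# Illustrative Abeta labelling per cohort; cut-offs as stated in the text.
label_abeta_a4 <- function(suvr, threshold) {
  as.integer(suvr > threshold)          # SUVr ('Composite_Summary'), varying threshold
}
label_abeta_adni <- function(csf_abeta42) {
  as.integer(csf_abeta42 < 192)         # pg/mL, median-batch values
}
label_abeta_jadni <- function(csf_abeta42, pib_visual_positive) {
  as.integer(csf_abeta42 < 333 | pib_visual_positive)  # CSF cut-off or PiB-PET visual read
}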
We used the following clinical and laboratory features, which are commonly available in the A4, ADNI, and J-ADNI datasets, as explanatory variables in the models: age at baseline, sex (male or female: binary), education years, with/without parental history of AD (binary), with/without elevated Clinical Dementia Rating sum of boxes (CDR-SB) at baseline (≥0.5 or not: binary), with/without APOE ε4 allele(s) (binary), and the baseline PACC score. Other features, such as brain MRI or blood test results as used in our previous studies (11, 17, 18), were not included because they are not always available from the A4, ADNI, and J-ADNI cohorts in a unified manner. Since the A4 study dataset up to 2019 contains baseline data alone and the participants’ longitudinal changes were not available, we also used only the baseline data from the ADNI and J-ADNI datasets. Parental history of AD was regarded as positive if there was a statement that the participant’s father or mother had been diagnosed with AD, and as negative if there was no such statement or the data were missing.
The PACC score (3) is a composite score calculated as the sum of Z scores from 4 items: (1) the Total Recall score from the Free and Cued Selective Reminding Test (FCSRT), (2) the Delayed Recall score on the Logical Memory IIa subtest from the Wechsler Memory Scale, (3) the Digit Symbol Substitution Test score from the Wechsler Adult Intelligence Scale-Revised, and (4) the MMSE total score (3). Since the PACC score was not calculated in the ADNI and J-ADNI studies, we calculated a virtual PACC score by using the score of “LDELTOTAL” for the Logical Memory Delayed Recall, the score of “DIGITSCOR” for the Digit Symbol Substitution Test, and the total MMSE score. Furthermore, instead of the FCSRT score, which was not collected in the ADNI and J-ADNI studies, we used the delayed recall score of ADAS-cog13 (Q4) in the ADNI and J-ADNI datasets, as in the earlier study (3). The Z scores of each of the 4 items were calculated within the ADNI or J-ADNI cohort in reference to the data of the cognitively normal group as allocated at baseline (“DX_bl” of “CN” (cognitively normal) or “SMC” (subjective memory complaints) in ADNI, and “COHORT” of “NL” (normal) in J-ADNI).
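The virtual PACC computation described above can be sketched as follows in R; the data frame and most column names are hypothetical (only LDELTOTAL and DIGITSCOR are named in the text), and the sign convention for the ADAS-cog13 Q4 component is an assumption made so that higher composite values indicate better cognition, as in (3).

# Sketch of the virtual PACC: sum of z-scores of 4 components, standardized
# against the cohort's baseline normal group (rows given by ref_rows).
compute_pacc <- function(df, ref_rows) {
  zscore <- function(x) (x - mean(x[ref_rows], na.rm = TRUE)) / sd(x[ref_rows], na.rm = TRUE)
  zscore(-df$adas_q4)   +  # ADAS-cog13 Q4 delayed recall (sign flipped: assumption)
  zscore(df$LDELTOTAL)  +  # Logical Memory delayed recall
  zscore(df$DIGITSCOR)  +  # Digit Symbol Substitution Test
  zscore(df$mmse_total)    # MMSE total score
}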
Missing data were handled by using the list-wise method: samples with missing data in the above modeling features were excluded from the analysis. Eventually, we included n = 3233 unique eligible cases of the A4 study cohort, n = 86 eligible cases of the ADNI cohort, and n = 50 eligible cases of the J-ADNI cohort.

Concepts of our proposed method

Here we explain how we demonstrate the practical effectiveness of our proposed approach. First, we built a large number of models based on varying SUVr configurations (Figures 1A, 1B) to predict positive Aβ within the A4 cohort data. Then, we evaluated the performance of these models, using the area under the curve (AUC) as a performance metric for binary prediction models that is available regardless of the threshold value, in each of the half-split subgroups of an external (non-A4) cohort composed of cognitively normal participants (Figures 1C, 1D). Suppose we know the Aβ status of each case in subgroup1 (Figure 1C), while we do not know the status of each case in subgroup2 (Figure 1D). When we compare the distribution of predictive performance results across all SUVr configurations (from 1 to k here) between the subgroups (Figure 1E), the true correlation should fall into the significantly negative (Figure 1F), non-significant (Figure 1G), or significantly positive (Figure 1H) category. If we observe that the actual correlation is consistently positive across the datasets used for evaluation (ADNI and J-ADNI here), then finding the SUVr configuration whose model achieves the highest performance in one subgroup would also yield near-highest performance in the other subgroup with unknown Aβ status (Figures 2A & 2B). We call this procedure of finding the SUVr configuration with the highest AUC in one subgroup the ‘optimization’ of models.
A significantly positive correlation (Figure 1H) is the prerequisite for this optimization. Although half-split subgroups derived from the same cohort may tend to correlate positively because of similar variance in their participants’ demographic data, this tendency is not guaranteed, especially in cohorts far smaller than the A4 cohort (e.g., the ADNI or J-ADNI cognitively normal cases). If the correlation between subgroups happens to be negative (Figure 1F) or non-significant (Figure 1G), the optimization will not work. Therefore, our goal in this study was to confirm that the correlation between the half-split validation subgroups (Figure 1C versus 1D) is reproducibly and significantly positive (Figure 1H), and then to assess the degree of AUC improvement obtained by this optimization procedure (Figures 2A & 2B), using the ADNI and J-ADNI datasets for validation.
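The logic of this prerequisite check and the subsequent selection can be summarized in a few lines of R. This is a conceptual sketch only: V1 and V2 stand for the AUC vectors of subgroup1 and subgroup2 (in actual use, V2 would be unknown and is shown here purely to illustrate the expected behavior).

# Conceptual sketch: in the validation experiments both halves have known
# Abeta status, so the correlation prerequisite can be checked before
# selecting the configuration by its AUC in subgroup1.
optimize_configuration <- function(V1, V2) {
  prerequisite_met <- cor(V1, V2) > 0 &&
                      cor.test(V1, V2, method = "pearson")$p.value < 0.05
  k_best <- which.max(V1)             # configuration with the highest AUC in subgroup1
  list(prerequisite_met = prerequisite_met,
       selected_k       = k_best,
       auc_in_subgroup2 = V2[k_best]) # expected near-highest AUC in subgroup2
}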

Figure 1. Conceptual outline of our proposed method

We first built a large number of models based on varying SUVr configurations (A, B), and then evaluated the performance of these models in each of the half-split subgroups of cognitively normal participants in the external cohort (ADNI or J-ADNI here) (C, D). We suppose that we know the Aβ status of each case in subgroup1, while we do not know the statuses in subgroup2. When the distribution of predictive performance results across the different SUVr configurations (from 1 to k) is compared between the external cohort’s half-split subgroups (e.g., ADNI or J-ADNI), the actual correlation should fall into either the significantly negative (F), non-significant (G), or significantly positive (H) category.

 

Processing workflow: model training and performance evaluation

A detailed data processing workflow is outlined in Supplemental Figure 1. The target of the A4-cohort predictive models is the binary Aβ-PET (florbetapir) status, positive or negative, determined at varying SUVr threshold levels. In the model training, the SUVr threshold was varied in steps of 0.01 from 0.99 to 1.47, corresponding to [mean – 0.5 SD] and [mean + 2 SD] of the SUVr distribution in all the A4 data. Furthermore, to exclude possible false-negative cases, we excluded Aβ-negative cases whose SUVr was barely lower than the threshold, within a margin that was also varied; in effect, this exclusion reduces possible misclassification near the threshold, clarifying the difference between cases with and without positive Aβ. This “exclusion range” (Supplemental Figure 1A) was adjusted in steps of 0.01 from 0 to 0.09, together with the SUVr threshold, where 0.09 corresponds to [0.5 SD] of the A4 SUVr distribution. Taken together, cases whose SUVr is higher than the [threshold value] are defined as Aβ-positive, and cases whose SUVr is lower than the [threshold value – exclusion range value] are defined as Aβ-negative (Supplemental Figure 1A). We define this way of varying the Aβ allocation and eligible case inclusion as the “SUVr configuration,” which is used to generate a large number of models (Figure 1B). The SUVr configuration can take 48 SUVr threshold patterns × 10 exclusion range patterns = 480 combinations in total.
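A minimal R sketch of this configuration grid and case allocation is shown below; the exact grid endpoints are chosen here to reproduce the stated 48 × 10 = 480 combinations and should be read as an assumption rather than the actual study code.

# SUVr configuration grid: 48 thresholds x 10 exclusion ranges = 480 patterns.
thresholds  <- seq(1.00, 1.47, by = 0.01)   # 48 values (endpoints assumed)
excl_ranges <- seq(0.00, 0.09, by = 0.01)   # 10 values
configs <- expand.grid(threshold = thresholds, excl_range = excl_ranges)

# Allocate Abeta status for one configuration; borderline cases (between
# threshold - excl_range and threshold) are returned as NA and excluded.
allocate_abeta <- function(suvr, threshold, excl_range) {
  label <- rep(NA_integer_, length(suvr))
  label[suvr > threshold]              <- 1L   # Abeta-positive
  label[suvr < threshold - excl_range] <- 0L   # Abeta-negative
  label
}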
Since a small proportion of cases falls within the exclusion range and is eliminated, the eligible A4 dataset A_k, drawn from the A4 cohort cases (n = 3233), differs slightly for each SUVr configuration k (k = 1, 2, …, 480) (Supplemental Figure 1B). A randomly selected 70% of A_k was then taken as the A4 training subgroup A’_k; using this A’_k subgroup, we trained a model M_k predictive of positive Aβ (Supplemental Figure 1C). For the model M_k, we separately constructed two types of models, one including APOE ε4 status among its features (denoted as “model with APOE”) and one not including APOE ε4 status (“model without APOE”) (Supplemental Figure 1C). This is because APOE ε4 is one of the strongest determinants of the CSF Aβ42 level (19), whereas a model without APOE ε4 status would be more convenient to use as a screening index. Training was conducted with 10-fold cross-validation using a penalized generalized linear regression (GLM) algorithm implemented in the R package “caret” (20). Automated optimization of the penalized GLM hyperparameters was conducted by grid search via the caret functions.
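For concreteness, a training call of this kind might look like the following R sketch using caret’s “glmnet” method; the feature and outcome column names are hypothetical, binary features are assumed to be coded 0/1, and the tuning grid size is arbitrary.

library(caret)

# Train one model M_k on the A4 training subgroup A'_k (train_df).
train_model_k <- function(train_df, use_apoe = TRUE) {
  features <- c("age", "sex", "education_years", "parental_ad_history",
                "cdrsb_elevated", "pacc")
  if (use_apoe) features <- c(features, "apoe_e4")
  train(x = train_df[, features],
        y = factor(train_df$abeta_positive, levels = c(0, 1),
                   labels = c("neg", "pos")),
        method     = "glmnet",                       # penalized GLM
        metric     = "ROC",
        trControl  = trainControl(method = "cv", number = 10,
                                  classProbs = TRUE,
                                  summaryFunction = twoClassSummary),
        tuneLength = 10)                             # grid search over alpha/lambda
}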
The predictive performance of the model M_k was then validated on the ADNI and J-ADNI cohort data, outside the original A4 cohort. We randomly split the ADNI and J-ADNI cohorts into half subgroups (“subgroup1” and “subgroup2”) (Supplemental Figure 1G; Figures 1C & 1D), retaining equal proportions of Aβ-positive cases between the half subgroups using the “caret” package function “createDataPartition”, and then compared the performance between the ADNI subgroups or between the J-ADNI subgroups. Predictive performance was measured with the area under the curve (AUC), calculated from the predicted probability of positive Aβ for each case in the applied dataset (Supplemental Figure 1D).
Since the randomly sampled A’_k subgroup yields a slightly different model each time (Supplemental Figure 1C), we repeated the above processing steps (B-D, circled in gray) 5 times for each k (marked with a dagger [†]). We defined v_Xi,k (Supplemental Figure 1E) as the median of the 5 AUC results, i.e., the AUC obtained when the k-th configuration-based model M_k is applied to subgroup Xi.
Because the configuration can take 480 forms as described above, the full validation results (k = 1, 2, …, 480) for one subgroup are represented by a vector of length 480. For example, when the data of one cohort X (= ADNI or J-ADNI) are split into subgroup X1 and subgroup X2, the vectors representing the results for these subgroups, which correspond to the result lists of Figures 1C and 1D, are written as follows (Supplemental Figure 1F):
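The half-split and AUC evaluation can be sketched in R as follows; ‘model’ is a fitted caret object as above, the cohort data frame and its column names are hypothetical, and the pROC package is used here only as one convenient way to compute the AUC.

library(caret)
library(pROC)

evaluate_half_split <- function(model, cohort_df) {
  y   <- factor(cohort_df$abeta_positive)
  idx <- createDataPartition(y, p = 0.5, list = FALSE)  # preserves the Abeta-positive proportion
  auc_of <- function(sub) {
    p <- predict(model, newdata = sub, type = "prob")[, "pos"]
    as.numeric(auc(roc(sub$abeta_positive, p, quiet = TRUE)))
  }
  c(subgroup1 = auc_of(cohort_df[idx, ]),
    subgroup2 = auc_of(cohort_df[-idx, ]))
}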

V_X1 = [v_X1,1, v_X1,2, …, v_X1,480]
V_X2 = [v_X2,1, v_X2,2, …, v_X2,480]

Then we measured the correlation between V_ADNI1 and V_ADNI2, and between V_JADNI1 and V_JADNI2.
The above process (steps A-F) was repeated for the ADNI and J-ADNI half-split subgroups (Supplemental Figure 1G), with the random split performed 30 times in total (marked with an asterisk [*]), eventually yielding 30 sets of [V_ADNI1, V_ADNI2, V_JADNI1, V_JADNI2].
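To make the bookkeeping concrete, the following self-contained R sketch assembles the AUC vectors from the 5 repeated runs per configuration and tests their correlation; the AUC values are simulated stand-ins, not study results.

set.seed(1)
# Simulated stand-in: for each of the 480 configurations, 5 repeated AUCs per subgroup.
auc_runs_sub1 <- replicate(480, runif(5, 0.5, 0.8), simplify = FALSE)
auc_runs_sub2 <- replicate(480, runif(5, 0.5, 0.8), simplify = FALSE)

build_V <- function(auc_runs) vapply(auc_runs, median, numeric(1))  # v_X,k = median of 5 AUCs
V_X1 <- build_V(auc_runs_sub1)
V_X2 <- build_V(auc_runs_sub2)
cor.test(V_X1, V_X2, method = "pearson")   # prerequisite check (Figure 1F-H)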
Next, we explain how the ‘optimization’ is conducted, using Figures 1 and 2 and an example scatter plot of V_X1 (X-axis) versus V_X2 (Y-axis) across all 480 SUVr configuration patterns in one randomization (*). On this plot, the Pearson correlation between V_X1 and V_X2 was R = 0.967 (p < 0.001). When we choose the configuration ka at which v_X1,ka is the maximum of V_X1, the AUC obtained with the same ka-th SUVr configuration (= v_X2,ka) will also be approximately the highest among V_X2. In other words, under the assumption that the correlation between V_X1 and V_X2 is significantly positive (as in Figure 1H), we can optimize the predictive model with reference to half subgroup X1 alone, by choosing the k at which v_X1,k is highest among V_X1, so that the model achieves close to the best performance both in X1 and in the remaining half subgroup X2, whose performance distribution is unknown to us.
Conversely, when we choose the configuration kb at which v_X1,kb is the minimum of V_X1, the AUC obtained with the same kb-th SUVr configuration (= v_X2,kb) will also be approximately the lowest among V_X2; the difference between v_X2,ka and v_X2,kb corresponds to the theoretical maximum AUC improvement achievable by the present “optimization” procedure (Figure 2A).
Furthermore, we compared the optimized result with the non-optimized result based on the conventionally used SUVr configuration (e.g., a threshold of 1.15 (21)). Taking the i-th SUVr configuration with a threshold of 1.15 and an exclusion range of 0, we measured the difference between the above v_X2,ka and the resulting AUC v_X2,i of the i-th configuration in subgroup2. This difference corresponds to the AUC improvement expected from this optimization procedure (Figure 2B) relative to the conventional setting used in earlier studies without “optimization.”
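The two improvement measures can be computed directly from the AUC vectors, as in the following R sketch; ‘configs’ is the configuration grid sketched earlier, and V1/V2 are the AUC vectors of subgroup1 and subgroup2.

# ka: best configuration in subgroup1; kb: worst; i: conventional configuration.
improvement_metrics <- function(V1, V2, configs) {
  ka <- which.max(V1)
  kb <- which.min(V1)
  i  <- which(abs(configs$threshold - 1.15) < 1e-8 & configs$excl_range == 0)
  c(max_expected_improvement    = V2[ka] - V2[kb],  # Figure 2A
    improvement_vs_conventional = V2[ka] - V2[i])   # Figure 2B
}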

Figure 2. Evaluation of performance improvement

If the actual correlation is consistently positive across the several datasets used for evaluation (e.g., ADNI and J-ADNI here), as in Figure 1H, then the “optimization” of selecting the SUVr configuration whose model achieves the highest performance in one subgroup also results in near-highest performance in the other subgroup with unknown Aβ status (A, B).

 

Statistical analysis

All data handling and statistical analyses were performed using R 3.5.1 (R Foundation for Statistical Computing, Vienna, Austria). For numerical data, we used the median and interquartile range (IQR) for summarization and the Wilcoxon rank-sum test or analysis of variance (ANOVA) for group comparisons. For categorical data, we used frequency and percentage for summarization and Fisher’s exact test for group comparisons. Correlations between two numerical vectors were assessed with Pearson’s correlation. A P-value less than 0.05 was regarded as statistically significant unless otherwise stated.

 

Results

Overview of the demographical distribution of the included cohorts

Basic demographics are shown in Supplemental Table 1, revealing slight differences among the 3 included cohorts (A4, ADNI, and J-ADNI). The J-ADNI participants had a significantly younger median age, were more predominantly male, and had fewer years of education than the other 2 cohorts. There were no significant differences among the 3 cohorts in the distribution of CDR-SB, parental history of AD, APOE ε4 status, or baseline PACC.
In addition, we evaluated the performance of each single feature for predicting positive Aβ in each included cohort. A heatmap of AUC values, representing the predictive performance of the corresponding feature (columns) in each cohort (rows), is shown in Supplemental Figure 2A. For the A4 cohort in this heatmap, an SUVr threshold of 1.15 was used (21). Each feature except APOE shows a different level of association with positive Aβ status depending on the cohort.

Cohort-specific “optimization” of models

Next, we obtained the predictive performance of the models based on the varying SUVr configurations, evaluated with the AUC in the ADNI and J-ADNI subgroups. We visualized examples of the result vectors V_ADNI1, V_ADNI2, V_JADNI1, and V_JADNI2, summarizing the AUCs from the 480 different SUVr configurations (48 SUVr thresholds × 10 exclusion ranges: Supplemental Figure 1A) by converting them into heatmap matrices for clarity (Supplemental Figure 2B). Each cell in the heatmaps represents the AUC of the model based on the corresponding SUVr configuration, where the row denotes the SUVr threshold and the column denotes the exclusion range. In the results from the ADNI (Supplemental Figure 2B, left) and J-ADNI (Supplemental Figure 2B, right) data, the AUC results are distributed differently depending on the SUVr configuration, and their distribution also differs considerably between the cohorts.
By choosing the darkest cell in the heatmap of ADNI subgroup1 (Supplemental Figure 2B), we can select the SUVr configuration whose model achieves the highest performance in ADNI subgroup1. As there was a positive correlation of R = 0.767 (p < 0.001) between the AUC heatmaps of ADNI subgroup1 and ADNI subgroup2 (Supplemental Figure 2B, left), the selected SUVr configuration also achieves near-best performance when applied to the remaining ADNI subgroup2. The same is true for the pair of J-ADNI subgroups (Supplemental Figure 2B, right), between which there was a positive correlation of R = 0.493 (p < 0.001). This “optimization” procedure is inherently cohort-specific, since each cohort has a specific spatial distribution in its resulting AUC heatmaps. Conversely, by choosing the lightest cell in the heatmap of ADNI subgroup1, the selected SUVr configuration would also show near-lowest performance when applied to the remaining ADNI subgroup2. The difference between the near-highest and near-lowest AUC within subgroup2 corresponds to the “expected maximum AUC improvement achievable by optimization,” i.e., the difference between the worst AUC without “optimization” and the best AUC with “optimization.”
The set of [V_ADNI1, V_ADNI2, V_JADNI1, V_JADNI2] (as in Supplemental Figure 2B) was obtained repeatedly over 30 randomizations of the ADNI and J-ADNI splits (Supplemental Figure 1G, [*]), first based on the model “with APOE”: Figure 3A shows the distribution of the obtained correlation coefficients (as in Figure 1F-H) between V_ADNI1 and V_ADNI2 (summarized in Figure 3A, [a] & [b]) or between V_JADNI1 and V_JADNI2 (summarized in Figure 3A, [c] & [d]), repeated 30 times in total. In the ADNI cohort with models “with APOE” (Figure 3A [a]), the Pearson correlation coefficient between ADNI subgroup1 and subgroup2 was a mean of 0.897 (the mean’s 95% CI: 0.877 – 0.917), and a correlation coefficient > 0 with a p-value < 0.05 was observed in 30/30 randomization (*) trials, fully meeting the prerequisite of our “optimization” method. The expected maximum AUC improvement was a mean of 0.077 (the mean’s 95% CI: 0.069 – 0.085) (Figure 3B [a]), and the expected AUC improvement compared with the AUC of a model with an SUVr threshold of 1.15 was a mean of 0.033 (95% CI: 0.022 – 0.043) (Figure 3C [a]); e.g., the AUC improved from 0.724 to 0.774 in a representative case. Similarly, in the ADNI cohort with models “without APOE” (Figure 3A [b]), the correlation coefficient was a mean of 0.517 (the mean’s 95% CI: 0.444 – 0.582), and a correlation coefficient > 0 with a p-value < 0.05 was observed in 30/30 randomization trials (*). The expected maximum AUC improvement was a mean of 0.107 (the mean’s 95% CI: 0.086 – 0.129) (Figure 3B [b]), and the expected AUC improvement compared with the AUC of a model with an SUVr threshold of 1.15 was a mean of 0.075 (95% CI: 0.057 – 0.093) (Figure 3C [b]); e.g., the AUC improved from 0.61 to 0.69 in a representative case. Overall, the expected maximum AUC improvement achievable by the “optimization” in ADNI was greater in models “without APOE” than in models “with APOE” (Figure 3B [a] versus [b], and Figure 3C [a] versus [b]).

Figure 3. Correlation coefficients between the resultant AUC vectors from half-split subgroups and the degree of AUC improvement

Box plots show the distribution of the obtained correlation coefficients (as in Figure 1F-H) between V_ADNI1 and V_ADNI2 (A, [a] & [b]), or between V_JADNI1 and V_JADNI2 (A, [c] & [d]), repeated 30 times in total. Each box corresponds to the range between the lower and upper quartiles (Q1 and Q3, respectively), and the whiskers correspond to the data within the range [Q1 – 1.5*IQR, Q3 + 1.5*IQR]. In the ADNI cohort (A, [a] & [b]), 30/30 of the results with models “with APOE” (A, [a]) or “without APOE” (A, [b]) showed a significantly positive correlation between V_ADNI1 and V_ADNI2. In the J-ADNI cohort, 22/30 results of models “with APOE” (A, [c]) and 26/30 results of models “without APOE” (A, [d]) were significantly positive. The expected maximum AUC improvement achievable by the “optimization” (B) and the expected AUC improvement achievable by “optimization” compared with the model based on the SUVr threshold of 1.15 without optimization (C) are plotted. In all models ([a]-[d]), the mean “expected AUC improvement” was significantly higher than 0 (i.e., its lower 95% CI > 0), and the model “without APOE” in the ADNI cohort showed an AUC improvement of approximately 0.10.

 

In the J-ADNI cohort with models “with APOE” (Figure 3A [c]), the correlation coefficient between J-ADNI subgroup1 and subgroup2 was a mean of 0.301 (the mean’s 95% CI: 0.107 – 0.495), and a significantly positive correlation was observed in 22/30 randomization (*) trials, indicating occasionally unsuccessful “optimization.” The expected maximum AUC improvement was a mean of 0.011 (the mean’s 95% CI: 0.001 – 0.020) (Figure 3B [c]), and the expected AUC improvement compared with the AUC of a model with an SUVr threshold of 1.15 was a mean of 0.009 (95% CI: 0.003 – 0.016) (Figure 3C [c]); e.g., the AUC showed little improvement (from 0.65 to 0.65) in a representative case. Furthermore, in the J-ADNI cohort with models “without APOE” (Figure 3A [d]), the correlation coefficient was a mean of 0.353 (95% CI: 0.258 – 0.448), and a significantly positive correlation was observed in 26/30 randomization trials (*), mostly meeting the “optimization” prerequisite. The expected maximum AUC improvement was a mean of 0.086 (95% CI: 0.060 – 0.113) (Figure 3B [d]), and the expected AUC improvement compared with the AUC of a model with an SUVr threshold of 1.15 was a mean of 0.019 (95% CI: 0.007 – 0.030) (Figure 3C [d]); e.g., the AUC improved from 0.61 to 0.64 in a representative case. The models “without APOE” showed a greater expected maximum AUC improvement achievable by the “optimization” than the models “with APOE” (Figure 3B [c] versus [d], and Figure 3C [c] versus [d]).

 

Discussion

In this retrospective study, we described our attempt to optimize A4 study-derived predictive models so that they are applicable to external cohort datasets, including ADNI and J-ADNI. The novelty of the proposed method is that we operationally manipulated the positive Aβ allocation in the original A4 training data, thereby enabling the best-performing model to be obtained when applied to external cohorts, including ADNI and J-ADNI. The obtained AUC improved mildly compared with the AUC obtained using a literature-based, predetermined SUVr threshold configuration. In other words, our ‘optimization’ procedure allowed us to obtain preclinical AD models for ADNI or J-ADNI with slightly better predictive performance. Our method may be practically useful in the middle of an ongoing clinical study of preclinical AD, as a screening step to further increase the prior probability of preclinical AD among the remaining samples before their amyloid testing.
The motivation for this study was mainly the concern about directly applying A4 study-derived models to the J-ADNI cohort, given differences in the distribution of participants’ baseline demographics such as age, sex, education years, ethnicity, the proportion of positive Aβ (Supplemental Table 1), or any unexamined clinical, laboratory, or genetic factors. Such differences in the probability distributions of each feature between training and validation datasets are known to lead to failures in accurate prediction. “Transfer learning” is used in the field of deep learning as one solution, enabling a trained model to be applied to datasets from other domains; if utilized in our setting, it could enable application to a dataset from a different regional population with a smaller sample size (22, 23). However, our approach is based on conventional machine learning and differs from transfer learning, which we did not use because even the Aβ status in the original training data (the A4 study cohort) has not yet been definitively determined. If biologically corroborated criteria for Aβ status are established within the original A4 cohort, transfer learning could be employed to build models effectively applicable to the ADNI or J-ADNI datasets.
As expected, the efficacy of the “optimization,” measured as the AUC improvement relative to the AUC obtained without “optimization,” was greater than 0 on average. The maximum improvement in AUC (Figure 3B) and the AUC improvement compared with the SUVr threshold of 1.15 (Figure 3C) were both approximately 0.10 in models “without APOE” applied to the ADNI cohort (Figure 3B[b], 3C[b]), which means this optimization procedure is particularly promising when applied to the models “without APOE.” Although the improvement was smaller, the models “without APOE” applied to the J-ADNI cohort also achieved a higher AUC than the worst-performing SUVr configuration (Figure 3B[d]) or the conventional SUVr threshold of 1.15 (Figure 3C[d]). The difference between ADNI and J-ADNI in the degree of AUC improvement may be due to differences in sample size or in the degree of inter-cohort variation, as represented by their different amyloid positivity rates.
Generally, the degree of AUC improvement (Figures 3B, 3C) tended to be higher in models “without APOE” ([b], [d]) than in models “with APOE” ([a], [c]), which means the performance improvement expected from optimization is much larger for models “without APOE” than for models “with APOE,” probably reflecting the high importance of APOE ε4 status as a variable for predicting positive Aβ. In addition, when the model “with APOE” was used, only 22/30 of the randomized half-splits of the J-ADNI dataset led to a significantly positive correlation between V_JADNI1 and V_JADNI2, whereas this occurred more frequently (26/30) when the model “without APOE” was used. These results suggest that the current optimization method is more reliably and effectively used with models that do not include APOE ε4 status as a feature than with those that do.
The current approach to adjusting the SUVr configuration, which consists of the SUVr threshold and the exclusion of cases whose SUVr is barely lower than the threshold, is no more than an operational procedure and is not biologically validated in a strict sense. Accordingly, we need to be careful when interpreting the obtained final model or its variable importance: it is a “transferred” model and does not have a firm biological basis of its own. For example, when we identify a feature (e.g., higher PACC) with high variable importance in the final model, the underlying biological association between that feature and Aβ positivity may be weaker than in the case of conventional, non-transferred models.
Our study has some limitations. First, although the degree and frequency of positive correlation between the result vectors (Figure 3A) may be influenced by the size of the validation cohort datasets or their intra-cohort data variability, as suggested by our results in which the “optimization” showed smaller improvement and lower reliability when applied to the J-ADNI cohort than to the ADNI cohort, we did not examine the detailed conditions (e.g., sample size) required for a validation dataset to be eligible for the “optimization” procedure. Further validation may be needed in other external cohorts with various sample sizes. Second, in the case of a single multi-center clinical trial to which we might apply our method in practice, it is uncertain whether two subgroups collected from different facilities truly have similar distributions of demographic features, which is the prerequisite for external application of the current method. The allowable extent of difference in inter-subgroup feature distributions, and the sample size required to alleviate the potential variance between subgroups, also remain uncertain. Third, the proposed method manipulates the original training data distribution so as to perform best specifically in the validation cohort of interest, so the final model is not conversely applicable to the original A4 cohort data or to other cohorts with different demographic distributions. The fourth limitation relates to the PACC calculation in ADNI and J-ADNI: the validity of using ADAS-cog 13 (Q4) as a substitute for the FCSRT, and the validity of setting the “NL” cohort data as the reference for the PACC calculation. Fifth, the proposed method requires a certain amount of computational time, since model training and validation must be repeated: 30 ADNI or J-ADNI splits, each with 5 A4 training-subgroup samplings and model validations for each of the 480 configurations, eventually requiring 30 × 5 × 480 = 72,000 rounds of model training and validation. This is one reason why we used penalized GLM as the prediction algorithm, which requires less computational time than other algorithms such as random forest or support vector machines and is designed to have a smaller risk of over-fitting to the training data; if possible, other algorithms should also be tried (24). Lastly, the 3 cohorts referred to different modalities of amyloid testing (i.e., florbetapir-PET in A4, CSF in ADNI, and CSF and PiB-PET in J-ADNI), possibly lowering the applicability of our method.
To conclude, we proposed a novel method to obtain preclinical AD predictive models specifically optimized for the cohort of interest, in order to achieve extrapolative application beyond the original training data. This optimization procedure yielded an AUC improvement of up to approximately 0.10 when used in combination with the models “without APOE.” Our method may be practically useful in the middle of an actual clinical study of preclinical AD, as a screening step to further increase the prior probability of preclinical AD before amyloid testing.

 

Funding: This study was supported by Japan Agency for Medical Research and Development grants JP21dk0207057, JP21dk0207048, and JP20dk0207028.

Description about the ADNI: Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.

Description about the A4 study: The A4 Study is a secondary prevention trial in preclinical Alzheimer’s disease, aiming to slow cognitive decline associated with brain amyloid accumulation in clinically normal older individuals. The A4 Study is funded by a public-private-philanthropic partnership, including funding from the National Institutes of Health-National Institute on Aging, Eli Lilly and Company, Alzheimer’s Association, Accelerating Medicines Partnership, GHR Foundation, an anonymous foundation and additional private donors, with in-kind support from Avid and Cogstate. The companion observational Longitudinal Evaluation of Amyloid Risk and Neurodegeneration (LEARN) Study is funded by the Alzheimer’s Association and GHR Foundation. The A4 and LEARN Studies are led by Dr. Reisa Sperling at Brigham and Women’s Hospital, Harvard Medical School and Dr. Paul Aisen at the Alzheimer’s Therapeutic Research Institute (ATRI), University of Southern California. The A4 and LEARN Studies are coordinated by ATRI at the University of Southern California, and the data are made available through the Laboratory for Neuro Imaging at the University of Southern California. The participants screened for the A4 Study provided permission to share their de-identified data in order to advance the quest to find a successful treatment for Alzheimer’s disease. We would like to acknowledge the dedication of all the participants, the site personnel, and all of the partnership team members who continue to make the A4 and LEARN Studies possible. The complete A4 Study Team list is available on: a4study.org/a4-study-team.

Conflicts of interest: The authors have no conflict of interest to disclose.

Ethical standards: This study was approved by the University of Tokyo Graduate School of Medicine institutional ethics committee (ID: 11628-(3)).

 

SUPPLEMENTARY MATERIAL

 

References

1. Sperling RA, Aisen PS, Beckett LA, Bennett DA, Craft S, Fagan AM, Iwatsubo T, Jack CR Jr, Kaye J, Montine TJ, Park DC, Reiman EM, Rowe CC, Siemers E, Stern Y, Yaffe K, Carrillo MC, Thies B, Morrison-Bogorad M, Wagster MV, Phelps CH. Toward defining the preclinical stages of Alzheimer’s disease: recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimers Dement. 2011 May;7(3):280-92.
2. Jack CR Jr, Knopman DS, Jagust WJ, Petersen RC, Weiner MW, Aisen PS, Shaw LM, Vemuri P, Wiste HJ, Weigand SD, Lesnick TG, Pankratz VS, Donohue MC, Trojanowski JQ. Tracking pathophysiological processes in Alzheimer’s disease: an updated hypothetical model of dynamic biomarkers. Lancet Neurol. 2013 Feb;12(2):207-16.
3. Donohue MC, Sperling RA, Salmon DP, Rentz DM, Raman R, Thomas RG, Weiner M, Aisen PS; Australian Imaging, Biomarkers, and Lifestyle Flagship Study of Ageing; Alzheimer’s Disease Neuroimaging Initiative; Alzheimer’s Disease Cooperative Study. The preclinical Alzheimer cognitive composite: measuring amyloid-related decline. JAMA Neurol. 2014 Aug;71(8):961-70.
4. Cummings J. The National Institute on Aging-Alzheimer’s Association Framework on Alzheimer’s disease: Application to clinical trials. Alzheimers Dement. 2019 Jan;15(1):172-178.
5. Jansen WJ, Ossenkoppele R, Knol DL, Tijms BM, Scheltens P, Verhey FR, Visser PJ; Amyloid Biomarker Study Group, Aalten P, Aarsland D, Alcolea D, Alexander M, Almdahl IS, Arnold SE, Baldeiras I, Barthel H, van Berckel BN, Bibeau K, Blennow K, Brooks DJ, van Buchem MA, Camus V, Cavedo E, Chen K, Chetelat G, Cohen AD, Drzezga A, Engelborghs S, Fagan AM, Fladby T, Fleisher AS, van der Flier WM, Ford L, Förster S, Fortea J, Foskett N, Frederiksen KS, Freund-Levi Y, Frisoni GB, Froelich L, Gabryelewicz T, Gill KD, Gkatzima O, Gómez-Tortosa E, Gordon MF, Grimmer T, Hampel H, Hausner L, Hellwig S, Herukka SK, Hildebrandt H, Ishihara L, Ivanoiu A, Jagust WJ, Johannsen P, Kandimalla R, Kapaki E, Klimkowicz-Mrowiec A, Klunk WE, Köhler S, Koglin N, Kornhuber J, Kramberger MG, Van Laere K, Landau SM, Lee DY, de Leon M, Lisetti V, Lleó A, Madsen K, Maier W, Marcusson J, Mattsson N, de Mendonça A, Meulenbroek O, Meyer PT, Mintun MA, Mok V, Molinuevo JL, Møllergård HM, Morris JC, Mroczko B, Van der Mussele S, Na DL, Newberg A, Nordberg A, Nordlund A, Novak GP, Paraskevas GP, Parnetti L, Perera G, Peters O, Popp J, Prabhakar S, Rabinovici GD, Ramakers IH, Rami L, Resende de Oliveira C, Rinne JO, Rodrigue KM, Rodríguez-Rodríguez E, Roe CM, Rot U, Rowe CC, Rüther E, Sabri O, Sanchez-Juan P, Santana I, Sarazin M, Schröder J, Schütte C, Seo SW, Soetewey F, Soininen H, Spiru L, Struyfs H, Teunissen CE, Tsolaki M, Vandenberghe R, Verbeek MM, Villemagne VL, Vos SJ, van Waalwijk van Doorn LJ, Waldemar G, Wallin A, Wallin ÅK, Wiltfang J, Wolk DA, Zboch M, Zetterberg H. Prevalence of cerebral amyloid pathology in persons without dementia: a meta-analysis. JAMA. 2015 May 19;313(19):1924-38.
6. Sperling RA, Rentz DM, Johnson KA, Karlawish J, Donohue M, Salmon DP, Aisen P. The A4 study: stopping AD before symptoms begin? Sci Transl Med. 2014 Mar 19;6(228):228fs13.
7. Insel PS, Palmqvist S, Mackin RS, Nosheny RL, Hansson O, Weiner MW, Mattsson N. Assessing risk for preclinical β-amyloid pathology with APOE, cognitive, and demographic information. Alzheimers Dement (Amst). 2016 Aug 3;4:76-84.
8. Ansart M, Epelbaum S, Gagliardi G, Colliot O, Dormont D, Dubois B, Hampel H, Durrleman S; Alzheimer’s Disease Neuroimaging Initiative* and the INSIGHT-preAD study. Reduction of recruitment costs in preclinical AD trials: validation of automatic pre-screening algorithm for brain amyloidosis. Stat Methods Med Res. 2020 Jan;29(1):151-164.
9. Petersen RC, Aisen PS, Beckett LA, Donohue MC, Gamst AC, Harvey DJ, Jack CR Jr, Jagust WJ, Shaw LM, Toga AW, Trojanowski JQ, Weiner MW. Alzheimer’s Disease Neuroimaging Initiative (ADNI): clinical characterization. Neurology. 2010 Jan 19;74(3):201-9.
10. Iwatsubo T, Iwata A, Suzuki K, Ihara R, Arai H, Ishii K, Senda M, Ito K, Ikeuchi T, Kuwano R, Matsuda H; Japanese Alzheimer’s Disease Neuroimaging Initiative, Sun CK, Beckett LA, Petersen RC, Weiner MW, Aisen PS, Donohue MC; Alzheimer’s Disease Neuroimaging Initiative. Japanese and North American Alzheimer’s Disease Neuroimaging Initiative studies: Harmonization for international trials. Alzheimers Dement. 2018 Aug;14(8):1077-1087.
11. Iwata A, Iwatsubo T, Ihara R, Suzuki K, Matsuyama Y, Tomita N, Arai H, Ishii K, Senda M, Ito K, Ikeuchi T, Kuwano R, Matsuda H; Alzheimer’s Disease Neuroimaging Initiative; Japanese Alzheimer’s Disease Neuroimaging Initiative. Effects of sex, educational background, and chronic kidney disease grading on longitudinal cognitive and functional decline in patients in the Japanese Alzheimer’s Disease Neuroimaging Initiative study. Alzheimers Dement (N Y). 2018 Jul 12;4:765-774.
12. Ihara R, Iwata A, Suzuki K, Ikeuchi T, Kuwano R, Iwatsubo T; Japanese Alzheimer’s Disease Neuroimaging Initiative. Clinical and cognitive characteristics of preclinical Alzheimer’s disease in the Japanese Alzheimer’s Disease Neuroimaging Initiative cohort. Alzheimers Dement (N Y). 2018 Nov 26;4:645-651.
13. Ellis KA, Bush AI, Darby D, De Fazio D, Foster J, Hudson P, Lautenschlager NT, Lenzo N, Martins RN, Maruff P, Masters C, Milner A, Pike K, Rowe C, Savage G, Szoeke C, Taddei K, Villemagne V, Woodward M, Ames D; AIBL Research Group. The Australian Imaging, Biomarkers and Lifestyle (AIBL) study of aging: methodology and baseline characteristics of 1112 individuals recruited for a longitudinal study of Alzheimer’s disease. Int Psychogeriatr. 2009 Aug;21(4):672-87.
14. Clark CM, Schneider JA, Bedell BJ, Beach TG, Bilker WB, Mintun MA, Pontecorvo MJ, Hefti F, Carpenter AP, Flitter ML, Krautkramer MJ, Kung HF, Coleman RE, Doraiswamy PM, Fleisher AS, Sabbagh MN, Sadowsky CH, Reiman EP, Zehntner SP, Skovronsky DM; AV45-A07 Study Group. Use of florbetapir-PET for imaging beta-amyloid pathology. JAMA. 2011 Jan 19;305(3):275-83.
15. Shaw LM, Vanderstichele H, Knapik-Czajka M, Clark CM, Aisen PS, Petersen RC, Blennow K, Soares H, Simon A, Lewczuk P, Dean R, Siemers E, Potter W, Lee VM, Trojanowski JQ; Alzheimer’s Disease Neuroimaging Initiative. Cerebrospinal fluid biomarker signature in Alzheimer’s disease neuroimaging initiative subjects. Ann Neurol. 2009 Apr;65(4):403-13.
16. Yamane T, Ishii K, Sakata M, Ikari Y, Nishio T, Ishii K, Kato T, Ito K, Senda M; J-ADNI Study Group. Inter-rater variability of visual interpretation and comparison with quantitative evaluation of 11C-PiB PET amyloid images of the Japanese Alzheimer’s Disease Neuroimaging Initiative (J-ADNI) multicenter study. Eur J Nucl Med Mol Imaging. 2017 May;44(5):850-857.
17. Sato K, Mano T, Ihara R, Suzuki K, Tomita N, Arai H, Ishii K, Senda M, Ito K, Ikeuchi T, Kuwano R, Matsuda H, Iwatsubo T, Toda T, Iwata A; Alzheimer’s Disease Neuroimaging Initiative, and Japanese Alzheimer’s Disease Neuroimaging Initiative. Lower Serum Calcium as a Potentially Associated Factor for Conversion of Mild Cognitive Impairment to Early Alzheimer’s Disease in the Japanese Alzheimer’s Disease Neuroimaging Initiative. J Alzheimers Dis. 2019;68(2):777-788.
18. Sato K, Mano T, Matsuda H, Senda M, Ihara R, Suzuki K, Arai H, Ishii K, Ito K, Ikeuchi T, Kuwano R, Toda T, Iwatsubo T, Iwata A; Japanese Alzheimer’s Disease Neuroimaging Initiative. Visualizing modules of coordinated structural brain atrophy during the course of conversion to Alzheimer’s disease by applying methodology from gene co-expression analysis. Neuroimage Clin. 2019 Jul 25;24:101957.
19. Lautner R, Palmqvist S, Mattsson N, Andreasson U, Wallin A, Pålsson E, Jakobsson J, Herukka SK, Owenius R, Olsson B, Hampel H, Rujescu D, Ewers M, Landén M, Minthon L, Blennow K, Zetterberg H, Hansson O; Alzheimer’s Disease Neuroimaging Initiative. Apolipoprotein E genotype and the diagnostic accuracy of cerebrospinal fluid biomarkers for Alzheimer disease. JAMA Psychiatry. 2014 Oct;71(10):1183-91.
20. Max Kuhn. Contributions from Jed Wing, Steve Weston, Andre Williams, Chris Keefer, Allan Engelhardt, Tony Cooper, Zachary Mayer, Brenton Kenkel, the R Core Team, Michael Benesty, Reynald Lescarbeau, Andrew Ziem, Luca Scrucca, Yuan Tang, Can Candan and Tyler Hunt. (2018). caret: Classification and Regression Training. R package version 6.0-81. (https://CRAN.R-project.org/package=caret)
21. Pascoal TA, Mathotaarachchi S, Shin M, Park AY, Mohades S, Benedet AL, Kang MS, Massarweh G, Soucy JP, Gauthier S, Rosa-Neto P; Alzheimer’s Disease Neuroimaging Initiative. Amyloid and tau signatures of brain metabolic decline in preclinical Alzheimer’s disease. Eur J Nucl Med Mol Imaging. 2018 Jun;45(6):1021-1030.
22. Yosinski J., Clune J., Bengio Y., Lipson H. NIPS; 2014. How Transferable are Features in Deep Neural Networks? pp. 3320–3328.
23. Wee CY, Liu C, Lee A, Poh JS, Ji H, Qiu A; Alzheimers Disease Neuroimage Initiative. Cortical graph neural network for AD and MCI diagnosis and transfer learning across populations. Neuroimage Clin. 2019;23:101929.
24. Sato K, Ihara R, Suzuki K, Niimi Y, Toda T, Jimenez-Maggiora G, Langford O, Donohue MC, Raman R, Aisen PS, Sperling RA, Iwata A, Iwatsubo T. Predicting amyloid risk by machine learning algorithms based on the A4 screen data: Application to the Japanese Trial-Ready Cohort study. Alzheimers Dement (N Y). 2021 Mar 24;7(1):e12135.

THE COMPUTERIZED COGNITIVE COMPOSITE (C3) IN A4, AN ALZHEIMER’S DISEASE SECONDARY PREVENTION TRIAL

 

K.V. Papp1,2, D.M. Rentz1,2, P. Maruff3,4, C.-K. Sun5, R. Raman5, M.C. Donohue5, A. Schembri4, C. Stark6, M.A. Yassa6, A.M. Wessels7, R. Yaari7, K.C. Holdridge7, P.S. Aisen5, R.A. Sperling1,2 on behalf of the A4 Study Team*

 

1. Department of Neurology, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA; 2. Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA; 3. The Florey Institute of Neuroscience and Mental Health, University of Melbourne, Parkville, Victoria, Australia; 4. Cogstate, Ltd, Melbourne, Victoria, Australia; 5. Alzheimer Therapeutic Research Institute, Keck School of Medicine, University of Southern California, San Diego, CA, USA; 6. Center for the Neurobiology of Learning and Memory and Department of Neurobiology and Behavior, University of California Irvine, Irvine, California, USA; 7. Eli Lilly and Company, Indianapolis, Indiana, USA; *Full listing of A4 Study team and site personnel available at A4STUDY.org

Corresponding Author: Kathryn V. Papp, Center for Alzheimer Research and Treatment; 60 Fenwood Road; Boston, MA 02115, Telephone: 617-643-5322; Fax: 857-5461, Email Address: kpapp@bwh.harvard.edu

J Prev Alz Dis 2021;1(8):59-67
Published online June 19, 2020, http://dx.doi.org/10.14283/jpad.2020.38

 


Abstract

Background: Computerized cognitive assessments may improve Alzheimer’s disease (AD) secondary prevention trial efficiency and accuracy. However, they require validation against standard outcomes and relevant biomarkers.
Objective: To assess the feasibility and validity of the tablet-based Computerized Cognitive Composite (C3).
Design: Cross-sectional analysis of cognitive screening data from the A4 study (Anti-Amyloid in Asymptomatic AD).
Setting: Multi-center international study.
Participants: Clinically normal (CN) older adults (65-85; n=4486).
Measurements: Participants underwent florbetapir-Positron Emission Tomography for Aβ+/- classification. They completed the C3 and standard paper and pencil measures included in the Preclinical Alzheimer’s Cognitive Composite (PACC). The C3 combines memory measures sensitive to change over time (Cogstate Brief Battery: One Card Learning) and measures shown to decline early in AD, including pattern separation (Behavioral Pattern Separation Test-Object: Lure Discrimination Index) and associative memory (Face Name Associative Memory Exam: Face-Name Matching). C3 acceptability and completion rates were assessed using qualitative and quantitative methods. C3 performance was explored in relation to Aβ+/- groups (n=1323/3163) and PACC.
Results: C3 was feasible for CN older adults to complete. Rates of incomplete or invalid administrations were extremely low, even in the bottom quartile of cognitive performers (PACC). C3 was moderately correlated with PACC (r=0.39). Aβ+ performed worse on C3 compared with Aβ- [unadjusted Cohen’s d=-0.22 (95%CI: -0.31,-0.13) p<0.001] and at a magnitude comparable to the PACC [d=-0.32 (95%CI: -0.41,-0.23) p<0.001]. Better C3 performance was observed in younger, more educated, and female participants.
Conclusions: These findings provide support for both the feasibility and validity of C3 and computerized cognitive outcomes more generally in AD secondary prevention trials.

Key words: Digital biomarkers, cognition, computerized testing, preclinical Alzheimer’s disease, secondary prevention.


 

Introduction

Computerized cognitive assessments have the potential to significantly reduce data administration and scoring errors, site burden, and cost in Alzheimer’s disease (AD) secondary prevention trials as cognitive screening tools and outcome measures. These assessments have yet to replace paper and pencil measures as primary outcomes given several remaining questions: How feasible are computerized assessments in normal older adults and older adults who progress to Mild Cognitive Impairment (MCI) over the course of a trial? How reliable is the data collected? And finally, how valid are computerized cognitive assessments, that is, are they related to gold-standard paper and pencil primary outcomes and AD pathology targeted in a given intervention?
The Anti-Amyloid in Asymptomatic Alzheimer’s (A4) study (1, 2) offers a unique opportunity to address some of these questions by assessing the feasibility and validity of the Computerized Cognitive Composite (C3) in a very large multi-site AD secondary prevention study targeting clinically normal (CN) older adults with elevated cerebral amyloid (2). The C3 is derived using two well-validated memory paradigms from the cognitive neuroscience literature: the Face Name Associative Memory Exam (FNAME) and the Behavioral Pattern Separation Task-Object (BPS-O). It also includes measures from the Cogstate Brief Battery (CBB) which uses playing cards to assess visual memory in addition to reaction time (RT) and working memory and was designed to be sensitive to change over time with randomized alternate forms. The CBB has been studied in relationship to AD neuroimaging markers in several cohort studies of normal older adults (3, 4). Behavioral versions of the FNAME (5, 6) and a modified version of the BPS-O (7) were selected for inclusion in the C3 as they have been shown to elicit aberrant activity in the medial temporal lobes during functional imaging studies in individuals at risk for AD based on biomarkers (8-10). More specifically, these individuals fail to habituate to repeated stimuli (FNAME) or during both correct rejections and false alarms (BPS-O), neural signatures consonant with successful memory formation. The C3 was identified a-priori to include one primary memory outcome from each component measure including: the BPS-O lure discrimination index, Face-Name Matching accuracy, and One-Card Learning accuracy.
The aim of this study was to assess the feasibility and validity of the C3 in CN older adults participating in a secondary prevention trial. Specific goals included determining whether reliable C3 data was consistently captured using a touchscreen tablet and whether data reliability decreased in the lowest cognitive performers. To assess the validity of the C3, we investigated 1) whether the C3 was related to the primary study outcome: performance on traditional paper and pencil measures (i.e., the Preclinical Alzheimer’s Cognitive Composite- PACC) 2) whether the C3 was related to cerebral amyloid (Aβ) and 3) whether the magnitude of this relationship was comparable to that observed between PACC and Aβ+/-. In addition to our main aims, we explored whether improved performance with C3 retesting using alternate forms differentiated between Aβ+/- individuals above and beyond cross-sectional performance. Finally, we explored performance on the constituent tests from the C3 and their relationships with Aβ status, demographic characteristics, and paper and pencil measures. The implications of these findings as they relate to the design and use of future computerized outcomes in secondary prevention trials are discussed.

 

Methods

Participants and Study Design

The A4 Study is a double-blind, placebo-controlled 240-week Phase 3 trial of an anti-Aβ monoclonal antibody in CN older adults with preclinical AD (2), conducted across 67 sites. Participants interested in enrolling in A4 were required to be aged 65 to 85 and were deemed clinically normal (CN) based on a Mini-Mental State Examination (MMSE) score of 25-30 and a global Clinical Dementia Rating (CDR) score of 0. During their initial screening visit, participants completed traditional and computerized cognitive testing (detailed further below). Prior to enrollment, they underwent florbetapir Positron Emission Tomography (PET) for classification of Aβ status (Table 1) at a second visit. On their third visit, all potential participants completed computerized testing and were subsequently provided with the results of their AD biomarker imaging and informed whether they were eligible (Aβ+) or ineligible (Aβ-) to enroll in the trial. The current study includes cognitive screening data at two timepoints for Aβ+ and Aβ- individuals.

Table 1. Participant Characteristics by Aβ Status

NOTE. Two-sample t-test with unequal variances were used for continuous variables and Fisher’s Exact test for categorical variables. Values are Mean (Standard Deviation) unless otherwise indicated.

 

Cognitive Measures

The primary outcome for the A4 Study is performance on the PACC, a multi-domain composite of paper and pencil measures (11). Measures contributing to the C3 are administered on a touchscreen tablet using the Cogstate platform and serve as an exploratory outcome. All participants completed the PACC and C3 at the first screening visit (Visit 1) and an alternate C3 within 90 days (mean=55 days) at the study eligibility visit (Visit 3) prior to study eligibility disclosure.

Paper and Pencil Cognitive Testing: The PACC

The PACC, described in detail elsewhere (11), is calculated by normalizing each of four component measures to a z-score and summing these z-scores: the MMSE (0–30), the WMS-R Logical Memory Delayed Recall (LMDR; 0–25), the Digit-Symbol Coding Test (DSC; 0–93), and the Free and Cued Selective Reminding Test–Free + Total Recall (FCSRT96; 0–96) (2).
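As a minimal illustration of this type of z-score composite (not the A4 scoring code; the column names mmse, lmdr, dsc, and fcsrt96 are hypothetical placeholders), one could compute it in R as follows:

```r
# Sketch of a PACC-style composite: z-score each component against the
# sample mean/SD, then sum the four z-scores per participant.
compute_pacc <- function(df) {
  components <- c("mmse", "lmdr", "dsc", "fcsrt96")  # hypothetical names
  z <- scale(df[, components])   # (x - mean) / sd for each column
  rowSums(z)                     # sum of the four z-scores
}

# Example with simulated data
set.seed(1)
df <- data.frame(mmse = rnorm(10, 28, 1.5), lmdr = rnorm(10, 12, 4),
                 dsc = rnorm(10, 45, 10), fcsrt96 = rnorm(10, 80, 8))
df$pacc <- compute_pacc(df)
```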

Computerized Testing: The C3

Figure 1 provides a schematic of C3 Components: BPS-O, FNAME and the CBB. An examiner is present in the testing room and initially guides administration, but the battery has the potential to be completed largely independently in the context of written on-screen instructions and automatic transitions between tasks (12).

 

Figure 1. C3 Task Schematic

NOTE. All tasks are completed on a tablet using a touchscreen. Stimuli in gray are not scored.

 

Behavioral Pattern Separation- Object (BPS-O; more recently termed the Mnemonic Similarity Test)

Participants are presented with images of 40 everyday objects serially and are allotted 5 seconds to determine whether the item is for use “indoors” or “outdoors” to ensure adequate attentiveness to stimuli (7). Participants are subsequently shown 20 of the same items interspersed with both novel images and lure images. They are asked to categorize each image as: Old, Similar, or New within 5 seconds. Accuracy and RT measures are collected. Of interest is the rate at which participants can correctly identify lures as “Similar” rather than as “Old.” The lure discrimination index (LDI) is computed as the proportion of “Similar” responses given to lure items minus the proportion of “Similar” responses given to the foils (the latter is to correct for response bias). The LDI is the primary outcome from the BPS-O task. A higher LDI indicates better pattern separation performance.
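A minimal R sketch of this index, assuming hypothetical trial-level vectors of responses and item types (this is not the Cogstate scoring code):

```r
# LDI as described above: P("Similar" | lure) minus P("Similar" | foil),
# the subtraction correcting for response bias.
lure_discrimination_index <- function(responses, item_type) {
  p_similar_lure <- mean(responses[item_type == "lure"] == "Similar")
  p_similar_foil <- mean(responses[item_type == "foil"] == "Similar")
  p_similar_lure - p_similar_foil
}

# Example: 6 lure trials followed by 6 foil trials
responses <- c("Similar", "Old", "Similar", "Similar", "Old", "Similar",
               "New", "Similar", "New", "New", "New", "New")
item_type <- rep(c("lure", "foil"), each = 6)
lure_discrimination_index(responses, item_type)  # 4/6 - 1/6 = 0.5
```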

Face-Name Associative Memory Exam (FNAME)

Participants are shown 12 face-name pairs presented serially. For each face-name pair, the participant is asked whether the name “fits” or “doesn’t fit” the face to ensure adequate attentiveness to the stimuli. Participants are allowed 5 seconds to respond and are asked to try to remember the face-name pair. Following the learning phase, the CBB tests serve as a 12 to 15-minute delay. Subsequently, there are three measures of memory including face recognition (FSBT), first letter name recall (FNLT) and face-name matching (FNMT). In FSBT, participants are asked to identify the previously learned faces, presented alongside two distractor faces of matching age, race, and sex. The target face is subsequently presented with a touchscreen keyboard and the participant selects the first letter of the name paired with that face (FNLT). Finally, the target face is presented with three names (target name, a re-paired same-sex name, and an age and sex-matched foil name) and the participant must select the correct name (FNMT). Accuracy for each component is scored /12 with FNMT number of correct matches serving as the primary outcome of interest.

Cogstate Brief Battery (CBB)

The CBB (13, 14) uses playing cards as stimuli and includes a measure of attention (Detection-DET), reaction time (RT; Identification-IDN), working memory (One-Back Test-ONB), and visual memory (One-Card Learning-OCL). Measures of RT and accuracy are recorded. To address skewness, a log10 transformation is applied to RT measures and an arcsine square-root transformation is applied to accuracy measures. In DET, participants are required to tap ‘Yes’ as quickly as possible in response to a stimulus card turning face-up. The task continues until 35 correct trials are recorded. The outcome is RT. In IDN, a participant must select whether the card is red or not red; thirty correct trials are required. RT is the primary outcome for IDN; IDN accuracy was also examined. In ONB, participants must indicate “yes” or “no” whether the current card is equivalent to the previously seen card. In OCL, participants must learn a series of playing cards by responding ‘yes’ or ‘no’ to whether the card has been previously seen in the task. For ONB and OCL, both RT and accuracy are computed. Here, we examined RT and Accuracy for both IDN and ONB. We examined only RT for DET and only Accuracy for OCL.
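The transformations described above are straightforward to express; the R sketch below is illustrative only (variable names and example values are hypothetical):

```r
# log10 for reaction times and arcsine-square-root for accuracy,
# where accuracy is a proportion in [0, 1].
transform_rt  <- function(rt_ms)    log10(rt_ms)
transform_acc <- function(acc_prop) asin(sqrt(acc_prop))

transform_rt(c(450, 520, 610))     # e.g., IDN reaction times in ms
transform_acc(c(0.80, 0.95, 1.0))  # e.g., OCL accuracy proportions
```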

The C3

Constituents of the C3 were identified a-priori and include one primary memory outcome from each measure: the BPS-O LDI, FNMT, and OCL accuracy. The C3 is computed as the average of these z-scored outcomes, standardized against the study population at Visit 1.
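A minimal sketch of this composite in R, assuming hypothetical column names for the three primary outcomes (bpso_ldi, fnmt, ocl_acc):

```r
# C3 as described: average of the three z-scored primary outcomes,
# standardized against the Visit 1 study sample. Column names are
# placeholders, not the study's actual variable names.
compute_c3 <- function(df) {
  z <- scale(df[, c("bpso_ldi", "fnmt", "ocl_acc")])  # z-score each outcome
  rowMeans(z)                                         # average across tasks
}
```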

Data Quality

Data from individual C3 measures were included in analyses if they met pre-specified task-specific completion checks (Supplementary Table 1). For example, OCL for a given participant is included in analyses if the participant responds in ≥75% of trials. Study rater comments were also reviewed to better determine C3 usability and acceptability.
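A minimal sketch of such a completion check in R, using the OCL rule above as an example (column names are hypothetical):

```r
# Keep an OCL administration only if the participant responded on >= 75%
# of trials; 'ocl_responses' and 'ocl_trials' are assumed column names.
ocl_valid <- function(n_responses, n_trials) (n_responses / n_trials) >= 0.75

# e.g., filter before analysis:
# df_ocl <- df[ocl_valid(df$ocl_responses, df$ocl_trials), ]
```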

Amyloid PET Imaging

Eligible participants completed a florbetapir PET scan at Visit 2. Scans were acquired 50-70 minutes after injection of 10 mCi of florbetapir-F18. Aβ binding was assessed using the mean standardized uptake value ratio (SUVr) with whole cerebellar gray as the reference region. Participants were deemed eligible (Aβ+) versus not eligible (Aβ-) using an algorithm combining quantitative SUVr (>1.15) information and a centrally-determined visual read (2).
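A simplified sketch of this kind of eligibility rule is shown below. The handling of borderline SUVr values together with a positive visual read is an assumption for illustration only, not the exact A4 algorithm:

```r
# Simplified illustration of a combined quantitative/visual eligibility rule.
# The 1.10-1.15 borderline band requiring a positive visual read is an
# assumed example, not the published A4 threshold logic.
abeta_positive <- function(suvr, visual_read_positive) {
  suvr > 1.15 | (visual_read_positive & suvr >= 1.10)
}

abeta_positive(c(1.20, 1.12, 1.05), c(FALSE, TRUE, TRUE))  # TRUE TRUE FALSE
```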

Statistical Analyses

Primary analyses were performed on the C3 at Visit 1. To assess C3 feasibility and data validity, test completion rates and performance checks were computed (Supplementary Table 1) and rates subsequently compared between Aβ+/- groups using Chi-square tests. Rater comments were systematically reviewed and observations by raters were grouped into categories (e.g., technical issue, interruptions) and the frequency of observations made in each category were computed. To infer C3 feasibility and data validity in those who may develop impairment over the course of the A4 study, we compared test completion rates and performance checks between the lowest cognitive performers (bottom quartile on PACC) with typical cognitive performers using chi square tests.
Demographic differences between Aβ+/- groups were assessed using Welch’s two-sample t-tests for continuous variables and Fisher’s Exact test for categorical variables (e.g., age, APOE). Linear models were fit to compare cognitive performance across males and females. Linear models were fit to compare cognitive performance across Aβ+/- while adjusting for covariates: age, sex, and education. Effect size was computed as a Cohen’s d (mean difference between Aβ+ and Aβ- groups divided by the pooled standard deviation) with 0.01 representing a “very small” effect, 0.20 representing a “small” effect, and 0.5 representing a “medium” effect (15). Comparable linear models were performed and effect sizes calculated for individual C3 components to examine Aβ+/- group differences on individual C3 measures (e.g., OCL, ONB, BPS-O). No adjustments were made for multiple comparisons; however, results are reported as point estimates and 95% confidence intervals.
Differences in performance between Visit 1 and Visit 3 were examined using linear models of difference scores with Aβ status, age, sex, and education as covariates.
Pearson correlation coefficients were computed to assess the relationships between C3 and demographic characteristics as well as C3 and the PACC. Pearson correlation coefficients were similarly used to assess the relationships among C3 components and PACC components to assess the convergent and discriminant validity between memory versus non-memory tasks on C3 versus PACC.
Linear models were also fit to compare cognitive performance between ε4+/- while adjusting for covariates: age, sex, and education.
All analyses were conducted using R version 3.6.1 (R-project.org).
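For readers who wish to reproduce analyses of this kind, the R sketch below illustrates the main building blocks described above (Welch’s t-test, Fisher’s exact test, a covariate-adjusted linear model, and a pooled-SD Cohen’s d) on simulated data. It is not the study’s analysis code, and all variable names are hypothetical:

```r
set.seed(42)
df <- data.frame(
  abeta = factor(rep(c("neg", "pos"), c(300, 130))),
  age = rnorm(430, 71, 5), educ = rnorm(430, 16, 3),
  sex = factor(sample(c("F", "M"), 430, replace = TRUE))
)
df$c3 <- rnorm(430, 0, 1) - 0.2 * (df$abeta == "pos")  # simulated group effect

t.test(age ~ abeta, data = df)                         # Welch's t-test (default)
fisher.test(table(df$sex, df$abeta))                   # categorical contrast
summary(lm(c3 ~ abeta + age + sex + educ, data = df))  # covariate-adjusted effect

# Cohen's d: mean difference divided by the pooled standard deviation
cohens_d <- function(x, y) {
  sp <- sqrt(((length(x) - 1) * var(x) + (length(y) - 1) * var(y)) /
             (length(x) + length(y) - 2))
  (mean(x) - mean(y)) / sp
}
cohens_d(df$c3[df$abeta == "pos"], df$c3[df$abeta == "neg"])
```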

 

Results

Feasibility of the C3

Completion and performance checks were met in >98% of individual test administrations within the C3 (Supplementary Table 1) and were equivalent between Aβ+/- groups. Raters reported issues in approximately 4% of C3 administrations. The most commonly reported problem (reflecting 0.7% of administrations) was that the tablet was insufficiently responsive to a participant’s finger taps and/or the participant was mis-tapping by either hovering their fingers too close to the screen or by tapping too quickly. The second most commonly reported issue (0.5% of administrations) was overly deliberative responding on BPS-O and FNAME causing items to time-out. This was followed by non-specific technical issues (e.g., frozen program, interruptions from low battery signal or software update, glitches such as stimulus not loading or items auto-proceeding). Report of confusion with task instructions was very low (reported in 0.3% of administrations). Participants most commonly had difficulty understanding instructions for ONB and OCL; additionally, some reported confusion regarding the goal of the judgment component of BPS-O and FNAME learning components (i.e., indoor vs. outdoor, fits vs. doesn’t fit). Despite this, few participants (<3%) failed to make an “indoor/outdoor” or a “fits” judgment on more than 3 items. Participants refused to continue C3 testing in <0.002% of administrations, with the most common reasons being frustration and fatigue.

Predictions for the Feasibility of the C3 Longitudinally

To preliminarily estimate whether the C3 (to be completed at 6-month intervals for the A4 study duration) will remain feasible in participants experiencing cognitive decline, we examined C3 performance in the lowest cognitive performers on the PACC. The magnitude of the C3 Aβ group difference increased by a factor of 5.2 when restricting the Aβ+ group to the bottom quartile of PACC performers [adjusted Cohen’s d=-0.57 (95%CI: -0.68, -0.45), p<0.001]; however, no significant changes in rates of test completion or performance checks were observed.

Demographic and Clinical Characteristics

Aβ+ were older compared with Aβ- (Table 1). There were no group differences for sex or education level. Aβ+ exhibited a higher rate of ε4 positivity and higher proportion of Caucasians compared with Aβ-.

C3 Performance

Aβ+ performed worse on the C3 compared with Aβ- (unadjusted d=-0.22, adjusted d=-0.11), mirroring the Aβ+/- performance difference on the PACC (unadjusted d=-0.32, adjusted d=-0.18) (Figure 2; Table 2). Importantly, the majority of participants were performing in the normal range, with performance in Aβ+ on average only 0.08 standard deviations below the mean. In addition to Aβ positivity (Beta=-0.07, p=0.002), older age (Beta=-0.04, p<0.0001), less education (Beta=0.03, p<0.0001), and male sex (Beta=-0.10, p<0.0001) contributed to overall worse C3 performance. Models adjusted for demographic features generally resulted in smaller Aβ+/- effect sizes compared with unadjusted models (Figure 2). For example, there was a 66% decrease in effect size between the unadjusted (d=-0.22) and adjusted C3 (d=-0.11). C3 and PACC were moderately correlated (r=0.39, p<0.001). However, both contributed unique explanatory variance about Aβ+/- status when modeled together (Supplementary Table 2 Model A).
Improved performance at re-testing was observed for the C3, with an average increase of 0.25 standard deviations between visits (Beta=0.25, p<0.0001). However, there was no relationship between Aβ status and differential improvement on C3 re-testing (Beta=0.00, p=0.961). Importantly, Aβ+ continued to perform worse on the C3 compared with Aβ-, and this group difference was of a magnitude comparable to that at initial testing (re-testing Cohen’s d=-0.21, p<0.0001).

Table 2. Group Differences Between Aβ+ versus Aβ- on C3 at Screening Visit 1

Note. M=mean, SD=standard deviation; PACC=Preclinical Alzheimer’s Cognitive Composite; C3= Computerized Cognitive Composite; BPS-O= Behavioral Pattern Separation Task-Object; LDI=Lure Discrimination Index; FNAME=Face-Name Associative Memory Exam; FNLT=1st letter Name Recall; FNMT=Face-Name Matching; FSBT=Facial Recognition; CBB=Cogstate Brief Battery; RT=reaction time; Acc=Accuracy; DET=Detection; IDN=Identification; ONB=One-Back Test; OCL=One-Card Learning.

 

 

Figure 2. Covariate-Unadjusted and Adjusted Group Differences (Effect Sizes: Cohen’s d) Between Aβ+/Aβ- Groups at Screening Visit 1

Note. Smaller effect size (Cohen’s d) is associated with worse performance in Aβ+ (n=1323) relative to Aβ- (n=3163). Top (unadjusted) and bottom (covariate-adjusted). PACC=Preclinical Alzheimer’s Cognitive Composite; C3= Computerized Cognitive Composite; FNAME=Face-Name Associative Memory Exam; CBB=Cogstate Brief Battery; RT=reaction time; Acc=Accuracy

 

Individual C3 Components

Individual C3 components which showed statistically significant differences between groups were BPS-O LDI, FNAME FNMT, CBB IDN accuracy, ONB accuracy and RT, and OCL accuracy. When adjusting for demographics, FNAME FNMT and ONB RT were no longer significant. Interestingly, for IDN RT, Aβ+ exhibited a statistical trend towards unexpectedly faster RT compared with Aβ- (adjusted d=-0.06, p=0.055). Despite a trend towards being slightly faster, Aβ+ were less accurate for IDN compared with Aβ- (unadjusted d=-0.25, adjusted d=-0.14). IDN Accuracy was correlated with IDN RT (r=-0.30, p<0.001) such that generally faster RT for correct responses was associated with reduced overall accuracy. However, when both IDN Accuracy and IDN RT were incorporated into the same model to predict Aβ status, only reduced IDN Accuracy was a significant predictor (Supplementary Table 2 Model B).

Correlations Among C3 Components, Demographics, PACC

Age

Greater age was associated with worse performance across all C3 outcomes (Table 3). This association was strongest for the overall C3 Composite (r=-0.29, p<0.001). Age was least associated with RT tasks including DET (r=-0.13, p<0.001) and IDN (r=-0.11, p<0.001).

Table 3. Pearson correlation coefficients (r) Among C3 Components and Demographics

Note. Higher value represents better performance. PACC=Preclinical Alzheimer Cognitive Composite; C3= Computerized Cognitive Composite; BPS-O= Behavioral Pattern Separation Task-Object; LDI=Lure Discrimination Index; FNAME=Face-Name Associative Memory Exam; FNLT=1st letter Name Recall; FNMT=Face-Name Matching; FSBT=Facial Recognition; CBB=Cogstate Brief Battery; RT=reaction time; Acc=Accuracy; DET=Detection; IDN=Identification; ONB=One-Back Test; OCL=One-Card Learning; FCSRT=Free and Cued Selective Reminding Test; DSST=Digit Symbol Substitution Test

 

Education

Higher education was associated with better performance on all individual C3 outcomes, with the strongest association for OCL accuracy (r=0.13, p<0.001) followed by the overall C3 (r=0.12, p<0.001). The only exception was ONB RT, where faster performance was associated with lower education.

Sex

Women outperformed men on all components of FNAME including FNLT (d= -0.46, p<0.0001), FNMT (d= -0.36, p<0.0001), and FSBT (d= -0.39, p<0.0001). Women also outperformed men on IDN Accuracy (d= -0.16, p<0.0001) and ONB Accuracy (d=-0.08, p=0.019). Interestingly, however, men outperformed women on DET (d= -0.23, p<0.0001) and ONB RT (d= -0.12, p<0.001). Performance between the sexes was comparable for BPS-O, IDN RT, and OCL Accuracy.
On OCL, Aβ+ females did not perform differently compared with Aβ- females [Estimate=-0.00 (0.01), p=0.468]. However, Aβ+ males performed worse compared with Aβ- males [Estimate=-0.02 (0.01), p=0.0006]. This suggests that OCL captures subtle decrements in memory between Aβ+/- men but not women. A non-significant statistical trend toward the same pattern was observed for BPS-O.

PACC and C3

Components of the two composites tended to be more strongly correlated in a domain-specific manner, providing support for convergent and discriminant validity (Table 3). For example, DET and IDN were correlated with DSST at r=0.26 and 0.31, respectively, while not being significantly related to memory components of the PACC (FCSRT, Story Memory) or the MMSE.

The C3 and APOE Status

There was no difference in performance between APOEε4 carriers vs. non-carriers on the C3 [adjusted d=-0.03 (95% CI: -0.09, 0.03), p=0.379] or on individual C3 outcomes (not shown). In contrast with the models for Aβ+/-, the carrier vs. non-carrier group difference did not increase with the removal of demographic covariates [unadjusted d=0.03 (95% CI: -0.05, 0.10), p=0.470]. Finally, we did not observe an interaction between ε4 and Aβ status on the C3.

 

Discussion

Among a large sample of CN older adults screening for an AD secondary prevention trial, assessment of cognition using a tablet-based measure (C3) was feasible. Diminished C3 performance was associated with worse PACC performance and elevated Aβ. Although the magnitude of the Aβ+/- group difference was statistically small (d=-0.11, once adjusted for covariates), it was comparable to that observed on well-established and clinically meaningful paper and pencil measures included in the primary outcome, i.e., the PACC (d=-0.18). Performance on the C3 was also reliable, with a comparable Aβ+/- group effect on the C3 at retesting within 90 days. More broadly, these findings suggest that computerized testing has the potential to replace traditional paper and pencil primary outcomes in future trials, representing a potential shift in clinical trial cognitive assessment methodology. Additionally, these results further confirm the small but consistent association between Aβ burden and cognition cross-sectionally within a CN population.

Usability/Acceptability of the C3

The very low rates of incomplete and/or invalid administrations of the C3 battery indicate that supervised tablet-based cognitive testing has high acceptability among the older adults assessed, even those with little computer literacy. Rates of completion and performance-check failures remained low in a subset of low PACC performers, providing early evidence for the longitudinal feasibility of the C3 as some participants show progressive cognitive decrements over the course of the study. Study procedures required a rater to supervise C3 testing; however, raters noted that many participants did not require significant assistance after completing the first few measures. This was further evidenced by improved performance on re-testing as participants gained familiarity with the device and tasks. Future trials may consider further optimizing computerized tasks to be self-guided to reduce rater training and time. Potential barriers to tablet-based testing were infrequent, largely addressable, and unlikely to systematically affect performance on the C3. These included inexperience with tablets, leading both to mis-tapping and to difficulty registering finger taps. Many older adults emphasized accuracy over speed during learning trials, resulting in time-outs. Several of these issues can be addressed with modifications to instructions and design (e.g., including a timer indicator), while others will diminish over time with secular trends toward increased familiarity with digital technology.

The C3 Composite and Individual C3 Measures by Aβ+/-

Components of C3 tests which differed between Aβ+/- groups were primarily in memory (BPS-O; OCL) but also included working memory (ONB). The difference in pattern separation memory performance between Aβ+/- participants extends previous fMRI work showing an association between AD biomarkers (including Aβ-PET) and aberrant fMRI activity during learning on a pattern separation task in normal older adults (9) to a difference in frank performance. The BPS-O (10) was designed in part to capture a weakened “novelty signal”, that is, a reduced ability to correctly discriminate between stimuli that are similar but not identical to previously encountered targets. This tendency to misidentify similar lures as targets has been conceptualized as an error in pattern separation (16). Aβ group differences were also observed on face-name memory, but this effect was significantly attenuated when controlling for demographic features. In contrast with other C3 memory measures (OCL Accuracy and BPS-O), there was a significant sex effect whereby women generally performed better on all aspects of FNAME compared with men. This may be attributable to a general female advantage in verbal memory (17); however, it may also be related to the nature of the information. Previous work with FNAME indicates a diminishment of the sex effect when requiring memory for occupation-face versus name-face pairs (5, 18). Our findings from the CBB measures were consistent with previous results examining this battery in relation to AD neuroimaging markers in normal older adults. Poorer performance on OCL has been associated with higher levels of CSF phosphorylated-tau/Abeta42 in late middle-aged participants in the Wisconsin Registry for Alzheimer’s Prevention (4). Similarly, we found that OCL was sensitive to Aβ status. However, we also found that working memory (ONB) was relatively strongly associated with elevated Aβ. While C3 constituents were selected theoretically and a-priori, ONB may be considered for inclusion in future optimized and/or data-driven C3 versions. Interestingly, the Aβ+ group made more errors on a Cogstate RT task (IDN) but paradoxically also performed the task more quickly compared with the Aβ- group. These findings suggest that faster RT may, in fact, be a sign of subtle decrements. One explanation for this finding is that an age-associated decrease in the inhibition of pre-potent responses (19) may be more pronounced in preclinical AD. More broadly, it confirms that early cognitive changes in preclinical AD extend beyond memory (20, 21).
Part of the impetus for combining outcomes from the BPS-O, FNAME, and CBB into a C3 is aligned with the rationale for cognitive composites as primary endpoints (22): to maximize the signal-to-noise ratio in a population expected to exhibit subtle cognitive decrements. This was confirmed in our data, whereby the combination of FNMT, BPS-O, and OCL into the C3 resulted in a numerically larger effect size compared with any single one of these measures alone. However, there are multiple means of constructing composites, including data-driven approaches; for example, selecting measures most associated with Aβ cross-sectionally or measures most sensitive to change. The current C3 was theoretically derived on the basis of previous literature, and longitudinal data are needed to confirm its sensitivity over time. Importantly, different memory measures provided related but partially unique information about Aβ status. For example, both BPS-O and OCL were significant predictors of Aβ status when included in the same model (Supplementary Table 2 Model C). More recent work examining the heterogeneity of cognitive decline in early AD suggests that different atrophy patterns are associated with different cognitive trajectories (23). A cognitive composite would thus benefit from being sufficiently broad to avoid under- or overestimating decline in a given subgroup.
Our finding that OCL differentiated Aβ+ vs Aβ- men but not women casts the issue of heterogeneity in a different light. Males and females performed equivalently for visual memory of playing cards (OCL), but females outperformed males on face-name memory. We hypothesize that visual card-based tasks may be both more engaging and an area of relative strength for males versus females, in contrast with name memory (17). Regardless, these findings highlight the rationale for composite scores and the opportunity to use the C3 to better understand demographic and individual differences in performance and cognitive trajectories.

C3 Performance and ε4 Status

The lack of a group difference in C3 performance between ε4 carriers vs. non-carriers is not unexpected given the specific recruitment of CN older adults and the current cross-sectional analysis. This is evidenced by the further diminishment of group differences between ε4+ vs. ε4- participants when including age as a covariate. In contrast, removal of age as a covariate systematically increased the Aβ+ vs. Aβ- group differences.

C3 and Re-testing

Consistent with the literature, participants performed slightly better on re-testing, reflecting increased familiarity with the tablet and task demands (3). Diminished practice effects have been shown to predict incident MCI and/or dementia (24, 25) and have been suggested as a screening tool (26). However, we did not observe differential improvement in performance by Aβ group status. Future adjustments to the FNAME paradigm emphasizing item versus task familiarity may increase the relevance of a diminished practice effect. More specifically, using repeated versus alternate stimuli may capture more AD-specific learning over repeated exposures to the same material (27). C3 practice effects are likely to diminish significantly after the second administration (24). Likewise, item-familiarity practice effects are unlikely to contribute to C3 trajectories over time given that all remaining versions are unique.

 

Conclusions

Our results indicate that computerized (tablet-based) cognitive testing is feasible in older adults in an AD secondary prevention trial setting, and we provide support for the validity of such testing as the C3 was 1) correlated with the primary outcome of paper and pencil composite performance (PACC), 2) related to AD pathological burden (Aβ+/-), and 3) related to Aβ+/- at a similar magnitude as the PACC. Positive relationships with AD biomarkers and the PACC suggest that the C3 is capturing meaningful cognitive decrements and has the potential to serve as a proxy for paper and pencil measures in future trials. In addition to reducing staff time and allowing the possibility of remote assessment, computerized testing has the potential to capture a greater quantity and more nuanced quality of data for each measure. Future work will determine the sensitivity of the C3 to change over time in the context of an anti-amyloid treatment trial.

 

Acknowledgments and funding: The A4 Study is a secondary prevention trial in preclinical Alzheimer’s disease, aiming to slow cognitive decline associated with brain amyloid accumulation in clinically normal older individuals. The A4 Study is funded by a public-private-philanthropic partnership, including funding from the National Institutes of Health-National Institute on Aging (U19AG010483; R01AG063689), Eli Lilly and Company, Alzheimer’s Association, Accelerating Medicines Partnership, GHR Foundation, an anonymous foundation and additional private donors, with in-kind support from Avid, Cogstate, Albert Einstein College of Medicine, US Against Alzheimer’s disease, and Foundation for Neurologic Diseases. The companion observational Longitudinal Evaluation of Amyloid Risk and Neurodegeneration (LEARN) Study is funded by the Alzheimer’s Association and GHR Foundation. The A4 and LEARN Studies are led by Dr. Reisa Sperling at Brigham and Women’s Hospital, Harvard Medical School and Dr. Paul Aisen at the Alzheimer’s Therapeutic Research Institute (ATRI), University of Southern California. The A4 and LEARN Studies are coordinated by ATRI at the University of Southern California, and the data are made available through the Laboratory for Neuro Imaging at the University of Southern California. The participants screening for the A4 Study provided permission to share their de-identified data in order to advance the quest to find a successful treatment for Alzheimer’s disease. We would like to acknowledge the dedication of all the participants, the site personnel, and all of the partnership team members who continue to make the A4 and LEARN Studies possible. The complete A4 Study Team list is available on: a4study.org/a4-study-team.

Conflicts of interest: K Papp has served as a consultant for Biogen Idec and Digital Cognition Technologies. D Rentz has served as a consultant for Eli Lilly, Biogen Idec, Lundbeck Pharmaceuticals, and serves as a member of the Scientific Advisory Board for Neurotrack. P Maruff is a full-time employee of Cogstate Ltd. C-K. Sun has no disclosures to report. R. Raman has no disclosures to report. M. Donohue has served on scientific advisory boards for Biogen, Eli Lilly, and Neurotrack; and has consulted for Roche. His spouse is a full-time employee of Janssen. A. Schembri is a full-time employee of Cogstate Ltd. C. Stark has no disclosures to report. M Yassa has served as a consultant for Pfizer, Eli Lilly, Lundbeck and Dart Neuroscience and is chief scientific officer of Signa Therapeutics, LLC. A. Wessels is a full-time employee of Eli Lilly and Company. R. Yaari is a full-time employee of Eli Lilly and Company. K. Holdridge is a full-time employee of Eli Lilly and Company. P. Aisen has received research funding from NIA, FNIH, the Alzheimer’s Association, Janssen, Lilly and Eisai, and personal fees from Merck, Roche, Biogen, ImmunoBrain Checkpoint and Samus. R.A. Sperling has received research funding from NIH, Alzheimer’s Association and Eli Lilly for this research. She has served as a consultant for AC Immune, Biogen, Eisai, Janssen, Neurocentria and Roche. Her spouse has served as a consultant to Biogen, Janssen, and Novartis.

Ethical Standards: Study procedures were conducted in accordance with consensus ethics principles derived from international ethics guidelines, including the Declaration of Helsinki and Council for International Organizations of Medical Sciences (CIOMS) International Ethical Guidelines.

Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, duplication, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

 

SUPPLEMENTARY MATERIAL

 

References

1. Sperling, R.A., Rentz, D.M., Johnson, K.A., et al., The A4 study: stopping AD before symptoms begin? Sci Transl Med, 2014. 6(228): p. 228fs13.
2. Sperling, R.A., Donohue, M., Raman, R., et al., Factors associated with elevated amyloid burden in clinically normal older individuals in the A4 Study screening cohort. JAMA Neurology, in press.
3. Mielke, M.M., Weigand, S.D., Wiste, H.J., et al., Independent comparison of CogState computerized testing and a standard cognitive battery with neuroimaging. Alzheimers Dement, 2014. 10(6): p. 779-89.
4. Racine, A.M., Clark, L.R., Berman, S.E., et al., Associations between Performance on an Abbreviated CogState Battery, Other Measures of Cognitive Function, and Biomarkers in People at Risk for Alzheimer’s Disease. J Alzheimers Dis, 2016. 54(4): p. 1395-1408.
5. Papp, K.V., Amariglio, R.E., Dekhtyar, M., et al., Development of a psychometrically equivalent short form of the Face-Name Associative Memory Exam for use along the early Alzheimer’s disease trajectory. Clin Neuropsychol, 2014. 28(5): p. 771-85.
6. Rentz, D.M., Locascio, J.J., Becker, J.A., et al., Cognition, reserve, and amyloid deposition in normal aging. Ann Neurol, 2010. 67(3): p. 353-64.
7. Stark, S.M., Yassa, M.A., Lacy, J.W., and Stark, C.E., A task to assess behavioral pattern separation (BPS) in humans: Data from healthy aging and mild cognitive impairment. Neuropsychologia, 2013. 51(12): p. 2442-9.
8. Vannini, P., Hedden, T., Becker, J.A., et al., Age and amyloid-related alterations in default network habituation to stimulus repetition. Neurobiol Aging, 2012. 33(7): p. 1237-52.
9. Marks, S.M., Lockhart, S.N., Baker, S.L., and Jagust, W.J., Tau and beta-Amyloid Are Associated with Medial Temporal Lobe Structure, Function, and Memory Encoding in Normal Aging. J Neurosci, 2017. 37(12): p. 3192-3201.
10. Kirwan, C.B. and Stark, C.E., Overcoming interference: an fMRI investigation of pattern separation in the medial temporal lobe. Learn Mem, 2007. 14(9): p. 625-33.
11. Donohue, M.C., Sperling, R.A., Salmon, D.P., et al., The Preclinical Alzheimer Cognitive Composite: Measuring Amyloid-Related Decline. JAMA Neurol, 2014. 71(8): p. 961-970.
12. Rentz, D., Dekhtyar, M., Sherman, J., et al., The Feasibility of At-Home iPad Cognitive Testing For Use in Clinical Trials. J Prev Alzheimers Dis, 2016. 3(1): p. 8-12.
13. Fredrickson, J., Maruff, P., Woodward, M., et al., Evaluation of the usability of a brief computerized cognitive screening test in older people for epidemiological studies. Neuroepidemiology, 2010. 34(2): p. 65-75.
14. Maruff, P., Lim, Y.Y., Darby, D., et al., Clinical utility of the cogstate brief battery in identifying cognitive impairment in mild cognitive impairment and Alzheimer’s disease. BMC Psychol, 2013. 1(1): p. 30.
15. Sawilowsky, S.S., New Effect Size Rules of Thumb. Journal of Modern Applied Statistical Methods, 2009. 8(2): p. 26.
16. Yassa, M.A., Lacy, J.W., Stark, S.M., et al., Pattern separation deficits associated with increased hippocampal CA3 and dentate gyrus activity in nondemented older adults. Hippocampus, 2011. 21(9): p. 968-79.
17. Sundermann, E.E., Biegon, A., Rubin, L.H., et al., Does the Female Advantage in Verbal Memory Contribute to Underestimating Alzheimer’s Disease Pathology in Women versus Men? J Alzheimers Dis, 2017. 56(3): p. 947-957.
18. Buckley, R., Sparks, K., Papp, K., et al., Computerized cognitive testing for use in clinical trials: a comparison of the NIH Toolbox and Cogstate C3 batteries. The journal of prevention of Alzheimer’s disease, 2017. 4(1): p. 3.
19. Butler, K.M. and Zacks, R.T., Age deficits in the control of prepotent responses: evidence for an inhibitory decline. Psychol Aging, 2006. 21(3): p. 638-43.
20. Petersen, R.C., et al., Association of Elevated Amyloid Levels With Cognition and Biomarkers in Cognitively Normal People From the Community. JAMA Neurology, 2016. 73(1): p. 85-92.
21. Baker, J.E., Lim, Y.Y., Pietrzak, R.H., et al., Cognitive impairment and decline in cognitively normal older adults with high amyloid-beta: A meta-analysis. Alzheimers Dement (Amst), 2017. 6: p. 108-121.
22. Kozauer, N. and Katz, R., Regulatory innovation and drug development for early-stage Alzheimer’s disease. N Engl J Med, 2013. 368(13): p. 1169-71.
23. Zhang, X., Mormino, E.C., Sun, N., et al., Bayesian model reveals latent atrophy factors with dissociable cognitive trajectories in Alzheimer’s disease. Proc Natl Acad Sci U S A, 2016. 113(42): p. E6535-e6544.
24. Machulda, M.M., Pankratz, V.S., Christianson, T.J., et al., Practice effects and longitudinal cognitive change in normal aging vs. incident mild cognitive impairment and dementia in the Mayo Clinic Study of Aging. Clin Neuropsychol, 2013. 27(8): p. 1247-64.
25. Hassenstab, J., Ruvolo, D., Jasielec, M., et al., Absence of practice effects in preclinical Alzheimer’s disease. Neuropsychology, 2015. 29(6): p. 940-8.
26. Duff, K., Beglinger, L.J., Schultz, S.K., et al., Practice effects in the prediction of long-term cognitive outcome in three patient samples: a novel prognostic index. Arch Clin Neuropsychol, 2007. 22(1): p. 15-24.
27. Pihlajamaki, M., O’Keefe, K., O’Brien, J., Blacker, D., and Sperling, R.A., Failure of repetition suppression and memory encoding in aging and Alzheimer’s disease. Brain Imaging Behav, 2011. 5(1): p. 36-44.

GLOBAL ALZHEIMER’S PLATFORM TRIAL READY COHORTS FOR THE PREVENTION OF ALZHEIMER’S DEMENTIA

 

R. Sperling1, J. Cummings2, M. Donohue3, P. Aisen3

 

1. Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA; 2. Cleveland Clinic Lou Ruvo Center for Brain Health, Las Vegas, NV, USA; 3. Alzheimer’s Therapeutic Research Institute (ATRI), Keck School of Medicine, University of Southern California, San Diego, CA, USA

Corresponding Author: Reisa Sperling, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA, reisa@rics.bwh.harvard.edu
 
J Prev Alz Dis 2016;3(4):185-187
Published online August 16, 2016, http://dx.doi.org/10.14283/jpad.2016.108

 


 

Introduction

The recent launch of several Alzheimer’s disease (AD) clinical trials targeting the preclinical stage of the disease has highlighted the need for a paradigm shift in prevention trial recruitment. While there are multiple promising mechanisms to test, each new clinical trial can take up to 1-2 years to set up and likely 2-3 years to complete enrollment. Consequently, we are running out of time to outpace the public health epidemic precipitated by the aging of the world’s population.
The concept of preclinical AD is based on the now widely-accepted observation that amyloid accumulates in the brain for many years prior to the development of symptoms (1, 2).  In 2011, an international working group convened by the National Institute on Aging and the Alzheimer’s Association proposed a conceptual framework and operational research criteria for defining preclinical AD based on presence of amyloidosis, with or without neurodegeneration and subtle cognitive decline (3). In this framework, amyloidosis could be assessed using positron emission tomography (PET) imaging or cerebrospinal fluid (CSF) analysis (low CSF Aβ1-42).  A similar framework, but with a somewhat different lexicon was proposed by the International Working Group for New Research Criteria for the Diagnosis of AD, which has a similar concept of “preclinical AD,” but defines presymptomatic as individuals with autosomal dominant genetic risk, and refers to biomarker-positive individuals as “at-risk” (4).  
Since these criteria were first introduced, the concept of preclinical AD has evolved as increasing evidence has supported the hypothetical temporal evolution of AD biomarkers and clinical symptoms (5). Studies in populations with autosomal dominant forms of AD (ADAD), in particular, have suggested that disease markers can be detected in a predictable order prior to the expected onset of symptoms: changes in the CSF levels of amyloid 25 years before expected onset; amyloid deposition assessed using PET imaging 15 years before expected onset, and impaired episodic memory 10 years before expected onset (6).

 

The challenge of identifying and recruiting participants for secondary prevention trials

This evolving understanding of the earliest stages of the AD continuum has spawned secondary prevention trials in both genetic-at-risk and amyloid-at-risk cohorts, defined by an absence of clinically detectable impairment but the presence of either 1) a deterministic genetic mutation that confers near certainty of developing AD, or 2) biomarker evidence that amyloid has begun to accumulate in the brain. Identifying individuals who fit into these two categories has thus become a major challenge for those planning such trials.
The Global Alzheimer’s Platform (GAP) was established in 2013 as a collaboration of the Global CEO initiative on Alzheimer’s disease (CEOi) and the New York Academy of Sciences (NYAS). In parallel with initiatives in Europe, Canada, and Japan, GAP aims to coalesce the special expertise and infrastructure needed to accelerate clinical trials across all stages of AD, including preclinical stages. GAP comprises several components, including GAP-NET to support site infrastructure with pre-certifications, master contracts, and a centralized IRB (7); and GAP Trial Ready Cohorts for Preclinical and Prodromal Alzheimer’s Dementia (GAP TRC-PAD).
The goal of GAP TRC-PAD is to build an efficient and sustainable recruitment system for upcoming secondary prevention trials (Figure 1). Drawing from existing registries and studies, including the Brain Health Registry (BHR), the Alzheimer’s Prevention Registry (APR), the Cleveland Clinic’s Healthybrains.org (8), and the Imaging Dementia-Evidence for Amyloid Scanning (IDEAS) study, non-demented individuals over the age of 60 who are interested in participating in clinical trials will be invited to join the GAP Registry. Those who sign an electronic informed consent will be asked to submit data on demographics, family, medical, and lifestyle history, and cognitive function.

 

Figure 1. Structure of GAP TRC-PAD

 

GAP TRC-PAD set as its initial goal to identify a large number of potential participants, rapidly screen these individuals using an adaptive risk algorithm, and ultimately identify 1000 preclinical and 1000 prodromal participants as a “Trial Ready Cohort” for the first GAP clinical trials.  In addition, GAP seeks to develop and validate web-based cognitive and functional assessments for use in future trials.

 

Predicting amyloid status

Amyloid status can be determined by CSF studies or PET imaging, and in cognitively healthy controls, low baseline CSF Aβ1-42 was shown to be associated with future Aβ positivity (9). However, screening large numbers of people with these tests would be prohibitively expensive and not feasible from a pragmatic point of view. Thus, investigators have identified other inexpensive and non-invasive measures that are predictive of amyloid status. For example, in a population-based study of cognitively normal elderly, age and APOE genotype were shown to be predictive of amyloid accumulation (10); and in a study of clinically normal older individuals, subjective cognitive concerns (SCC) were shown to be predictive of Aβ positivity (11). APOEε4 genotype has also been linked to high amyloid burden in the Australian Imaging, Biomarkers and Lifestyle (AIBL) study of aging. Among cognitively healthy individuals, APOEε4 carriers were more than twice as likely to have positive amyloid PET scans compared to non-carriers (12). More recently, results from the AIBL study showed that high amyloid burden was associated with older age, subjective memory complaints, and APOEε4 genotype (13). Another novel “measure” that may be predictive of amyloid positivity is a lack of practice effects on cognitive testing. In a preliminary study, uptake of 18F-flutemetamol on PET imaging was five times higher in individuals with low practice effects on a delayed recall memory task compared to those with high practice effects (14).
Data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) also showed that among cognitively normal older adults with significant memory concern, elevated amyloid deposition and abnormal CSF biomarker were strongly associated with APOEε4 carrier status (15). Aβ-positive participants in ADNI and AIBL also show more decline in cognition compared to Aβ-negative participants as measured by the ADCS Preclinical Alzheimer’s Cognitive Composite (ADCS-PACC), especially among APOEε4 carriers (16).  
Working with ADNI data, investigators at the University of Southern California’s Alzheimer’s Therapeutic Research Institute (ATRI) developed an adaptive algorithm to predict amyloid status based on APOE carriage status, baseline scores on the ADCS-PACC, age, and family history.  Preliminary results suggest that this algorithm, applied to populations of cognitively normal (pre-clinical) and non-demented but cognitively impaired (prodromal) potential clinical trial participants, will enable identification of those likely to be amyloid positive. Amyloid testing on only these pre-selected groups should then reduce the number of screen fails, expedite the enrollment process, and reduce the overall costs of a study. A larger study is planned in 2016 to confirm these findings.   
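As an illustration of the general approach (and not the ATRI adaptive algorithm itself), a simple logistic-regression screener built on these same predictors might look like the following R sketch, in which the data, variable names, and coefficients are all simulated:

```r
# Illustrative only: rank individuals by modeled amyloid risk from APOE
# carriage, an ADCS-PACC-like score, age, and family history, then refer
# only the highest-risk individuals for confirmatory amyloid testing.
set.seed(7)
n <- 500
screen <- data.frame(
  apoe4 = rbinom(n, 1, 0.3),     # hypothetical carrier indicator
  pacc  = rnorm(n, 0, 1),        # hypothetical composite score
  age   = rnorm(n, 72, 5),
  famhx = rbinom(n, 1, 0.4)      # hypothetical family-history indicator
)
logit <- -4 + 1.2 * screen$apoe4 - 0.4 * screen$pacc +
         0.05 * screen$age + 0.3 * screen$famhx
screen$amyloid_pos <- rbinom(n, 1, plogis(logit))     # simulated outcome

fit <- glm(amyloid_pos ~ apoe4 + pacc + age + famhx,
           data = screen, family = binomial)
screen$risk <- predict(fit, type = "response")
head(screen[order(-screen$risk), ])   # candidates to refer for PET/CSF
```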
As more is learned about algorithmic function, it will be possible to adjust algorithms to yield specific populations of interest. For example, APOEε4 carriage may be highly influential in current amyloid-lowering strategies, but some drugs may have genotype-specific effects or side effects, and recruitment of both ε4 carriers and non-carriers may be important. Algorithm adjustment will be necessary to achieve this. Other biomarkers might also play a greater role in algorithms than currently conceptualized, including Tau PET imaging, which may be more useful in staging individuals along the preclinical/prodromal progression.
Some interventions might be most effective before substantial tangle accumulation or cerebral atrophy, suggestive of neuronal loss, has begun, whereas other interventions might have a greater effect after the onset of atrophy, when inflammation and tau-related cell death may have a greater role. Integrating Tau PET and magnetic resonance imaging (MRI) into the algorithm might assist in identifying varying pathologies in early populations that can be paired with different mechanisms of action of test therapies.
A greater range of data, from sleep measures to “low friction” assessments such as amount of cell phone use to higher-friction measures such as success in on-line games, might be integrated into future algorithms to identify patients in early phases of disease or to more fully characterize the range of abnormalities exhibited by minimally cognitively affected individuals.

 

Conclusions

Registries that capture large numbers of cognitively normal potential clinical trial participants will be essential to enable testing of interventions designed for secondary prevention. Equally important will be a means for quickly and accurately identifying individuals within the registry who meet the requirements of a particular study. The GAP Registry and GAP TRC-PAD are designed specifically to meet these needs, and in combination with the infrastructure developed by GAP-NET, should provide the integrated platform necessary for efficient clinical trials not only in the pre-dementia space, but across all disease stages.
The Anti-Amyloid Treatment in Asymptomatic Alzheimer’s Disease (A4) Study is the first prevention trial targeting individuals at risk for AD based on evidence of brain amyloid accumulation (17). Using the adaptive algorithm described in this paper, we anticipate being able to select from an initial registry of approximately 50,000 individuals, bringing in a subset for further screening with PET or other measures, ultimately yielding a trial-ready cohort of 1000 preclinical and 1000 prodromal patients, thus reducing the time required to fully enroll participants for the first GAP trials. While the A4 Study is designed to show the efficacy of an anti-amyloid agent in preventing the cognitive decline due to AD, equally important will be the additional information we acquire to inform future prevention trials: expediting enrollment, developing more sensitive endpoints, creating population-appropriate outcome measures, and identifying novel biomarkers, including theragnostic markers to track therapeutic responses.

 

Disclosures: Reisa Sperling has served as a consultant for Abbvie, Biogen, Bracket, Genentech, Lundbeck, Roche, and Sanofi. She has served as a co-investigator for Avid, Eli Lilly, and Janssen Alzheimer Immunotherapy clinical trials. She has spoken at symposia sponsored by Eli Lilly, Biogen, and Janssen. R. Sperling receives research support from Janssen Pharmaceuticals, and Eli Lilly and Co.. She also receives research support from the following grants: P01 AG036694, U01 AG032438, U01 AG024904, R01 AG037497, R01 AG034556, K24 AG035007, P50 AG005134, U19 AG010483, R01 AG027435, Fidelity Biosciences, Harvard NeuroDiscovery Center, and the Alzheimer’s Association. Paul Aisen has served as a consultant to the following companies:  NeuroPhage, Elan, Eisai, Bristol-Myers Squibb, Eli Lilly, Merck, Roche, Amgen, Genentech, Abbott, Pfizer, Novartis, AstraZeneca, Janssen, Medivation, Ichor, Lundbeck, Biogen, iPerian, Probiodrug, Anavex, Abbvie, Janssen, Cohbar.  Dr. Aisen receives research support from Eli Lilly, the Alzheimer’s Association and the NIH [NIA U01-AG10483 (PI), NIA U01-AG024904 (Coordinating Center Director), NIA R01-AG030048 (PI), and R01-AG16381 (Co-I)]. Jeffrey Cummings has received in kind research support from Avid Radiopharmaceuticals and Teva Pharmaceuticals. He has provided consultation to AbbVie, Acadia, ADAMAS, Alzheon, Anavex, AstraZeneca, Avanir, Biogen-Idec, Biotie, Boehinger-Ingelheim, Chase, Eisai, Forum, Genentech, Intracellular Therapies, Lilly, Lundbeck, Merck, Neurotrope, Novartis, Nutricia, Otsuka, Pfizer, Prana, QR Pharma, Resverlogix, Roche, Suven, Takeda, and Toyoma companies. He has provided consultation to GE Healthcare and MedAvante and owns stock in ADAMAS, Prana, Sonexa, MedAvante, Neurotrax, and Neurokos. Dr. Cummings owns the copyright of the Neuropsychiatric Inventory.

Acknowledgments: The authors wish to acknowledge the invaluable contributions of colleagues at the Harvard Aging Brain Study at Massachusetts General Hospital, the Center for Alzheimer Research and Treatment at the Brigham and Women’s Hospital, the Cleveland Clinic Lou Ruvo Center for Brain Health, and the Alzheimer Therapeutic Research Institute at the University of Southern California Keck School of Medicine. The authors also wish to thank Lisa Bain for assistance with manuscript preparation.

Conflict of interest: None.

 

References

1.    Jack CR, Jr., Knopman DS, Jagust WJ, Shaw LM, Aisen PS, Weiner MW, et al. Hypothetical model of dynamic biomarkers of the Alzheimer’s pathological cascade. Lancet Neurol. 2010;9(1):119-28.
2.    Morris JC, Price JL. Pathologic correlates of nondemented aging, mild cognitive impairment, and early-stage Alzheimer’s disease. Journal of molecular neuroscience : MN. 2001;17(2):101-18.
3.    Sperling RA, Aisen PS, Beckett LA, Bennett DA, Craft S, Fagan AM, et al. Toward defining the preclinical stages of Alzheimer’s disease: Recommendations from the National Institute on Aging and the Alzheimer’s Association workgroup. Alzheimers Dement. 2011;7(3):280-92.
4.    Dubois B, Feldman HH, Jacova C, Hampel H, Molinuevo JL, Blennow K, et al. Advancing research diagnostic criteria for Alzheimer’s disease: the IWG-2 criteria. Lancet Neurol. 2014;13(6):614-29.
5.    Jack CR, Jr., Knopman DS, Jagust WJ, Petersen RC, Weiner MW, Aisen PS, et al. Tracking pathophysiological processes in Alzheimer’s disease: an updated hypothetical model of dynamic biomarkers. Lancet Neurol. 2013;12(2):207-16.
6.    Bateman RJ, Xiong C, Benzinger TLS, Fagan AM, Goate A, Fox NC, et al. Clinical, cognitive, and biomarker changes in the Dominantly Inherited Alzheimer Network. N Engl J Med. 2012;367(9):795-804.
7.    Cummings J, Aisen P, Barton R, Bork J, Doody R, Dwyer J, et al. Re-Engineering Alzheimer Clinical Trials: Global Alzheimer’s Platform Network. J Prev Alz Dis. 2016;in press
8.    Zhong K, Cummings J. Healthbrains.org: Cleveland Clinic’s Push-Pull Approach to Trial Registration. J Prev Alz Dis. 2016;in press.
9.    Mattsson N, Insel PS, Donohue M, Jagust W, Sperling R, Aisen P, et al. Predicting Reduction of Cerebrospinal Fluid beta-Amyloid 42 in Cognitively Healthy Controls. JAMA Neurol. 2015;72(5):554-60.
10.    Mielke MM, Wiste HJ, Weigand SD, Knopman DS, Lowe VJ, Roberts RO, et al. Indicators of amyloid burden in a population-based study of cognitively normal elderly. Neurology. 2012;79(15):1570-7.
11.    Amariglio RE, Mormino EC, Pietras AC, Marshall GA, Vannini P, Johnson KA, et al. Subjective cognitive concerns, amyloid-beta, and neurodegeneration in clinically normal elderly. Neurology. 2015;85(1):56-62.
12.    Rowe CC, Ellis KA, Rimajova M, Bourgeat P, Pike KE, Jones G, et al. Amyloid imaging results from the Australian Imaging, Biomarkers and Lifestyle (AIBL) study of aging. Neurobiol Aging. 2010;31(8):1275-83.
13.    Zwan MD, Villemagne VL, Dore V, Buckley R, Bourgeat P, Veljanoski R, et al. Subjective Memory Complaints in APOEvarepsilon4 Carriers are Associated with High Amyloid-beta Burden. J Alzheimers Dis. 2015.
14.    Duff K, Foster NL, Hoffman JM. Practice effects and amyloid deposition: preliminary data on a method for enriching samples in clinical trials. Alzheimer Dis Assoc Disord. 2014;28(3):247-52.
15.    Risacher SL, Kim S, Nho K, Foroud T, Shen L, Petersen RC, et al. APOE effect on Alzheimer’s disease biomarkers in older adults with significant memory concern. Alzheimers Dement. 2015;11(12):1417-29.
16.    Donohue MC, Sperling RA, Salmon DP, Rentz DM, Raman R, Thomas RG, et al. The preclinical Alzheimer cognitive composite: measuring amyloid-related decline. JAMA Neurol. 2014;71(8):961-70.
17.    Sperling RA, Rentz DM, Johnson KA, Karlawish J, Donohue M, Salmon DP, et al. The A4 Study: Stopping AD Before Symptoms Begin? Sci Transl Med. 2014;6(228):228fs13.

ESTABLISHING CLINICAL RELEVANCE IN PRECLINICAL ALZHEIMER’S DISEASE

 

R.A. Sperling, R.E. Amariglio, G.A. Marshall, D.M. Rentz

 

Center for Alzheimer Research and Treatment, Brigham and Women’s Hospital and Massachusetts General Hospital, Harvard Medical School

Corresponding Author: Reisa A. Sperling, Harvard Medical School, Brigham and Women’s Hospital and Massachusetts General Hospital, Boston, USA, reisa@rics.bwh.harvard.edu

J Prev Alz Dis 2015;2(2):85-87
Published online April 7, 2015, http://dx.doi.org/10.14283/jpad.2015.56

Converging evidence suggests that the pathophysiologic processes underlying Alzheimer’s disease (AD) begin more than a decade prior to dementia (1, 2), starting with increased amyloid deposition followed by neurodegeneration and brain atrophy (3). Moreover, consensus is emerging that our best opportunity for intervention is likely prior to widespread neurodegeneration. This may mean that, for an anti-amyloid treatment to be effective, it will be necessary to treat the disease before symptoms appear. However, because the preclinical AD population by definition has no clinically detectable functional deficits, assessing the benefits of treatment becomes problematic.

Although cognition may be the earliest clinically relevant marker of disease progression, the temporal lag between the accumulation of AD pathology, cognitive decline, and functional impairment remains to be elucidated. Evidence indicates that early cognitive decline is followed closely by subtle functional impairments in high-level everyday activities, suggesting that more sensitive measures of cognition and function may serve as the best markers of early change.

Assessing cognition and function in early-stage disease

Connecting the dots between current cognitive and functional tests and clinically relevant markers of disease progression may be accomplished by a combination of: 1) updating traditional measures of function, 2) including more subjective measures of cognitive function, 3) developing performance-based functional measures, and 4) making cognitive tests more clinically relevant.

Several scales have been developed to assess activities of daily living (ADL) and instrumental ADLs (IADL) in clinical trials (4). In comparison to basic ADLs such as feeding, toileting, grooming, and bathing, IADLs represent more complex activities, such as managing finances and handling medications, which tap into cognitive abilities. Basic ADLs are typically impaired in moderate-to-severe AD, whereas decline in the ability to perform IADLs may occur at the MCI stage or earlier (5). A widely used IADL scale developed in 1997 (6) has recently been updated by the Alzheimer’s Disease Cooperative Study (ADCS IADL) to include items relevant in the 21st century, such as using and remembering passwords, smartphones, and the Internet. Both study participants and their study partners may be assessed using IADL scales.

Subjective memory concerns have also been used to assess subtle cognitive decline in the earliest stages of the disease, and studies have demonstrated an association between amyloid burden and subjective cognitive complaints (SCC) among normal elderly (7, 8). A number of measures can be used to assess subjective cognitive concerns. For example, the ADCS Cognitive Function Instrument (CFI) includes both self-report and partner rating scales. The CFI is simple to administer, asking subjects and partners to compare current cognitive abilities to one year ago. For example, participants are asked “Compared to one year ago, do you have more difficulty managing money (e.g., paying bills, calculating change, completing tax forms)?” Participants may answer yes, no, or maybe (or, for some items, “does not apply”), and a score is calculated from these answers.
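
As an illustration of the kind of scoring described above, the minimal sketch below totals questionnaire responses under an assumed weighting scheme (yes = 1, maybe = 0.5, no = 0, with “does not apply” items skipped); these weights and the handling of skipped items are placeholders for illustration, not the published CFI scoring rules.

```python
# Hypothetical sketch of a CFI-style questionnaire total.
# The response weights and the treatment of "does not apply"
# are assumptions for illustration, not the published algorithm.

RESPONSE_WEIGHTS = {"yes": 1.0, "maybe": 0.5, "no": 0.0}

def questionnaire_score(responses):
    """Sum item weights, skipping items marked 'does not apply'."""
    return sum(RESPONSE_WEIGHTS[r] for r in responses if r != "does not apply")

# Example: one respondent's answers to five items.
self_report = ["yes", "no", "maybe", "does not apply", "no"]
print(questionnaire_score(self_report))  # 1.5
```

In such a scheme, a higher total indicates more reported change relative to one year ago; self-report and partner forms can be scored separately and then considered together, consistent with the combined ratings discussed below.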

In a recent study, Amariglio et al. (9) compared longitudinal scores on the CFI between “CDR progressors,” i.e., subjects who progressed from CDR 0 to CDR 0.5 or 1.0, and “CDR stable” subjects, i.e., those who did not progress. CFI scores of CDR progressors separated early from the scores of CDR stable subjects and continued to increase over the four-year time frame of the study, while scores of CDR stable subjects remained essentially unchanged. In the same study, the investigators showed that the CFI was able to differentiate subjects who were APOEε4 carriers from those who were not. Carriage of the APOEε4 allele confers a substantially higher risk of developing sporadic, late-onset AD as well as an earlier age of onset (10). The study also examined the correlation between subjective CFI scores and objective cognitive testing, demonstrating that self and partner subjective ratings change at different rates. Initially, study partners’ ratings are somewhat less correlated with the participants’ actual performance, but as the disease progresses and cognition worsens, CFI scores from the study partners start to catch up. These changes may reflect the fact that individuals in the later stages of disease experience anosognosia, or lack of awareness of their impairments. The combination of self and partner ratings therefore appears to correlate better with cognitive testing than either alone.

Is it clinically relevant?

Regulatory agencies require demonstration of the clinical relevance of outcome assessments. While some of the most widely used neuropsychological tests may indeed reflect neurological processes that change over the course of the disease, it remains a challenge to capture these early changes using clinically relevant tests. For example, connecting the dots on trail making tests may assess multiple cognitive processes such as attention, visual search, psychomotor speed, and planning (11), but the relevance of this to an individual’s everyday functional capacity has not been demonstrated. It may be that the best way to capture early cognitive decline with a clinically relevant scale is to employ performance-based functional assessments. Our group recently described a new performance-based measure involving three high-level, real-life automated phone menu tasks frequently encountered by seniors: calling a pharmacy to refill a prescription, calling a health insurance company to select a new primary care provider, and making a bank transfer in order to have enough money to pay taxes (12). Performance on these tasks, measured as both the time required to complete the task and the number of errors, was compared between clinically normal elderly and patients with MCI. For both of these measures, statistically significant differences were seen between the two groups. In addition, a subgroup of these subjects underwent magnetic resonance imaging (MRI), which showed that impaired performance was associated with inferior temporal cortical thinning in clinically normal subjects. This finding aligns with other studies suggesting that cortical thinning may be one of the earliest biomarker correlates of cognitive and functional decline (13, 14), and further supports the use of sensitive performance measures for the assessment of decline in the earliest stages of the disease.

In terms of making cognitive tests more relevant, tests of memory problems encountered in real life, such as remembering names and faces and pattern separation (the ability to distinguish among very similar stimuli), may be especially sensitive in an aging population and particularly in the presence of amyloidosis (15). Moreover, for use as assessments to detect drug effects in clinical trials, repeated measures will be needed. Thus, computerized tests that provide frequent serial assessments may be especially useful. For use in the Dominantly Inherited Alzheimer Network (DIAN) and Anti-Amyloid Treatment in Asymptomatic Alzheimer’s Disease (A4) studies, we have developed an iPad version of a computerized cognitive composite, CogState “plus” (16). This composite combines the CogState brief battery, which assesses reaction time, working memory, and incidental learning, with face-name associative memory and behavioral pattern separation tests.
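
For readers less familiar with how cognitive composites are typically built, the sketch below shows one common construction: standardize each component test against reference means and standard deviations and average the resulting z-scores. This is an illustrative assumption only, not the actual CogState “plus” scoring algorithm; the component names, values, and reference statistics are placeholders.

```python
import numpy as np

# Placeholder component scores for one participant; names, values, and
# reference statistics are illustrative assumptions, not the CogState "plus" definition.
component_scores = {
    "reaction_time": 2.61,            # e.g., log10(milliseconds); lower is better
    "working_memory_accuracy": 1.05,
    "face_name_memory": 11.0,
    "pattern_separation": 0.67,
}

reference_stats = {                   # (mean, SD) from a reference sample
    "reaction_time": (2.65, 0.05),
    "working_memory_accuracy": (1.00, 0.10),
    "face_name_memory": (10.0, 3.0),
    "pattern_separation": (0.60, 0.12),
}

# Flip the sign for tests where a lower raw score means better performance,
# so that a higher z-score always reflects better cognition.
lower_is_better = {"reaction_time"}

def composite_z(scores, stats, flip):
    zs = []
    for test, value in scores.items():
        mean, sd = stats[test]
        z = (value - mean) / sd
        zs.append(-z if test in flip else z)
    return float(np.mean(zs))

print(round(composite_z(component_scores, reference_stats, lower_is_better), 2))
```

Averaging standardized scores in this way yields a single repeated-measures outcome per visit, which is what makes such composites convenient for the frequent serial assessments mentioned above.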

We are getting closer to closing the gap between cognitive assessments and clinical relevance, but thorny issues remain with regard to current and planned secondary prevention trials. Longer trials will be needed to demonstrate clinically meaningful change, a consequence of which will be greater attrition. Moreover, functional change on current measures appears to be non-linear, speeding up as an individual approaches the MCI stage of the disease. This means that such measures will be challenging to use in delayed-start designs.

The real-life relevance of early functional change also needs to be more clearly demonstrated. Insurance coverage is often linked to demonstrating financial benefit rather than improving quality of life, so it will be crucial to translate early functional impairments into financial consequences, for example by demonstrating how errors in filling a prescription result in increased costs from inadequate treatment.

Disclosures: Dr. Sperling has served as a consultant for Merck, Eisai, Janssen, Boehringer-Ingelheim, Isis, Lundbeck, Roche, and Genentech. Dr. Amariglio has no disclosures. Dr. Marshall has served as a consultant for Halloran, GliaCure, and Janssen Research & Development. Dr. Rentz has served as a consultant for Eli Lilly, Neurotrack, and Lundbeck.

Acknowledgments: The authors are supported by the National Institute on Aging (P01 AG036694, R01 AG046396, R01 AG027435, K24 AG035007, K23 AG044431, K23 AG033634, and U19 AG10483), Janssen, Eisai Inc., Eli Lilly and Company, Bristol-Myers Squibb, Fidelity Biosciences, the Alzheimer’s Association (LEARN, NIRG-12-243012), and other philanthropic organizations. The authors wish to acknowledge the invaluable contributions of Dr. Paul Aisen, Michael Donohue, and the ADCS, and colleagues at the Harvard Aging Brain Study for their contributions to this work. The authors also wish to thank Lisa Bain for assistance with the manuscript preparation.

References

1. Rowe CC, Ellis KA, Rimajova M, et al. Amyloid imaging results from the Australian Imaging, Biomarkers and Lifestyle (AIBL) study of aging. Neurobiol Aging. 2010;31(8):1275-83.

2. Bateman RJ, Xiong C, Benzinger TLS, et al. Clinical, cognitive, and biomarker changes in the Dominantly Inherited Alzheimer Network. N Engl J Med. 2012;367(9):795-804.

3. Jack CR, Jr., Knopman DS, Jagust WJ, et al. Tracking pathophysiological processes in Alzheimer’s disease: an updated hypothetical model of dynamic biomarkers. Lancet Neurol. 2013;12(2):207-16.

4. Robert P, Ferris S, Gauthier S, Ihl R, Winblad B, Tennigkeit F. Review of Alzheimer’s disease scales: is there a need for a new multi-domain scale for therapy evaluation in medical practice? Alzheimers Res Ther. 2010;2(4):24.

5. Marshall GA, Amariglio RE, Sperling RA, Rentz DM. Activities of daily living: where do they fit in the diagnosis of Alzheimer’s disease? Neurodegener Dis Manag. 2012;2(5):483-91.

6. Galasko D, Bennett D, Sano M, et al. An inventory to assess activities of daily living for clinical trials in Alzheimer’s disease. The Alzheimer’s Disease Cooperative Study. Alzheimer Dis Assoc Disord. 1997;11 Suppl 2:S33-9.

7. Amariglio RE, Becker JA, Carmasin J, et al. Subjective cognitive complaints and amyloid burden in cognitively normal older individuals. Neuropsychologia. 2012;50(12):2880-6.

8. Perrotin A, Mormino EC, Madison CM, Hayenga AO, Jagust WJ. Subjective cognition and amyloid deposition imaging: a Pittsburgh Compound B positron emission tomography study in normal elderly individuals. Arch Neurol. 2012;69(2):223-9.

9. Amariglio RE, Donohue MC, Marshall GA, et al. Tracking early decline in cognitive function in older individuals at risk for Alzheimer disease dementia: The Alzheimer’s Disease Cooperative Study Cognitive Function Instrument. JAMA Neurol. 2015.

10. Corder EH, Saunders AM, Strittmatter WJ, et al. Gene dose of apolipoprotein E type 4 allele and the risk of Alzheimer’s disease in late onset families. Science. 1993;261(5123):921-3.

11. Salthouse TA. What cognitive abilities are involved in trail-making performance? Intelligence. 2011;39(4):222-32.

12. Marshall G, Dekhtyar M, Bruno J, et al. A new performance-based activities of daily living instrument for early Alzheimer’s disease. AAIC. Copenhagen, Denmark: Alzheimers Dement; 2014:P365.

13. Pereira JB, Svenningsson P, Weintraub D, et al. Initial cognitive decline is associated with cortical thinning in early Parkinson disease. Neurology. 2014;82(22):2017-25.

14. Marshall GA, Lorius N, Locascio JJ, et al. Regional cortical thinning and cerebrospinal biomarkers predict worsening daily functioning across the Alzheimer’s disease spectrum. J Alzheimers Dis. 2014;41(3):719-28.

15. Rentz DM, Amariglio RE, Becker JA, et al. Face-name associative memory performance is related to amyloid burden in normal elderly. Neuropsychologia. 2011;49(9):2776-83.

16. Rentz D, Parra Rodriguez M, Amariglio R, Stern Y, Sperling R, Ferris S. Promising developments in neuropsychological approaches for the detection of preclinical Alzheimer’s disease: a selective review. Alzheimers Res Ther. 2013;5(6):58.