K.V. Papp1,2, D.M. Rentz1,2, P. Maruff3,4, C.-K. Sun5, R. Raman5, M.C. Donohue5, A. Schembri4, C. Stark6, M.A. Yassa6, A.M. Wessels7, R. Yaari7, K.C. Holdridge7, P.S. Aisen5, R.A. Sperling1,2 on behalf of the A4 Study Team*


1. Department of Neurology, Brigham and Women’s Hospital and Harvard Medical School, Boston, Massachusetts, USA; 2. Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA; 3. The Florey Institute of Neuroscience and Mental Health, University of Melbourne, Parkville, Victoria, Australia; 4. Cogstate, Ltd, Melbourne, Victoria, Australia; 5. Alzheimer Therapeutic Research Institute, Keck School of Medicine, University of Southern California, San Diego, CA, USA; 6. Center for the Neurobiology of Learning and Memory and Department of Neurobiology and Behavior, University of California Irvine, Irvine, California, USA; 7. Eli Lilly and Company, Indianapolis, Indiana, USA; *Full listing of A4 Study team and site personnel available at

Corresponding Author: Kathryn V. Papp, Center for Alzheimer Research and Treatment; 60 Fenwood Road; Boston, MA 02115, Telephone: 617-643-5322; Fax: 857-5461, Email Address:

J Prev Alz Dis 2021;8(1):59-67
Published online June 19, 2020,



Background: Computerized cognitive assessments may improve Alzheimer’s disease (AD) secondary prevention trial efficiency and accuracy. However, they require validation against standard outcomes and relevant biomarkers.
Objective: To assess the feasibility and validity of the tablet-based Computerized Cognitive Composite (C3).
Design: Cross-sectional analysis of cognitive screening data from the A4 study (Anti-Amyloid in Asymptomatic AD).
Setting: Multi-center international study.
Participants: Clinically normal (CN) older adults (aged 65-85; n=4486).
Measurements: Participants underwent florbetapir positron emission tomography (PET) for Aβ+/- classification. They completed the C3 and the standard paper and pencil measures included in the Preclinical Alzheimer’s Cognitive Composite (PACC). The C3 combines a memory measure sensitive to change over time (Cogstate Brief Battery, One Card Learning) with measures shown to decline early in AD, including pattern separation (Behavioral Pattern Separation Test-Object, Lure Discrimination Index) and associative memory (Face-Name Associative Memory Exam, Face-Name Matching). C3 acceptability and completion rates were assessed using qualitative and quantitative methods. C3 performance was explored in relation to Aβ+/- groups (n=1323/3163) and the PACC.
Results: C3 was feasible for CN older adults to complete. Rates of incomplete or invalid administrations were extremely low, even in the bottom quartile of cognitive performers (PACC). C3 was moderately correlated with PACC (r=0.39). Aβ+ performed worse on C3 compared with Aβ- [unadjusted Cohen’s d=-0.22 (95%CI: -0.31,-0.13) p<0.001] and at a magnitude comparable to the PACC [d=-0.32 (95%CI: -0.41,-0.23) p<0.001]. Better C3 performance was observed in younger, more educated, and female participants.
Conclusions: These findings provide support for both the feasibility and validity of C3 and computerized cognitive outcomes more generally in AD secondary prevention trials.

Key words: Digital biomarkers, cognition, computerized testing, preclinical Alzheimer’s disease, secondary prevention.


Introduction
Computerized cognitive assessments have the potential to significantly reduce administration and scoring errors, site burden, and cost in Alzheimer’s disease (AD) secondary prevention trials as cognitive screening tools and outcome measures. These assessments have yet to replace paper and pencil measures as primary outcomes given several remaining questions: How feasible are computerized assessments in normal older adults and in older adults who progress to Mild Cognitive Impairment (MCI) over the course of a trial? How reliable are the data collected? And finally, how valid are computerized cognitive assessments, that is, are they related to gold-standard paper and pencil primary outcomes and to the AD pathology targeted in a given intervention?
The Anti-Amyloid in Asymptomatic Alzheimer’s (A4) study (1, 2) offers a unique opportunity to address some of these questions by assessing the feasibility and validity of the Computerized Cognitive Composite (C3) in a very large multi-site AD secondary prevention study targeting clinically normal (CN) older adults with elevated cerebral amyloid (2). The C3 is derived from two well-validated memory paradigms in the cognitive neuroscience literature: the Face Name Associative Memory Exam (FNAME) and the Behavioral Pattern Separation Task-Object (BPS-O). It also includes measures from the Cogstate Brief Battery (CBB), which uses playing cards to assess visual memory in addition to reaction time (RT) and working memory, and was designed to be sensitive to change over time through randomized alternate forms. The CBB has been studied in relation to AD neuroimaging markers in several cohort studies of normal older adults (3, 4). Behavioral versions of the FNAME (5, 6) and a modified version of the BPS-O (7) were selected for inclusion in the C3 because they have been shown to elicit aberrant activity in the medial temporal lobes during functional imaging studies in individuals at biomarker-defined risk for AD (8-10). More specifically, these individuals fail to habituate to repeated stimuli (FNAME) or show aberrant activity during both correct rejections and false alarms (BPS-O), departing from the neural signatures consonant with successful memory formation. The C3 was defined a priori to include one primary memory outcome from each component measure: the BPS-O lure discrimination index, Face-Name Matching accuracy, and One Card Learning accuracy.
The aim of this study was to assess the feasibility and validity of the C3 in CN older adults participating in a secondary prevention trial. Specific goals included determining whether reliable C3 data were consistently captured using a touchscreen tablet and whether data reliability decreased in the lowest cognitive performers. To assess the validity of the C3, we investigated 1) whether the C3 was related to the primary study outcome, performance on traditional paper and pencil measures (i.e., the Preclinical Alzheimer’s Cognitive Composite, PACC), 2) whether the C3 was related to cerebral amyloid (Aβ), and 3) whether the magnitude of this relationship was comparable to that observed between the PACC and Aβ+/-. In addition to our main aims, we explored whether improved performance with C3 retesting using alternate forms differentiated between Aβ+/- individuals above and beyond cross-sectional performance. Finally, we explored performance on the constituent tests of the C3 and their relationships with Aβ status, demographic characteristics, and paper and pencil measures. The implications of these findings for the design and use of future computerized outcomes in secondary prevention trials are discussed.


Methods
Participants and Study Design

The A4 Study is a double-blind, placebo-controlled, 240-week Phase 3 trial of an anti-Aβ monoclonal antibody in CN older adults with preclinical AD (2), conducted across 67 sites. Participants interested in enrolling in A4 were required to be aged 65 to 85 and were deemed clinically normal (CN) based on a Mini-Mental State Examination (MMSE) score of 25-30 and a global Clinical Dementia Rating (CDR) score of 0. During their initial screening visit, participants completed traditional and computerized cognitive testing (detailed further below). Prior to enrollment, they underwent florbetapir Positron Emission Tomography (PET) for classification of Aβ status (Table 1) at a second visit. On their third visit, all potential participants completed computerized testing and were subsequently provided with the results of their AD biomarker imaging and informed about whether they were eligible (Aβ+) or ineligible (Aβ-) to enroll in the trial. The current study includes cognitive screening data at 2 timepoints for Aβ+ and Aβ- individuals.

Table 1. Participant Characteristics by Aβ Status

NOTE. Two-sample t-test with unequal variances were used for continuous variables and Fisher’s Exact test for categorical variables. Values are Mean (Standard Deviation) unless otherwise indicated.


Cognitive Measures

The primary outcome for the A4 Study is performance on the PACC, a multi-domain composite of paper and pencil measures (11). Measures contributing to the C3 are administered on a touchscreen tablet using the Cogstate platform and serve as an exploratory outcome. All participants completed the PACC and C3 at the first screening visit (Visit 1) and an alternate form of the C3 within 90 days (mean=55 days) at the study eligibility visit (Visit 3), prior to study eligibility disclosure.

Paper and Pencil Cognitive Testing: The PACC

The PACC, described in detail elsewhere (11), is calculated as the sum of z-scored performance across four measures: the MMSE (0–30), the WMS-R Logical Memory Delayed Recall (LMDR; 0–25), the Digit-Symbol Coding Test (DSC; 0–93), and the Free and Cued Selective Reminding Test–Free + Total Recall (FCSRT96; 0–96) (2).
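The composite computation can be sketched as follows. This is an illustrative helper, not the study's code: the z-scoring reference here is the sample itself (the study z-scores against its own screening baseline), and the z-scores are averaged, which differs from summing only by a constant scale factor.

```python
from statistics import mean, stdev

def pacc_scores(components):
    """Illustrative sketch of a PACC-style composite.

    components: list of per-participant raw scores
    [MMSE, LMDR, DSC, FCSRT96]. Each column is z-scored against the
    sample mean/SD, and the four z-scores are averaged per participant.
    """
    cols = list(zip(*components))
    mus = [mean(c) for c in cols]
    sds = [stdev(c) for c in cols]
    return [mean((x - m) / s for x, m, s in zip(row, mus, sds))
            for row in components]
```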

Computerized Testing: The C3

Figure 1 provides a schematic of the C3 components: BPS-O, FNAME, and the CBB. An examiner is present in the testing room and initially guides administration, but the battery can be completed largely independently, given written on-screen instructions and automatic transitions between tasks (12).


Figure 1. C3 Task Schematic

NOTE. All tasks are completed on a tablet using a touchscreen. Stimuli in gray are not scored.


Behavioral Pattern Separation- Object (BPS-O; more recently termed the Mnemonic Similarity Test)

Participants are serially presented with images of 40 everyday objects and are allotted 5 seconds to determine whether each item is for use “indoors” or “outdoors,” to ensure adequate attentiveness to the stimuli (7). Participants are subsequently shown 20 of the same items interspersed with both novel images and lure images. They are asked to categorize each image as Old, Similar, or New within 5 seconds. Accuracy and RT measures are collected. Of interest is the rate at which participants correctly identify lures as “Similar” rather than as “Old.” The lure discrimination index (LDI) is computed as the proportion of “Similar” responses given to lure items minus the proportion of “Similar” responses given to the foils (the latter corrects for response bias). The LDI is the primary outcome from the BPS-O task. A higher LDI indicates better pattern separation performance.
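As a concrete illustration, the LDI formula above can be computed directly from the categorized responses (a hypothetical helper, not the Cogstate scoring code):

```python
def lure_discrimination_index(lure_responses, foil_responses):
    """LDI = p("Similar" | lure) - p("Similar" | foil).

    The second term corrects for a general bias toward responding
    "Similar". Inputs are lists of responses ("Old"/"Similar"/"New")
    given to lure items and to novel foil items, respectively.
    """
    p_similar_lures = lure_responses.count("Similar") / len(lure_responses)
    p_similar_foils = foil_responses.count("Similar") / len(foil_responses)
    return p_similar_lures - p_similar_foils
```

For example, a participant who calls 8 of 10 lures “Similar” but only 1 of 10 foils “Similar” earns an LDI of 0.7.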

Face-Name Associative Memory Exam (FNAME)

Participants are shown 12 face-name pairs presented serially. For each face-name pair, the participant is asked whether the name “fits” or “doesn’t fit” the face to ensure adequate attentiveness to the stimuli. Participants are allowed 5 seconds to respond and are asked to try to remember the face-name pair. Following the learning phase, the CBB tests serve as a 12- to 15-minute delay. Subsequently, there are three measures of memory: face recognition (FSBT), first-letter name recall (FNLT), and face-name matching (FNMT). In FSBT, participants are asked to identify the previously learned faces, presented alongside two distractor faces of matching age, race, and sex. The target face is subsequently presented with a touchscreen keyboard and the participant selects the first letter of the name paired with that face (FNLT). Finally, the target face is presented with three names (the target name, a re-paired same-sex name, and an age- and sex-matched foil name) and the participant must select the correct name (FNMT). Accuracy for each component is scored out of 12, with the FNMT number of correct matches serving as the primary outcome of interest.

Cogstate Brief Battery (CBB)

The CBB (13, 14) uses playing cards as stimuli and includes measures of attention (Detection, DET), reaction time (Identification, IDN), working memory (One-Back Test, ONB), and visual memory (One-Card Learning, OCL). Measures of RT and accuracy are recorded. To address skewness, a log10 transformation is applied to RT measures and an arcsine square-root transformation is applied to accuracy measures. In DET, participants are required to tap ‘Yes’ as quickly as possible in response to a stimulus card turning face-up; the task continues until 35 correct trials are recorded, and the outcome is RT. In IDN, a participant must select whether the card is red or not red; thirty correct trials are required. RT is the primary outcome for IDN; IDN accuracy was also examined. In ONB, participants must indicate “yes” or “no” whether the current card is identical to the previously seen card. In OCL, participants must learn a series of playing cards by responding ‘yes’ or ‘no’ to whether the card has been previously seen in the task. For ONB and OCL, both RT and accuracy are computed. Here, we examined RT and accuracy for both IDN and ONB, only RT for DET, and only accuracy for OCL.
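The two skewness transformations can be expressed directly. This is a sketch: the RT units and any additional Cogstate normalization constants are not specified in the text and are assumed here.

```python
import math

def transform_rt(rt_ms):
    """log10 transform to reduce the right skew typical of RTs
    (rt_ms assumed to be a positive reaction time, e.g. in ms)."""
    return math.log10(rt_ms)

def transform_accuracy(proportion_correct):
    """Arcsine square-root transform, which stretches the scale near
    the ceiling (1.0), where accuracy distributions compress."""
    return math.asin(math.sqrt(proportion_correct))
```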

The C3

Constituents of the C3 were identified a priori and include one primary memory outcome from each measure: the BPS-O LDI, FNMT, and OCL. The C3 is computed as the average of these z-scored outcomes, with z-scores derived from the study population at Visit 1.
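Given Visit 1 reference means and SDs, the C3 score for a participant reduces to the following (function and variable names are illustrative):

```python
def c3_composite(ldi, fnmt, ocl, visit1_norms):
    """Average of z-scored primary outcomes (BPS-O LDI, FNAME FNMT,
    CBB OCL). visit1_norms maps outcome name -> (mean, sd) estimated
    from the study population at Visit 1."""
    raw = {"ldi": ldi, "fnmt": fnmt, "ocl": ocl}
    z_scores = [(value - visit1_norms[name][0]) / visit1_norms[name][1]
                for name, value in raw.items()]
    return sum(z_scores) / len(z_scores)
```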

Data Quality

Data from individual C3 measures were included in analyses if they met pre-specified task-specific completion checks (Supplementary Table 1). For example, OCL for a given participant is included in analyses if the participant responds in ≥75% of trials. Study rater comments were also reviewed to better determine C3 usability and acceptability.
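A completion check of this kind is simple to encode. The 75% response-rate rule is the one quoted for OCL; thresholds for other tests live in Supplementary Table 1, and the function name here is illustrative.

```python
def passes_completion_check(n_responses, n_trials, min_response_rate=0.75):
    """Include a test administration in analyses only if the
    participant responded on at least `min_response_rate` of the
    trials (e.g., >= 75% for OCL)."""
    return n_responses / n_trials >= min_response_rate
```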

Amyloid PET Imaging

Eligible participants completed a florbetapir PET scan at Visit 2. Scan acquisition occurred 50-70 minutes following an injection of 10 mCi of florbetapir-F18. Aβ binding was assessed using the mean standardized uptake value ratio (SUVr) with whole cerebellar gray as the reference region. Participants were deemed eligible (Aβ+) versus not eligible (Aβ-) using an algorithm combining quantitative SUVr (>1.15) information and a centrally-determined visual read (2).
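A simplified sketch of such a combined rule is below. The SUVr threshold (>1.15) comes from the text; the borderline band and its interplay with the visual read are assumptions for illustration, since the full A4 algorithm (2) is more involved.

```python
def amyloid_eligible(suvr, visual_read_positive,
                     high_cut=1.15, low_cut=1.10):
    """Hypothetical combined eligibility rule: a clearly elevated SUVr
    qualifies outright; a borderline SUVr (low_cut..high_cut, an
    assumed band) qualifies only with a positive central visual read."""
    if suvr > high_cut:
        return True
    if suvr >= low_cut and visual_read_positive:
        return True
    return False
```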

Statistical Analyses

Primary analyses were performed on the C3 at Visit 1. To assess C3 feasibility and data validity, test completion rates and performance checks were computed (Supplementary Table 1), and rates were subsequently compared between Aβ+/- groups using chi-square tests. Rater comments were systematically reviewed, observations were grouped into categories (e.g., technical issue, interruptions), and the frequency of observations in each category was computed. To infer C3 feasibility and data validity in those who may develop impairment over the course of the A4 study, we compared test completion rates and performance checks between the lowest cognitive performers (bottom quartile on the PACC) and typical cognitive performers using chi-square tests.
Demographic differences between Aβ+/- groups were assessed using Welch’s two-sample t-tests for continuous variables and Fisher’s Exact test for categorical variables (e.g., age, APOE). Linear models were fit to compare cognitive performance across males and females. Linear models were fit to compare cognitive performance across Aβ+/- while adjusting for covariates: age, sex, and education. Effect size was computed as a Cohen’s d (mean difference between Aβ+ and Aβ- groups divided by the pooled standard deviation) with 0.01 representing a “very small” effect, 0.20 representing a “small” effect, and 0.5 representing a “medium” effect (15). Comparable linear models were performed and effect sizes calculated for individual C3 components to examine Aβ+/- group differences on individual C3 measures (e.g., OCL, ONB, BPS-O). No adjustments were made for multiple comparisons; however, results are reported as point estimates and 95% confidence intervals.
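The effect-size computation described above (mean difference between groups divided by the pooled standard deviation) is:

```python
import math

def cohens_d(mean_pos, sd_pos, n_pos, mean_neg, sd_neg, n_neg):
    """Cohen's d for two independent groups using the pooled SD.
    With the Abeta+ group first, a negative d indicates worse
    performance in Abeta+ relative to Abeta-."""
    pooled_var = (((n_pos - 1) * sd_pos ** 2 + (n_neg - 1) * sd_neg ** 2)
                  / (n_pos + n_neg - 2))
    return (mean_pos - mean_neg) / math.sqrt(pooled_var)
```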
Differences in performance between Visit 1 and Visit 3 were examined using linear models of difference scores with Aβ status, age, sex, and education as covariates.
Pearson correlation coefficients were computed to assess the relationships between C3 and demographic characteristics as well as C3 and the PACC. Pearson correlation coefficients were similarly used to assess the relationships among C3 components and PACC components to assess the convergent and discriminant validity between memory versus non-memory tasks on C3 versus PACC.
Linear models were also fit to compare cognitive performance between ε4+/- while adjusting for covariates: age, sex, and education.
All analyses were conducted using R version 3.6.1 (


Results
Feasibility of the C3

Completion and performance checks were met in >98% of individual test administrations within the C3 (Supplementary Table 1), and rates were equivalent across Aβ+/- groups. Raters reported issues in approximately 4% of C3 administrations. The most commonly reported problem (reflecting 0.7% of administrations) was that the tablet was insufficiently responsive to a participant’s finger taps and/or the participant was mis-tapping, either by hovering their fingers too close to the screen or by tapping too quickly. The second most commonly reported issue (0.5% of administrations) was overly deliberative responding on the BPS-O and FNAME, causing items to time out. This was followed by non-specific technical issues (e.g., frozen program, interruptions from a low-battery signal or software update, glitches such as a stimulus not loading or items auto-proceeding). Reports of confusion with task instructions were very low (0.3% of administrations). Participants most commonly had difficulty understanding instructions for ONB and OCL; additionally, some reported confusion regarding the goal of the judgment component of the BPS-O and FNAME learning phases (i.e., indoor vs. outdoor, fits vs. doesn’t fit). Despite this, few participants (<3%) failed to make an “indoor/outdoor” or a “fits” judgment on more than 3 items. Participants refused to continue C3 testing in <0.002% of administrations, most commonly due to frustration or fatigue.

Predictions for the Feasibility of the C3 Longitudinally

To preliminarily estimate whether the C3 (to be completed at 6-month intervals for the A4 study duration) will remain feasible in participants experiencing cognitive decline, we examined C3 performance in the lowest cognitive performers on the PACC. The magnitude of the C3 Aβ group difference increased by a factor of 5.2 when restricting the Aβ+ group to the bottom quartile of the PACC [adjusted Cohen’s d=-0.57 (95%CI: -0.68, -0.45) p<0.001]; however, no significant changes in rates of completion and performance checks were observed.

Demographic and Clinical Characteristics

Aβ+ were older compared with Aβ- (Table 1). There were no group differences for sex or education level. Aβ+ exhibited a higher rate of ε4 positivity and higher proportion of Caucasians compared with Aβ-.

C3 Performance

Aβ+ performed worse on the C3 compared with Aβ- (unadjusted d=-0.22, adjusted d=-0.11), mirroring the Aβ+/- performance difference on the PACC (unadjusted d=-0.32, adjusted d=-0.18) (Figure 2; Table 2). Importantly, the majority of participants performed in the normal range, with Aβ+ performance on average only 0.08 standard deviations below the mean. In addition to Aβ positivity (Beta=-0.07, p=0.002), older age (Beta=-0.04, p<0.0001), less education (Beta=0.03, p<0.0001), and male sex (Beta=-0.10, p<0.0001) contributed to worse overall C3 performance. Models adjusted for demographic features generally resulted in smaller Aβ+/- effect sizes compared with unadjusted models (Figure 2); for example, the effect size was roughly halved between the unadjusted (d=-0.22) and adjusted (d=-0.11) C3. The C3 and PACC were moderately correlated (r=0.39, p<0.001). However, both contributed unique explanatory variance about Aβ+/- status when modeled together (Supplementary Table 2 Model A).
Improved performance at re-testing was observed for the C3, with an average increase of 0.25 standard deviations between visits (Beta=0.25, p<0.0001). However, there was no relationship between Aβ status and differential improvement on C3 re-testing (Beta=0.00, p=0.961). Importantly, Aβ+ continued to perform worse on the C3 compared with Aβ-, and this group difference was of comparable magnitude to that at initial testing (re-testing Cohen’s d=-0.21, p<0.0001).

Table 2. Group Differences Between Aβ+ versus Aβ- on C3 at Screening Visit 1

Note. M=mean, SD=standard deviation; PACC=Preclinical Alzheimer’s Cognitive Composite; C3= Computerized Cognitive Composite; BPS-O= Behavioral Pattern Separation Task-Object; LDI=Lure Discrimination Index; FNAME=Face-Name Associative Memory Exam; FNLT=1st letter Name Recall; FNMT=Face-Name Matching; FSBT=Facial Recognition; CBB=Cogstate Brief Battery; RT=reaction time; Acc=Accuracy; DET=Detection; IDN=Identification; ONB=One-Back Test; OCL=One-Card Learning.



Figure 2. Covariate-Unadjusted and Adjusted Group Differences (Effect Sizes: Cohen’s d) Between Aβ+/Aβ- Groups at Screening Visit 1

Note. Smaller effect size (Cohen’s d) is associated with worse performance in Aβ+ (n=1323) relative to Aβ- (n=3163). Top (unadjusted) and bottom (covariate-adjusted). PACC=Preclinical Alzheimer’s Cognitive Composite; C3= Computerized Cognitive Composite; FNAME=Face-Name Associative Memory Exam; CBB=Cogstate Brief Battery; RT=reaction time; Acc=Accuracy


Individual C3 Components

Individual C3 components that showed statistically significant differences between groups were the BPS-O LDI, FNAME FNMT, CBB IDN accuracy, ONB accuracy and RT, and OCL accuracy. When adjusting for demographics, FNAME FNMT and ONB RT were no longer significant. Interestingly, for IDN RT, Aβ+ exhibited a statistical trend towards unexpectedly faster RT compared with Aβ- (adjusted d=-0.06, p=0.055). Despite this trend towards being slightly faster, Aβ+ were less accurate on IDN compared with Aβ- (unadjusted d=-0.25, adjusted d=-0.14). IDN accuracy was correlated with IDN RT (r=-0.30, p<0.001) such that generally faster RT for correct responses was associated with reduced overall accuracy. However, when both IDN accuracy and IDN RT were incorporated into the same model to predict Aβ status, only reduced IDN accuracy was a significant predictor (Supplementary Table 2 Model B).

Correlations Among C3 Components, Demographics, PACC


Greater age was associated with worse performance across all C3 outcomes (Table 3). This association was strongest for the overall C3 Composite (r=-0.29, p<0.001). Age was least associated with RT tasks including DET (r=-0.13, p<0.001) and IDN (r=-0.11, p<0.001).

Table 3. Pearson correlation coefficients (r) Among C3 Components and Demographics

Note. Higher value represents better performance. PACC=Preclinical Alzheimer Cognitive Composite; C3= Computerized Cognitive Composite; BPS-O= Behavioral Pattern Separation Task-Object; LDI=Lure Discrimination Index; FNAME=Face-Name Associative Memory Exam; FNLT=1st letter Name Recall; FNMT=Face-Name Matching; FSBT=Facial Recognition; CBB=Cogstate Brief Battery; RT=reaction time; Acc=Accuracy; DET=Detection; IDN=Identification; ONB=One-Back Test; OCL=One-Card Learning; FCSRT=Free and Cued Selective Reminding Test; DSST=Digit Symbol Substitution Test



Higher education was associated with better performance on all individual C3 outcomes, with the largest impact on OCL accuracy (r= 0.13, p<0.001) followed by the overall C3 (r=0.12, p<0.001). The only exception was ONB RT where faster performance was associated with lower education.


Women outperformed men on all components of FNAME including FNLT (d= -0.46, p<0.0001), FNMT (d= -0.36, p<0.0001), and FSBT (d= -0.39, p<0.0001). Women also outperformed men on IDN Accuracy (d= -0.16, p<0.0001) and ONB Accuracy (d=-0.08, p=0.019). Interestingly, however, men outperformed women on DET (d= -0.23, p<0.0001) and ONB RT (d= -0.12, p<0.001). Performance between the sexes was comparable for BPS-O, IDN RT, and OCL Accuracy.
On OCL, Aβ+ females did not perform differently compared with Aβ- females [Estimate=-0.00 (0.01), p=0.468]. However, Aβ+ males performed worse compared with Aβ- males [Estimate=-0.02 (0.01), p=0.0006]. This suggests that OCL captures subtle decrements in memory between Aβ+/- men but not women. A non-significant statistical trend toward the same pattern was observed on the BPS-O.

PACC and C3

Correlations among components of the two composites tended to be stronger in a domain-specific manner, providing support for convergent and discriminant validity (Table 3). For example, DET and IDN were correlated with DSST at r=0.26 and 0.31, respectively, while not being significantly related to the memory components of the PACC (FCSRT, Story Memory) or the MMSE.

The C3 and APOE Status

There was no difference in performance between APOE ε4 carriers vs. non-carriers on the C3 [adjusted d=-0.03 (95% CI: -0.09, 0.03), p=0.379] or on individual C3 outcomes (not shown). In contrast with the models for Aβ+/-, the carrier vs. non-carrier group difference did not increase with the removal of demographic covariates [unadjusted d=0.03 (95% CI: -0.05, 0.10), p=0.470]. Finally, we did not observe an interaction between ε4 and Aβ status on the C3.


Discussion
Among a large sample of CN older adults screening for an AD secondary prevention trial, assessment of cognition using a tablet-based measure (C3) was feasible. Diminished C3 performance was associated with worse PACC performance and elevated Aβ. Although the magnitude of the Aβ+/- group difference was statistically small (d=-0.11 once adjusted for covariates), it was comparable to that observed on the well-established and clinically meaningful paper and pencil measures included in the primary outcome, i.e., the PACC (d=-0.18). Performance on the C3 was also reliable, with an equal Aβ+/- group effect on the C3 at retesting within 90 days. More broadly, these findings suggest that computerized testing has the potential to replace traditional paper and pencil primary outcomes in future trials, representing a potential shift in clinical trial cognitive assessment methodology. Additionally, these results further confirm the small but consistent cross-sectional association between Aβ burden and cognition within a CN population.

Usability/Acceptability of the C3

The very low rates of incomplete and/or invalid administrations for the C3 battery indicate that supervised tablet-based cognitive testing has high acceptability among the older adults assessed, even those with little computer literacy. Rates of completion and performance check failures remained low in the subset of low PACC performers, providing early evidence for the longitudinal feasibility of the C3 as some participants show progressive cognitive decrements over the course of the study. Study procedures required a rater to supervise C3 testing; however, raters noted that many participants did not require significant assistance after completing the first few measures. This was further evidenced by improved performance on re-testing as participants gained familiarity with the device and tasks. Future trials may consider further optimizing computerized tasks to be self-guided to reduce rater training and time. Potential barriers to tablet-based testing were infrequent, largely addressable, and unlikely to systematically affect performance on the C3. These included inexperience with tablets, leading both to mis-tapping and to difficulty registering finger taps. Many older adults emphasized accuracy over speed during learning trials, resulting in time-outs. Several of these issues can be addressed with modifications to instructions and design (e.g., including a timer indicator), while others will diminish over time with secular trends toward increased familiarity with digital technology.

The C3 Composite and Individual C3 Measures by Aβ+/-

Components of the C3 that differed between Aβ+/- groups were primarily in memory (BPS-O; OCL) but also included working memory (ONB). The difference in pattern separation performance between Aβ+/- participants extends previous fMRI work showing an association between AD biomarkers (including Aβ-PET) and aberrant fMRI activity during learning on a pattern separation task in normal older adults (9) to a difference in frank performance. The BPS-O (10) was designed in part to capture a weakened “novelty signal,” that is, a reduced ability to correctly discriminate between stimuli that are similar but not identical to previously encountered targets. This tendency to misidentify similar lures as targets has been conceptualized as an error in pattern separation (16). Aβ group differences were also observed on face-name memory, but this effect was significantly attenuated when controlling for demographic features. In contrast with the other C3 memory measures (OCL accuracy and BPS-O), there was a significant sex effect whereby women generally performed better than men on all aspects of FNAME. This may be attributable to a general female advantage in verbal memory (17); however, it may also be related to the nature of the information, as previous work with FNAME indicates a diminishment of the sex effect when requiring memory for occupation-face versus name-face pairs (5, 18). Our findings from the CBB measures were consistent with previous results examining this battery in relation to AD neuroimaging markers in normal older adults: poorer performance on OCL has been associated with higher levels of CSF phosphorylated-tau/Abeta42 in late middle-aged participants in the Wisconsin Registry for Alzheimer’s Prevention (4). Similarly, we found OCL to be sensitive to Aβ status; notably, working memory (ONB) was also relatively strongly associated with elevated Aβ.
While C3 constituents were selected theoretically and a priori, ONB may be considered for inclusion in future optimized and/or data-driven C3 versions. Interestingly, the Aβ+ group made more errors on a Cogstate RT task (IDN) but paradoxically also performed the task more quickly compared with the Aβ- group. These findings suggest that faster RT may, in fact, be a sign of subtle decrements. One explanation for this finding is that an age-associated decrease in the inhibition of pre-potent responses (19) may be more pronounced in preclinical AD. More broadly, it confirms that early cognitive changes in preclinical AD extend beyond memory (20, 21).
Part of the impetus for combining outcomes from the BPS-O, FNAME, and CBB into a C3 aligns with the rationale for cognitive composites as primary endpoints (22): to maximize the signal-to-noise ratio in a population expected to exhibit subtle cognitive decrements. This was confirmed in our data, whereby the combination of FNMT, BPS-O, and OCL into the C3 resulted in a numerically larger effect size than any single one of these measures alone. However, there are multiple means of constructing composites, including data-driven approaches; for example, selecting measures most associated with Aβ cross-sectionally or measures most sensitive to change. The current C3 was theoretically derived on the basis of previous literature, and longitudinal data are needed to confirm its sensitivity over time. Importantly, different memory measures provided related but partially unique information about Aβ status. For example, both BPS-O and OCL were significant predictors of Aβ status when included in the same model (Supplementary Table 2 Model C). More recent work examining the heterogeneity of cognitive decline in early AD suggests that different atrophy patterns are associated with different cognitive trajectories (23). A cognitive composite would thus benefit from being sufficiently broad to avoid under- or overestimating decline in a given subgroup.
Our finding that OCL differentiated Aβ+ vs Aβ- men but not women highlights the issue of heterogeneity in a different light. Males and females performed equivalently for visual memory of playing cards (OCL) but females outperformed males on face-name memory. We hypothesize that visual card-based tasks may be both more engaging and an area of relative strength for males versus females in contrast with name memory (17). Regardless, these findings highlight the rationale for composite scores and the opportunity to use C3 to better understand demographic and individual differences in performance and cognitive trajectories.

C3 Performance and ε4 Status

The lack of a group difference in C3 performance between ε4 carriers and non-carriers is not unexpected given the specific recruitment of CN older adults and the cross-sectional nature of the current analysis. This is evidenced by the further diminution of group differences between ε4+ and ε4- participants when age was included as a covariate. In contrast, removing age as a covariate systematically increased the Aβ+ vs. Aβ- group differences.

C3 and Re-testing

Consistent with the literature, participants performed slightly better on re-testing, reflecting increased familiarity with the tablet and task demands (3). Diminished practice effects have been shown to predict incident MCI and/or dementia (24, 25) and have been suggested as a screening tool (26). However, we did not observe differential improvement in performance by Aβ group status. Future adjustments to the FNAME paradigm emphasizing item versus task familiarity may increase the relevance of a diminished practice effect. More specifically, using repeated versus alternate stimuli may capture more AD-specific learning over repeated exposures to the same material (27). C3 practice effects are likely to diminish significantly after the second administration (24). Likewise, item-familiarity practice effects are unlikely to contribute to C3 trajectories over time given that all remaining versions are unique.



Within the context of AD secondary prevention trials, our results indicate that computerized (tablet-based) cognitive testing is feasible in older adults, and they support the validity of such testing in that the C3 was 1) correlated with the primary outcome of paper-and-pencil composite performance (PACC), 2) related to AD pathological burden (Aβ+/-), and 3) related to Aβ+/- at a similar magnitude as the PACC. Positive relationships with AD biomarkers and the PACC suggest that the C3 captures meaningful cognitive decrements and has the potential to serve as a proxy for paper-and-pencil measures in future trials. In addition to reducing staff time and allowing the possibility of remote assessment, computerized testing can capture a greater quantity, and a more nuanced quality, of data for each measure. Future work will determine the sensitivity of the C3 to change over time in the context of an anti-amyloid treatment trial.


Acknowledgments and funding: The A4 Study is a secondary prevention trial in preclinical Alzheimer’s disease, aiming to slow cognitive decline associated with brain amyloid accumulation in clinically normal older individuals. The A4 Study is funded by a public-private-philanthropic partnership, including funding from the National Institutes of Health-National Institute on Aging (U19AG010483; R01AG063689), Eli Lilly and Company, Alzheimer’s Association, Accelerating Medicines Partnership, GHR Foundation, an anonymous foundation and additional private donors, with in-kind support from Avid, Cogstate, Albert Einstein College of Medicine, US Against Alzheimer’s disease, and Foundation for Neurologic Diseases. The companion observational Longitudinal Evaluation of Amyloid Risk and Neurodegeneration (LEARN) Study is funded by the Alzheimer’s Association and GHR Foundation. The A4 and LEARN Studies are led by Dr. Reisa Sperling at Brigham and Women’s Hospital, Harvard Medical School and Dr. Paul Aisen at the Alzheimer’s Therapeutic Research Institute (ATRI), University of Southern California. The A4 and LEARN Studies are coordinated by ATRI at the University of Southern California, and the data are made available through the Laboratory for Neuro Imaging at the University of Southern California. Participants screened for the A4 Study provided permission to share their de-identified data in order to advance the quest to find a successful treatment for Alzheimer’s disease. We would like to acknowledge the dedication of all the participants, the site personnel, and all of the partnership team members who continue to make the A4 and LEARN Studies possible. The complete A4 Study Team list is available on:

Conflicts of interest: K Papp has served as a consultant for Biogen Idec and Digital Cognition Technologies. D Rentz has served as a consultant for Eli Lilly, Biogen Idec, Lundbeck Pharmaceuticals, and serves as a member of the Scientific Advisory Board for Neurotrack. P Maruff is a full-time employee of Cogstate Ltd. C-K. Sun has no disclosures to report. R. Raman has no disclosures to report. M. Donohue has served on scientific advisory boards for Biogen, Eli Lilly, and Neurotrack; and has consulted for Roche. His spouse is a full-time employee of Janssen. A. Schembri is a full-time employee of Cogstate Ltd. C. Stark has no disclosures to report. M Yassa has served as a consultant for Pfizer, Eli Lilly, Lundbeck and Dart Neuroscience and is chief scientific officer of Signa Therapeutics, LLC. A. Wessels is a full-time employee of Eli Lilly and Company. R. Yaari is a full-time employee of Eli Lilly and Company. K. Holdridge is a full-time employee of Eli Lilly and Company. P. Aisen has received research funding from NIA, FNIH, the Alzheimer’s Association, Janssen, Lilly and Eisai, and personal fees from Merck, Roche, Biogen, ImmunoBrain Checkpoint and Samus. R.A. Sperling has received research funding from NIH, Alzheimer’s Association and Eli Lilly for this research. She has served as a consultant for AC Immune, Biogen, Eisai, Janssen, Neurocentria and Roche. Her spouse has served as a consultant to Biogen, Janssen, and Novartis.

Ethical Standards: Study procedures were conducted in accordance with consensus ethics principles derived from international ethics guidelines, including the Declaration of Helsinki and Council for International Organizations of Medical Sciences (CIOMS) International Ethical Guidelines.

Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits use, duplication, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.





1. Sperling, R.A., Rentz, D.M., Johnson, K.A., et al., The A4 study: stopping AD before symptoms begin? Sci Transl Med, 2014. 6(228): p. 228fs13.
2. Sperling, R.A., Donohue, M., Raman, R., et al., Factors associated with elevated amyloid burden in clinically normal older individuals in the A4 Study screening cohort. JAMA Neurology, in press.
3. Mielke, M.M., Weigand, S.D., Wiste, H.J., et al., Independent comparison of CogState computerized testing and a standard cognitive battery with neuroimaging. Alzheimers Dement, 2014. 10(6): p. 779-89.
4. Racine, A.M., Clark, L.R., Berman, S.E., et al., Associations between Performance on an Abbreviated CogState Battery, Other Measures of Cognitive Function, and Biomarkers in People at Risk for Alzheimer’s Disease. J Alzheimers Dis, 2016. 54(4): p. 1395-1408.
5. Papp, K.V., Amariglio, R.E., Dekhtyar, M., et al., Development of a psychometrically equivalent short form of the Face-Name Associative Memory Exam for use along the early Alzheimer’s disease trajectory. Clin Neuropsychol, 2014. 28(5): p. 771-85.
6. Rentz, D.M., Locascio, J.J., Becker, J.A., et al., Cognition, reserve, and amyloid deposition in normal aging. Ann Neurol, 2010. 67(3): p. 353-64.
7. Stark, S.M., Yassa, M.A., Lacy, J.W., and Stark, C.E., A task to assess behavioral pattern separation (BPS) in humans: Data from healthy aging and mild cognitive impairment. Neuropsychologia, 2013. 51(12): p. 2442-9.
8. Vannini, P., Hedden, T., Becker, J.A., et al., Age and amyloid-related alterations in default network habituation to stimulus repetition. Neurobiol Aging, 2012. 33(7): p. 1237-52.
9. Marks, S.M., Lockhart, S.N., Baker, S.L., and Jagust, W.J., Tau and beta-Amyloid Are Associated with Medial Temporal Lobe Structure, Function, and Memory Encoding in Normal Aging. J Neurosci, 2017. 37(12): p. 3192-3201.
10. Kirwan, C.B. and Stark, C.E., Overcoming interference: an fMRI investigation of pattern separation in the medial temporal lobe. Learn Mem, 2007. 14(9): p. 625-33.
11. Donohue, M.C., Sperling, R.A., Salmon, D.P., et al., The Preclinical Alzheimer Cognitive Composite: Measuring Amyloid-Related Decline. JAMA Neurol, 2014. 71(8): p. 961-970.
12. Rentz, D., Dekhtyar, M., Sherman, J., et al., The Feasibility of At-Home iPad Cognitive Testing For Use in Clinical Trials. J Prev Alzheimers Dis, 2016. 3(1): p. 8-12.
13. Fredrickson, J., Maruff, P., Woodward, M., et al., Evaluation of the usability of a brief computerized cognitive screening test in older people for epidemiological studies. Neuroepidemiology, 2010. 34(2): p. 65-75.
14. Maruff, P., Lim, Y.Y., Darby, D., et al., Clinical utility of the cogstate brief battery in identifying cognitive impairment in mild cognitive impairment and Alzheimer’s disease. BMC Psychol, 2013. 1(1): p. 30.
15. Sawilowsky, S.S., New Effect Size Rules of Thumb. Journal of Modern Applied Statistical Methods, 2009. 8(2): p. 26.
16. Yassa, M.A., Lacy, J.W., Stark, S.M., et al., Pattern separation deficits associated with increased hippocampal CA3 and dentate gyrus activity in nondemented older adults. Hippocampus, 2011. 21(9): p. 968-79.
17. Sundermann, E.E., Biegon, A., Rubin, L.H., et al., Does the Female Advantage in Verbal Memory Contribute to Underestimating Alzheimer’s Disease Pathology in Women versus Men? J Alzheimers Dis, 2017. 56(3): p. 947-957.
18. Buckley, R., Sparks, K., Papp, K., et al., Computerized cognitive testing for use in clinical trials: a comparison of the NIH Toolbox and Cogstate C3 batteries. The journal of prevention of Alzheimer’s disease, 2017. 4(1): p. 3.
19. Butler, K.M. and Zacks, R.T., Age deficits in the control of prepotent responses: evidence for an inhibitory decline. Psychol Aging, 2006. 21(3): p. 638-43.
20. Petersen, R.C., et al., Association of Elevated Amyloid Levels With Cognition and Biomarkers in Cognitively Normal People From the Community. JAMA Neurology, 2016. 73(1): p. 85-92.
21. Baker, J.E., Lim, Y.Y., Pietrzak, R.H., et al., Cognitive impairment and decline in cognitively normal older adults with high amyloid-beta: A meta-analysis. Alzheimers Dement (Amst), 2017. 6: p. 108-121.
22. Kozauer, N. and Katz, R., Regulatory innovation and drug development for early-stage Alzheimer’s disease. N Engl J Med, 2013. 368(13): p. 1169-71.
23. Zhang, X., Mormino, E.C., Sun, N., et al., Bayesian model reveals latent atrophy factors with dissociable cognitive trajectories in Alzheimer’s disease. Proc Natl Acad Sci U S A, 2016. 113(42): p. E6535-e6544.
24. Machulda, M.M., Pankratz, V.S., Christianson, T.J., et al., Practice effects and longitudinal cognitive change in normal aging vs. incident mild cognitive impairment and dementia in the Mayo Clinic Study of Aging. Clin Neuropsychol, 2013. 27(8): p. 1247-64.
25. Hassenstab, J., Ruvolo, D., Jasielec, M., et al., Absence of practice effects in preclinical Alzheimer’s disease. Neuropsychology, 2015. 29(6): p. 940-8.
26. Duff, K., Beglinger, L.J., Schultz, S.K., et al., Practice effects in the prediction of long-term cognitive outcome in three patient samples: a novel prognostic index. Arch Clin Neuropsychol, 2007. 22(1): p. 15-24.
27. Pihlajamaki, M., O’Keefe, K., O’Brien, J., Blacker, D., and Sperling, R.A., Failure of repetition suppression and memory encoding in aging and Alzheimer’s disease. Brain Imaging Behav, 2011. 5(1): p. 36-44.



N.H. Stricker1, E.S. Lundt2, E.C. Alden1, S.M. Albertson2, M.M. Machulda1, W.K. Kremers2, D.S. Knopman4, R.C. Petersen4, M.M. Mielke3,4


1. Division of Neurocognitive Disorders, Department of Psychiatry and Psychology, Mayo Clinic, Rochester, Minnesota, USA; 2. Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA; 3. Division of Epidemiology, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA; 4. Department of Neurology, Mayo Clinic, Rochester, Minnesota, USA

Corresponding Author: Nikki H. Stricker, Ph.D., Mayo Clinic, 200 First Street SW, Rochester, MN 55905; 507-284-2649 (phone), 507-284-4158 (fax),

J Prev Alz Dis 2019;(6) in press
Published online September 16, 2019,



Background: The Cogstate Brief Battery (CBB) is a computerized cognitive assessment that can be completed in clinic or at home.
Design/Objective: This retrospective study investigated whether practice effects / performance trajectories of the CBB differ by location of administration.
Participants/Setting: Participants included 1439 cognitively unimpaired individuals aged 50-75 at baseline participating in the Mayo Clinic Study of Aging (MCSA), a population-based study of cognitive aging. Sixty-three percent of participants completed the CBB in clinic only and 37% completed the CBB both in clinic and at home.
Measurements: The CBB consists of four subtests: Detection, Identification, One Card Learning, and One Back. Linear mixed effects models were used to evaluate performance trajectories in clinic and at home.
Results: Results demonstrated significant practice effects between sessions 1 and 2 for most CBB measures. Practice effects continued over subsequent testing sessions, to a lesser degree. Average practice effects/trajectories were similar for each location (home vs. clinic). One Card Learning and One Back accuracy were lower at home than in clinic, and this difference was large in magnitude for One Card Learning accuracy. Participants performed faster at home on Detection reaction time, although this difference was small in magnitude.
Conclusions: Results suggest the location where the CBB is completed has an important impact on performance, particularly for One Card Learning accuracy, and there are practice effects across repeated sessions that are similar regardless of where testing is completed.

Key words: Neuropsychology, computerized testing, cognitively unimpaired, memory, reaction time.



There is growing interest in improving access to cognitive assessment tools for research and clinical use by allowing administration of cognitive measures in unsupervised settings, including at home. The ability to complete cognitive measures at home has implications for enriching clinical trials, and a large-scale demonstration of this goal is already underway through the Brain Health Registry (1). At-home assessment can also facilitate collection of clinical outcome data, allowing follow-up of participants who live far from clinical centers, or over longer periods than would otherwise be feasible when a clinic visit is required. As such, developing computerized testing platforms that can be reliably administered in both supervised clinical settings and unsupervised settings is important. Cogstate is one computerized platform with numerous tests available, including the Cogstate Brief Battery (CBB), which consists of four subtests measuring attention, working memory, processing speed, and visual learning. In addition, prior data have demonstrated the feasibility of using the CBB in both supervised and unsupervised settings (2). This makes Cogstate an appealing option for use as a screening measure that can be completed in the clinic or home environment.
There is also significant interest in cognitive measures that may help identify individuals at risk of mild cognitive impairment (MCI) or dementia who would benefit from further clinical work-up. Given that the CBB can be completed at home or during general medical appointments, it may be able to address this clinical need. In addition, existing studies suggest the CBB is sensitive to early cognitive decline and that CBB performance can help identify individuals with cognitive impairment (3, 4). The CBB has FDA clearance under the name Cognigram™ for use as a digital cognitive assessment tool in individuals 6-99 years of age. However, it is not known whether there are differences in performance across supervised and unsupervised settings that could differentially impact the sensitivity of the CBB. Currently, the CBB is being used in large-scale epidemiological and experimental studies. While some studies did not find evidence for differences in performance across supervised and unsupervised settings (2), other research has suggested potential variations in performance. For example, the Mayo Clinic Study of Aging (MCSA) has administered Cogstate since 2012, and pilot data (n = 194) for participants completing the CBB first in clinic and then at home within 6 months showed small but statistically significant performance differences by location of administration, with participants performing faster at home than in the clinic (5). Therefore, additional data with larger sample sizes and more follow-up sessions are needed to fully evaluate the integrity of completing the CBB in supervised and unsupervised settings, and to determine whether there are performance differences when individuals complete the CBB in clinic and at home.
In addition to the ability to complete CBB in both supervised and unsupervised settings, CBB measures were designed to minimize practice effects by having randomly generated alternative forms each time an individual takes the test. This is an important aspect of CBB because one challenge of detecting cognitive decline in older adults is that decline may be relatively subtle but practice effects associated with repeat testing over time can be quite robust. For example, research using traditional neuropsychological assessments suggests practice effects can occur between baseline and follow-up visits on measures of learning and memory, even in individuals with incident MCI (6). Similarly, prior research using CBB over multiple sessions indicates the strongest practice effects occur between the first and second assessment (7). However, while most CBB practice effects stabilized after the third evaluation, sustained practice effects were observed for One Card Learning accuracy (7). Additionally, another study of older adults completing CBB in an unsupervised testing environment demonstrated continued practice effects over multiple testing sessions (8). Given this evidence of practice effects on the CBB, further research is needed to clarify the nature of these practice effects and determine whether practice effects differ across supervised and unsupervised testing environments.
The primary aim of this study was to investigate whether practice effects / performance trajectories of the CBB differ across supervised and unsupervised settings, represented within this study by location of administration (clinic vs. home, respectively). Secondary aims were to 1) further assess differences in test performance across location of administration in a larger independent sample, controlling for the known session 1 to session 2 practice effect that may have confounded our preliminary home vs. clinic analyses (5); and 2) further assess practice effects on the CBB across additional follow-up sessions.



The MCSA is a population-based study of cognitive aging among Olmsted County, MN, residents. It began in October 2004 and initially enrolled individuals aged 70 to 89 years with follow-up visits every 15 months. The details of the study design and sampling procedures have been previously published; enrollment follows an age- and sex-stratified random sampling design to ensure that men and women are equally represented in each 10-year age stratum (9). In 2012, enrollment was extended to cover ages 50-90+ following the same sampling methods. Administration of Cogstate during clinic visits began in 2012 for newly enrolled 50-69 year olds and in 2013 for those aged 70 and older. From September 2013 through March 2014 and September 2014 through July 2015, we piloted administration of the CBB at home among MCSA participants in our 50-69 year old cohort. These pilot data confirmed the acceptability of the at-home testing process in our participants, and preliminary analyses demonstrated generally comparable performance in the clinic versus at home (5). Although participants did perform faster at home, this difference was viewed as small in magnitude, and at-home testing was offered to all MCSA participants starting July 2015. Individuals who completed the pilot testing (n = 380) were not included in the current analysis. There was a trend toward fewer follow-up sessions available for individuals over 75 due to a temporary cap on the number of Cogstate sessions in the protocol for older participants. We therefore limited the participants in the current study to individuals who were between the ages of 50-75 at the time of their first Cogstate session.
Study visits included a neurologic evaluation by a physician, an interview by a study coordinator, and neuropsychological testing by a psychometrist (9). The physician examination included a medical history review, complete neurological examination, and administration of the Short Test of Mental Status (10). The study coordinator interview included demographic information and medical history, and questions about memory to both the participant and informant using the Clinical Dementia Rating (CDR®) Dementia Staging Instrument (11). See Roberts et al. (9) for details about the neuropsychological battery.
For each participant, performance in a cognitive domain was compared with age-adjusted scores of cognitively unimpaired (CU) individuals using Mayo’s Older American Normative Studies (12). Participants with scores ≥ 1.0 SD below the age-specific mean in the general population were considered for possible cognitive impairment. A diagnosis of mild cognitive impairment (MCI) or dementia was based on consensus agreement between the interviewing study coordinator, examining physician, and neuropsychologist, after a review of all participant information (9, 13). Performance on Cogstate was not available for review during the consensus conference and was thus independent of diagnosis. Individuals who did not meet criteria for MCI or dementia were deemed CU and were eligible for inclusion in the current study. Data for participants who were CU at the time of their first Cogstate session but were later assigned a diagnosis of MCI or dementia at a follow-up visit were included until the visit with the diagnosis, to avoid biasing our sample toward individuals with less follow-up data available (also see Supplemental Results for sensitivity analyses).
The study protocols were approved by the Mayo Clinic and Olmsted Medical Center Institutional Review Boards. All participants provided written informed consent.

Cogstate Brief Battery

All participants completed their first Cogstate session in clinic. Cogstate was administered on a PC or iPad during MCSA clinic visits (every 15 months), and our prior work describes small platform differences on select outcome variables (14). Participants could choose whether to complete Cogstate in clinic or at home between full MCSA study visits. Home testing was completed on a PC through a web browser (i.e., not on an iPad, tablet, or phone). Participants electing to complete Cogstate in clinic returned for an in-clinic visit at 7.5-month intervals. Participants electing to complete Cogstate at home were sent an email prompting them to complete the testing at 4-month intervals, with a reminder after 2 weeks if not completed.
Each Cogstate administration included a short practice battery followed by a 2-minute rest period and then the complete battery. The practice battery was not used in any analyses. For in clinic visits, the study coordinator was available to help the participants understand the tasks during the practice session. During the test session, the coordinator provided minimal supervision or assistance and typically waited in another room for the participant to finish. The ability to reliably complete and adhere to the requirements of each task was determined by completion and integrity checks as previously described (14). All data values with a failed completion were excluded and failed integrity values were included and examined as potential outlier values.
Cogstate subtests were administered in the order listed below. Cogstate applied a logarithmic (base 10) transformation to reaction time data (milliseconds) and an arcsine square-root transformation to the proportion correct (accuracy) in order to improve normality and reduce skewness.
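The two transformations described above can be written directly. This is a minimal sketch of the stated formulas, not Cogstate's internal code; function names are illustrative.

```python
import math

def transform_rt(rt_ms):
    # Logarithmic (base 10) transformation of reaction time in milliseconds;
    # lower transformed values indicate faster (better) performance.
    return math.log10(rt_ms)

def transform_accuracy(prop_correct):
    # Arcsine square-root transformation of proportion correct (0 to 1);
    # higher transformed values indicate better performance.
    return math.asin(math.sqrt(prop_correct))

transform_rt(500.0)       # a 500 ms response maps to about 2.70
transform_accuracy(0.9)   # 90% accuracy maps to about 1.25 radians
```

Both transformations compress the tails of their respective distributions: the log transform pulls in long reaction times, and the arcsine square-root transform stretches the region near ceiling, where accuracy scores otherwise pile up.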
Detection is a simple reaction time (RT) paradigm that measures psychomotor speed. Participants press “yes” as quickly as possible when a playing card turns face up. RT for correct responses was the primary outcome measure, which is often referred to as speed in other Cogstate manuscripts.
Identification is a choice RT paradigm that measures visual attention. Participants press “yes” or “no” to indicate whether or not a playing card is red as quickly as possible. RT for correct responses was the primary outcome measure. A linear correction was applied to each PC time point (in clinic and at home) to correct for small PC-iPad performance differences in clinic as previously described (14).
One Card Learning is a continuous visual recognition learning task that assesses learning and attention. Participants press “yes” or “no” to indicate whether or not they have seen the card presented previously in the deck. Accuracy was the primary outcome measure.
One Back assesses working memory. Participants press “yes” or “no” to indicate whether or not the card is the same as the last card viewed (one back) as quickly as possible. Accuracy was the primary outcome measure.

Statistical Methods

Each individual had repeated measures from regular testing, where the location was in clinic approximately every 15 months and either in clinic or at home during the interim. Linear mixed effects (LME) models with random subject-specific intercepts and slopes were used to assess differences between testing locations for each response measure. All models were adjusted for age, sex, and education as additive effects. We captured practice effects using a piecewise linear spline with a bend at session two, parameterized with two variables: first practice (sessions 1 to 2) and subsequent practice (sessions 2 to 3, 3 to 4, and so on). A positive beta for first practice or subsequent practice implies increasing scores with more practice for accuracy measures, and a negative beta implies improved performance with more practice for RT measures. To assess whether learning effects differed by location, we included an interaction between subsequent practice and location. Non-linearity of subsequent practice was also tested by including the natural log of subsequent practice, but this term did not reach significance at the 0.05 level and was excluded from our final models.
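The spline parameterization described above (and detailed in the Table 2 note) can be sketched as follows; the variable names are illustrative, not taken from the study's analysis code.

```python
def practice_predictors(session):
    """Encode a 1-indexed session number as the two spline variables:
    first_practice: 0 at session 1, 1 from session 2 onward;
    subsequent_practice: 0 through session 2, then 1 at session 3,
    2 at session 4, and so on. Together they form a piecewise linear
    spline with a bend at session 2, so the session 1 -> 2 jump and
    the later, shallower slope get separate coefficients."""
    first_practice = 0 if session == 1 else 1
    subsequent_practice = max(0, session - 2)
    return first_practice, subsequent_practice
```

With this coding, the fitted mean at session s is intercept + beta_first * first_practice + beta_subsequent * subsequent_practice (plus covariates), which is why a single large first-practice coefficient can coexist with a small per-session slope thereafter.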
Many individuals scored near ceiling (a perfect score) on One Back accuracy, resulting in a skewed, non-normal distribution despite the transformation applied; because the distribution of the response variable is reflected in the distribution of the residuals, this conflicts with the LME assumption that residual errors follow normal distributions. To assess the appropriateness of the LME for One Back accuracy, we therefore fit a second generalized linear mixed-effects model (GLMM) with a binomial link. For this response variable, the number of correct responses out of X trials is known to follow a binomial distribution; hence we expect the error distribution to be binomial as well. We observed that our LME model may be biased toward the mean, especially at early sessions, when compared with the GLMM; however, the estimated difference between at-home and in-clinic performance was comparable, so we report the LME model for consistency across response variables. In our case, the large sample size allowed for a robust estimate of the group difference despite the slight departure from normality. Analyses were conducted using the statistical software R, version 3.4.1.




See Table 1 for participant demographics. There were 1439 CU individuals who completed the CBB, with a mean of 3.2 years of follow-up data; over half the sample (50.2%) completed 7 or more Cogstate sessions and 6% completed 14 or more sessions. Sixty-three percent of the sample completed Cogstate in clinic only and 37% completed Cogstate both in clinic and at home. Participants completing Cogstate both in clinic and at home were slightly younger, had more years of education, and had more follow-up sessions available (as expected given the study design), but did not differ by sex. Of all Cogstate sessions completed in clinic, 34.5% were completed on a PC (N = 2559) and the remainder were completed on an iPad (N = 4850). Of all Cogstate sessions completed, 22.8% (N = 2192) were at home. Completion flag failures were infrequent (< 0.3%; see Supplemental Table 1), with all completion failures occurring in clinic and none at home. Integrity flag failure rates were comparable across clinic and home sessions for most subtests, although there was a slightly greater rate of integrity failures on One Card Learning at home sessions (2%) relative to in-clinic sessions (1%; p < .001). Because of this slight difference, we performed sensitivity analyses for One Card Learning and determined that conclusions did not change when excluding data points with a failed integrity flag. We retain these data in our models to improve the generalizability of results.
Note. CI = confidence interval. For Detection and Identification, values represent logarithmic base 10 transformation for reaction time data (collected in milliseconds) and lower values signify better performance. For One Card Learning and One Back, values represent arcsine transformation for accuracy data and higher values signify better performance.

Table 1. Participant demographics at baseline visit


Note. SD = standard deviation. P-values reported above are from linear model ANOVAs (continuous variables) or Pearson’s Chi-square test (frequencies).


Significant practice effects

Results demonstrate significant practice effects for most Cogstate measures (see Table 2). Consistent with our earlier results demonstrating the most pronounced practice effect at session 2 (7), and with our exploration of additional model types to best fit the observed data, inspection of Figure 2 and Table 2 shows a clear practice effect from session 1 to session 2 on most CBB measures (p’s < .001), except Detection RT. The magnitude of the session 1 to session 2 practice effect is large relative to the magnitude of performance differences associated with demographic variables, particularly for One Card Learning accuracy (see Figure 3). In addition, practice effects continued across all additional follow-up sessions for all Cogstate measures (p’s < .001), although visual inspection of Figure 3 suggests continued practice effects are minimal for RT measures despite coefficients reaching significance.

Table 2. Linear mixed effects regression parameter estimates (standard errors) for predicting four cognition measures


Note. RT = reaction time. SE = standard error. SD = standard deviation. Age in years; sex is 1 for males and 0 for females; education is total years, with all individuals less than 11 coded as 11; First practice is 0 = first session and 1 = second session; Subsequent practice (2+) is 0 = 2nd session, 1 = 3rd, 2 = 4th, and so on; location is 0 = clinic and 1 = home. Taken together, first practice and subsequent practice comprise a piecewise linear spline with a bend at session 2. For Detection and Identification, values represent a logarithmic base 10 transformation of reaction time data (collected in milliseconds) and negative beta estimates signify better/improved performance. For One Card Learning and One Back, values represent an arcsine transformation of accuracy data and positive beta estimates signify better/improved performance.


Practice effects by location were similar

For all Cogstate measures, we did not observe an interaction between location and number of Cogstate sessions (p’s > .05). These results suggest that participants show the same cognitive trajectory, or degree of practice effect, across Cogstate sessions regardless of completion location.

Faster at home on Detection

Participants performed faster at home on Detection RT (p < .01). Visual inspection of Figure 2 suggests that this difference may become less pronounced over repeated sessions, although the interaction did not reach significance, likely because of the high variability of this response variable. There was no significant difference between in-clinic and home performance for Identification RT.

Less accurate at home

Visual inspection of Figure 2 and the results in Table 2 show that One Card Learning and One Back accuracy are lower at home than in clinic (p < .001).

Figure 1. Study flow chart



Magnitude of study findings

Applying internal Cogstate normative data for change (15) to our model-estimated effects provides some insight into the potential impact of our main study findings. We took the location (clinic vs. home) estimate from our model (see Table 2) and applied a Reliable Change Index formula (16) that uses the within-subject standard deviation provided as part of those normative data to generate a z-score for change across sessions; this illustrates the size of the effect relative to sampling variability across two sessions. The location of administration difference alone yielded z-scores for change of 0.10 on Detection RT and -0.47 on One Card Learning accuracy. Similarly, the estimated difference from session 1 to 2 on One Card Learning accuracy (see Table 2) yields a z-score for change of 0.33. The practice effect is smaller in magnitude after the second session (0.06 z per interval), but sessions 2 to 7 had an aggregate z-score for change of 0.30 (approximately the magnitude of the first practice effect). Together, the change from baseline to session 7 would yield a z-score for change of 0.63; that is, the average participant in our sample shows nearly a two-thirds standard deviation improvement in One Card Learning accuracy at session 7 relative to baseline. To assist with interpretation of the size of these effects, we also re-ran models on raw, untransformed Cogstate variables; results are presented in Supplementary Table 2 (S2) and the Supplemental Results.
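The arithmetic behind a z-score for change of this kind can be sketched as follows. The actual Cogstate WSD values are proprietary, so the numbers below are placeholders rather than the study’s; whether the error term includes the √2 factor for a two-session difference is one of the modeling choices discussed by Lewis et al. (16), so the sketch exposes both options:

```python
import math

def rci_z(change: float, wsd: float, paired: bool = True) -> float:
    """Reliable change z-score (one common form; placeholder inputs only).

    change: model-estimated score difference (e.g., home minus clinic, or
            session 2 minus session 1, on the transformed scale).
    wsd:    within-subject standard deviation from normative data.
    paired: if True, divide by sqrt(2)*WSD, the standard deviation of a
            difference between two sessions; if False, divide by WSD alone.
    """
    denom = math.sqrt(2) * wsd if paired else wsd
    return change / denom

# Placeholder values only; Cogstate's normative WSDs are not public.
print(round(rci_z(change=0.033, wsd=0.10), 2))
```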

Figure 2. Mean of Cogstate scores as a function of session number and location (clinic versus home) for a cognitively unimpaired population 63 years of age with 16 years of education and averaged between males and females, overlaid on scatterplots


Note. CI = confidence interval. For Detection and Identification, values represent logarithmic base 10 transformation for reaction time data (collected in milliseconds) and lower values signify better performance. For One Card Learning and One Back, values represent arcsine transformation for accuracy data and higher values signify better performance.

Figure 3. Estimated mean score (95% CI) in cognitive measures from linear mixed-effects models for specified levels of each feature. In the absence of interactions, the additive effect of age (or sex, or education) shifts estimates up or down by a fixed amount per unit (e.g., per year of age). Unless indicated in the y-axis description, we used an age of 63 years, 16 years of education, in-clinic location, and the first session, and averaged between males and females when estimating effects


Note. CI = confidence interval; DET = Detection; IDN = Identification; OCL = One Card Learning; ONB = One Back.



Discussion

The main finding of this study is that practice effects on the CBB do not differ by location of administration. In accordance with our secondary aims, we also demonstrate that: 1) there are important performance differences between CBB sessions completed in clinic and at home, and 2) there are practice effects on the CBB from session 1 to session 2; moreover, practice effects continue across numerous follow-up sessions.
This study adds to growing evidence that practice effects must be considered when using the CBB, despite careful efforts by Cogstate to mitigate them by administering a short practice trial before the CBB and using randomly generated alternate forms. This is consistent with prior work showing that alternate forms reduce, but do not eliminate, practice effects (17), because factors other than memory for specific test items can also produce practice effects, including familiarity and increased comfort with test procedures and learned strategies for navigating task demands (18). Our results show a clear practice effect from session 1 to session 2 for most CBB measures (3 out of 4). For example, the difference between session 1 and session 2 on One Card Learning accuracy is similar in magnitude to the average difference between a 50- and a 70-year-old, and represents an approximately 3.7% improvement on the raw score scale (see Figure 3 and S2). The magnitude of the practice effect from session 1 to 2 is smaller for Identification RT and One Back accuracy, equating to slightly less than a 10-year age difference. There are also continued practice effects after session 2 for all variables; these are more notable for accuracy measures, particularly One Card Learning, and are considered negligible for RT measures. Consistent with our results, Valdes et al. (8) showed significant practice effects on 3 out of 4 CBB measures administered monthly at home in a sample of older adults. They also found that prior computer use influenced the practice effect observed on One Card Learning and One Back accuracy, with greater improvements over sessions in individuals who used computers less frequently. Overall, our results suggest there are practice effects on the CBB, particularly across sessions 1 and 2.
However, practice effects, and performance trajectories more broadly, are similar regardless of where an individual completes the CBB.
Although trajectories of CBB performance are similar across locations of administration, our results suggest some potentially important performance differences by location. Most notably, individuals perform less accurately at home than in clinic (approximately 5.3% lower raw percent accuracy at home; see S2). Figure 3 demonstrates the magnitude of this effect relative to the impact of other variables in the model. For One Card Learning accuracy, the difference between clinic and home performance at session 2 is slightly greater than the effect of a 20-year difference in age; it is also greater than the initial practice effect from session 1 to session 2. Our previously published pilot data, which were not included in the current study, did not show a significant One Card Learning accuracy difference and did not include One Back accuracy as an outcome variable (5). The session 1 to session 2 practice effect on One Card Learning accuracy may have obscured a home vs. clinic difference in our prior manuscript, as all participants in that study completed the CBB in the clinic first and then at home. Participants also completed Cogstate in clinic first within the current study, but because participants who completed Cogstate at home also had subsequent Cogstate sessions in clinic, and our model accounts for the number of Cogstate sessions, we can better estimate the home effect. Cromer et al. (2) found no differences on any CBB measure, including One Card Learning accuracy, between supervised and unsupervised sessions. Importantly, that study counterbalanced test order, although the sample size was small (n = 57), limiting the power to detect subtle differences. Similar to our earlier findings, we again found that participants perform faster at home than in clinic on Detection RT, but we interpret the magnitude of this effect as small and not clinically meaningful.
Our prior results also showed faster performance on One Back RT and One Card Learning RT, which were not included as primary response variables for the current study. Consistent with our prior results, there was no difference in Identification RT across location.
These results have important implications for future study designs. For studies focused on detecting change in performance over time, investigators should consider designs that minimize change due to practice effects. Because the largest practice effects in our data were observed between sessions 1 and 2, administering two CBB sessions during the baseline visit and excluding session 1 from further analysis could be considered, as using CBB session 1 as a benchmark for future change may obscure true decline given typical practice effects. Valdes et al. (8) also recommended this approach based on their data. Similarly, we recommend that investigators have participants complete the CBB either in the clinic for all sessions or at home for all sessions. If home administration is the only feasible option for longitudinal follow-up, investigators could administer the first CBB in clinic to familiarize participants with the procedure as described above, but use a first CBB session administered at home shortly thereafter as the longitudinal baseline. For populations less familiar with using computers at home, three baseline sessions could even be considered: one in clinic, a second at home to familiarize participants with procedures for completing the test at home, and a third session at home to serve as the study baseline.
These results also have important implications for the application of the CBB for clinical use or to inform diagnostic status in research studies. A working memory/learning composite score based on supervised One Card Learning and One Back accuracy performance has previously demonstrated good sensitivity and specificity for differentiating individuals with MCI and AD dementia from cognitively unimpaired individuals (4). Given our findings of significantly lower accuracy performance at home, it will be important to validate the diagnostic accuracy of the CBB in unsupervised settings.
Cogstate was designed to be sensitive to change over time. Although the CBB has demonstrated sensitivity to memory decline in CU and MCI participants with high amyloid based on Pittsburgh compound B (PiB)-positron emission tomography neuroimaging (20), studies have not yet demonstrated whether applying the available internal Cogstate norms for change is sensitive to subtle cognitive decline at the level of the individual, particularly for detection of MCI and AD dementia. In individuals with concussion, change on the CBB (also marketed as CogSport/Axon) using study-specific normative data (21) was less sensitive but more specific for concussion relative to a single assessment. Our finding of practice effects that continue after session 2 raises questions about the best method for determining whether a significant change has occurred beyond a single follow-up assessment. Internal Cogstate normative data provide within-subject standard deviation (WSD) values for most CBB primary outcome variables using a test-retest interval of approximately 1 month (15). These can be used in a reliable change index formula to calculate a z-score for change (16, 22). This accounts for the session 1 to session 2 practice effect when determining whether a change is significant, but it is not clear whether applying this single-interval WSD is appropriate for subsequent follow-up sessions, particularly for One Card Learning accuracy, which demonstrated the most robust continued practice effect in our results. For these reasons, the use of a control group is critical for clinical trials using Cogstate as an outcome measure. Because the internal normative data provided by Cogstate are not in the public domain, reproducing our illustration of the magnitude of these results is not straightforward.
Strengths of our study include the population-based design and large sample size. There are also several limitations. First, all participants completed Cogstate in the clinic first, which is a significant confound when comparing CBB performance in clinic and at home. A counter-balanced design would provide a better test of home-clinic differences. Second, the differing follow-up intervals across participants electing to complete Cogstate only in clinic versus also at home complicated interpretation of these results, but sensitivity analyses suggest this did not significantly impact findings (see Supplemental Results). Future studies would benefit from using the same follow-up interval regardless of location of administration. Although the fact that participants complete Cogstate in clinic and at home complicates comparison of trajectories, it also helps us to better estimate home vs. clinic effects. Future studies would benefit from including a measure of the frequency of computer use and confidence with computers to help determine whether that impacts results. Future studies would also benefit from examining whether practice effects are also observed in clinical populations, such as individuals with MCI, as this has been reported on some traditional neuropsychological tests (6).
In summary, results suggest the location where the CBB is completed has an important impact on performance, particularly for One Card Learning accuracy. CBB performance over time is influenced by practice effects on most measures, which are most prominent from session 1 to 2, but also continue over time. Practice effects and trajectories of performance over time are similar regardless of where testing is completed.


Funding: This work was supported by the Rochester Epidemiology Project (R01 AG034676), the National Institutes of Health (grant numbers P50 AG016574, U01 AG006786, and R01 AG041851), a grant from the Alzheimer’s Association (AARG-17-531322), the Robert Wood Johnson Foundation, The Elsie and Marvin Dekelboum Family Foundation, Alzheimer’s Association, and the Mayo Foundation for Education and Research. NHS and MMMi serve as consultants to Biogen and Lundbeck. DSK serves on a Data Safety Monitoring Board for the DIAN-TU study and is an investigator in clinical trials sponsored by Lilly Pharmaceuticals, Biogen, and the University of Southern California. RCP has served as a consultant for Hoffman-La Roche Inc., Merck Inc., Genentech Inc., Biogen Inc., Eisai, Inc. and GE Healthcare. The sponsors had no role in the design and conduct of the study; in the collection, analysis, and interpretation of data; in the preparation of the manuscript; or in the review or approval of the manuscript. The authors report no conflicts of interest.

Acknowledgements: The authors wish to thank the participants and staff at the Mayo Clinic Study of Aging.





1.    Weiner MW, Nosheny R, Camacho M, Truran-Sacrey D, Mackin RS, Flenniken D, et al. The Brain Health Registry: An internet-based platform for recruitment, assessment, and longitudinal monitoring of participants for neuroscience studies. Alzheimers Dement. 2018;14:1063-76.
2.    Cromer JA, Harel BT, Yu K, Valadka JS, Brunwin JW, Crawford CD, et al. Comparison of Cognitive Performance on the Cogstate Brief Battery When Taken In-Clinic, In-Group, and Unsupervised. Clin Neuropsychol. 2015;29:542-58.
3.    Darby DG, Pietrzak RH, Fredrickson J, Woodward M, Moore L, Fredrickson A, et al. Intraindividual cognitive decline using a brief computerized cognitive screening test. Alzheimers Dement. 2012;8:95-104.
4.    Maruff P, Lim YY, Darby D, Ellis KA, Pietrzak RH, Snyder PJ, et al. Clinical utility of the cogstate brief battery in identifying cognitive impairment in mild cognitive impairment and Alzheimer’s disease. BMC Psychology. 2013;1:30.
5.    Mielke MM, Machulda MM, Hagen CE, Edwards KK, Roberts RO, Pankratz VS, et al. Performance of the CogState computerized battery in the Mayo Clinic Study on Aging. Alzheimer’s & Dementia. 2015;11:1367-76.
6.    Machulda MM, Pankratz VS, Christianson TJ, Ivnik RC, Mielke MM, Roberts RO, et al. Practice effects and longitudinal cognitive change in normal aging vs. incident mild cognitive impairment and dementia in the Mayo Clinic Study of Aging. The Clinical Neuropsychologist. 2013;27:1247-64.
7.    Mielke MM, Machulda MM, Hagen CE, Christianson TJ, Roberts RO, Knopman DS, et al. Influence of amyloid and APOE on cognitive performance in a late middle-aged cohort. Alzheimer’s & Dementia. 2016;12:281-91.
8.    Valdes EG, Sadeq NA, Harrison Bush AL, Morgan D, Andel R. Regular cognitive self-monitoring in community-dwelling older adults using an internet-based tool. J Clin Exp Neuropsychol. 2016;38:1026-37.
9.    Roberts RO, Geda YE, Knopman DS, Cha RH, Pankratz VS, Boeve BF, et al. The Mayo Clinic Study of Aging: Design and sampling, participation, baseline measures and sample characteristics. Neuroepidemiology. 2008;30:58-69.
10.    Kokmen E, Smith GE, Petersen RC, Tangalos E, Ivnik RC. The short test of mental status: Correlations with standardized psychometric testing. Arch Neurol. 1991;48:725-8.
11.    Morris JC. The Clinical Dementia Rating (CDR): Current version and scoring rules. Neurology. 1993;43:2412-4.
12.    Ivnik RJ, Malec JF, Smith GE, Tangalos EG, Petersen RC, Kokmen E, et al. Mayo’s Older Americans Normative Studies: WAIS-R, WMS-R and AVLT norms for ages 56 through 97. The Clinical Neuropsychologist. 1992;6:1-104.
13.    Petersen RC, Roberts RO, Knopman DS, Geda YE, Cha RH, Pankratz VS, et al. Prevalence of mild cognitive impairment is higher in men: The Mayo Clinic Study of Aging. Neurology. 2010;75:889-97.
14.    Stricker NH, Lundt ES, Edwards KK, Machulda MM, Kremers WK, Roberts RO, et al. Comparison of PC and iPad administrations of the Cogstate Brief Battery in the Mayo Clinic Study of Aging: assessing cross-modality equivalence of computerized neuropsychological tests. Clin Neuropsychol. 2018:1-25.
15.    Cogstate. Cogstate Pediatric and Adult Normative Data. New Haven, CT: Cogstate, Inc.; 2018.
16.    Lewis MS, Maruff P, Silbert BS, Evered LA, Scott DA. The influence of different error estimates in the detection of postoperative cognitive dysfunction using reliable change indices with correction for practice effects. Arch Clin Neuropsychol. 2007;22:249-57.
17.    Calamia M, Markon K, Tranel D. Scoring higher the second time around: Meta-analysis of practice effects in neuropsychological assessment. The Clinical Neuropsychologist. 2012;26:543-70.
18.    Heilbronner RL, Sweet JJ, Attix DK, Krull KR, Henry GK, Hart RP. Official position of the American Academy of Clinical Neuropsychology on serial neuropsychological assessments: the utility and challenges of repeat test administrations in clinical and forensic contexts. Clin Neuropsychol. 2010;24:1267-78.
19.    Mackin RS, Insel PS, Truran D, Finley S, Flenniken D, Nosheny R, et al. Unsupervised online neuropsychological test performance for individuals with mild cognitive impairment and dementia: Results from the Brain Health Registry. Alzheimers Dement (Amst). 2018;10:573-82.
20.    Lim YY, Maruff P, Pietrzak RH, Ellis KA, Darby D, Ames D, et al. Abeta and cognitive change: examining the preclinical and prodromal stages of Alzheimer’s disease. Alzheimers Dement. 2014;10:743-51 e1.
21.    Louey AG, Cromer JA, Schembri AJ, Darby DG, Maruff P, Makdissi M, et al. Detecting cognitive impairment after concussion: Sensitivity of change from baseline and normative data methods using the CogSport/Axon cognitive test battery. Arch Clin Neuropsychol. 2014;29:432-41.
22.    Hinton-Bayre AD. Deriving reliable change statistics from test-retest normative data: comparison of models and mathematical expressions. Arch Clin Neuropsychol. 2010;25:244-56.



R.F. Buckley1,2,3,4, K.P. Sparks1,5, K.V. Papp1,2,5, M. Dekhtyar1,5, C. Martin6, S. Burnham7, R.A. Sperling1,2,5, D.M. Rentz1,2,5


1. Massachusetts General Hospital, Boston, Massachusetts, USA; 2. Harvard Medical School, Boston, Massachusetts, USA; 3. Florey Institutes of Neuroscience and Mental Health, Melbourne, Australia; 4. Melbourne School of Psychological Sciences, University of Melbourne, Australia; 5. Brigham and Women’s Hospital, Boston, Massachusetts, USA; 6. Northeastern University, Boston, Massachusetts, USA; 7. Commonwealth Scientific and Industrial Research Organization, Perth, Australia

Corresponding Author: Dorene M. Rentz,  Harvard Medical School,  Department of Neurology,  Brigham and Women’s Hospital,  60 Fenwood Road,  Boston, MA 02115,  Phone 617-732-8235,  Email:

J Prev Alz Dis 2017;4(1):3-11
Published online January 24, 2017,



Background: As prevention trials for Alzheimer’s disease move into asymptomatic populations, identifying older individuals who manifest the earliest cognitive signs of Alzheimer’s disease is critical. Computerized cognitive testing has the potential to replace current gold standard paper and pencil measures and may be a more efficient means of assessing cognition. However, more empirical evidence about the comparability of novel computerized batteries to paper and pencil measures is required.
Objectives: To determine whether two computerized iPad batteries, the NIH Toolbox Cognition Battery and the Cogstate-C3, similarly predict subtle cognitive impairment identified using the Preclinical Alzheimer Cognitive Composite (PACC).
Design, Setting, Participants: A pilot sample of 50 clinically normal older adults (Mage=68.5 years±7.6, 45% non-Caucasian) completed the PACC assessment, and the NIH Toolbox and Cogstate-C3 at research centers of Massachusetts General and Brigham and Women’s Hospitals. Participants made 3-4 in-clinic visits, receiving the PACC first, then the NIH Toolbox, and finally the Cogstate-C3.
Measurements: Performance on the PACC was dichotomized into typical performance (≥ -0.5 SD) versus subtle cognitive impairment (< -0.5 SD). Composites for each computerized battery were created using principal components analysis and compared with the PACC using non-parametric Spearman correlations. Logistic regression analyses were used to determine which composite best classified subtle cognitive impairment versus typical performance.
Results: The NIH Toolbox formed one composite and exhibited the strongest within-battery alignment, while the Cogstate-C3 formed two distinct composites (Learning-Memory and Processing Speed-Attention). The NIH Toolbox and C3 Learning-Memory composites exhibited positive correlations with the PACC (ρ=0.49, p<0.001; ρ=0.58, p<0.001, respectively), but the C3 Processing Speed-Attention composite did not (ρ=-0.18, p=0.22). The C3 Learning-Memory composite was the only composite to classify subtle cognitive impairment, demonstrating the greatest sensitivity (62%) and specificity (81%).
Conclusions: Preliminary findings suggest that the NIH Toolbox has the advantage of showing the strongest overall clustering and alignment with standardized paper-and-pencil tasks. By contrast, Learning-Memory tasks within the Cogstate-C3 battery have the greatest potential to identify cross-sectional, subtle cognitive impairment as defined by the PACC.

Key words: Cognition, Neuropsychology, Aging, Computerized Testing.



Introduction

Interest in using computerized cognitive testing as a potential outcome measure in clinical trials has steadily increased. Computerized testing has been proposed as a feasible and reliable way of testing older participants (1-4). Studies examining the validity of computerized cognitive composites in relation to performance on conventional neuropsychological instruments are accruing (5-8), and computerized testing has already become a secondary outcome in a major clinical trial (9). Until recently, however, clinical trials have relied upon conventional paper and pencil neuropsychological tests, as they represent a gold standard in clinical testing and diagnostic decision-making (for a discussion, see: 10). As technology advances, clinical trials are increasingly moving towards validated computerized testing for sensitively capturing cognitive performance in large-scale secondary prevention cohorts. Comparing computerized batteries against the measures currently used in large-scale clinical trials is critical as the field moves towards large-scale, population-based cognitive screening and assessment (11, 12). The Alzheimer’s Disease Cooperative Study Preclinical Alzheimer Cognitive Composite (PACC) (9, 13) is a composite of standard paper and pencil tests currently being used in a large-scale prevention trial (9). The PACC was originally designed as a multi-domain but memory-predominant cognitive composite that exhibited sensitivity to biomarker risk of AD in clinically normal older adults (13).
It is unclear how computerized batteries perform in relation to one another when compared against conventional paper-and-pencil composites, such as the PACC. It is also not clearly understood how these batteries compare in their ability to classify subtle cognitive impairment as defined by poor performance on paper-and-pencil composites. Two computerized batteries of particular relevance to these questions are the Cogstate Computerized Cognitive Composite (C3) battery (1), which is currently being used in the Anti-Amyloid Treatment in Asymptomatic Alzheimer’s Disease (A4) secondary prevention trial (9), and the newly developed, non-proprietary iPad version of the National Institutes of Health (NIH) Toolbox Cognition Battery (NIHTB-CB) (for reference to the general computerized battery: 14, 15). The Cogstate C3 departs from the original Cogstate Brief Battery (7) in that it includes the Face-Name Associative Memory Exam (FNAME), a challenging associative memory task found to be sensitive to neocortical amyloid burden in older adults (16), and the Behavioral Pattern Separation Task-Object (BPXT) (17), a pattern-separation memory task sensitive to treatment change in an MCI trial (17). The Cogstate Brief Battery is well validated and has been shown to capture AD-related cognitive changes in older adults (18) and in those with MCI and AD (19). The desktop version of the NIHTB-CB has been validated against standard neuropsychological measures in a large and demographically diverse population ranging in age from 3 to 85 years (6, 14). The NIHTB-CB is intended to serve as a ‘common currency’ among longitudinal and epidemiological studies; however, it has yet to be tested in clinical trials or longitudinal observational studies of aging and dementia.
Neither of these batteries is a direct replication or ‘digitization’ of conventional paper-and-pencil tests; rather, they represent a novel approach to cognitive testing that can be optimally translated to computerized technologies. As an example, Cogstate utilizes playing cards as a non-verbal assessment of working memory and processing speed with wide cross-cultural applicability (18).
A critical component of early detection in preclinical Alzheimer’s disease is the ability of neuropsychological tests to identify evidence of subtle cognitive impairment (20). In Stage 3 of preclinical AD, after abnormal levels of both amyloidosis and neurodegeneration are apparent, the appearance of subtle changes in cognitive performance heralds the final phase prior to a diagnosis of MCI. Targeting clinically normal older adults at risk of AD-related cognitive decline over short-term follow-up will require sophisticated cognitive batteries that are sensitive to subtle change, yet deployable across the large populations required by clinical trials in clinically normal cohorts. Before computerized batteries can be utilized in these environments, they must demonstrate validity for identifying preclinical levels of subtle cognitive impairment (21, 22).
The aims of this pilot cross-sectional study were three-fold. First, we developed aggregate cognitive composites for both computerized batteries to measure overall cognitive performance in relation to the paper and pencil PACC. We also aimed to compare each of these computerized batteries against performance on the PACC in clinically normal older adults. Finally, using the PACC to define subtle cognitive impairment, our objective was to determine the ability of each of the computerized batteries to distinguish subtle cognitive impairment from typical cognitive performance. Evidence that these batteries similarly identify subtle cognitive impairment would support the validity of these instruments for large-scale screening and cognitive outcome protocols for clinical trials.


Materials and Methods


Participants

Fifty clinically normal, community-dwelling older adults (age range: 54-97 years) were recruited from volunteers interested in research studies at the Center for Alzheimer Research and Treatment at Brigham and Women’s Hospital and at the Massachusetts Alzheimer Disease Research Center at Massachusetts General Hospital. All subjects underwent informed consent procedures approved by the Partners Human Research Committee, the Institutional Review Board for Brigham and Women’s Hospital and Massachusetts General Hospital. No prior computer or iPad knowledge was required. Subjects were excluded if they had a history of alcoholism, drug abuse, head trauma, or current serious medical or psychiatric illnesses. All subjects met the age requirement (above 50 years old) and scored within age-specified norms on the Telephone Interview of Cognitive Status (TICS; 23). We set a minimum age of 50 years, as longitudinal research studies and clinical trials are beginning to include younger ages in their cohorts (e.g., the Australian Imaging Biomarker and Lifestyle (AIBL) study of ageing, the Harvard Aging Brain Study (HABS), and the ante-amyloid (A3) clinical trial (11)).


Procedure

To mimic a typical clinical trial setting, subjects participated in three to four clinic visits within a six-month time-frame, completing the PACC, the NIHTB-CB, and the Cogstate iPad C3 battery at the first, second, and third visits, respectively. Each visit was separated from the next by approximately one week. The rationale for multiple clinic visits was to reduce cognitive fatigue when completing each neuropsychological battery. Both computerized batteries were completed from beginning to end in one visit. Participants made a fourth visit as part of a larger study that is not covered here. Participants were not extensively trained to use the iPad prior to testing, as the tests were overseen by an examiner according to a standardized administration (CM, KPS, MD). Instructions were given if a participant had trouble making selections (pressing too hard or too long).


Measures

The PACC includes Logical Memory–delayed recall (LM-DR), the Free and Cued Selective Reminding Test (FCSRT) total score, the Mini Mental State Exam (MMSE) total score, and the Wechsler Adult Intelligence Scale-Revised Digit Symbol Coding Test (DSC) total score (13). The composite includes measures of general cognition (MMSE) and speeded executive function (DSC), but is 50% composed of episodic memory tests (13). All tests were z-transformed using the mean and standard deviation of performance by clinically normal older adults (n=256, age range: 61-90 years) participating in the Harvard Aging Brain Study (24, 25). This population served as an ideal normative sample by which to classify our pilot sample, as individuals were recruited from the same geographic area and through the same centers. To form the PACC, all z-transformed variables were averaged together, with a higher score indicating better performance.
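A minimal sketch may make this composite construction concrete. The normative means and standard deviations below are invented placeholders (the HABS norms are not reproduced in this article), and the function and key names are ours:

```python
import numpy as np

# Hypothetical normative means/SDs standing in for the Harvard Aging Brain
# Study reference sample; these values are placeholders, not the real norms.
NORMS = {
    "lm_dr": (13.0, 3.5),   # Logical Memory delayed recall
    "fcsrt": (46.0, 2.5),   # Free and Cued Selective Reminding, total
    "mmse":  (29.0, 1.2),   # Mini Mental State Exam, total
    "dsc":   (48.0, 10.0),  # Digit Symbol Coding, total
}

def pacc(scores: dict) -> float:
    """Average of z-scores across the four PACC components; higher = better."""
    zs = [(scores[k] - mu) / sd for k, (mu, sd) in NORMS.items()]
    return float(np.mean(zs))

# Example participant with hypothetical raw scores:
print(round(pacc({"lm_dr": 14, "fcsrt": 47, "mmse": 30, "dsc": 52}), 2))
```

Because each test is z-scored against the same normative sample before averaging, each component contributes on a comparable scale despite differing raw-score ranges.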
The NIHTB-CB included the Flanker Inhibitory Control and Attention Test (Flanker), the Picture Sequence Memory Test (PSMT), the Picture Vocabulary Test (PVT), the Pattern Comparison Processing Speed Test (PCPST) and the Dimensional Change Card Sort Test (DCCS) (14). Two other NIHTB-CB measures, the List Sorting Working Memory test and the Oral Reading Recognition test, were not included in the current study as they required the use of an additional keyboard. The Flanker is a measure of cognitive control, in which the participant is asked to attend to a stimulus that is flanked by four identical stimuli positioned either congruently or incongruently to the target; the participant must select the direction in which the target stimulus is pointing. The PSMT is a measure of episodic memory in which participants are shown a series of images and asked to re-create the image order over two trials. The PVT is a measure of receptive vocabulary; participants are orally presented a word and asked to select, from four images, the one closest to the meaning of the word. The outcome measure for the PVT was age-scaled and standardized. The PCPST is a measure of processing speed, in which participants are asked to match an object with response items by either color or shape. The DCCS is a measure of set shifting, in which a participant matches a target visual stimulus to one of two choice stimuli according to shape or color (14). Outcome measures for the PSMT, PCPST, DCCS and Flanker tasks were the computed scores provided by the NIHTB-CB. These are theta scores, which reflect an individual’s overall ability or performance, similar to a z-score.
The Cogstate C3 includes the FNAME and the Behavioral Pattern Separation-Object Task (BPXT), as well as the Detection Task (DET), the Identification Task (IDN), the One Card Learning Task (OCL) and the One-Back Task (ONB). The FNAME is an associative memory test that requires participants to associate faces with corresponding names (FNMT), and subsequently to recall (FNLT) and recognize (FSBT) those names; the outcome measure is the frequency of correct responses. The BPXT assesses working and recognition memory; participants are iteratively presented with a series of repeated, novel and distractor images and are asked to categorize each as Old, Similar, or New. The outcome measure is the frequency of correct responses. The remaining tasks use playing cards as stimuli. The DET is a measure of reaction time and processing speed, in which participants are asked to respond when a stimulus card is turned face up. The IDN is an attention paradigm in which a card is presented and the respondent must choose whether the card is red or is not red (black). The outcome measure for these two tasks was speed (sec:ms). The OCL task is a non-verbal memory task assessing short-term recall of a set of repeated playing cards; its outcome measure is accuracy. The ONB task is a measure of working memory, in which respondents are asked to serially match each card to the previous trial; its outcome measure is also speed of response (18). These scores were not otherwise transformed, but were converted to z-scores.

Creating computerized battery composites

Our initial aim was to create cognitive composites from the computerized batteries to align with the PACC. Previous studies have created cognitive composites from the Cogstate Brief Battery in clinically-normal older adults and in patients with MCI and AD, so we investigated whether similar composites could be derived from the Cogstate C3. NIHTB-CB Crystallized and Fluid Cognition Composites have been proposed previously; however, these were created from a sample of children and required two additional tests that we did not include in our study. Global composite measures were therefore created for each of the NIHTB-CB and Cogstate C3 batteries using principal component analysis (PCA), and Bartlett factor scores were extracted. These composites were created consistent with previous reports using the Cogstate Brief Battery (19) and the NIHTB-CB (15). Using scree plots and eigenvalue cut-offs, we determined that the NIHTB-CB could be reduced to one composite, while the C3 exhibited a better fit with two composites. The NIHTB-CB composite accounted for 47% of the variance explained in the model, while the two C3 composites accounted for a total of 61% of the variance (with the first factor accounting for 32%). The first C3 factor, ‘Learning-Memory’, included the BPXT, FNMT, FNLT, FSBT and OCL. The second factor, ‘Processing Speed-Attention’, included the ONB, IDN and DET. A clustering model using a two-dimensional PCA, which compared the similarities of the tasks in two-dimensional space, was also used to explore how tasks clustered together. The results suggested that IDN, DET and ONB created a distinct cluster, while the remainder of the NIHTB-CB and C3 tasks formed a second, tight cluster (see Figure 1).
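The data-reduction step can be sketched as below on synthetic data. This is a minimal illustration, not the study's analysis pipeline: it uses plain PCA component scores, whereas the paper extracted Bartlett factor scores, which would come from a factor-analysis model; the eigenvalue > 1 (Kaiser) rule stands in for the scree-plot inspection described above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a battery: 50 participants x 8 task z-scores.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))

# Standardize each task, then fit PCA on the full set of tasks.
Z = StandardScaler().fit_transform(X)
pca = PCA().fit(Z)
eigenvalues = pca.explained_variance_

# Eigenvalue cut-off (Kaiser criterion): retain components > 1,
# mirroring the scree-plot/eigenvalue decision in the text.
n_keep = int(np.sum(eigenvalues > 1))

# Component scores serve as the global composite(s) for each participant;
# Bartlett factor scores (as in the paper) would replace these in practice.
composites = PCA(n_components=n_keep).fit_transform(Z)
print(composites.shape)
```

With real battery data, the proportion of variance captured by the retained components (here `pca.explained_variance_ratio_`) corresponds to the 47% and 61% figures reported above.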
Using composite scores arising from this data-reduction approach allowed us to pursue our main hypothesis: using the composite scores to differentiate high and low performance in relation to the PACC.

Figure 1. Visualization of clusters using PCA of NIHTB-CB with C3 tasks. Arrows indicate the loading coefficients of each variable of interest


Statistical methodology

Analyses were conducted using IBM SPSS version 22.0 and R (version 3.3.0). Due to the small sample size, a series of non-parametric Spearman correlations was conducted to ascertain relationships between computerized battery composites and PACC performance. Performance on the PACC was dichotomized into normal and subtle cognitive impairment according to a cut-off of 0.5SD below the normative group mean, derived from the Harvard Aging Brain Study cohort (an entirely separate cohort from the one used in the current study). Participants below this cut-off were not considered to meet criteria for a diagnostic classification of mild cognitive impairment, but demonstrated a very subtle cognitive decrement in comparison with their peers. This classification was chosen to align with Stage 3 Preclinical AD criteria, which state: “Evidence of subtle cognitive decline, that does not meet criteria for MCI or dementia” (20). Choosing a 0.5SD cut-off allowed us to define subtly poorer performance while maintaining samples with enough power for analytical purposes. This contrasts with diagnostic classifications for MCI, which typically require performance 1-1.5SD below age-adjusted norms (26). Three logistic regression analyses were performed to determine how well the NIHTB-CB and C3 composites could detect the subtle cognitive impairment group. Although age and education levels did not differ significantly between typical PACC performers and those with subtle cognitive impairment, we ran analyses with these covariates included in order to portray our results within the context of age- and education-adjustment. Receiver Operating Characteristic (ROC) curves were computed to determine sensitivity, specificity and area under the curve (AUC) parameters of each composite for classifying normal and subtle cognitive impairment. Post-hoc analyses were run to ascertain which tests within the best-performing composites were driving better classification outcomes.
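The dichotomization, logistic regression, and ROC steps described above can be sketched as follows. The data are synthetic and the analysis is a simplified stand-in (e.g., no age/education covariates; a normative mean of 0 and SD of 1 are assumed), so the numbers it produces are illustrative only:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Synthetic data: a computerized composite correlated with a PACC score.
rng = np.random.default_rng(1)
n = 50
composite = rng.normal(size=n)
pacc_score = 0.6 * composite + rng.normal(scale=0.8, size=n)

# Dichotomize PACC at 0.5 SD below the normative mean
# (normative mean 0 and SD 1 are assumed here for illustration).
subtle = (pacc_score < -0.5).astype(int)

# Logistic regression predicting group membership from the composite,
# then the AUC of the resulting ROC curve.
model = LogisticRegression().fit(composite.reshape(-1, 1), subtle)
probs = model.predict_proba(composite.reshape(-1, 1))[:, 1]
auc = roc_auc_score(subtle, probs)
print(round(auc, 2))
```

Sensitivity and specificity at a chosen probability threshold would then be read off the same ROC curve, as in Table 2.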



Participant Characteristics

No demographic differences were found between those classified as exhibiting subtle cognitive impairment and those exhibiting typical performance according to the PACC (see Table 1). A marginally greater number of non-Caucasian individuals (n = 9) than Caucasian individuals (n = 4) were classified as subtly impaired on the PACC, but this difference was not statistically significant, χ2(1) = 3.91, p = .10. The non-Caucasian group did not differ significantly from the Caucasian group on any demographics, although there was a trend toward lower education levels (ranging from 12-20 years), χ2(4) = 8.13, p = .09.

Table 1. Participant characteristics and cognitive performance


Note: Subtle cognitive impairment is PACC performance below 0.5SD *Cell sizes are too small to count

Associations between computerized batteries and PACC performance

The NIHTB-CB and C3 Learning-Memory were both associated with the PACC (ρNIHTB-CB(47) = 0.49, p < .001 and ρC3 Learning-Memory(47) = 0.58, p <.001). There was no significant relationship found between the PACC and C3 Processing Speed-Attention, ρ(47) = -0.18, p =.22.

Ability of computerized tasks to distinguish subtle cognitive impairment according to the PACC

Logistic regression analyses showed that the NIHTB-CB and Cogstate C3 Learning-Memory models significantly distinguished subtle cognitive impairment from typical PACC performance, explaining 9% and 49% of the variance in their respective models (χ2NIHTB-CB(42) = 48.22, p = .04 and χ2COGSTATE-C3(42) = 23.61, p < .001; see Table 2 for all three model fits and estimates). Greater NIHTB-CB performance related to better classification of those with subtle cognitive impairment; however, this finding did not survive multiple comparisons (B (SE) = 0.79 (0.4), p = .05). Better Learning-Memory performance significantly increased the chance of being classified with typical (i.e., better) PACC performance (B (SE) = 3.71 (1.2), p = .003). The same pattern of results held with or without age and education included in the models (see Table 2).


Table 2. Regression and ROC analyses with each computerized composite to predict subtle cognitive impairment or typical performance on the PACC


Note: The large confidence intervals in this analysis are driven largely by the sample size, and so the OR should be interpreted with caution

ROC curves showed that performance on the C3 Learning-Memory composite yielded the largest AUC (92%) and exhibited the greatest sensitivity (61%) and specificity (80%) for classifying subtle cognitive impairment (see Table 2 for all sensitivity and specificity parameters). Figure 3 depicts a scatterplot of the NIHTB-CB against the Cogstate C3 (averaging performance on both C3 composites) according to the PACC groups. Scores in the top right-hand quadrant reflect high performance on both computerized batteries; all but one of these scores belonged to individuals with typical PACC performance, illustrating high specificity.
As the C3 Learning-Memory composite exhibited the highest odds ratio and ROC parameters, we ran a post-hoc logistic regression to determine which measures within the Learning-Memory composite (FNMT, FNLT, FSBT, BPXT and OCL) were driving these results. Better performance on the Face Name Letter Task, a measure of delayed free recall, was the only measure within the Learning-Memory composite found to significantly increase the likelihood of typical PACC performance (B (SE) = 5.6 (3.1), p = .05). As a comparison, we also conducted a post-hoc analysis with the NIHTB PSMT task, a free recall memory task, and found that better PSMT performance significantly predicted typical PACC performance (OR = 3.3, p = .04, CI95%: 1.3-12.5). Neither the FNLT task nor the NIHTB PSMT task was better able to classify subtle cognitive impairment than the full composite measures, with AUC, sensitivity and specificity parameters comparable to their counterparts (see Table 2 and Figure 2).

Figure 2. ROCs for the NIHTB-CB and Cogstate C3 composites, and the C3 FNLT task alone to distinguish between high and low PACC performance (Blue = C3 Learning-Memory, Red = NIHTB-CB, Green = C3 Processing Speed-Attention, Grey = C3 FNLT, Black-dash = NIH PSMT)



Figure 3. Scatterplot of association between NIHTB-CB and Cogstate C3 battery, with slopes estimating group effect of high and low PACC performance



This pilot study in normal older adults sought to directly compare performance on two computerized batteries, the NIHTB-CB and the Cogstate C3, to the PACC, a clinical trial outcome measure composed of conventional paper and pencil cognitive tasks. The Learning-Memory composite from the Cogstate C3 battery was able to distinguish between normal PACC performance and subtle cognitive impairment (see Figure 4 for a diagrammatic representation of findings). The composite also showed particularly high specificity and AUCs for correctly classifying normal individuals. These findings were primarily driven by the delayed free recall index from the Face-Name task featured within the composite. By contrast, the NIHTB-CB yielded a moderate level of specificity, with sensitivity at chance level, while the C3 Processing Speed-Attention composite was poor on both parameters. We did find, however, that the NIHTB-CB showed a comparable level of correlation with the PACC as was found with C3 Learning-Memory. By contrast, the C3 Processing Speed-Attention composite showed no association with the PACC. This supports other findings suggesting that processing speed and attention domains are less sensitive to AD-related change very early in the trajectory (18), and perhaps are more sensitive to age-related etiologies (27). These results most likely reflect the nuanced differences in ‘intended purpose’ for the NIHTB-CB and Cogstate C3 batteries. The NIHTB-CB has been proposed as a well-validated measure that can be utilized across a broad range of age groups and education levels (14), while the Cogstate C3 is a battery primarily intended for clinical trials, and has been shown to be sensitive to AD-related cognitive change (28).

Figure 4. Diagrammatic representation of each composite arising from the Cogstate C3 and NIHTB-CB computerized batteries, and their corresponding tests. Each composite is also attached to an odds ratio (OR) which represents the ability of each composite to distinguish between typical and subtly impaired PACC performance. The pink boxes denote the tasks that were most contributory to the variance explained in the logistic regression model


One strength of the NIHTB-CB in this study was that it formed a clear singular composite and displayed largely unified within-battery alignment, as suggested by the clustering methods. The NIHTB-CB has shown strong convergent validity with other standard neuropsychological paper-and-pencil tests along the broad developmental trajectory (14), and was originally designed to complement measures used in research studies of cognition or to serve as a brief adjunct measure in longitudinal and epidemiologic studies (14, 29). It was not, however, specifically developed as an early diagnostic tool for AD-related cognitive impairment or as a target for disease outcomes. The NIHTB composite was able to identify subtle cognitive impairment, particularly via the NIHTB memory task. This supports the notion that the NIHTB-CB is a suitable measure of cognitive performance in clinically-normal older adults. Its sensitivity for classifying subtle cognitive impairment, however, was not as high as that of the Cogstate C3 Learning-Memory composite. An additional advantage of the NIHTB-CB battery is that it includes a measure of IQ, which is not covered by the C3. As such, this battery has the unique potential to efficiently measure cognitive reserve, and may inform how long an individual is likely to compensate for increasing pathology over time. Our findings highlight the different possible utilities of these computerized batteries within the context of secondary prevention clinical trials. It is possible that the NIHTB-CB will be more sensitive to early longitudinal cognitive decline; however, the current pilot study cannot address this question.
Within the Cogstate C3 battery, two distinct composites were extracted, similar to previous studies (18, 19, 30), supporting the notion that the Cogstate battery was intended to measure distinct cognitive domains. The C3 Learning-Memory composite showed an association with PACC performance and an ability to classify subtle cognitive impairment. The Cogstate Brief Battery has been shown to reliably reflect increasing magnitude of impairment across MCI and AD diagnostic groups, and computerized performance tracks well with performance on conventional tests (18, 28). Our findings suggest that the FNAME component of the Cogstate C3 battery may be of particular interest for clinical trials of preclinical AD. Although evidence of subtle cognitive impairment was defined in our study, it is not solely an indication of stage 3 preclinical AD, as we do not have indications of AD biomarker status. Furthermore, exhibiting subtle cognitive impairment does not by itself indicate progressive cognitive decline. As such, sensitivity to the classification of subtle cognitive impairment will need to be more fully determined by larger, longitudinal investigations. In addition, validation studies will be required in comparison populations of MCI and AD dementia. It may be that the ADAS-Cog and screening tools such as the MMSE are sufficient for clinical populations, but that more challenging neuropsychological tasks included in computerized batteries are more relevant for large-scale clinical trials of clinically-normal individuals. Our findings further suggest that not all C3 tasks have the ability to identify subtle cognitive decline, and as such, some may not be necessary for inclusion in large-scale screening procedures for preclinical AD trials.
We found that the driving predictor of sensitivity to subtle cognitive impairment in the current study was the delayed free recall index from the C3 FNAME task. The Cogstate C3 departs from the Cogstate Brief Battery in that it includes the FNAME (1), which has been shown to be sensitive to amyloid-ß deposition (12). The addition of the FNAME measures in the C3 battery may have increased the ability of the C3 to capture variation in PACC performance, which is the current standard for clinical trials (1). As the PACC is a composite that is more heavily weighted towards memory (by including two memory measures), and is honed to detect amyloid-related change (13), it is not surprising that memory components of the C3 battery are able to classify subtle impairment on the conventional composite. In the current study, delayed free recall from the FNLT was found to drive the group classification, which supports the recommendation that the FNAME be included alongside the Cogstate Brief Battery for longitudinal studies of memory in preclinical AD. Although it was a significant component of the composite for classifying group performance, neither the C3 FNLT task nor the NIHTB PSMT task performed significantly better than its composite counterpart. While parsimonious neuropsychological batteries are advantageous, we currently recommend that the full Cogstate Learning-Memory or NIHTB-CB batteries be performed.
The current study is a pilot study of clinically normal older adults, and as such we were limited to studying the classification of subtle cognitive impairment as defined by the PACC. Although the sample size is small, a strength of this study is that it covers a broad range of older ages and maximizes the racial diversity of subjects. As no major demographic differences were present between typical and subtly impaired PACC performers, we did not covary for race in our analyses, although we acknowledge that more sophisticated examinations of diversity-related cognitive profiles should be conducted in larger samples (31). In addition, we did not acquire AD biomarkers, and cannot determine the extent to which these tests measure biological markers of interest. In the future, we plan to include the NIHTB-CB and C3 in a larger cohort of clinically normal older adults who have undergone AD biomarker testing, and intend to follow the performance of these individuals over time. It will also be important to counterbalance battery administration, and to assess in-home compared with in-clinic testing performance. The trend is moving towards large-scale online cognitive testing, as evidenced by registries that include online testing such as the Brain Health Registry (32) and the UK Biobank (33). Determining test-retest reliability between at-home and in-clinic testing will be vital. Large secondary prevention trials require access to trial-ready cohorts identified on the basis of cognitive performance, and well-validated computerized online testing will make this feasible. We believe that both iPad batteries presented in this study show promise as valid cognitive assessments in the clinical trial setting. However, more work will be needed before they can be effectively utilized as online cognitive tests for large-scale prevention trials.


Funding: Neurotrack Technologies funded this study. Rachel F. Buckley is funded by the NHMRC/ARC Dementia Research Fellowship (APP1105576). Reisa A. Sperling has served as a paid consultant for Abbvie, Biogen, Bracket, Genentech, Lundbeck, Merck, Otsuka, Roche, and Sanofi. She has served as a co-investigator for Avid, Eli Lilly, and Janssen Alzheimer Immunotherapy clinical trials. She has spoken at symposia sponsored by Eli Lilly, Biogen, and Janssen Alzheimer Immunotherapy. Dorene M. Rentz has served as a paid consultant for Eli Lilly, Lundbeck Pharmaceuticals and Biogen Idec. She also serves on the Scientific Advisory Board for Neurotrack. Kathryn V. Papp has served as a paid consultant for Biogen Idec. These relationships are not related to the content in the manuscript.

Acknowledgements: We would like to thank Drs. Sandy Weintraub, Jerry Slotkin, and Paul Maruff for their invaluable comments and input to the development of this manuscript.

Ethical standards: The Partners Human Research Committee approved this study. All subjects underwent informed consent.



1.    Rentz DM, Dekhtyar M, Sherman J, Burnham S, Blacker D, Aghjayan SL, Papp KV, Amariglio RE, Schembri A, Chenhall T. The Feasibility of At-Home iPad Cognitive Testing For Use in Clinical Trials. JPAD 2016;3, 8-12.
2.    Wild K, Howieson D, Webbe F, Seelye A, Kaye J. Status of computerized cognitive testing in aging: a systematic review. Alzheimer’s & Dementia 2008;4, 428-437.
3.    Sano M, Egelko S, Ferris S, Kaye J, Hayes TL, Mundt JC, Donohue M, Walter S, Sun S, Sauceda-Cerda L. Pilot Study to Show the Feasibility of a Multicenter Trial of Home-based Assessment of People Over 75 Years Old. Alzheimer disease and associated disorders 2010;24, 256-263.
4.    Fredrickson J, Maruff P, Woodward M, Moore L, Fredrickson A, Sach J, Darby D. Evaluation of the usability of a brief computerized cognitive screening test in older people for epidemiological studies. Neuroepidemiology 2009;34, 65-75.
5.    Steinberg SI, Negash S, Sammel MD, Bogner H, Harel BT, Livney MG, McCoubrey H, Wolk DA, Kling MA, Arnold SE. Subjective Memory Complaints, Cognitive Performance, and Psychological Factors in Healthy Older Adults. American Journal of Alzheimer’s Disease & Other Dementias 2013;28, 776-783.
6.    Weintraub S, Dikmen SS, Heaton RK, Tulsky DS, Zelazo PD, Slotkin J, Carlozzi NE, Bauer PJ, Wallner-Allen K, Fox N. The cognition battery of the NIH toolbox for assessment of neurological and behavioral function: Validation in an adult sample. JINS 2014;20, 567-578.
7.    Maruff P, Thomas E, Cysique L, Brew B, Collie A, Snyder P, Pietrzak RH. Validity of the CogState brief battery: relationship to standardized tests and sensitivity to cognitive impairment in mild traumatic brain injury, schizophrenia, and AIDS dementia complex. Archives of Clinical Neuropsychology 2009;24, 165-178.
8.    Hammers D, Spurgeon E, Ryan K, Persad C, Barbas N, Heidebrink J, Darby D, Giordani B. Validity of a brief computerized cognitive screening test in dementia. Journal of geriatric psychiatry and neurology 2012;25, 89-99.
9.    Sperling RA, Rentz DM, Johnson KA, Karlawish J, Donohue M, Salmon DP, Aisen P. The A4 Study: Stopping AD Before Symptoms Begin? Science Translational Medicine 2014;6, 228fs213.
10.    Bauer RM, Iverson GL, Cernich AN, Binder LM, Ruff RM, Naugle RI. Computerized Neuropsychological Assessment Devices: Joint Position Paper of the American Academy of Clinical Neuropsychology and the National Academy of Neuropsychology(). Archives of Clinical Neuropsychology 2012;27, 362-373.
11.    Sperling RA, Mormino E, Johnson K. The Evolution of Preclinical Alzheimer’s Disease: Implications for Prevention Trials. Neuron 2014;84, 608-622.
12.    Rentz DM, Parra Rodriguez MA, Amariglio R, Stern Y, Sperling RA, Ferris S.  Promising developments in neuropsychological approaches for the detection of preclinical Alzheimer’s disease: a selective review. Alzheimer’s Research & Therapy 2013;5, 58.
13.    Donohue MC, Sperling RA, Salmon DP, et al. The preclinical alzheimer cognitive composite: Measuring amyloid-related decline. JAMA Neurology 2014;71, 961-970.
14.    Weintraub S, Dikmen SS, Heaton RK, Tulsky DS, Zelazo PD, Bauer PJ, Carlozzi NE, Slotkin J, Blitz D, Wallner-Allen K. Cognition assessment using the NIH Toolbox. Neurology 2013;80, S54-S64.
15.    Akshoomoff N, Beaumont JL, Bauer PJ, Dikmen SS, Gershon RC, Mungas D, Slotkin J, Tulsky D, Weintraub S, Zelazo PD, Heaton RK. NIH Toolbox cognition battery (CB): Composite scores of crystallized, fluid, and overall cognition. Monographs of the Society for Research in Child Development 2013;78, 119-132.
16.    Rentz DM, Amariglio RE, Becker JA, Frey M, Olson LE, Frishe K, Carmasin J, Maye JE, Johnson KA, Sperling RA. Face-name associative memory performance is related to amyloid burden in normal elderly. Neuropsychologia 2011;49, 2776-2783.
17.    Stark SM, Yassa MA, Lacy JW, Stark CE. A task to assess behavioral pattern separation (BPS) in humans: Data from healthy aging and mild cognitive impairment. Neuropsychologia 2013;51, 2442-2449.
18.    Lim YY, Ellis KA, Harrington K, Ames D, Martins RN, Masters CL, Rowe C, Savage G, Szoeke C, Darby D. Use of the CogState Brief Battery in the assessment of Alzheimer’s disease related cognitive impairment in the Australian Imaging, Biomarkers and Lifestyle (AIBL) study. Journal of Clinical and Experimental Neuropsychology 2012;34, 345-358.
19.    Maruff P, Lim YY, Darby D, Ellis KA, Pietrzak RH, Snyder PJ, Bush AI, Szoeke C, Schembri A, Ames D. Clinical utility of the cogstate brief battery in identifying cognitive impairment in mild cognitive impairment and Alzheimer’s disease. BMC Psychology 2013;1, 30.
20.    Sperling RA, Beckett L, Bennett D, Craft S, Fagan A, Kaye J, Montine T, Park D, Reiman E, Siemers E, Stern Y, Yaffe K. Criteria for preclinical Alzheimer’s disease. Alzheimer’s Association, 2010.
21.    Carrillo MC, Vellas B. New and different approaches needed for the design and execution of Alzheimer’s clinical trials. Alzheimer’s & dementia: the journal of the Alzheimer’s Association 2013;9, 436-437.
22.    Reiman EM, Langbaum J, Fleisher AS, Caselli RJ, Chen K, Ayutyanont N, Quiroz YT, Kosik KS, Lopera F, Tariot PN. Alzheimer’s Prevention Initiative: a plan to accelerate the evaluation of presymptomatic treatments. Journal of Alzheimer’s Disease 2011;26, 321-329.
23.    Knopman DS, Roberts RO, Geda YE, Pankratz VS, Christianson TJ, Petersen RC, Rocca WA. Validation of the telephone interview for cognitive status-modified in subjects with normal cognition, mild cognitive impairment, or dementia. Neuroepidemiology 2010;34, 34-42.
24.    Dagley A, LaPoint M, Huijbers W, Hedden T, McLaren DG, Chatwal JP, Papp KV, Amariglio RE, Blacker D, Rentz DM. Harvard aging brain study: Dataset and accessibility. In press, 2015;S1053-8119.
25.    Hedden T, Mormino EC, Amariglio RE, Younger AP, Schultz AP, Becker JA, Buckner RL, Johnson KA, Sperling RA, Rentz DM. Cognitive profile of amyloid burden and white matter hyperintensities in cognitively normal older adults. The Journal of Neuroscience 2012;32, 16233-16242.
26.    Albert MS, DeKosky ST, Dickson D, Dubois B, Feldman HH, Fox NC, Gamst A, Holtzman DM, Jagust WJ, Petersen RC. The diagnosis of mild cognitive impairment due to Alzheimer’s disease: Recommendations from the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimer’s & Dementia 2011;7, 270-279.
27.    Hedden T, Gabrieli JDE. Insights into the ageing mind: a view from cognitive neuroscience. Nat Rev Neurosci 2004;5, 87-96.
28.    Lim YY, Ellis KA, Harrington K, Pietrzak RH, Gale J, Ames D, Bush AI, Darby D, Martins RN, Masters CL, Rowe CC, Savage G, Szoeke C, Villemagne VL, Maruff P.  Cognitive Decline in Adults with Amnestic Mild Cognitive Impairment and High Amyloid-β: Prodromal Alzheimer’s Disease? J Alz Dis 2013;33, 1167-1176.
29.    Gershon RC, Wagster MV, Hendrie HC, Fox NA, Cook KF, Nowinski CJ. NIH toolbox for assessment of neurological and behavioral function. Neurol 2013;80, S2-S6.
30.    Lim YY, Ellis KA, Ames D, Darby D, Harrington K, Martins RN, Masters CL, Rowe CC, Savage G, Szoeke C, Villemagne VL, Maruff P. Aβ amyloid, cognition, and APOE genotype in healthy older adults. Alzheimer Dem 2013;9, 538-545.
31.    Manly JJ, Schupf N, Tang M-X, Stern Y.  Cognitive decline and literacy among ethnically diverse elders. Journal of Geriatric Psychiatry and Neurology 2005;18, 213-217.
32.    Nosheny RL, Flenniken D, Insel PS, Finley S, Mackin S, Camacho M, Truran-Sacrey D, Maruff P, Weiner MW. Internet-based recruitment of subjects for prodromal and secondary prevention Alzheimer’s disease trials using the brain health registry. Alzheimer’s & Dementia: The Journal of the Alzheimer’s Association 2015;11, P156.
33.    Matthews PM, Sudlow C. The UK Biobank. Brain 2015;138, 3463-3465.