jpad journal


POLYGENIC RISK SCORE REVEALS GENETIC HETEROGENEITY OF ALZHEIMER’S DISEASE BETWEEN THE CHINESE AND EUROPEAN POPULATIONS

 

F. Li1, S. Xie1, J. Cui1, Y. Li1, T. Li1, Y. Wang1, J. Jia1,2,3,4,5

 

1. Innovation Center for Neurological Disorders and Department of Neurology, Xuanwu Hospital, Capital Medical University, National Clinical Research Center for Geriatric Diseases, Beijing P.R. China; 2. Beijing Key Laboratory of Geriatric Cognitive Disorders, Beijing P.R. China; 3. Clinical Center for Neurodegenerative Disease and Memory Impairment, Capital Medical University, Beijing P.R. China; 4. Center of Alzheimer’s Disease, Beijing Institute of Brain Disorders, Collaborative Innovation Center for Brain Disorders, Capital Medical University, Beijing P.R. China; 5. Key Laboratory of Neurodegenerative Diseases, Ministry of Education, Beijing P.R. China;

Corresponding Author: Jianping Jia, MD, PhD, Professor of Neurology, Innovation Center for Neurological Disorders and Department of Neurology, Xuanwu Hospital, Capital Medical University, National Clinical Research Center for Geriatric Diseases, Changchun Street 45, Xicheng District, Beijing, China, 100053, Tel: +86-10-83199449, e-mail: jiajp@vip.126.com.

J Prev Alz Dis 2024;
Published online January 30, 2024, http://dx.doi.org/10.14283/jpad.2024.29

 


Abstract

BACKGROUND: The polygenic risk score (PRS) aggregates the effects of numerous genetic variants associated with a condition across the human genome and may help to predict late-onset Alzheimer’s disease (LOAD). Most current PRS studies on Alzheimer’s disease (AD) have been conducted in Caucasian ancestry populations, whereas the Chinese population has been less studied.
OBJECTIVE: To establish and examine the validity of Chinese PRS, and explore its racial heterogeneity.
DESIGN: We constructed a PRS using both discovery (N = 2012) and independent validation samples (N = 1008) from the Chinese population. The associations of the PRS with age at onset of LOAD and with cerebrospinal fluid (CSF) biomarkers were assessed. We also replicated the PRS in an independent replication cohort with CSF data and constructed an alternative PRS using European weights.
SETTING: Multi-center genetics study.
PARTICIPANTS: A total of 3020 subjects were included in the study.
MEASUREMENTS: The PRS was calculated using genome-wide association study data, and its performance was evaluated alone (PRSnoAPOE) and with other predictors (full model: LOAD ~ PRSnoAPOE + APOE + sex + age) by measuring the area under the receiver operating characteristic curve (AUC).
RESULTS: PRS of the full model achieved the highest AUC of 84.0% (95% CI = 81.4-86.5) with pT< 0.5, compared with the model containing APOE alone (61.0%). The AUC of PRS with pT< 5e-8 was 77.8% in the PRSnoAPOE model, 81.5% in the full model, and only ranged from 67.5% to 75.1% in the PRS with the European weights model. A higher PRS was significantly associated with an earlier age at onset (P <0.001). The PRS also performed well in the replication cohort of the full model (AUC=83.1%, 95% CI = 74.3-92.0). The CSF biomarkers of Aβ42 and the ratio of Aβ42/Aβ40 were significantly inversely associated with the PRS, while p-Tau181 showed a positive association.
CONCLUSIONS: These findings suggest that the PRS reveals genetic heterogeneity of AD and that higher prediction accuracy can be achieved when the base dataset and the validation dataset come from the same ethnicity. The effective PRS model has the clinical potential to predict individuals at risk of developing LOAD at a given age and with abnormal levels of CSF biomarkers in the Chinese population.

Key words: Genetic, polygenic risk score, late-onset Alzheimer’s disease, prediction, biomarker.


 

Introduction

Alzheimer’s disease (AD) and other forms of dementia are ranked as the seventh leading cause of death globally (1). In China, approximately 57 million people live with dementia, and AD accounts for approximately 65% of the cases (2). Clinically, late-onset AD (LOAD) accounts for the vast majority (90%–95%) of AD cases (3). LOAD has a high genetic component, primarily attributed to multiple low penetrance genetic variants (4), with an estimated heritability of 56%–79% (5). Genome-wide association studies (GWAS) have identified over 50 risk loci of genome-wide significance associated with AD (6-9), providing insights into its pathogenesis and aiding in early prediction.
Polygenic risk scores (PRS) can be generated using the risk alleles and effect sizes obtained from GWAS data (10). Since it aggregates the effects of numerous genetic variants associated with a condition across the human genome, PRS can be utilized to predict the risk of certain polygenic diseases, such as LOAD. PRS has been successfully used in predicting breast cancer (11), type 2 diabetes mellitus (11), and psychiatric diseases (10). In AD, PRS has shown associations with the age at onset of LOAD (12) and cerebrospinal fluid biomarkers of AD (13). PRS has also been employed to differentiate AD patients from cognitively normal individuals, predict cognitive decline in at-risk individuals and forecast progression from mild cognitive impairment to AD (14-16).
However, the current PRS models cannot be universally applied due to their predominantly European population-based studies (17). Presently, a Chinese-specific PRS model derived solely from Chinese GWAS data is lacking, as such studies on AD are scarce. Consequently, European GWAS summary data are often used to develop PRS models for the Chinese population (13, 18, 19), potentially yielding erroneous results. It is important to note that there is a genetic heterogeneity of AD between the Chinese and European populations, which may affect the predictive value of PRS models (20). While genes such as TOMM40 (21), TREM2 (22) and PSENs/APP (23) are shared susceptibility genes, different polymorphisms exist, and certain rare genetic variants in genes such as KCNJ15 and GCH1 have only been identified in Chinese individuals (24). Even the well-defined genetic risk factor for AD, apolipoprotein E4 (APOE4), is less prevalent in Chinese than Europeans (25). Therefore, developing a Chinese-specific PRS model is essential.
In this study, we aimed to construct a PRS model using single nucleotide polymorphisms (SNPs) and their effect sizes derived from a subset of the previous Chinese GWAS dataset (N = 2012). We assessed the model’s predictive accuracy and examined the influence of APOE, age, and sex on its effects. Additionally, we replicated the PRS in an independent cohort with cerebrospinal fluid data and constructed a model using European weights to detect genetic heterogeneity. The original model is expected to promote the clinical application of PRS in the Chinese population.

 

Materials and Methods

Study populations

The individuals used in the training and validation stages were independent of each other and were recruited from the outpatient memory clinic of the Department of Neurology at Xuanwu Hospital, Capital Medical University, Beijing, China, and ten other participating hospitals across China from 2013 to 2018. These individuals were partly derived from a previous Chinese case-control GWAS study (25).
AD diagnoses were determined according to the recommendations of the National Institute on Aging-Alzheimer’s Association workgroup (26) and the National Institute of Neurological and Communicative Disorders and Stroke-Alzheimer’s Disease and Related Disorders Association criteria (27). The study included individuals diagnosed with AD, with an age at onset of ≥ 60 years and no family history of dementia. Controls were recruited from the aforementioned hospitals and were ≥ 60 years of age, without a family history of dementia, cognitively normal (without subjective memory complaints, Mini-Mental State Examination score of 26–30, and clinical dementia rating scale score of 0), and free from any evidence of diseases that could affect cognition. The study was approved by the Ethical Committees of Xuanwu Hospital, Capital Medical University, and written informed consent was obtained from all participants.

Training and validation samples

The training samples (N=2012) consisted of 1008 patients with AD and 1004 cognitively normal individuals from Xuanwu Hospital. The validation samples (N=1006) were recruited from other participating hospitals and included 505 patients with AD and 501 cognitively normal individuals. For the PRS calculation in this study, summary statistics from the training samples (N =2012) were used to generate genetic scores for the validation samples as the weighted sum of the risk alleles (Figure S1).

Quality control

Genome-wide genotyping of training and validation samples was performed using Illumina HumanOmniZhongHua-8 Bead Chips (Illumina, San Diego, CA, USA), which originally contained 894,517 SNPs. Genotype data underwent standard quality control measures using PLINK (version 1.9) (28). SNPs were excluded if their minor allele frequency was ≤1%, if their Hardy-Weinberg equilibrium p-value was <1e-4, or if >2% of genotype data was missing. Non-autosomal variants were excluded from statistical analysis. In total, 759,596 SNPs in the training samples and 758,508 SNPs in the validation samples passed quality control. Of these, 756,180 SNPs were present in both datasets, and 755,537 SNPs remained after removing the SNPs in the APOE region (chromosome 19: 44.4-46.5 Mb). Detailed information on genotyping and quality control processes can be found in the Supplementary Materials.
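The filters described above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the PLINK commands used in the study; the dictionary keys (`chrom`, `pos_mb`, `maf`, `hwe_p`, `missing`) and the function names are hypothetical.

```python
# Illustrative per-SNP quality control using the thresholds stated in the text.
# Field names and helpers are assumptions made for this sketch.
AUTOSOMES = {str(c) for c in range(1, 23)}


def passes_qc(snp, maf_min=0.01, hwe_min=1e-4, miss_max=0.02):
    """Return True if the SNP survives all four filters."""
    if snp["chrom"] not in AUTOSOMES:   # non-autosomal variants are dropped
        return False
    if snp["maf"] <= maf_min:           # minor allele frequency <= 1%
        return False
    if snp["hwe_p"] < hwe_min:          # Hardy-Weinberg p-value < 1e-4
        return False
    if snp["missing"] > miss_max:       # > 2% missing genotype calls
        return False
    return True


def in_apoe_region(snp, start_mb=44.4, end_mb=46.5):
    """APOE region as defined in the text: chromosome 19, 44.4-46.5 Mb."""
    return snp["chrom"] == "19" and start_mb <= snp["pos_mb"] <= end_mb


def filter_snps(snps, drop_apoe=False):
    """Apply QC and, optionally, remove the APOE region (as for PRSnoAPOE)."""
    kept = [s for s in snps if passes_qc(s)]
    if drop_apoe:
        kept = [s for s in kept if not in_apoe_region(s)]
    return kept
```

Calling `filter_snps(snps, drop_apoe=True)` corresponds to the SNP set used for PRSnoAPOE.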

Primary PRS calculation

The clumping and thresholding (C+T) approach, commonly used for PRS calculation, was employed, where markers most strongly associated with the disease were retained (29). PRSs were generated using the PLINK genetic data analysis toolset for p-value thresholds (pTs) of 5e-8, 1e-5, 1e-3, 0.1, and 0.5 on linkage disequilibrium clumped SNPs. Under the most stringent setting, SNPs with r2 > 0.001 relative to a more strongly associated SNP within a 1000-kb window were excluded, so that only the SNP with the smallest p-value in each clump was retained. Additional PRSs were computed with r2 thresholds of 0.01 and 0.1 for all p-value thresholds. The same set of SNPs was used for both the training and validation samples. PRSnoAPOE was calculated by excluding SNPs in the APOE region (chromosome 19: 44.4-46.5 Mb) due to the high linkage disequilibrium in this region. The PRS was derived using the following formula:
PRS = β1x1 + β2x2 + … + βkxk ,

where βi is the per-allele log OR and xi is the allele dosage for the i-th of k independent SNPs. The PRS was inter-cohort standardized to have a mean of 0 and a standard deviation (SD) of 1. After calculating the PRS for each participant, we performed a logistic regression analysis adjusting for age, sex, APOE ε4 carrier status, and five principal components (PCs) to detect the association between PRS and AD.
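As a minimal sketch of the clumping and thresholding procedure and the weighted sum described above, the following Python code uses a simple greedy clumping rule. It is not PLINK's implementation, and the inputs (a dictionary of p-values, a dictionary of pairwise r2 values for SNP pairs inside the clumping window, and per-SNP weights) are assumptions made for illustration. The standardization uses the population SD, which is also an assumption.

```python
import statistics


def clump_and_threshold(p_values, r2, p_threshold, r2_threshold=0.001):
    """Greedy C+T sketch.

    p_values: dict snp_id -> association p-value
    r2: dict frozenset({snp_a, snp_b}) -> r2 for pairs inside the window;
        pairs missing from the dict are treated as r2 = 0 (assumption).
    Returns the retained index SNPs, most significant first.
    """
    candidates = sorted((p, s) for s, p in p_values.items() if p <= p_threshold)
    kept = []
    for _, snp in candidates:
        # Keep the SNP only if it is not in LD with any more significant kept SNP.
        if all(r2.get(frozenset((snp, k)), 0.0) <= r2_threshold for k in kept):
            kept.append(snp)
    return kept


def polygenic_score(dosages, weights):
    """PRS = sum over SNPs of beta_i * x_i (beta_i: log OR, x_i: allele dosage)."""
    return sum(beta * dosages[snp] for snp, beta in weights.items())


def standardize(scores):
    """Rescale scores to mean 0 and SD 1 (population SD, an assumption)."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mean) / sd for s in scores]
```

For example, with two SNPs in strong LD and a third independent SNP, only the more significant SNP of the LD pair and the independent SNP are retained at pT = 0.5.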

Evaluation of PRS performance

The area under the curve (AUC) of the receiver operating characteristic curve (ROC) was estimated to evaluate the PRS’s accuracy in the validation dataset and to evaluate the discriminatory power of a predictive model, with values ranging from 50% to 100%, where 50% is random classification, and 100% is perfect classification. The PRSs were also fitted as a continuous variable in logistic regression models to assess the association between the PRS and LOAD.
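The AUC has a simple probabilistic reading: it is the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that definition; the function name and the 0/1 label coding are assumptions for illustration, and the study itself used the pROC package in R.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    scores: predicted risk or PRS per individual
    labels: 1 for LOAD case, 0 for control (coding assumed for this sketch)
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0      # case ranked above control
            elif c == k:
                wins += 0.5      # ties count as half
    return wins / (len(cases) * len(controls))
```

A value of 0.5 corresponds to random classification and 1.0 to perfect separation, matching the 50% to 100% range given in the text.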
We tested four main statistical models:
Model: LOAD ~ APOE
Model A: LOAD ~ PRSnoAPOE
Model B: LOAD ~ PRSnoAPOE + APOE
Model C: LOAD ~ PRSnoAPOE + APOE + sex + age
Here, we used the number of APOE ε4 alleles as an independent covariate when modeling the LOAD prediction performance. The APOE ε4 count was coded as two (ε4/ε4), one (ε2/ε4 and ε3/ε4), or zero (ε3/ε3, ε2/ε3, and ε2/ε2). The APOE genotypes, derived from rs7412 and rs429358, were determined by Sanger sequencing for samples in both stages of the study.
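The APOE coding above amounts to counting ε4 alleles in the genotype. A small illustrative helper, assuming genotypes are supplied as strings such as "ε3/ε4" or "e3/e4" (the input format and function names are hypothetical), could look like this:

```python
VALID_ALLELES = {"e2", "e3", "e4"}


def apoe_e4_count(genotype):
    """Number of APOE ε4 alleles (0, 1, or 2) in a genotype such as 'ε3/ε4'."""
    alleles = genotype.lower().replace("ε", "e").split("/")
    if len(alleles) != 2 or any(a not in VALID_ALLELES for a in alleles):
        raise ValueError(f"unrecognized APOE genotype: {genotype!r}")
    return alleles.count("e4")


def is_e4_carrier(genotype):
    """Carrier status as used in the stratified analyses (at least one ε4)."""
    return apoe_e4_count(genotype) > 0
```

The count is used as the APOE covariate in the prediction models, while the carrier indicator corresponds to the carrier vs non-carrier strata.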

PRS-CSx calculation method

Furthermore, we used the PRS-CSx method, which was developed to improve cross-population polygenic prediction by integrating GWAS summary statistics from multiple populations, to re-calculate PRSs in our validation dataset (30). The East Asian linkage disequilibrium reference panel based on 1000 Genomes (1k) data was used. The summary statistics and sample sizes were the same as those used in the main PRS calculation above: the summary statistics were taken from the training samples (N = 2012), and the validation sample comprised 1006 individuals. The phi parameter was set to 1e-2. These scores were also standardized after calculation.

Association between PRS and age at onset

Moreover, participants were divided into tertiles based on the ordered distribution of the PRS with pT ≤ 5e-8 for Model A, creating three groups (high-PRS, intermediate-PRS, and low-PRS), each containing a third of the population. We assessed the associations of the PRS with age at onset (AAO) and the cumulative incidence rate of LOAD using the log-rank test.
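The tertile grouping can be illustrated with a short rank-based sketch. The function name and group labels are chosen for illustration, and ties are broken by input order, which is an assumption rather than a detail reported in the study.

```python
def assign_tertiles(scores):
    """Label each score 'low', 'intermediate', or 'high' by rank-based tertile."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # indices, lowest score first
    labels = [None] * n
    for rank, i in enumerate(order):
        if rank * 3 < n:            # bottom third of the ranked scores
            labels[i] = "low"
        elif rank * 3 < 2 * n:      # middle third
            labels[i] = "intermediate"
        else:                       # top third
            labels[i] = "high"
    return labels
```

The resulting labels define the three groups whose age-at-onset curves are then compared, for example with a log-rank test.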

Replication of PRS in one independent CSF cohort

We constructed the PRS using an in-house script, prioritizing the SNP list with pT ≤ 5e-8, and tested its association with LOAD in an independent Chinese cohort (N = 80). This small cohort with cerebrospinal fluid (CSF) data was obtained from the China Cognition and Aging study. The dataset comprised 40 controls (50.0%) and 52 women (65.0%), with mean (±SD) ages of 71.43±6.98 and 70.98±7.18 for patients and controls, respectively.
The CSF biomarker data, including Aβ42, Aβ40, t-Tau, and p-Tau181, were determined using the MILLIPLEX® MAP Human Amyloid Beta and Tau Magnetic Bead Panel-Multiplex Assay (Merck Millipore, Darmstadt, Germany). All measurements were performed by an experienced laboratory technician who was blinded to the clinical information, using a single batch of reagents in a single analysis round. Spearman correlation analyses were conducted to assess the relationships between the PRS and CSF biomarkers. Additionally, linear regression models relating the PRS to each CSF biomarker were fitted, with age, sex, and APOE genotype as covariates (Model: PRS ~ CSF biomarker + age + sex + APOE, fitted separately for Aβ42, Aβ42/Aβ40, t-Tau, and p-Tau181). Corrections for multiple comparisons were performed using the false discovery rate method.
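The text does not specify which false discovery rate procedure was applied. Assuming the widely used Benjamini-Hochberg step-up procedure, the adjusted p-values for a set of biomarker tests could be computed as in the following sketch (the function name is hypothetical):

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: the 'false discovery rate method' is taken to be
    Benjamini-Hochberg; the study does not name the procedure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p-values
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values non-decreasing in the order of the raw p-values.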

PRS performance in European populations

To determine racial heterogeneity in the genetic risk contributors between the Chinese and European populations, we calculated alternative PRSs for the Chinese validation samples using European weights. The largest late-onset AD GWAS dataset (86,531 cases and 676,386 controls) from a large-scale meta-analysis excluding 23andMe (referred to as the PGCALZ dataset, available at https://ctg.cncr.nl/software/summary_statistics) (8) was used as the training dataset to test the Chinese validation data. The PRS with European weights model was constructed using the same variants as the Chinese PRS with pT of 5e-8. Variants not found in the PGCALZ dataset were replaced with the available SNP in highest linkage disequilibrium (LD) with them.
We also built another PRS-PGCALZ model to explore the racial heterogeneity using multiple p-value thresholds (5e-8, 1e-5, 1e-3, 0.1, and 0.5) and linkage disequilibrium clumping (r2 thresholds of 0.001, 0.01, and 0.1) in the PGCALZ summary dataset. We tested these PRSs in the Chinese population.

Statistical analysis

We performed chi-square tests and t-tests to determine differences in population demographics. The association of the PRS with AD risk was tested using logistic regression, with APOE, age, and sex as covariates. Additional stratified analyses were conducted to evaluate PRS performance (for all pT models of 5e-8, 1e-5, 1e-3, 0.1, and 0.5) in different cohort strata: (1) age (< 72 vs ≥ 72); (2) APOE ε4 status (carrier [ε4/ε4, ε3/ε4, and ε2/ε4] vs non-carrier [ε3/ε3, ε2/ε3, and ε2/ε2]); and (3) sex (men vs women). The AUCs obtained from the models were compared using the roc.test function of the pROC package in R (version 4.1.0), employing the bootstrap method. Quality control, PRS calculations, and European-based PRS calculations were conducted with PLINK (version 1.9). PRS-CSx was run in Python (version 3.10). All other statistical analyses (linear regressions, log-rank test, correlations, etc.) were performed using R software (version 4.1.0). We used two-sided p ≤ 0.05 for significance.

 

Results

Demographics

We found no significant differences in sex and APOE distribution between the training and validation datasets (p = 0.251 and p = 0.064, respectively) (see Supplementary materials: Table S1). In contrast, there was a significant age difference between the two datasets (mean age: 72.45±6.76 and 71.55±6.40, respectively; p < 0.001).

PRS performance

The results for different r2 parameters were very similar, with r2 < 0.001 being more stringent. Furthermore, except for pT<5e-8, all the other p-value thresholds showed the best performance for r2 < 0.001 (Table S2, S3). Therefore, we used the scenario with r2 < 0.001 for further analysis.
For Model A, the AUC was 77.9% for pT ≤ 5e-8, 79.0% for pT ≤ 1e-5, 79.2% for pT ≤ 1e-3, 79.4% for pT ≤ 0.1, and 79.3% for pT ≤ 0.5. For Model B, the prediction accuracy increased from an AUC of 80.0% for pT ≤ 5e-8 to 82.6% for pT ≤ 0.1. For Model C, the prediction accuracy increased from an AUC of 81.5% for pT ≤ 5e-8 to 84.0% for pT ≤ 0.5 (Table S4). After adjusting for principal components, the AUCs of the three models at each pT were slightly higher than the unadjusted AUCs (Figure 1, Table S5). The PRS with pT < 0.5 showed the best performance for Models A, B, and C after adjusting for PCs (Table S5). The PRSnoAPOE model performed better than the model with APOE alone (AUC = 61.0%) (Table S9).
The results of the PRS-CSx method are shown in Table S6. Except for pT<5e-8, the AUCs for the other pTs were similar to those obtained by the C+T method.

Figure 1. Receiver operating characteristic (ROC) curve analyses for distinguishing the LOAD group from the cognitively healthy control group for Models A (A), B (B), and C (C) after adjusting for principal components

Note: LOAD, late-onset Alzheimer’s disease; PRS, polygenic risk score; pT, p-value threshold; AUC, area under the receiver operating curve; CI, confidence interval.

 

Association between PRS and LOAD

Logistic regression analyses revealed a significant positive association (OR > 1) between the PRS and the risk of LOAD in the pT ≤ 5e-8, 1e-5, 1e-3, 0.1, and 0.5 models. Specifically, in Model A, the ORs (95% CI) were 3.15 (2.67-3.71), 3.66 (3.02-4.21), 3.60 (3.06-4.24), 3.63 (3.09-4.27), and 3.63 (3.09-4.27) for the pT ≤ 5e-8, 1e-5, 1e-3, 0.1, and 0.5 models, respectively. In Model B, which additionally included APOE, the ORs (95% CI) were 3.17 (2.68-3.76), 3.60 (3.03-4.26), 3.63 (3.07-4.29), 3.67 (3.10-4.34), and 3.67 (3.10-4.34), respectively. The full model, which included sex, age, and APOE, yielded ORs (95% CI) of 3.08 (2.59-3.66), 3.50 (2.94-4.16), 3.55 (3.00-4.22), 3.59 (3.02-4.26), and 3.59 (3.03-4.27), respectively. Details are shown in Table S7.

Predictive ability of PRS models for the incidence rate of LOAD and AAO

The predictive ability of the PRS models for the incidence rate of LOAD was assessed for PRSs with pT ≤ 5e-8, 1e-5, 1e-3, 0.1, and 0.5 from Model A (Figure 2, Figure S2). Participants were classified into three risk groups (high-PRS, intermediate-PRS, and low-PRS) based on the PRS tertiles. The log-rank test revealed a significant association between higher PRS and earlier AAO for all pT models (Table S8). For the pT ≤ 5e-8 model, the adjusted p-values were < 0.001 for all pairwise comparisons. For the other pT models (1e-5, 1e-3, 0.1, and 0.5), the comparisons were significant for high-PRS vs low-PRS and high-PRS vs intermediate-PRS (adjusted p < 0.001) but not for intermediate-PRS vs low-PRS (adjusted p > 0.05). For all pT models, the expected age at which 50% of individuals in the high-PRS group had developed LOAD was approximately 72 years, which was earlier than in the low-PRS group (approximately 78-80 years). Additionally, the cumulative incidence rates in the high-PRS group were higher than those in the low-PRS group. For example, at 72 years of age, the proportion of individuals with LOAD was higher in the high-PRS group than in the low-PRS group (50% vs 23-25%) (Figure 2, Figure S2).

Figure 2. Cumulative incidence rates of LOAD in three genetic risk groups for pT<5e-8 model (A) and pT<0.5 model (B). Participants were partitioned into tertiles (low vs intermediate vs high PRS), with PRS cut-offs at 33.33% and 66.67%

Note: LOAD, late-onset Alzheimer’s disease; PRS, polygenic risk score; AAO, age at onset.

 

Stratified analyses

Stratified analyses by age were conducted in Model A for all pT models, comparing participants aged ≥ 72 years (N = 484) with those aged < 72 years (N = 522). The PRS with pT ≤ 5e-8 performed significantly better (bootstrap p = 0.017) in the older age group (AUC = 81.6%, 95% CI = 77.6–85.6) than in the younger age group (AUC = 74.1%, 95% CI = 69.1–79.0) (Table S9, S10). However, this trend did not reach statistical significance in the other pT models (1e-5, 1e-3, 0.1, and 0.5). In the sex strata, there were no statistically significant differences in PRS performance between females and males (all bootstrap p > 0.05). Similarly, no significant differences were observed between the APOE strata for any pT model (all bootstrap p > 0.05).

PRS replication

The numbers of clumped and total SNPs differed markedly between pT < 0.5 and pT < 5e-8. Specifically, in the r2 < 0.001 model, there were 6889 clumped and 739272 total SNPs for pT < 0.5, compared with 20 and 45993, respectively, for pT < 5e-8 (Table S11). In addition, the AUC for pT < 5e-8 was only slightly lower than that for pT < 0.5 (Figure 1). Considering practicality and cost-effectiveness, we therefore genotyped the 20 prioritized SNPs and used them to calculate the PRS in the replication cohort with cerebrospinal fluid data (Table S12). The PRS also performed well in this cohort (Model A: AUC = 77.1%, 95% CI = 66.8-87.4). Models B and C achieved AUC values of 81.8% (95% CI = 72.5-91.0) and 83.1% (95% CI = 74.3-92.0), respectively (Figure 3A). Figure 3B shows PRSs in LOAD cases vs. controls.

Figure 3. Results of PRS in the replication cohort. (A) Receiver operating characteristic (ROC) curve analyses for different models in the replication dataset. (B) Box plot for PRS in the replication cohort

Note: PRS, polygenic risk score; AUC, area under the receiver operating curve; CI, confidence interval.

 

Correlations between PRS and AD biomarkers in CSF

We analyzed the correlations between the PRS and the CSF levels of Aβ42, Aβ42/Aβ40, t-Tau, and p-Tau181 in the replication cohort, using the PRS with pT < 5e-8 from Model A. The PRS was inversely associated with the CSF level of Aβ42 (Spearman ρ = -0.29, adjusted p = 0.016) and the ratio of Aβ42/Aβ40 (Spearman ρ = -0.38, adjusted p < 0.001) (Figure 4). In contrast, the PRS was positively associated with p-Tau181 (Spearman ρ = 0.27, adjusted p = 0.023), whereas the positive association with t-Tau was not significant (Spearman ρ = 0.14, adjusted p = 0.204) (Figure 4). After adjusting for age, sex, and APOE genotype using linear regression, only the association between the PRS and Aβ42/Aβ40 remained significant (b = -0.33, adjusted p = 0.016). The associations between the PRS and Aβ42, t-Tau, and p-Tau181 were not significant after adjustment (Aβ42: b = -0.24, adjusted p = 0.082; t-Tau: b = 0.17, adjusted p = 0.176; p-Tau181: b = 0.22, adjusted p = 0.087) (Table S13).

Figure 4. Correlations between PRSs and CSF biomarkers. A-D Scatterplots of PRS with (A) Aβ42, (B) Aβ42/Aβ40, (C) t-Tau, and (D) p-Tau181

Spearman correlation coefficients (ρ) were used to assess the correlations; PRS, polygenic risk score; CSF, cerebrospinal fluid.

 

PRS performance using the European weights

We constructed a PRS model using the European ancestry PGCALZ summary statistics as a training dataset. The European weights model included the same variants as the Chinese PRS with a pT of 5e-8. Compared to the Chinese PRS, the PRS with the European weights model reached a lower prediction accuracy. The AUCs for Models A, B, and C were 67.5% (95% CI = 64.2–70.8), 72.0% (95% CI = 68.9–75.1), and 75.1% (95% CI = 72.1–78.1), respectively. The corresponding ORs (95% CI) were 1.97 (1.71-2.27), 1.97 (1.70-2.28), and 1.94 (1.67-2.45), respectively (Figure 5). Results for the PRSPGCALZ models built from re-clumped PGCALZ summary statistics with different r2 values (0.001, 0.01, and 0.1) and all pTs (5e-8, 1e-5, 1e-3, 0.1, and 0.5) are shown in Table S14. The best results were obtained with r2 < 0.1, and the corresponding results for Models A, B, and C for all pTs are listed in Table S15. The AUCs for these models were relatively low.

Figure 5. The association between PRSs and LOAD for models A, B, and C with pT<5e-8 in Chinese (PRS_China) and European (PRS_PGCALZ) populations

PRS, polygenic risk score; LOAD, late-onset Alzheimer’s disease; OR, odds ratio; CI, confidence interval.

 

Discussion

To the best of our knowledge, this is the first PRS for LOAD derived directly from GWAS results based on Chinese samples, ensuring that the training and validation datasets share the same genetic ancestral background. When applied to differentiate patients with LOAD from controls, our full model, comprising PRSnoAPOE, APOE, sex, and age, achieved a higher AUC compared to the model that only included APOE. Our study findings demonstrate that PRS can effectively identify LOAD patients, potentially facilitating population-based screening and forecasting the genetic risk of LOAD in the Chinese population.
Our prediction models include LOAD ~ APOE, LOAD ~ PRSnoAPOE, LOAD ~ PRSnoAPOE + APOE, and LOAD ~ PRSnoAPOE + APOE + sex + age. The AUC was 61.0% for APOE alone, 82.6% for PRSnoAPOE + APOE, and 84.0% for PRSnoAPOE + APOE + age + sex. These models showed superior performance in identifying patients with high genetic risk for LOAD compared to previously available European GWAS-based Chinese PRS models (AUC = 58%–71%) (13, 18, 19, 31, 32). This improvement can be explained by the use of samples with the same Chinese ancestral background in both the training and validation stages of our study. In contrast, other studies used genetic summary statistics primarily based on Europeans. Considering the existence of differences in SNP allele frequencies, linkage disequilibrium patterns, and genetic architectures among diverse populations (33), and the enrollment of only individuals with LOAD rather than early-onset AD, our PRS model may better reflect the genetic characteristics of LOAD in the Chinese population and thus possess a higher capacity to predict individuals at risk for this condition. To further assess the racial heterogeneity of our PRS model, we used European ancestry PGCALZ summary statistics as the base summary dataset to establish a PRS. However, in our validation cohort, the resulting PRS achieved an AUC of only 67.5%, which is considerably lower than that of our PRSnoAPOE model (AUC of 77.8%). Moreover, the best predictive accuracy of our PRS models in this study is similar to that observed in other European studies with cohorts sharing the same ancestral background (AUC = 74.0%–84.0%) (34-36). Furthermore, an interesting methodological observation is that we achieved the highest prediction accuracy with a clumping parameter of r2 < 0.001 for the Chinese PRS, whereas r2 < 0.1 performed best for the PRSPGCALZ models, and most European studies have used clumping parameters ranging from 0.01 to 0.2 (29, 34, 36).
This discrepancy may be partially explained by different genetic architectures, especially LD patterns, between the Chinese and European populations. Although one study has reported transferability of AD PRS between populations (37), most studies have confirmed its ethnic heterogeneity (35, 38). There have been comparable investigations in other diseases, such as Parkinson’s disease and type 2 diabetes, which back up population-specific PRS (39, 40). These findings indicate the need for cautious generalizations when establishing PRS and emphasize the importance of utilizing training and validation data from the same ancestral background.
Interestingly, our study revealed that the predictive accuracy of APOE alone (AUC = 61%) was even lower than that of PRSnoAPOE with pT < 5e-8, which is inconsistent with the results in the European population. Typically, in European populations, APOE alone (AUC = 70%) exhibits superior predictive accuracy compared to PRS without APOE (55.7% ≤ AUC ≤ 61.3% with different pTs) (29). Moreover, recent research on adult populations of European descent concluded that APOE ε4 status was a better predictor of AD than PRS (41). These observations indicate that the predictive value of the APOE genotype for LOAD varies across different ethnic groups and is less robust in Chinese populations than in Europeans. Similar results were demonstrated in a study conducted on Caribbean Hispanics (35), where the AUCs for APOE genotype alone and PRS alone were 55% and 62.2%, respectively. Additionally, it is important to note another characteristic of APOE in the Chinese population regarding prediction. Empirical data have revealed that the impact of APOE on dementia onset decreases with age (42), particularly after the age of 70 years, whereas the influence of other risk loci is more pronounced (43). Our results were in agreement with these conclusions. The age-stratified analysis in our study demonstrated that the PRS exhibited significantly better performance in older individuals (≥ 72 years, AUC = 81.6%) than in younger individuals (< 72 years, AUC = 74.1%). These results provide further support for our hypothesis that genetic loci beyond APOE play a more significant role than APOE itself in the pathogenesis of LOAD in the Chinese population. This also highlights the importance of considering the relationship between age and APOE when predicting LOAD.
Additionally, we explored the association between PRS and AAO of LOAD using the log-rank test. Consistent with previous findings (13, 44-46), we found that individuals with a high genetic risk were more prone to develop LOAD. Furthermore, the onset of the disease occurred at an earlier age than in individuals with a low genetic risk. These findings imply that the polygenic profile plays a role in influencing the incidence risk of LOAD and the AAO. The impact of genetic risk on the biomarkers of LOAD can offer valuable understanding of the pathogenesis of the disease. In the replication cohort, we further found that the CSF biomarkers of Aβ42 and the ratio of Aβ42/Aβ40 were significantly inversely linked to the PRS, whilst p-Tau181 demonstrated a positive correlation. This suggests that the genetic profile influences the pathogenesis of LOAD. As the pathological changes of AD commence 15–20 years prior to clinical presentation (47) and clinical trials of disease-modifying therapy in the preclinical stage are promising (48), a polygenic model seems to have value in identifying individuals with abnormal levels of CSF biomarkers. From a clinical perspective, despite not yet being suitable for clinical use, our highly accurate PRS model has the ability to forecast individuals who are vulnerable to developing LOAD at a given age and to detect those with abnormal levels of CSF biomarkers. This provides potential opportunities for early diagnosis and treatment for potential LOAD patients.

Study Limitations

This study had several limitations. First, both the training and validation datasets in our study had relatively small sample sizes. This may lead to false positive findings and undermine the utility of the PRS. Increasing the training sample size would allow more loci with small individual effects to reach the genome-wide significance threshold, thus improving the PRS in our study. Second, while the high predictive accuracy of the PRS models has been validated in a cross-sectional cohort, further studies are needed to substantiate the clinical usefulness of our PRS model, especially using larger Chinese or even Asian longitudinal cohorts. PRS pathway analyses should also be conducted in future research. Third, our results may not provide strong evidence regarding the generalizability of PRS across different populations. We replaced variants not found in the European dataset with the SNPs in highest LD with them, which may have introduced SNPs irrelevant to AD into our analyses. Fourth, despite reports of PRS heterogeneity, it cannot be ruled out that our findings merely reflect optimization of the predictive capacity in a particular sample. Finally, the PRS itself has limitations regarding its utility: SNPs in linkage disequilibrium are removed before analysis, resulting in a potential loss of information, and interactions between SNPs are not considered.


Conclusions

In conclusion, the LOAD-PRS model developed in this study accurately identifies individuals with genetic risk profiles in the Chinese population. It also shows significant correlations with LOAD risk, age at onset, and CSF biomarkers. However, genetic risk profiles can vary among populations, which limits the generalization of a PRS from one population to another. Therefore, the application of ethnic-specific PRS models should be emphasized. Furthermore, the disease liability of APOE is smaller in the Chinese population than in Europeans, and it is the genetic loci beyond APOE that contribute significantly to the risk profile. This Chinese PRS model has the clinical potential to identify individuals at risk of developing LOAD at a given age and those with abnormal levels of CSF biomarkers. Its implementation could enhance the prediction and identification of individuals at high risk of developing LOAD, providing an opportunity for early prevention in clinical practice.


Acknowledgments: We thank all individuals in this study and all neurologists at relevant academic centers for their help in the recruitment of the individuals. We would like to thank Editage (www.editage.cn) for English language editing.
Funding: This study was supported by the Key Project of the National Natural Science Foundation of China (U20A20354); Beijing Brain Initiative from Beijing Municipal Science & Technology Commission (Z201100005520017); STI2030-Major Projects (No.2021ZD0201802); the National Key Scientific Instrument and Equipment Development Project (31627803); the Key Project of the National Natural Science Foundation of China (81530036).

Ethical standards: The study was approved by the Ethical Committees of Xuanwu Hospital, Capital Medical University.

Conflicts of Interest: On behalf of all authors, the corresponding author states that there is no conflict of interest.


SUPPLEMENTARY MATERIAL


References

1. World Health Organization. Global status report on the public health response to dementia. Geneva: World Health Organization; 2021.
2. Jia L, et al., Prevalence, risk factors, and management of dementia and mild cognitive impairment in adults aged 60 years or older in China: a cross-sectional study. Lancet Public Health, 2020. 5(12): p. e661-e71.
3. Harman D. Alzheimer’s disease pathogenesis: role of aging. Ann N Y Acad Sci, 2006. 1067: p. 454-60.
4. Naj AC, Schellenberg GD, Alzheimer’s Disease Genetics C. Genomic variants, genes, and pathways of Alzheimer’s disease: An overview. Am J Med Genet B Neuropsychiatr Genet, 2017. 174(1): p. 5-26.
5. Gatz M, et al., Role of genes and environments for explaining Alzheimer disease. Arch Gen Psychiatry, 2006. 63(2): p. 168-74.
6. Sims R, Hill M, Williams J. The multiplex model of the genetics of Alzheimer’s disease. Nat Neurosci, 2020. 23(3): p. 311-22.
7. Kunkle BW, et al., Genetic meta-analysis of diagnosed Alzheimer’s disease identifies new risk loci and implicates Aβ, tau, immunity and lipid processing. Nat Genet, 2019. 51(3): p. 414-30.
8. Wightman DP, et al., A genome-wide association study with 1,126,563 individuals identifies new risk loci for Alzheimer’s disease. Nat Genet, 2021. 53(9): p. 1276-82.
9. Bellenguez C, et al., New insights into the genetic etiology of Alzheimer’s disease and related dementias. Nat Genet, 2022. 54(4): p. 412-36.
10. Purcell SM, et al., Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature, 2009. 460(7256): p. 748-52.
11. Mars N, et al., Polygenic and clinical risk scores and their impact on age at onset and prediction of cardiometabolic diseases and common cancers. Nat Med, 2020. 26(4): p. 549-57.
12. Tosto G, et al., Polygenic risk scores in familial Alzheimer disease. Neurology, 2017. 88(12): p. 1180-6.
13. Li WW, et al., Association of Polygenic Risk Score with Age at Onset and Cerebrospinal Fluid Biomarkers of Alzheimer’s Disease in a Chinese Cohort. Neurosci Bull, 2020. 36(7): p. 696-704.
14. Stocker H, et al., Prediction of clinical diagnosis of Alzheimer’s disease, vascular, mixed, and all-cause dementia by a polygenic risk score and APOE status in a community-based cohort prospectively followed over 17 years. Mol Psychiatry, 2021. 26(10): p. 5812-22.
15. Mormino EC, et al., Polygenic risk of Alzheimer disease is associated with early- and late-life processes. Neurology, 2016. 87(5): p. 481-8.
16. Liu H, Lutz M, Luo S, Alzheimer’s Disease Neuroimaging I. Association Between Polygenic Risk Score and the Progression from Mild Cognitive Impairment to Alzheimer’s Disease. J Alzheimers Dis, 2021. 84(3): p. 1323-35.
17. Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet, 2019. 51(4): p. 584-91.
18. Zhou X, et al., Genetic and polygenic risk score analysis for Alzheimer’s disease in the Chinese population. Alzheimers Dement (Amst), 2020. 12(1): p. e12074.
19. Xiao Q, et al., Risk prediction for sporadic Alzheimer’s disease using genetic risk score in the Han Chinese population. Oncotarget, 2015. 6(35): p. 36955-64.
20. Schultz LM, et al., Stability of polygenic scores across discovery genome-wide association studies. HGG Adv, 2022. 3(2): p. 100091.
21. Zhu Z, et al., TOMM40 and APOE variants synergistically increase the risk of Alzheimer’s disease in a Chinese population. Aging Clin Exp Res, 2021. 33(6): p. 1667-75.
22. Jiang T, et al., A rare coding variant in TREM2 increases risk for Alzheimer’s disease in Han Chinese. Neurobiol Aging, 2016. 42: p. 217 e1-3.
23. Jia L, et al., PSEN1, PSEN2, and APP mutations in 404 Chinese pedigrees with familial Alzheimer’s disease. Alzheimers Dement, 2020. 16(1): p. 178-91.
24. Zhou X, et al., Identification of genetic risk factors in the Chinese population implicates a role of immune system in Alzheimer’s disease pathogenesis. Proc Natl Acad Sci U S A, 2018. 115(8): p. 1697-706.
25. Jia L, et al., Prediction of Alzheimer’s disease using multi-variants from a Chinese genome-wide association study. Brain, 2021. 144(3): p. 924-37.
26. McKhann GM, et al., The diagnosis of dementia due to Alzheimer’s disease: Recommendations from the National Institute on Aging‐Alzheimer’s Association workgroups on diagnostic guidelines for Alzheimer’s disease. Alzheimers Dement, 2011. 7(3): p. 263-9.
27. McKhann G, Drachman D, Folstein M, Katzman R, Price D, Stadlan EM. Clinical diagnosis of Alzheimer’s disease: report of the NINCDS-ADRDA Work Group under the auspices of Department of Health and Human Services Task Force on Alzheimer’s Disease. Neurology, 1984. 34(7): p. 939-44.
28. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience, 2015. 4: p. 7.
29. Leonenko G, et al., Identifying individuals with high risk of Alzheimer’s disease using polygenic risk scores. Nat Commun, 2021. 12(1): p. 4506.
30. Ruan Y, et al., Improving polygenic prediction in ancestrally diverse populations. Nat Genet, 2022. 54(5): p. 573-80.
31. Li J, et al., Polygenic risk for Alzheimer’s disease influences precuneal volume in two independent general populations. Neurobiol Aging, 2018. 64: p. 116-22.
32. Jiao B, et al., Associations of risk genes with onset age and plasma biomarkers of Alzheimer’s disease: a large case-control study in mainland China. Neuropsychopharmacology, 2022. 47(5): p. 1121-7.
33. Martin AR, et al., Human Demographic History Impacts Genetic Risk Prediction across Diverse Populations. Am J Hum Genet, 2017. 100(4): p. 635-49.
34. Escott-Price V, et al., Common polygenic variation enhances risk prediction for Alzheimer’s disease. Brain, 2015. 138(Pt 12): p. 3673-84.
35. Sariya S, et al., Polygenic Risk Score for Alzheimer’s Disease in Caribbean Hispanics. Ann Neurol, 2021. 90(3): p. 366-76.
36. Leonenko G, et al., Genetic risk for Alzheimer disease is distinct from genetic risk for amyloid deposition. Ann Neurol, 2019. 86(3): p. 427-35.
37. Jung S-H, et al., Transferability of Alzheimer Disease Polygenic Risk Score Across Populations and Its Association With Alzheimer Disease-Related Phenotypes. JAMA Network Open, 2022. 5(12): p.
38. Baker E, Escott-Price V. Polygenic Risk Scores in Alzheimer’s Disease: Current Applications and Future Directions. Frontiers in Digital Health, 2020. 2: p.
39. Pan H, et al., Genome-wide association study using whole-genome sequencing identifies risk loci for Parkinson’s disease in Chinese population. npj Parkinson’s Disease, 2023. 9(1): p.
40. Michael L. Multhaup RK, Becca Krock, Nicholas Eriksson, Pierre Fontanillas, Stella Aslibekyan, Liana Del Gobbo, Janie F. Shelton, Ruth I. Tennen, Alisa Lehman, Nicholas A. Furlotte, and Bertram L. Koelsch. Estimating the likelihood of developing type 2 diabetes with polygenic models. The science behind 23andMe’s Type 2 Diabetes report, 2019. 23: p. 1-33.
41. Stocker H, Möllers T, Perna L, Brenner H. The genetic risk of Alzheimer’s disease beyond APOE ε4: systematic review of Alzheimer’s genetic risk scores. Transl Psychiatry, 2018. 8(1): p. 166.
42. Farrer LA. Effects of Age, Sex, and Ethnicity on the Association Between Apolipoprotein E Genotype and Alzheimer Disease. JAMA, 1997. 278(16): p.
43. Bellou E, et al., Age-dependent effect of APOE and polygenic component on Alzheimer’s disease. Neurobiol Aging, 2020. 93: p. 69-77.
44. Desikan RS, et al., Genetic assessment of age-associated Alzheimer disease risk: Development and validation of a polygenic hazard score. PLOS Medicine, 2017. 14(3): p.
45. Sleegers K, et al., A 22-single nucleotide polymorphism Alzheimer’s disease risk score correlates with family history, onset age, and cerebrospinal fluid Abeta42. Alzheimers Dement, 2015. 11(12): p. 1452-60.
46. Cruchaga C, et al., Polygenic risk score of sporadic late-onset Alzheimer’s disease reveals a shared architecture with the familial and early-onset forms. Alzheimers Dement, 2018. 14(2): p. 205-14.
47. Mormino EC, et al., Heterogeneity in Suspected Non-Alzheimer Disease Pathophysiology Among Clinically Normal Older Individuals. JAMA Neurol, 2016. 73(10): p. 1185-91.
48. Scheltens P, et al., Alzheimer’s disease. Lancet, 2016. 388(10043): p. 505-17.

© Serdi 2024