Machine Learning Algorithm Helps Identify Non-Diagnosed Prodromal Alzheimer’s Disease Patients in the General Population

Uspenskaya-Cadoz, O.; Alamuri, C.; Wang, L.; Yang, M.; Khinda, S.; Nigmatullina, Y.; Cao, T.; Kayal, N.; O’Keefe, M.; Rubel, C.

J Prev Alz Dis 2019;6(3):185-191

Background: Recruiting patients for clinical trials of potential therapies for Alzheimer’s disease (AD) remains a major challenge, with demand for trial participants at an all-time high. The AD treatment R&D pipeline includes around 112 agents, and in the United States alone, 150 clinical trials are seeking 70,000 participants. Most people with early cognitive impairment consult primary care providers, who may lack time, diagnostic skills, and awareness of local clinical trials. Machine learning and predictive analytics offer promise to boost enrollment by predicting which patients have prodromal AD and which will go on to develop AD.

Objectives: The authors set out to develop a machine learning predictive model that identifies prodromal AD patients in the general population, to aid early AD detection by primary care physicians and timely referral to expert sites for biomarker confirmation of the diagnosis and clinical trial enrollment.

Design: The authors used a classification machine learning algorithm to extract patterns within healthcare claims and prescription data from three years prior to AD diagnosis or AD drug initiation.

Setting: The study focused on subjects included in proprietary IQVIA US data assets (claims and prescription databases). Patient information was extracted from January 2010 to July 2018 for cohorts aged 50 to 85 years.

Participants: A total of 88,298,289 subjects aged 50 to 85 years were identified. For the positive cohort, 667,288 subjects were identified who had 24 months of medical history and at least one record of AD diagnosis or AD treatment. For the negative cohort, 3,670,254 patients were selected who had a similar length of medical history and were matched to positive cohort subjects based on the prevalence rate. The scoring cohort was selected based on the availability of 2 to 5 years of recent medical data and included 72,670,283 subjects aged 50 to 85 years.

Intervention (if any): None.
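
The Design step (mining claims and prescription records from the period before AD diagnosis or AD drug initiation) can be illustrated with a minimal Python sketch. It is not the authors' pipeline and rests on hypothetical assumptions: each patient's history is a list of `(patient_id, event_date, code_type, code)` tuples; the prediction date sits 3 years (approximated as 1,095 days) before the AD index date; and features come from a 24-month window ending at that date, a guess suggested by the positive cohort's history requirement. The names `build_features` and `to_binary_vector` and all codes are illustrative placeholders.

```python
from datetime import date, timedelta

# Assumption: the 3-year offset is approximated as 3 * 365 days (ignores leap days).
OFFSET = timedelta(days=3 * 365)
# Assumption: a 24-month look-back window, approximated as 2 * 365 days.
LOOKBACK = timedelta(days=2 * 365)


def build_features(events, index_date, offset=OFFSET, lookback=LOOKBACK):
    """Return the set of 'TYPE:code' tokens seen in the half-open window
    [index_date - offset - lookback, index_date - offset).

    `events` holds hypothetical (patient_id, event_date, code_type, code)
    tuples for one patient; `index_date` is the first AD diagnosis or AD
    drug record.
    """
    prediction_date = index_date - offset  # stand-in for the prodromal date
    window_start = prediction_date - lookback
    return {
        f"{code_type}:{code}"
        for _, event_date, code_type, code in events
        if window_start <= event_date < prediction_date
    }


def to_binary_vector(tokens, vocabulary):
    """One 0/1 column per vocabulary term: 1 if the patient has that code."""
    return [1 if term in tokens else 0 for term in vocabulary]


# Made-up history for one patient; the codes are placeholders, not real ones.
events = [
    ("p1", date(2012, 6, 1), "CPT", "C-100"),              # before the window
    ("p1", date(2014, 5, 1), "ICD", "I-001"),              # inside the window
    ("p1", date(2014, 11, 15), "SPECIALTY", "neurology"),  # inside the window
    ("p1", date(2016, 6, 1), "NDC", "N-200"),              # after the prediction date
]
vocabulary = ["CPT:C-100", "ICD:I-001", "NDC:N-200", "SPECIALTY:neurology"]

features = build_features(events, index_date=date(2018, 1, 1))
print(sorted(features))                         # ['ICD:I-001', 'SPECIALTY:neurology']
print(to_binary_vector(features, vocabulary))  # [0, 1, 0, 1]
```

Encoding each code as a binary presence flag is only one plausible choice; counts or recency-weighted values would slot into the same window logic.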
Measurements: A list of clinically relevant and interpretable predictors was generated and extracted from the data sets for each subject, including pharmacological treatments (NDC/product), office/specialist visits (specialty), tests and procedures (HCPCS and CPT), and diagnoses (ICD). The positive cohort was defined as patients with an AD diagnosis or AD treatment, using a 3-year offset as an estimate for prodromal AD diagnosis. Supervised ML techniques were used to develop algorithms that predict the occurrence of prodromal AD cases. The sample dataset was divided randomly into a training dataset and a test dataset. The classification models were trained and executed in the PySpark framework: LogisticRegression, DecisionTreeClassifier, RandomForestClassifier, and GBTClassifier were trained and evaluated with PySpark’s MLlib library. The area under the precision-recall curve (AUCPR) was used to compare the models.

Results: The AUCPRs were 0.426, 0.157, 0.436, and 0.440 for LogisticRegression, DecisionTreeClassifier, RandomForestClassifier, and GBTClassifier, respectively, so GBTClassifier (Gradient-Boosted Trees) achieved the highest AUCPR, narrowly ahead of RandomForestClassifier. The GBT model identified 222,721 subjects in the prodromal AD stage with 80% precision. Some 76% of the identified prodromal AD patients were in the primary care setting.

Conclusions: Applying the developed predictive model to 72,670,283 U.S. residents identified 222,721 prodromal AD patients, the majority of whom were in the primary care setting. This could drive major advances in AD research by enabling more accurate and earlier prodromal AD diagnosis at the primary care physician level, which would facilitate timely referral to expert sites for in-depth assessment and potential enrollment in clinical trials.
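
The model comparison rests on AUCPR, and reporting patients "with 80% precision" implies choosing a score cutoff. A minimal, dependency-free Python sketch of both ideas follows; it is not the authors' code, and the helper names and toy scores are hypothetical. It computes average precision, one common step-wise estimator of the area under the precision-recall curve. Spark's `BinaryClassificationEvaluator` with `metricName="areaUnderPR"` uses trapezoidal interpolation instead, so the two numbers can differ slightly on the same predictions.

```python
def pr_points(scores, labels):
    """Return (threshold, precision, recall) at each distinct score,
    from the highest threshold to the lowest. Tied scores share a threshold."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points, tp, fp, i = [], 0, 0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every example tied at this score before recording a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((threshold, tp / (tp + fp), tp / total_pos))
    return points


def average_precision(scores, labels):
    """Step-wise AUCPR estimate: sum of (recall gain) * precision."""
    ap, prev_recall = 0.0, 0.0
    for _, precision, recall in pr_points(scores, labels):
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def threshold_for_precision(scores, labels, target=0.80):
    """Lowest threshold whose precision is >= target, or None if none qualifies."""
    best = None
    for threshold, precision, _ in pr_points(scores, labels):
        if precision >= target:
            best = threshold  # points run high -> low, so keep overwriting
    return best


# Toy held-out scores and labels (1 = positive cohort); values are made up.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
print(round(average_precision(scores, labels), 4))  # 0.9167
print(threshold_for_precision(scores, labels))       # 0.8
```

Taking the lowest qualifying threshold maximizes recall, because lowering the cutoff can only add flagged patients; this is one way to turn a fixed precision target into a count of identified patients.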

CITATION:
O. Uspenskaya-Cadoz; C. Alamuri; L. Wang; M. Yang; S. Khinda; Y. Nigmatullina; T. Cao; N. Kayal; M. O’Keefe; C. Rubel (2019): Machine Learning Algorithm Helps Identify Non-Diagnosed Prodromal Alzheimer’s Disease Patients in the General Population. The Journal of Prevention of Alzheimer’s Disease (JPAD). http://dx.doi.org/10.14283/jpad.2019.10