Application of Machine Learning to Predict the Risk of Alzheimer’s Disease: An Accurate and Practical Solution for Early Diagnostics

David Castiñeira
Jun 17, 2020


Courtney Cochrane, David Castiñeira, Nisreen Shiban and Pavlos Protopapas (Institute for Applied Computational Science, Harvard John A. Paulson School of Engineering and Applied Sciences, Cambridge, MA, US)

Source: knowalzheimer.com

Alzheimer’s Disease (AD) ravages the cognitive ability of more than 5 million Americans and places an enormous strain on the health care system. Our paper proposes a machine learning model that predicts the development of AD without medical imaging and with fewer clinical visits and tests, in the hope of enabling earlier and cheaper diagnoses. Earlier diagnosis could be critical to the effectiveness of any drug or medical treatment for this disease. This work can be adapted into a practical early diagnostic tool that predicts the development of Alzheimer’s as accurately as possible while minimizing the number of diagnostic tests and clinical visits required.

Our model was trained and validated using demographic, biomarker and cognitive test data from two prominent research studies: the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and the Australian Imaging, Biomarker & Lifestyle Flagship Study of Aging (AIBL). We systematically explored different machine learning models, pre-processing methods and feature selection techniques.

Automated data science/machine learning pipeline

Our best-performing model achieved greater than 90% accuracy and recall in predicting AD. These results generalized across sub-studies of ADNI and to the independent AIBL study.
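For readers less familiar with the two metrics quoted above, here is a minimal sketch of how accuracy and recall are computed for a binary classifier. The function name and the toy labels are illustrative assumptions, not data or code from the paper; `1` stands for "develops AD" and `0` for "does not".

```python
def accuracy_and_recall(y_true, y_pred, positive=1):
    """Return (accuracy, recall) for two equal-length lists of binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    true_pos = sum(t == positive and p == positive for t, p in pairs)
    actual_pos = sum(t == positive for t in y_true)

    # Accuracy: share of all patients classified correctly.
    accuracy = correct / len(y_true)
    # Recall: share of true positive cases that the model caught.
    recall = true_pos / actual_pos if actual_pos else float("nan")
    return accuracy, recall


# Toy labels, made up purely for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
acc, rec = accuracy_and_recall(y_true, y_pred)
print(f"accuracy={acc:.3f} recall={rec:.3f}")
```

Recall matters in this setting because a missed case (a false negative) means a patient who goes on to develop AD is not flagged early.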

Model Comparison

We also identified the features that are most important for some of our best predictive models:

Feature Importance Analysis (Top 25 Features) for Machine Learning Model

We also demonstrated that our results are robust to reducing the number of clinical visits or tests per visit. Using a meta-classification algorithm and longitudinal data analysis, we produced a “lean” diagnostic protocol with only 3 tests and 4 clinical visits that can predict Alzheimer’s development with 87% accuracy and 79% recall.

Meta-classification approach. This approach balances the accuracy of models with the cost of obtaining the necessary data.
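The trade-off between accuracy and data cost can be illustrated with a small sketch. This is not the meta-classification algorithm from the paper; it is a simpler exhaustive search, shown only to make the idea of a cost-penalized objective concrete. The test names, costs, gains and the toy `score` function are all hypothetical; in practice `score` would be something like the cross-validated accuracy of a model trained on that subset of tests.

```python
from itertools import combinations
from math import prod


def best_subset(tests, cost, score, penalty):
    """Search all non-empty subsets, maximizing score(subset) - penalty * total cost."""
    best, best_value = None, float("-inf")
    for k in range(1, len(tests) + 1):
        for subset in combinations(tests, k):
            total_cost = sum(cost[t] for t in subset)
            value = score(subset) - penalty * total_cost
            if value > best_value:
                best, best_value = subset, value
    return best, best_value


# Hypothetical tests with made-up costs and predictive gains.
tests = ["test_a", "test_b", "test_c"]
cost = {"test_a": 1.0, "test_b": 1.0, "test_c": 3.0}
gain = {"test_a": 0.5, "test_b": 0.4, "test_c": 0.1}


def toy_score(subset):
    # Stand-in for model accuracy: each extra test helps, with diminishing returns.
    return 1.0 - prod(1.0 - gain[t] for t in subset)


lean, _ = best_subset(tests, cost, toy_score, penalty=0.05)
full, _ = best_subset(tests, cost, toy_score, penalty=0.0)
print("cost-aware choice:", lean)
print("cost-blind choice:", full)
```

With a non-zero penalty, the expensive, low-gain test is dropped; with the penalty set to zero, every test is kept. Exhaustive search is only feasible for a handful of candidate tests, since the number of subsets grows exponentially.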

Key conclusions

  1. Our work shows that it is possible to build a data-driven model that can confidently predict the risk of developing Alzheimer’s in the future, with accuracy and recall above 90%. The data needed for such a prediction are patient demographic information, a genetic test (APOE4 genotyping) and a battery of cognitive tests. We demonstrated that imaging data (MRI and PET scans), which are more costly in terms of time and money, are not necessary for highly accurate predictions.
  2. We also demonstrated how well our model generalizes by evaluating its performance on different ADNI sub-studies (testing one against the others and quantifying model performance) and on a cohort of patients from a completely different repository (AIBL). In all cases, our predictive models show very robust performance. We also carefully quantified how the number of clinical visits for which a patient has data affects the predictive performance of our model.
  3. We also implemented a meta-classification technique to identify the combination of features that provides the optimal balance between predictive performance and feature cost. In each case we identified models that still provide a high level of accuracy and recall. We believe our work provides the right framework for a practical deployment of an AD predictive tool in clinical settings. As an example, we have proposed a diagnostic protocol with only 3 tests and 4 clinical visits that can predict AD with 87% accuracy and 79% recall.
  4. Ultimately, our model framework could be used by physicians and patients together to determine appropriate plans for diagnosing and monitoring the risk of developing AD. Any model deployed in real-world settings will have to perform well relative to a clinician. Based on the literature, physicians can diagnose Alzheimer’s with 87% accuracy and 91% recall. Our best models produce equivalent or better predictions relative to physicians for the harder problem of predicting future development of AD.
  5. Going forward, a parallel study of model prediction versus physician prediction would be necessary to validate the models and gain doctors’ trust in this method. One limitation of this approach is that our training labels were provided by doctors. Those “true” labels carry some level of uncertainty, as AD is a difficult disease to diagnose in vivo. Our predictive models are ultimately only as good as the training data used to build them. Finally, it is important to recognize that this work has focused on proposing models that offer high predictive performance, with no consideration for interpretation of these models. Expected new FDA regulations for CDS (Clinical Decision Support) software could incentivize developing models.
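The generalization check described in conclusion 2 (testing one sub-study against the others) follows a leave-one-group-out pattern, which can be sketched as below. The function name and the group labels `s1`, `s2`, `s3` are placeholders, not the actual sub-study names or the authors' code.

```python
def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx) once for each distinct group label."""
    for held_out in sorted(set(groups)):
        # Train on every patient outside the held-out sub-study...
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        # ...and evaluate only on patients inside it.
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train_idx, test_idx


# One placeholder sub-study label per patient record.
groups = ["s1", "s1", "s2", "s3", "s2", "s3"]
splits = list(leave_one_group_out(groups))
for held_out, train_idx, test_idx in splits:
    print(f"hold out {held_out}: train on {train_idx}, test on {test_idx}")
```

Because no patient from the held-out group is seen during training, the score on that group estimates how the model would behave on a cohort collected separately. scikit-learn offers the same splitting strategy as `LeaveOneGroupOut` in `sklearn.model_selection`.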
