
Predicting onset, progression, and clinical subtypes of Parkinson disease using machine learning

By Faraz Faghri, Sayed Hadi Hashemi, Hampton Leonard, Sonja W. Scholz, Roy H Campbell, Mike A Nalls, Andrew B. Singleton

Posted 04 Jun 2018
bioRxiv DOI: 10.1101/338913

The clinical manifestations of Parkinson disease are characterized by heterogeneity in age at onset, disease duration, rate of progression, and constellation of motor versus non-motor features. Because of these variable presentations, counseling patients about their individual risks and prognosis is limited. There is an unmet need for predictive tests that facilitate early detection and characterization of distinct disease subtypes, as well as improved, individualized predictions of the disease course. The emergence of machine learning methods that detect hidden patterns in complex, multi-dimensional datasets provides unparalleled opportunities to address this critical need. In this work, we use unsupervised and supervised machine learning approaches for subtype identification and prediction. We apply these methods to comprehensive, longitudinal clinical data from the Parkinson Disease Progression Marker Initiative (PPMI) (n=328 cases) to identify patient subtypes and to predict disease progression. The resulting models are validated in an independent, clinically well-characterized cohort from the Parkinson Disease Biomarker Program (PDBP) (n=112 cases). Our analysis distinguishes three disease subtypes with highly predictable progression rates, corresponding to slow, moderate, and fast disease progressors. We achieve highly accurate projections of disease progression four years after initial diagnosis, with an average Area Under the Curve (AUC) of 0.93 (AUC ± 95% CI: 0.96 ± 0.01 for PDvec1, 0.87 ± 0.03 for PDvec2, and 0.96 ± 0.02 for PDvec3). We also demonstrate robust replication of these findings in the independent validation cohort. These data-driven results enable clinicians to deconstruct the heterogeneity within their patient cohorts. This knowledge could have immediate implications for clinical trials by improving the detection of significant clinical outcomes that might have been masked by cohort heterogeneity. We anticipate that machine learning models will improve patient counseling, clinical trial design, allocation of healthcare resources, and ultimately individualized clinical care.
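The abstract reports one AUC per progression vector and an average of 0.93. The short sketch below shows how those figures relate arithmetically. It assumes, without confirmation from the abstract, that "average" means an unweighted mean of the three per-vector AUCs and that each "±" value is a symmetric 95% CI half-width. The names REPORTED_AUC, ci_bounds, and macro_average are hypothetical helpers for illustration only, not part of the authors' code.

```python
from statistics import mean

# AUC point estimates and 95% CI half-widths, copied from the abstract.
# Assumption: "±" denotes a symmetric confidence-interval half-width.
REPORTED_AUC = {
    "PDvec1": (0.96, 0.01),
    "PDvec2": (0.87, 0.03),
    "PDvec3": (0.96, 0.02),
}


def ci_bounds(point, half_width):
    """Return (lower, upper) for a symmetric interval point ± half_width."""
    return point - half_width, point + half_width


def macro_average(results):
    """Unweighted mean of the per-vector AUC point estimates (assumed definition of 'average')."""
    return mean(point for point, _ in results.values())


if __name__ == "__main__":
    for name, (point, hw) in REPORTED_AUC.items():
        lower, upper = ci_bounds(point, hw)
        print(f"{name}: AUC {point:.2f} (95% CI {lower:.2f} to {upper:.2f})")
    print(f"Unweighted mean AUC: {macro_average(REPORTED_AUC):.2f}")
```

Under these assumptions the unweighted mean of 0.96, 0.87, and 0.96 comes out to 0.93, matching the reported average; a sample-weighted average would require per-vector case counts, which the abstract does not give.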

Download data

  • Downloaded 2,993 times
  • Download rankings, all-time:
    • Site-wide: 5,215
    • In neuroscience: 441
  • Download rankings, year to date:
    • Site-wide: 14,060
  • Download rankings, since beginning of last month:
    • Site-wide: 34,302
