Privacy-preserving generative deep neural networks support clinical data sharing

By Brett K. Beaulieu-Jones, Zhiwei Steven Wu, Chris Williams, Ran Lee, Sanjeev P Bhavnani, James Brian Byrd, Casey S Greene

Posted 05 Jul 2017
bioRxiv DOI: 10.1101/159756 (published DOI: 10.1161/CIRCOUTCOMES.118.005122)

Background: Data sharing accelerates scientific progress, but sharing individual-level data while preserving patient privacy presents a barrier. Methods and Results: Using pairs of deep neural networks, we generated simulated, synthetic "participants" that closely resemble participants of the SPRINT trial. We showed that such paired networks can be trained with differential privacy, a formal privacy framework that limits the likelihood that queries of the synthetic participants' data could identify a real participant in the trial. Machine-learning predictors built on the synthetic population generalize to the original dataset. This finding suggests that the synthetic data can be shared with others, enabling them to perform hypothesis-generating analyses as though they had the original trial data. Conclusions: Deep neural networks that generate synthetic participants facilitate secondary analyses and reproducible investigation of clinical datasets by enhancing data sharing while preserving participant privacy.
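The abstract does not say how differential privacy was enforced during training. One common way to train a network with differential privacy is to clip each example's gradient and add Gaussian noise before averaging; the sketch below illustrates that idea only, under that assumption. The function name `privatize_gradients` and the parameters `clip_norm` and `noise_multiplier` are hypothetical, and no privacy accounting (tracking of the privacy budget) is included.

```python
import math
import random


def privatize_gradients(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient to an L2 norm of clip_norm, sum them,
    add Gaussian noise scaled to the clipping bound, and return the average.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if not per_example_grads:
        raise ValueError("need at least one per-example gradient")
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale down only those gradients whose norm exceeds the bound.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, g in enumerate(grad):
            total[i] += g * scale
    # Noise standard deviation is proportional to the clipping bound,
    # so no single example can dominate the noisy sum.
    sigma = noise_multiplier * clip_norm
    n = len(per_example_grads)
    return [(t + rng.gauss(0.0, sigma)) / n for t in total]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.3, 0.4]]
    print(privatize_gradients(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```

Because each example's contribution is bounded by `clip_norm`, the added noise can mask the presence or absence of any single participant in a batch. A larger `noise_multiplier` gives stronger privacy at the cost of noisier updates.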

Download data

  • Downloaded 12,291 times
  • Download rankings, all-time:
    • Site-wide: 149 out of 83,615
    • In bioinformatics: 20 out of 8,013
  • Year to date:
    • Site-wide: 1,491 out of 83,615
  • Since beginning of last month:
    • Site-wide: 1,528 out of 83,615
