Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 65,415 bioRxiv papers from 289,746 authors.

Scaling computational genomics to millions of individuals with GPUs

By Amaro Taylor-Weiner, Francois Aguet, Nicholas J Haradhvala, Sager Gosai, Shankara Anand, Jaegil Kim, Kristin Ardlie, Eliezer M Van Allen, Gad Getz

Posted 14 Nov 2018
bioRxiv DOI: 10.1101/470138 (published DOI: 10.1186/s13059-019-1836-7)

Current genomics methods were designed to handle tens to thousands of samples, but will soon need to scale to millions to keep up with the pace of data and hypothesis generation in biomedical science. Moreover, the costs of processing these growing datasets will become prohibitive unless the computational efficiency and scalability of methods improve. Here, we show that recently developed machine-learning libraries (TensorFlow and PyTorch) facilitate implementation of genomics methods for GPUs and significantly accelerate computations. To demonstrate this, we re-implemented methods for two commonly performed computational genomics tasks: QTL mapping and Bayesian non-negative matrix factorization. Our implementations ran >200 times faster than current CPU-based versions, and these analyses are ~5-10-fold cheaper on GPUs due to the vastly shorter runtimes. We anticipate that the accessibility of these libraries and the improvements in runtime will lead to a transition to GPU-based implementations for a wide range of computational genomics methods.
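
The relationship between the reported speedup and the reported cost saving can be made explicit with a little arithmetic. The sketch below is not from the paper: it assumes that cost is simply runtime multiplied by an hourly price on each platform, and it takes the abstract's lower bound of a 200-fold speedup as an exact figure. The function name `implied_price_ratio` is hypothetical. Under those assumptions, a 5-10-fold cost reduction would correspond to a GPU hour priced at roughly 20-40 times a CPU hour.

```python
def implied_price_ratio(speedup, cost_reduction):
    """Return the GPU/CPU hourly price ratio implied by a speedup and a cost reduction.

    Assumption (not from the paper): cost = runtime * hourly_price on both
    platforms, so cost_cpu / cost_gpu = speedup * (price_cpu / price_gpu).
    """
    if speedup <= 0 or cost_reduction <= 0:
        raise ValueError("speedup and cost_reduction must be positive")
    return speedup / cost_reduction


# Lower bound quoted in the abstract (">200 times faster"), treated as exact here.
SPEEDUP = 200

for fold in (5, 10):
    ratio = implied_price_ratio(SPEEDUP, fold)
    print(f"{fold}-fold cheaper at {SPEEDUP}x speedup: "
          f"GPU hour priced at about {ratio:.0f}x a CPU hour")
```

Because the abstract gives the speedup only as a lower bound, these ratios are rough; a larger speedup at the same cost saving would imply a proportionally higher price ratio.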

Download data

  • Downloaded 2,953 times
  • Download rankings, all-time:
    • Site-wide: 1,181 out of 65,415
    • In genomics: 278 out of 4,463
  • Year to date:
    • Site-wide: 545 out of 65,415
  • Since the beginning of last month:
    • Site-wide: 4,006 out of 65,415
