Efficient Bayesian inference of phylogenetic trees from large scale, low-depth genome-wide single-cell data

By Fatemeh Dorri, Sohrab Salehi, Kevin Chern, Tyler Funnell, Marc Williams, Daniel Lai, Mirela Andronescu, Kieran R Campbell, Andrew McPherson, Samuel Aparicio, Andrew Roth, Sohrab Shah, Alexandre Bouchard-Côté

Posted 07 May 2020
bioRxiv DOI: 10.1101/2020.05.06.058180

A new generation of scalable single-cell whole genome sequencing (scWGS) methods allows unprecedented high-resolution measurement of the evolutionary dynamics of cancer cell populations. Phylogenetic reconstruction is central to identifying sub-populations and distinguishing mutational processes. The ability to sequence tens of thousands of single genomes at high resolution per experiment challenges the assumptions and scalability of existing phylogenetic tree-building methods and calls for tailored phylogenetic models and scalable inference algorithms. We propose a phylogenetic model and an associated Bayesian inference procedure that exploit the specifics of scWGS data. A first highlight of our approach is a novel phylogenetic encoding of copy-number data, which provides an attractive statistical-computational trade-off by simplifying the site dependencies induced by rearrangements while still forming a sound foundation for phylogenetic inference. A second highlight is an innovative phylogenetic tree exploration move that bounds the cost of each MCMC iteration by O(|C| + |L|), where |C| is the number of cells and |L| is the number of loci. In contrast, existing off-the-shelf likelihood-based methods incur an iteration cost of O(|C| |L|). Moreover, the novel move considers an exponential number of neighbouring trees, whereas off-the-shelf moves consider a polynomial-size set of neighbours. The third highlight is a novel mutation calling method that incorporates the copy-number data and the underlying phylogenetic tree to overcome the missing data issue. This framework allows us to realistically consider routine Bayesian phylogenetic inference at the scale of scWGS data.

Competing Interest Statement: SPS and SA are shareholders and consultants of Contextual Genomics Inc.
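To give a feel for the two per-iteration cost bounds quoted in the abstract, the following minimal sketch compares O(|C| + |L|) with O(|C| |L|) as raw operation counts. It ignores constant factors and does not implement the authors' tree exploration move. The function name `per_iteration_cost` and the example sizes (10,000 cells, 5,000 loci) are assumptions chosen for illustration only, not figures from the paper.

```python
def per_iteration_cost(num_cells: int, num_loci: int) -> dict:
    """Return the two asymptotic bounds as plain operation counts.

    Constant factors are ignored, so only the ratio between the two
    numbers is meaningful, not their absolute values.
    """
    return {
        "additive": num_cells + num_loci,        # O(|C| + |L|)
        "multiplicative": num_cells * num_loci,  # O(|C| |L|)
    }


# Hypothetical data set size, chosen only for illustration.
costs = per_iteration_cost(num_cells=10_000, num_loci=5_000)
ratio = costs["multiplicative"] / costs["additive"]

print(f"additive bound:       {costs['additive']:,}")
print(f"multiplicative bound: {costs['multiplicative']:,}")
print(f"ratio:                {ratio:,.0f}x")
```

Under these assumed sizes the multiplicative bound is several thousand times larger than the additive one, and the gap widens as either the number of cells or the number of loci grows.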
