PhISCS - A Combinatorial Approach for Sub-perfect Tumor Phylogeny Reconstruction via Integrative use of Single Cell and Bulk Sequencing Data
Farid Rashidi Mehrabadi,
S. Cenk Sahinalp
Posted 25 Jul 2018
bioRxiv DOI: 10.1101/376996 (published DOI: 10.1101/gr.234435.118)
Posted 25 Jul 2018
Recent technological advances in single cell sequencing (SCS) provide high resolution data for studying intra-tumor heterogeneity and tumor evolution. Available computational methods for tumor phylogeny inference via SCS typically aim to identify the most likely perfect phylogeny tree satisfying infinite sites assumption (ISA). However limitations of SCS technologies such as frequent allele dropout or highly variable sequence coverage, commonly result in mutational call errors and prohibit a perfect phylogeny. In addition, ISA violations are commonly observed in tumor phylogenies due to the loss of heterozygosity, deletions and convergent evolution. In order to address such limitations, we, for the first time, introduce a new combinatorial formulation that integrates single cell sequencing data with matching bulk sequencing data, with the objective of minimizing a linear combination of (i) potential false negatives (due to e.g. allele dropout or variance in sequence coverage) and (ii) potential false positives (due to e.g. read errors) among mutation calls, as well as (iii) the number of mutations that violate ISA - to define the optimal sub-perfect phylogeny. Our formulation ensures that several lineage constraints imposed by the use of variant allele frequencies (VAFs, derived from bulk sequence data) are satisfied. We express our formulation both in the form of an integer linear program (ILP) and - for the first time in the context of tumor phylogeny reconstruction - a boolean constraint satisfaction problem (CSP) and solve them by leveraging state-of-the-art ILP/CSP solvers. The resulting method, which we name PhISCS, is the first to integrate SCS and bulk sequencing data under the finite sites model. Using several simulated and real SCS data sets, we demonstrate that PhISCS is not only more general but also more accurate than the alternative tumor phylogeny inference tools. PhISCS is very fast especially when its CSP based variant is used returns the optimal solution, except in rare instances for which it provides an optimality gap. PhISCS is available at https://github.com/haghshenas/PhISCS.
- Downloaded 745 times
- Download rankings, all-time:
- Site-wide: 27,621
- In bioinformatics: 3,250
- Year to date:
- Site-wide: 68,826
- Since beginning of last month:
- Site-wide: 92,685
Downloads over time
Distribution of downloads per paper, site-wide
- 27 Nov 2020: The website and API now include results pulled from medRxiv as well as bioRxiv.
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!