Minerva: An Alignment and Reference Free Approach to Deconvolve Linked-Reads for Metagenomics
Emerging linked-read technologies (aka read-cloud or barcoded short-reads) have revived interest in standard short-read technology as a viable way to understand large scale structure in genomes and metagenomes. Linked-read technologies, such as the 10X Chromium system, use a microfluidic system and a set of specially designed 3 prime barcodes (aka UIDs) to tag short DNA reads which were originally sourced from the same long fragment of DNA; subsequently these specially barcoded reads are sequenced on standard short read platforms. This approach results in interesting compromises. Each long fragment of DNA is covered only sparsely by short reads, no information about the relative ordering of reads from the same fragment is preserved, and typically each 3 prime barcode matches reads from 5-20 long fragments of DNA. However, the cost per base to sequence is far lower than single molecule long read sequencing systems, far less input DNA is required, and the error rate is that of standard short-reads. Linked-reads represent a new set of algorithmic challenges. In this paper we formally describe one particular issue common to all applications of linked-read technology: the deconvolution of reads with a single barcode into clusters that correspond to a single long fragment of DNA. We introduce Minerva, A graph-based algorithm which approximately solves the barcode deconvolution problem for metagenomic data (where reference genomes may be incomplete or unavailable). Additionally, we demonstrate that deconvolved barcoded reads significantly improve downstream results by improving the specificity of taxonomic assignments, and by improving the ability of topic models to identify clusters of related sequences.
- Downloaded 1,367 times
- Download rankings, all-time:
- Site-wide: 11,354
- In bioinformatics: 1,438
- Year to date:
- Site-wide: 17,623
- Since beginning of last month:
- Site-wide: 75,760
Downloads over time
Distribution of downloads per paper, site-wide
- 27 Nov 2020: The website and API now include results pulled from medRxiv as well as bioRxiv.
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!