Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 54,968 bioRxiv papers from 253,677 authors.
Storing, transmitting, and archiving the amount of data produced by next generation sequencing is becoming a significant computational burden. For example, large-scale RNA-seq meta-analyses may now routinely process tens of terabytes of sequence. We present here an approach to biological sequence compression that reduces the difficulty associated with managing the data produced by large-scale transcriptome sequencing. Our approach offers a new direction by sitting between pure reference-based compression and reference-free compression and combines much of the benefit of reference-based approaches with the flexibility of de novo encoding. Our method, called path encoding, draws a connection between storing paths in de Bruijn graphs --- a common task in genome assembly --- and context-dependent arithmetic coding. Supporting this method is a system, called a bit tree, to compactly store sets of kmers that is of independent interest. Using these techniques, we are able to encode RNA-seq reads using 3% -- 11% of the space of the sequence in raw FASTA files, which is on average more than 34% smaller than recent competing approaches. We also show that even if the reference is very poorly matched to the reads that are being encoded, good compression can still be achieved.
- Downloaded 615 times
- Download rankings, all-time:
- Site-wide: 12,108 out of 54,968
- In bioinformatics: 1,989 out of 5,689
- Year to date:
- Site-wide: 48,001 out of 54,968
- Since beginning of last month:
- Site-wide: 45,333 out of 54,968
Downloads over time
Distribution of downloads per paper, site-wide
- Top preprints of 2018
- Paper search
- Author leaderboards
- Overall metrics
- The API
- Email newsletter
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!