
ReSeq simulates realistic Illumina high-throughput sequencing data

By Stephan Schmeing, Mark D Robinson

Posted 17 Jul 2020
bioRxiv DOI: 10.1101/2020.07.17.209072

In high-throughput sequencing data, performance comparisons between computational tools are essential for making informed decisions in the data processing from raw data to the scientific result. Simulations are a critical part of method comparisons, but for standard Illumina sequencing of genomic DNA, they are often oversimplified, which leads to optimistic results for most tools. ReSeq improves the authenticity of synthetic data by extracting and reproducing key components from real data. Major advancements are the inclusion of systematic errors, a fragment-based coverage model and sampling-matrix estimates based on two-dimensional margins. These improvements lead to a better representation of the original k-mer spectrum and more faithful performance evaluations. ReSeq and all of its code are available at: https://github.com/schmeing/ReSeq

Competing Interest Statement

The authors have declared no competing interest.
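The abstract evaluates ReSeq partly by how well simulated reads reproduce the k-mer spectrum of the original data, that is, the histogram of how many distinct k-mers occur once, twice, three times, and so on. The following is a minimal Python sketch of computing and comparing two such spectra, assuming reads are plain uppercase strings. It counts exact k-mers without merging reverse complements. The function names `kmer_counts`, `kmer_spectrum` and `spectrum_distance`, and the use of a total-variation distance, are hypothetical illustrative choices rather than ReSeq's code or its published evaluation metric.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every exact k-mer (substring of length k) across all reads.

    Simplification (assumption): reverse complements are NOT merged into
    canonical k-mers, and reads shorter than k contribute nothing.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def kmer_spectrum(reads, k):
    """Return {multiplicity: number of distinct k-mers seen that many times}."""
    return dict(Counter(kmer_counts(reads, k).values()))


def spectrum_distance(spec_a, spec_b):
    """Total variation distance between two spectra, each normalised to a
    distribution over distinct k-mers (0 = same shape, 1 = no overlap).

    Hypothetical metric chosen for illustration only.
    """
    total_a = sum(spec_a.values())
    total_b = sum(spec_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both spectra must contain at least one k-mer")
    multiplicities = set(spec_a) | set(spec_b)
    return 0.5 * sum(
        abs(spec_a.get(m, 0) / total_a - spec_b.get(m, 0) / total_b)
        for m in multiplicities
    )


if __name__ == "__main__":
    real_reads = ["ACGTACGT", "CGTACGTA"]  # toy "real" reads
    simulated_reads = ["AAAA"]             # toy, deliberately poor simulation
    real_spec = kmer_spectrum(real_reads, 3)
    sim_spec = kmer_spectrum(simulated_reads, 3)
    print(real_spec, sim_spec, spectrum_distance(real_spec, sim_spec))
```

In a real comparison the spectra would come from full read sets with a larger k, and a simulator that reproduces systematic errors and coverage unevenness would be expected to yield a spectrum closer in shape to the real one than a simulator with uniform errors and coverage.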

Download data

  • Downloaded 586 times
  • Download rankings, all-time:
    • Site-wide: 54,726
    • In genomics: 4,100
  • Year to date:
    • Site-wide: 46,878
  • Since beginning of last month:
    • Site-wide: 80,313
