Automated quality control of next generation sequencing data using machine learning

By Steffen Albrecht, Miguel A. Andrade-Navarro, Jean-Fred Fontaine

Posted 14 Sep 2019
bioRxiv DOI: 10.1101/768713

Controlling the quality of next generation sequencing (NGS) data files is a necessary but complex task. To address this problem, we statistically characterized common NGS quality features and developed a novel quality control procedure involving tree-based and deep learning classification algorithms. Predictive models, validated on internal data and external disease diagnostic datasets, are to some extent generalizable to data from unseen species. The derived statistical guidelines and predictive models represent a valuable resource for users of NGS data to better understand quality issues and perform automatic quality control. Our guidelines and software are available at <https://github.com/salbrec/seqQscorer>.

Download data

  • Downloaded 871 times
  • Download rankings, all-time:
    • Site-wide: 15,263 out of 94,912
    • In bioinformatics: 2,346 out of 8,837
  • Year to date:
    • Site-wide: 3,074 out of 94,912
  • Since beginning of last month:
    • Site-wide: 6,569 out of 94,912


