W2RAP: a pipeline for high quality, robust assemblies of large complex genomes from short read data

By Bernardo Clavijo, Gonzalo García-Accinelli, Jonathan Wright, Darren Heavens, Katie Barr, Luis Yanes, Federica Di-Palma

Posted 22 Feb 2017
bioRxiv DOI: 10.1101/110999

Producing high-quality whole-genome shotgun de novo assemblies of plant and animal species with large, complex genomes using low-cost short-read sequencing technologies remains a challenge. However, when the right sequencing data, with appropriate quality control, are assembled using approaches that focus on the robustness of the process rather than on maximizing a single metric such as the usual contiguity estimators, good-quality assemblies with informative value for comparative analyses can be produced. Here we present a complete method, described from data generation and QC all the way up to scaffolding of complex genomes using Illumina short reads, and its application to plant and human datasets. We show how to use the w2rap pipeline, following a metric-guided approach, to produce cost-effective assemblies. The assemblies are highly accurate, provide good coverage of the genome and show good short-range contiguity. Our pipeline has already enabled the rapid, cost-effective generation of de novo genome assemblies from large, polyploid crop species with a focus on comparative genomics.

Download data

  • Downloaded 1,944 times
  • Download rankings, all-time:
    • Site-wide: 9,835
    • In bioinformatics: 1,123
  • Year to date:
    • Site-wide: 35,836
  • Since beginning of last month:
    • Site-wide: 49,572
