Background: Highly accurate next-generation sequencing (NGS) of genetic variants is key to many areas of science and medicine, such as cataloguing population genetic variation and diagnosing patients with genetic diseases. Certain genomic loci and regions are prone to higher rates of systematic sequencing and alignment bias, which makes high accuracy harder to achieve and results in false positive variant calls. Current standard practice for distinguishing loci that can be sequenced with high confidence from those that cannot uses consensus between different sequencing methods as a proxy for confidence. This assumption fails when all sequencing pipelines agree on the same errors because they share similar systematic biases. Alternative methods are therefore required to identify systematic biases.

Methods: We have developed a novel statistical method that summarises sequenced reads from whole-genome clinical samples and catalogues them in "Incremental Databases" (IncDBs) that maintain individual confidentiality. We analysed variant statistics at each genomic position and catalogued the positions that consistently showed systematic biases with the corresponding sequencing pipeline.

Results: We have demonstrated that systematic errors in NGS data are widespread, with persistent low-fraction alleles present at 1.26-2.43% of the human autosomal genome across three different Illumina-based pipelines, each comprising at least 150 patient samples. We identified genomic regions that are more prone to systematic biases, such as GC-rich regions (OR = 6.47-8.19), and regions that are less prone, such as the NIST high-confidence genomic regions (OR = 0.154-0.191). We verified our predictions on a gold-standard reference genome and showed that these systematic biases can lead to suspect variant calls at clinically important loci, including within introns and exons.
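The IncDB idea described in the Methods can be illustrated with a minimal Python sketch. It assumes each sample has already been reduced to per-position tuples of (chromosome, position, read depth, non-reference read count), and it keeps only aggregate counters per position, never reads or sample identifiers. The class name `IncDB`, the depth cutoff, the "low-fraction" allele window, and the recurrence threshold are all hypothetical choices for illustration, not the authors' implementation. The `odds_ratio` helper shows the standard 2x2 calculation behind figures like the reported ORs; the exact table the authors used is not stated here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PositionSummary:
    """Aggregate counts for one genomic position; no per-sample data is kept."""
    n_samples: int = 0            # samples with enough depth at this position
    n_with_low_frac_alt: int = 0  # of those, samples showing a low-fraction non-reference allele


class IncDB:
    """Hypothetical incremental summary database keyed by (chrom, pos)."""

    def __init__(self, min_depth=10, low_frac=(0.02, 0.25)):
        self.min_depth = min_depth           # assumed minimum read depth
        self.low_lo, self.low_hi = low_frac  # assumed "low-fraction" allele window
        self.positions = defaultdict(PositionSummary)

    def add_sample(self, pileup):
        """Fold one sample into the database.

        pileup: iterable of (chrom, pos, depth, alt_count) tuples. Only counters
        are incremented; no reads or sample identifiers are stored.
        """
        for chrom, pos, depth, alt_count in pileup:
            if depth < self.min_depth:
                continue  # skip low-coverage sites without creating an entry
            summary = self.positions[(chrom, pos)]
            summary.n_samples += 1
            if self.low_lo <= alt_count / depth < self.low_hi:
                summary.n_with_low_frac_alt += 1

    def suspect_positions(self, min_samples=150, min_recurrence=0.5):
        """Positions where a low-fraction allele recurs across many samples.

        min_samples and min_recurrence are assumed thresholds.
        """
        return sorted(
            key
            for key, s in self.positions.items()
            if s.n_samples >= min_samples
            and s.n_with_low_frac_alt / s.n_samples >= min_recurrence
        )


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table of genomic positions.

    a: suspect, inside region     b: not suspect, inside region
    c: suspect, outside region    d: not suspect, outside region
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined when b or c is zero")
    return (a * d) / (b * c)
```

Because each new sample only increments counters, it can be folded in without revisiting earlier samples, which is one plausible reading of "incremental". An OR above 1 means suspect positions are over-represented inside the region (as reported for GC-rich regions); below 1 means they are under-represented (as for the NIST high-confidence regions).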
Conclusions: Our results call for increased caution to minimise the effect of systematic biases in whole-genome sequencing and alignment. This study supports the utility of a statistical approach to enhancing quality control of clinically sequenced samples: variant calls made at known suspect loci can be flagged for further analysis or exclusion using anonymised summary databases from which individual patients cannot be re-identified, so that results can be shared more widely.

Abbreviations:
- BAM: Binary Alignment Map (file format)
- BED: Browser Extensible Data (file format)
- cfDNA: Cell-free DNA
- ctDNA: Circulating tumour DNA
- GIAB: Genome in a Bottle (consortium)
- gnomAD: Genome Aggregation Database
- IGV: Integrative Genomics Viewer (software tool)
- IncDB: Incremental Database
- MC: Monte-Carlo
- NGS: Next-Generation Sequencing
- NIST: National Institute of Standards and Technology (organisation)
- SD: Standard Deviation
- SNPs: Single-Nucleotide Polymorphisms
- SNVs: Single-Nucleotide Variants
- WGS: Whole-Genome Sequencing
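The quality-control step proposed in the Conclusions, flagging variant calls made at known suspect loci, could look something like the following Python sketch. It assumes the suspect loci have been exported as BED-style intervals (0-based, half-open) and that variant calls arrive as VCF-style (chromosome, 1-based position, ref, alt) tuples. The function names and the `SUSPECT_LOCUS` label are hypothetical, not part of any published tool.

```python
import bisect
from collections import defaultdict


def build_index(bed_intervals):
    """Group BED-style (chrom, start, end) intervals by chromosome, then sort and
    merge overlapping or adjacent intervals so each chromosome has disjoint,
    ordered intervals."""
    by_chrom = defaultdict(list)
    for chrom, start, end in bed_intervals:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        merged = [list(ivs[0])]
        for start, end in ivs[1:]:
            if start <= merged[-1][1]:
                # Overlapping or adjacent interval: extend the current block.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def is_suspect(index, chrom, pos1):
    """True if the 1-based position pos1 falls inside a suspect interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    pos0 = pos1 - 1  # convert VCF 1-based position to BED 0-based coordinate
    i = bisect.bisect_right(starts, pos0) - 1  # last interval starting at or before pos0
    return i >= 0 and pos0 < ends[i]


def flag_calls(calls, bed_intervals):
    """Append a hypothetical filter label to each (chrom, pos, ref, alt) call."""
    index = build_index(bed_intervals)
    return [
        (chrom, pos, ref, alt, "SUSPECT_LOCUS" if is_suspect(index, chrom, pos) else "PASS")
        for chrom, pos, ref, alt in calls
    ]
```

Merging overlaps first keeps each lookup to a single binary search. A production pipeline would more likely rely on an established interval tool, such as bedtools or a tabix-indexed BED file, than on this hand-rolled index.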