GeneQC: A quality control tool for gene expression estimation based on RNA-sequencing reads mapping

By Adam McDermaid, Xin Chen, Yiran Zhang, Juan Xie, Cankun Wang, Qin Ma

Posted 15 Feb 2018
bioRxiv DOI: 10.1101/266445

Motivation: One of the main benefits of modern RNA-sequencing (RNA-seq) technology is more accurate gene expression estimation. However, numerous issues can cause an RNA-seq read to map to multiple locations on the reference genome with the same alignment score, which occurs in plant, animal, and metagenome samples. Such a read is called a multiple-mapping read (MMR). The impact of MMRs is reflected in gene expression estimation and in all downstream analyses, including differential gene expression and functional enrichment. Current analysis pipelines lack tools to test the reliability of gene expression estimates and are thus incapable of ensuring the validity of downstream analyses.

Results: Our investigation of 95 RNA-seq datasets from seven species (totaling 1,951 GB) indicates that, on average, roughly 22% of all reads are MMRs for plant and animal species. Here we present a tool called GeneQC (Gene expression Quality Control), which can accurately estimate the reliability of each gene's expression level. The underlying algorithm is based on extracted genomic and transcriptomic features and makes extensive use of mathematical and statistical modeling. GeneQC uses big data-driven mathematical modeling approaches and allows researchers to identify reliable expression estimates and conduct further analysis on the genes whose expression estimates are of sufficient quality. The tool also enables researchers to pursue further analysis to obtain more accurate expression estimates for genes with low reliability.
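To make the MMR definition in the abstract concrete, the following is a minimal sketch of how one might flag reads whose best alignment score is tied across more than one genomic location and compute the fraction of such reads. It is not GeneQC's algorithm. The input layout (tuples of read ID, location, and score), the function names `find_mmrs` and `mmr_fraction`, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def find_mmrs(alignments):
    """Return the IDs of reads whose best score is shared by 2+ locations.

    `alignments` is an iterable of (read_id, location, score) tuples.
    This layout is a hypothetical simplification, not a real aligner format.
    """
    by_read = defaultdict(list)
    for read_id, location, score in alignments:
        by_read[read_id].append((location, score))

    mmrs = set()
    for read_id, hits in by_read.items():
        best = max(score for _, score in hits)
        # Distinct locations that achieve the best score for this read.
        best_locations = {loc for loc, score in hits if score == best}
        if len(best_locations) > 1:
            mmrs.add(read_id)
    return mmrs


def mmr_fraction(alignments):
    """Fraction of distinct reads that are MMRs (0.0 for empty input)."""
    alignments = list(alignments)
    reads = {read_id for read_id, _, _ in alignments}
    if not reads:
        return 0.0
    return len(find_mmrs(alignments)) / len(reads)


# Toy data (invented): r1 ties at two loci, r3 has one clear best hit.
example = [
    ("r1", ("chr1", 100), 60),
    ("r1", ("chr2", 500), 60),
    ("r2", ("chr1", 900), 60),
    ("r3", ("chr3", 40), 55),
    ("r3", ("chr4", 80), 42),
    ("r4", ("chr2", 10), 58),
]

print(sorted(find_mmrs(example)))
print(mmr_fraction(example))
```

In this toy input only `r1` counts as an MMR, because `r3` has a secondary hit with a strictly lower score. A real pipeline would read these records from an aligner's output rather than from in-memory tuples.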

Download data

  • Downloaded 729 times
  • Download rankings, all-time:
    • Site-wide: 35,688
    • In bioinformatics: 3,931
  • Year to date:
    • Site-wide: 95,711
  • Since beginning of last month:
    • Site-wide: 114,259
