
Benchmarking Association Analyses of Continuous Exposures with RNA-seq in Observational Studies

By Tamar Sofer, Nuzulul Kurniansyah, Francois Aguet, Kristin Ardlie, Peter Durda, Deborah A. Nickerson, Joshua D Smith, Yongmei Liu, Sina A Gharib, Susan Redline, Stephen Rich, Jerome Rotter, Kent Taylor

Posted 13 Feb 2021
bioRxiv DOI: 10.1101/2021.02.12.430989

Large RNA-seq datasets from observational studies, covering hundreds to thousands of individuals, are becoming available. Many popular software packages for RNA-seq analysis were designed to study differences in expression signatures in experimental designs with well-defined conditions (exposures). In contrast, observational studies may have varying levels of confounding of the transcript-exposure associations; further, exposure measures may range from discrete (exposed, yes/no) to continuous (levels of exposure), often with non-normal distributions. We compare popular software for gene expression analysis (DESeq2, edgeR, and limma) as well as linear regression-based analyses for studying the association of continuous exposures with RNA-seq. We developed a computational pipeline that includes transformation, filtering, and generation of an empirical null distribution of association p-values, and we apply the pipeline to compute empirical p-values with multiple testing correction. We employ a resampling approach that allows for assessment of false positive detection across methods, power comparison, and the computation of quantile empirical p-values. The results suggest that linear regression methods are substantially faster and control false detections better than the other methods, even when the resampling method is used to compute empirical p-values. We provide the proposed pipeline with fast algorithms in R.
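The steps named in the abstract (filtering, transformation, per-transcript linear regression, a resampled empirical null, and multiple testing correction) can be illustrated with a minimal sketch. This is not the authors' pipeline, which is provided in R. It is a simplified Python illustration under stated assumptions: the function names, the median-count filter, the log2(count + 0.5) transform, and plain permutation of the exposure as the resampling scheme are all hypothetical choices. It also regresses on the exposure alone, so it does not handle covariates or confounding. Because the p-value of a regression slope is monotone in |t| for a fixed sample size, pooling null |t| statistics gives the same ranking as pooling null p-values.

```python
import bisect
import math
import random
import statistics


def filter_transcripts(counts, min_median=5):
    """Keep transcripts whose median raw count is at least min_median (assumed rule)."""
    return [row for row in counts if statistics.median(row) >= min_median]


def log_transform(row, offset=0.5):
    """log2(count + offset); the offset is an assumed choice that avoids log(0)."""
    return [math.log2(c + offset) for c in row]


def slope_t_statistic(x, y):
    """t-statistic of the slope in a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # constant exposure or expression: no testable association
    r = sxy / math.sqrt(sxx * syy)
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # guard against division by zero
    return r * math.sqrt((n - 2) / (1 - r * r))


def empirical_pvalues(exposure, expression, n_perm=100, seed=0):
    """Empirical p-values from a null pooled over transcripts and permutations."""
    rng = random.Random(seed)
    observed = [abs(slope_t_statistic(exposure, y)) for y in expression]
    null = []
    for _ in range(n_perm):
        shuffled = exposure[:]
        rng.shuffle(shuffled)  # breaks any exposure-expression association
        null.extend(abs(slope_t_statistic(shuffled, y)) for y in expression)
    null.sort()
    m = len(null)
    # (number of null statistics >= observed, plus 1) / (m + 1)
    return [(m - bisect.bisect_left(null, t) + 1) / (m + 1) for t in observed]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Simulated data: a skewed exposure and 20 transcripts, only the first associated.
    rng = random.Random(1)
    n = 60
    exposure = [rng.expovariate(1.0) for _ in range(n)]
    counts = [[round(math.exp(3 + 0.8 * e + rng.gauss(0, 0.3))) for e in exposure]]
    counts += [[round(math.exp(3 + rng.gauss(0, 0.3))) for _ in range(n)]
               for _ in range(19)]
    expression = [log_transform(row) for row in filter_transcripts(counts)]
    p_emp = empirical_pvalues(exposure, expression)
    print(benjamini_hochberg(p_emp)[:3])
```

Pooling the null over all transcripts keeps the number of permutations small while still resolving small empirical p-values, but it assumes the null distribution is comparable across transcripts. Whether that holds for a given dataset is something to check rather than take for granted.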

Download data

  • Downloaded 112 times
  • Download rankings, all-time:
    • Site-wide: 150,636
    • In bioinformatics: 11,428
  • Year to date:
    • Site-wide: 77,168
  • Since beginning of last month:
    • Site-wide: 126,464
