TWO-SIGMA: a novel TWO-component SInGle cell Model-based Association method for single-cell RNA-seq data

By Eric Van Buren, Ming Hu, Chen Weng, Fulai Jin, Yan Li, Di Wu, Yun Li

Posted 22 Jul 2019
bioRxiv DOI: 10.1101/709238 (published DOI: 10.1002/gepi.22361)

In this paper, we develop TWO-SIGMA, a TWO-component SInGle cell Model-based Association method for differential expression (DE) analyses in single-cell RNA-seq (scRNA-seq) data. The first component models the probability of "drop-out" with a mixed-effects logistic regression model, and the second component models the (conditional) mean expression with a mixed-effects negative binomial regression model. TWO-SIGMA is highly flexible in that it: (i) does not require a log-transformation of the outcome, (ii) allows for overdispersed and zero-inflated counts, (iii) accommodates correlation between cells from the same biological sample via random effect terms, (iv) can analyze unbalanced designs (the number of cells need not be the same across samples), (v) can control for additional sample-level and cell-level covariates, including batch effects, (vi) provides interpretable effect size estimates, and (vii) enables general tests of DE beyond two-group comparisons. To our knowledge, TWO-SIGMA is the only method for analyzing scRNA-seq data that offers all of these features simultaneously. Simulation studies show that TWO-SIGMA outperforms alternative regression-based approaches in both type-I error control and power when the data contain even moderate within-sample correlation. A real data analysis using single cells from pancreatic islets illustrates the flexibility of TWO-SIGMA and demonstrates that omitting random effect terms can have dramatic impacts on scientific conclusions. TWO-SIGMA is implemented in the R package twosigma, available at <https://github.com/edvanburen/twosigma>.
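The two components described above can be made concrete with a small sketch. It assumes the standard zero-inflated negative binomial (ZINB) parameterization, with one binary sample-level covariate `group` and sample-specific random intercepts `a_i` (drop-out component) and `b_i` (mean component): logit(p_ij) = alpha0 + alpha1 * group_i + a_i and log(mu_ij) = beta0 + beta1 * group_i + b_i. The code evaluates the likelihood of one sample's cells conditional on its random intercepts and simulates correlated, unbalanced data. All function names are hypothetical and are not the twosigma API (twosigma is an R package); the package's exact parameterization may differ, and an actual mixed-model fit integrates over the random effects rather than conditioning on them.

```python
import math
import random

def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def nb_logpmf(y: int, mu: float, theta: float) -> float:
    """NB log-pmf with mean mu and size theta (variance mu + mu**2 / theta)."""
    return (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
            + theta * math.log(theta / (theta + mu))
            + y * math.log(mu / (theta + mu)))

def zinb_logpmf(y: int, p_drop: float, mu: float, theta: float) -> float:
    """Zero-inflated NB: a drop-out zero with prob p_drop, otherwise an NB(mu, theta) draw."""
    if y == 0:
        return math.log(p_drop + (1.0 - p_drop) * math.exp(nb_logpmf(0, mu, theta)))
    return math.log(1.0 - p_drop) + nb_logpmf(y, mu, theta)

def cell_params(group, alpha, beta, a_i, b_i):
    """Both linear predictors for a cell, given its sample's random intercepts (hypothetical helper)."""
    p_drop = inv_logit(alpha[0] + alpha[1] * group + a_i)  # drop-out component, logit link
    mu = math.exp(beta[0] + beta[1] * group + b_i)          # mean component, log link
    return p_drop, mu

def sample_loglik(counts, group, alpha, beta, a_i, b_i, theta):
    """Log-likelihood of one sample's cells, conditional on a_i and b_i (not marginal)."""
    p_drop, mu = cell_params(group, alpha, beta, a_i, b_i)
    return sum(zinb_logpmf(y, p_drop, mu, theta) for y in counts)

def rpois(lam: float, rng: random.Random) -> int:
    """Poisson draw via Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def simulate(cells_per_sample, groups, alpha, beta, sigma_a, sigma_b, theta, seed=0):
    """Simulate one gene; cells in the same sample share a_i and b_i, inducing correlation."""
    rng = random.Random(seed)
    data = []  # list of (sample_index, count)
    for i, (n_cells, group) in enumerate(zip(cells_per_sample, groups)):
        a_i, b_i = rng.gauss(0.0, sigma_a), rng.gauss(0.0, sigma_b)
        p_drop, mu = cell_params(group, alpha, beta, a_i, b_i)
        for _ in range(n_cells):
            if rng.random() < p_drop:
                y = 0  # drop-out zero
            else:
                # Gamma-Poisson mixture with shape theta and scale mu/theta is NB(mu, theta)
                lam = rng.gammavariate(theta, mu / theta)
                y = rpois(lam, rng)
            data.append((i, y))
    return data

# Usage: four samples with unequal cell counts, two per group
data = simulate([40, 25, 60, 10], [0, 0, 1, 1], alpha=(-0.5, 0.3), beta=(1.0, 0.5),
                sigma_a=0.5, sigma_b=0.3, theta=2.0, seed=42)
```

Because every cell in sample i shares `a_i` and `b_i`, counts within a sample are correlated; setting `sigma_a = sigma_b = 0` recovers a model that treats all cells as independent.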

Download data

  • Downloaded 651 times
  • Download rankings, all-time:
    • Site-wide: 43,145
    • In genomics: 3,483
  • Year to date:
    • Site-wide: 46,559
  • Since beginning of last month:
    • Site-wide: 99,163