
QUBIC2: A novel biclustering algorithm for large-scale bulk RNA-sequencing and single-cell RNA-sequencing data analysis

By Juan Xie, Anjun Ma, Yu Zhang, Bingqiang Liu, Changlin Wan, Sha Cao, Chi Zhang, Qin Ma

Posted 07 Sep 2018
bioRxiv DOI: 10.1101/409961

Combining biclustering with large-scale gene expression data holds promising potential for inferring condition-specific functional pathways and networks. However, existing biclustering tools do not perform satisfactorily on high-resolution RNA-sequencing (RNA-Seq) data, mainly because they lack (i) consideration of the high sparsity of RNA-Seq data, e.g., the massive numbers of zeros or lowly expressed genes, especially in single-cell RNA-Seq (scRNA-Seq) data, and (ii) an understanding of the transcriptional regulation signals underlying the observed gene expression values. Here we present a novel biclustering algorithm, QUBIC2, for the analysis of large-scale bulk RNA-Seq and scRNA-Seq data. Its key novelties are that it (i) uses a truncated model to handle the unreliable quantification of genes with low or moderate expression, (ii) adopts a Gaussian mixture distribution and an information-divergence objective function to capture transcriptional regulation signals shared among a set of genes, (iii) uses a Core-Dual strategy to identify biclusters and optimize the relevant parameters, and (iv) provides a size-based P-value framework to evaluate the statistical significance of all identified biclusters. Validation on comprehensive bulk and single-cell RNA-Seq data sets suggests that QUBIC2 outperforms five other widely used biclustering tools in functional module detection and cell type classification. In addition, applications to temporal and spatial data demonstrate that QUBIC2 can derive meaningful biological information from scRNA-Seq data. The source code for QUBIC2 is freely available at https://github.com/maqin2001/qubic2.
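To make the general idea of a bicluster concrete, here is a minimal toy sketch in Python. It is not QUBIC2's implementation and does not use its code or API. A fixed expression cutoff stands in, as a loud simplification, for the truncated mixture model described in the abstract, and the function names (`discretize`, `shared_conditions`, `is_bicluster`) and the toy matrix are hypothetical. The sketch only shows how discarding unreliable low values and then looking for genes that are "on" in the same conditions yields a gene-by-condition submatrix.

```python
# Toy illustration of a bicluster on a small expression matrix.
# ASSUMPTION: a fixed cutoff replaces the truncated mixture model from the
# abstract; all names here are hypothetical and not part of QUBIC2.

def discretize(matrix, cutoff):
    """Mark a value as 1 ("on") if it exceeds the cutoff, else 0.

    Values at or below the cutoff (zeros, low counts) are treated as
    unreliable and are not used as evidence of expression.
    """
    return [[1 if value > cutoff else 0 for value in row] for row in matrix]


def shared_conditions(binary, genes):
    """Return the condition indices in which every listed gene is on."""
    n_conditions = len(binary[0])
    return [c for c in range(n_conditions)
            if all(binary[g][c] == 1 for g in genes)]


def is_bicluster(binary, genes, conditions):
    """Check that the gene-by-condition submatrix contains only 1s."""
    return all(binary[g][c] == 1 for g in genes for c in conditions)


# Rows are genes, columns are conditions (or cells); values are made up.
expr = [
    [0.0, 5.1, 6.0, 0.2, 5.5],
    [0.1, 4.8, 5.7, 0.0, 6.1],
    [3.9, 0.0, 0.3, 4.2, 0.0],
    [0.0, 5.3, 0.1, 0.0, 5.9],
]

binary = discretize(expr, cutoff=1.0)
core_genes = [0, 1]
conditions = shared_conditions(binary, core_genes)
print(conditions)
print(is_bicluster(binary, core_genes, conditions))
```

Adding gene 3 to the gene set shrinks the shared conditions, which illustrates the trade-off between the number of genes and the number of conditions that any biclustering method has to balance.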

Download data

  • Downloaded 926 times
  • Download rankings, all-time:
    • Site-wide: 25,599
    • In bioinformatics: 2,944
  • Year to date:
    • Site-wide: 91,138
  • Since beginning of last month:
    • Site-wide: 111,586
