An Empirical Bayes Approach for the Identification of Long-range Chromosomal Interaction from Hi-C Data

By Qi Zhang, Zheng Xu, Yutong Lai

Posted 17 Dec 2018
bioRxiv DOI: 10.1101/497776

Hi-C experiments have become very popular in recent years for studying 3D genome structure. Identifying long-range chromosomal interactions, i.e., peak detection, is crucial for Hi-C data analysis, but it remains challenging because of the inherent high dimensionality, sparsity, and over-dispersion of the Hi-C count data matrix. We propose EBHiC, an empirical Bayes approach for peak detection from Hi-C data. The proposed framework provides flexible over-dispersion modeling by explicitly including the 'true' interaction intensities as latent variables. To implement the proposed peak identification method (via the empirical Bayes test), we estimate the overall distributions of the observed counts semiparametrically using a smoothed EM algorithm, and the empirical null by discrete curve fitting. We conducted extensive simulations to validate and evaluate the performance of the proposed approach and applied it to real datasets. Our results suggest that EBHiC identifies peaks better than Fit-Hi-C in terms of accuracy, biological interpretability, and consistency across biological replicates.
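
To make the general idea of an empirical Bayes test with latent intensities concrete, the following is a minimal sketch and not the EBHiC algorithm itself. It assumes Poisson counts, a prior over latent intensities supported on a fixed grid, and plain (unsmoothed) EM for the prior weights; it then scores each count by the posterior probability that its latent intensity exceeds a cutoff. The function names, the grid, the cutoff of 8, and the simulated data are all illustrative assumptions; the smoothed EM and the empirical null fitting described in the abstract are not reproduced here.

```python
import numpy as np
from math import lgamma


def poisson_loglik(counts, grid):
    """Log Poisson pmf for every (count, candidate intensity) pair, shape (n, K)."""
    x = np.asarray(counts, dtype=float)[:, None]
    theta = np.asarray(grid, dtype=float)[None, :]
    log_fact = np.vectorize(lgamma)(x + 1.0)  # log(x!)
    return x * np.log(theta) - theta - log_fact


def scaled_lik(counts, grid):
    """Likelihood matrix rescaled per row; posteriors are unaffected by the scaling."""
    ll = poisson_loglik(counts, grid)
    return np.exp(ll - ll.max(axis=1, keepdims=True))


def fit_grid_prior(counts, grid, n_iter=200):
    """Plain EM for the weights of a discrete prior on latent intensities (assumption)."""
    lik = scaled_lik(counts, grid)
    w = np.full(len(grid), 1.0 / len(grid))
    for _ in range(n_iter):
        post = lik * w                      # E-step: unnormalized posteriors
        post /= post.sum(axis=1, keepdims=True)
        w = post.mean(axis=0)               # M-step: average responsibility
    return w


def posterior_tail_prob(counts, grid, w, cutoff):
    """P(latent intensity > cutoff | count) under the fitted prior."""
    post = scaled_lik(counts, grid) * w
    post /= post.sum(axis=1, keepdims=True)
    return post[:, np.asarray(grid) > cutoff].sum(axis=1)


# Simulated data (hypothetical): 900 background pairs, 100 "peak" pairs.
rng = np.random.default_rng(0)
true_intensity = np.concatenate([np.full(900, 2.0), np.full(100, 15.0)])
counts = rng.poisson(true_intensity)

grid = np.linspace(0.5, 30.0, 60)           # candidate intensities, all > 0
w = fit_grid_prior(counts, grid)
tail = posterior_tail_prob(counts, grid, w, cutoff=8.0)

print("mean score, background:", tail[:900].mean())
print("mean score, peaks:     ", tail[900:].mean())
```

In this toy setting, a pair would be called a peak when its score exceeds a chosen threshold; how that threshold relates to an error rate depends on the null model, which the sketch does not estimate.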

Download data

  • Downloaded 305 times
  • Download rankings, all-time:
    • Site-wide: 106,262
    • In bioinformatics: 8,959
  • Year to date:
    • Site-wide: 143,358
  • Since beginning of last month:
    • Site-wide: 122,591
