Accurate estimation of intrinsic biases for improved analysis of chromatin accessibility sequencing data using SELMA

By Shengen Shawn Hu, Lin Liu, Qi Li, Wenjing Ma, Michael J Guertin, Clifford A Meyer, Ke Deng, Tingting Zhang, Chongzhi Zang

Posted 24 Oct 2021
bioRxiv DOI: 10.1101/2021.10.22.465530

Genome-wide profiling of chromatin accessibility by DNase-seq or ATAC-seq has been widely used to identify regulatory DNA elements and transcription factor binding sites. However, enzymatic DNA cleavage exhibits intrinsic sequence biases that confound the analysis of chromatin accessibility profiling data. Existing computational tools are limited in their ability to account for such intrinsic biases. Here, we present Simplex Encoded Linear Model for Accessible Chromatin (SELMA), a computational method for systematic estimation of intrinsic cleavage biases from genomic chromatin accessibility profiling data. We demonstrate that SELMA yields accurate and robust bias estimation from both bulk and single-cell DNase-seq and ATAC-seq data. We show that transcription factor binding inference from DNase footprints can be improved by incorporating SELMA-estimated biases. We also demonstrate improved cell clustering of single-cell ATAC-seq data when the SELMA-estimated bias effect is taken into account. SELMA can be applied to existing bioinformatics tools to improve the analysis of chromatin accessibility sequencing data.

Download data

  • Downloaded 311 times
  • Download rankings, all-time:
    • Site-wide: 117,665
    • In bioinformatics: 9,645
  • Year to date:
    • Site-wide: 29,233
  • Since beginning of last month:
    • Site-wide: 22,308
