Rxivist logo

Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 70,177 bioRxiv papers from 306,448 authors.

Estimating the functional impact of INDELs in transcription factor binding sites: a genome-wide landscape

By Esben Eickhardt, Thomas Damm Als, Jakob Grove, Anders D. Børglum, Francesco Lescai

Posted 07 Jun 2016
bioRxiv DOI: 10.1101/057604

Background: Variants in transcription factor binding sites (TFBSs) may have important regulatory effects, as they have the potential to alter transcription factor (TF) binding affinities and thereby affecting gene expression. With recent advances in sequencing technologies the number of variants identified in TFBSs has increased, hence understanding their role is of significant interest when interpreting next generation sequencing data. Current methods have two major limitations: they are limited to predicting the functional impact of single nucleotide variants (SNVs) and often rely on additional experimental data, laborious and expensive to acquire. We propose a purely bioinformatic method that addresses these two limitations while providing comparable results. Results: Our method uses position weight matrices and a sliding window approach, in order to account for the sequence context of variants, and scores the consequences of both SNVs and INDELs in TFBSs. We tested the accuracy of our method in two different ways. Firstly, we compared it to a recent method based on DNase I hypersensitive sites sequencing (DHS-seq) data designed to predict the effects of SNVs: we found a significant correlation of our score both with their DHS-seq data and their prediction model. Secondly, we called INDELs on publicly available DHS-seq data from ENCODE, and found our score to represent well the experimental data. We concluded that our method is reliable and we used it to describe the landscape of variation in TFBSs in the human genome, by scoring all variants in the 1000 Genomes Project Phase 3. Surprisingly, we found that most insertions have neutral effects on binding sites, while deletions, as expected, were found to have the most severe TFBS-scores. We identified four categories of variants based on their TFBS-scores and tested them for enrichment of variants classified as pathogenic, benign and protective in ClinVar: we found that the variants with the most negative TFBS-scores have the most significant enrichment for pathogenic variants. Conclusions: Our method addresses key shortcomings of currently available bioinformatic tools in predicting the effects of INDELs in TFBSs, and provides an unprecedented window into the genome-wide landscape of INDELs, their predicted influences on TF binding, and potential relevance for human diseases. We thus offer an additional tool to help prioritising non-coding variants in sequencing studies.

Download data

  • Downloaded 456 times
  • Download rankings, all-time:
    • Site-wide: 25,051 out of 70,182
    • In bioinformatics: 3,463 out of 6,883
  • Year to date:
    • Site-wide: 47,728 out of 70,182
  • Since beginning of last month:
    • Site-wide: 53,123 out of 70,182

Altmetric data

Downloads over time

Distribution of downloads per paper, site-wide


Sign up for the Rxivist weekly newsletter! (Click here for more details.)