
Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 62,734 bioRxiv papers from 278,354 authors.

Exploiting regulatory heterogeneity to systematically identify enhancers with high accuracy

By Hamutal Arbel, William W. Fisher, Ann S. Hammonds, Kenneth H. Wan, Soo Park, Richard Weiszmann, Soile Keränen, Clara Henriquez, Omid Shams Solari, Peter Bickel, Mark D. Biggin, Susan E Celniker, James B Brown

Posted 18 Jan 2018
bioRxiv DOI: 10.1101/250241 (published DOI: 10.1073/pnas.1808833115)

Identifying functional enhancer elements in metazoan systems is a major challenge. For example, large-scale validation of enhancers predicted by ENCODE reveals false positive rates of at least 70%. Here we use the pregastrula patterning network of Drosophila melanogaster to demonstrate that the loss of accuracy on held-out data results from heterogeneity of functional signatures among enhancer elements. We show that two classes of enhancer are active during early Drosophila embryogenesis and that, by focusing on a single, relatively homogeneous class of elements, over 98% prediction accuracy can be achieved on a balanced, completely held-out test set. The class of well-predicted elements is composed predominantly of enhancers driving multi-stage segmentation patterns, which we designate segmentation driving enhancers (SDEs). Prediction is driven by the DNA occupancy of early developmental transcription factors, with almost no additional power derived from histone modifications. We further show that improved accuracy is not a property of a particular prediction method: after conditioning on the SDE set, naive Bayes and logistic regression perform as well as more sophisticated tools. Applying this method in a genome-wide scan, we predict 1,640 SDEs that cover 1.6% of the genome, 916 of which are novel. An analysis of 32 novel SDEs, chosen from throughout our prediction rank list and tested by wholemount embryonic imaging of stably integrated reporter constructs, showed that >90% drove expression patterns. We achieved 86.7% precision on the genome-wide scan, with an estimated recall of at least 98%, indicating high accuracy and completeness in annotating this class of functional elements.
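As a rough illustration of the point that simple classifiers suffice once training is restricted to a homogeneous class of elements, here is a minimal sketch of a Gaussian naive Bayes classifier trained on transcription-factor occupancy scores and evaluated on a balanced, fully held-out test set. Everything in it is an assumption for illustration, not the authors' pipeline: the data are synthetic, the Gaussian per-feature likelihood and the three TF features are hypothetical, and the helper names `make_regions`, `fit_gaussian_nb`, `log_posterior`, and `predict` are ours.

```python
import math
import random
from statistics import mean, pvariance


def make_regions(n_per_class, n_tfs, rng):
    """Hypothetical toy data: each region is a vector of TF occupancy scores.
    Label 1 (enhancer) regions have higher mean occupancy than label 0 regions."""
    data = []
    for label, centre in ((1, 1.0), (0, 0.0)):
        for _ in range(n_per_class):
            x = [rng.gauss(centre, 0.3) for _ in range(n_tfs)]
            data.append((x, label))
    rng.shuffle(data)
    return data


def fit_gaussian_nb(data, n_tfs):
    """Per class: log prior plus per-feature mean and variance.
    Features are treated as conditionally independent (the 'naive' assumption)."""
    model = {}
    for label in (0, 1):
        rows = [x for x, y in data if y == label]
        mus = [mean(r[j] for r in rows) for j in range(n_tfs)]
        variances = [pvariance([r[j] for r in rows], mus[j]) + 1e-9 for j in range(n_tfs)]
        model[label] = (math.log(len(rows) / len(data)), mus, variances)
    return model


def log_posterior(model, label, x):
    """Unnormalised log posterior: log prior + sum of Gaussian log-likelihoods."""
    log_prior, mus, variances = model[label]
    total = log_prior
    for xi, mu, var in zip(x, mus, variances):
        total += -0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
    return total


def predict(model, x):
    """Return the class label with the larger log posterior."""
    return max((0, 1), key=lambda label: log_posterior(model, label, x))


def accuracy(model, data):
    return sum(predict(model, x) == y for x, y in data) / len(data)


rng = random.Random(0)
N_TFS = 3  # hypothetical number of TF occupancy features
train = make_regions(200, N_TFS, rng)
test = make_regions(200, N_TFS, rng)  # balanced and fully held out
model = fit_gaussian_nb(train, N_TFS)
test_acc = accuracy(model, test)
print(f"held-out accuracy: {test_acc:.3f}")
```

Because the toy classes are well separated, held-out accuracy is high; on real, heterogeneous enhancer sets the same model would be expected to degrade, which is the effect the abstract attributes to mixing enhancer classes. Logistic regression on the same features would be an equally simple drop-in alternative.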


  • Downloaded 613 times
  • Download rankings, all-time:
    • Site-wide: 14,671 out of 62,734
    • In genomics: 1,910 out of 4,316
  • Year to date:
    • Site-wide: 15,023 out of 62,734
  • Since beginning of last month:
    • Site-wide: 60,462 out of 62,734
