Exploiting regulatory heterogeneity to systematically identify enhancers with high accuracy
William W. Fisher,
Ann S. Hammonds,
Kenneth H. Wan,
Omid Shams Solari,
Mark D. Biggin,
Susan E Celniker,
James B Brown
Posted 18 Jan 2018
bioRxiv DOI: 10.1101/250241 (published DOI: 10.1073/pnas.1808833115)
Posted 18 Jan 2018
Identifying functional enhancers elements in metazoan systems is a major challenge. For example, large-scale validation of enhancers predicted by ENCODE reveal false positive rates of at least 70%. Here we use the pregrastrula patterning network of Drosophila melanogaster to demonstrate that loss in accuracy in held out data results from heterogeneity of functional signatures in enhancer elements. We show that two classes of enhancer are active during early Drosophila embryogenesis and that by focusing on a single, relatively homogeneous class of elements, over 98% prediction accuracy can be achieved in a balanced, completely held-out test set. The class of well predicted elements is composed predominantly of enhancers driving multi-stage, segmentation patterns, which we designate segmentation driving enhancers (SDE). Prediction is driven by the DNA occupancy of early developmental transcription factors, with almost no additional power derived from histone modifications. We further show that improved accuracy is not a property of a particular prediction method: after conditioning on the SDE set, naive Bayes and logistic regression perform as well as more sophisticated tools. Applying this method to a genome-wide scan, we predict 1,640 SDEs that cover 1.6% of the genome, 916 of which are novel. An analysis of 32 novel SDEs using wholemount embryonic imaging of stably integrated reporter constructs chosen throughout our prediction rank-list showed >90% drove expression patterns. We achieved 86.7% precision on a genome-wide scan, with an estimated recall of at least 98%, indicating high accuracy and completeness in annotating this class of functional elements.
- Downloaded 745 times
- Download rankings, all-time:
- Site-wide: 18,142 out of 89,274
- In genomics: 2,213 out of 5,695
- Year to date:
- Site-wide: 44,378 out of 89,274
- Since beginning of last month:
- Site-wide: 48,709 out of 89,274
Downloads over time
Distribution of downloads per paper, site-wide
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!