New guidelines for DNA methylome studies regarding 5-hydroxymethylcytosine for understanding transcriptional regulation
Many DNA methylome profiling methods cannot distinguish between 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC). Since 5mC typically acts as a repressive mark whereas 5hmC is an intermediate form during active demethylation, the inability to separate their signals could lead to incorrect interpretation of the data. Meanwhile, many analysis pipelines quantify methylation level by the count or ratio of methylated reads, but the proportion of discordant reads (PDR) has recently been proposed to be a better indicator of gene expression level. Is the amount of extra information contained in 5hmC signals and PDR worth the additional experimental and computational costs? Here we combine whole-genome bisulfite sequencing (WGBS) and oxidative WGBS (oxWGBS) data from normal human lung and liver tissues and their paired tumors to investigate the quantitative relationships between gene expression and signals of the two forms of DNA methylation at promoters, transcript bodies, and immediate downstream regions. We find that 5mC and 5hmC signals correlate with gene expression in the same direction in most samples, but considering both types of signals increases the accuracy of expression levels inferred from methylation data by a median of 18.2% compared with standard WGBS data alone, showing that the two forms of methylation provide complementary information about gene expression. In addition, differential analysis between matched tumor and normal pairs is particularly affected by the superposition of 5mC and 5hmC signals in WGBS data, with at least 25-40% of the differentially methylated regions (DMRs) identified from 5mC signals not detected from WGBS data. We do not find PDR to be more informative about expression levels than the ratio of methylated reads, and integrating the two types of methylation features improves the accuracy of inferred expression levels by at most 9.8%.
Our results also confirm the previous finding that methylation signals at transcript bodies are more indicative of gene expression levels than promoter methylation signals, and further show that, in addition to the first exon, methylation signals at the last exon and internal introns also contain non-redundant information about gene expression. Overall, our study provides concrete data for evaluating the cost-effectiveness of some experimental and analysis options in the study of DNA methylation in normal and cancer samples.
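The quantities compared in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes each read is represented as a string of per-CpG calls (`1` methylated, `0` unmethylated), that a read counts as discordant when it carries both methylated and unmethylated CpGs, and that the 5hmC level is approximated by subtracting the oxWGBS ratio (5mC only) from the WGBS ratio (5mC plus 5hmC). The function names and the `min_cpgs` threshold of 4 are illustrative assumptions, not values taken from the paper.

```python
from typing import Sequence


def methylation_ratio(reads: Sequence[str]) -> float:
    """Fraction of methylated CpG calls pooled over all reads."""
    calls = "".join(reads)
    if not calls:
        raise ValueError("no CpG calls supplied")
    return calls.count("1") / len(calls)


def pdr(reads: Sequence[str], min_cpgs: int = 4) -> float:
    """Proportion of discordant reads among reads with enough CpGs.

    A read is discordant if it contains both methylated ('1') and
    unmethylated ('0') calls. min_cpgs is an assumed cutoff.
    """
    eligible = [r for r in reads if len(r) >= min_cpgs]
    if not eligible:
        raise ValueError("no reads with enough CpGs")
    discordant = sum(1 for r in eligible if "0" in r and "1" in r)
    return discordant / len(eligible)


def estimate_5hmc(wgbs_ratio: float, oxwgbs_ratio: float) -> float:
    """Approximate 5hmC as WGBS (5mC + 5hmC) minus oxWGBS (5mC).

    Negative differences, which can arise from sampling noise,
    are clipped to zero in this sketch.
    """
    return max(0.0, wgbs_ratio - oxwgbs_ratio)


# Toy region: two fully concordant reads and two discordant reads.
toy_reads = ["1111", "0000", "1100", "1010"]
print(methylation_ratio(toy_reads))  # 0.5
print(pdr(toy_reads))                # 0.5
```

In this toy region the methylation ratio and PDR happen to coincide, but the two measures are not redundant in general: a region covered only by `1111` and `0000` reads in equal numbers also has a ratio of 0.5, yet its PDR is 0.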