New guidelines for DNA methylome studies regarding 5-hydroxymethylcytosine for understanding transcriptional regulation

By Le Li, Yuwei Gao, Qiong Wu, Alfred S. L. Cheng, Kevin Yip

Posted 30 May 2018
bioRxiv DOI: 10.1101/334318 (published DOI: 10.1101/gr.240036.118)

Many DNA methylome profiling methods cannot distinguish between 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC). Since 5mC typically acts as a repressive mark whereas 5hmC is an intermediate form during active demethylation, the inability to separate their signals could lead to incorrect interpretation of the data. Meanwhile, many analysis pipelines quantify methylation level by the count or ratio of methylated reads, but the proportion of discordant reads (PDR) has recently been proposed to be a better indicator of gene expression level. Is the amount of extra information contained in 5hmC signals and PDR worth the additional experimental and computational costs? Here we combine whole-genome bisulfite sequencing (WGBS) and oxidative WGBS (oxWGBS) data in normal human lung and liver tissues and their paired tumors to investigate the quantitative relationships between gene expression and signals of the two forms of DNA methylation at promoters, transcript bodies, and immediate downstream regions. We find that 5mC and 5hmC signals correlate with gene expression in the same direction in most samples, but considering both types of signals increases the accuracy of expression levels inferred from methylation data by a median of 18.2% as compared to having only standard WGBS data, showing that the two forms of methylation provide complementary information about gene expression. In addition, differential analysis between matched tumor and normal pairs is particularly affected by the superposition of 5mC and 5hmC signals in WGBS data, with at least 25-40% of the differentially methylated regions (DMRs) identified from 5mC signals not detected from WGBS data. We do not find PDR to be more informative about expression levels than the ratio of methylated reads, and integrating the two types of methylation features only improves the accuracy of inferred expression levels by at most 9.8%.
Our results also confirm the previous finding that methylation signals at transcript bodies are more indicative of gene expression levels than promoter methylation signals, and further show that in addition to the first exon, methylation signals at the last exon and internal introns also contain non-redundant information about gene expression. Overall, our study provides concrete data for evaluating the cost-effectiveness of some experimental and analysis options in the study of DNA methylation in normal and cancer samples.
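
The abstract contrasts three quantities: the ratio of methylated reads, the proportion of discordant reads (PDR), and the 5hmC signal obtained by combining WGBS with oxWGBS. The sketch below illustrates one common way to compute each of them and is not the authors' pipeline. It assumes each read is given as a list of per-CpG calls (1 = methylated, 0 = unmethylated), that a read counts toward PDR only if it covers at least `min_cpgs` CpGs (the default of 4 is an assumption, not taken from the abstract), and that 5hmC is approximated as the WGBS level minus the oxWGBS level, clipped at zero. All function names are hypothetical.

```python
from typing import Sequence

# One read = its methylation calls at the CpGs it covers (1 methylated, 0 not).
Read = Sequence[int]


def methylation_ratio(reads: Sequence[Read]) -> float:
    """Fraction of all CpG calls in the region that are methylated."""
    calls = [call for read in reads for call in read]
    if not calls:
        raise ValueError("no CpG calls in region")
    return sum(calls) / len(calls)


def pdr(reads: Sequence[Read], min_cpgs: int = 4) -> float:
    """Proportion of discordant reads among reads covering >= min_cpgs CpGs.

    A read is discordant when it carries both methylated and unmethylated
    calls. The min_cpgs threshold is an assumed parameter.
    """
    eligible = [read for read in reads if len(read) >= min_cpgs]
    if not eligible:
        raise ValueError("no read covers enough CpGs")
    discordant = sum(1 for read in eligible if 0 < sum(read) < len(read))
    return discordant / len(eligible)


def estimate_5hmc(wgbs_level: float, oxwgbs_level: float) -> float:
    """Approximate 5hmC level as WGBS (5mC + 5hmC) minus oxWGBS (5mC only).

    Sampling noise can make the difference negative, so it is clipped at 0.
    """
    return max(0.0, wgbs_level - oxwgbs_level)


if __name__ == "__main__":
    # Toy reads over one region; the last read covers too few CpGs for PDR.
    reads = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 0, 1, 0], [1, 1, 0, 0], [1, 1]]
    print("methylation ratio:", methylation_ratio(reads))
    print("PDR:", pdr(reads))
    print("5hmC estimate:", estimate_5hmc(0.60, 0.45))
```

In this toy region the two summaries diverge: the concordant reads and the discordant reads contribute equally to the methylation ratio, while PDR only records how many reads are internally mixed. That difference is why the two features can, in principle, carry different information about expression.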

Download data

  • Downloaded 450 times
  • Download rankings, all-time:
    • Site-wide: 84,810
    • In genomics: 5,446
  • Year to date:
    • Site-wide: None
  • Since beginning of last month:
    • Site-wide: 92,348