mSigHdp: hierarchical Dirichlet process mixture modeling for mutational signature discovery

By Mo Liu, Yang Wu, Nanhai Jiang, Arnoud Boot, Steven G Rozen

Posted 02 Feb 2022
bioRxiv DOI: 10.1101/2022.01.31.478587

Mutational signatures are characteristic patterns of mutations caused by endogenous or exogenous mutational processes. These signatures can be discovered by analyzing mutations in a large set of samples, usually somatic mutations in tumor samples. Most approaches to mutational-signature discovery are based on non-negative matrix factorization. Alternatively, signatures can be inferred using hierarchical Dirichlet process (HDP) mixture models, an approach that has been relatively little explored. These models assign mutations to clusters and treat each cluster of mutations as having been generated by a particular mutational process. Here we describe mSigHdp, an improved approach to using HDP mixture models to discover mutational signatures. We benchmarked mSigHdp and several other programs on realistic synthetic data. For single-base-substitution (SBS) mutations, mSigHdp and the widely used SigProfilerExtractor discovered signatures better than the other two programs, and the two had different strengths: mSigHdp discovered more of the signatures present in the synthetic data, while SigProfilerExtractor produced fewer false positives. mSigHdp was also better at discovering rare signatures, which may be an advantage, since most common signatures have probably already been discovered. For small insertion-and-deletion mutations, mSigHdp discovered signatures better than the other three programs. Thus, mSigHdp is an advance for discovering rare SBS mutational signatures and for discovering small insertion-and-deletion mutational signatures.
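The abstract's description of HDP mixture models (mutations assigned to clusters, each cluster generated by one mutational process) can be made concrete with a forward simulation. The following is a minimal sketch in Python/NumPy of a generic two-level HDP mixture under a truncated stick-breaking approximation. It is not mSigHdp's code: the truncation level `K`, the concentration parameters `gamma` and `alpha`, the symmetric Dirichlet base with parameter `eta`, and the function names `stick_breaking` and `simulate_hdp_catalog` are all illustrative assumptions. Each sample draws its own cluster weights around shared corpus-wide weights, so samples share signatures but use them in different proportions.

```python
import numpy as np

# Assumption: the common 96-channel SBS classification
# (6 substitution classes x 16 trinucleotide contexts).
N_CHANNELS = 96


def stick_breaking(concentration, K, rng):
    """Truncated GEM(concentration) weights of length K that sum to 1."""
    breaks = rng.beta(1.0, concentration, size=K)
    breaks[-1] = 1.0  # truncation: the last stick takes whatever remains
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - breaks[:-1])))
    return breaks * remaining


def simulate_hdp_catalog(n_mutations, K=20, gamma=1.0, alpha=5.0, eta=0.5, seed=0):
    """Hypothetical helper: draw a mutation catalog (samples x 96 channels)
    from a truncated two-level HDP mixture. Each of the K clusters is a
    signature, i.e. a probability distribution over mutation types."""
    rng = np.random.default_rng(seed)
    # Top level: corpus-wide cluster weights beta ~ GEM(gamma), truncated at K.
    beta = stick_breaking(gamma, K, rng)
    # Base measure (assumed symmetric Dirichlet(eta)): one signature per cluster.
    signatures = rng.dirichlet(np.full(N_CHANNELS, eta), size=K)
    catalog = np.zeros((len(n_mutations), N_CHANNELS), dtype=int)
    cluster_counts = np.zeros((len(n_mutations), K), dtype=int)
    for j, n_j in enumerate(n_mutations):
        # Sample level: pi_j ~ DP(alpha, beta), approximated by Dirichlet(alpha * beta).
        # The tiny floor keeps every Dirichlet parameter strictly positive.
        pi_j = rng.dirichlet(np.maximum(alpha * beta, 1e-8))
        # Assign each of this sample's mutations to a cluster...
        z_counts = rng.multinomial(n_j, pi_j)
        cluster_counts[j] = z_counts
        # ...then draw each mutation's type from its cluster's signature.
        for k, c in enumerate(z_counts):
            if c > 0:
                catalog[j] += rng.multinomial(c, signatures[k])
    return catalog, cluster_counts, signatures
```

For example, `simulate_hdp_catalog([100, 250, 1000])` returns a 3 × 96 count matrix whose rows sum to each sample's mutation count, together with the hidden cluster assignments and signatures. A discovery method such as mSigHdp works in the opposite direction: given only the catalog, it must infer the clusters and their signatures. Unlike standard NMF, which requires choosing the factorization rank, an HDP does not fix the number of signatures in advance; here `K` merely caps the number of clusters so the simulation stays finite.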

Download data

  • Downloaded 466 times
  • Download rankings:
    • All-time, site-wide: 99,207
    • All-time, in bioinformatics: 11,422
    • Year to date, site-wide: 7,369
    • Since beginning of last month, site-wide: 131,538
