DAGBagM: Learning directed acyclic graphs of mixed variables with an application to identify prognostic protein biomarkers in ovarian cancer

By Shrabanti Chowdhury, Ru Wang, Qing Yu, Catherine J. Huntoon, Larry M. Karnitz, Scott H Kaufmann, Steven P Gygi, Michael J. Birrer, Amanda G. Paulovich, Jie Peng, Pei Wang

Posted 27 Oct 2020
bioRxiv DOI: 10.1101/2020.10.26.349076

Directed gene/protein regulatory networks inferred by applying directed acyclic graph (DAG) models to proteogenomic data have been shown to be effective for detecting causal biomarkers of clinical outcomes. However, there remain unsolved challenges in DAG learning when jointly modeling clinical outcome variables, which often take binary values, and biomarker measurements, which are usually continuous. Therefore, in this paper, we propose a new tool, DAGBagM, to learn DAGs with both continuous and binary nodes. By using appropriate models for continuous and binary variables, DAGBagM allows either type of node to be a parent or a child in the learned graph. DAGBagM also employs a bootstrap aggregating strategy to reduce false positives and achieve better estimation accuracy. Moreover, the aggregation procedure provides a flexible framework to robustly incorporate prior information on edges for DAG reconstruction. As shown by simulation studies, DAGBagM performs better in identifying edges between continuous and binary nodes than the commonly used strategies of either treating binary variables as continuous or discretizing continuous variables. Moreover, DAGBagM outperforms several popular DAG structure learning algorithms, including the score-based hill climbing (HC) algorithm, the constraint-based PC algorithm (PC-alg), and the hybrid max-min hill climbing (MMHC) method, even when constructing DAGs with only continuous nodes. The HC implementation in the R package DAGBagM is much faster than that in the widely used DAG learning R package bnlearn. Applying DAGBagM to proteomics datasets from ovarian cancer studies, we identify potential prognostic protein biomarkers in ovarian cancer. DAGBagM is available as a GitHub repository at https://github.com/jie108/dagbagM.

Competing Interest Statement

The authors have declared no competing interest.
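The bootstrap aggregating strategy mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the DAGBagM implementation (which is an R package) and may differ from the authors' aggregation procedure. The function names, the `learn_edges` callback (a hypothetical stand-in for any structure learner such as hill climbing), the 0.5 frequency threshold, and the greedy acyclicity check are all assumptions made for illustration.

```python
import random
from collections import Counter


def creates_cycle(edges, new_edge):
    """Return True if adding new_edge (parent, child) would close a directed cycle."""
    parent, child = new_edge
    # A cycle appears exactly when `parent` is already reachable from `child`.
    stack, seen = [child], set()
    while stack:
        node = stack.pop()
        if node == parent:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(c for p, c in edges if p == node)
    return False


def bootstrap_edge_frequencies(rows, learn_edges, n_boot=100, seed=0):
    """Learn a graph on each bootstrap resample and record how often each edge appears.

    `learn_edges` is a hypothetical callback: it takes a list of rows and
    returns an iterable of (parent, child) edges.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        # Resample the rows with replacement, keeping the sample size fixed.
        resample = [rng.choice(rows) for _ in rows]
        counts.update(set(learn_edges(resample)))
    return {edge: count / n_boot for edge, count in counts.items()}


def aggregate_dag(freqs, threshold=0.5):
    """Greedily add edges by decreasing selection frequency, skipping any that form a cycle.

    The threshold is an assumed cutoff, not a value taken from the paper.
    """
    dag = []
    for edge, freq in sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq <= threshold:
            break  # remaining edges are selected too rarely to keep
        if not creates_cycle(dag, edge):
            dag.append(edge)
    return dag
```

In this sketch, edges that appear in only a minority of bootstrap graphs are dropped, which is one simple way an ensemble can reduce false positives. Prior information on edges could plausibly enter by adjusting the frequencies before aggregation, but that step is not shown here.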

Download data

  • Downloaded 206 times
  • Download rankings, all-time:
    • Site-wide: 117,620
    • In systems biology: 2,616
  • Year to date:
    • Site-wide: 44,629
  • Since beginning of last month:
    • Site-wide: 68,228
