
PyBoost: A parallelized Python implementation of 2D boosting with hierarchies

By Peyton G Greenside, Nadine Hussami, Jessica Chang, Anshul Kundaje

Posted 31 Jul 2017
bioRxiv DOI: 10.1101/170803

Gene expression is controlled by networks of transcription factors that bind specific sequence motifs in regulatory DNA elements such as promoters and enhancers. GeneClass is a boosting-based algorithm that learns gene regulatory networks from complementary paired feature sets, such as transcription factor expression levels and binding motifs across conditions. This algorithm can be used to predict functional genomics measures of cell state, such as gene expression and chromatin accessibility, in different cellular conditions. We present a parallelized, Python-based implementation of GeneClass, called PyBoost, along with a novel hierarchical implementation of the algorithm, called HiBoost. HiBoost allows regulatory logic to be constrained to a hierarchical group of conditions or cell types. The software can be used to dissect differentiation cascades, time courses, or other perturbation data that naturally form a hierarchy or trajectory. We demonstrate the application of PyBoost and HiBoost to learn regulators of tadpole tail regeneration and hematopoietic stem cell differentiation, and validate learned regulators through an inducible CRISPR system.
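To make the idea of boosting over paired feature sets concrete, here is a minimal sketch in Python. It is not PyBoost's code or API. It assumes a binary motif matrix (regions by motifs), a regulator matrix with values in {-1, 0, +1} (regulators by conditions), and labels in {-1, 0, +1} (regions by conditions). It swaps in a generic confidence-rated AdaBoost update with abstaining rules in place of whatever GeneClass actually uses. The names `best_pair`, `boost`, and `predict` are hypothetical.

```python
import numpy as np

def best_pair(motifs, regulators, labels, weights):
    """Return the (motif, regulator) pair whose rule correlates best with the labels.

    motifs:     (n_regions, n_motifs) binary matrix (assumed layout)
    regulators: (n_regulators, n_conditions) matrix in {-1, 0, +1} (assumed layout)
    labels:     (n_regions, n_conditions) matrix in {-1, 0, +1}
    weights:    (n_regions, n_conditions) non-negative example weights
    """
    # score[m, r] = sum over (g, c) of motifs[g, m] * weights[g, c] * labels[g, c] * regulators[r, c]
    score = motifs.T @ (weights * labels) @ regulators.T
    m, r = np.unravel_index(np.argmax(np.abs(score)), score.shape)
    return int(m), int(r), float(score[m, r])

def boost(motifs, regulators, labels, n_rounds=5, eps=1e-9):
    """Greedy boosting over motif-regulator pairs (illustrative sketch only)."""
    weights = np.abs(labels).astype(float)  # entries labeled 0 carry no weight
    weights /= weights.sum()
    rules = []
    for _ in range(n_rounds):
        m, r, _ = best_pair(motifs, regulators, labels, weights)
        # The rule predicts the regulator's state wherever the motif is present, else abstains (0).
        h = np.outer(motifs[:, m], regulators[r, :])
        agree = h * labels
        w_plus = weights[agree > 0].sum()   # weighted mass the rule gets right
        w_minus = weights[agree < 0].sum()  # weighted mass the rule gets wrong
        alpha = 0.5 * np.log((w_plus + eps) / (w_minus + eps))
        rules.append((m, r, float(alpha)))
        # Down-weight examples the rule explains, up-weight the ones it misses.
        weights = weights * np.exp(-alpha * agree)
        weights /= weights.sum()
    return rules

def predict(motifs, regulators, rules):
    """Sum the weighted votes of all rules and take the sign."""
    total = np.zeros((motifs.shape[0], regulators.shape[1]))
    for m, r, alpha in rules:
        total += alpha * np.outer(motifs[:, m], regulators[r, :])
    return np.sign(total)
```

Each rule in this sketch applies across all conditions. A hierarchical variant in the spirit of HiBoost would presumably restrict a rule to the conditions under one node of a condition tree, but the abstract does not specify how, so that step is left out.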

Download data

  • Downloaded 588 times
  • Download rankings, all-time:
    • Site-wide: 24,034 out of 85,120
    • In bioinformatics: 3,366 out of 8,142
  • Year to date:
    • Site-wide: 76,443 out of 85,120
  • Since beginning of last month:
    • Site-wide: 65,312 out of 85,120
