
Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 62,719 bioRxiv papers from 278,291 authors.

PyBoost: A parallelized Python implementation of 2D boosting with hierarchies

By Peyton G Greenside, Nadine Hussami, Jessica Chang, Anshul Kundaje

Posted 31 Jul 2017
bioRxiv DOI: 10.1101/170803

Gene expression is controlled by networks of transcription factors that bind specific sequence motifs in regulatory DNA elements such as promoters and enhancers. GeneClass is a boosting-based algorithm that learns gene regulatory networks from complementary paired feature sets, such as transcription factor expression levels and binding motifs across conditions. This algorithm can be used to predict functional genomics measures of cell state, such as gene expression and chromatin accessibility, in different cellular conditions. We present a parallelized, Python-based implementation of GeneClass, called PyBoost, along with a novel hierarchical implementation of the algorithm, called HiBoost. HiBoost allows regulatory logic to be constrained to a hierarchical group of conditions or cell types. The software can be used to dissect differentiation cascades, time courses, or other perturbation data that naturally form a hierarchy or trajectory. We demonstrate the application of PyBoost and HiBoost to learn regulators of tadpole tail regeneration and hematopoietic stem cell differentiation, and we validate learned regulators through an inducible CRISPR system.
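The abstract describes boosting over paired features only at a high level. The following minimal sketch illustrates the general idea under stated assumptions. Each weak rule pairs a motif (present in a gene's regulatory DNA) with a regulator state (up or down in a condition). To mimic the HiBoost idea, each rule is also restricted to one node of a hypothetical condition hierarchy. The sketch uses flat confidence-rated boosting with rules that abstain outside their support. It is serial, simplified, and not the PyBoost or HiBoost implementation. The names `fit_paired_boost`, `rule_fires`, and `predict`, the group names, and the toy matrices are all hypothetical.

```python
import math
from itertools import product


def rule_fires(rule, M, R, groups, i, j):
    """True if the (motif, regulator, state, group) rule covers gene i in condition j."""
    m, r, s, g = rule[:4]
    return M[i][m] == 1 and R[r][j] == s and j in groups[g]


def fit_paired_boost(Y, M, R, groups, n_rounds=5, eps=1e-6):
    """Hypothetical sketch: confidence-rated boosting over paired (motif, regulator) rules.

    Y: genes x conditions, +1 (up) / -1 (down) / 0 (unlabeled, ignored)
    M: genes x motifs, 1 if the motif occurs in the gene's regulatory DNA
    R: regulators x conditions, +1 / -1 / 0 regulator state
    groups: hierarchy node name -> set of condition indices a rule may cover
    """
    examples = [(i, j) for i in range(len(Y)) for j in range(len(Y[0])) if Y[i][j] != 0]
    w = {ex: 1.0 / len(examples) for ex in examples}  # uniform initial weights
    model = []
    for _ in range(n_rounds):
        total = sum(w.values())
        best = None
        # Exhaustively score every (motif, regulator, state, hierarchy node) rule.
        for rule in product(range(len(M[0])), range(len(R)), (1, -1), groups):
            w_pos = w_neg = 0.0
            for i, j in examples:
                if rule_fires(rule, M, R, groups, i, j):
                    if Y[i][j] > 0:
                        w_pos += w[(i, j)]
                    else:
                        w_neg += w[(i, j)]
            # Normalizer Z for a rule that abstains outside its support; smaller is better.
            z = (total - w_pos - w_neg) + 2.0 * math.sqrt(w_pos * w_neg)
            if best is None or z < best[0]:
                best = (z, rule, w_pos, w_neg)
        _, rule, w_pos, w_neg = best
        alpha = 0.5 * math.log((w_pos + eps) / (w_neg + eps))  # smoothed rule confidence
        model.append(rule + (alpha,))
        # Down-weight covered examples the rule gets right, up-weight those it gets wrong.
        for i, j in examples:
            if rule_fires(rule, M, R, groups, i, j):
                w[(i, j)] *= math.exp(-Y[i][j] * alpha)
        norm = sum(w.values())
        for ex in w:
            w[ex] /= norm
    return model


def predict(model, M, R, groups, i, j):
    """Sum of confidences of all rules that fire; the sign is the predicted direction."""
    return sum(rule[4] for rule in model if rule_fires(rule, M, R, groups, i, j))


# Toy data (hypothetical): genes 0-1 carry motif 0, genes 2-3 carry motif 1.
groups = {"root": {0, 1, 2, 3}, "lineage_A": {0, 1}, "lineage_B": {2, 3}}
M = [[1, 0], [1, 0], [0, 1], [0, 1]]
R = [[1, 1, 1, 1], [-1, -1, 1, 1]]
Y = [[1, 1, -1, -1], [1, 1, -1, -1], [-1, -1, 1, 1], [-1, -1, 1, 1]]
model = fit_paired_boost(Y, M, R, groups)
print(model[0][:4])  # (0, 0, 1, 'lineage_A'): motif 0 AND regulator 0 up, lineage A only
```

Because a rule abstains outside its hierarchy node, a lineage-specific rule (here, `lineage_A`) says nothing about conditions in other lineages. In this sketch, that is how regulatory logic is "constrained to a hierarchical group of conditions." The real software may enforce the hierarchy differently.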

Download data

  • Downloaded 530 times
  • Download rankings, all-time:
    • Site-wide: 17,787 out of 62,719
    • In bioinformatics: 2,684 out of 6,243
  • Year to date:
    • Site-wide: 45,925 out of 62,719
  • Since beginning of last month:
    • Site-wide: 55,730 out of 62,719
