treeclimbR pinpoints the data-dependent resolution of hierarchical hypotheses

By Ruizhu Huang, Charlotte Soneson, Pierre-Luc Germain, Thomas SB Schmidt, Christian von Mering, Mark D Robinson

Posted 09 Jun 2020
bioRxiv DOI: 10.1101/2020.06.08.140608

The arrangement of hypotheses in a hierarchical structure (e.g., phylogenies, cell types) appears in many research fields and indicates different resolutions at which data can be interpreted. A common goal is to find a representative resolution that gives high sensitivity to identify relevant entities (e.g., microbial taxa or cell subpopulations) that are related to a phenotypic outcome (e.g., disease status) while controlling false detections, thereby providing a more compact view of detected entities and summarizing characteristics shared among them. Current methods are suboptimal: they either perform hypothesis tests at an arbitrary resolution or test hypotheses at all possible resolutions, which leads to nested results. Moreover, they are not flexible enough to work in situations where each entity has multiple features to consider and different resolutions might be required for different features. For example, in single cell RNA-seq data, an increasing focus is to find differential state genes that change expression within a cell subpopulation in response to an external stimulus. Such differential expression might occur at different resolutions (e.g., all cells or a small set of cells) for different genes. Our new algorithm treeclimbR is designed to fill this gap by exploiting a hierarchical tree of entities, proposing multiple candidates that capture the latent signal and pinpointing branches or leaves that contain features of interest, in a data-driven way. It outperforms currently available methods on synthetic data, and we highlight the approach on various applications, including microbiome and microRNA surveys as well as single cell cytometry and RNA-seq datasets. With the emergence of various multi-resolution genomic datasets, treeclimbR provides a thorough inspection of entities across resolutions and gives additional flexibility to uncover biological associations.

### Competing Interest Statement

The authors have declared no competing interest.

Download data

  • Downloaded 855 times
  • Download rankings, all-time:
    • Site-wide: 32,909
    • In bioinformatics: 3,612
  • Year to date:
    • Site-wide: 28,713
  • Since beginning of last month:
    • Site-wide: 117,343
