Big data and single cell transcriptomics: implications for ontological representation

By Brian D. Aevermann, Mark Novotny, Trygve Bakken, Jeremy A. Miller, Alexander D. Diehl, David Osumi-Sutherland, Roger S. Lasken, Ed S. Lein, Richard H. Scheuermann

Posted 31 Jan 2018
bioRxiv DOI: 10.1101/257352 (published DOI: 10.1093/hmg/ddy100)

Cells are fundamental functional units of multicellular organisms, with different cell types playing distinct physiological roles in the body. The recent advent of single cell transcriptional profiling using RNA sequencing is producing "big data", enabling the identification of novel human cell types at an unprecedented rate. In this review, we summarize recent work characterizing cell types in the human central nervous and immune systems using single cell and single nucleus RNA sequencing, and discuss the implications of these discoveries for the representation of cell types in the reference Cell Ontology (CL). We propose a method based on random forest machine learning for identifying sets of necessary and sufficient marker genes that can be used to assemble consistent and reproducible cell type definitions for incorporation into the CL. Representing the defined cell type classes and their relationships in the CL using this strategy will make these classes findable, accessible, interoperable, and reusable (FAIR), allowing the CL to serve as a reference knowledgebase of information about the roles that distinct cellular phenotypes play in human health and disease.
