Comparison of Clustering Methods for High-Dimensional Single-Cell Flow and Mass Cytometry Data

By Lukas M Weber, Mark D Robinson

Posted 08 Apr 2016
bioRxiv DOI: 10.1101/047613 (published DOI: 10.1002/cyto.a.23030)

Recent technological developments in high-dimensional flow cytometry and mass cytometry (CyTOF) have made it possible to measure expression levels of dozens of protein markers in thousands of cells per second, allowing cell populations to be characterized in unprecedented detail. Traditional data analysis by "manual gating" can be inefficient and unreliable in these high-dimensional settings, which has led to the development of a large number of automated analysis methods. Methods designed for unsupervised analysis use specialized clustering algorithms to detect and define cell populations for downstream analysis. Here, we performed an up-to-date, extensible performance comparison of clustering methods for high-dimensional flow and mass cytometry data. We evaluated methods using several publicly available data sets from experiments in immunology, containing both major and rare cell populations, with cell population identities from expert manual gating as the reference standard. Several methods performed well, including FlowSOM, X-shift, PhenoGraph, Rclusterpp, and flowMeans. Among these, FlowSOM had extremely fast runtimes, making this method well-suited for interactive, exploratory analysis of large, high-dimensional data sets on a standard laptop or desktop computer. These results extend previously published comparisons by focusing on high-dimensional data and including new methods developed for CyTOF data. R scripts to reproduce all analyses are available from GitHub (https://github.com/lmweber/cytometry-clustering-comparison), and pre-processed data files are available from FlowRepository (FR-FCM-ZZPH), allowing our comparisons to be extended to include new clustering methods and reference data sets.
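
The abstract scores clustering output against cell population labels from expert manual gating but does not name the metric here. One common choice for this kind of comparison is the F1 score (harmonic mean of precision and recall) between each manually gated population and a cluster. The Python sketch below is illustrative only: it assumes per-cell labels given as two equal-length lists and a simplified matching in which each manually gated population is paired with whichever cluster gives it the highest F1 (the published comparison may use a different matching, such as a one-to-one assignment). The function names `f1_per_population` and `mean_f1` are hypothetical and are not taken from the authors' R scripts.

```python
from collections import Counter


def f1_per_population(truth, clusters):
    """Best F1 score for each reference (manually gated) population.

    Hypothetical helper. `truth` holds one reference label per cell and
    `clusters` one cluster ID per cell, in the same cell order. Each
    population is matched to the single cluster that maximizes its F1
    (a simplified, many-to-one matching assumed for illustration).
    """
    if len(truth) != len(clusters):
        raise ValueError("truth and clusters must have the same length")
    pop_sizes = Counter(truth)                 # cells per reference population
    clus_sizes = Counter(clusters)             # cells per cluster
    overlap = Counter(zip(truth, clusters))    # cells shared by (population, cluster)

    best = {}
    for pop, n_pop in pop_sizes.items():
        best_f1 = 0.0
        for clus, n_clus in clus_sizes.items():
            tp = overlap.get((pop, clus), 0)   # true positives for this pairing
            if tp == 0:
                continue
            precision = tp / n_clus            # fraction of the cluster in pop
            recall = tp / n_pop                # fraction of pop in the cluster
            f1 = 2 * precision * recall / (precision + recall)
            best_f1 = max(best_f1, f1)
        best[pop] = best_f1
    return best


def mean_f1(truth, clusters):
    """Unweighted mean of the per-population best F1 scores (hypothetical)."""
    scores = f1_per_population(truth, clusters)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Toy labels for six cells; not data from the paper.
    truth = ["T", "T", "T", "B", "B", "NK"]
    clusters = [1, 1, 2, 2, 2, 3]
    print(f1_per_population(truth, clusters))
    print(round(mean_f1(truth, clusters), 3))  # 0.867
```

Averaging without weighting by population size means a rare population counts as much as a major one, which matters when data sets contain both. A one-to-one matching (for example, the Hungarian algorithm) would instead prevent two populations from claiming the same cluster.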

Download data

  • Downloaded 4,830 times
  • Download rankings, all-time:
    • Site-wide: 2,672
    • In bioinformatics: 206
  • Download rankings, year to date:
    • Site-wide: 38,504
  • Download rankings, since beginning of last month:
    • Site-wide: 57,817
