DAFi: A Directed Recursive Filtering and Clustering Approach to Data-Driven Identification of Cell Populations from Polychromatic Flow Cytometry Data

By Alexandra J Lee, Ivan Chang, Julie G Burel, Cecilia S Lindestam Arlehamn, Daniela Weiskopf, Bjoern Peters, Alessandro Sette, Richard H. Scheuermann, Yu Qian

Posted 26 Sep 2017
bioRxiv DOI: 10.1101/193912

Computational methods for identification of cell populations from high-dimensional flow cytometry data are changing the paradigm of cytometry bioinformatics. Data clustering is the most common computational approach to unsupervised identification of cell populations from multidimensional cytometry data. We found that combining recursive filtering and clustering with constraints converted from the user's manual gating strategy can effectively identify overlapping and rare cell populations from smeared data that would have been difficult to resolve by either a single run of data clustering or manual segregation. We named this new method DAFi: Directed Automated Filtering and Identification of cell populations. The design of DAFi preserves the data-driven characteristics of unsupervised clustering for identifying novel cell-based biomarkers, while making the results interpretable to experimental scientists, as in supervised classification, by mapping and merging the high-dimensional data clusters into the user-defined 2D gating hierarchy. By recursively filtering the data before clustering, DAFi can uncover small local clusters that are otherwise difficult to identify because of statistical interference from irrelevant major clusters. Quantitative assessment of cell-type-specific characteristics demonstrates that the population proportions calculated by DAFi, while highly consistent with those from expert centralized manual gating, have smaller technical variance than those from individual manual gating analysis. Visual examination of the dot plots showed that the boundaries of the DAFi-identified cell populations followed the natural shapes of the data distributions. To further exemplify the utility of DAFi, we show that DAFi can incorporate the FLOCK clustering method to identify novel cell-based biomarkers. The implementation of DAFi supports options including clustering, bisecting, slope-based gating, and reversed filtering to meet various auto-gating needs from different scientific use cases.
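
The recursive filter-then-cluster idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes rectangular 2D gates, uses a plain k-means with a deterministic farthest-first start as a stand-in for the clustering method, and assumes a cluster is mapped into a gate when its centroid falls inside the gate. All names (`kmeans`, `filter_step`, `gating_path`) and the synthetic three-marker data are hypothetical.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two events."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=10):
    """Lloyd's k-means with farthest-first initialization (a stand-in clusterer)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the event farthest from all seeds chosen so far.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every event to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def filter_step(events, gate, k):
    """Cluster in all dimensions, then keep clusters whose centroid lies in the 2D gate."""
    if not events:
        return []
    centroids, labels = kmeans(events, k)
    dx, dy = gate["dims"]
    (x_lo, x_hi), (y_lo, y_hi) = gate["x"], gate["y"]
    kept = {c for c, cen in enumerate(centroids)
            if x_lo <= cen[dx] <= x_hi and y_lo <= cen[dy] <= y_hi}
    return [e for e, lab in zip(events, labels) if lab in kept]

# Synthetic data (assumption): two major populations and one rare population.
rng = random.Random(0)

def blob(center, n):
    return [[c + rng.uniform(-1, 1) for c in center] for _ in range(n)]

events = blob((0, 0, 0), 300) + blob((10, 10, 0), 300) + blob((10, 10, 10), 15)

# A two-level gating hierarchy expressed as rectangular gates on marker pairs.
gating_path = [
    {"dims": (0, 1), "x": (5, 15), "y": (5, 15)},  # parent gate on markers 0 and 1
    {"dims": (0, 2), "x": (5, 15), "y": (5, 15)},  # child gate on markers 0 and 2
]

rare = events
for gate in gating_path:
    # Each level re-clusters only the events that survived the previous gate.
    rare = filter_step(rare, gate, k=3)

print(len(rare))
```

In this toy setup the first gate discards the irrelevant major population, so the second round of clustering runs only on the remaining events and the rare population forms its own cluster. The real method supports further options (bisecting, slope-based gating, reversed filtering) that this sketch does not attempt to model.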
