
Scale-invariant geometric data analysis (SIGDA) provides robust, detailed visualizations of human ancestry specific to individuals and populations

By Max Robinson, Anat Zimmer, Terry Farrah, Denise E. Mauldin, Nathan D Price, Leroy E Hood, Gustavo Glusman

Posted 03 Oct 2018
bioRxiv DOI: 10.1101/431585

Scale invariance is a common property of physical laws and a key concept in perspective drawing, which aims to provide a meaningful two-dimensional representation of a more complex, three-dimensional scene. Here we describe Scale Invariant Geometric Data Analysis (SIGDA), a new, general exploratory data analysis (EDA) method based on normalization of data to scale invariance. We discuss similarities and differences between SIGDA and two widely used EDA methods, Correspondence Analysis (CA) and Principal Components Analysis (PCA). We then illustrate SIGDA's ability to analyze and visualize population structure relationships within the data that inspired its development: genetic marker data, in which context PCA is considered a standard method. We show that SIGDA provides significant advantages over PCA of the same data, including: (a) robust detection and separation of a larger number of population axes, leading to (b) better separation of annotated populations; (c) separation of an independent allele frequency axis interpretable as a proxy for allele age; (d) visualization of marker flow between populations (population history); and (e) robust detection and visualization of relationships between closely related individuals and among family groups. Although this illustration focuses on a specific task, SIGDA is a general-purpose EDA method and derives its advantages from its novel approach to fundamental issues in data analysis, rather than clever sampling or other task-specific methodology.
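
For readers unfamiliar with the PCA baseline the abstract compares against, the following is a minimal sketch of PCA applied to a genotype matrix. It is not SIGDA and does not reproduce the authors' normalization. The function name `pca_scores`, the two simulated populations, and all parameter values are assumptions made purely for illustration.

```python
import numpy as np

def pca_scores(genotypes, n_components=2):
    """Project individuals (rows) onto the top principal components."""
    # Center each marker (column) so PCA captures variation around the mean.
    centered = genotypes - genotypes.mean(axis=0)
    # SVD of the centered matrix: rows of vt are the principal axes,
    # and u * s gives each individual's coordinates along those axes.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Hypothetical toy data: two populations whose allele frequencies differ.
rng = np.random.default_rng(0)
n_per_pop, n_markers = 50, 500
freq_a = rng.uniform(0.1, 0.9, n_markers)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, n_markers), 0.05, 0.95)

# Genotypes coded as 0, 1, or 2 copies of an allele per marker.
pop_a = rng.binomial(2, freq_a, size=(n_per_pop, n_markers))
pop_b = rng.binomial(2, freq_b, size=(n_per_pop, n_markers))
genotypes = np.vstack([pop_a, pop_b]).astype(float)

scores = pca_scores(genotypes)
pc1_a, pc1_b = scores[:n_per_pop, 0], scores[n_per_pop:, 0]
```

In this toy setting the first principal component is expected to place the two simulated populations on opposite sides of zero, which is the kind of population axis the abstract refers to.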

Download data

  • Downloaded 416 times
  • Download rankings, all-time:
    • Site-wide: 80,277
    • In bioinformatics: 7,353
  • Year to date:
    • Site-wide: 112,830
  • Since beginning of last month:
    • Site-wide: 115,675
