Detection of rare disease-related genetic variants using the birthday model

By Yael Berstein, Shane E McCarthy, Melissa Kramer, W Richard McCombie

Posted 07 Nov 2018
bioRxiv DOI: 10.1101/464842

Motivation: Exome sequencing is a powerful technique for identifying disease-causing genes, and a number of genes underlying Mendelian inherited diseases have been identified with it. However, leveraging exome sequencing to study complex disorders such as schizophrenia and bipolar disorder remains a challenge because of the genetic and phenotypic heterogeneity of these disorders. Sequencing large samples (>10,000), although not feasible for many studies, may improve the statistical power to associate more variants, but aggregating the distinct rare variants associated with a given disease can make the identification of causal genes statistically challenging. New methods for rare variant association are therefore needed to identify the causative genes of complex disorders.

Results: We propose a method to predict causative rare variants using a well-known probability problem, the birthday model, which estimates the probability that multiple individuals in a group share the same birthday. We treat the chance that samples share a variant as analogous to the chance that individuals share a birthday. We investigated how the model's parameters affect its output and provide guidelines for using the model and interpreting its results. We evaluated the method for identifying potential causative variants using published data on autism spectrum disorder and hypertriglyceridemia, together with a current case-control study on bipolar disorder. Several genes in the top results of the case-control study were associated with autism spectrum and bipolar disorder. Because the core probability based on the birthday model is very sensitive to low recurrence, the method successfully tests the association of rare variants, which generally do not provide enough signal in commonly used statistical tests. Importantly, the simplicity of the model allows quick interpretation of genomic data, enabling users to select gene candidates for further biological validation of specific mutations and for downstream functional or other studies.
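
As a rough illustration of the birthday model the abstract refers to, the sketch below computes the classical birthday probability in Python. It is not the authors' implementation: the function names, the assumption that all "days" are equally likely, and the mapping of individuals to samples and days to candidate variant positions are illustrative assumptions only. The second function uses the standard Poisson approximation for coincidences among k individuals, which is meant for cases where such coincidences are rare.

```python
from math import comb, exp


def p_shared_pair(n: int, d: int) -> float:
    """Exact probability that at least two of n individuals share
    one of d equally likely days (assumption: uniform days)."""
    if n > d:
        return 1.0  # pigeonhole: a shared day is certain
    p_all_distinct = 1.0
    for i in range(n):
        # the (i+1)-th individual must avoid the i days already taken
        p_all_distinct *= (d - i) / d
    return 1.0 - p_all_distinct


def p_shared_k_approx(n: int, d: int, k: int) -> float:
    """Poisson approximation to the probability that at least one
    group of k individuals all share the same day."""
    # expected number of k-subsets whose members all fall on one day
    expected_groups = comb(n, k) / d ** (k - 1)
    return 1.0 - exp(-expected_groups)


if __name__ == "__main__":
    # classical case: 23 people, 365 days
    print(round(p_shared_pair(23, 365), 3))
    # same setting, but requiring three people on the same day
    print(round(p_shared_k_approx(23, 365, 3), 3))
```

In the classical setting, `p_shared_pair(23, 365)` is about 0.507. Under the hypothetical mapping used here, `n` would be the number of samples and `d` the number of positions at which a variant could fall, so a small probability of a chance coincidence would suggest that a recurrent rare variant is unlikely to be shared by accident.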

Download data

  • Downloaded 382 times
  • Download rankings, all-time:
    • Site-wide: 78,875
    • In bioinformatics: 7,245
  • Year to date:
    • Site-wide: 114,935
  • Since beginning of last month:
    • Site-wide: 117,980
