Rare variants association testing for a binary outcome when pooling individual level data from heterogeneous studies

By Tamar Sofer, Na Guo

Posted 18 Apr 2020
bioRxiv DOI: 10.1101/2020.04.17.047530

Whole genome and exome sequencing studies are used to test the association of rare genetic variants with health traits. Many existing whole genome sequencing (WGS) efforts now aggregate data from heterogeneous groups, e.g. combining sets of individuals of European and African ancestries. Here we investigate the statistical implications for rare variant association testing with a binary trait when combining heterogeneous studies, defined as studies with potentially different disease proportions and different frequencies of variant carriers. We study and compare in simulations the type 1 error control and power of the naïve Score test, the saddlepoint approximation to the score test (SPA test), and the BinomiRare test in a range of settings, focusing on low numbers of variant carriers. We show that type 1 error control and power patterns depend on both the number of carriers of the rare allele and on disease prevalence in each of the studies. We develop recommendations for association analysis of rare genetic variants. (1) The Score test is preferred when the case proportion in the sample is 50%. (2) Do not down-sample controls to balance the case-control ratio, because it reduces power. Rather, use a test that controls the type 1 error. (3) Conduct stratified analysis in parallel with combined analysis. Aggregated testing may have lower power when the variant effect size differs between strata.

### Competing Interest Statement

The authors have declared no competing interest.
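To make the pooled setting concrete, the following is a minimal sketch of a naïve single-variant score test for a binary outcome, assuming a null logistic model that contains only a separate intercept for each study (stratum). It is not the authors' implementation and it does not include the SPA or BinomiRare tests; the function name `stratified_score_test` and its arguments are hypothetical, chosen for illustration. Under this assumed null model, the fitted disease probability for each individual is simply the case proportion of that individual's stratum, so the score and its variance can be accumulated stratum by stratum.

```python
import math
from collections import defaultdict


def stratified_score_test(genotypes, outcomes, strata):
    """Naive score test for one variant and a binary outcome.

    Assumption (illustrative, not from the paper): the null model has
    only a stratum-specific intercept, so the null disease probability
    of each individual is the case proportion of their stratum.

    Returns (chi-square statistic with 1 df, asymptotic p-value),
    or (nan, nan) when the variance is zero (e.g. no carriers).
    """
    groups = defaultdict(list)
    for g, y, s in zip(genotypes, outcomes, strata):
        groups[s].append((g, y))

    score = 0.0
    variance = 0.0
    for members in groups.values():
        n = len(members)
        p = sum(y for _, y in members) / n        # null case proportion
        g_mean = sum(g for g, _ in members) / n   # mean genotype in stratum
        # Score contribution: genotype times residual (y - p).
        score += sum(g * (y - p) for g, y in members)
        # Variance of the score after projecting out the stratum intercept.
        variance += p * (1.0 - p) * sum((g - g_mean) ** 2 for g, _ in members)

    if variance == 0.0:
        return float("nan"), float("nan")

    stat = score ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(|Z| > sqrt(stat)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

Calling the function once on the pooled data with the study labels as `strata`, and once per study with a single constant label, gives a combined and a stratified analysis side by side. The p-value here relies on the asymptotic chi-square approximation, which is the approximation the abstract reports as sensitive to the number of carriers and to disease prevalence.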

