The weighting is the hardest part: on the behavior of the likelihood ratio test and score test under weight misspecification in rare variant association studies
Rare variant association studies are at a critical inflexion point with the increasing availability of exome-sequencing data. A popular test of association is the sequence kernel association test (SKAT). Weights are embedded within SKAT to reflect the hypothesized contribution of each variant to the trait variance. Correct weighting is expected to boost power, yet the correct weights are generally unknown. It is therefore important to assess the effect of weight misspecification in SKAT. We evaluated the behavior of the score test and the likelihood ratio test (LRT) under weight misspecification. Simulation and empirical results revealed that the LRT is generally more robust and more powerful than the score test under misspecified weights. For instance, when the simulated betas were larger for rarer variants than for more common ones, (incorrectly) assigning equal weights reduced the power of the LRT by ~5%, while the power of the score test dropped by ~30%. To optimize weighting, we proposed a data-driven weighting scheme. With this scheme and the LRT, we detected significant enrichment of rare case mutations (MAF<5%; P-value=7E-04) in a set of constrained genes in the Swedish schizophrenia case-control cohort with exome-sequencing data. The score test is currently preferred for its computational efficiency and power. Indeed, assuming correct specification, the score test is the most powerful test in some circumstances. However, the LRT has the compelling qualities of being generally more powerful and more robust to misspecification. This is an important result given that, arguably, misspecified models are likely to be the rule rather than the exception in weighting-based approaches.
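To make the role of the weights concrete, the following is a minimal sketch of how per-variant weights enter a SKAT-style score statistic, written in the usual variance-component form Q = sum_j w_j * S_j^2, where S_j is the score for variant j. Everything specific here is an assumption for illustration: the Beta(1, 25) density weights are a common default in the SKAT literature rather than a choice stated in the abstract, the function names and toy data are hypothetical, and the sketch computes neither p-values nor the LRT.

```python
import math
import numpy as np


def beta_weights(maf, a=1.0, b=25.0):
    """Hypothetical helper: w_j = Beta(maf_j; a, b)^2, which up-weights rarer variants.

    The (a, b) = (1, 25) default is an assumption, not taken from the abstract.
    """
    norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)  # Beta function B(a, b)
    density = maf ** (a - 1.0) * (1.0 - maf) ** (b - 1.0) / norm
    return density ** 2


def skat_q(genotypes, residuals, weights):
    """Weighted score statistic Q = sum_j w_j * S_j^2, with S_j = sum_i g_ij * r_i."""
    scores = genotypes.T @ residuals  # one score per variant
    return float(np.sum(weights * scores ** 2))


# Toy data (entirely simulated): 500 subjects, 10 rare variants, no true signal.
rng = np.random.default_rng(0)
maf = rng.uniform(0.001, 0.05, size=10)
genotypes = rng.binomial(2, maf, size=(500, 10)).astype(float)
trait = rng.normal(size=500)
residuals = trait - trait.mean()  # residuals under an intercept-only null model

q_equal = skat_q(genotypes, residuals, np.ones_like(maf))
q_beta = skat_q(genotypes, residuals, beta_weights(maf))
print(f"Q with equal weights:     {q_equal:.3f}")
print(f"Q with Beta(1,25) weights: {q_beta:.3f}")
```

The two Q values are on different scales and are not directly comparable; each would need its own null distribution. The point is only that the same data yield different statistics under different weight choices, which is where misspecification enters.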