Background: Studying genetic associations with prognosis (e.g. survival, disability, subsequent disease events) is prone to selection bias, also termed index event bias or collider bias, whereby selection on disease status can induce associations between causes of incidence and prognosis. A current method for adjusting genetic associations for this bias assumes there is no genetic correlation between incidence and prognosis, which may not be plausible.
Methods: We propose an alternative, the 'Slope-Hunter' approach, which is unbiased even when there is genetic correlation between incidence and prognosis. Our approach has two stages. First, we use cluster-based techniques to identify two groups of variants: those affecting neither incidence nor prognosis (these should not suffer bias, and only a random sub-sample of them is retained in the analysis) and those affecting prognosis only (excluded from the analysis). Second, we fit a cluster-based model to identify the class of variants affecting only incidence, and use this class to estimate the adjustment factor.
Results: Simulation studies showed that, in the presence of genetic correlation, the Slope-Hunter method reduces type-1 error by 49%-85%, increases power by 1%-36%, and reduces bias by 17%-47% compared to other methods, and that it performs as well as previous methods when there is no genetic correlation. Both Slope-Hunter and the previous methods perform less well as the proportion of variation in incidence explained by genetic variants affecting only incidence decreases.
Conclusions: The key assumption of Slope-Hunter is that the set of genetic variants affecting incidence only contributes at least as much to the heritability of incidence as the set affecting both incidence and prognosis. When this assumption holds, our approach is unbiased in the presence of genetic correlation between incidence and prognosis, and performs no worse than alternative approaches even when there is no correlation. Bias-adjusting methods should be used to carry out causal analyses when conditioning on incidence.
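As a rough illustration of the second stage, the sketch below assumes the class of incidence-only variants is already known (the clustering step is not shown) and that the adjustment factor is the slope of conditional prognosis effects on incidence effects within that class, subtracted from each variant's prognosis estimate in proportion to its incidence effect. That form of the adjustment, the function names, and the simulated effect sizes are assumptions made for illustration; the abstract does not specify them.

```python
import random


def estimate_slope(beta_inc, beta_prog):
    """Least-squares slope through the origin of prognosis on incidence effects.

    Assumption: for incidence-only variants the true prognosis effect is zero,
    so any observed prognosis association is bias proportional to beta_inc.
    """
    numerator = sum(x * y for x, y in zip(beta_inc, beta_prog))
    denominator = sum(x * x for x in beta_inc)
    return numerator / denominator


def adjust(beta_inc, beta_prog, slope):
    """Subtract the estimated bias term (slope * incidence effect) per variant."""
    return [y - slope * x for x, y in zip(beta_inc, beta_prog)]


def simulate_incidence_only(n_variants=2000, true_slope=-0.4, seed=1):
    """Hypothetical data: biased prognosis estimates for incidence-only variants."""
    rng = random.Random(seed)
    beta_inc = [rng.gauss(0, 0.1) for _ in range(n_variants)]
    # Observed prognosis effect = induced bias + small estimation noise.
    beta_prog = [true_slope * x + rng.gauss(0, 0.005) for x in beta_inc]
    return beta_inc, beta_prog


beta_inc, beta_prog = simulate_incidence_only()
slope = estimate_slope(beta_inc, beta_prog)
adjusted = adjust(beta_inc, beta_prog, slope)
```

In this toy setting the estimated slope should land close to the simulated value of -0.4, and the adjusted prognosis effects should shrink toward zero, which is what one would expect for variants with no true effect on prognosis.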