Benchmarker: an unbiased, association-data-driven strategy to evaluate gene prioritization algorithms
Genome-wide association studies (GWAS) are valuable for understanding human biology, but associated loci typically contain multiple associated variants and genes. Thus, algorithms that prioritize likely causal genes and variants for a given phenotype can provide biological interpretations of association data. However, a critical capability is currently missing: objectively comparing the performance of such algorithms. Typical comparisons rely on "gold standard" genes harboring causal coding variants, but such gold standards may be biased and incomplete. To address this issue, we developed Benchmarker, an unbiased, data-driven benchmarking method that compares the performance of prioritization strategies to each other (and to random chance) by leave-one-chromosome-out cross-validation with stratified linkage disequilibrium (LD) score regression. We first applied Benchmarker to twenty well-powered GWAS and compared gene prioritization based on strategies employing three different data sources, including annotated gene sets and gene expression. No individual strategy clearly outperformed the others, but genes prioritized by multiple strategies had higher per-SNP heritability than those prioritized by only one strategy. We also compared two gene prioritization methods, DEPICT and MAGMA; genes prioritized by both methods strongly outperformed genes prioritized by only one. Our results suggest that combining data sources and algorithms should pinpoint higher-quality genes for follow-up. Benchmarker provides an unbiased approach to evaluate any method that provides genome-wide prioritization of gene sets, genes, or variants, and can determine the best such method for any particular GWAS. Our method addresses an important unmet need for rigorous tool assessment and can assist in mapping genetic associations to causal function.
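The leave-one-chromosome-out idea named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function `leave_one_chromosome_out`, the callback `score_genes`, and the `top_fraction` cutoff are all hypothetical names and parameters introduced here for illustration, and the stratified LD score regression step that would then estimate per-SNP heritability for the selected genes is left out entirely.

```python
from typing import Callable, Dict, Set


def leave_one_chromosome_out(
    gene_to_chrom: Dict[str, str],
    score_genes: Callable[[Set[str]], Dict[str, float]],
    top_fraction: float = 0.1,  # illustrative cutoff, an assumption
) -> Set[str]:
    """Collect the top-scoring genes on each held-out chromosome.

    score_genes is a hypothetical stand-in for any prioritization method:
    it receives the set of chromosomes whose association data it may use
    and returns a score for every gene, including held-out ones.
    """
    chromosomes = sorted(set(gene_to_chrom.values()))
    prioritized: Set[str] = set()
    for held_out in chromosomes:
        # Train only on association data from the other chromosomes.
        training = {c for c in chromosomes if c != held_out}
        scores = score_genes(training)
        # Rank the genes that lie on the held-out chromosome.
        held_out_genes = [g for g, c in gene_to_chrom.items() if c == held_out]
        held_out_genes.sort(
            key=lambda g: scores.get(g, float("-inf")), reverse=True
        )
        n_keep = max(1, int(len(held_out_genes) * top_fraction))
        prioritized.update(held_out_genes[:n_keep])
    return prioritized


# Toy data: four made-up genes on two chromosomes, with fixed scores.
toy_gene_to_chrom = {"A": "1", "B": "1", "C": "2", "D": "2"}
seen_training_sets = []


def toy_scorer(training: Set[str]) -> Dict[str, float]:
    seen_training_sets.append(frozenset(training))
    return {"A": 0.9, "B": 0.1, "C": 0.2, "D": 0.8}


toy_result = leave_one_chromosome_out(
    toy_gene_to_chrom, toy_scorer, top_fraction=0.5
)
```

Because each chromosome's genes are ranked by a model that never saw that chromosome's association data, the combined gene set can then be evaluated against the full GWAS without circularity, which is the property the abstract relies on when it calls the comparison unbiased.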