Rare variants imputation in admixed populations: Comparison across reference panels and bioinformatics tools.
Richard P. Mayeux,
Badri N. Vardarajan,
Ivonne Z Jimenez-Velazquez,
Posted 13 Dec 2018
bioRxiv DOI: 10.1101/494229 (published DOI: 10.3389/fgene.2019.00239)
Background: Imputation has become a standard approach in genome-wide association studies (GWAS) for inferring untyped markers in silico. Although the feasibility of imputing common variants is well established, we aimed to assess imputation of rare and ultra-rare variants in an admixed Caribbean Hispanic (CH) population.
Methods: We evaluated imputation accuracy in CH (N=1,000), focusing on rare (0.1% ≤ minor allele frequency (MAF) ≤ 1%) and ultra-rare (MAF < 0.1%) variants. We used two reference panels, the Haplotype Reference Consortium (HRC; N=27,165) and the 1000 Genomes Project (1000G phase 3; N=2,504), together with multiple phasing (SHAPEIT, Eagle2) and imputation (IMPUTE2, MaCH-Admix) algorithms. To assess imputation quality, we reported: a) high-quality variant counts according to each imputation tool's internal quality index (e.g. IMPUTE2 Info ≥ 80%); b) a Wilcoxon signed-rank test comparing imputation quality for genotyped variants that were masked and then imputed; c) Cohen's kappa coefficient to test agreement between imputed and whole-exome sequencing (WES) variants; d) imputation of the G206A mutation in PSEN1 (ultra-rare in the general population and more frequent in CH), followed by confirmation genotyping. We also tested ancestry proportions (European, African and Native American) against WES-imputation mismatches using Poisson regression.
Results: SHAPEIT2 retrieved a higher percentage of high-quality imputed variants than Eagle2 (rare: 51.02% vs. 48.60%; ultra-rare: 0.66% vs. 0.65%; Wilcoxon p-value < 0.001). SHAPEIT-IMPUTE2 with HRC outperformed 1000G (64.50% vs. 59.17% and 1.69% vs. 0.75% for high-quality rare and ultra-rare variants, respectively; Wilcoxon p-value < 0.001). SHAPEIT-IMPUTE2 also outperformed MaCH-Admix. Compared to 1000G, HRC imputation retrieved a higher number of high-quality rare and ultra-rare variants, despite showing lower agreement between imputed and WES variants (e.g. rare: 98.86% for HRC vs. 99.02% for 1000G). High agreement (Cohen's κ = 0.99) was observed for both reference panels. Twelve G206A mutation carriers were imputed, and all were validated by confirmation genotyping. African ancestry was associated with higher imputation error for uncommon and rare variants (p-value < 1e-05).
Conclusion: Reference panels with larger numbers of haplotypes can improve imputation quality for rare and ultra-rare variants in admixed populations such as CH. Ethnic composition is an important predictor of imputation accuracy, with higher African ancestry associated with poorer imputation accuracy.
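Two of the quality metrics named under Methods, the per-bin count of high-quality variants and Cohen's kappa between imputed and WES calls, can be illustrated with a small sketch. The Python below is a minimal, hypothetical example and not the authors' pipeline. It assumes that MAF is already computed for each variant, that the quality score is an IMPUTE2-style info value on a 0 to 1 scale (so "Info ≥ 80%" becomes 0.8), and that genotypes are coded as 0/1/2 alternate-allele counts. The function names and toy inputs are illustrative only. The Wilcoxon signed-rank test and Poisson regression steps are left to standard statistics packages.

```python
from collections import Counter


def maf_bin(maf: float) -> str:
    """Bin a variant using the abstract's cut-offs:
    ultra-rare: MAF < 0.1%; rare: 0.1% <= MAF <= 1%; everything else: 'other'."""
    if maf < 0.001:
        return "ultra-rare"
    if maf <= 0.01:
        return "rare"
    return "other"


def fraction_high_quality(variants, info_threshold=0.8):
    """variants: iterable of (maf, info) pairs (hypothetical input format).
    Returns {bin: fraction of variants in that bin with info >= info_threshold}."""
    total = Counter()
    passed = Counter()
    for maf, info in variants:
        b = maf_bin(maf)
        total[b] += 1
        if info >= info_threshold:
            passed[b] += 1
    return {b: passed[b] / total[b] for b in total}


def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two equal-length sequences of categorical calls,
    e.g. imputed vs. WES genotypes coded as 0/1/2."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(calls_a)
    # Observed agreement: fraction of positions where the two calls match.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Chance agreement: sum over categories of the product of marginal frequencies.
    freq_a = Counter(calls_a)
    freq_b = Counter(calls_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both sources used one identical category everywhere; kappa is 0/0,
        # and this sketch returns 1.0 by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy usage with made-up genotypes (not study data):
imputed = [0, 0, 1, 0, 2, 0, 0, 1]
wes = [0, 0, 1, 0, 1, 0, 0, 1]
print(round(cohens_kappa(imputed, wes), 4))  # 0.7576 (= 25/33)
```

Kappa is used here instead of raw percent agreement because, for rare variants, almost every call is homozygous reference. Raw agreement is therefore close to 100% even for a poor imputation, while kappa discounts the agreement expected by chance from the marginal genotype frequencies.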