SuperDCA for genome-wide epistasis analysis
Ying Ying Xu,
John A Lees,
Stephen D Bentley,
Nicholas J Croucher,
Posted 30 Aug 2017
bioRxiv DOI: 10.1101/182527 (published DOI: 10.1099/mgen.0.000184)
The potential for genome-wide modeling of epistasis has recently surfaced given the possibility of sequencing densely sampled populations and the emerging families of statistical interaction models. Direct coupling analysis (DCA) has earlier been shown to yield valuable predictions for single protein structures, and has recently been extended to genome-wide analysis of bacteria, identifying novel interactions in the co-evolution between resistance, virulence and core genome elements. However, earlier computational DCA methods have not been scalable to enable model fitting simultaneously to 10,000 to 100,000 polymorphisms, representing the amount of core genomic variation observed in analyses of many bacterial species. Here we introduce a novel inference method (SuperDCA) which employs a new scoring principle, efficient parallelization, optimization and filtering on phylogenetic information to achieve scalability for up to 100,000 polymorphisms. Using two large population samples of Streptococcus pneumoniae, we demonstrate the ability of SuperDCA to make additional significant biological findings about this major human pathogen. We also show that our method can uncover signals of selection that are not detectable by genome-wide association analysis, even though our analysis does not require phenotypic measurements. SuperDCA thus holds considerable potential in building understanding about numerous organisms at a systems biological level.