MOGSA: integrative single sample gene-set analysis of multiple omics data
Gene set analysis (GSA) summarizes individual molecular measurements to more interpretable pathways or gene sets and has become an indispensable step in the interpretation of large scale omics data. However, GSA methods are limited to the analysis of single omics data. Here, we introduce a new computation method termed multi-omics gene set analysis (MOGSA), a multivariate single sample gene-set analysis method that integrates multiple experimental and molecular data types measured over the same set of samples. The method learns a low dimensional representation of most variant correlated features (genes, proteins, etc.) across multiple omics data sets, transforms the features onto the same scale and calculates an integrated gene set score from the most informative features in each data type. MOGSA does not require filtering data to the intersection of features (gene IDs), therefore, all molecular features, including those that lack annotation may be included in the analysis. We demonstrate that integrating multiple diverse sources of molecular data increases the power to discover subtle changes in gene-sets and may reduce the impact of unreliable information in any single data type. Using simulated data, we show that integrative analysis with MOGSA outperforms other single sample GSA methods. We applied MOGSA to three studies with experimental data. First, we used NCI60 transcriptome and proteome data to demonstrate the benefit of removing a source of noise in the omics data. Second, we discovered similarities and differences in mRNA, protein and phosphorylation profiles of induced pluripotent and embryonic stem cell lines. We demonstrate how to assess the influence of each data type or feature to a MOGSA gene set score. Finally, we report that three molecular subtypes are robustly discovered when copy number variation and mRNA profiling data of 308 bladder cancers from The Cancer Genome Atlas are integrated using MOGSA. MOGSA is available in the Bioconductor R package "mogsa".
- Downloaded 1,878 times
- Download rankings, all-time:
- Site-wide: 8,568
- In bioinformatics: 1,027
- Year to date:
- Site-wide: 20,969
- Since beginning of last month:
- Site-wide: 24,956
Downloads over time
Distribution of downloads per paper, site-wide
- 27 Nov 2020: The website and API now include results pulled from medRxiv as well as bioRxiv.
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!