Uncovering complex disease subtypes by integrating clinical data and imputed transcriptome from genome-wide association studies: Applications in psychiatry and cardiovascular medicine
Classifying patients into clinically and biologically homogeneous subgroups will facilitate the understanding of disease pathophysiology and the development of more targeted prevention and intervention strategies. Traditionally, disease subtyping is based on clinical characteristics alone; however, subtypes identified this way may not correspond to the underlying biological mechanisms. Very few studies have integrated genomic profiles (such as those from GWAS) with clinical symptoms for disease subtyping. In this study, we proposed a novel analytic framework that finds subgroups of complex diseases by combining GWAS-predicted gene expression levels with clinical data in a multi-view bicluster analysis. Because the approach connects SNPs to genes through their effects on expression, the analysis is more biologically relevant and interpretable than a purely SNP-based analysis. Transcriptomes of different tissues can also be readily modelled. We also proposed several new evaluation and validation metrics, including a modified prediction strength measure to assess how well the clustering generalizes. We applied the framework to derive subtypes of schizophrenia and to stratify subjects by level of cardiometabolic risk. The framework identified schizophrenia subtypes that differed in prognosis and treatment response. Applied to the Northern Finland Birth Cohort 1966 (NFBC1966) dataset, it identified subgroups at high and low cardiometabolic risk in a gender-stratified analysis. Our results suggest a more data-driven and biologically informed approach to defining metabolic syndrome. The prediction strength exceeded 80%, suggesting that the cluster model generalizes well to new data.
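The prediction strength idea can be illustrated with the standard definition of Tibshirani and Walther: split the data into training and test sets, cluster each, assign every test subject to its nearest training centroid, and, for each test cluster, compute the fraction of within-cluster pairs that the training-based assignment also keeps together. The prediction strength is the minimum of these fractions. The sketch below is a minimal pure-Python version that assumes ordinary centroid-based clusters. It is not the authors' modified measure, whose details are not given here, and it does not handle biclusters. The function names `assign_to_centroids` and `prediction_strength` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def assign_to_centroids(points, centroids):
    """Label each point with the index of its nearest centroid (squared Euclidean distance)."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(min(range(len(centroids)), key=dists.__getitem__))
    return labels


def prediction_strength(test_labels, transferred_labels):
    """Standard (unmodified) Tibshirani-Walther prediction strength.

    test_labels: cluster labels obtained by clustering the test set itself.
    transferred_labels: labels for the same test points obtained by assigning
        them to the nearest centroid of a clustering fitted on the training set.
    Returns the minimum, over test clusters with at least two members, of the
    fraction of within-cluster pairs that the transferred labels also co-assign.
    """
    members = defaultdict(list)
    for idx, lab in enumerate(test_labels):
        members[lab].append(idx)

    strengths = []
    for idxs in members.values():
        if len(idxs) < 2:
            continue  # singleton clusters have no pairs to evaluate (a common convention)
        pairs = list(combinations(idxs, 2))
        agree = sum(transferred_labels[i] == transferred_labels[j] for i, j in pairs)
        strengths.append(agree / len(pairs))
    return min(strengths) if strengths else float("nan")


if __name__ == "__main__":
    # Toy, made-up data: two test clusters and centroids from a hypothetical training fit.
    test_points = [(0.0, 0.0), (0.5, 0.2), (5.0, 5.0), (5.5, 4.8), (2.6, 2.6)]
    test_labels = [0, 0, 1, 1, 1]
    train_centroids = [(0.3, 0.3), (5.2, 5.0)]
    transferred = assign_to_centroids(test_points, train_centroids)
    print(round(prediction_strength(test_labels, transferred), 3))  # prints 0.333
```

A value close to 1 means the clusters found in the test data are reproduced by the training clustering. Cut-offs of roughly 0.8 to 0.9 are commonly used to judge whether a cluster structure generalizes, which is the scale on which a prediction strength above 80% is read.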
Genes selected blindly by the clustering algorithm were significantly enriched for known susceptibility genes discovered in GWAS of schizophrenia and cardiovascular diseases, providing further support for the validity of our approach. The proposed framework may be applied to any complex disease and opens up a new approach to patient stratification.
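The enrichment test used for the selected genes is not specified here. A common choice for this kind of overlap question is the one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test for over-representation. The minimal sketch below uses only the standard library (`math.comb`, Python 3.8+). It assumes a gene universe containing every gene that could have been selected, for example all genes with an expression-imputation model. The helper names and gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_sf(n_universe, n_known, n_selected, n_overlap):
    """One-sided P(X >= n_overlap) for X ~ Hypergeometric(N=n_universe, K=n_known, n=n_selected)."""
    total = comb(n_universe, n_selected)
    upper = min(n_known, n_selected)
    # Sum the exact hypergeometric probabilities from the observed overlap upward;
    # math.comb returns 0 when k > n, so impossible terms contribute nothing.
    tail = sum(
        comb(n_known, k) * comb(n_universe - n_known, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def gene_set_enrichment(universe, known_genes, selected_genes):
    """Return (overlap size, p-value) for cluster-selected genes vs. known GWAS genes.

    Both gene sets are first restricted to the universe, so genes that could
    never have been selected do not distort the test.
    """
    universe = set(universe)
    known = set(known_genes) & universe
    selected = set(selected_genes) & universe
    overlap = len(known & selected)
    p_value = hypergeom_sf(len(universe), len(known), len(selected), overlap)
    return overlap, p_value


if __name__ == "__main__":
    # Made-up toy example: 10 genes, 3 known susceptibility genes, all 3 selected.
    genes = [f"G{i}" for i in range(10)]
    print(gene_set_enrichment(genes, ["G0", "G1", "G2"], ["G0", "G1", "G2"]))
```

The choice of universe matters more than the test itself. Using all genome-wide genes instead of only those that the imputation models can predict would typically inflate the apparent enrichment.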