Application of a Novel Machine Learning Method to Big Data Infers a Relationship Between Asthma and the Development of Neoplasia
Background: A relationship between asthma and the risk of cancer has been reported in several studies. However, these studies used different methodologies, were primarily cross-sectional, and produced contradictory results. Population-level analyses are required to determine whether a relationship truly exists.
Methods: We developed a novel machine learning tool to infer associations, Causal Inference using the Composition of Transactions (CICT). Two all-payer claims datasets covering over two hundred million hospitalization encounters from the US-based Healthcare Cost and Utilization Project (HCUP) were used for discovery and validation. Associations between asthma and neoplasms were discovered in data from the State of Florida. Validation was conducted on eight cohorts of patients with asthma, and seven subtypes of asthma and COPD, using datasets from the State of California. Control groups were matched by gender, age, race, and history of tobacco use. Odds ratio analysis with Bonferroni-Holm correction measured the association of asthma and COPD with 26 different benign and malignant neoplasms. ICD-9-CM codes were used to identify exposures and outcomes.
Findings: CICT identified 17 associations between asthma and the risk of neoplasia in the discovery dataset. In the validation studies, 208 case-control analyses were conducted between subtypes of asthma (N = 999,370; male = 33%; age = 50) and COPD (N = 715,971; male = 50%; age = 69) and the corresponding matched control groups (N = 8,400,004; male = 42%; age = 47). Allergic asthma was associated with benign neoplasms of the meninges and of the salivary, pituitary, parathyroid, and thyroid glands (OR: 1.52 to 2.52), and with malignant neoplasms of the breast, intrahepatic biliary system, and hematopoietic and lymphatic systems (OR: 1.45 to 2.05). COPD was associated with malignant neoplasms of the lung, bladder, and hematopoietic systems.
Interpretation: The combined use of machine learning methods for knowledge discovery and epidemiological methods shows that allergic asthma is associated with the development of neoplasia, including in glandular organs, ductal tissues, and hematopoietic systems. Also, our findings differentiate the pattern of neoplasms between allergic asthma and obstructive asthma. This suggests that inflammatory pathways that are active in asthma also contribute to neoplastic transformation in specific organ systems such as secretory organs.

### Competing Interest Statement
The authors have declared no competing interest.
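
The odds ratio analysis with Bonferroni-Holm correction named in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the function names `odds_ratio_wald` and `holm_reject`, the Wald test used to obtain p-values, the 95% interval, and the significance level of 0.05 are all assumptions made for illustration, and the counts in the usage note are invented.

```python
import math


def odds_ratio_wald(a, b, c, d):
    """Odds ratio for a 2x2 table with a Wald 95% interval and p-value.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    (Hypothetical helper; the Wald test is an assumption, the abstract
    does not say how p-values were computed.)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - 1.96 * se)
    upper = math.exp(log_or + 1.96 * se)
    # Two-sided p-value for H0: log(OR) = 0 under a normal approximation.
    z = abs(log_or) / se
    p_value = math.erfc(z / math.sqrt(2))
    return odds_ratio, lower, upper, p_value


def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: one reject/keep flag per hypothesis.

    The k-th smallest p-value (k = 0, 1, ...) is compared with
    alpha / (m - k); testing stops at the first failure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # every larger p-value is also retained
    return reject
```

For example, `odds_ratio_wald(20, 80, 10, 90)` gives an odds ratio of 2.25 for these invented counts. With the 208 case-control analyses reported in the Findings, the smallest p-value would have to fall at or below `alpha / 208` to be rejected under Holm's procedure, assuming `alpha = 0.05`.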
- Downloaded 647 times
- Download rankings, all-time:
  - Site-wide: 55,500
  - In epidemiology: 2,666
- Year to date:
  - Site-wide: 125,645
- Since beginning of last month:
  - Site-wide: 80,460