Early prediction of high risk gestational diabetes mellitus via machine learning models.

By Yan-Ting Wu, Chen-Jie Zhang, Ben Willem Mol, Cheng Li, Lei Chen, Yu Wang, Jian-Zhong Sheng, Jian-Xia Fan, Yi Shi, He-Feng Huang

Posted 30 Mar 2020
medRxiv DOI: 10.1101/2020.03.26.20040196

Aims: Gestational diabetes mellitus (GDM) is a pregnancy-specific disorder that is usually diagnosed after 24 weeks of gestation. So far, there is no accurate method to predict GDM in early pregnancy.

Methods: We extracted data from the hospital's electronic medical record system, comprising 73 features recorded in the first trimester. We also recorded the occurrence of GDM, diagnosed at 24-28 weeks of pregnancy. We applied a feature selection method to choose a panel of the most discriminative features. We then developed machine learning models based on these features, using a Deep Neural Network (DNN), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Logistic Regression (LR).

Results: We studied 16,819 women (2,696 with GDM) in the training group and 14,992 women (1,837 with GDM) in the validation group. The DNN, SVM, KNN, and LR models based on the 73-feature set had area under the curve (AUC) values of 0.92 (95% CI 0.91, 0.93), 0.82 (95% CI 0.81, 0.83), 0.63 (95% CI 0.62, 0.64), and 0.85 (95% CI 0.84, 0.85), respectively. The 7-feature DNN, SVM, KNN, and LR models (features selected from the 73-feature set) had AUCs of 0.84 (95% CI 0.83, 0.84), 0.69 (95% CI 0.68, 0.70), 0.68 (95% CI 0.67, 0.69), and 0.84 (95% CI 0.83, 0.85), respectively. The 7-feature LR model had the best Hosmer-Lemeshow test outcome. Notably, the AUCs of existing prediction models did not exceed 0.75.

Conclusions: Our feature selection and machine learning models showed greater predictive power for early GDM detection than previous methods; these improved models will better serve clinical practice in preventing GDM.

Research in Context

Evidence before this study
  • A delayed diagnosis of GDM in the third trimester comes too late to prevent exposure of the embryo or fetus to an intrauterine hyperglycemic environment during early pregnancy.
  • Prediction models for gestational diabetes are not uncommon in the literature, but laboratory indicators are rarely included among the predictors.
  • The growing use of AI in medicine motivated us to apply it to GDM prediction models.

What is the key question?
  • Can a GDM prediction model built with machine learning surpass the traditional LR model?

Added value of this study
  • Using machine learning to select features is an effective method.
  • The DNN prediction model has effective discriminative power for predicting GDM in early pregnancy, but it cannot completely replace LR. KNN and SVM performed worse than LR in this study.

Implications of all the available evidence
  • The main significance of our research is not only that it builds a prediction model that surpasses previous ones, but also that it demonstrates the advantages and disadvantages of different machine learning methods through a practical case.
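
The abstract compares models by AUC with a 95% CI and by the Hosmer-Lemeshow test, without saying how either was computed. The sketch below illustrates one common way to obtain both, in plain Python. It is not the authors' code: the percentile bootstrap for the interval, the ten equal-size risk groups, and the function names `auc`, `bootstrap_auc_ci`, and `hosmer_lemeshow` are all assumptions made for illustration.

```python
import random


def auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney) identity; tied scores share an average rank."""
    pairs = sorted(zip(scores, labels), key=lambda t: t[0])
    n = len(pairs)
    n_pos = sum(y for _, y in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tie group i..j
        rank_sum_pos += avg_rank * sum(pairs[k][1] for k in range(i, j + 1))
        i = j + 1
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the AUC (assumed method)."""
    auc(labels, scores)  # fails early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def hosmer_lemeshow(labels, probs, groups=10):
    """Hosmer-Lemeshow chi-square statistic over equal-size groups of sorted risk."""
    pairs = sorted(zip(probs, labels), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)   # predicted number of cases
        observed = sum(y for _, y in chunk)   # actual number of cases
        mean_p = expected / len(chunk)
        denom = len(chunk) * mean_p * (1 - mean_p)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat


# Toy data, not from the study: 1 = GDM, 0 = no GDM.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(auc(y, p))  # 0.75
```

A smaller Hosmer-Lemeshow statistic indicates better agreement between predicted and observed risk. It is usually compared against a chi-square distribution with `groups - 2` degrees of freedom; that p-value step is omitted here because the Python standard library has no chi-square CDF.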

Download data

  • Downloaded 250 times
  • Download rankings, all-time:
    • Site-wide: 109,625
    • In endocrinology: 87
  • Year to date:
    • Site-wide: 85,842
  • Since beginning of last month:
    • Site-wide: 29,453
