
A seq2seq model to forecast the COVID-19 cases, deaths and reproductive R numbers in US counties

By Yanli Zhang-James, Jonathan L. Hess, Asif Salekin, Dongliang Wang, Samuel Chen, Peter Winkelstein, Christopher P Morley, Stephen V Faraone

Posted 20 Apr 2021
medRxiv DOI: 10.1101/2021.04.14.21255507

The global pandemic of coronavirus disease 2019 (COVID-19) has killed almost two million people worldwide and over 400,000 in the United States (US). As the pandemic evolves, informed policy-making and strategic resource allocation rely on accurate forecasts. To predict the spread of the virus within US counties, we curated an array of county-level demographic and COVID-19-relevant health risk factors. Combining these with the county-level case and death numbers curated by Johns Hopkins University, we developed a forecasting model using deep learning (DL). We implemented an autoencoder-based Seq2Seq model with gated recurrent units (GRUs) in the deep recurrent layers and trained it to predict future incident cases, deaths and the reproductive number, R. For most counties, the model makes accurate predictions of new incident cases, deaths and R values up to 30 days in the future. The framework can also be used to predict other targets that are useful indices for policymaking, for example hospitalizations or the occupancy of intensive care units. Our DL framework is publicly available on GitHub and can be adapted for other indices of COVID-19 spread. We hope that our forecasts and model can help local governments in the continued fight against COVID-19.
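The abstract describes the architecture only at a high level: an encoder reads a county's recent history and a GRU-based decoder rolls the forecast forward for up to 30 days. The sketch below illustrates that generic encoder-decoder idea in pure Python with random, untrained weights, so its outputs only demonstrate shapes and data flow, not real forecasts. It is not the authors' GitHub code. The names `GRUCell`, `Seq2SeqGRU` and `incident_from_cumulative`, the hidden size, feeding each prediction back into the decoder, and clipping negative daily differences to zero are all illustrative assumptions. It also assumes the input case and death series are cumulative totals that must be differenced into daily incident counts.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small uniform random weights; a stand-in for trained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def incident_from_cumulative(cumulative):
    """Daily new counts from a cumulative series.
    Assumption: downward corrections (negative differences) are clipped to 0."""
    return [max(0, b - a) for a, b in zip([0] + cumulative[:-1], cumulative)]


class GRUCell:
    """One GRU cell with update (z), reset (r) and candidate (n) gates."""

    def __init__(self, input_size, hidden_size, rng):
        self.W = {g: rand_matrix(hidden_size, input_size, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrn"}
        self.b = {g: [0.0] * hidden_size for g in "zrn"}

    def step(self, x, h):
        def pre(g, hv):
            # Pre-activation: W_g x + U_g hv + b_g
            return [a + c + d for a, c, d in
                    zip(matvec(self.W[g], x), matvec(self.U[g], hv), self.b[g])]

        z = [sigmoid(a) for a in pre("z", h)]
        r = [sigmoid(a) for a in pre("r", h)]
        # The candidate state sees the reset-gated previous hidden state.
        n = [math.tanh(a) for a in pre("n", [ri * hi for ri, hi in zip(r, h)])]
        # Interpolate between the candidate and the previous state.
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


class Seq2SeqGRU:
    """Hypothetical encoder-decoder: encode the history, then decode day by day."""

    def __init__(self, n_features, n_targets, hidden_size=16, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        self.n_targets = n_targets
        self.encoder = GRUCell(n_features, hidden_size, rng)
        # The decoder consumes its own previous prediction at each step.
        self.decoder = GRUCell(n_targets, hidden_size, rng)
        self.out = rand_matrix(n_targets, hidden_size, rng)

    def forecast(self, history, horizon):
        """history: per-day feature vectors; returns `horizon` per-day target vectors."""
        h = [0.0] * self.hidden_size
        for x in history:                 # encoder: compress the window into h
            h = self.encoder.step(x, h)
        y = [0.0] * self.n_targets        # all-zero start input for the decoder
        preds = []
        for _ in range(horizon):          # decoder: unroll one day at a time
            h = self.decoder.step(y, h)
            y = matvec(self.out, h)       # e.g. [cases, deaths, R] for that day
            preds.append(y)
        return preds


cases = incident_from_cumulative([0, 3, 7, 7, 12, 20, 26])
deaths = incident_from_cumulative([0, 0, 0, 1, 1, 2, 2])
history = [[c, d] for c, d in zip(cases, deaths)]
model = Seq2SeqGRU(n_features=2, n_targets=3)
preds = model.forecast(history, horizon=30)
print(len(preds), len(preds[0]))  # 30 3
```

Decoding autoregressively lets one model produce any horizon up to 30 days. In practice the static county covariates mentioned in the abstract would be appended to each day's feature vector or used to initialize the hidden state; which choice the authors made is not stated here.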

Download data

  • Downloaded 166 times
  • Download rankings, all-time:
    • Site-wide: 121,957
    • In health informatics: 471
  • Year to date:
    • Site-wide: 25,352
  • Since beginning of last month:
    • Site-wide: 13,809
