Biochemistry and Biotechnology Journal

Abstract

A machine learning explanation of incidence inequalities of SARS-CoV-2 across 88 days in 157 countries.

Eric Luellen

Because SARS-CoV-2 (COVID-19) outbreaks will likely continue until effective
vaccines are widely administered, new capabilities to accurately predict incidence rates by location
and time, and thus to know in advance the disease burden and specific needs of any given population, are
valuable for minimizing morbidity and mortality. In this study, a random forest of 9,250 regression trees
was applied to 6,941 observations of 13 statistically significant independent predictor variables
to predict SARS-CoV-2 incidence rates per 100,000 population across 88 days in 157 countries. One key finding is
an algorithm that can predict the daily incidence rate over a SARS-CoV-2 epidemic cycle with a
pseudo-R² of 98.5% and that explains 97.4% of the variance. Another key finding is the relative
importance of 13 demographic, economic, environmental, and public health modulators of the SARS-CoV-2
incidence rate. Four factors proposed in earlier research as potential modulators were found to have no
statistically significant relationship with incidence rates. These findings give leaders new capabilities
for capacity planning, for targeting stay-at-home interventions, and for prioritizing
programs by identifying the atypical social determinants underlying variance in SARS-CoV-2
incidence. This work also demonstrates that machine learning can accurately and quickly explain
disease dynamics for zoonoses with pandemic potential.
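The pseudo-R² figure above can be made concrete with a short sketch. It assumes the convention of the R randomForest package, which defines pseudo-R² as 1 - MSE / Var(y) computed from out-of-bag predictions; the abstract does not state which software or definition the study used. The function name `pseudo_r_squared` and the example numbers are hypothetical and are not data from the study.

```python
from statistics import fmean, pvariance


def pseudo_r_squared(y_true, y_pred):
    """Return 1 - MSE / Var(y), the 'pseudo R-squared' convention of R's randomForest.

    Assumption (not confirmed by the abstract): y_pred holds out-of-bag predictions,
    i.e. each observation is predicted only by trees that did not train on it.
    Var(y) is the population (divide-by-n) variance, so the result equals 1 - SSE / SST.
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    # Mean squared error between observed and predicted values.
    mse = fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Population variance of the observed target.
    var_y = pvariance(y_true)
    if var_y == 0:
        raise ValueError("y_true has zero variance")
    return 1.0 - mse / var_y


# Hypothetical daily incidence rates per 100,000 and hypothetical predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(f"pseudo-R^2 = {pseudo_r_squared(observed, predicted):.1%}")  # prints pseudo-R^2 = 80.0%
```

A perfect fit returns 1.0, predicting the mean for every observation returns 0.0, and the value becomes negative when predictions are worse than the mean.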