GAO Identifies Limitations in Public Health COVID-19 Data Collection and Analytical Modeling

Published: August 05, 2020

Federal Market Analysis, Big Data, CDC, Coronavirus (COVID-19) Pandemic

A recent GAO report promotes understanding of the uses and limitations of COVID-19 data collection, analytical methods and forecast modeling.

Key Takeaways:

  • A GAO report issued on July 30 outlines the pros and cons of the COVID-19 data collection methods, analytical approaches and forecast models used to monitor the effects of the outbreak.
  • Many of the consistency and completeness limitations the GAO identified in COVID-19 case, hospitalization and mortality data collections relate to timeliness and local-level reporting processes that the CDC must mend.
  • Several analytical methods and forecasting models may help address the challenges of data collection. However, understanding the limitations of these methods is also critical to successful COVID-19 analysis.

To say that the COVID-19 pandemic is wreaking worldwide havoc is now an understatement. In the U.S. alone, the pandemic has caused increasing mortality rates, overwhelmed health systems and triggered an economic downturn. As such, the data and analysis surrounding COVID-19 are essential to shaping policy and decisions for government leaders. A recent GAO report, “COVID-19: Data Quality and Considerations for Modeling and Analysis,” explains the limitations that currently exist in COVID-19 data collection methods at public health agencies, as well as the analytical methods and forecast modeling that may help address these challenges.

Surveillance Data Collection

Working through several systems alongside state, local and academic health entities, the Centers for Disease Control and Prevention (CDC) within the Department of Health and Human Services (HHS) is responsible for posting COVID-19 surveillance data for the public. The GAO report emphasizes understanding the limitations of the COVID-19 data that CDC collects, because “visual representations of the data can be misleading without appropriate context in terms of data completeness and reliability,” according to the watchdog agency.


Case Data

Data Use: Determines the total spread of the virus.

Collection Method: Providers and public health officials submit a national case report form to CDC’s National Notifiable Diseases Surveillance System.

Risk and Analysis:

  • Inconsistent – jurisdictions varied in their reporting of probable cases, and CDC did not display probable case data until June 2, 2020.
  • Incomplete – case data depends on who seeks care or wants to be tested, the availability of tests and staff, and the time it takes to get and report test results.
  • Inconsistent and incomplete data result in the inability to assess infection rates by geography and demographics.
  • Rerunning analyses as the data and/or guidelines change is necessary to obtain reliable statistics.

Hospitalization Data

Data Use: Determines health care system capacity and the severity of COVID-19.

Collection Method: CDC monitors data through the COVID-19-Associated Hospitalization Surveillance Network (COVID-NET), which covers a specific catchment area, and through a COVID-19 module added to CDC’s National Healthcare Safety Network (NHSN).

Risk and Analysis:

  • Inconsistent – health care providers and facilities vary in testing practices and testing availability.
  • Incomplete – demographic and epidemiological information is limited in COVID-NET, and COVID-NET results do not represent populations outside its catchment area.
  • These limitations affect an analyst’s ability to interpret trends over time and across subpopulations.
  • CDC is working to complete medical chart abstractions for all COVID-19 hospitalizations identified in COVID-NET.

Mortality Data

Data Use: Determines the severity of COVID-19.

Collection Method: CDC collects data from follow-ups on cases reported through the case surveillance system, and through National Center for Health Statistics (NCHS) reporting based on death certificates.

Risk and Analysis:

  • Inconsistent – state and local entities define probable COVID-19 deaths differently, COVID-19 deaths may be misclassified, and death certificates vary in how they classify COVID-19.
  • Incomplete – the timing of data varies because of differences in state reporting, and the type of data and demographics reported in the case surveillance system differ from those in NCHS reporting.
  • Inaccuracies in COVID-19 case and death data affect the case fatality ratio.
  • CDC must streamline the classification of data between its two methods of obtaining mortality data and distinguish state lag times in reporting deaths.
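
To illustrate why inaccurate case and death counts matter for the case fatality ratio, here is a minimal Python sketch. The function name and every count are hypothetical and chosen only for illustration; none of them come from the GAO report or from CDC data.

```python
def case_fatality_ratio(deaths: int, confirmed_cases: int) -> float:
    """Return deaths divided by confirmed cases (the case fatality ratio)."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return deaths / confirmed_cases


# Hypothetical counts for one jurisdiction (illustrative only).
reported_deaths = 50
reported_cases = 1_000

# Ratio computed from the counts exactly as reported.
cfr_reported = case_fatality_ratio(reported_deaths, reported_cases)

# Assumption: limited testing missed half of all infections, so the true
# case count is twice the reported one while the death count is unchanged.
cfr_if_cases_undercounted = case_fatality_ratio(reported_deaths, 2 * reported_cases)

print(f"CFR from reported data: {cfr_reported:.1%}")
print(f"CFR if half of cases were missed: {cfr_if_cases_undercounted:.1%}")
```

Under these made-up numbers, the ratio falls from 5.0% to 2.5% once the missed cases are counted, even though nothing about the deaths has changed. An undercount of deaths would move the ratio in the opposite direction.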


Analytics and Forecast Modeling

The GAO report describes various analytical methods that can help researchers and decision makers use these data collections to gain more insight into COVID-19 and its effects. Analytics can even help overcome some of the data collection limitations, the report argues. For example, analyzing deaths due to respiratory diseases and higher-than-expected deaths from all causes can help fill the gaps in the mortality data. Examining these broader trends offers a new perspective on the effects and severity of COVID-19.
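
The "higher-than-expected deaths" idea can be sketched in a few lines of Python. This is a simplified illustration, not the method used by the GAO or CDC. It assumes the expected count is a plain average of the same week in prior years, whereas real analyses typically use seasonal models with uncertainty ranges. All names and numbers below are hypothetical.

```python
from statistics import mean
from typing import Sequence


def excess_deaths(observed: int, baseline_counts: Sequence[int]) -> float:
    """Observed all-cause deaths minus the average of comparable prior periods."""
    expected = mean(baseline_counts)  # assumption: simple average as the baseline
    return observed - expected


# Hypothetical all-cause death counts for the same calendar week.
prior_years = [1000, 1040, 980, 1020]  # earlier years, used as the baseline
observed_this_year = 1310              # current year
reported_covid_deaths = 200            # deaths officially attributed to COVID-19

excess = excess_deaths(observed_this_year, prior_years)

# Excess deaths not explained by reported COVID-19 deaths may point to
# misclassified or unreported deaths, or to indirect effects of the outbreak.
unexplained = excess - reported_covid_deaths

print(f"Excess deaths: {excess:.0f}")
print(f"Excess deaths beyond reported COVID-19 deaths: {unexplained:.0f}")
```

With these made-up inputs, the baseline is 1,010 deaths, the excess is 300, and 100 of those are not accounted for by reported COVID-19 deaths.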

Additionally, forecast data modeling can also provide valuable insight. According to the report, “A model uses equations and logic to simplify aspects of nature that can be complicated and difficult to understand…in the context of a public health response to a disease outbreak, models are used to understand the drivers of a particular outbreak, assess the risk of certain diseases, detect or forecast new outbreaks, and investigate the potential effects of public health interventions.”

Nonetheless, the report also stresses the limitations of forecast modeling. Models rely on the underlying data, which, if scarce or inconsistent, will reduce the precision of the model. Moreover, overreliance on data, particularly early in an outbreak, will produce uncertainty in the modeling. Lastly, explaining a model effectively to an audience may be difficult, leaving room for misinterpretation or oversimplification.
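
A small simulation can show how sensitive a forecast is to uncertain inputs. The sketch below uses a basic SIR (susceptible, infected, recovered) compartmental model with daily time steps. It is a common textbook model chosen here for illustration, not a model named in the GAO report. The function name, population size and rate values are all assumptions.

```python
def sir_peak_infected(beta: float,
                      gamma: float = 0.1,
                      population: int = 1_000_000,
                      initial_infected: int = 10,
                      days: int = 365) -> float:
    """Run a discrete-time SIR model and return the peak number infected.

    beta  -- assumed transmission rate per day
    gamma -- assumed recovery rate per day (0.1 means about 10 days infectious)
    """
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    peak = infected
    for _ in range(days):
        new_infections = beta * susceptible * infected / population
        new_recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        peak = max(peak, infected)
    return peak


# Two hypothetical estimates of the transmission rate, as might result
# from scarce or inconsistent early case data.
low_peak = sir_peak_infected(beta=0.20)
high_peak = sir_peak_infected(beta=0.25)

print(f"Projected peak infections, beta=0.20: {low_peak:,.0f}")
print(f"Projected peak infections, beta=0.25: {high_peak:,.0f}")
```

The two runs differ only in the assumed transmission rate, yet they project different peaks. The example shows how uncertainty in the underlying data carries through to the forecast.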

The GAO has sent these findings to the CDC and is awaiting a technical response from the agency.