Big Data and HPC Helping COVID-19 Response
Published: March 23, 2020
Big data and, in particular, high performance computing are playing a prominent role in helping combat COVID-19. Federal, industry and academic resources are being brought together to collectively fight the pandemic.
- Data-driven approaches, coupled with advanced computing capabilities, are key to helping find a cure for COVID-19.
- Collaboration among the public, private and academic sectors helps advance the scientific community's COVID-19 efforts through the use of big data and HPC technologies.
- Modernizing technologies, such as cloud computing, will play a key role in adding capacity to advanced computing in aid of the COVID-19 response.
In recent years, the federal government has been promoting the use of data as a strategic asset to solve problems and power agency missions. Data is often referred to as the “new oil” of organizational operations and decision-making, and its role in the response to COVID-19 is proving no different.
In fact, federal, industry and academic partners have come together to create powerful datasets and apply supercomputing resources to produce ongoing answers to the outbreak, resolve gaps in COVID-19 responses and accelerate scientific solutions to contain the pandemic.
The COVID-19 Open Research Dataset (CORD-19) is described as a “free resource of over 44,000 scholarly articles, including over 29,000 with full text, about COVID-19 and the coronavirus family of viruses for use by the global research community.” CORD-19 is central to the COVID-19 response because it gives users continually updated, connected scholarly data related to the disease.
The best part of CORD-19? The data is machine-readable, saving researchers and scientists days of work.
CORD-19 is a perfect example of an “all hands on deck” collaboration in the fight against COVID-19. According to a White House statement, Microsoft led the charge to gather worldwide scientific studies and reports on the virus. The Chan Zuckerberg Initiative (CZI) provided access to pre-publication content, while the National Library of Medicine (NLM) also made its content available. Georgetown University’s Center for Security and Emerging Technology (CSET) and the Allen Institute for AI worked together to transform the unstructured data into machine-readable material. The Allen Institute’s Semantic Scholar now houses the data on its site, providing a dynamic search engine for researchers to quickly locate relevant information.
The dataset paves the way for data mining and AI technologies, such as natural language processing, to filter through the data and answer the most critical questions regarding COVID-19. Competitive challenges, such as Kaggle’s CORD-19 Challenge, prompt researchers to answer a series of key questions that will hopefully lead to new insights about COVID-19.
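To illustrate why machine-readable text matters, here is a minimal sketch of the simplest kind of filtering a researcher might run over article metadata. It is not how CORD-19 tooling or any Kaggle submission actually works. The sample records, the field names ("title", "abstract") and the rank_articles helper are all hypothetical and exist only for this example. Real natural language processing pipelines are far more sophisticated than plain word overlap.

```python
import re

# Hypothetical records standing in for machine-readable article metadata.
# The field names ("title", "abstract") are assumptions made for this sketch.
SAMPLE_ARTICLES = [
    {"title": "Incubation period of a novel coronavirus",
     "abstract": "We estimate the incubation period from confirmed cases."},
    {"title": "Protease inhibitors as antiviral candidates",
     "abstract": "Screening of compounds that block viral replication."},
    {"title": "Hospital capacity planning during outbreaks",
     "abstract": "A model of bed demand under different transmission scenarios."},
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_articles(question, articles):
    """Return (score, title) pairs for articles that share words with the
    question, highest word overlap first. Articles with no overlap are dropped."""
    query = tokenize(question)
    scored = []
    for article in articles:
        words = tokenize(article["title"] + " " + article["abstract"])
        score = len(query & words)
        if score > 0:
            scored.append((score, article["title"]))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    for score, title in rank_articles("incubation period", SAMPLE_ARTICLES):
        print(score, title)
```

Because the text is already structured, a question can be matched against thousands of records in a single pass rather than by reading each paper by hand.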
High Performance Computing
Where data is central to a critical issue, discussion of the advanced computing technologies needed to process it is sure to follow. On March 23rd, the White House announced the High Performance Computing Consortium, which provides access to the nation’s High Performance Computing (HPC) systems to help advance efforts surrounding COVID-19. The consortium is a collaborative effort led by the Office of Science and Technology Policy (OSTP), with participation by federal (Energy, NSF and NASA), industry (IBM, Amazon Web Services, Google Cloud and Microsoft) and academic (MIT and Rensselaer Polytechnic Institute) partners. The consortium provides access to 16 supercomputers with over 330 petaflops of computing capability. According to Energy's announcement of the effort, HPC “can process massive numbers of calculations related to bioinformatics, epidemiology, molecular modeling, and healthcare system response, helping scientists develop answers to complex scientific questions about COVID-19 in hours or days versus weeks or months.” Researchers are invited to submit proposals related to COVID-19 discovery. Proposals that are reviewed and deemed most beneficial to public health will gain access to the supercomputing systems to complete their research.
Even before the consortium, Energy’s HPC systems had been playing a role in the fight against COVID-19. Researchers have been granted emergency computation time on the Summit system at Oak Ridge National Laboratory, recognized as the world’s most powerful supercomputer. Summit has been used to perform simulations that could help lead to a cure for COVID-19. Thus far, the system has ranked a database of 8,000 drug compounds, which scientists narrowed to a subset of 77 compounds that could potentially prevent the coronavirus strain from infecting host cells. What would otherwise have taken scientists months to discover has taken days with the use of the Summit computer.