Big Data in Government Today

Published: April 17, 2019

Tags: AFRL, Army, Big Data, DHS, Information Technology, NOAA

A recent Federal Executive Forum, hosted by Federal News Network, featured experts from the public and private sectors who provided details on the status of big data projects in the federal government today.

The amount of information the federal government collects has grown exponentially, and in the last several years agencies have prioritized the use of data for mission operations and key decisions. However, many agencies face similar obstacles with their volumes of data, including how to collect information from disparate sources, how to move and converge data into a common area for easy access, and how to protect data from getting into the wrong hands. Last month, Federal News Network hosted six panelists to discuss big data’s progress and best practices within the federal government. Panelists represented agencies such as DHS, NOAA and the Army and included private-sector experts who work with big data in the government:

  • Donna Roy, Executive Director of Information Sharing & Chief Data Officer, Department of Homeland Security
  • Tom Sasala, Director, Operations & Architecture & Chief Data Officer, U.S. Army
  • Jonathan O’Neil, Director, Big Data Project, National Oceanic and Atmospheric Administration
  • Brigham Bechtel, Chief Strategy Officer, MarkLogic
  • Henry Sowell, Chief Technology Officer, Cloudera
  • Nick Psaki, Principal, Office of the CTO, Pure Storage

Agencies have made much progress in data management, particularly in their use of the cloud for scalability, speed and computing capability. The NOAA Big Data Project, Jonathan O’Neil shared, has made the agency’s data more accessible to the public via the cloud through its partnerships with five vendors. Moreover, some agencies have chosen to split data into separate data lakes for specific mission purposes rather than retain one large data lake. For instance, Donna Roy at DHS said her office is in the midst of standing up a Data Framework, a series of data lakes that will open analytical opportunities for different mission operators at the agency. Likewise, Tom Sasala at the Army described how the military department has reached operational status with its new Army Leader Dashboard, which includes all of the department’s data divided into data lakes according to domains such as financial management and human capital.

When asked what success federal agencies have recently seen in their big data projects, panelists offered the following:

  • Sasala: The Army Leader Dashboard has changed leadership’s perspective on data itself and highlighted how much data the Army really owns. Questions such as how many soldiers the department actually has are now being answered with the data.
  • Bechtel: MarkLogic has been supporting the Air Force Research Lab, which has created a material science platform called HYPERTHOUGHT, to allow scientists to name their own data models, assign taxonomies, etc.
  • Roy: One of the data lakes under the Data Framework deals with immigration data and statistics. The agency is now able to take the data and download it more frequently to allow for more real-time decisions on immigration enforcement based on month-to-month data vs. annual information.
  • O’Neil: In one instance within its Big Data Project, NOAA was able to move NEXRAD data out of its archives and into the cloud for use in tracking past bird migration patterns. Likewise, the agency is looking to other archived data to apply in critical areas such as forest fire predictions.

These successes have not come without lessons learned, a majority of the panelists explained. Nick Psaki stated that most data infrastructure platforms today are not designed to handle the sheer scale of AI/ML capabilities. Sasala agreed, stating that a data hierarchy must be in place, with data at the bottom, advanced analytics/AI/ML at the top, and data accessibility and quality as the layers leading up to the top. Panelists also agreed that a proper data ecosystem must be in place, including security, skillsets, governance and standards such as when to age data off. If any part of this ecosystem is missing, everything else is moot, explained Henry Sowell.

When asked what the future likely holds for big data, panelists generally agreed that increased use of AI/ML is almost certain. Moreover, Roy commented that the path to AI/ML will be driven by the speed and security of data in 5G. Sasala agreed, stating that the emergence of enhanced broadband wireless communication will lead to an expansion of edge computing. Soon, users will be able to view, analyze and disseminate data quickly right at their fingertips.