Big Data in Healthcare

Published: September 13, 2017

Big Data, HHS, Health Care, VA

The healthcare public sector has made use of big data and analytics in recent years.

A few weeks ago, I discussed the impact of big data in the law enforcement community. This week, I’d like to touch on another area of growth for big data and analytics: healthcare.

Major federal healthcare departments, Health and Human Services (HHS) and Veterans Affairs (VA) in particular, have been using big data and analytics in a variety of ways, from biomedical research, medical monitoring and precision medicine to curbing waste, fraud and abuse.

On that last point, healthcare agencies have been gathering more internal and external data, both structured and unstructured, and using technologies such as machine learning, natural language processing and predictive analytics to sift through the information and flag outliers and unusual patterns that may indicate fraud. Through these methods, the Centers for Medicare and Medicaid Services (CMS) under HHS has identified hundreds of millions of dollars in fraud. According to a FedTech article, Steve Shandy, program manager at the HHS Office of Inspector General (OIG), predicts big investments in natural language processing and social media analytics down the line to continue identifying abuse and fraud.
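To make the idea of flagging outliers concrete, here is a minimal sketch in Python. It is not the method CMS or HHS OIG uses; it assumes a simple modified z-score based on the median absolute deviation, and the claim amounts are hypothetical values made up for illustration only.

```python
from statistics import median

# Hypothetical billed amounts in dollars (invented for illustration, not real data).
HYPOTHETICAL_CLAIMS = [120.0, 135.0, 110.0, 128.0, 142.0, 131.0, 2400.0, 125.0]


def flag_outliers(amounts, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds the threshold.

    The modified z-score uses the median and the median absolute deviation (MAD),
    so a single extreme value does not distort the baseline the way a mean would.
    """
    med = median(amounts)
    mad = median(abs(x - med) for x in amounts)
    if mad == 0:
        # At least half of the values are identical; this simple score cannot rank them.
        return []
    return [
        i for i, x in enumerate(amounts)
        if 0.6745 * abs(x - med) / mad > threshold
    ]


if __name__ == "__main__":
    for i in flag_outliers(HYPOTHETICAL_CLAIMS):
        print(f"Claim {i} looks unusual: ${HYPOTHETICAL_CLAIMS[i]:,.2f}")
```

A flagged value is only a candidate for review. Real fraud analytics would combine many more signals, such as provider history or text from claim notes, before anyone draws a conclusion.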

Big data in healthcare, especially when it comes to research, simply makes sense. Large amounts of varied data must be collected and studied in order to uncover solutions to medical anomalies. In recent years, both HHS and VA have announced big data and analytics initiatives to aid medical research and administration, including:


  • The Veterans Health Administration is working with wearable devices such as Fitbit or the Apple Watch to monitor patients' health statistics, using programmed sensors and algorithms to report abnormal readings.
  • Using supercomputing capabilities at the Energy Department, VA is collecting blood samples for genetic analysis in order to predict and treat post-traumatic stress disorder and other combat-related injuries and effects.


  • The National Institutes of Health (NIH) recently expanded its supercomputing capabilities, doubling its capacity to perform 1.2 thousand trillion operations per second, in order to sift through immense amounts of data on cancer, diabetes, mental health and other areas of medical research.
  • NIH instituted its “All of Us” program to gather data over time from more than 1 million people in the U.S. to study a variety of health conditions and the impact of individual differences in lifestyle, environment and biological makeup.
  • The Food and Drug Administration (FDA) is using high-performance computing modeling and simulations to evaluate medical devices and drugs and to identify patients who may need dosage adjustments for a drug to be effective. Moreover, FDA is working to build a natural history database to collect data that may lead to the development of “model-based drugs” for chronic diseases.

Reported spending numbers in big data from FY 2014 through FY 2016 seem to confirm the growth within HHS and VA:

Source: FPDS, Deltek

Both VA and HHS saw increases in spending from FY 2015 to FY 2016, up nearly 37% at VA and 15% at HHS. The rise at VA is primarily due to a 79% increase in big data software purchases, from $10.9M in FY 2015 to $19.6M in FY 2016. Additionally, there was a 7% increase in big data services at VA, from $14.9M in FY 2015 to $16M in FY 2016. Of that $16M, VA spent $6.7M on analysis support and $5.9M on data management-related services. In software, of the $19.6M spent in FY 2016, VA spent $7.7M on predictive analytics and $3.3M on machine learning.

At HHS, spending on big data services rose 12%, from $74.9M in FY 2015 to $84.2M in FY 2016. Likewise, spending on big data software increased 29%, from $34.7M in FY 2015 to $44.9M in FY 2016. Within services, $42M was spent in FY 2016 on analysis support and $31.4M on data management-related services. On the software side, $20.5M was spent in FY 2016 on predictive analytics and $6.6M on health analytics.
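For readers who want to check the year-over-year arithmetic, the short Python sketch below recomputes the percentage changes from the dollar figures quoted in the text. The names `SPENDING`, `pct_change` and `summarize` are just illustrative choices. Because the published amounts are rounded to the nearest $0.1M, the computed values can differ slightly from the percentages cited above.

```python
# FY 2015 and FY 2016 spending in millions of dollars, as quoted in the article text.
SPENDING = {
    ("VA", "software"): (10.9, 19.6),
    ("VA", "services"): (14.9, 16.0),
    ("HHS", "software"): (34.7, 44.9),
    ("HHS", "services"): (74.9, 84.2),
}


def pct_change(old, new):
    """Year-over-year change expressed as a percentage of the earlier value."""
    return (new - old) / old * 100.0


def summarize(spending):
    """Map each (agency, category) pair to its percentage change, rounded to one decimal."""
    return {
        key: round(pct_change(fy15, fy16), 1)
        for key, (fy15, fy16) in spending.items()
    }


if __name__ == "__main__":
    for (agency, category), change in summarize(SPENDING).items():
        print(f"{agency} big data {category}: {change:+.1f}%")
```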

Note: the above numbers are based on FPDS reported spending from FY 2014 - FY 2016. Deltek has filtered through the spending using specific big data keywords.

Given the range of uses for big data within the healthcare sector, continued spending on big data, particularly software and services, is likely. Technologies and methods such as machine learning, artificial intelligence, natural language processing and predictive analytics appear to be at the forefront of how healthcare agencies will use big data to support their missions.