Observations from NIST’s Cloud and Big Data Workshop

Published: January 17, 2013

Big Data, Cloud Computing, DOC, Cybersecurity, Innovation

Data primacy was the name of the game at the three-day workshop hosted by the National Institute of Standards and Technology from January 15 through 17, 2013. The event jointly focused on cloud computing and big data, attracting speakers and attendees from academia, the public sector, and government leadership. Discussions highlighted the wealth of opportunities that lie on the horizon for both of these emerging technologies. It’s a significant time for government information technology. Poised to increase the efficiency of its own operations, government could also provide much-needed economic stimulus through its role around cloud computing and big data. This combination of technologies holds the promise of better healthcare, improved collaboration, more efficient problem resolution, deeper operational insights, and new levels of technological agility. But first, the current challenges need to be addressed.

One reason for the emphasis on data is the central role it plays in both cloud computing and big data, and, not surprisingly, many of the current barriers in both fields are data-related. Where government adoption is concerned, Cloud First’s aggressive push for more cloud computing has combined with the baselines being set by the Digital Strategy, bringing us to what Federal Chief Information Officer Steve VanRoekel called “the tipping point of the data economy.”
Cloud computing adoption means greater agility and scalability, and the potential cost savings are certainly part of the appeal. The bigger change going forward, though, is around sharing information and increasing collaboration. Despite the potential rewards, a number of issues have been called out around increasing cloud adoption, particularly security, migration, standardization, and culture. Not surprisingly, these are also areas where issues are arising for big data.
Security: Security has been at the forefront of concerns around cloud services. The General Services Administration’s Federal Risk and Authorization Management Program (FedRAMP) aims to cultivate a more predictable environment through consistency around government cybersecurity requirements for cloud services. Having awarded one provisional authorization to date, the program is approaching full operations, with 78 more vendors awaiting the results of their applications for certification.
For vendors: It will take time to iron out the responsibilities and costs of data stewardship. The upfront costs are sufficient to give many vendors pause when looking at the business case for pursuing FedRAMP certification. As the program ramps up, agency use and requirements will become clearer.
Migration: When looking at implementing cloud service solutions, agencies are faced with the challenge of giving up some control of their data. It’s important to understand the costs of sending their data to the cloud. One estimate suggests it costs around fifty dollars (US) per terabyte of data migrated. Of course, this cost requires additional consideration for groups that generate and deal with large volumes of data, like scientific communities. For many groups, though, massive migrations of agency data seem likely to be an infrequent occurrence. They may look at the initial cost to move to the cloud and only take pause when considering requirements for search and discovery across clouds or for moving from one cloud solution to another.
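To make that per-terabyte estimate concrete, here is a minimal back-of-the-envelope sketch assuming the roughly $50 (US) per terabyte figure quoted above; the example data volumes are hypothetical, not figures from the workshop.

```python
# Back-of-the-envelope migration cost sketch, assuming the workshop's rough
# estimate of ~$50 (US) per terabyte migrated. The example data volumes below
# are hypothetical placeholders, not figures cited at the workshop.

COST_PER_TB_USD = 50  # workshop estimate; actual provider pricing will vary


def migration_cost_usd(terabytes: float) -> float:
    """Estimate the one-time cost of moving a dataset into (or out of) a cloud."""
    return terabytes * COST_PER_TB_USD


if __name__ == "__main__":
    for label, tb in [("Modest agency archive", 200), ("Large scientific dataset", 5_000)]:
        print(f"{label}: {tb:,} TB -> ${migration_cost_usd(tb):,.0f}")
```

The point of the sketch is the scaling: for most agencies the one-time figure is small next to other IT costs, which is why the larger concerns tend to be cross-cloud search, discovery, and portability rather than the initial transfer itself.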
For vendors: Data-driven science presents opportunities to provide solutions for optimal data architectures that scale well. Technical requirements come into play around moving data onto (and off of) systems. Many systems are built to ingest data but do not anticipate exporting it. Workshop speakers repeatedly stressed the need for a nonproprietary inter-cloud solution that facilitates transmitting data from one cloud to another. The interoperability of clouds will influence brokering opportunities, data portability, market competition (through vendor lock-in, cost, and user experience), and operational efficiency for large companies and government organizations.
Standardization: While both are still emerging technology areas, cloud services have matured further than big data. For one thing, NIST has issued numerous publications on cloud computing, including a Cloud Computing Reference Architecture. By contrast, at this point there’s not even consensus on a definition of big data, let alone agreement on data architectures and standards. At the same time, there are areas within government that have been playing in this data analytics space for some time. Scientific communities at the Department of Energy and the National Oceanic and Atmospheric Administration deal with massive volumes of data and can offer a view of some of the challenges they’re encountering. For example, harvesting data can be an issue when dealing with disparate, heterogeneous information. Computing resource limitations pose another hurdle. Privacy concerns and information security requirements also need to be addressed. At the root of these data issues is a profound need for standardization.
For vendors: The unique positioning of NIST was called out several times over the course of the workshop. In particular, a program manager at the Defense Advanced Research Projects Agency (DARPA) cited the need to define the problems around data for cloud computing analytics and said that government is positioned to convene a consortium spanning academia, industry, and a range of technical expertise. While these efforts benefit from industry engagement, the government will view its own evaluations as less susceptible to bias. Standardization initiatives led by government groups will have more traction when it comes to evaluation for federal adoption.
Culture: The challenge around government culture is two-fold. There’s technological and behavioral baggage from how agencies have historically operated: legacy systems, requirements for data retention, and varying comfort levels with change. There are also reservations about new solutions, particularly where information security and continuity of operations planning are concerned.
For vendors: Establishing a degree of interoperability serves as a precaution against a provider going down. As one workshop panelist pointed out, “You might not be able to afford to have all of your data on two providers, so you want to make sure you can move it.” It’s important to be mindful of the cost component when continuity of operations plans translate into technology requirements. Drive for clarity around agency risk tolerance and anticipate the impact on possible solutions. Risk tolerance may need to take performance efficiency and costs into consideration. While agencies may look for tools to manage sensitive information in the cloud, using such tools smartly requires a firm grasp of security posture and its impact.

Ultimately, challenges around cloud computing must be resolved before big data can be leveraged fully. In a discussion on the US Government Cloud Computing Technology Roadmap, one panelist echoed Vint Cerf’s comparison of current technology developments to the evolution of the internet. If that comparison holds, there are still missing pieces related to data standardization and transmission. As academia, government, and industry work to mature data-centric operations, realizing the true value of these capabilities relies increasingly on implementation context and practical applications.