Since the June 2013 launch of NIST’s big data working group, participants from government, industry, and academia have collaborated to define and prioritize big data requirements. Acknowledging the interplay between data characteristics and scalable system architecture, the working group has defined big data as follows: "Big Data consists of extensive datasets - primarily in the characteristics of volume, variety, velocity, and/or variability - that require a scalable architecture for efficient storage, manipulation, and analysis." Accordingly, efforts over the past year have explored interoperability, portability, reusability, analytic techniques, and technology infrastructure (along with other topics) to support effective adoption of big data capabilities. The working group organized into subgroups to address the various components of the framework. On April 6, 2015, the first draft of this work was opened for public comment.
The final version of these documents will be published as version one of the framework. Three iterations of these documents are planned, each building on the previous work. The three versions of the framework align with the three stages of work of NIST’s Big Data Public Working Group. In the first stage, the working group aims to identify the key components of a big data reference architecture that are technology, infrastructure, and vendor agnostic. In the second stage, the working group will define general interfaces between components of the NIST Big Data Reference Architecture (NBDRA). Finally, in stage three, that reference architecture will be validated by building general big data applications through the general interfaces. The subgroups have outlined some of the future work that will be included in these next stages for each of the framework components.
FUTURE WORK BY FRAMEWORK COMPONENT
• Define the different patterns of communication between Big Data resources to clarify the different approaches being taken.
• Update Volume 1 to take into account the efforts of other working groups, such as International Organization for Standardization (ISO) Joint Technical Committee 1 (JTC 1) and the Transaction Processing Performance Council.
• Continue exploring the changes both in Management and in Security and Privacy. As changes in the activities within these roles are clarified, the taxonomy and associated definitions will be developed further.
• Continue investigating the interfaces between data characteristics and technologies.
• Explore societal impact issues, such as data ownership and data governance, which need more examination.
Use Cases and General Requirements
• Draw on the use case classification to suggest classes of software models and system architectures.
• Collect benchmarks that capture the “essence” of individual use cases.
• Other future work may include collection and classification of additional use cases in areas that would benefit from additional entries, such as Government Operations, Commercial, Internet of Things, and Energy.
Security and Privacy
• Explore governance, risk management, data ownership, and valuation with respect to the Big Data ecosystem, with a focus on security and privacy.
• Select use cases from the 62 submitted use cases (51 general and 11 security and privacy) or from other meaningful use cases yet to be identified.
• Work with domain experts to identify workflow and interactions among the NBDRA components and fabrics.
• Aggregate the common data workflow and interactions between NBDRA components and fabrics and package them into general interfaces.
• Implement the same set of use cases used in Version 2 by using the defined general interfaces.
• Identify and implement a few new use cases outside the Version 2 scenarios.
• Continue to build and refine the gap analysis.
• Identify where standards may accelerate the adoption and interoperability of Big Data technologies.
• Further map standards to NBDRA components and the interfaces between them.
Additional work is not anticipated for the reference architecture survey. The general interfaces developed during Version 2 activities related to the reference architecture will offer a starting point for further refinement. They are not intended to yield a definitive solution to address all implementation needs.
The comment period for these initial drafts extends until May 21, 2015, providing several weeks to collect feedback before reviewing and incorporating input into the final publication of the framework’s first version.