UPDATE January 19, 2016: The HDF5-1.10.0-alpha1 release is now available, adding Collective Metadata I/O to these features:
– Concurrent Access to an HDF5 File: Single Writer / Multiple Reader (SWMR)
– Virtual Dataset (VDS)
– Scalable Chunk Indexing
– Persistent Free File Space Tracking
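To give a flavor of the SWMR feature listed above, here is a minimal sketch using h5py (not part of the release announcement; it assumes an h5py build linked against HDF5 1.10 or later, and the file name `swmr_demo.h5` is made up for illustration):

```python
# Hedged SWMR sketch: one writer appends, a reader later sees the data.
import h5py

path = "swmr_demo.h5"  # hypothetical file name

# Writer: request the latest file format, create objects, then enable SWMR.
with h5py.File(path, "w", libver="latest") as f:
    dset = f.create_dataset("data", shape=(0,), maxshape=(None,), dtype="f8")
    f.swmr_mode = True          # no new objects may be created after this point
    for i in range(3):
        dset.resize((i + 1,))   # appending along an unlimited dimension is allowed
        dset[i] = float(i)
        dset.flush()            # make the new element visible to SWMR readers

# Reader: open read-only with swmr=True; refresh() picks up appended data.
with h5py.File(path, "r", libver="latest", swmr=True) as f:
    d = f["data"]
    d.refresh()
    values = d[:].tolist()

print(values)  # → [0.0, 1.0, 2.0]
```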
We’re pleased to announce the release of HDF5 1.10.0-alpha0.
HDF5 1.10.0, planned for release in Spring 2016, is a major release containing many new features. On January 6, 2016, we announced the release of the first alpha version of the software.
The alpha0 release contains some (but not all) of the features that will be in HDF5 1.10.0. The Single Writer/Multiple Reader and Virtual Dataset features, below, are both contained in this alpha release, as are scalable chunk indexing and persistent free file space tracking. More features, such as enhancements to parallel HDF5 and support for compressing contiguous datasets, will be added in upcoming alpha releases.
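As a taste of the Virtual Dataset feature mentioned above, here is a hedged h5py sketch (our illustration, not code from the release; it assumes h5py 2.9 or later built against HDF5 1.10, and all file and dataset names are hypothetical):

```python
# Hedged VDS sketch: stitch two source files into one virtual 2x4 dataset.
import h5py
import numpy as np

sources = ["src0.h5", "src1.h5"]  # hypothetical source files

# Create two small source files, each with a 1-D dataset of length 4.
for i, path in enumerate(sources):
    with h5py.File(path, "w") as f:
        f["data"] = np.full(4, i, dtype="f8")  # all 0s, then all 1s

# Map each source dataset onto one row of the virtual layout.
layout = h5py.VirtualLayout(shape=(2, 4), dtype="f8")
for i, path in enumerate(sources):
    layout[i] = h5py.VirtualSource(path, "data", shape=(4,))

with h5py.File("vds.h5", "w", libver="latest") as f:
    f.create_virtual_dataset("data", layout)

# Reading the virtual dataset pulls data transparently from the sources.
with h5py.File("vds.h5", "r") as f:
    arr = f["data"][:].tolist()

print(arr)  # → [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
```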
In an earlier blog post, we merely floated the idea of bulk-processing HDF5 files with Apache Spark. In this article, we follow up with a few simple use cases and some numbers for a data collection to which many readers will be able to relate.
If the first question on your mind is, “What kind of resources will I need?”, then you have a valid point, but you also might be the victim of Big Data propaganda. Consider this: “Most people don’t realize how much number crunching they can do on a single computer.”
“If you don’t have big data problems, you don’t need MapReduce and Hadoop. It’s great to know they exist and to know what you could do if you had big-data problems.” (p. 323) In this article, we focus on how far we can push our personal computing devices with Spark, and leave the discussion of Big Iron and Big Data vs. big data, etc. for another day.