UPDATE January 19, 2016: The HDF5-1.10.0-alpha1 release is now available, adding Collective Metadata I/O to these features:
– Concurrent Access to an HDF5 File: Single Writer / Multiple Reader (SWMR)
– Virtual Dataset (VDS)
– Scalable Chunk Indexing
– Persistent Free File Space Tracking
We’re pleased to announce the release of HDF5 1.10.0-alpha0.
HDF5 1.10.0, planned for release in Spring, 2016, is a major release containing many new features. On January 6, 2016 we announced the release of the first alpha version of the software.
The alpha0 release contains some (but not all) of the features that will be in HDF5 1.10.0. The Single Writer/Multiple Reader and Virtual Data Set features, below, are both contained in this alpha release as are scalable chunk indexing and persistent free file space tracking. More features, such as enhancements to parallel HDF5 and support for compressing contiguous datasets will be added in upcoming alpha releases.
HDF Server is a new product from The HDF Group which enables HDF5 resources to be accessed and modified using Hypertext Transfer Protocol (HTTP).
HDF Server , released in February 2015, was first developed as a proof of concept that enabled remote access to HDF5 content using a RESTful API. HDF Server version 0.1.0 wasn’t yet intended for use in a production environment since it didn’t initially provide a set of security features and controls. Following its successful debut, The HDF Group incorporated additional planned features. The newest version of HDF Server provides exciting capabilities for accessing HDF5 data in an easy and secure way. Continue reading →
In the face of naysayers, the SQL abides. Read in our latest blog post how HDF5 and ODBC tie the room together.
When I order my HDF5 at Moe’s Data Diner, I usually ask for extra napkins. It’s a meal where you need both hands, and it can be messy before you get to the juicy bits. At least that’s the way it used to be. It’s easier to dissect with h5py , but what’s the hungry stranger to do, who is just coming through town and who is clueless? When asked the other night, I didn’t fall off my chair nor did I choke, but the question got my head going. What would it look like, that SQL on-the-side thing?
Say, I have an HDF5 dataset at /group1/A/dset2 and would like to select a few elements like so:
SELECT * FROM /group1/A/dset2 WHERE value > -999.0
Nice, but how do I get the result into my favorite analytics tool? Isn’t there some standard pipe or conduit that helps me over that last mile? Well, it’s kind of embarrassing to admit, but it’s been there since the early 1990s and is called Open Database Connectivity (ODBC) .
Think of ODBC as the “USB of data sources.” If you have a USB driver for a device, it’s game on. If you have an ODBC driver for your data source, then it’s SQL, milk, and honey from here. A growing number of applications come with some module or package for accessing data stored in HDF5 files, but you can be almost certain that your tool of choice has an ODBC client built in.
To build an HDF5/ODBC driver, we need a splash of the “secret sauce” shown in the figure above, and this is the subject of this blog post. Spoiler Alert: We are not giving away the recipe.
We’re pleased to announce that The HDF Group is now a member of the Open Commons Consortium(formerly Open Cloud Consortium), a not for profit that manages and operates cloud computing and data commons infrastructure to support scientific, medical, health care and environmental research.
The HDF Group will be participating in the NOAA Data Alliance Working Group (WG) on the WG committee that will determine the datasets to be hosted in the NOAA data commons as well as tools to be used in the computational ecosystem surrounding the NOAA data commons.
“The Open Commons Consortium (OCC) is a truly innovative concept for supporting scientific computing,” said Mike Folk, The HDF Group’s President. “Their cloud computing and data commons infrastructure supports a wide range of research, and OCC’s membership spans government, academia, and the private sector. This is a good opportunity for us to learn about how we can best serve these communities.”
The HDF Group will also participate in the Open Science Data Cloud working group and receive resource allocations on the OSDC Griffin resource. The HDF Group’s John Readey is working with the OCC and others to investigate ways to use Griffin effectively. Readey says, “Griffin is a great testbed for cloud-based systems. With access to object storage (using the AWS/S3 api) and the ability to programmatically create VM’s, we will explore new methods for the analysis of scientific datasets.” Continue reading →
The HDF Group’s support for and use of the Java Programming Language consists of Java wrappers for the HDF4 and HDF5 C libraries, an Object Model definition and implementation, and HDFView, a graphical file viewing application. In this article we’ll discuss what we’re doing now with Java, and look toward the future.
By the time the first public version of the Java Programming Language was released in 1995, various groups at the University of Illinois were already experimenting with the then-new language. Among these efforts was a collaboration among several departments; the goal was to produce data browsing tools for use in astronomy and other scientific fields.1 Because The HDF Group was formed to provide access to scientific and engineering data, it seemed natural to extend this early Java work to the display of HDF files and data products. Continue reading →
Interestingly enough, in addition to being known as the place to go for BBQ and live music, Austin, Texas is a major hub of Python development. Each year, Austin is host to the annual confab of Python developers known as the SciPy Conference. Enthought, a local Python-based company, was the major sponsor of the conference and did a great job of organizing the event. By the way, Enthought is active in Python-based training, and I thought the tutorial sessions I attended were very well done. If you would like to get some expert training on various aspects of Python, check outtheir offerings.
As a first-time conference attendee, I found attending the talks and tutorials very informative and entertaining. The conference’s focus is the set of packages that form the core of the SciPy ecosystem (SciPy, iPython, NumPy, Pandas, Matplotlib, and SymPy) and the ever-increasing number of specialized packages around this core. Continue reading →
We’ve recently announced a new viewer application for HDF5 files: HDF Compass. In this blog post we’ll explore the motivations for providing this tool, review its features, and speculate a bit about future direction for Compass.
HDF Compass is a desktop viewer application for HDF5 and other file formats. A free and open source software product, it runs on Mac OS X, Windows, and Linux.
Before the recent release of our PyHexad Excel add-in for HDF5, the title might have sounded like the slogan of a global coffee and baked goods chain. That was then. Today, it is an expression of hope for the spreadsheet users who run this country and who either felt neglected by the HDF5 community or who might suffer from a medical condition known as data-bulging workbook stress disorder. In this article, I would like to give you a quick overview of the novel PyHexad therapy and invite you to get involved (after consulting with your doctor).
To access the data in HDF5 files from Excel is a frontrunner among the all-time TOP 10 most frequently asked for features. A spreadsheet tool might be a convenient window into, and user interface for, certain data stored in HDF5 files. Such a tool could help overcome Excel storage and performance limitations, and allow data to be freely “shuttled” between worksheets and HDF5 data containers. PyHexad (,,,) is an attempt to further explore this concept. Continue reading →
UPDATE Wednesday, March 23, 2016: The HDF5-1.10.0-pre2 release is now available, featuring:
– Concurrent Access to an HDF5 File: Single Writer / Multiple Reader (SWMR) – Virtual Dataset (VDS) – Scalable Chunk Indexing – Persistent Free Filespace Tracking – Collective Metadata I/O – Integration of Java HDF5 JNI into HDF5 – Many changes have been made to the HDF5 configuration –Unfortunately, parallel HDF5 enhancement has been postponed
This version contains a fix for an issue which occurred when building HDF5 within the source code directory.
The HDF Group is committed to meeting our users’ needs and expectations for managing data in today’s fast evolving computational environment. We are pleased to report that the upcoming major new release of HDF5 (HDF5 1.10.0) will have new capabilities that address important data challenges faced by our community. In this blog we introduce you to some of these exciting new features and capabilities.
More powerful than ever before and packed with new features, the release is scheduled for March, 2016. Among many enhancements, HDF5 1.10.0 addresses:
If you have encountered challenges in any of these areas, then we are certain that the upcoming HDF5 1.10.0 will be of interest to you. Continue reading →