Would you like SQL with your HDF5?

Gerd Heber, The HDF Group

In the face of naysayers, the SQL abides.  Read in our latest blog post how HDF5 and ODBC tie the room together.

When I order my HDF5 at Moe’s Data Diner, I usually ask for extra napkins. It’s a meal where you need both hands, and it can be messy before you get to the juicy bits. At least that’s the way it used to be. It’s easier to dissect with h5py [8], but what is a hungry, clueless stranger who is just passing through town to do? When someone asked me the other night, I neither fell off my chair nor choked, but the question got my head going. What would it look like, that SQL-on-the-side thing?

Say, I have an HDF5 dataset at /group1/A/dset2 and would like to select a few elements like so:

SELECT * FROM /group1/A/dset2 WHERE value > -999.0
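To make the query a bit more tangible, here is a minimal sketch of what it would return. It uses Python's built-in sqlite3 module purely as a stand-in for an SQL engine; it is not the HDF5/ODBC driver and says nothing about how that driver works. The sample values, the single `value` column, and the helper name `select_valid` are all assumptions made for illustration. Note that a path such as /group1/A/dset2 contains slashes, so this sketch double-quotes it as an SQL identifier.

```python
import sqlite3

# Hypothetical sample values; -999.0 plays the role of a fill value.
SAMPLE_VALUES = [-999.0, 1.5, 3.25, -999.0, 42.0]


def select_valid(values, threshold=-999.0):
    """Run the article's query against an in-memory SQLite stand-in."""
    conn = sqlite3.connect(":memory:")
    try:
        # Assumption: the dataset is exposed as a one-column table named
        # after its HDF5 path. The slashes require a quoted identifier.
        conn.execute('CREATE TABLE "/group1/A/dset2" (value REAL)')
        conn.executemany(
            'INSERT INTO "/group1/A/dset2" (value) VALUES (?)',
            [(v,) for v in values],
        )
        # Same shape as the query in the text, with the threshold bound
        # as a parameter instead of being written inline.
        cursor = conn.execute(
            'SELECT * FROM "/group1/A/dset2" WHERE value > ?',
            (threshold,),
        )
        return [row[0] for row in cursor]
    finally:
        conn.close()


if __name__ == "__main__":
    print(select_valid(SAMPLE_VALUES))
```

With a real ODBC driver, the table creation and inserts would disappear: the client would only connect and issue the SELECT.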

Nice, but how do I get the result into my favorite analytics tool? Isn’t there some standard pipe or conduit that helps me over that last mile? Well, it’s kind of embarrassing to admit, but it’s been there since the early 1990s and is called Open Database Connectivity (ODBC) [4].

ODBC driver

Think of ODBC as the “USB of data sources.” If you have a USB driver for a device, it’s game on. If you have an ODBC driver for your data source, then it’s SQL, milk, and honey from here. A growing number of applications come with some module or package for accessing data stored in HDF5 files, but you can be almost certain that your tool of choice has an ODBC client built in.

To build an HDF5/ODBC driver, we need a splash of the “secret sauce” shown in the figure above, and this is the subject of this blog post. Spoiler Alert: We are not giving away the recipe.


The HDF Group is New OCC Member

John Readey, The HDF Group

We’re pleased to announce that The HDF Group is now a member of the Open Commons Consortium (formerly Open Cloud Consortium), a not-for-profit organization that manages and operates cloud computing and data commons infrastructure to support scientific, medical, health care, and environmental research.


The HDF Group will participate in the NOAA Data Alliance Working Group (WG), serving on the WG committee that will determine which datasets are hosted in the NOAA data commons and which tools are used in the computational ecosystem surrounding it.

“The Open Commons Consortium (OCC) is a truly innovative concept for supporting scientific computing,” said Mike Folk, The HDF Group’s President. “Their cloud computing and data commons infrastructure supports a wide range of research, and OCC’s membership spans government, academia, and the private sector.  This is a good opportunity for us to learn about how we can best serve these communities.”

OSDC website

The HDF Group will also participate in the Open Science Data Cloud (OSDC) working group and receive resource allocations on the OSDC Griffin resource.  The HDF Group’s John Readey is working with the OCC and others to investigate ways to use Griffin effectively.  Readey says, “Griffin is a great testbed for cloud-based systems.  With access to object storage (using the AWS/S3 api) and the ability to programmatically create VM’s, we will explore new methods for the analysis of scientific datasets.”

Whither HDF Java?

Joel Plutchak, The HDF Group

The HDF Group’s support for and use of the Java Programming Language consists of Java wrappers for the HDF4 and HDF5 C libraries, an Object Model definition and implementation, and HDFView, a graphical file viewing application. In this article we’ll discuss what we’re doing now with Java, and look toward the future.

The screen capture shows some of the capabilities of the HDFView application.  Displayed is a JPSS Mission VIIRS (Visible Infrared Imaging Radiometer Suite) Day-Night band dataset in table form and image form with false color palette attached.

By the time the first public version of the Java Programming Language was released in 1995, various groups at the University of Illinois were already experimenting with the then-new language.  Among these efforts was a collaboration among several departments; the goal was to produce data browsing tools for use in astronomy and other scientific fields [1].  Because The HDF Group was formed to provide access to scientific and engineering data, it seemed natural to extend this early Java work to the display of HDF files and data products.