America Runs on Excel and HDF5*

* With Python’s Help

Gerd Heber, The HDF Group

Before the recent release of our PyHexad Excel add-in for HDF5 [1], the title might have sounded like the slogan of a global coffee and baked goods chain. That was then. Today, it is an expression of hope for the spreadsheet users who run this country and who either felt neglected by the HDF5 community or who might suffer from a medical condition known as data-bulging workbook stress disorder. In this article, I would like to give you a quick overview of the novel PyHexad therapy and invite you to get involved (after consulting with your doctor).

Accessing the data in HDF5 files from Excel is a frontrunner among the ten most frequently requested features of all time. A spreadsheet tool might be a convenient window into, and user interface for, certain data stored in HDF5 files. Such a tool could help overcome Excel's storage and performance limitations, and allow data to be freely "shuttled" between worksheets and HDF5 data containers. PyHexad ([4],[5],[6],[7]) is an attempt to further explore this concept.

HDF5 for the Web – HDF Server

John Readey, The HDF Group

HDF5 is a great way to store large data collections, but size can pose its own challenges.  As a thought experiment, imagine this scenario:

You write an application that creates the ultimate Monte Carlo simulation of the Monopoly game. The application plays through thousands of simulated games for each of a hundred different strategies and saves its results to an HDF5 file. Given that we want to capture all the data from each simulation, let's suppose the resultant HDF5 file is over a gigabyte in size.

Naturally, you'd like to share these results with all your Monopoly-playing, statistically-minded friends, but herein lies the problem: How can you make this data accessible? Your file is too large to put on Dropbox, and even if you did use an online storage provider, interested parties would need to download the entire file when perhaps they are only interested in the results for "Strategy #89: Buy just Park Place and Boardwalk." If we could store the data in one place but enable access to it over the web using all the typical HDF5 operations (listing links, getting type information, reading dataset slices, etc.), that would be the answer to our conundrum.
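To make the idea of fetching only one slice concrete, here is a minimal Python sketch of how a client might form a request for part of a dataset instead of downloading the whole file. The endpoint layout (`/datasets/<id>/value`), the `select` query parameter, the host name, and the dataset identifier are all assumptions made up for illustration; the excerpt above does not specify the server's actual interface.

```python
from urllib.parse import urlencode


def slice_url(base_url, dataset_id, start, stop):
    """Build a URL asking for rows [start, stop) of a single dataset.

    The path layout and the `select` parameter are hypothetical
    placeholders for illustration, not a documented API.
    """
    if not 0 <= start < stop:
        raise ValueError("need 0 <= start < stop")
    # urlencode percent-escapes the brackets and colon in the selection.
    query = urlencode({"select": f"[{start}:{stop}]"})
    return f"{base_url.rstrip('/')}/datasets/{dataset_id}/value?{query}"


# Hypothetical host and dataset name for the "Strategy #89" results.
url = slice_url("https://example.org/monopoly", "strategy-89", 0, 100)
print(url)
```

The point of the sketch is that the selection travels with the request, so only the first hundred rows of one strategy's results would need to cross the network rather than the full gigabyte.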