HDF at SciPy2015

John Readey, The HDF Group


Interestingly enough, in addition to being known as the place to go for BBQ and live music, Austin, Texas is a major hub of Python development.  Each year, Austin is host to the annual confab of Python developers known as the SciPy Conference.  Enthought, a local Python-based company, was the major sponsor of the conference and did a great job of organizing the event.  By the way, Enthought is active in Python-based training, and I thought the tutorial sessions I attended were very well done.  If you would like to get some expert training on various aspects of Python, check out their offerings.

As a first-time conference attendee, I found the talks and tutorials very informative and entertaining.  The conference’s focus is the set of packages that form the core of the SciPy ecosystem (SciPy, IPython, NumPy, Pandas, Matplotlib, and SymPy) and the ever-increasing number of specialized packages around this core.

HDF5 for the Web – HDF Server

John Readey, The HDF Group

HDF5 is a great way to store large data collections, but size can pose its own challenges.  As a thought experiment, imagine this scenario:

You write an application that creates the ultimate Monte Carlo simulation of the Monopoly game. The application plays through thousands of simulated games for a hundred different strategies and saves its results to an HDF5 file. Given that we want to capture all the data from each simulation, let’s suppose the resultant HDF5 file is over a gigabyte in size.
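To make the scenario concrete, here is a minimal sketch of how such results might be written with h5py. The file name, group layout, dataset names, and the stand-in random data are illustrative assumptions, not details from the scenario above.

```python
import numpy as np
import h5py

n_games = 1000          # simulated games per strategy (assumed)
n_strategies = 100      # different strategies to compare

# Hypothetical output file; one group per strategy, one dataset per result.
with h5py.File("monopoly_results.h5", "w") as f:
    for s in range(n_strategies):
        grp = f.create_group(f"strategy_{s:03d}")
        # Stand-in data: final net worth for each simulated game.
        net_worth = np.random.normal(loc=1500.0, scale=400.0, size=n_games)
        dset = grp.create_dataset("final_net_worth", data=net_worth,
                                  compression="gzip")
        dset.attrs["description"] = "Final net worth per simulated game"
```

With per-game results for every strategy stored this way, it is easy to see how the file grows well past what you would want every interested reader to download.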

Naturally, you’d like to share these results with all your Monopoly-playing, statistically-minded friends, but herein lies the problem: how can you make this data accessible?  Your file is too large to put on Dropbox, and even if you did use an online storage provider, interested parties would need to download the entire file when perhaps they are only interested in the results for “Strategy #89: Buy just Park Place and Boardwalk.”  If we could store the data in one place but enable access to it over the web using all the typical HDF5 operations (listing links, getting type information, dataset slices, etc.), that would be the answer to our conundrum.
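As a rough illustration of what that web access could look like, here is a hedged sketch using Python’s requests library against HDF Server’s REST API. The server address, domain name, JSON field names, and the dataset UUID placeholder are assumptions based on the HDF Server documentation; consult the actual docs for the exact endpoints and parameters.

```python
import requests

endpoint = "http://127.0.0.1:5000"        # assumed local HDF Server instance
domain = "monopoly.exampledata.org"       # hypothetical file/domain name
headers = {"accept": "application/json"}
params = {"host": domain}

# Get the root group of the domain.
root = requests.get(endpoint + "/", headers=headers, params=params).json()
root_id = root["root"]

# List the links under the root group (e.g. one group per strategy).
links = requests.get(f"{endpoint}/groups/{root_id}/links",
                     headers=headers, params=params).json()
for link in links["links"]:
    print(link["title"])

# Fetch just a slice of one dataset instead of downloading the whole file.
dset_id = "<dataset-uuid>"                # placeholder: taken from the links above
params["select"] = "[0:100]"              # first 100 values only
values = requests.get(f"{endpoint}/datasets/{dset_id}/value",
                      headers=headers, params=params).json()
print(values["value"][:10])
```

The key point is that only the requested slice travels over the wire; the gigabyte-scale file stays in one place on the server.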