Anthony Scopatz, Assistant Professor at the University of South Carolina, HDF guest blogger
“Python is great and its ecosystem for scientific computing is world class. HDF5 is amazing and is rightly the gold standard for persistence for scientific data.Many people use HDF5 from Python, and this number is only growing due to pandas’ HDFStore. However, using HDF5 from Python has at least one more knot than it needs to. Let’s change that.”
Almost immediately when going to use HDF5 from Python you are faced with a choice between two fantastic packages with overlapping capabilities: h5py and PyTables. h5py wraps the HDF5 API more closely using autogenerated Cython. PyTables, while also wrapping HDF5, focuses more on a Table data structure and adds in sophisticated indexing and out-of-core querying. Which package you use depends on your use case – and sometimes you really need both!
At SciPy 2015, developers from PyTables, h5py, The HDF Group, pandas, as well as community members sat down and talked about what to do to make the story for Python and HDF5 more streamlined and more maintainable. Here is what we came up with: Continue reading →
David Dotson, doctoral student, Center for Biological Physics, Arizona State University; HDF Guest Blogger
Recently I had the pleasure of meeting Anthony Scopatz for the first time at SciPy 2015, and we talked shop. I was interested in his opinions on MDSynthesis, a Python package our lab has designed to help manage the complexity of raw and derived data sets from molecular dynamics simulations, about which I was presenting a poster (click zip file to download).
In particular, I wanted his thoughts on how we are leveraging HDF5, and whether we could be doing it better. The discussion gave me plenty to think about going forward, but it also put me in contact with some of the other folks involved in the Python ecosystem surrounding HDF5. Long story short, I was asked to share how we were using HDF5 with a guest post on the HDF Group blog.
First a bit of background. At the Beckstein Lab we perform physics-based simulations of proteins, the molecular machines of life, in order to get at how they do what they do. These simulations may include thousands to millions of atoms, with the raw data a trajectory of their positions with time, which can have hundreds to millions of frames. Continue reading →
Interestingly enough, in addition to being known as the place to go for BBQ and live music, Austin, Texas is a major hub of Python development. Each year, Austin is host to the annual confab of Python developers known as the SciPy Conference. Enthought, a local Python-based company, was the major sponsor of the conference and did a great job of organizing the event. By the way, Enthought is active in Python-based training, and I thought the tutorial sessions I attended were very well done. If you would like to get some expert training on various aspects of Python, check outtheir offerings.
As a first-time conference attendee, I found attending the talks and tutorials very informative and entertaining. The conference’s focus is the set of packages that form the core of the SciPy ecosystem (SciPy, iPython, NumPy, Pandas, Matplotlib, and SymPy) and the ever-increasing number of specialized packages around this core. Continue reading →
Before the recent release of our PyHexad Excel add-in for HDF5, the title might have sounded like the slogan of a global coffee and baked goods chain. That was then. Today, it is an expression of hope for the spreadsheet users who run this country and who either felt neglected by the HDF5 community or who might suffer from a medical condition known as data-bulging workbook stress disorder. In this article, I would like to give you a quick overview of the novel PyHexad therapy and invite you to get involved (after consulting with your doctor).
To access the data in HDF5 files from Excel is a frontrunner among the all-time TOP 10 most frequently asked for features. A spreadsheet tool might be a convenient window into, and user interface for, certain data stored in HDF5 files. Such a tool could help overcome Excel storage and performance limitations, and allow data to be freely “shuttled” between worksheets and HDF5 data containers. PyHexad (,,,) is an attempt to further explore this concept. Continue reading →
“…HDF5 is that rare product which excels in two fields: archiving and sharing data according to strict standardized conventions, and also ad-hoc, highly flexible and iterative use for local data analysis. For more information on using Python together with HDF5…”
An enormous amount of effort has gone into the HDF ecosystem over the past decade. Because of a concerted effort between The HDF Group, standards bodies, and analysis software vendors, HDF5 is one of the best technologies on the planet for sharing numerical data. Not only is the format itself platform-independent, but nearly every analysis platform in common use can read HDF5. This investment continues with tools like HDF Product Designer and the REST-based H5Serv project, for sharing data using the HDF5 object model over the Internet.
What I’d like to talk about today is something very different: the way that I and many others in the Python world use HDF5, not for widely-shared data but for data that may never even leave the local disk… Continue reading →