David Dotson, doctoral student, Center for Biological Physics, Arizona State University; HDF Guest Blogger
Recently I had the pleasure of meeting Anthony Scopatz for the first time at SciPy 2015, and we talked shop. I was interested in his opinions on MDSynthesis, a Python package our lab has designed to help manage the complexity of raw and derived data sets from molecular dynamics simulations, about which I was presenting a poster.
In particular, I wanted his thoughts on how we are leveraging HDF5, and whether we could be doing it better. The discussion gave me plenty to think about going forward, but it also put me in contact with some of the other folks involved in the Python ecosystem surrounding HDF5. Long story short, I was asked to share how we were using HDF5 with a guest post on the HDF Group blog.
First, a bit of background. At the Beckstein Lab we perform physics-based simulations of proteins, the molecular machines of life, in order to get at how they do what they do. These simulations may include thousands to millions of atoms. The raw data is a trajectory of the atoms' positions over time, which can span hundreds to millions of frames.
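To give a rough sense of the scale those numbers imply, here is a back-of-the-envelope sketch. It assumes three single-precision (4-byte) coordinates per atom per frame and no compression; the function name and defaults are illustrative and are not part of MDSynthesis or any trajectory format.

```python
def trajectory_bytes(n_atoms, n_frames, dims=3, bytes_per_coord=4):
    """Estimate the raw size of a trajectory that stores positions only.

    Assumes `dims` coordinates per atom per frame, each stored in
    `bytes_per_coord` bytes, with no compression or metadata.
    """
    return n_atoms * n_frames * dims * bytes_per_coord


# A small system: 1,000 atoms over 100 frames.
small = trajectory_bytes(1_000, 100)

# A larger system: 100,000 atoms over 1,000,000 frames.
large = trajectory_bytes(100_000, 1_000_000)

print(f"small: {small / 1e6:.1f} MB")
print(f"large: {large / 1e12:.1f} TB")
```

Under these assumptions the small case is about a megabyte and the large one is over a terabyte, which is why managing raw and derived data sets becomes a problem in its own right.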
In my previous blog post, I discussed the need for parallel I/O and a few paradigms for doing parallel I/O from applications. HDF5 is an I/O middleware library that supports (or will support in the near future) most of the I/O paradigms we talked about.
In this blog post I will discuss how to use HDF5 to implement some of the parallel I/O methods and some of the ongoing research to support new I/O paradigms. I will not discuss pros and cons of each method since we discussed those in the previous blog post.
But before getting into how HDF5 supports parallel I/O, let’s address a question that comes up often:
“Why do I need Parallel HDF5 when the MPI standard already provides an interface for doing I/O?”
“Any software used in the computational sciences needs to excel in the area of high performance computing (HPC).”
The Computational Fluid Dynamics (CFD) General Notation System (CGNS) is an effort to standardize CFD input and output data, including grid (both structured and unstructured), flow solution, connectivity, boundary conditions, and auxiliary information. It provides a general, portable, and extensible standard for the storage and retrieval of CFD analysis data. The system consists of two parts: (1) a standard format for recording the data, and (2) software that reads, writes, and modifies data in that format.