The article leads with, “In the quarter century since the US last exploded a nuclear weapon, an extensive research enterprise has maintained the resources and know-how needed to preserve confidence in the country’s stockpile.” It goes on to recount how the US Department of Energy (DOE) and its Los Alamos, Sandia and Lawrence Livermore national laboratories pioneered high-performance computing, using computer simulation as a replacement for the actual building and testing of the country’s nuclear weapons stockpile.
Although HDF5 is not named in this article, the histories of The HDF Group and HDF5 are closely linked to this larger story of American science and geopolitics. In 1993, DOE determined that its computing capabilities would require massive improvements, as the article says, to “ramp up computation speeds by a factor of 10,000 over the highest performing computers at the time, equivalent to a factor of 1 million over computers routinely used for nuclear calculations… To meet the [ten-year] goal, the DOE laboratories had to engage the computer industry in massively parallel processing, a technology that was just becoming available, to develop not just new hardware but new software and visualization techniques.”
HDFql (Hierarchical Data Format query language) was recently released to enable users to handle HDF5 files with a language as easy and powerful as SQL.
By providing a simpler, cleaner, and faster interface for HDF across C/C++/Java/Python/C#, HDFql aims to ease scientific computing, big data management, and real-time analytics. As the author of HDFql, Rick is collaborating with The HDF Group by integrating HDFql with tools such as HDF Compass, while continuously improving HDFql to meet user needs.
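To give a flavor of the approach, here is a minimal sketch of HDFql’s SQL-like syntax (the file and dataset names are illustrative, and exact statements may vary between HDFql versions):

```sql
-- create an HDF5 file and make it the file in use
CREATE FILE example.h5
USE FILE example.h5

-- create a one-dimensional dataset of three floats and populate it
CREATE DATASET temperature AS FLOAT(3)
INSERT INTO temperature VALUES(12.5, 18.2, 21.0)

-- read the values back
SELECT FROM temperature

CLOSE FILE
```

Each statement above stands in for what would otherwise be several calls to the underlying HDF5 C API (file creation, dataspace and dataset creation, and the write itself).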
If you’re handling HDF files on a regular basis, chances are you’ve had your (un)fair share of programming headaches. Sure, you might have gotten used to the hassle, but navigating the current APIs probably feels a tad like filing expense reports: rarely a complete pleasure!
If you’re new to HDF, you might seek to avoid the format altogether. Even trained users have been known to occasionally scout for alternatives. One doesn’t have to have a limited tolerance for unnecessary complexity to get queasy around these APIs – one simply needs a penchant for clean and simple data management.
This is what we heard from scientists and data veterans when asked about HDF. It’s what challenged our own synapses and inspired us to create HDFql. Because on the flip-side, we also heard something else:
- HDF has proven immensely valuable in research and science
- the data format pushes the boundaries of what is achievable with large and complex datasets
- and it provides an edge in speed and fast access, which is critical in the big data / advanced analytics arena
With an aspiration of becoming the de facto language for HDF, we hope that HDFql will play a vital role in the future of HDF data management by:
- Enabling current users to arrive at (scientific) insights faster via cleaner data-handling experiences
- Inspiring prospective users to adopt the powerful HDF format by removing current roadblocks
- Perhaps even winning over a few HDF challengers or dissenters along the way…
Many NASA HDF and HDF5 data products can be visualized via the Hyrax OPeNDAP server through Hyrax’s HDF4 and HDF5 handlers. Now we’ve enhanced the HDF5 OPeNDAP handler so that SMAP level 1, level 3 and level 4 products can be displayed properly using popular visualization tools.
Organizations in both the public and private sectors use HDF to meet long-term, mission-critical data management needs. For example, NASA’s Earth Observing System, the primary data repository for understanding global climate change, uses HDF. Over the lifetime of the project, which began in 1999, NASA has stored 15 petabytes of satellite data in HDF, which will remain accessible to NASA data centers and NASA HDF end users for many years to come.
In a previous blog post, we discussed using the Hyrax OPeNDAP web server to serve NASA HDF4 and HDF5 products. Each year, The HDF Group has enhanced the HDF4 and HDF5 handlers that work within the Hyrax OPeNDAP framework to support all sorts of NASA HDF data products, making them display properly in, and interoperate with, popular Earth Science tools such as NASA’s Panoply and UCAR’s IDV.
The HDF Group is collaborating with the University of California, Santa Barbara and the Data Observation Network for Earth (DataONE) to help scientific research communities enhance the consistency and quality of their metadata, fostering discovery, access and understanding of data resources. As part of this collaboration, on February 9, 2016, The HDF Group’s Ted Habermann, Director of Earth Science, and Lindsay Powers, Deputy Director of Earth Science, will co-lead a webinar, “Sharing Data Through Guided Metadata Improvement,” along with Matthew Jones, Director of Informatics Research at the National Center for Ecological Analysis and Synthesis.
The ESIP Federation comes together twice each year to discuss topics around changing technology, data, information and knowledge in support of society. ESIP meetings are interdisciplinary and inclusive. Among the attendees are Earth science data and information technology practitioners; researchers representing a variety of scientific domains that include land, atmosphere, ocean, solid earth, ecology, data and social sciences; science educators; and anyone working in science and technology-related fields who is interested in advancing Earth science information best practices in an open and transparent fashion.
The 2015 HDF workshop held during the ESIP Summer Meeting was a great success thanks to more than 40 participants throughout the four sessions. The workshop was an excellent opportunity for us to interact with HDF community members to better understand their needs and introduce them to new technologies. You can view the slide presentations from the workshop here.
From my perspective, the highlight of the workshop was the Vendors and Tools Session, where Ellen Johnson (MathWorks), Christine White (Esri), Brian Tisdale (NASA), and Gerd Heber (The HDF Group) spoke about new and improved applications of HDF technologies.
David Dotson, doctoral student, Center for Biological Physics, Arizona State University; HDF Guest Blogger
Recently I had the pleasure of meeting Anthony Scopatz for the first time at SciPy 2015, and we talked shop. I was interested in his opinions on MDSynthesis, a Python package our lab has designed to help manage the complexity of raw and derived data sets from molecular dynamics simulations, about which I was presenting a poster.
In particular, I wanted his thoughts on how we are leveraging HDF5, and whether we could be doing it better. The discussion gave me plenty to think about going forward, but it also put me in contact with some of the other folks involved in the Python ecosystem surrounding HDF5. Long story short, I was asked to share how we were using HDF5 with a guest post on the HDF Group blog.
First, a bit of background. At the Beckstein Lab we perform physics-based simulations of proteins, the molecular machines of life, in order to get at how they do what they do. These simulations may include thousands to millions of atoms, with the raw data being a trajectory of their positions over time, which can span hundreds to millions of frames.
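A trajectory like this maps naturally onto a single HDF5 dataset. Here is a minimal sketch with h5py (the file name, dataset name, and shapes are illustrative, not MDSynthesis’s actual layout):

```python
import numpy as np
import h5py

n_frames, n_atoms = 100, 50  # toy sizes; real trajectories reach millions of frames

# Store positions as a (frames, atoms, xyz) array, chunked one frame at a
# time so a single frame can be read without loading the whole trajectory.
with h5py.File("trajectory.h5", "w") as f:
    dset = f.create_dataset(
        "positions",
        shape=(n_frames, n_atoms, 3),
        dtype="f4",
        chunks=(1, n_atoms, 3),
    )
    dset[:] = np.random.random((n_frames, n_atoms, 3)).astype("f4")
    dset.attrs["units"] = "nm"  # metadata travels with the data

# Read back a single frame cheaply
with h5py.File("trajectory.h5", "r") as f:
    frame0 = f["positions"][0]
    print(frame0.shape)  # (50, 3)
```

Chunking by frame is one reasonable layout choice here: derived analyses often iterate frame by frame, and per-frame chunks keep those reads contiguous on disk.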
Interestingly enough, in addition to being known as the place to go for BBQ and live music, Austin, Texas is a major hub of Python development. Each year, Austin is host to the annual confab of Python developers known as the SciPy Conference. Enthought, a local Python-based company, was the major sponsor of the conference and did a great job of organizing the event. By the way, Enthought is active in Python-based training, and I thought the tutorial sessions I attended were very well done. If you would like to get some expert training on various aspects of Python, check out their offerings.
As a first-time conference attendee, I found attending the talks and tutorials very informative and entertaining. The conference’s focus is the set of packages that form the core of the SciPy ecosystem (SciPy, IPython, NumPy, Pandas, Matplotlib, and SymPy) and the ever-increasing number of specialized packages around this core.
I first heard of HDF during the “Data Format Wars” of the 1990s. These “battles” centered on the selection of a format for the emerging NASA Earth Observing System archives, and there were a number of contenders. HDF won that battle in the end because of the inherent flexibility of the format and the tools for reading and writing it.
Now, twenty years later, HDF has emerged as the foundation format for an incredibly diverse and growing selection of scientific and commercial disciplines.
Is it the inherent flexibility of the format that has led to this success? Maybe, but I would pick information integration as the killer HDF feature.