The HDF 2015 Workshop at the ESIP Summer Meeting

Lindsay Powers, The HDF Group

The 2015 HDF workshop held during the ESIP Summer Meeting was a great success thanks to more than 40 participants throughout the four sessions.  The workshop was an excellent opportunity for us to interact with HDF community members to better understand their needs and introduce them to new technologies. You can view the slide presentations from the workshop here.

From my perspective, the highlight of the workshop was the Vendors and Tools Session, where Ellen Johnson (MathWorks), Christine White (Esri), Brian Tisdale (NASA), and Gerd Heber (The HDF Group) talked about new and improved applications of HDF technologies.  For example:   Continue reading

ESIP Summer Meeting – HDF Workshop and Town Hall

Lindsay Powers, The HDF Group

Please join us to learn about new HDF tools, projects and perspectives.

The HDF Group will be hosting a one-day workshop at the upcoming Federation for Earth Science Information Partners (ESIP) Summer Meeting in Asilomar, CA on Tuesday, July 14th.

There will also be an HDF Town Hall meeting on Wednesday afternoon, July 15th.

Please join us for any and all of the events.  If you are unable to join us in person, you may participate through remote access. Remote access details will be made available through the ESIP meeting website. Questions? Contact Lindsay at lpowers@hdfgroup.org.

The agenda for the July 14 HDF Group workshop:  Continue reading

Putting some Spark into HDF-EOS

Gerd Heber and Joe Lee, The HDF Group

In an earlier blog post [3], we merely floated the idea of bulk-processing HDF5 files with Apache Spark. In this article, we follow up with a few simple use cases and some numbers for a data collection to which many readers will be able to relate.

If the first question on your mind is, “What kind of resources will I need?”, then you have a valid point, but you also might be the victim of BigData propaganda. Consider this: “Most people don’t realize how much number crunching they can do on a single computer.”

Aura: “A mission dedicated to the health of the earth’s atmosphere” using HDF technologies.  EOS Satellite Image courtesy of Jesse Allen, NASA Earth Observatory/SSAI

“If you don’t have big data problems, you don’t need MapReduce and Hadoop. It’s great to know they exist and to know what you could do if you had big-data problems.” ([5], p. 323)  In this article, we focus on how far we can push our personal computing devices with Spark, and leave the discussion of Big Iron and Big Data vs. big data vs. big data, etc. for another day.  Continue reading

From HDF5 Datasets to Apache Spark RDDs

Gerd Heber, The HDF Group

“I would like to do something with all the datasets in all the HDF5 files in this directory, but I’m not sure how to proceed.”

If this sounds all too familiar, then reading this article might be worth your while. The accepted general answer is to write a Python script (and use h5py [1]), but I am not going to repeat here what you know already. Instead, I will show you how to hot-wire one of the new shiny engines, Apache Spark [2], and make a few suggestions on how to reduce the coding on your part while opening the door to new opportunities.

But what about Hadoop? There is no out-of-the-box interoperability between HDF5 and Hadoop. See our BigHDF FAQs [3] for a few glimmers of hope. Major points of contention remain, such as HDFS’s “blocked” worldview and its aversion to relatively small objects, and HDF5’s determination to keep its smarts away from prying eyes. Spark is more relaxed and works happily with HDFS, Amazon S3, and, yes, a local file system or NFS. More importantly, with its Resilient Distributed Datasets (RDD) [4], it raises the level of abstraction and overcomes several Hadoop/MapReduce shortcomings when dealing with iterative methods. See reference [5] for an in-depth discussion.
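To make the idea concrete, here is a minimal sketch of what "all the datasets in all the HDF5 files in this directory" could look like as a Spark RDD. It is not necessarily the approach taken in the full article. The function names (find_hdf5_files, list_datasets, datasets_rdd), the list of file extensions, and the record layout are assumptions made for illustration. The sketch also assumes that h5py and PySpark are installed and that every Spark worker can see the files under the same path, for example on a local file system or NFS.

```python
import os


def find_hdf5_files(directory, extensions=(".h5", ".hdf5", ".he5")):
    """Return the sorted paths of files in `directory` with an HDF5-style extension.

    The extension list is an assumption; adjust it to your collection.
    """
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(extensions)
    )


def list_datasets(path):
    """Return one (file, dataset name, shape, dtype) tuple per dataset in a file."""
    import h5py  # imported lazily so the file search above works without h5py

    found = []

    def visit(name, obj):
        # visititems walks every group and dataset; keep only the datasets.
        if isinstance(obj, h5py.Dataset):
            found.append((path, name, obj.shape, str(obj.dtype)))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found


def datasets_rdd(sc, directory, num_slices=None):
    """Build an RDD with one record per dataset; `sc` is an existing SparkContext."""
    paths = find_hdf5_files(directory)
    # Distribute the file names, then let each task open its files with h5py.
    return sc.parallelize(paths, num_slices).flatMap(list_datasets)
```

With a SparkContext sc at hand, datasets_rdd(sc, "/path/to/files").count() would report the number of datasets found. Only file names travel through Spark in this sketch; each task opens its files itself, which sidesteps the lack of HDF5 support in HDFS.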

Figure 1.  A simple HDF5/Spark scenario

As our model problem (see Figure 1), consider the following scenario: Continue reading