Blog

Jun 2015

Letter to the HDF User Community

Lindsay Powers, The HDF Group

The HDF Group provides free, open-source software that is widely used in government, academia and industry. The goal of The HDF Group is to ensure the sustainable development of HDF (Hierarchical Data Format) technologies and the ongoing accessibility of HDF-stored data, because users and organizations rely on these technologies for mission-critical systems and archives. These users and organizations are a critical element of the HDF community and an important source of new and innovative uses of, and sustainability for, the HDF platforms, libraries and tools. We want to create a sustainability model for the open access platforms and libraries that can serve these diverse communities in the future use and preservation of their data. As a...

Jun 2015

America Runs on Excel and HDF5*

* With Python’s Help

Gerd Heber, The HDF Group

Before the recent release of our PyHexad Excel add-in for HDF5^[1], the title might have sounded like the slogan of a global coffee and baked goods chain. That was then. Today, it is an expression of hope for the spreadsheet users who run this country and who either felt neglected by the HDF5 community or who might suffer from a medical condition known as data-bulging workbook stress disorder. In this article, I would like to give you a quick overview of the novel PyHexad therapy and invite you to get involved (after consulting with your doctor).

Access to HDF5 data from Excel is a frontrunner among the all-time top 10 most frequently requested features. A spreadsheet tool might be a convenient window into, and user interface for, certain data stored in HDF5 files. Such a tool could help overcome Excel's storage and performance limitations, and allow data to be freely “shuttled” between worksheets and HDF5 data containers. PyHexad (^[4],^[5],^[6],^[7]) is an attempt to further explore this concept.

May 2015

What’s coming in the HDF5 1.10.0 Release?

Elena Pourmal and Quincey Koziol, The HDF Group

UPDATE: Check our support pages for the newest version of HDF5-1.10.0.

Concurrent Access to an HDF5 File: Single Writer / Multiple Reader (SWMR)
Virtual Dataset (VDS)
Scalable Chunk Indexing
Persistent Free Filespace Tracking
Collective Metadata I/O
Integration of Java HDF5 JNI into HDF5
Many changes have been made to the HDF5 configuration
Unfortunately, parallel HDF5 enhancement has been postponed

This version contains a fix for an issue that occurred when building HDF5 within the source code directory. Check our downloads page for more information. We are still on target for releasing HDF5-1.10.0 next week; let us know if you have any comments! The HDF Group is committed to meeting our users' needs and expectations for...

May 2015

Worried about your unlimited data plan bills? Cut them with OPeNDAP

Large, rich and complex collections of HDF data can be filtered and viewed with the help of OPeNDAP. HDF data can be provided in manageable servings, on demand in real time, inexpensively, even on the user's desktop or mobile device....
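Serving data in "manageable servings" relies on subsetting at the server: a DAP2 client asks for only a slice of one variable by appending a constraint expression to the dataset URL. The helper below is a minimal sketch that builds such a URL, assuming a DAP2 server (Hyrax, for example) that accepts the `.dods` binary data response suffix; the server address and variable name in the example are hypothetical.

```python
from urllib.parse import quote


def dap2_subset_url(
    dataset_url: str,
    variable: str,
    ranges: list[tuple[int, int, int]],
    response: str = "dods",
) -> str:
    """Build a DAP2 request for a hyperslab of one variable.

    ranges holds one (start, stride, stop) triple per dimension; DAP2 indices
    are zero-based and stop is inclusive.
    """
    for start, stride, stop in ranges:
        if start < 0 or stride < 1 or stop < start:
            raise ValueError(f"invalid range {(start, stride, stop)}")
    hyperslab = "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in ranges)
    # Only the selected elements travel over the network, not the whole file.
    return f"{dataset_url}.{response}?{quote(variable)}{hyperslab}"


# Hypothetical server and variable: every 2nd column of the first 10 rows.
url = dap2_subset_url(
    "http://example.com/opendap/sample.h5", "temperature", [(0, 1, 9), (0, 2, 20)]
)
print(url)
```

Some command-line tools treat square brackets specially (curl, for one, needs `-g` to disable globbing), so pass such URLs with care.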

May 2015

The HDF5 “Value Proposition” for the Fusion Data Lifecycle

When storing data, the rich, portable metadata capabilities, including directed graph structures (e.g., hierarchies), complex attributes, and inter-object references make HDF5 a superior choice for maintaining the bond between data and metadata at the lowest level. Community involvement is an essential part of the HDF Group’s mission: It is vital to sustaining the business and is our brain trust when making decisions about changes to HDF5, setting priorities, and adding new features. ...

Apr 2015

HDF5 Data Compression Demystified #1

Elena Pourmal, The HDF Group

What happened to my compression? One of the most powerful features of HDF5 is the ability to compress or otherwise modify, or “filter,” your data during I/O. By far, the most common user-defined filters are ones that perform data compression. As you know, there are many compression options. There are filters provided by the HDF5 library (“predefined filters”), which include several types of filters for data compression, data shuffling and checksums. Users can also implement their own “user-defined filters” and employ them with the HDF5 library.

Cars in a 1973 Philadelphia junkyard (image from National Archives and Records Administration)

While the programming model and usage of the compression filters are straightforward, it is possible for...
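The data-shuffling filter is easiest to understand in isolation. In HDF5, filters run on each chunk of a chunked dataset, and shuffle is typically placed before deflate so that bytes of equal significance sit next to each other when the compressor sees them. The pure-Python sketch below is not the HDF5 library's implementation; it only mimics the idea, with the standard `zlib` module standing in for the deflate (gzip) filter and the helper names `shuffle`/`unshuffle` chosen for illustration.

```python
import struct
import zlib


def shuffle(data: bytes, element_size: int) -> bytes:
    """Byte-transpose: byte 0 of every element, then byte 1 of every element, ..."""
    if len(data) % element_size:
        raise ValueError("data length must be a multiple of element_size")
    # data[k::element_size] collects byte k of every element
    return b"".join(data[k::element_size] for k in range(element_size))


def unshuffle(data: bytes, element_size: int) -> bytes:
    """Inverse of shuffle(): interleave the byte planes back into elements."""
    n = len(data) // element_size
    out = bytearray(len(data))
    for k in range(element_size):
        out[k::element_size] = data[k * n:(k + 1) * n]
    return bytes(out)


# 10,000 consecutive little-endian 32-bit integers: neighboring values share
# their high-order bytes, which shuffling groups together.
raw = struct.pack("<10000i", *range(10_000))

deflate_only = zlib.compress(raw, 6)
shuffle_then_deflate = zlib.compress(shuffle(raw, 4), 6)
print(len(raw), len(deflate_only), len(shuffle_then_deflate))

# Reading back reverses the pipeline: decompress first, then unshuffle.
assert unshuffle(zlib.decompress(shuffle_then_deflate), 4) == raw
```

The shuffle step does not shrink anything by itself; it only rearranges bytes so that the compressor that follows finds longer repeated runs.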

Apr 2015

Putting some Spark into HDF-EOS

...we focus on how far we can push our personal computing devices with Spark. The collection consists of 7,850 HDF-EOS5 files covering 27 years and totals about 120 GB. We use a driver script that reads a dataset of interest from each file in the collection, computes per-file quantities of interest, and gathers them in a CSV file for visualization. On our reference tablet machine, processing 3.5 years of data with 4 logical processors took about 10 seconds...
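The driver pattern described above (read one dataset per file, reduce it to a few per-file numbers, gather everything in one CSV file) can be sketched without Spark using Python's standard `concurrent.futures`. This is a swapped-in technique, not the post's Spark code. The per-file function is a hypothetical placeholder supplied by the caller; a real one would open each HDF-EOS5 file with an HDF5 library.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Mapping, TextIO


def gather_to_csv(
    paths: Iterable[str],
    summarize_file: Callable[[str], Mapping[str, float]],
    out: TextIO,
    workers: int = 4,
) -> int:
    """Apply summarize_file to every path in parallel; write one CSV row per file.

    summarize_file must return the same keys for every file.
    Returns the number of rows written.
    """
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so rows line up with paths.
        results = list(pool.map(summarize_file, paths))
    if not results:
        return 0
    writer = csv.DictWriter(out, fieldnames=["file", *results[0].keys()])
    writer.writeheader()
    for path, stats in zip(paths, results):
        writer.writerow({"file": path, **stats})
    return len(results)


# Hypothetical stand-in for "read a dataset and compute per-file quantities".
def fake_summary(path: str) -> dict[str, float]:
    return {"name_length": float(len(path))}


buf = io.StringIO()
gather_to_csv(["granule_a.he5", "granule_b.he5"], fake_summary, buf)
print(buf.getvalue())
```

Threads keep the sketch simple and avoid pickling concerns; for CPU-heavy per-file work in pure Python, `ProcessPoolExecutor` (with a module-level function) is the usual substitute. When writing to a real file, open it with `newline=""` as the `csv` module recommends.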

Apr 2015

Parallel I/O – Why, How, and Where to?

Mohamad Chaarawi, The HDF Group

First in a series: parallel HDF5

What costs applications a lot of time and resources that could otherwise go to actual computation? Slow I/O. It is well known that I/O subsystems are very slow compared to other parts of a computing system. Applications use I/O to store simulation output for later use by analysis applications, to checkpoint application memory to guard against system failure, to apply out-of-core techniques to data that does not fit in a processor's memory, and so on. I/O middleware libraries such as HDF5 give application users a rich interface for organizing their data and storing it efficiently. Such I/O libraries invest a lot of effort in reducing, or completely hiding, the cost of I/O from applications.

Parallel I/O is one technique used to access data on disk simultaneously from different application processes to maximize bandwidth and speed things up. There are several ways to do parallel I/O, and I will highlight the most popular methods that are in use today.
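A toy illustration of the core idea: several workers write non-overlapping blocks of a single shared file at the same time, each at its own offset, so no coordination beyond the offset arithmetic is needed. This is a local sketch using threads and POSIX `os.pwrite` (Unix only); it is not parallel HDF5, which coordinates MPI processes through MPI-IO, and on a single laptop disk it will not show the bandwidth gains a parallel file system provides. The name `rank` is borrowed from MPI terminology for illustration.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def parallel_write(path: str, blocks: list[bytes]) -> None:
    """Write equally sized blocks to one file concurrently, block i at offset i*size."""
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must have the same size")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        def write_block(rank: int) -> int:
            # Each "rank" owns a disjoint byte range, so no locking is needed.
            return os.pwrite(fd, blocks[rank], rank * size)

        with ThreadPoolExecutor(max_workers=len(blocks)) as pool:
            written = list(pool.map(write_block, range(len(blocks))))
        if sum(written) != size * len(blocks):
            raise OSError("short write")
    finally:
        os.close(fd)


# Four workers, 4 KiB each; worker r fills its block with byte value r.
demo_blocks = [bytes([rank]) * 4096 for rank in range(4)]
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "shared.bin")
    parallel_write(target, demo_blocks)
    with open(target, "rb") as f:
        data = f.read()
print(len(data), data[0], data[-1])
```

Because each worker computes its own offset from its rank, the same layout logic carries over to independent MPI-IO writes; collective I/O adds a coordination step on top of it.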

Blue Waters supercomputer at the National Center for Supercomputing Applications, University of Illinois, Urbana-Champaign campus. Blue Waters is supported by the National Science Foundation and the University of Illinois.

First, to leverage parallel I/O, it is very important that you have a parallel file system;
