Adaptable, metadata rich IO methods for portable high performance IO

Jay Lofstead, Fang Zheng, Scott Klasky, Karsten Schwan

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

126 Scopus citations

Abstract

Since IO performance on HPC machines strongly depends on machine characteristics and configuration, it is important to carefully tune IO libraries and make good use of appropriate library APIs. For instance, on current petascale machines, independent IO tends to outperform collective IO, in part due to bottlenecks at the metadata server. The problem is exacerbated by scaling issues, since each IO library scales differently on each machine and typically operates efficiently up to different levels of scale on different machines. With scientific codes being run on a variety of HPC resources, efficient code execution requires us to address three important issues: (1) end users should be able to select the most efficient IO methods for their codes, with minimal effort in terms of code updates or alterations; (2) such performance-driven choices should not prevent data from being stored in the desired file formats, since those are crucial for later data analysis; and (3) it is important to have efficient ways of identifying and selecting certain data for analysis, to help end users cope with the flood of data produced by high end codes. This paper employs ADIOS, the ADaptable IO System, as an IO API to address (1)-(3) above. Concerning (1), ADIOS makes it possible to independently select the IO methods being used by each grouping of data in an application, so that end users can use those IO methods that exhibit the best performance based on both IO patterns and the underlying hardware. In this paper, we also use this facility of ADIOS to experimentally evaluate alternative methods for high performance IO on petascale machines. Specific examples studied include methods that use strong file consistency vs. delayed parallel data consistency, such as that provided by MPI-IO or POSIX IO.
Concerning (2), to avoid linking IO methods to specific file formats and attain high IO performance, ADIOS introduces an efficient intermediate file format, termed BP, which can be converted, at small cost, to the standard file formats used by analysis tools, such as NetCDF and HDF-5. Concerning (3), associated with BP are efficient methods for data characterization, which compute attributes that can be used to identify data sets without having to inspect or analyze the entire data contents of large files.
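The idea behind point (3) can be illustrated with a minimal sketch. It assumes the computed attributes are per-block minimum and maximum values, which the abstract does not specify, and every name in it (BlockSummary, characterize, candidate_blocks) is hypothetical. It is not the ADIOS API and does not reflect the BP file layout; it only shows how small summaries computed at write time can rule out blocks without reading their contents.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class BlockSummary:
    """Hypothetical per-block characteristics (not the BP on-disk layout)."""
    index: int
    minimum: float
    maximum: float


def characterize(blocks: Sequence[Sequence[float]]) -> List[BlockSummary]:
    """Compute min/max once per block, as a writer could do at output time.

    Assumes every block is non-empty.
    """
    return [BlockSummary(i, min(b), max(b)) for i, b in enumerate(blocks)]


def candidate_blocks(summaries: Sequence[BlockSummary],
                     low: float, high: float) -> List[int]:
    """Return indices of blocks whose [min, max] range overlaps [low, high].

    Only the summaries are inspected; the block data itself is never read.
    """
    return [s.index for s in summaries
            if s.maximum >= low and s.minimum <= high]


# Illustrative data only: three small blocks standing in for large arrays.
blocks = [[0.1, 0.4, 0.3], [5.0, 7.5, 6.1], [2.0, 2.2, 9.9]]
summaries = characterize(blocks)
hits = candidate_blocks(summaries, 6.0, 8.0)
```

Here the first block is excluded from the query purely on the basis of its summary. A range test of this kind can produce false positives (a block may overlap the range without containing a matching value), so it narrows the search rather than answering the query on its own.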

Original language: English
Title of host publication: IPDPS 2009 - Proceedings of the 2009 IEEE International Parallel and Distributed Processing Symposium
State: Published - 2009
Event: 23rd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2009 - Rome, Italy
Duration: May 23 2009 to May 29 2009

Publication series

Name: IPDPS 2009 - Proceedings of the 2009 IEEE International Parallel and Distributed Processing Symposium

Conference

Conference: 23rd IEEE International Parallel and Distributed Processing Symposium, IPDPS 2009
Country/Territory: Italy
City: Rome
Period: 05/23/09 to 05/29/09
