Enhancing I/O throughput via efficient routing and placement for large-scale parallel file systems

David A. Dillow, Galen M. Shipman, Sarp Oral, Zhe Zhang, Youngjae Kim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

13 Scopus citations

Abstract

As storage systems get larger to meet the demands of petascale systems, careful planning must be applied to avoid congestion points and extract the maximum performance. In addition, the large data sets generated by such systems make it desirable for all compute resources to have common access to this data without needing to copy it to each machine. This paper describes a method of placing I/O close to the storage nodes to minimize contention on Cray's SeaStar2+ network, and extends it to a routed Lustre configuration to gain the same benefits when running against a center-wide file system. Our experiments using half of the resources of Spider, the center-wide file system at the Oak Ridge Leadership Computing Facility, show that I/O write bandwidth can be improved by up to 45% (from 71.9 to 104 GB/s) for a direct-attached configuration and by 137% (47.6 GB/s to 115 GB/s) for a routed configuration. We demonstrated up to 20.7% reduction in run-time for production scientific applications. With the full Spider system, we demonstrated over 240 GB/s of aggregate bandwidth using our techniques.

Original language: English
Title of host publication: 30th IEEE International Performance Computing and Communications Conference, IPCCC 2011
DOIs
State: Published - 2011
Event: 30th IEEE International Performance, Computing and Communications Conference, IPCCC 2011 - Orlando, FL, United States
Duration: Nov 17 2011 → Nov 19 2011

Publication series

Name: Conference Proceedings of the IEEE International Performance, Computing, and Communications Conference

Conference

Conference: 30th IEEE International Performance, Computing and Communications Conference, IPCCC 2011
Country/Territory: United States
City: Orlando, FL
Period: 11/17/11 → 11/19/11

Keywords

  • Lustre file systems
  • Network congestion
  • SeaStar network
  • Spider
