TY - GEN
T1 - ISOBAR hybrid compression-I/O interleaving for large-scale parallel I/O optimization
AU - Schendel, Eric R.
AU - Pendse, Saurabh V.
AU - Jenkins, John
AU - Boyuka, David A.
AU - Gong, Zhenhuan
AU - Lakshminarasimhan, Sriram
AU - Liu, Qing
AU - Kolla, Hemanth
AU - Chen, Jackie
AU - Klasky, Scott
AU - Ross, Robert
AU - Samatova, Nagiza F.
PY - 2012
Y1 - 2012
AB - Current peta-scale data analytics frameworks suffer from a significant performance bottleneck due to an imbalance between their enormous computational power and limited I/O bandwidth. Using data compression schemes to reduce the amount of I/O activity is a promising approach to addressing this problem. In this paper, we propose a hybrid framework for interleaving I/O with data compression to achieve improved I/O throughput side-by-side with reduced dataset size. We evaluate several interleaving strategies, present theoretical models, and evaluate the efficiency and scalability of our approach through comparative analysis. With our theoretical model, considering 19 real-world scientific datasets both from the public domain and peta-scale simulations, we estimate that the hybrid method can result in a 12 to 46% increase in throughput on hard-to-compress scientific datasets. At the reported peak bandwidth of 60 GB/s of uncompressed data for a current, leadership-class parallel I/O system, this translates into an effective gain of 7 to 28 GB/s in aggregate throughput.
KW - High Performance Computing
KW - Hybrid Interleaving
KW - I/O
KW - ISOBAR
KW - Lossless Compression
KW - Staging
UR - http://www.scopus.com/inward/record.url?scp=84863890077&partnerID=8YFLogxK
U2 - 10.1145/2287076.2287086
DO - 10.1145/2287076.2287086
M3 - Conference contribution
AN - SCOPUS:84863890077
SN - 9781450308052
T3 - HPDC '12 - Proceedings of the 21st ACM Symposium on High-Performance Parallel and Distributed Computing
SP - 61
EP - 72
BT - HPDC '12 - Proceedings of the 21st ACM Symposium on High-Performance Parallel and Distributed Computing
T2 - 21st ACM Symposium on High-Performance Parallel and Distributed Computing, HPDC '12
Y2 - 18 June 2012 through 22 June 2012
ER -