Abstract
Over the past decade, the trajectory to the petascale has been built on increased complexity and scale of the underlying parallel architectures. Meanwhile, software developers have struggled to provide tools that maintain the productivity of computational science teams using these new systems. In this regard, Global Address Space (GAS) programming models provide a straightforward and easy-to-use addressing model, which can lead to improved productivity. However, the scalability of GAS depends directly on the design and implementation of the runtime system on the target petascale distributed-memory architecture. In this paper, we describe the design, implementation, and optimization of the Aggregate Remote Memory Copy Interface (ARMCI) runtime library on the Cray XT5 2.3 PetaFLOPs computer at Oak Ridge National Laboratory. We optimized our implementation with the flow intimation technique that we introduce in this paper. Our optimized ARMCI implementation improves scalability of both the Global Arrays programming model and a real-world chemistry application, NWChem, from small jobs up through 180,000 cores.
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 633-655 |
Number of pages | 23 |
Journal | International Journal of Parallel Programming |
Volume | 40 |
Issue number | 6 |
State | Published - Dec 2012 |
Funding
This paper was authored by at least one employee of UT-Battelle, LLC, under contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the paper for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this contribution, or allow others to do so, for United States Government purposes.
Keywords
- ARMCI
- Flow control
- GA
- GAS
- Global address space
- Global arrays
- NWChem
- PGAS
- XT5