Cooperative server clustering for a scalable GAS model on petascale Cray XT5 systems

Weikuan Yu, Xinyu Que, Vinod Tipparaju, Richard L. Graham, Jeffrey S. Vetter

Research output: Contribution to journal › Article › peer-review

Abstract

Global Address Space (GAS) programming models are attractive because they retain the easy-to-use addressing model characteristic of shared-memory-style load and store operations. The scalability of GAS models depends directly on the design and implementation of runtime libraries on the targeted platforms. In this paper, we examine the memory requirement of a popular GAS runtime library, the Aggregate Remote Memory Copy Interface (ARMCI), on petascale Cray XT5 systems. We then describe a new technique, cooperative server clustering, that enhances the memory scalability of ARMCI communication servers. In cooperative server clustering, ARMCI servers are organized into clusters and cooperatively process incoming communication requests among themselves. A request intervention scheme is also designed to expedite the return of responses to the initiating processes. Our experimental results demonstrate that, with very little impact on ARMCI communication latency and bandwidth, cooperative server clustering significantly reduces the memory requirement of ARMCI communication servers, thereby enabling highly scalable scientific applications. In particular, it reduces the total execution time of a scientific application, NWChem, by 45% on 2400 processes.

Original language: English
Pages (from-to): 57-64
Number of pages: 8
Journal: Computer Science - Research and Development
Volume: 25
Issue number: 1-2
State: Published - 2010

Funding

This work was funded in part by a UT-Battelle grant (UT-B-4000087151) to Auburn University, and in part by the National Center for Computational Sciences. This research used resources of the National Center for Computational Sciences at Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This research was also supported by an allocation of advanced computing resources provided by the National Science Foundation. Some of the computations were performed on Kraken (a Cray XT5) at the National Institute for Computational Sciences (http://www.nics.tennessee.edu/).

Funders (funder number, where given)
National Center for Computational Sciences
National Science Foundation: 1059376
U.S. Department of Energy: DE-AC05-00OR22725
Office of Science
Auburn University
UT-Battelle: UT-B-4000087151

    Keywords

    • ARMCI
    • Cray XT5
    • PGAS
