TY - GEN
T1 - Moving the code to the data - Dynamic code deployment using ActiveSpaces
AU - Docan, Ciprian
AU - Parashar, Manish
AU - Cummings, Julian
AU - Klasky, Scott
PY - 2011
Y1 - 2011
AB - Managing the large volumes of data produced by emerging scientific and engineering simulations running on leadership-class resources has become a critical challenge. The data has to be extracted off the computing nodes and transported to consumer nodes so that it can be processed, analyzed, visualized, archived, etc. Several recent research efforts have addressed data-related challenges at different levels. One attractive approach is to offload expensive I/O operations to a smaller set of dedicated computing nodes known as a staging area. However, even using this approach, the data still has to be moved from the staging area to consumer nodes for processing, which continues to be a bottleneck. In this paper, we investigate an alternate approach, namely moving the data-processing code to the staging area rather than moving the data. Specifically, we present the Active Spaces framework, which provides (1) programming support for defining the data-processing routines to be downloaded to the staging area, and (2) run-time mechanisms for transporting binary codes associated with these routines to the staging area, executing the routines on the nodes of the staging area, and returning the results. We also present an experimental performance evaluation of Active Spaces using applications running on the Cray XT5 at Oak Ridge National Laboratory. Finally, we use a coupled fusion application workflow to explore the trade-offs between transporting data and transporting the code required for data processing during coupling, and we characterize the sweet spots for each option.
KW - coupled simulations
KW - data-intensive application workflows
KW - dynamic code deployment
KW - in situ data processing
UR - http://www.scopus.com/inward/record.url?scp=80053222791&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2011.120
DO - 10.1109/IPDPS.2011.120
M3 - Conference contribution
AN - SCOPUS:80053222791
SN - 9780769543857
T3 - Proceedings - 25th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2011
SP - 758
EP - 769
BT - Proceedings - 25th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2011
T2 - 25th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2011
Y2 - 16 May 2011 through 20 May 2011
ER -