TY - GEN
T1 - A scalable messaging system for accelerating discovery from large scale scientific simulations
AU - Jin, Tong
AU - Zhang, Fan
AU - Parashar, Manish
AU - Klasky, Scott
AU - Podhorszki, Norbert
AU - Abbasi, Hasan
PY - 2012
Y1 - 2012
N2 - Emerging scientific and engineering simulations running at scale on leadership-class High End Computing (HEC) environments are producing large volumes of data, which have to be transported and analyzed before any insights can result from these simulations. The complexity and cost (in terms of time and energy) associated with managing and analyzing this data have become significant challenges and are limiting the impact of these simulations. Recently, data-staging approaches along with in-situ and in-transit analytics have been proposed to address these challenges by offloading I/O and/or moving data processing closer to the data. However, scientists continue to be overwhelmed by the large data volumes and data rates. In this paper we address this latter challenge. Specifically, we propose a highly scalable and low-overhead associative messaging framework that runs on the data staging resources within the HEC platform and builds on the staging-based online in-situ/in-transit analytics to provide publish/subscribe/notification-type messaging patterns to the scientist. Rather than having to ingest and inspect the data volumes, this messaging system allows scientists to (1) dynamically subscribe to data events of interest, e.g., a simple data value, a complex function, or a simple reduction (max()/min()/avg()) of the data values in a certain region of the application domain being greater/less than a threshold value, or certain spatial/temporal data features or data patterns being detected; (2) define customized in-situ/in-transit actions that are triggered based on the events, such as data visualization or transformation; and (3) get notified when these events occur. The key contribution of this paper is a design and implementation that can support such a messaging abstraction at scale on HEC systems with minimal overheads. We have implemented and deployed the messaging system on the Jaguar Cray XK6 machines at Oak Ridge National Laboratory and the Lonestar system at the Texas Advanced Computing Center (TACC), and we present the experimental performance evaluation using these HEC platforms in the paper.
AB - Emerging scientific and engineering simulations running at scale on leadership-class High End Computing (HEC) environments are producing large volumes of data, which have to be transported and analyzed before any insights can result from these simulations. The complexity and cost (in terms of time and energy) associated with managing and analyzing this data have become significant challenges and are limiting the impact of these simulations. Recently, data-staging approaches along with in-situ and in-transit analytics have been proposed to address these challenges by offloading I/O and/or moving data processing closer to the data. However, scientists continue to be overwhelmed by the large data volumes and data rates. In this paper we address this latter challenge. Specifically, we propose a highly scalable and low-overhead associative messaging framework that runs on the data staging resources within the HEC platform and builds on the staging-based online in-situ/in-transit analytics to provide publish/subscribe/notification-type messaging patterns to the scientist. Rather than having to ingest and inspect the data volumes, this messaging system allows scientists to (1) dynamically subscribe to data events of interest, e.g., a simple data value, a complex function, or a simple reduction (max()/min()/avg()) of the data values in a certain region of the application domain being greater/less than a threshold value, or certain spatial/temporal data features or data patterns being detected; (2) define customized in-situ/in-transit actions that are triggered based on the events, such as data visualization or transformation; and (3) get notified when these events occur. The key contribution of this paper is a design and implementation that can support such a messaging abstraction at scale on HEC systems with minimal overheads. We have implemented and deployed the messaging system on the Jaguar Cray XK6 machines at Oak Ridge National Laboratory and the Lonestar system at the Texas Advanced Computing Center (TACC), and we present the experimental performance evaluation using these HEC platforms in the paper.
KW - associative messaging system
KW - data staging
KW - in-situ/in-transit analytics
KW - publish/subscribe
UR - http://www.scopus.com/inward/record.url?scp=84880260641&partnerID=8YFLogxK
U2 - 10.1109/HiPC.2012.6507512
DO - 10.1109/HiPC.2012.6507512
M3 - Conference contribution
AN - SCOPUS:84880260641
SN - 9781467323703
T3 - 2012 19th International Conference on High Performance Computing, HiPC 2012
BT - 2012 19th International Conference on High Performance Computing, HiPC 2012
T2 - 2012 19th International Conference on High Performance Computing, HiPC 2012
Y2 - 18 December 2012 through 21 December 2012
ER -