TY - GEN
T1 - DataSpaces
T2 - 19th ACM International Symposium on High Performance Distributed Computing, HPDC 2010
AU - Docan, Ciprian
AU - Parashar, Manish
AU - Klasky, Scott
PY - 2010
Y1 - 2010
AB - Emerging high-performance distributed computing environments are enabling new end-to-end formulations in science and engineering that involve multiple interacting processes and data-intensive application workflows. For example, current fusion simulation efforts are exploring coupled models and codes that simultaneously simulate separate application processes, such as the core and the edge turbulence, and run on different high performance computing resources. These components need to interact, at runtime, with each other and with services for data monitoring, data analysis and visualization, and data archiving. As a result, they require efficient support for dynamic and flexible couplings and interactions, which remains a challenge. This paper presents DataSpaces, a flexible interaction and coordination substrate that addresses this challenge. DataSpaces essentially implements a semantically specialized virtual shared space abstraction that can be associatively accessed by all components and services in the application workflow. It enables live data to be extracted from running simulation components, indexes this data online, and then allows it to be monitored, queried and accessed by other components and services via the space using semantically meaningful operators. The underlying data transport is asynchronous, low-overhead and largely memory-to-memory. The design, implementation, and experimental evaluation of DataSpaces using a coupled fusion simulation workflow are presented.
KW - Code coupling
KW - Data redistribution
KW - I/O
KW - RDMA
KW - Workflows
UR - http://www.scopus.com/inward/record.url?scp=78649984950&partnerID=8YFLogxK
U2 - 10.1145/1851476.1851481
DO - 10.1145/1851476.1851481
M3 - Conference contribution
AN - SCOPUS:78649984950
SN - 9781605589428
T3 - HPDC 2010 - Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
SP - 25
EP - 36
BT - HPDC 2010 - Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Y2 - 21 June 2010 through 25 June 2010
ER -