Abstract
Scientific data analytics in high-performance computing environments has been evolving along with the advancement of computing capabilities. With the onset of exascale computing, the increasing gap between compute performance and I/O bandwidth has made traditional post-simulation processing tedious. Despite the challenges due to increased data production, there exists an opportunity to benefit from “cheap” computing power to perform query-driven exploration and visualization during simulation time. To accelerate such analyses, applications traditionally augment, post-simulation, raw data with large indexes, which are then repeatedly utilized for data exploration. However, the generation of current state-of-the-art indexes involves compute- and memory-intensive processing, thus rendering them inapplicable in an in situ context. In this paper, we propose DIRAQ, a parallel in situ, in-network data encoding and reorganization technique that enables the transformation of simulation output into a query-efficient form, with negligible runtime overhead to the simulation run. DIRAQ’s effective core-local, precision-based encoding approach incorporates an embedded compressed index that is 3–6× smaller than current state-of-the-art indexing schemes. Its data-aware index adjustment improves performance of group-level index layout creation by up to 35% and reduces the size of the generated index by up to 27%. Moreover, DIRAQ’s in-network index merging strategy enables the creation of aggregated indexes that speed up spatial-context query responses by up to 10× versus alternative techniques. DIRAQ’s topology-, data-, and memory-aware aggregation strategy results in efficient I/O and yields overall end-to-end encoding and I/O time that is less than that required to write the raw data with MPI collective I/O.
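The abstract does not spell out how a precision-based encoding with an embedded index works, so the following is only a minimal sketch under stated assumptions, not DIRAQ's actual implementation. It assumes that each double-precision value is split into its two most significant bytes, which act as a bin key, and its six remaining low-order bytes, and that the index is simply a list of positions per bin. The function names `encode`, `decode`, and `query_range` are hypothetical, and inputs are assumed to be finite.

```python
import struct
from collections import defaultdict

HIGH_BYTES = 2  # assumption: bin on sign, exponent, and top 4 mantissa bits


def encode(values):
    """Split each float64 into a high-order bin key and its low-order bytes.

    Returns (bins, low): bins maps a key to the positions that share it,
    low holds the remaining bytes of every value, by position.
    """
    bins = defaultdict(list)
    low = []
    for pos, value in enumerate(values):
        raw = struct.pack(">d", value)  # big-endian: most significant byte first
        bins[raw[:HIGH_BYTES]].append(pos)
        low.append(raw[HIGH_BYTES:])
    return dict(bins), low


def decode(bins, low):
    """Rebuild the original values exactly from the bins and low-order bytes."""
    out = [0.0] * len(low)
    for key, positions in bins.items():
        for pos in positions:
            out[pos] = struct.unpack(">d", key + low[pos])[0]
    return out


def query_range(bins, low, lo, hi):
    """Return the sorted positions whose value lies in [lo, hi]."""
    pad = 8 - HIGH_BYTES
    hits = []
    for key, positions in bins.items():
        # Every value in a bin lies between these two extremes of its prefix.
        a = struct.unpack(">d", key + b"\x00" * pad)[0]
        b = struct.unpack(">d", key + b"\xff" * pad)[0]
        bin_lo, bin_hi = min(a, b), max(a, b)
        if bin_hi < lo or bin_lo > hi:
            continue  # bin cannot contain a match
        if lo <= bin_lo and bin_hi <= hi:
            hits.extend(positions)  # whole bin matches; low bytes not needed
            continue
        for pos in positions:  # boundary bin: check each value
            value = struct.unpack(">d", key + low[pos])[0]
            if lo <= value <= hi:
                hits.append(pos)
    return sorted(hits)
```

In this sketch, a range query touches the low-order bytes only for bins that straddle a query boundary, which is the general reason a binned layout can answer value queries without scanning all of the raw data.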
Original language | English |
---|---|
Pages (from-to) | 1101-1119 |
Number of pages | 19 |
Journal | Cluster Computing |
Volume | 17 |
Issue number | 4 |
DOIs | |
State | Published - Nov 15 2014 |
Funding
We would like to thank the FLASH Center for Computational Science at the University of Chicago for providing access to the FLASH simulation code and both the FLASH and S3D teams for providing access to the related datasets. We would like to acknowledge the use of resources at the Leadership Computing Facilities at Argonne National Laboratory and Oak Ridge National Laboratory, ALCF and OLCF respectively. Oak Ridge National Laboratory is managed by UT-Battelle, LLC for the U.S. D.O.E. under Contract DE-AC05-00OR22725. This work was supported in part by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research (SDAVI Institute and RSVP Project) and the U.S. National Science Foundation (Expeditions in Computing and EAGER programs). The work of MEP and VV was supported by DOE Contract DE-AC02-06CH11357.
Keywords
- Compression
- Exascale computing
- Indexing
- Query processing