Abstract
We present algorithmic improvements to the loading operations of certain reduced data ensembles produced from neutron scattering experiments at Oak Ridge National Laboratory (ORNL) facilities. Ensembles from multiple measurements are required to cover a wide range of the phase space of a sample material of interest. They are stored using the standard NeXus schema in individual HDF5 files, which becomes a scalability challenge as the number of experiments stored in a single ensemble file increases. The present work follows up on our previous efforts on data management algorithms to address identified input/output (I/O) bottlenecks in Mantid, an open-source data analysis framework used across several neutron science facilities around the world. We reuse an in-memory binary-tree metadata index that resembles data access patterns to provide a scalable search and extraction mechanism. In addition, several memory operations are refactored and optimized for the current common use cases, which most frequently range from 10 to 180, and up to 360, separate measurement configurations. Results from this work show consistent speedups in wall-clock time for the Mantid LoadMD routine, ranging from 19% to 23% on average, on ORNL production computing systems. The speedup depends on the complexity of the targeted instrument-specific data and on the system I/O and compute variability of the shared computational resources available to users of ORNL's Spallation Neutron Source (SNS) and High Flux Isotope Reactor (HFIR) instruments. Nevertheless, we continue to highlight the need for more research to address reduction challenges as experimental data volumes, user time, and processing costs increase.
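To make the idea of an in-memory metadata index more concrete, the following is a minimal Python sketch and not the Mantid implementation. It assumes the index maps absolute entry paths to their position in an ordered structure. A sorted list searched with binary search stands in for the binary tree named in the abstract. The class `PathIndex`, its methods, and the example group names are all hypothetical.

```python
from bisect import bisect_left


class PathIndex:
    """Sorted in-memory index of absolute entry paths (hypothetical layout)."""

    def __init__(self, paths):
        # Sort once up front; each later lookup is a binary search.
        self._paths = sorted(set(paths))

    def contains(self, path):
        """Return True if the exact path was indexed."""
        i = bisect_left(self._paths, path)
        return i < len(self._paths) and self._paths[i] == path

    def children(self, prefix):
        """Return all indexed paths under `prefix`, e.g. one experiment group."""
        prefix = prefix.rstrip("/") + "/"
        # Paths sharing a prefix are contiguous in sorted order, so we
        # jump to the first candidate and stop at the first mismatch.
        start = bisect_left(self._paths, prefix)
        found = []
        for p in self._paths[start:]:
            if not p.startswith(prefix):
                break
            found.append(p)
        return found


# Hypothetical entries for an ensemble file holding three measurements.
paths = [
    f"/MDHistoWorkspace/experiment{i}/{name}"
    for i in range(3)
    for name in ("logs", "sample")
]
index = PathIndex(paths)
```

Building the index once and answering every later existence or subtree query from memory avoids repeated traversals of the file hierarchy. That cost is what would otherwise grow with the number of measurements stored in a single ensemble file.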
Original language | English |
---|---|
Title of host publication | Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021 |
Editors | Yixin Chen, Heiko Ludwig, Yicheng Tu, Usama Fayyad, Xingquan Zhu, Xiaohua Tony Hu, Suren Byna, Xiong Liu, Jianping Zhang, Shirui Pan, Vagelis Papalexakis, Jianwu Wang, Alfredo Cuzzocrea, Carlos Ordonez |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 2949-2955 |
Number of pages | 7 |
ISBN (Electronic) | 9781665439022 |
State | Published - 2021 |
Event | 2021 IEEE International Conference on Big Data, Big Data 2021 - Virtual, Online, United States. Duration: Dec 15 2021 → Dec 18 2021 |
Publication series
Name | Proceedings - 2021 IEEE International Conference on Big Data, Big Data 2021 |
---|
Conference
Conference | 2021 IEEE International Conference on Big Data, Big Data 2021 |
---|---|
Country/Territory | United States |
City | Virtual, Online |
Period | 12/15/21 → 12/18/21 |
Funding
We would like to thank B. Ueland, B. Li, R. McQueeney, and T. Han for providing the data for testing and benchmarking purposes. We would like to acknowledge Dr. Chen Zhang for providing feedback to improve the quality of the manuscript. This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a nonexclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). Work at Oak Ridge National Laboratory was sponsored by the Division of Scientific User Facilities, Office of Basic Energy Sciences, US Department of Energy, under Contract No. DE-AC05-00OR22725 with UT-Battelle, LLC.
Keywords
- HDF5
- Mantid
- NeXus
- experimental data
- indexing
- meta-data
- neutron scattering
- reduction
- workflows