TY - GEN
T1 - Scalable Proximity-Based Methods for Large-Scale Analysis of Atom Probe Data
AU - Lu, Hao
AU - Seal, Sudip
AU - Poplawsky, Jonathan D.
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/7/2
Y1 - 2018/7/2
N2 - Powered by recent advances in data acquisition technologies, today's state-of-the-art atom probe microscopes yield data sets with sizes ranging from a few million atoms to billions of atoms. Analysis of these atomic data sets within reasonable turnaround times is a pressing data analysis challenge for material scientists currently equipped with software systems that do not scale to these massive data sets. Here, we present the shared memory component of a larger ongoing effort to develop a multi-feature data analysis framework capable of analyzing atom probe data of all sizes and scales from desktop multicore machines to large-scale high-performance computing platforms with hybrid (shared and distributed memory) architectures. Our focus here is on a broad class of popular atom probe data analysis methods that rely on core time-consuming k-NN queries. We present a scalable, heuristic algorithm for k-NN queries using three-dimensional range trees. To demonstrate its efficacy, the k-NN algorithm is integrated with two use cases of atom probe data analysis methods and the resulting analysis times are shown to speedup by over 20X on a 32-core Cray XC40 node using workloads up to 8 million atoms, which is already beyond the at-scale capabilities of existing atom probe software. Using this k-NN algorithm, we also introduce a novel parameter estimation method for a class of cluster finding methods, called friends-of-friends (FoF) methods, to completely bypass their expensive pre-processing steps. In each case, we validate the results on a variety of control data sets.
AB - Powered by recent advances in data acquisition technologies, today's state-of-the-art atom probe microscopes yield data sets with sizes ranging from a few million atoms to billions of atoms. Analysis of these atomic data sets within reasonable turnaround times is a pressing data analysis challenge for material scientists currently equipped with software systems that do not scale to these massive data sets. Here, we present the shared memory component of a larger ongoing effort to develop a multi-feature data analysis framework capable of analyzing atom probe data of all sizes and scales from desktop multicore machines to large-scale high-performance computing platforms with hybrid (shared and distributed memory) architectures. Our focus here is on a broad class of popular atom probe data analysis methods that rely on core time-consuming k-NN queries. We present a scalable, heuristic algorithm for k-NN queries using three-dimensional range trees. To demonstrate its efficacy, the k-NN algorithm is integrated with two use cases of atom probe data analysis methods and the resulting analysis times are shown to speedup by over 20X on a 32-core Cray XC40 node using workloads up to 8 million atoms, which is already beyond the at-scale capabilities of existing atom probe software. Using this k-NN algorithm, we also introduce a novel parameter estimation method for a class of cluster finding methods, called friends-of-friends (FoF) methods, to completely bypass their expensive pre-processing steps. In each case, we validate the results on a variety of control data sets.
KW - atom probe tomography
KW - k-NN search
KW - large-scale data analysis
KW - parallel search
KW - range trees
KW - spatial neighborhood queries
UR - http://www.scopus.com/inward/record.url?scp=85063139327&partnerID=8YFLogxK
U2 - 10.1109/HiPC.2018.00034
DO - 10.1109/HiPC.2018.00034
M3 - Conference contribution
AN - SCOPUS:85063139327
T3 - Proceedings - 25th IEEE International Conference on High Performance Computing, HiPC 2018
SP - 235
EP - 244
BT - Proceedings - 25th IEEE International Conference on High Performance Computing, HiPC 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 25th IEEE International Conference on High Performance Computing, HiPC 2018
Y2 - 17 December 2018 through 20 December 2018
ER -