TY - GEN
T1 - Towards a universal classifier for crystallographic space groups
T2 - 17th Smoky Mountains Computational Sciences and Engineering Conference, SMC 2020
AU - Dash, Sajal
AU - Dasgupta, Archi
N1 - Publisher Copyright: © Springer Nature Switzerland AG 2020.
PY - 2021
Y1 - 2021
N2 - Convergent Beam Electron Diffraction (CBED) images are 2D diffraction patterns created through the interaction between the fired electron and the atoms of a crystalline structure. Due to the absence of geometric mapping between three-dimensional structures and two-dimensional projections in this process, traditional image processing methods cannot classify CBED images into crystallographic space groups with high accuracy. The problem gets exacerbated by the class imbalance in the dataset. To effectively bridge the gaps in our understanding of solid-state crystalline structures, we must build a classifier capable of classifying diffraction patterns such as CBED images into crystallographic space groups while addressing the class imbalance. In this project, we explore the sources and nature of classification difficulties to gather insight into building a robust classifier. We first built some naive classifiers on the subset of classes by augmenting ResNet50 in various schemes. We developed a novel multi-level classification technique, called Trickle Down Classifier (TDC) to address the class imbalance in scientific datasets. TDC consists of multiple levels of subset classifiers. At each level, TDC trains a classifier to allocate the samples into a subset of classes. TDC forwards samples missed by a component classifier at a particular level to the next level classifier. For the top 20 classes, the TDC performs at an estimated 34% accuracy compared to a naive classifier’s 14% accuracy.
AB - Convergent Beam Electron Diffraction (CBED) images are 2D diffraction patterns created through the interaction between the fired electron and the atoms of a crystalline structure. Due to the absence of geometric mapping between three-dimensional structures and two-dimensional projections in this process, traditional image processing methods cannot classify CBED images into crystallographic space groups with high accuracy. The problem gets exacerbated by the class imbalance in the dataset. To effectively bridge the gaps in our understanding of solid-state crystalline structures, we must build a classifier capable of classifying diffraction patterns such as CBED images into crystallographic space groups while addressing the class imbalance. In this project, we explore the sources and nature of classification difficulties to gather insight into building a robust classifier. We first built some naive classifiers on the subset of classes by augmenting ResNet50 in various schemes. We developed a novel multi-level classification technique, called Trickle Down Classifier (TDC) to address the class imbalance in scientific datasets. TDC consists of multiple levels of subset classifiers. At each level, TDC trains a classifier to allocate the samples into a subset of classes. TDC forwards samples missed by a component classifier at a particular level to the next level classifier. For the top 20 classes, the TDC performs at an estimated 34% accuracy compared to a naive classifier’s 14% accuracy.
KW - Crystallographic space group
KW - Data imbalance
KW - Deep learning
KW - High-performance computing
UR - http://www.scopus.com/inward/record.url?scp=85107291549&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-63393-6_31
DO - 10.1007/978-3-030-63393-6_31
M3 - Conference contribution
AN - SCOPUS:85107291549
SN - 9783030633929
T3 - Communications in Computer and Information Science
SP - 465
EP - 478
BT - Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI - 17th Smoky Mountains Computational Sciences and Engineering Conference, SMC 2020, Revised Selected Papers
A2 - Nichols, Jeffrey
A2 - Maccabe, Arthur ‘Barney’
A2 - Parete-Koon, Suzanne
A2 - Verastegui, Becky
A2 - Hernandez, Oscar
A2 - Ahearn, Theresa
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 26 August 2020 through 28 August 2020
ER -