Towards a universal classifier for crystallographic space groups: A trickle-down approach to handle data imbalance

Sajal Dash, Archi Dasgupta

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Convergent Beam Electron Diffraction (CBED) images are 2D diffraction patterns created by the interaction between the fired electrons and the atoms of a crystalline structure. Because this process provides no geometric mapping between the three-dimensional structure and its two-dimensional projection, traditional image processing methods cannot classify CBED images into crystallographic space groups with high accuracy. Class imbalance in the dataset makes the problem worse. To bridge the gaps in our understanding of solid-state crystalline structures, we must build a classifier that can assign diffraction patterns such as CBED images to crystallographic space groups while handling this class imbalance. In this project, we explore the sources and nature of the classification difficulties to gain insight into building a robust classifier. We first built several naive classifiers on subsets of the classes by augmenting ResNet50 in various schemes. We then developed a novel multi-level classification technique, called the Trickle Down Classifier (TDC), to address class imbalance in scientific datasets. TDC consists of multiple levels of subset classifiers. At each level, TDC trains a classifier to allocate samples into a subset of the classes, and it forwards the samples missed by the classifier at that level to the classifier at the next level. For the top 20 classes, TDC achieves an estimated 34% accuracy, compared to 14% for a naive classifier.
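The abstract describes TDC only at a high level, so the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes that each level's classifier predicts the classes in its own subset plus a single catch-all label (the hypothetical `OTHER`), that a sample predicted as `OTHER` is "missed" and trickles down to the next level, that class subsets are ordered from most to least frequent, and that each level is trained on the samples whose true class lies outside all earlier subsets. The names `TrickleDownClassifier`, `NearestCentroid`, and `OTHER` are hypothetical, and the toy nearest-centroid model stands in for the ResNet50-based classifiers used in the paper.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]
OTHER = "__other__"  # hypothetical label meaning "not one of this level's classes"


class NearestCentroid:
    """Toy stand-in for a level classifier; the paper augments ResNet50 instead."""

    def fit(self, xs: Sequence[Vector], ys: Sequence[str]) -> "NearestCentroid":
        sums: Dict[str, List[float]] = {}
        counts: Counter = Counter()
        for x, y in zip(xs, ys):
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[y] += 1
        # One centroid (mean feature vector) per label, including OTHER.
        self.centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}
        return self

    def predict(self, x: Vector) -> str:
        # Return the label whose centroid has the smallest squared distance to x.
        return min(
            self.centroids,
            key=lambda y: sum((a - b) ** 2 for a, b in zip(x, self.centroids[y])),
        )


class TrickleDownClassifier:
    """Hypothetical sketch: one subset classifier per level; misses trickle down."""

    def __init__(self, class_groups: List[List[str]]):
        # Assumption: class_groups[0] holds the most frequent classes, and so on.
        self.class_groups = class_groups
        self.levels: List[NearestCentroid] = []

    def fit(self, xs: Sequence[Vector], ys: Sequence[str]) -> "TrickleDownClassifier":
        remaining = list(zip(xs, ys))
        for group in self.class_groups:
            if not remaining:
                break
            # Classes outside this level's subset collapse into the single OTHER label.
            labels = [y if y in group else OTHER for _, y in remaining]
            self.levels.append(NearestCentroid().fit([x for x, _ in remaining], labels))
            # Assumption: the next level trains on ground-truth non-members only.
            remaining = [(x, y) for x, y in remaining if y not in group]
        return self

    def predict(self, x: Vector) -> Optional[str]:
        for level in self.levels:
            label = level.predict(x)
            if label != OTHER:
                return label  # this level claimed the sample
        return None  # every level forwarded (missed) the sample


# Imbalanced toy data: four samples of A, two of B, one each of C and D.
xs = [(0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5), (10, 0), (10.5, 0), (0, 10), (10, 10)]
ys = ["A", "A", "A", "A", "B", "B", "C", "D"]
tdc = TrickleDownClassifier([["A", "B"], ["C", "D"]]).fit(xs, ys)
print([tdc.predict(p) for p in [(0, 0), (10, 0), (0, 10), (10, 10)]])
# → ['A', 'B', 'C', 'D']
```

In this toy run, the first level collapses the rare classes C and D into `OTHER`, so it only has to separate A and B from everything else, and the second level then separates C from D without competing against the more frequent classes. This separation is the intuition the abstract gives for why a trickle-down scheme can help with class imbalance.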

Original language: English
Title of host publication: Driving Scientific and Engineering Discoveries Through the Convergence of HPC, Big Data and AI - 17th Smoky Mountains Computational Sciences and Engineering Conference, SMC 2020, Revised Selected Papers
Editors: Jeffrey Nichols, Arthur ‘Barney’ Maccabe, Suzanne Parete-Koon, Becky Verastegui, Oscar Hernandez, Theresa Ahearn
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 465-478
Number of pages: 14
ISBN (Print): 9783030633929
State: Published - 2021
Externally published: Yes
Event: 17th Smoky Mountains Computational Sciences and Engineering Conference, SMC 2020 - Virtual, Online
Duration: Aug 26 2020 → Aug 28 2020

Publication series

Name: Communications in Computer and Information Science
Volume: 1315 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 17th Smoky Mountains Computational Sciences and Engineering Conference, SMC 2020
City: Virtual, Online
Period: 08/26/20 → 08/28/20

Keywords

  • Crystallographic space group
  • Data imbalance
  • Deep learning
  • High-performance computing
