Advanced data science toolkit for non-data scientists – A user guide

Jian Peng, Sangkeun Lee, Andrew Williams, J. Allen Haynes, Dongwon Shin

Research output: Contribution to journal, Article, peer-review

24 Scopus citations

Abstract

Modern data analytics is attracting much attention in materials research and shows great potential for enabling data-driven design. Data populated from the high-throughput CALPHAD approach enables researchers to better understand underlying mechanisms and facilitates the generation of novel hypotheses, but the increasing volume of data makes the analysis extremely challenging. Herein, we introduce an easy-to-use, versatile, and open-source data analytics frontend, ASCENDS (Advanced data SCiENce toolkit for Non-Data Scientists), designed with the intent of accelerating data-driven materials research and development. The toolkit is also of value beyond materials science, as it can analyze the correlation between input features and target values, train machine learning models, and make predictions from the trained surrogate models for any scientific dataset. Various algorithms implemented in ASCENDS allow users to perform quantified correlation analyses and supervised machine learning, so that they can explore any dataset of interest without an extensive background in computing or data science. The detailed usage of ASCENDS is introduced with an example of experimental high-temperature alloy data.
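The workflow named in the abstract (correlate input features with a target, train a surrogate model, predict from it) can be illustrated with a minimal sketch. This is not the ASCENDS interface, which this record does not describe: the function names, the synthetic data, and the choice of Pearson correlation with a linear least-squares surrogate are all assumptions made here for illustration, using only NumPy.

```python
import numpy as np

# Hypothetical helpers for illustration only; these are NOT ASCENDS functions.

def pearson_by_feature(X, y):
    """Pearson correlation of each column of X with the target y."""
    Xc = X - X.mean(axis=0)          # center each feature
    yc = y - y.mean()                # center the target
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return (Xc * yc[:, None]).sum(axis=0) / denom

def fit_linear_surrogate(X, y):
    """Least-squares fit of y ~ X @ w + b; returns [w..., b]."""
    A = np.column_stack([X, np.ones(len(X))])   # append intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Evaluate the fitted linear surrogate on new inputs."""
    return np.column_stack([X, np.ones(len(X))]) @ coef

# Synthetic stand-in data (assumed): three input features, one target that
# depends on the first two features only, plus a little noise.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = 3.0 * X[:, 0] - 1.0 * X[:, 1] + 0.05 * rng.normal(size=200)

corr = pearson_by_feature(X, y)                    # step 1: correlation analysis
coef = fit_linear_surrogate(X, y)                  # step 2: train a surrogate model
y_new = predict(coef, np.array([[0.5, 0.5, 0.5]])) # step 3: predict for a new input

print("feature/target correlations:", corr)
print("prediction at (0.5, 0.5, 0.5):", y_new)
```

In this sketch the correlation step ranks which inputs move with the target, and the surrogate step turns that relationship into a model that can be queried for unseen inputs. A real study would substitute measured data and would likely compare several model types rather than a single linear fit.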

Original language: English
Article number: 101733
Journal: Calphad: Computer Coupling of Phase Diagrams and Thermochemistry
Volume: 68
State: Published - Mar 2020

Funding

Research was sponsored by the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, managed by UT-Battelle, LLC, for the U.S. Department of Energy and the U.S. Department of Energy, Office of Energy Efficiency and Renewable Energy, Vehicle Technologies Office, Propulsion Materials Program. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. Notice: This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Keywords

  • Correlation analysis
  • Machine learning
  • Materials research
  • Modern data analytics
  • Neural network
