ChemML: A machine learning and informatics program package for the analysis, mining, and modeling of chemical and materials data

Mojtaba Haghighatlari, Gaurav Vishwakarma, Doaa Altarawy, Ramachandran Subramanian, Bhargava U. Kota, Aditya Sonpal, Srirangaraj Setlur, Johannes Hachmann

Research output: Contribution to journal › Article › peer-review

48 Scopus citations

Abstract

ChemML is an open machine learning (ML) and informatics program suite that is designed to support and advance the data-driven research paradigm that is currently emerging in the chemical and materials domain. ChemML allows its users to perform various data science tasks and execute ML workflows that are adapted specifically for the chemical and materials context. Key features are automation, general-purpose utility, versatility, and user-friendliness in order to make the application of modern data science a viable and widely accessible proposition in the broader chemistry and materials community. ChemML is also designed to facilitate methodological innovation, and it is one of the cornerstones of the software ecosystem for data-driven in silico research. This article is categorized under: Software > Simulation Methods; Computer and Information Science > Chemoinformatics; Structure and Mechanism > Computational Materials Science; Software > Molecular Modeling.

Original language: English
Article number: e1458
Journal: Wiley Interdisciplinary Reviews: Computational Molecular Science
Volume: 10
Issue number: 4
State: Published - Jul 1 2020
Externally published: Yes

Funding

This work is supported by the National Science Foundation (NSF) CAREER program (grant No. OAC-1751161), and the New York State Center of Excellence in Materials Informatics (grants No. CMI-1140384 and CMI-1148092). Early work on ChemML was supported by start-up funds provided through the University at Buffalo (UB). The deep eutectic solvent application study was funded by the Army Armament Research, Development and Engineering Center (ARDEC) SBIR program (grant No. W15QKN-17-C-0078), and solubility parameter work by Toyota Motor Engineering and Manufacturing North America. ChemML is interfaced with the Open Chemistry platform and the MaDE@UB toolkit, and these efforts are supported by the Department of Energy SBIR program (grant No. DE-SC0017193) and the NSF DIBBs program (grant No. OAC-1640867), respectively. The DIBBs grant also funded the implementation of several methods of particular interest for MaDE@UB into ChemML, such as the Magpie library, the meta data parser, and standard DNNs using Keras. Computing time on the high-performance computing clusters "Rush," "Alpha," "Beta," and "Gamma" was provided by the UB Center for Computational Research (CCR). The work presented in this paper is a central part of M.H.'s PhD thesis. M.H. gratefully acknowledges support by Phase-I and Phase-II Software Fellowships (grant No. ACI-1547580-479590) of the NSF Molecular Sciences Software Institute (grant No. ACI-1547580) at Virginia Tech. We thank the other members, past and present, of the Hachmann group as well as Profs. Venugopal Govindaraju and Krishna Rajan (both UB) for valuable discussions and insights that have helped guide the development of ChemML.

Funders | Funder number
Department of Energy SBIR | ACI‐1547580‐479590, DE‐SC0017193
NSF Molecular Sciences Software Institute |
New York Center of Excellence in Materials Informatics |
New York State Center of Excellence in Materials Informatics | CMI‐1140384, CMI‐1148092
UB Center for Computational Research |
National Science Foundation | 1640867, OAC‐1640867, OAC‐1751161, 1751161, ACI‐1547580
Office of Science |
University at Buffalo |
Armament Research, Development and Engineering Center | W15QKN‐17‐C‐0078
Toyota Motor Engineering and Manufacturing North America |

    Keywords

    • data science
    • data-driven research
    • informatics
    • machine learning
    • program package
