Efficient graph representation framework for chemical molecule similarity tasks

Jiaji Ma, Seung Hwan Lim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Graph data has emerged in numerous scientific domains, and machine learning techniques are widely used to analyze and learn from diverse data for prediction and decision-making. Machine learning techniques can address complex problems on graphs by leveraging the graphs' structural information. However, graphs cannot be used directly by existing machine learning algorithms unless they are encoded as vectors, so the efficient representation of graphs is a substantial challenge in graph machine learning. In this paper, we propose a novel two-stage framework for the representation of chemical molecule graphs that builds on the strengths of Graph Isomorphism Networks (GINs) and Siamese autoencoders. In the first stage, a GIN model is constructed and trained using the structural information of chemical molecule graphs. Node attributes, edge attributes, and edge indices are used as input data, while graph attributes are used as labels. The GIN model captures the structural characteristics of graphs and can accurately predict graph attributes, i.e., molecular properties. It also generates graph embeddings: vectors that encode the structural information of graphs. In the second stage, the graph embedding vectors are further optimized for downstream similarity tasks while preserving the graph structural information. A Siamese autoencoder is constructed and trained to reduce the dimensionality of the graph embedding vectors while preserving as much as possible of the structural information in the original high-dimensional vectors. The resulting low-dimensional graph embeddings can be used for tasks such as approximate nearest neighbor search. The experimental results demonstrate the effectiveness of the proposed framework in accurately predicting graph similarity.
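
As a rough illustration of the first stage only, the sketch below applies the standard GIN update (each node's features become a transform of (1 + eps) times its own features plus the sum of its neighbors' features) followed by a sum readout to obtain one vector per graph. It is not the authors' model: the abstract gives no architecture or training details, so the single linear-plus-ReLU transform, the fixed identity weights, the toy two-feature graphs, and the function names gin_layer, graph_embedding, and cosine_similarity are all assumptions made for illustration. Edge attributes, training on molecular properties, and the Siamese autoencoder of the second stage are omitted.

```python
import math


def gin_layer(h, edges, weight, eps=0.0):
    """One simplified GIN update per node v:
    h_v <- ReLU(W @ ((1 + eps) * h_v + sum of neighbor features)).
    A single linear map + ReLU stands in for the MLP (an assumption)."""
    dim = len(h[0])
    # Self term: (1 + eps) * h_v
    agg = [[(1.0 + eps) * x for x in row] for row in h]
    # Sum aggregation over neighbors; edges are undirected (u, v) pairs
    for u, v in edges:
        for k in range(dim):
            agg[u][k] += h[v][k]
            agg[v][k] += h[u][k]
    # Apply the linear transform row by row, then ReLU
    return [
        [max(0.0, sum(w * x for w, x in zip(w_row, row))) for w_row in weight]
        for row in agg
    ]


def graph_embedding(h, edges, weights):
    """Stack GIN layers, then sum node features into one graph-level vector."""
    for w in weights:
        h = gin_layer(h, edges, w)
    return [sum(col) for col in zip(*h)]


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical toy data: a 3-node path with 2-dimensional one-hot node features
features_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
edges_a = [(0, 1), (1, 2)]

# The same graph with its nodes relabeled (center node listed first)
features_b = [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
edges_b = [(0, 1), (0, 2)]

# Untrained, fixed identity weights (illustrative only)
identity = [[1.0, 0.0], [0.0, 1.0]]

emb_a = graph_embedding(features_a, edges_a, [identity])
emb_b = graph_embedding(features_b, edges_b, [identity])
similarity = cosine_similarity(emb_a, emb_b)
```

Because both the neighbor aggregation and the readout are sums, relabeling the nodes of a graph leaves its embedding unchanged, which is the property that makes such vectors usable as inputs to a later dimensionality-reduction and nearest-neighbor stage.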

Original language: English
Title of host publication: Proceedings - 2023 IEEE 6th International Conference on Artificial Intelligence and Knowledge Engineering, AIKE 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 113-120
Number of pages: 8
ISBN (Electronic): 9798350331288
State: Published - 2023
Event: 6th IEEE International Conference on Artificial Intelligence and Knowledge Engineering, AIKE 2023 - Laguna Hills, United States
Duration: Sep 25 2023 to Sep 27 2023

Publication series

Name: Proceedings - 2023 IEEE 6th International Conference on Artificial Intelligence and Knowledge Engineering, AIKE 2023

Conference

Conference: 6th IEEE International Conference on Artificial Intelligence and Knowledge Engineering, AIKE 2023
Country/Territory: United States
City: Laguna Hills
Period: 09/25/23 to 09/27/23

Funding

This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Keywords

  • Autoencoder
  • Graph Neural Network
  • Graph representation learning
  • Similarity Learning
