Development of message passing-based graph convolutional networks for classifying cancer pathology reports

Hong Jun Yoon, Hilda B. Klasky, Andrew E. Blanchard, J. Blair Christian, Eric B. Durbin, Xiao Cheng Wu, Antoinette Stroup, Jennifer Doherty, Linda Coyle, Lynne Penberthy, Georgia D. Tourassi

Research output: Contribution to journal › Article › peer-review

Abstract

Background: Applying graph convolutional networks (GCNs) to the classification of free-form natural language text using graph-of-words features (TextGCN) has been studied and shown to be an effective way to represent complex natural language text. However, text classification models based on TextGCN have weaknesses in memory consumption and in model dissemination and distribution. In this paper, we present a fast message passing network (FastMPN), which implements a GCN with a message passing architecture. FastMPN provides versatility and flexibility by allowing trainable node embeddings and edge weights, which helps the GCN model find a better solution. We applied the FastMPN model to the task of clinical information extraction from cancer pathology reports, extracting the following six properties: main site, subsite, laterality, histology, behavior, and grade.

Results: We evaluated the clinical task performance of the FastMPN models in terms of micro- and macro-averaged F1 scores and compared them with the multi-task convolutional neural network (MT-CNN) model. The FastMPN model performed equivalently to or better than the MT-CNN.

Conclusions: Our FastMPN implementation, built on the PyTorch platform, can train on a large corpus (667,290 training samples) containing 202,373 unique words in less than 3 minutes per epoch using one NVIDIA V100 hardware accelerator. Our experiments demonstrated that, with this implementation, the clinical task performance scores for extracting tumor-related information from cancer pathology reports were highly competitive.
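The abstract names the core mechanism, message passing over a graph-of-words with trainable node embeddings and edge weights, but does not give layer definitions. Below is a minimal, dependency-free Python sketch of one forward step under stated assumptions; it is not the authors' FastMPN code (which is built on PyTorch). We assume each report becomes its own graph whose nodes are its unique words and whose edges join words that co-occur within a sliding window; each node adds edge-weighted neighbour embeddings to its own embedding, applies ReLU, and the node states are mean-pooled into a document vector. The window size (`WINDOW`), embedding width (`EMB_DIM`), the aggregation and readout choices, and all function and class names are hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical hyperparameters; the abstract does not specify them.
WINDOW = 3   # words within 3 consecutive tokens are linked
EMB_DIM = 4  # node-embedding width


def graph_of_words(tokens, window=WINDOW):
    """Return the unique words and undirected co-occurrence edges of one document."""
    nodes = sorted(set(tokens))
    edges = set()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                edges.add(tuple(sorted((word, other))))
    return nodes, sorted(edges)


class MessagePassingSketch:
    """One message-passing layer with (conceptually) trainable node embeddings
    and edge weights. Parameters are plain Python values here; a real model
    would hold them as framework tensors updated by backpropagation."""

    def __init__(self, vocab, seed=0):
        rng = random.Random(seed)
        # Node embedding: one vector per vocabulary word.
        self.embedding = {w: [rng.uniform(-1, 1) for _ in range(EMB_DIM)] for w in vocab}
        # Edge weights, created on first use and initialised to 1.0.
        self.edge_weight = defaultdict(lambda: 1.0)

    def forward(self, tokens):
        nodes, edges = graph_of_words(tokens)
        h = {n: self.embedding[n] for n in nodes}
        # Each node starts from its own embedding (a self-loop) ...
        msg = {n: list(h[n]) for n in nodes}
        # ... and accumulates edge-weighted messages from its neighbours.
        for a, b in edges:
            w = self.edge_weight[(a, b)]
            for k in range(EMB_DIM):
                msg[a][k] += w * h[b][k]
                msg[b][k] += w * h[a][k]
        # ReLU update, then mean-pool node states into one document vector.
        updated = [[max(0.0, x) for x in msg[n]] for n in nodes]
        return [sum(col) / len(updated) for col in zip(*updated)]


docs = [
    "invasive ductal carcinoma of the left breast".split(),
    "adenocarcinoma of the right upper lobe of the lung".split(),
]
vocab = sorted({w for d in docs for w in d})
layer = MessagePassingSketch(vocab)
doc_vec = layer.forward(docs[0])
print(len(doc_vec))  # 4
```

In a trainable version, `embedding` and `edge_weight` would become learnable parameters, and the pooled vector would presumably feed one classification head per task (main site, subsite, laterality, histology, behavior, grade); that multi-head layout is our assumption, not something the abstract states.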
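Because the comparison with MT-CNN rests on micro- and macro-averaged F1, a short sketch of how the two averages differ may help. For single-label, multi-class predictions (we assume each of the six properties is scored separately), micro-F1 pools true positives, false positives, and false negatives over all classes, which makes it equal to accuracy, whereas macro-F1 averages the per-class F1 scores so that rare classes count as much as common ones. The function name `f1_scores` and the laterality labels below are hypothetical toy data, not results from the paper.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gets a false positive
            fn[t] += 1  # true class gets a false negative
    # Micro-F1: pool counts over all classes before computing F1.
    TP, FP, FN = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro = 2 * TP / (2 * TP + FP + FN) if TP else 0.0
    # Macro-F1: compute F1 per class, then take the unweighted mean.
    per_class = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class.append(2 * tp[c] / denom if denom else 0.0)
    macro = sum(per_class) / len(per_class)
    return micro, macro


# Hypothetical laterality labels; "bilateral" is the rare class.
y_true = ["left", "left", "left", "right", "right", "bilateral"]
y_pred = ["left", "left", "left", "right", "left", "right"]
micro, macro = f1_scores(y_true, y_pred)
print(round(micro, 3), round(macro, 3))  # 0.667 0.452
```

Here a single miss on the rare `bilateral` class pulls macro-F1 well below micro-F1, which is one reason to report both averages for skewed label distributions.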
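As a rough scale check on the training-speed claim, processing 667,290 training samples in under 3 minutes (180 s) per epoch implies a throughput of more than about 3,700 samples per second on a single V100:

$$
\text{throughput} > \frac{667{,}290\ \text{samples}}{180\ \text{s}} \approx 3{,}707\ \text{samples/s}
$$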

Original language: English
Article number: 262
Journal: BMC Medical Informatics and Decision Making
Volume: 24
Issue number: Suppl 5
DOIs
State: Published - Dec 2024

Funding

The Utah Cancer Registry is funded by the NCI's SEER Program, Contract No. HHSN261201800016I, and the NPCR, Cooperative Agreement No. NU58DP0063200, with additional support from the University of Utah and Huntsman Cancer Foundation. This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the US Department of Energy (DOE) Office of Science and the National Nuclear Security Administration. This manuscript has been authored by UT-Battelle LLC under Contract No. DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of the manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This article has been published as part of BMC Medical Informatics and Decision Making Volume 24 Supplement 5, 2024: Fifth and Sixth Computational Approaches for Cancer Workshop. The full contents of the supplement are available online at https://bmcmedinformdecismak.biomedcentral.com/articles/supplements/volume-24-supplement-5. New Jersey State Cancer Registry data were collected using funding from NCI and the SEER Program (HHSN261201300021I), the NPCR (NU58DP006279-02-00), and the State of New Jersey and the Rutgers Cancer Institute of New Jersey. This work has been supported in part by the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program established by DOE and the NCI of the National Institutes of Health.
This work was performed under the auspices of DOE by Argonne National Laboratory under Contract DE-AC02-06-CH11357, Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344, Los Alamos National Laboratory under Contract DE-AC5206NA25396, and Oak Ridge National Laboratory under Contract DE-AC05-00OR22725. Kentucky Cancer Registry data were collected with funding from the NCI SEER Program (HHSN261201800013I), the CDC National Program of Cancer Registries (NPCR) (U58DP00003907) and the Commonwealth of Kentucky. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the DOE Office of Science under Contract No. DE-AC05-00OR22725. Louisiana Tumor Registry data were collected using funding from NCI and the SEER Program (HHSN261201800007I), the NPCR (NU58DP006332-02-00), and the State of Louisiana.

Funder: funder number(s), where listed
University of Utah and Huntsman Cancer Foundation
National Nuclear Security Administration
DOE Public Access Plan
Rutgers Cancer Institute of New Jersey
National Institutes of Health
U.S. Department of Energy
CDC National Program of Cancer Registries
Office of Science
State of New Jersey
Lawrence Livermore National Laboratory: DE-AC52-07NA27344
NPCR: NU58DP006332-02-00, U58DP00003907, NU58DP0063200, NU58DP006279-02-00
Argonne National Laboratory: DE-AC02-06-CH11357
UT-Battelle: DE-AC05-00OR22725
Oak Ridge National Laboratory: DE-AC05-00OR22725
Los Alamos National Laboratory: DE-AC5206NA25396
National Cancer Institute: HHSN261201300021I

    Keywords

    • Cancer pathology reports
    • Deep learning
    • Graph
    • Graph convolutional networks
    • Graph of words
    • Information extraction
    • Message passing networks
    • Natural language processing
