Coarse-to-fine multi-task training of convolutional neural networks for automated information extraction from cancer pathology reports

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

18 Scopus citations

Abstract

Information extraction and coding of free-text pathology reports is an important activity for cancer registries to support national cancer surveillance. Cancer registrars must process high volumes of pathology reports every year. In this study, we investigated an automated approach that uses coarse-to-fine training of convolutional neural networks (CNNs) to extract the primary site, histological grade and laterality from unstructured cancer pathology text reports. Our proposed training scheme consists of two stages. In the first stage, multi-task learning (MTL) with hard parameter sharing is used to train a multi-task CNN (MT-CNN) model on all the tasks. In the second stage, the MT-CNN model parameters are used to initialize a separate CNN model for each task, which is then fine-tuned individually on its corresponding dataset. The performance of our proposed approach was compared against a state-of-the-art CNN and the commonly used SVM classifier. We observed that the proposed model consistently outperformed the baseline models, especially for the less prevalent classes. Specifically, the proposed training approach achieved a micro-F score of 0.7749 over 12 ICD-O-3 topography codes, a significant improvement over the state-of-the-art CNN (0.7101) and SVM (0.6019) classifiers. The results also demonstrate the potential of the proposed method for handling class imbalance within each task: it significantly improves the macro-F score by 24% and 12% for the primary site and histological grade tasks, respectively.
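The abstract reports both micro-F and macro-F scores. The following is a minimal sketch of how the two differ, assuming the common definitions (micro-F pools true positives, false positives and false negatives over all classes; macro-F is the unweighted mean of per-class F scores). The record does not spell out the exact formulas used in the paper, and the function name `f_scores` and the toy labels are hypothetical, not data from the study.

```python
from collections import Counter


def f_scores(y_true, y_pred):
    """Return (micro_f, macro_f) for single-label, multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro-F: pool the counts over all classes, then compute one F score.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro-F: compute F per class, then average with equal class weights.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical imbalanced toy example: a classifier that always predicts
# the majority class. The codes are illustrative labels only.
y_true = ["C50"] * 8 + ["C34"] * 2
y_pred = ["C50"] * 10
micro, macro = f_scores(y_true, y_pred)
print(f"micro-F = {micro:.4f}, macro-F = {macro:.4f}")
```

In this toy case the micro-F score stays high because the majority class dominates the pooled counts, while the macro-F score drops sharply because the minority class contributes an F score of zero. This is why macro-F is the more informative number when judging performance on less prevalent classes.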

Original language: English
Title of host publication: 2018 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 218-221
Number of pages: 4
ISBN (Electronic): 9781538624050
State: Published - Apr 6 2018
Event: 2018 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2018 - Las Vegas, United States
Duration: Mar 4 2018 → Mar 7 2018

Publication series

Name: 2018 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2018
Volume: 2018-January

Conference

Conference: 2018 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2018
Country/Territory: United States
City: Las Vegas
Period: 03/4/18 → 03/7/18

Funding

This work has been supported in part by the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program established by the U.S. Department of Energy (DOE) and the National Cancer Institute (NCI) of the National Institutes of Health. The authors wish to thank Valentina Petkov of the Surveillance Research Program from the National Cancer Institute and the SEER registries at HI, KY, CT, NM and Seattle for the de-identified pathology reports used in this investigation. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Funders (funder number):
National Institutes of Health
U.S. Department of Energy (DE-AC05-00OR22725)
National Cancer Institute
Office of Science
