Semi-supervised information extraction for cancer pathology reports

John X. Qiu, Shang Gao, Mohammed Alawad, Noah Schaefferkoetter, Folami Alamudun, Hong Jun Yoon, Xiao Cheng Wu, Georgia Tourassi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

Pathology reports are a main source of data for cancer surveillance programs. Manual coding of pathology reports is labor-intensive but necessary for obtaining labeled data to train automated information extraction systems. In this study, we investigated semi-supervised deep learning for improving the performance of a multitask information extraction system for automated annotation of pathology reports. We used a set of over 374,000 pathology reports from the Louisiana Tumor Registry and a novel convolutional attention-based auto-encoder. We performed a set of experiments comparing supervised training augmented with unlabeled data at 1%, 5%, 10%, and 50% of the original data size. We also compared the impact of extending text processing to include unlabeled tokens. We found that semi-supervised training consistently improved individual performance, increasing micro-averaged F-scores by between 0.012 and 0.064 and macro-averaged F-scores by up to 0.158. This demonstrates that semantic information learned via unsupervised learning can be used to improve supervised clinical task performance.
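The abstract reports both micro- and macro-averaged F-scores, which can diverge sharply when classes are imbalanced. The following is a minimal sketch of the standard definitions for single-label classification, not the authors' evaluation code; the function name `f_scores` and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def f_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multiclass predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0, 0.0

    # Per-class true positives, false positives, and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical toy data: a common class and a rare class.
y_true = ["breast"] * 8 + ["lung"] * 2
y_pred = ["breast"] * 8 + ["breast", "lung"]
micro, macro = f_scores(y_true, y_pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

Because the macro average weights every class equally, errors on rare classes move it more than they move the micro average. This is one plausible reason a macro-averaged gain can be larger than the corresponding micro-averaged gain.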

Original language: English
Title of host publication: 2019 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2019 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781728108483
State: Published - May 2019
Event: 2019 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2019 - Chicago, United States
Duration: May 19 2019 to May 22 2019

Publication series

Name: 2019 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2019 - Proceedings

Conference

Conference: 2019 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2019
Country/Territory: United States
City: Chicago
Period: 05/19/19 to 05/22/19

Bibliographical note

Publisher Copyright:
© 2019 IEEE.

Keywords

  • Autoencoder
  • Convolutional neural network
  • Natural language processing
  • Semi-supervised learning
