Abstract
Pathology reports are a primary source of data for cancer surveillance programs. Manual coding of pathology reports is labor-intensive but necessary for obtaining labeled data to train automated information extraction systems. In this study, we investigated whether semi-supervised deep learning can improve the performance of a multitask information extraction system for automated annotation of pathology reports. We used a set of over 374,000 pathology reports from the Louisiana Tumor Registry and a novel convolutional attention-based auto-encoder. We performed a set of experiments comparing supervised training augmented with unlabeled data at 1%, 5%, 10%, and 50% of the original data size. We also compared the impact of extending text processing to include unlabeled tokens. We found that semi-supervised training consistently improved performance on individual tasks, increasing micro-averaged F-scores by between 0.012 and 0.064 and macro-averaged F-scores by up to 0.158. This demonstrates that semantic information learned via unsupervised learning can be used to improve supervised clinical task performance.
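The two reported metrics weight classes differently: the micro-averaged F-score pools counts over all predictions, while the macro-averaged F-score averages per-class scores, so rare classes count as much as common ones. The following is a minimal sketch of that distinction, not the evaluation code used in the study; the function name `f_scores` and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def f_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label classification.

    Hypothetical helper for illustration; not taken from the paper.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted class p wrongly
            fn[t] += 1          # missed true class t

    def f1(tp_, fp_, fn_):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts across classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Toy, made-up labels: one rare class ("b") that the classifier always misses.
y_true = ["a", "a", "a", "a", "b"]
y_pred = ["a", "a", "a", "a", "a"]
micro, macro = f_scores(y_true, y_pred)
print(f"micro={micro:.3f} macro={macro:.3f}")
```

On imbalanced label sets such as this toy case, the macro average drops much further than the micro average when a rare class is missed, which may explain why an improvement on rare classes can move the macro-averaged score more than the micro-averaged one.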
Original language | English |
---|---|
Title of host publication | 2019 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2019 - Proceedings |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9781728108483 |
DOIs | |
State | Published - May 2019 |
Event | 2019 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2019 - Chicago, United States; Duration: May 19 2019 → May 22 2019 |
Publication series
Name | 2019 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2019 - Proceedings |
---|---|
Conference
Conference | 2019 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2019 |
---|---|
Country/Territory | United States |
City | Chicago |
Period | 05/19/19 → 05/22/19 |
Bibliographical note
Publisher Copyright: © 2019 IEEE.
Keywords
- Autoencoder
- Convolutional neural network
- Natural language processing
- Semi-supervised learning