Abstract
Convolutional Neural Networks (CNNs) have recently demonstrated effective performance in many Natural Language Processing tasks. In this study, we explore a novel approach for pruning a CNN's convolution filters using our new data-driven utility score. We applied this technique to an information extraction task: classifying a highly imbalanced dataset of cancer pathology reports by cancer type. Compared to standard CNN training, our new algorithm resulted in a nearly 0.07 increase in the micro-averaged F1-score and a strong 0.22 increase in the macro-averaged F1-score, using a model with nearly a third fewer network weights. We show how directly utilizing a network's interpretation of data can result in strong performance gains, particularly with severely imbalanced datasets.
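The abstract reports both micro- and macro-averaged F1-scores, and the gap between the two gains is typical of imbalanced data. The following is a minimal illustrative sketch, not the authors' evaluation code: the function name `f1_scores` and the toy labels are hypothetical and are not drawn from the pathology report dataset. It shows how a single additional correct prediction on a rare class can move the macro-averaged score much more than the micro-averaged one.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multiclass predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro: pool the counts over all classes, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical toy labels: 8 reports of a common class, 2 of a rare class.
y_true = ["common"] * 8 + ["rare"] * 2
always_common = ["common"] * 10                        # ignores the rare class
one_rare_hit = ["common"] * 8 + ["rare", "common"]     # recovers one rare report

micro_a, macro_a = f1_scores(y_true, always_common)
micro_b, macro_b = f1_scores(y_true, one_rare_hit)
print(f"always common: micro={micro_a:.3f} macro={macro_a:.3f}")
print(f"one rare hit:  micro={micro_b:.3f} macro={macro_b:.3f}")
```

Because the macro average weights every class equally, improvements on under-represented classes dominate it, whereas the micro average is driven mostly by the frequent classes.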
Original language | English |
---|---|
Title of host publication | 2018 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2018 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 345-348 |
Number of pages | 4 |
ISBN (Electronic) | 9781538624050 |
DOIs | |
State | Published - Apr 6 2018 |
Event | 2018 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2018 - Las Vegas, United States. Duration: Mar 4 2018 → Mar 7 2018 |
Publication series
Name | 2018 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2018 |
---|---|
Volume | 2018-January |
Conference
Conference | 2018 IEEE EMBS International Conference on Biomedical and Health Informatics, BHI 2018 |
---|---|
Country/Territory | United States |
City | Las Vegas |
Period | 03/4/18 → 03/7/18 |
Funding
This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a nonexclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of the manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This work has been supported in part by the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program established by the U.S. Department of Energy (DOE) and the National Cancer Institute (NCI) of the National Institutes of Health. This work was performed under the auspices of the U.S. Department of Energy by Argonne National Laboratory under Contract DE-AC02-06-CH11357, Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344, Los Alamos National Laboratory under Contract DE-AC5206NA25396, and Oak Ridge National Laboratory under Contract DE-AC05-00OR22725.