Abstract
Individual electronic health records (EHRs) and clinical reports are often part of a larger sequence; for example, a single patient may generate multiple reports over the trajectory of a disease. In applications such as cancer pathology reports, it is necessary not only to extract information from individual reports, but also to capture aggregate information about the entire cancer case based on case-level context from all reports in the sequence. In this paper, we introduce a simple modular add-on for capturing case-level context that is designed to be compatible with most existing deep learning architectures for text classification on individual reports. We test our approach on a corpus of 431,433 cancer pathology reports, and we show that incorporating case-level context significantly boosts classification accuracy across six classification tasks: site, subsite, laterality, histology, behavior, and grade. We expect that with minimal modifications, our add-on can be applied to a wide range of other clinical text-based tasks.
Original language | English |
---|---|
Article number | e0232840 |
Journal | PLoS ONE |
Volume | 15 |
Issue number | 5 |
State | Published - May 2020 |
Funding
Georgia Tourassi (GT) at the Oak Ridge National Laboratory received funding from the Department of Energy (energy.gov) and the National Cancer Institute (cancer.gov). The grant number is 2450-Z301-19. These funds were used to facilitate this study. The funding provided via this grant was used to support salaries for SG, MA, NS, AR, and GT. In addition to the grant, the National Cancer Institute (NCI) employs or provides funding for authors from NCI (LP), state registries (XCW and EBD), and Information Management Services Inc (LC) as part of the Surveillance, Epidemiology, and End Results program and authorized their participation in this study. Their efforts included data collection, cleaning, analysis, or final review of this study. This work has been supported in part by the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program established by the U.S. Department of Energy (DOE) and the National Cancer Institute (NCI) of the National Institutes of Health. This work was performed under the auspices of the U.S. Department of Energy by Argonne National Laboratory under Contract DEAC02-06-CH11357, Lawrence Livermore National Laboratory under Contract DEAC52-07NA27344, Los Alamos National Laboratory under Contract DE-AC5206NA25396, and Oak Ridge National Laboratory under Contract DE-AC05-00OR22725. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. The funding offices from the DOE and NCI did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of all authors are articulated in the ‘author contributions’ section.
Funders | Funder number |
---|---|
Information Management Services Inc | |
National Institutes of Health | |
U.S. Department of Energy | |
National Cancer Institute | P30CA177558, 2450-Z301-19 |
Office of Science | |
Argonne National Laboratory | DEAC02-06-CH11357 |
Lawrence Livermore National Laboratory | DEAC52-07NA27344 |
Oak Ridge National Laboratory | DE-AC05-00OR22725 |
Los Alamos National Laboratory | DE-AC5206NA25396 |