Attention Mechanisms in Clinical Text Classification: A Comparative Evaluation

  • Christoph S. Metzner
  • Shang Gao
  • Drahomira Herrmannova
  • Elia Lima-Walton
  • Heidi A. Hanson

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

Attention mechanisms are now a mainstay architecture in neural networks and improve the performance of biomedical text classification tasks. In particular, models that perform automated medical encoding of clinical documents make extensive use of the label-wise attention mechanism. A label-wise attention mechanism increases a model's discriminatory ability by using label-specific reference information. This information can either be implicitly learned during training or explicitly provided through embedded textual code descriptions or information on the code hierarchy; however, contemporary studies select the type of label-specific reference information arbitrarily. To address this shortcoming, we evaluated label-wise attention initialized with either implicit or explicit label-specific reference information against two common baselines - target-attention and text-encoder architecture-specific methods for generating document embeddings - across four text-encoder architectures: a convolutional neural network, two recurrent neural networks, and a transformer. We also present an extension of label-wise attention that can embed information on the code hierarchy. We performed our experiments on the MIMIC-III dataset, a standard benchmark in the clinical text classification domain. Our experiments showed that using pretrained reference information and the hierarchical design improved classification performance, although these gains diminished on larger datasets and label spaces across all text-encoder architectures. In our analysis, we used an attention mechanism's energy scores to explain the perceived differences in performance and interpretability between the text-encoder architectures and types of label-attention.
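The label-wise attention described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: each label carries its own query vector (learned implicitly, or initialized explicitly from embedded code descriptions), which attends over the encoder's token representations to produce a label-specific document embedding. All names, shapes, and the random inputs are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def label_wise_attention(H, Q):
    """Label-wise attention over token representations.

    H: (n_tokens, d) token embeddings from any text encoder
       (CNN, RNN, or transformer).
    Q: (n_labels, d) label-specific reference vectors -- learned during
       training (implicit) or built from embedded code descriptions
       (explicit).
    Returns the (n_labels, d) label-specific document embeddings and
    the (n_labels, n_tokens) attention weights derived from the raw
    energy scores.
    """
    energies = Q @ H.T               # (n_labels, n_tokens) raw energy scores
    A = softmax(energies, axis=-1)   # per-label distribution over tokens
    V = A @ H                        # label-specific document embeddings
    return V, A

# Toy example: 30 tokens with 8-dim encoder outputs, 5 candidate labels.
rng = np.random.default_rng(0)
H = rng.normal(size=(30, 8))
Q = rng.normal(size=(5, 8))
V, A = label_wise_attention(H, Q)
print(V.shape, A.shape)  # (5, 8) (5, 30)
```

Each row of `A` sums to one, so it can be read as that label's attention distribution over tokens; this is the sense in which the energy scores support the interpretability analysis mentioned in the abstract.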

Original language: English
Pages (from-to): 2247-2258
Number of pages: 12
Journal: IEEE Journal of Biomedical and Health Informatics
Volume: 28
Issue number: 4
State: Published - Apr 1 2024

Funding

This work was supported in part by the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program established by the U.S. Department of Energy (DOE), in part by the National Cancer Institute (NCI) of the National Institutes of Health, in part by Argonne National Laboratory under Grant DE-AC02-06CH11357, in part by Lawrence Livermore National Laboratory under Grant DE-AC52-07NA27344, in part by Los Alamos National Laboratory under Grant DE-AC52-06NA25396, in part by Oak Ridge National Laboratory (ORNL) under Grant DE-AC05-00OR22725 performed under the auspices of DOE, and in part by UT-Battelle, LLC under Grant DE-AC05-00OR22725 with the DOE.

Keywords

  • Attention
  • natural language processing
  • neural networks
  • text classification
  • transformer
