Large-scale deep learning for metastasis detection in pathology reports

  • Patrycja Krawczuk
  • Zachary R. Fox
  • Valentina Petkov
  • Serban Negoita
  • Jennifer Doherty
  • Antoinette Stroup
  • Stephen Schwartz
  • Lynne Penberthy
  • Elizabeth Hsu
  • John Gounley
  • Heidi A. Hanson

Research output: Contribution to journal › Article › peer-review

Abstract

Objectives: No existing algorithm can reliably identify metastasis from pathology reports across multiple cancer types and the entire US population. In this study, we develop a deep learning model that automatically detects patients with metastatic cancer using pathology reports from many laboratories and covering multiple cancer types. Materials and Methods: We use 60 471 unstructured pathology reports from 4 Surveillance, Epidemiology, and End Results (SEER) registries. Each report was coded with 1 of 3 labels: metastasis negative, metastasis positive, or metastasis undetermined. We train a task-specific deep neural network from scratch and compare its performance with that of a widely used large language model (LLM). Results: Our deep learning architecture trained on task-specific data outperforms a general-purpose LLM, with a recall of 0.894 compared to 0.824. We quantified model uncertainty and used it to defer uncertain reports for human review. Retaining 72.9% of reports for automated classification increased recall from 0.894 to 0.969. Discussion: A smaller deep learning architecture trained on task-specific data outperforms a general-purpose LLM. Equally critical to model performance is the incorporation of uncertainty quantification, achieved here through an abstention mechanism. Conclusions: This study's findings demonstrate the feasibility of developing algorithms to automatically identify metastatic cancer cases from unstructured pathology reports.
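The abstention idea in the Results can be illustrated with a minimal Python sketch. The abstract does not say how uncertainty was quantified, so this sketch assumes the simplest common proxy: the model's top class probability, compared against a fixed threshold. The function names, the toy probabilities, the threshold values, and the choice to report recall on the positive class are all hypothetical and are not taken from the study.

```python
from typing import List, Sequence, Tuple

# Label order is an assumption made for this sketch only.
LABELS = ("negative", "positive", "undetermined")


def predict_with_abstention(
    probs: Sequence[Sequence[float]], threshold: float
) -> List[Tuple[int, bool]]:
    """Return (predicted class index, retained?) for each report.

    A report is retained when its top class probability reaches the
    threshold; otherwise it is deferred to human review.
    """
    decisions = []
    for p in probs:
        best = max(range(len(p)), key=lambda i: p[i])
        decisions.append((best, p[best] >= threshold))
    return decisions


def coverage_and_recall(
    probs: Sequence[Sequence[float]],
    y_true: Sequence[int],
    threshold: float,
    positive: int = 1,
) -> Tuple[float, float]:
    """Fraction of reports retained, and positive-class recall on them."""
    decisions = predict_with_abstention(probs, threshold)
    kept = [(pred, y) for (pred, keep), y in zip(decisions, y_true) if keep]
    coverage = len(kept) / len(y_true)
    tp = sum(1 for pred, y in kept if pred == positive and y == positive)
    fn = sum(1 for pred, y in kept if pred != positive and y == positive)
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return coverage, recall


# Toy data (made up): four reports, three of them truly metastasis positive.
probs = [
    [0.05, 0.90, 0.05],  # confident, correct positive
    [0.50, 0.40, 0.10],  # low confidence, misses a true positive
    [0.92, 0.05, 0.03],  # confident, correct negative
    [0.10, 0.85, 0.05],  # confident, correct positive
]
y_true = [1, 1, 0, 1]

no_abstention = coverage_and_recall(probs, y_true, threshold=0.0)
with_abstention = coverage_and_recall(probs, y_true, threshold=0.8)
```

In this toy example, raising the threshold defers the one low-confidence report, which is also the one the model gets wrong, so recall on the retained reports rises while coverage falls. That is the same trade-off the abstract reports between the share of reports retained and recall.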

Original language: English
Article number: ooaf070
Journal: JAMIA Open
Volume: 8
Issue number: 4
State: Published - Aug 1 2025

Funding

Office of Science of the US Department of Energy: This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This work has been supported in part by the US Department of Energy (DOE) and the NCI of the National Institutes of Health. This work was performed under the auspices of the DOE by Oak Ridge National Laboratory under Contract DE-AC05-00OR22725.

Keywords

  • machine learning
  • metastasis
  • natural language processing
  • recurrence
