Decoding substance use disorder severity from clinical notes using a large language model

Maria Mahbub, Gregory M. Dams, Sudarshan Srinivasan, Caitlin Rizy, Ioana Danciu, Jodie Trafton, Katie Knight

Research output: Contribution to journal › Article › peer-review

1 Scopus citations

Abstract

Substance use disorder (SUD) poses a major concern due to its detrimental effects on health and society. SUD identification and treatment depend on a variety of factors such as severity, co-determinants (e.g., withdrawal symptoms), and social determinants of health. Existing diagnostic coding systems used by insurance providers, like the International Classification of Diseases (ICD-10), lack granularity for certain diagnoses, but American clinicians add this granularity (such as that found in the Diagnostic and Statistical Manual of Mental Disorders classification, DSM-5) as supplemental unstructured text in clinical notes. Traditional natural language processing (NLP) methods face limitations in accurately parsing such diverse clinical language. Large language models (LLMs) offer promise in overcoming these challenges by adapting to diverse language patterns. This study investigates the application of LLMs for extracting severity-related information for various SUD diagnoses from clinical notes. We propose a workflow employing zero-shot learning of LLMs with carefully crafted prompts and post-processing techniques. Through experimentation with Flan-T5, an open-source LLM, we demonstrate its superior recall compared with the rule-based approach. Focusing on 11 categories of SUD diagnoses, we show the effectiveness of LLMs in extracting severity information, contributing to improved risk assessment and treatment planning for SUD patients.

Original language: English
Article number: 5
Journal: npj Mental Health Research
Volume: 4
Issue number: 1
State: Published - Dec 2025

Funding

This quality improvement, non-research work was supported by the Department of Veterans Affairs, Office of Mental Health and Office of Suicide Prevention. This initiative used VA-funded computing resources from the Knowledge Discovery Infrastructure (KDI) at Oak Ridge National Laboratory. The KDI resource is also supported by DOE's Office of Science. The manuscript has been authored by UT-Battelle LLC under contract DE-AC05-00OR22725 with the DOE. The US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce this manuscript or allow others to do so for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). The authors also wish to acknowledge the support of the larger partnership and, most importantly, the veterans who chose to receive their care at the VA. Disclaimer: The views and opinions expressed in this manuscript are those of the authors and do not represent those of the Department of Veterans Affairs, the Department of Energy, or the United States Government.
