SynthNotes: A Generator Framework for High-volume, High-fidelity Synthetic Mental Health Notes

Edmon Begoli, Kris Brown, Sudarshan Srinivas, Suzanne Tamang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

5 Scopus citations

Abstract

One of the key emerging challenges connecting the "Big Data" and AI domains is the availability of sufficient volumes of training data for AI/Machine Learning tasks. SynthNotes is a framework for generating standards-compliant, realistic mental health progress report notes at a very large, population-level scale and in a strictly privacy-preserving manner. Our framework, inspired by the need to explore, evaluate, and train computational methods for the emerging mental health crisis in the US, is useful for benchmarking, optimization, and training of biomedical natural language processing, information extraction, and machine learning systems intended to operate at "Big Data" scale (billions of notes). The free-text notes generated by SynthNotes are based on the literature and public statistical models, allowing for a realistic, natural language representation of a patient and his or her mental health characteristics. Additionally, SynthNotes can partially simulate the stylistic, grammatical, and expressive characteristics of a licensed mental health professional. SynthNotes is modular and flexible, allowing for representation of a variety of conditions, incorporation of alternative foundational models, and parametrization of the variability of the structure, content, and size of the synthetically generated corpus. In this paper, we report on the initial use and performance characteristics of our SynthNotes framework and on the ongoing work to include content planning and deep learning-based generative methods trained on real data.

Original language: English
Title of host publication: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018
Editors: Naoki Abe, Huan Liu, Calton Pu, Xiaohua Hu, Nesreen Ahmed, Mu Qiao, Yang Song, Donald Kossmann, Bing Liu, Kisung Lee, Jiliang Tang, Jingrui He, Jeffrey Saltz
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 951-958
Number of pages: 8
ISBN (Electronic): 9781538650356
DOIs
State: Published - Jul 2 2018
Event: 2018 IEEE International Conference on Big Data, Big Data 2018 - Seattle, United States
Duration: Dec 10 2018 → Dec 13 2018

Publication series

Name: Proceedings - 2018 IEEE International Conference on Big Data, Big Data 2018

Conference

Conference: 2018 IEEE International Conference on Big Data, Big Data 2018
Country/Territory: United States
City: Seattle
Period: 12/10/18 → 12/13/18

Bibliographical note

Publisher Copyright:
© 2018 IEEE.

Funding

Acknowledgment: This manuscript has been in part co-authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725, and under a joint program with the Department of Veterans Affairs (MVP CHAMPION and VICTOR). We want to acknowledge our colleague Eduardo Ponce for providing the experimental data collected from text evaluation experiments. We would also like to acknowledge our colleague Josh Arnold, who set up our Spark and HDFS infrastructure and provided the scripts for running SynthNotes there.

Funders: UT-Battelle, LLC

    Keywords

    • Big Data Volume
    • Machine Learning
    • Natural Language Generation
    • Synthetic Data
