Adversarial Training for Privacy-Preserving Deep Learning Model Distribution

Mohammed Alawad, Shang Gao, Xiao Cheng Wu, Eric B. Durbin, Linda Coyle, Lynne Penberthy, Georgia Tourassi

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Collaboration among cancer registries is essential to develop accurate, robust, and generalizable deep learning models for automated information extraction from cancer pathology reports. Sharing data presents a serious privacy issue, especially in biomedical research and healthcare delivery domains. Distributing pretrained deep learning (DL) models has been proposed to avoid critical data sharing. However, there is growing recognition that collaboration among clinical institutes through DL model distribution exposes new security and privacy vulnerabilities. These vulnerabilities increase in natural language processing (NLP) applications, in which the dataset vocabulary with word vector representations needs to be associated with the other model parameters. In this paper, we propose a novel privacy-preserving DL model distribution across cancer registries for information extraction from cancer pathology reports with privacy and confidentiality considerations. The proposed approach exploits the adversarial training framework to distinguish private features from shared features among different datasets. It only shares registry-invariant model parameters, without sharing raw data nor registry-specific model parameters among cancer registries. Thus, it protects both the data and the trained model simultaneously. We compare our proposed approach to single-registry models, and a model trained on centrally hosted data from different cancer registries. The results show that the proposed approach significantly outperforms the single-registry models and achieves statistically indistinguishable micro and macro F1-score as compared to the centralized model.

Original languageEnglish
Title of host publicationProceedings - 2019 IEEE International Conference on Big Data, Big Data 2019
EditorsChaitanya Baru, Jun Huan, Latifur Khan, Xiaohua Tony Hu, Ronay Ak, Yuanyuan Tian, Roger Barga, Carlo Zaniolo, Kisung Lee, Yanfang Fanny Ye
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages5705-5710
Number of pages6
ISBN (Electronic)9781728108582
DOIs
StatePublished - Dec 2019
Event2019 IEEE International Conference on Big Data, Big Data 2019 - Los Angeles, United States
Duration: Dec 9 2019Dec 12 2019

Publication series

NameProceedings - 2019 IEEE International Conference on Big Data, Big Data 2019

Conference

Conference2019 IEEE International Conference on Big Data, Big Data 2019
Country/TerritoryUnited States
CityLos Angeles
Period12/9/1912/12/19

Funding

This manuscript has been authored by UT - Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a nonexclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of the manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). This work has been supported in part by the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program established by the U.S. Department of Energy (DOE) and the National Cancer Institute (NCI) of the National Institutes of Health. This work was performed under the auspices of the U.S. Department of Energy by Argonne National Laboratory under Contract DE-AC02-06-CH11357, Lawrence Livermore National Laboratory under Contract DEAC52-07NA27344, Los Alamos National Laboratory under Contract DE-AC5206NA25396, and Oak Ridge National Laboratory under Contract DE-AC05-00OR22725.

FundersFunder number
National Institutes of Health
U.S. Department of Energy
National Cancer Institute
Argonne National LaboratoryDE-AC02-06-CH11357
Lawrence Livermore National LaboratoryDEAC52-07NA27344
Oak Ridge National LaboratoryDE-AC05-00OR22725
Los Alamos National LaboratoryDE-AC5206NA25396

    Keywords

    • Privacy-preserving
    • convolutional neural network
    • information extraction.
    • natural language processing

    Fingerprint

    Dive into the research topics of 'Adversarial Training for Privacy-Preserving Deep Learning Model Distribution'. Together they form a unique fingerprint.

    Cite this