Computationally Efficient Learning of Quality Controlled Word Embeddings for Natural Language Processing

Mohammed Alawad, Georgia Tourassi

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

4 Scopus citations

Abstract

Deep learning (DL) has been used for many natural language processing (NLP) tasks due to its superior performance as compared to traditional machine learning approaches. In DL models for NLP, words are represented using word embeddings, which capture both semantic and syntactic information in text. However, 90-95% of the DL trainable parameters are associated with the word embeddings, resulting in a large storage or memory footprint. Therefore, reducing the number of word embedding parameters is critical, especially with the increase of vocabulary size. In this work, we propose a novel approximate word embeddings approach for convolutional neural networks (CNNs) used for text classification tasks. The proposed approach significantly reduces the number of model trainable parameters without noticeably sacrificing in computing performance accuracy. Compared to other techniques, our proposed word embeddings technique does not require modifications to the DL model architecture. We evaluate the performance of the the proposed word embeddings on three classification tasks using two datasets, composed of Yelp and Amazon reviews. The results show that the proposed method can reduce the number of word embeddings parameters by 98% and 99% for the Yelp and Amazon datasets respectively, with no drop in computing accuracy.

Original languageEnglish
Title of host publicationProceedings - 2019 IEEE Computer Society Annual Symposium on VLSI, ISVLSI 2019
PublisherIEEE Computer Society
Pages134-139
Number of pages6
ISBN (Electronic)9781538670996
DOIs
StatePublished - Jul 2019
Event18th IEEE Computer Society Annual Symposium on VLSI, ISVLSI 2019 - Miami, United States
Duration: Jul 15 2019Jul 17 2019

Publication series

NameProceedings of IEEE Computer Society Annual Symposium on VLSI, ISVLSI
Volume2019-July
ISSN (Print)2159-3469
ISSN (Electronic)2159-3477

Conference

Conference18th IEEE Computer Society Annual Symposium on VLSI, ISVLSI 2019
Country/TerritoryUnited States
CityMiami
Period07/15/1907/17/19

Funding

This manuscript has been authored by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a nonexclusive, paidup, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

FundersFunder number
National Institutes of Health
U.S. Department of Energy
National Cancer Institute
Argonne National LaboratoryDE-AC02-06-CH11357
Lawrence Livermore National LaboratoryDEAC52-07NA27344
Oak Ridge National LaboratoryDE-AC05-00OR22725
Laboratory Directed Research and Development8339
Los Alamos National LaboratoryDE-AC5206NA25396

    Keywords

    • Approximate computing
    • convolutional neural networks
    • natural language processing
    • word embeddings

    Fingerprint

    Dive into the research topics of 'Computationally Efficient Learning of Quality Controlled Word Embeddings for Natural Language Processing'. Together they form a unique fingerprint.

    Cite this