Abstract
Recent work in machine translation has demonstrated that self-attention mechanisms can be used in place of recurrent neural networks to increase training speed without sacrificing model accuracy. We propose combining this approach with the benefits of convolutional filters and a hierarchical structure to create a document classification model that is both highly accurate and fast to train; we name our method Hierarchical Convolutional Attention Networks. We demonstrate the effectiveness of this architecture by surpassing the accuracy of the current state of the art on several classification tasks while being twice as fast to train.
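The architecture sketched in the abstract (convolutional filters feeding a self-attention layer, composed hierarchically from words to sentences to a document vector) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, dimensions, weight shapes, and the mean-pooling composition step are all illustrative assumptions.

```python
import numpy as np

def conv1d(x, w):
    # 1-D convolution with "same" zero padding.
    # x: (seq_len, d_in); w: (kernel, d_in, d_out)
    k, d_in, d_out = w.shape
    pad = k // 2
    xp = np.vstack([np.zeros((pad, d_in)), x, np.zeros((pad, d_in))])
    out = np.zeros((x.shape[0], d_out))
    for t in range(x.shape[0]):
        # contract the (kernel, d_in) window against the filter bank
        out[t] = np.einsum('ki,kio->o', xp[t:t + k], w)
    return out

def conv_attention(x, wq, wk, wv):
    # Queries, keys, and values come from convolutions (rather than the
    # pointwise linear projections of standard self-attention), then
    # ordinary scaled dot-product attention is applied.
    q, k_, v = conv1d(x, wq), conv1d(x, wk), conv1d(x, wv)
    d = q.shape[-1]
    scores = q @ k_.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def encode_document(doc, wq, wk, wv):
    # Hierarchy: attend within each sentence and mean-pool to a sentence
    # vector, then attend over the sentence vectors and mean-pool to a
    # single document vector (pooling choice is an assumption here).
    sent_vecs = np.stack(
        [conv_attention(s, wq, wk, wv).mean(axis=0) for s in doc]
    )
    return conv_attention(sent_vecs, wq, wk, wv).mean(axis=0)
```

A classifier would place a softmax layer on top of the document vector; because every step is a convolution or a matrix product, the whole encoder parallelizes across tokens, which is the source of the training-speed advantage over recurrent encoders.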
Original language | English |
---|---|
Title of host publication | ACL 2018 - Representation Learning for NLP, Proceedings of the 3rd Workshop |
Publisher | Association for Computational Linguistics (ACL) |
Pages | 11-23 |
Number of pages | 13 |
ISBN (Electronic) | 9781948087438 |
State | Published - 2018 |
Event | 3rd Workshop on Representation Learning for NLP, RepL4NLP 2018 at the 56th Annual Meeting of the Association for Computational Linguistics ACL 2018, Melbourne, Australia. Duration: Jul 20 2018 → … |
Publication series
Name | Proceedings of the Annual Meeting of the Association for Computational Linguistics |
---|---|
ISSN (Print) | 0736-587X |
Conference
Conference | 3rd Workshop on Representation Learning for NLP, RepL4NLP 2018 at the 56th Annual Meeting of the Association for Computational Linguistics ACL 2018 |
---|---|
Country/Territory | Australia |
City | Melbourne |
Period | 07/20/18 → … |
Funding
This work has been supported in part by the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) program established by the U.S. Department of Energy (DOE) and the National Cancer Institute (NCI) of the National Institutes of Health. This work was performed under the auspices of the U.S. Department of Energy by Argonne National Laboratory under Contract DE-AC02-06CH11357, Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344, Los Alamos National Laboratory under Contract DE-AC52-06NA25396, and Oak Ridge National Laboratory under Contract DE-AC05-00OR22725. This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.