Abstract
Detecting the novelty or freshness of an entire document is essential in this age of data duplication and semantic-level redundancy across the web. Current techniques for the problem mostly rely on handcrafted similarity- and divergence-based measures to classify a document as novel or non-novel. However, document-level novelty detection has received relatively little attention in the literature compared to its sentence-level counterpart. In this work, we present a deep neural architecture that automatically predicts the amount of new information contained in a document in the form of a novelty score. We also offer a dataset of more than 7,500 documents, annotated at the sentence level, to facilitate further research. Our approach, which learns the notion of novelty and redundancy solely from the data, achieves a significant performance improvement over existing methods and the adopted baselines (approximately 17% error reduction). Our approach is also consistent with the Two-Stage theory of human recall, which is essential to comprehending new information.
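For context, the handcrafted similarity- and divergence-based measures mentioned above can be illustrated with a minimal sketch. This is not the deep neural architecture described in this work. It assumes bag-of-words term counts, defines a hypothetical novelty score as one minus the maximum cosine similarity between the target document and any source document, and adds a smoothed KL divergence as an example divergence measure. The function names, the smoothing constant `alpha`, and the classification `threshold` are all illustrative assumptions.

```python
from collections import Counter
import math
import re


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs (a simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def novelty_score(target: str, sources: list[str]) -> float:
    # Hypothetical baseline: 1 - max similarity to any source document.
    if not sources:
        return 1.0
    t = Counter(tokenize(target))
    return 1.0 - max(cosine(t, Counter(tokenize(s))) for s in sources)


def kl_divergence(target: str, source: str, alpha: float = 0.01) -> float:
    # KL(target || source) over the joint vocabulary, with additive
    # smoothing so unseen words do not produce log(0).
    t, s = Counter(tokenize(target)), Counter(tokenize(source))
    vocab = t.keys() | s.keys()
    t_total = sum(t.values()) + alpha * len(vocab)
    s_total = sum(s.values()) + alpha * len(vocab)
    kl = 0.0
    for w in vocab:
        p = (t[w] + alpha) / t_total
        q = (s[w] + alpha) / s_total
        kl += p * math.log(p / q)
    return kl


def classify(target: str, sources: list[str], threshold: float = 0.5) -> str:
    # The threshold is an arbitrary illustrative choice, not a tuned value.
    return "novel" if novelty_score(target, sources) >= threshold else "non-novel"


if __name__ == "__main__":
    sources = ["the cat sat on the mat"]
    print(classify("the cat sat on the mat", sources))
    print(classify("stock markets rallied today", sources))
```

Measures like these depend on surface word overlap and a manually chosen threshold, which is the limitation that motivates learning the notion of novelty directly from data.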
Original language | English |
---|---|
Title of host publication | 2019 International Joint Conference on Neural Networks, IJCNN 2019 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9781728119854 |
DOIs | |
State | Published - Jul 2019 |
Externally published | Yes |
Event | 2019 International Joint Conference on Neural Networks, IJCNN 2019 - Budapest, Hungary Duration: Jul 14 2019 → Jul 19 2019 |
Publication series
Name | Proceedings of the International Joint Conference on Neural Networks |
---|---|
Volume | 2019-July |
Conference
Conference | 2019 International Joint Conference on Neural Networks, IJCNN 2019 |
---|---|
Country/Territory | Hungary |
City | Budapest |
Period | 07/14/19 → 07/19/19 |
Funding
The first author and Asif Ekbal acknowledge the Visvesvaraya PhD Scheme for Electronics and IT and the Visvesvaraya YFRF, respectively, under the Ministry of Electronics and Information Technology (MeitY), Government of India, for support.
Keywords
- document classification
- document-level novelty
- novelty score