Abstract
Detecting whether a document contains sufficient new information to be deemed novel is of immense significance in this age of data duplication. Existing techniques for document-level novelty detection mostly operate at the lexical level and are unable to address semantic-level redundancy. These techniques usually rely on handcrafted features extracted from the documents in a rule-based or traditional feature-based machine learning setup. Here, we present an effective approach based on a neural attention mechanism to detect document-level novelty without any manual feature engineering. We contend that a simple alignment of texts between the source and target document(s) could identify the state of novelty of a target document. Our deep neural architecture elicits inference knowledge from a large-scale natural language inference dataset, which proves crucial to the novelty detection task. Our approach outperforms the standard baselines and recent work on document-level novelty detection by a margin of 3% in terms of accuracy.
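To make the idea of aligning target text against source text more concrete, the following is a minimal sketch of soft alignment by attention, written in plain Python. It is not the authors' architecture: the function names (`softmax`, `dot`, `soft_align`) and the two-dimensional toy vectors are hypothetical, and a trained model would learn its representations rather than use raw dot products on fixed vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def soft_align(target, source):
    """For each target vector, return a pair:
    (attention weights over the source vectors,
     attention-weighted average of the source vectors)."""
    aligned = []
    for t in target:
        weights = softmax([dot(t, s) for s in source])
        summary = [
            sum(w * s[i] for w, s in zip(weights, source))
            for i in range(len(source[0]))
        ]
        aligned.append((weights, summary))
    return aligned


# Toy, made-up 2-d vectors (hypothetical values, not taken from the paper).
source = [[1.0, 0.0], [0.0, 1.0]]
target = [[4.0, 0.0], [0.0, 0.0]]
result = soft_align(target, source)
```

In this toy setup, the first target vector attends almost entirely to the first source vector, while the second target vector, which matches nothing in particular, spreads its attention evenly. A downstream classifier could compare each target unit with its aligned source summary to judge how much of the target is already covered by the source; that comparison step is left out of this sketch.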
Original language | English |
---|---|
Pages (from-to) | 427-454 |
Number of pages | 28 |
Journal | Natural Language Engineering |
Volume | 27 |
Issue number | 4 |
DOIs | |
State | Published - Jul 2021 |
Externally published | Yes |
Funding
The first author, Tirthankar Ghosal, acknowledges the Visvesvaraya PhD Scheme for Electronics and IT, an initiative of the Ministry of Electronics and Information Technology (MeitY), Government of India, for fellowship support. The third author, Asif Ekbal, acknowledges the Young Faculty Research Fellowship (YFRF), supported by the Visvesvaraya PhD Scheme for Electronics and IT, Ministry of Electronics and Information Technology (MeitY), Government of India, being implemented by Digital India Corporation (formerly Media Lab Asia). The authors also thank the Elsevier Center of Excellence for Natural Language Processing, Indian Institute of Technology Patna, for adequate infrastructural support to carry out this research. Finally, the authors thank the anonymous reviewers for their critical evaluation of this work and for their suggestions on how to carry it forward.
Funders | Funder number |
---|---|
Digital India Corporation | |
Ministry of Electronics and Information Technology | |
Keywords
- Decomposable Attention
- Document Classification
- Document-Level Novelty Detection
- Natural Language Inference