Is your document novel? Let attention guide you. An attention-based model for document-level novelty detection

Tirthankar Ghosal, Vignesh Edithal, Asif Ekbal, Pushpak Bhattacharyya, Srinivasa Satya Sameer Kumar Chivukula, George Tsatsaronis

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

Detecting whether a document contains sufficient new information to be deemed novel is of immense significance in this age of data duplication. Existing techniques for document-level novelty detection mostly operate at the lexical level and are unable to address semantic-level redundancy. These techniques usually rely on handcrafted features extracted from the documents in a rule-based or traditional feature-based machine learning setup. Here, we present an effective approach based on a neural attention mechanism to detect document-level novelty without any manual feature engineering. We contend that a simple alignment of texts between the source and target document(s) could identify the state of novelty of a target document. Our deep neural architecture elicits inference knowledge from a large-scale natural language inference dataset, which proves crucial to the novelty detection task. Our approach is effective and outperforms the standard baselines and recent work on document-level novelty detection by a margin of 3% in terms of accuracy.

Original language: English
Pages (from-to): 427-454
Number of pages: 28
Journal: Natural Language Engineering
Volume: 27
Issue number: 4
State: Published - Jul 2021
Externally published: Yes

Funding

The first author, Tirthankar Ghosal, acknowledges the Visvesvaraya PhD Scheme for Electronics and IT, an initiative of the Ministry of Electronics and Information Technology (MeitY), Government of India, for fellowship support. The third author, Asif Ekbal, acknowledges the Young Faculty Research Fellowship (YFRF), supported by the Visvesvaraya PhD Scheme for Electronics and IT, Ministry of Electronics and Information Technology (MeitY), Government of India, being implemented by Digital India Corporation (formerly Media Lab Asia). The authors also thank the Elsevier Center of Excellence for Natural Language Processing, Indian Institute of Technology Patna, for adequate infrastructural support to carry out this research. Finally, the authors thank the anonymous reviewers for their critical evaluation of this work and their suggestions for carrying it forward.

Funders:
Digital India Corporation
Ministry of Electronics and Information Technology

    Keywords

    • Decomposable Attention
    • Document Classification
    • Document-Level Novelty Detection
    • Natural Language Inference
