Novelty goes deep. A deep neural solution to document level novelty detection

Tirthankar Ghosal, Vignesh Edithal, Asif Ekbal, Pushpak Bhattacharyya, George Tsatsaronis, Srinivasa Satya Sameer Kumar Chivukula

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

25 Scopus citations

Abstract

The rapid growth of documents across the web has necessitated finding means of discarding redundant documents and retaining novel ones. Capturing redundancy is challenging as it may involve investigating at a deep semantic level. Techniques for detecting such semantic redundancy at the document level are scarce. In this work we propose a deep Convolutional Neural Network (CNN) based model to classify a document as novel or redundant with respect to a set of relevant documents already seen by the system. The system is simple and does not require manual feature engineering. Our novel scheme encodes relevant and relative information from both source and target texts to generate an intermediate representation for which we coin the name Relative Document Vector (RDV). The proposed method outperforms the existing benchmark on two document-level novelty detection datasets by a margin of ∼5% in terms of accuracy. We further demonstrate the effectiveness of our approach on a standard paraphrase detection dataset where the paraphrased passages closely resemble semantically redundant documents.
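
The abstract does not spell out how the RDV is assembled. As a rough illustration only, the sketch below shows one way "relative" information from source and target texts could be combined, assuming every sentence has already been embedded as a fixed-length vector. The function name `relative_document_vector`, the nearest-sentence pairing by cosine similarity, and the four-part feature layout are all assumptions made for this example, not details confirmed by the abstract.

```python
import numpy as np

def relative_document_vector(source_sents, target_sents, eps=1e-12):
    """Hypothetical RDV sketch (assumed construction, not the authors' code).

    source_sents: (n_source, d) array of sentence embeddings already seen.
    target_sents: (n_target, d) array of sentence embeddings to be judged.
    Returns an (n_target, 4 * d) matrix, one row per target sentence.
    """
    src = np.asarray(source_sents, dtype=float)
    tgt = np.asarray(target_sents, dtype=float)

    # Cosine similarity between every target and every source sentence.
    src_unit = src / (np.linalg.norm(src, axis=1, keepdims=True) + eps)
    tgt_unit = tgt / (np.linalg.norm(tgt, axis=1, keepdims=True) + eps)
    sims = tgt_unit @ src_unit.T  # shape (n_target, n_source)

    # Assumption: pair each target sentence with its most similar source sentence.
    nearest = src[sims.argmax(axis=1)]

    # Assumption: describe each pair by both vectors, their absolute
    # difference, and their element-wise product.
    return np.concatenate([tgt, nearest, np.abs(tgt - nearest), tgt * nearest], axis=1)
```

A matrix of this kind, with one row per target sentence, could then be passed to a convolutional classifier that outputs a novel or redundant label; that classifier is not sketched here.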

Original language: English
Title of host publication: COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings
Editors: Emily M. Bender, Leon Derczynski, Pierre Isabelle
Publisher: Association for Computational Linguistics (ACL)
Pages: 2802-2813
Number of pages: 12
ISBN (Electronic): 9781948087506
State: Published - 2018
Externally published: Yes
Event: 27th International Conference on Computational Linguistics, COLING 2018 - Santa Fe, United States
Duration: Aug 20 2018 → Aug 26 2018

Publication series

Name: COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings

Conference

Conference: 27th International Conference on Computational Linguistics, COLING 2018
Country/Territory: United States
City: Santa Fe
Period: 08/20/18 → 08/26/18

Funding

The first author, Tirthankar Ghosal, acknowledges Visvesvaraya PhD Scheme for Electronics and IT, an initiative of Ministry of Electronics and Information Technology (MeitY), Government of India for fellowship support. Asif Ekbal acknowledges Young Faculty Research Fellowship (YFRF), supported by Visvesvaraya PhD scheme for Electronics and IT, Ministry of Electronics and Information Technology (MeitY), Government of India, being implemented by Digital India Corporation (formerly Media Lab Asia). We thank the anonymous reviewers for their valuable feedback and Prof. Donia Scott, University of Sussex for her advice in the Writing Mentoring Program as part of COLING 2018. We also thank Elsevier Center of Excellence for Natural Language Processing, Indian Institute of Technology Patna for adequate help and support to carry out this research.

Funders
Digital India Corporation
Ministry of Electronics and Information Technology
