SciND: a new triplet-based dataset for scientific novelty detection via knowledge graphs

Komal Gupta, Ammaar Ahmad, Tirthankar Ghosal, Asif Ekbal

Research output: Contribution to journal › Article › Peer-review


Abstract

Detecting texts that contain new information at the semantic level is not straightforward, and the problem becomes even more challenging for research articles. Over the years, many datasets and techniques have been developed for automatic novelty detection. However, most existing studies on textual novelty detection target general domains such as newswire, and no comprehensive dataset for scientific novelty detection is available in the literature. In this paper, we present SciND, a new triplet-based corpus for detecting scientific novelty in research articles via knowledge graphs. The dataset consists of three types of triplets: (i) triplets for the knowledge graph, (ii) novel triplets, and (iii) non-novel triplets. We build a domain-specific scientific knowledge graph from research-article triplets covering seven natural language processing (NLP) domains, and we extract novel triplets from papers published in 2021. For the non-novel articles, we use blog post summaries of the research articles. As a baseline, we apply a feature-based novelty detection scheme to the research articles and use it to demonstrate the applicability of the dataset; this baseline achieves an F1 score of 72%. We also present an analysis and discuss the future scope of the dataset. To the best of our knowledge, this is the first dataset for scientific novelty detection via a knowledge graph. Our code and dataset are publicly available at https://github.com/92Komal/Scientific_Novelty_Detection.
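To make the data shape concrete, the sketch below shows one simple way triplets could be checked against a knowledge graph and how an F1 score over novel/non-novel labels is computed. It is a minimal illustration under stated assumptions, not the authors' feature-based baseline: the names `Triplet`, `KnowledgeGraph`, `is_novel`, and `f1_score` are hypothetical, "novel" is approximated as exact-match absence of a lightly normalized (head, relation, tail) fact from the graph, and "novel" is treated as the positive class for F1 (the abstract does not specify the averaging used).

```python
from typing import Iterable, NamedTuple


class Triplet(NamedTuple):
    """A (head, relation, tail) fact, e.g. extracted from a paper's text."""
    head: str
    relation: str
    tail: str


def normalize(t: Triplet) -> Triplet:
    # Lowercase each part and collapse internal whitespace so that
    # trivial surface variants of the same fact compare equal.
    return Triplet(*(" ".join(part.lower().split()) for part in t))


class KnowledgeGraph:
    """Hypothetical in-memory store of normalized triplets for one domain."""

    def __init__(self, triplets: Iterable[Triplet]) -> None:
        self.facts = {normalize(t) for t in triplets}

    def is_novel(self, t: Triplet) -> bool:
        # Assumption: a triplet is "novel" if its normalized form is absent
        # from the graph. Real systems would also need fuzzy/semantic matching.
        return normalize(t) not in self.facts


def f1_score(gold: list[bool], pred: list[bool]) -> float:
    """F1 with True (novel) as the positive class."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum(p and not g for g, p in zip(gold, pred))
    fn = sum(g and not p for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    kg = KnowledgeGraph([
        Triplet("BERT", "used-for", "text classification"),
        Triplet("Transformer", "used-for", "machine translation"),
    ])
    print(kg.is_novel(Triplet("bert", "used-for", "Text  Classification")))  # False
    print(kg.is_novel(Triplet("BERT", "used-for", "novelty detection")))     # True
    print(f1_score([True, False, True, False], [True, True, False, False]))  # 0.5
```

Exact matching keeps the example short and deterministic; it will miss paraphrased facts, which is one reason a learned or feature-based detector is needed in practice.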

Original language: English
Journal: International Journal on Digital Libraries
State: Accepted/In press, 2024

Keywords

  • Data preparation
  • Information extraction
  • Novelty detection
  • Scientific knowledge graph
