ScienceQA: a novel resource for question answering on scholarly articles

Tanik Saikh, Tirthankar Ghosal, Amish Mittal, Asif Ekbal, Pushpak Bhattacharyya

Research output: Contribution to journal › Article › peer-review

30 Scopus citations

Abstract

Machine Reading Comprehension (MRC) of a document is a challenging problem that requires discourse-level understanding. Extracting information from scholarly articles is now a critical use case: it helps researchers grasp the underlying research quickly and move forward, especially in the current age of infodemic. MRC on research articles can also provide useful information to reviewers and editors. However, the main bottleneck in building such models is the scarcity of human-annotated data. In this paper, we first introduce a dataset to facilitate question answering (QA) on scientific articles. We prepare the dataset in a semi-automated fashion; it contains more than 100k human-annotated context–question–answer triples. Second, we implement a baseline QA model based on Bidirectional Encoder Representations from Transformers (BERT). We also implement two further models: the first is based on Science BERT (SciBERT), and the second combines SciBERT with Bi-Directional Attention Flow (Bi-DAF). The best model (SciBERT) obtains an F1 score of 75.46%. Our dataset is novel, and by providing a benchmark QA dataset and standard baselines, our work opens up a new avenue for research in scholarly document processing. Our dataset and code are available at https://github.com/TanikSaikh/Scientific-Question-Answering.
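The abstract reports model quality as an F1 score. Extractive QA benchmarks in the SQuAD tradition usually compute F1 as the token overlap between a predicted answer span and the gold answer(s). The sketch below assumes that definition and mirrors the answer normalization of the SQuAD v1.1 evaluation script; the exact evaluation script used for ScienceQA is not stated on this page, and the function names `normalize_answer`, `token_f1`, and `corpus_f1` are illustrative only.

```python
import re
import string
from collections import Counter

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace
    (assumed SQuAD-style normalization)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction: str, ground_truth: str) -> float:
    """Token-overlap F1 between one predicted span and one gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(ground_truth).split()
    # Multiset intersection: a shared token counts min(pred_count, gold_count) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def corpus_f1(predictions, gold_answers) -> float:
    """Mean F1 as a percentage, taking the best match over each question's gold answers."""
    scores = [
        max(token_f1(pred, gold) for gold in golds)
        for pred, golds in zip(predictions, gold_answers)
    ]
    return 100.0 * sum(scores) / len(scores)
```

For example, `token_f1("the SciBERT model", "SciBERT")` gives about 0.667: normalization removes "the", leaving two predicted tokens of which one matches (precision 0.5, recall 1.0). Because punctuation is stripped, "Bi-DAF" and "BiDAF" count as an exact match.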

Original language: English
Pages (from-to): 289-301
Number of pages: 13
Journal: International Journal on Digital Libraries
Volume: 23
Issue number: 3
State: Published - Sep 2022

Funding

On behalf of all the co-authors, Mr. Tanik Saikh would like to acknowledge the “Elsevier Centre of Excellence for Natural Language Processing” at the Department of Computer Science and Engineering, Indian Institute of Technology Patna, for supporting the research work carried out in this paper.

Keywords

  • Automatic article review system
  • BERT
  • BiDAF
  • Machine reading comprehension
  • Question answering
  • Scholarly articles
  • SciBERT
