Automating Bibliometric Analysis with Sentence Transformers and Retrieval-Augmented Generation (RAG): A Pilot Study in Semantic and Contextual Search for Customized Literature Characterization for High-Impact Urban Research

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

5 Scopus citations

Abstract

Bibliometric analysis is essential for understanding research trends, scope, and impact in urban science, especially in high-impact journals such as those in the Nature Portfolio. However, traditional methods, which rely on keyword searches and basic NLP techniques, often fail to uncover valuable insights not explicitly stated in article titles or keywords. These approaches cannot perform semantic search or capture contextual meaning, which limits their effectiveness in classifying topics and characterizing studies. In this paper, we address these limitations by leveraging Generative AI models, specifically transformers and Retrieval-Augmented Generation (RAG), to automate and enhance bibliometric analysis. We developed a technical workflow that integrates a vector database, Sentence Transformers, a Gaussian Mixture Model (GMM), a Retrieval Agent, and Large Language Models (LLMs) to enable contextual search, topic ranking, and characterization of research using customized prompt templates. A pilot study analyzing 223 urban science-related articles published in Nature Communications over the past decade highlights the effectiveness of our approach in generating insightful summary statistics on the quality, scope, and characteristics of papers in high-impact journals. This study introduces a new paradigm for enhancing bibliometric analysis and knowledge retrieval in urban research, positioning an AI agent as a powerful tool for advancing research evaluation and understanding.
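The retrieval and prompt-template steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a toy bag-of-words vector stands in for a Sentence Transformer embedding, an in-memory list stands in for the vector database, the GMM topic-ranking step is omitted, and no LLM is called. All function names, the prompt template, and the example snippets are hypothetical and invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical prompt template; the paper's actual templates are not shown here.
PROMPT_TEMPLATE = (
    "Answer using only the context below.\n"
    "Context:\n{context}\n"
    "Question: {question}\n"
)


def build_prompt(question, chunks, k=2):
    """Fill the template with the top-k retrieved chunks as context."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return PROMPT_TEMPLATE.format(context=context, question=question)


# Hypothetical article snippets, invented for illustration only.
chunks = [
    "We model urban heat exposure across city neighborhoods.",
    "Protein folding is predicted with a deep network.",
    "Traffic emissions in the city are measured with mobile sensors.",
]
prompt = build_prompt("Which studies measure city traffic emissions?", chunks, k=1)
print(prompt)
```

In a fuller pipeline, `embed` would presumably be replaced by a real sentence-embedding model and the resulting prompt passed to an LLM. The word-count version here only matches exact tokens, which is the limitation of keyword search that the abstract describes.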

Original language: English
Title of host publication: Urban-AI 2024 - Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Advances in Urban-AI
Editors: Olufemi A. Omitaomu, Ali Mostafavi, Sukanya Randhawa, Haoran Niu
Publisher: Association for Computing Machinery, Inc
Pages: 43-49
Number of pages: 7
ISBN (Electronic): 9798400711565
State: Published - Oct 29 2024
Event: 2nd ACM SIGSPATIAL International Workshop on Advances in Urban-AI, Urban-AI 2024 - Atlanta, United States
Duration: Oct 29 2024 → …

Publication series

Name: Urban-AI 2024 - Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Advances in Urban-AI

Conference

Conference: 2nd ACM SIGSPATIAL International Workshop on Advances in Urban-AI, Urban-AI 2024
Country/Territory: United States
City: Atlanta
Period: 10/29/24 → …

Funding

This work was supported by the U.S. Department of Energy (U.S. DOE), Advanced Research Projects Agency–Energy (ARPA-E), under project #DE-AR0001780. We thank our collaborators from the University of Tennessee, Knoxville.

Keywords

  • Bibliometrics Analysis
  • Large Language Models
  • Retrieval-Augmented Generation
  • Transformers
