VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images

  • M. Maruf
  • Arka Daw
  • Kazi Sajeed Mehrab
  • Harish Babu Manogaran
  • Abhilash Neog
  • Medha Sawhney
  • Mridul Khurana
  • James P. Balhoff
  • Yasin Bakış
  • Bahadir Altintas
  • Matthew J. Thompson
  • Elizabeth G. Campolongo
  • Josef C. Uyeda
  • Hilmar Lapp
  • Henry L. Bart
  • Paula M. Mabee
  • Yu Su
  • Wei Lun Chao
  • Charles Stewart
  • Tanya Berger-Wolf
  • Wasila Dahdul
  • Anuj Karpatne

Research output: Contribution to journal › Conference article › peer-review

1 Scopus citation

Abstract

Images are increasingly becoming the currency for documenting biodiversity on the planet, providing novel opportunities for accelerating scientific discoveries in the field of organismal biology, especially with the advent of large vision-language models (VLMs). We ask whether pretrained VLMs can aid scientists in answering a range of biologically relevant questions without any additional fine-tuning. In this paper, we evaluate the effectiveness of 12 state-of-the-art (SOTA) VLMs in the field of organismal biology using a novel dataset, VLM4Bio, consisting of 469K question-answer pairs involving 30K images from three groups of organisms (fishes, birds, and butterflies) and covering five biologically relevant tasks. We also explore how prompting techniques and tests for reasoning hallucination affect VLM performance, shedding new light on the capabilities of current SOTA VLMs in answering biologically relevant questions from images.
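The zero-shot evaluation setup the abstract describes (posing image-grounded questions to a pretrained VLM and scoring its answers against ground truth) can be sketched as a simple exact-match scorer. This is a minimal illustrative sketch only: the `QAPair` field names and the `exact_match_accuracy` helper are assumptions, not the paper's actual benchmark harness or schema.

```python
from dataclasses import dataclass, field

# Hypothetical record layout for one benchmark item; the real VLM4Bio
# schema may differ (field names here are illustrative only).
@dataclass
class QAPair:
    image_path: str                      # path to the organism image
    question: str                        # biologically relevant question
    options: list[str] = field(default_factory=list)  # choices, if multiple-choice
    answer: str = ""                     # ground-truth answer

def normalize(text: str) -> str:
    """Lowercase and strip surrounding whitespace and trailing periods
    so that trivially different strings still count as a match."""
    return text.strip().lower().rstrip(".")

def exact_match_accuracy(predictions: list[str], pairs: list[QAPair]) -> float:
    """Fraction of model predictions that exactly match the ground truth
    after normalization (a common VQA scoring choice)."""
    if not pairs:
        return 0.0
    correct = sum(
        normalize(pred) == normalize(pair.answer)
        for pred, pair in zip(predictions, pairs)
    )
    return correct / len(pairs)
```

In practice, `predictions` would come from prompting each VLM with the image and question (plus any prompting technique under test), and per-task accuracies would be aggregated separately for fishes, birds, and butterflies.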

Original language: English
Journal: Advances in Neural Information Processing Systems
Volume: 37
State: Published - 2024
Event: 38th Conference on Neural Information Processing Systems, NeurIPS 2024 - Vancouver, Canada
Duration: Dec 9, 2024 - Dec 15, 2024

Funding

This research is supported by a National Science Foundation (NSF) award for the HDR Imageomics Institute (OAC-2118240). We are thankful for the computational resources provided by the Advanced Research Computing (ARC) Center at Virginia Tech. This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains, and the publisher, by accepting the article for publication, acknowledges that the US government retains, a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (https://www.energy.gov/doe-public-access-plan).
