Can Automated Metadata Extraction Make Scientific Data More Navigable?

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

1 Scopus citation

Abstract

FAIR principles require that scientific data be findable, discoverable, and reusable by users. To enable FAIRness, practitioners of a science repository will often construct a rich, searchable index of metadata derived from the data. Unfortunately, manual metadata annotation methods do not scale to the many data files generated by many projects; instead, automated extraction systems are needed to scalably parse these files (often with nonstandard schemas requiring specialized parsing strategies) and deposit representative metadata into a search index. In this work, we evaluate whether, and to what extent, automatically extracted metadata make research repositories more navigable. We present a two-part user study conducted with scientists at two U.S. national laboratories from projects spanning spectroscopy and battery modeling. We constructed research indexes automatically by using the Xtract metadata extraction system. In the first part of our study, we learned about each user's role and identified key navigation concerns for scientists. We found that participants wished to navigate for purposes of discovery, retrieval, and organization. In the second part, participants completed simulated research data navigation tasks crafted to reflect real-world navigability concerns. We found that, regardless of the interface used, participants consistently solved navigation tasks with high degrees of confidence and correctness, and significantly (1.2× to 50×) faster than via their alternative methods (e.g., manual directory scans or designing a customized navigational tool).

Original language: English
Title of host publication: Proceedings 2023 IEEE 19th International Conference on e-Science, e-Science 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350322231
State: Published - 2023
Event: 19th IEEE International Conference on e-Science, e-Science 2023 - Limassol, Cyprus
Duration: Oct 9 2023 to Oct 14 2023

Publication series

Name: Proceedings 2023 IEEE 19th International Conference on e-Science, e-Science 2023

Conference

Conference: 19th IEEE International Conference on e-Science, e-Science 2023
Country/Territory: Cyprus
City: Limassol
Period: 10/9/23 to 10/14/23

Funding

We gratefully acknowledge the computing resources provided and operated by the Joint Laboratory for System Evaluation (JLSE) and the Argonne Leadership Computing Facility (ALCF) at Argonne National Laboratory. This work was performed under financial award 70NANB19H005 from the U.S. Dept. of Commerce, National Institute of Standards and Technology as part of the Center for Hierarchical Materials Design (CHiMaD), the National Science Foundation under Grant No. 2004894, and the U.S. Dept. of Energy, Office of Science, Advanced Scientific Computing Research (ASCR), under contract DE-AC02-06CH11357. This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of the manuscript, or allow others to do so, for U.S. Government purposes. The DOE will provide public access to these results in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Keywords

  • file storage
  • information extraction
  • metadata
