Abstract
FAIR principles require that scientific data be findable, accessible, interoperable, and reusable by users. To enable FAIRness, practitioners at a science repository will often construct a rich, searchable index of metadata derived from the data. Unfortunately, manual metadata annotation does not scale to the many data files generated by many projects; instead, automated extraction systems are needed to parse these files at scale (often with nonstandard schemas that require specialized parsing strategies) and deposit representative metadata into a search index. In this work, we evaluate whether, and to what extent, automatically extracted metadata make research repositories more navigable. We present a two-part user study conducted with scientists at two U.S. national laboratories from projects spanning spectroscopy and battery modeling. We constructed research indexes automatically by using the Xtract metadata extraction system. In the first part of our study, we learned about each user's role and identified key navigation concerns for scientists. We found that participants wished to navigate for purposes of discovery, retrieval, and organization. In the second part, participants completed simulated research data navigation tasks crafted to reflect real-world navigability concerns. We found that, regardless of the interface used, participants consistently solved navigation tasks with high degrees of confidence and correctness, and significantly faster (1.2× to 50×) than with their alternative methods (e.g., manual directory scans or designing a customized navigational tool).
| Original language | English |
|---|---|
| Title of host publication | Proceedings 2023 IEEE 19th International Conference on e-Science, e-Science 2023 |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| ISBN (Electronic) | 9798350322231 |
| State | Published - 2023 |
| Event | 19th IEEE International Conference on e-Science, e-Science 2023 - Limassol, Cyprus. Duration: Oct 9 2023 → Oct 14 2023 |
Publication series
| Name | Proceedings 2023 IEEE 19th International Conference on e-Science, e-Science 2023 |
|---|---|
Conference
| Conference | 19th IEEE International Conference on e-Science, e-Science 2023 |
|---|---|
| Country/Territory | Cyprus |
| City | Limassol |
| Period | 10/9/23 → 10/14/23 |
Funding
We gratefully acknowledge the computing resources provided and operated by the Joint Laboratory for System Evaluation (JLSE) and the Argonne Leadership Computing Facility (ALCF) at Argonne National Laboratory. This work was performed under financial award 70NANB19H005 from the U.S. Dept. of Commerce, National Institute of Standards and Technology as part of the Center for Hierarchical Materials Design (CHiMaD), the National Science Foundation under Grant No. 2004894, and the U.S. Dept. of Energy, Office of Science, Advanced Scientific Computing Research (ASCR), under contract DE-AC02-06CH11357. This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the U.S. Department of Energy (DOE). The publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of the manuscript, or allow others to do so, for U.S. Government purposes. The DOE will provide public access to these results in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Keywords
- file storage
- information extraction
- metadata