yProv4ML: Effortless provenance tracking for machine learning systems

Research output: Contribution to journal, Article, peer-review

2 Scopus citations

Abstract

The rapid growth of interest in deep learning, and in foundation models (FMs) in particular, has attracted a diverse range of researchers, thanks to the generalization ability of these models. However, the advent of these techniques has also brought to light a lack of transparency and rigor in the way development is pursued. In particular, the inability to determine the number of epochs and other hyperparameters in advance makes it difficult to identify the best model. To address this challenge, machine learning frameworks such as MLflow can automate the collection of this type of information. However, these tools capture data in proprietary formats and pay little attention to lineage. This paper proposes yProv4ML, a framework that captures provenance information generated during machine learning processes in PROV-JSON format, with minimal code modification.
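
To make the PROV-JSON idea concrete, the following is a minimal sketch of what a provenance record for a single training run could look like, built with the Python standard library only. It does not use or reproduce the yProv4ML API; the function name `build_prov_document`, the `ex:` namespace, and all identifiers and values are hypothetical and chosen for illustration. Only the top-level structure (`prefix`, `entity`, `activity`, `agent`, and the `used`, `wasGeneratedBy`, and `wasAssociatedWith` relations) follows the PROV-JSON serialization.

```python
import json


def build_prov_document(run_id, params, metrics):
    """Build a PROV-JSON-style dict for one training run (illustrative only)."""
    run = f"ex:run_{run_id}"      # the training run, modeled as a PROV activity
    model = f"ex:model_{run_id}"  # the trained model, modeled as a PROV entity
    return {
        # Hypothetical namespace for every identifier in this sketch.
        "prefix": {"ex": "http://example.org/"},
        # Hyperparameters are attached to the activity as attributes.
        "activity": {run: {f"ex:{k}": v for k, v in params.items()}},
        "entity": {
            "ex:dataset": {"prov:label": "training dataset"},
            # Final metrics are attached to the model entity.
            model: {"prov:label": "trained model",
                    **{f"ex:{k}": v for k, v in metrics.items()}},
        },
        "agent": {"ex:researcher": {"prov:label": "researcher"}},
        # Lineage: the run used the dataset, generated the model,
        # and was associated with the researcher.
        "used": {"_:u1": {"prov:activity": run, "prov:entity": "ex:dataset"}},
        "wasGeneratedBy": {"_:g1": {"prov:entity": model, "prov:activity": run}},
        "wasAssociatedWith": {"_:a1": {"prov:activity": run,
                                       "prov:agent": "ex:researcher"}},
    }


doc = build_prov_document(
    "001",
    params={"epochs": 10, "learning_rate": 0.001},
    metrics={"val_loss": 0.42},
)
print(json.dumps(doc, indent=2))
```

Because the relations are explicit, a reader of the file can trace which run produced a given model and with which hyperparameters. This lineage is what the abstract says existing tools pay little attention to.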

Original language: English
Article number: 102298
Journal: SoftwareX
Volume: 31
DOIs
State: Published - Sep 2025

Funding

This work was partially funded under the National Recovery and Resilience Plan (NRRP), Mission 4 Component 2 Investment 1.4 - Call for tender No. 1031 of 17/06/2022 of the Italian Ministry for University and Research, funded by the European Union – NextGenerationEU (proj. nr. CN_00000013), and by the EU InterTwin project (Grant Agreement 101058386). Moreover, this research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Keywords

  • Machine learning
  • PROV-JSON
  • Provenance
  • Provenance graph
  • yProv4ML
