Provenance Tracking in Large-Scale Machine Learning Systems

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

As the demand for large-scale AI models continues to grow, optimizing their training to balance computational efficiency, execution time, accuracy, and energy consumption represents a critical multidimensional challenge. Achieving this balance requires not only innovative algorithmic techniques and hardware architectures but also comprehensive tools for monitoring, analyzing, and understanding the processes involved in model training and deployment. Provenance data, that is, information about the origins, context, and transformations of data and processes, has become a key component in this pursuit. By leveraging provenance, researchers and engineers can gain insights into resource usage patterns, identify inefficiencies, and ensure reproducibility and accountability in AI development workflows. The question of how distributed resources can be optimally utilized to scale large AI models in an energy-efficient manner is therefore a fundamental one. To support this effort, we introduce the yProv4ML library, a tool designed to collect provenance data in JSON format, compliant with the W3C PROV and ProvML standards. yProv4ML focuses on flexibility and extensibility, and enables users to integrate additional data collection tools via plugins. The library is fully integrated with the yProv framework, allowing higher-level pairing with tasks that are also run through workflow management systems.
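
To make the idea of provenance collected as JSON concrete, the following is a minimal sketch of a PROV-JSON-style record for a single training run, built with the Python standard library only. It does not use or reproduce the yProv4ML API. The `ex` namespace, the function name `build_training_provenance`, and the attribute names (`ex:accuracy`, `ex:energy_joules`) are hypothetical and chosen purely for illustration; only the top-level keys (`prefix`, `entity`, `activity`, `used`, `wasGeneratedBy`) follow the PROV-JSON layout.

```python
import json


def build_training_provenance(run_id, dataset, metrics, energy_joules):
    """Return a PROV-JSON-style dict describing one training run.

    All identifiers live in a hypothetical "ex" namespace; this is an
    illustration of the record layout, not the yProv4ML output format.
    """
    activity_id = f"ex:{run_id}"
    dataset_id = f"ex:{dataset}"
    model_id = f"ex:{run_id}-model"

    # Attach each metric to the resulting model entity as an "ex:" attribute.
    model_attrs = {"prov:type": "ex:Model"}
    model_attrs.update({f"ex:{name}": value for name, value in metrics.items()})

    return {
        "prefix": {"ex": "http://example.org/"},
        "entity": {
            dataset_id: {"prov:type": "ex:Dataset"},
            model_id: model_attrs,
        },
        "activity": {
            activity_id: {
                "prov:type": "ex:TrainingRun",
                "ex:energy_joules": energy_joules,
            },
        },
        # The training run consumed the dataset ...
        "used": {
            "_:u1": {"prov:activity": activity_id, "prov:entity": dataset_id},
        },
        # ... and produced the trained model.
        "wasGeneratedBy": {
            "_:g1": {"prov:entity": model_id, "prov:activity": activity_id},
        },
    }


doc = build_training_provenance(
    "run-001", "train-set", {"accuracy": 0.91}, energy_joules=1250.0
)
serialized = json.dumps(doc, indent=2, sort_keys=True)
print(serialized)
```

Recording resource figures such as energy use as attributes of the activity, and quality metrics as attributes of the generated entity, is one plausible way to keep both answerable from the same provenance graph; the paper itself should be consulted for how yProv4ML actually structures these records.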

Original language: English
Title of host publication: 54th International Conference on Parallel Processing, ICPP 2025 - Workshops Proceedings
Publisher: Association for Computing Machinery, Inc
Pages: 167-174
Number of pages: 8
ISBN (Electronic): 9798400721090
State: Published - Dec 20 2025
Event: 54th International Conference on Parallel Processing Workshop, ICPP 2025 - San Diego, United States
Duration: Sep 8 2025 to Sep 11 2025

Publication series

Name: 54th International Conference on Parallel Processing, ICPP 2025 - Workshops Proceedings

Conference

Conference: 54th International Conference on Parallel Processing Workshop, ICPP 2025
Country/Territory: United States
City: San Diego
Period: 09/8/25 to 09/11/25

Funding

This work was partially funded by the EU InterTwin project (Grant Agreement 101058386) and the RI-SCALE project (Grant Agreement 101188168). Moreover, this work was funded under the National Recovery and Resilience Plan (NRRP), Mission 4 Component 2 Investment 1.4 - Call for tender No. 1031 of 17/06/2022 of the Italian Ministry for University and Research, funded by the European Union - NextGenerationEU (proj. nr. CN-00000013). Furthermore, this research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Keywords

  • Machine Learning
  • PROV-JSON
  • Provenance
  • RO-Crate
  • yProv4ML
