Abstract
The scarcity of publicly available operational data from High Performance Computing (HPC) systems hinders research in critical areas like system dependability and resource optimization. While recent efforts, such as the Atlas project, have increased the availability of cluster traces, these datasets often lack fine-grained operational metrics linked to comprehensive job-level attributes across diverse environments. To address this gap, we introduce Fresco, a dataset containing data from 20.9 million jobs spanning 75 months collected from three major academic supercomputing clusters: Purdue’s Anvil and Conte systems, and Texas Advanced Computing Center’s Stampede. Fresco uniquely captures six key performance metrics alongside many job-level attributes such as resource allocations and execution outcomes. We detail our data integration process that transforms and standardizes the heterogeneous data sources into a consistent format. The resulting dataset enables researchers to investigate the relationships between job characteristics, resource consumption patterns, and system performance in academic HPC environments. We make this resource open source at https://www.frescodata.xyz. Our expectation is that this public release will facilitate research and operational improvements that had previously been impossible due to the unavailability of such data.
| Original language | English |
|---|---|
| Title of host publication | PEARC 2025 - Practice and Experience in Advanced Research Computing 2025 |
| Subtitle of host publication | The Power of Collaboration |
| Publisher | Association for Computing Machinery, Inc |
| ISBN (Electronic) | 9798400713989 |
| State | Published - Jul 18 2025 |
| Externally published | Yes |
| Event | 2025 Practice and Experience in Advanced Research Computing, PEARC 2025 - Columbus, United States. Duration: Jul 20 2025 → Jul 24 2025 |
Publication series
| Name | PEARC 2025 - Practice and Experience in Advanced Research Computing 2025: The Power of Collaboration |
|---|---|
Conference
| Conference | 2025 Practice and Experience in Advanced Research Computing, PEARC 2025 |
|---|---|
| Country/Territory | United States |
| City | Columbus |
| Period | 07/20/25 → 07/24/25 |
Funding
This work was ably supported by Carol Song of Purdue’s Information Technology Department and Stephen Harrell of the Texas Advanced Computing Center (TACC). This material is based in part upon work supported by the National Science Foundation under Grant Numbers CNS-2016704 and CCF-2140139. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the sponsor.
Keywords
- Computer system dependability
- Computer system usage
- Data repository