Abstract
High throughput computing (HTC) has aided the scientific community in the analysis of vast amounts of data and computational jobs in distributed environments. To manage these large workloads, several systems have been developed to efficiently allocate and provide access to distributed resources. Many of these systems rely on estimates of job characteristics (e.g., job runtime) to characterize workload behavior, yet such estimates are hard to obtain in practice. In this work, we perform an exploratory analysis of the CMS experiment workload using the statistical recursive partitioning method and conditional inference trees to identify patterns that characterize particular behaviors of the workload. We then propose an estimation process to predict job characteristics based on the collected data. Experimental results show that our process estimates job runtime with 75% accuracy on average, and produces nearly optimal predictions for disk and memory consumption.
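The estimation approach described in the abstract is based on recursive partitioning with conditional inference trees. As a rough illustration only, the sketch below fits a plain regression tree (scikit-learn's CART regressor, used here as a stand-in for conditional inference trees) to predict job runtime from per-job features; the feature names and values are invented for illustration and are not taken from the CMS dataset or the authors' implementation.

```python
# Illustrative sketch (not the authors' code): predict job runtime from
# job features with a regression tree. The paper uses conditional
# inference trees; scikit-learn's CART tree is a stand-in here.
# All feature names and values below are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

# Hypothetical per-job records: task type, input size, event count, runtime.
jobs = pd.DataFrame({
    "task_type": ["analysis", "production", "analysis", "merge"] * 50,
    "input_mb":  [120, 900, 300, 40] * 50,
    "n_events":  [1000, 50000, 8000, 500] * 50,
    "runtime_s": [350, 7200, 1100, 90] * 50,
})
X = pd.get_dummies(jobs.drop(columns="runtime_s"))  # one-hot encode categorical features
y = jobs["runtime_s"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
tree = DecisionTreeRegressor(max_depth=4, random_state=0)  # shallow, interpretable tree
tree.fit(X_train, y_train)

pred = tree.predict(X_test)
print("MAE (s):", mean_absolute_error(y_test, pred))
```

The shallow tree keeps the splits interpretable, mirroring how recursive partitioning exposes which job attributes drive runtime; a conditional inference tree would additionally use significance tests to select splits.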
Original language | English |
---|---|
Pages (from-to) | 39-48 |
Number of pages | 10 |
Journal | Procedia Computer Science |
Volume | 51 |
Issue number | 1 |
DOIs | |
State | Published - 2015 |
Externally published | Yes |
Event | International Conference on Computational Science, ICCS 2015 - Reykjavík, Iceland Duration: Jun 1 2015 → Jun 3 2015 |
Funding
This work was funded by DOE under contract number ER26110, “dV/dt - Accelerating the Rate of Progress Towards Extreme Scale Collaborative Science”. We also thank Jeff Dost and Greg Thain for their valuable help, and the Open Science Grid (OSG).
Funders | Funder number |
---|---|
U.S. Department of Energy | ER26110 |
Keywords
- High throughput computing
- Job characteristics estimation
- Workload characterization