TY - GEN
T1 - Exploring the Frontiers of Energy Efficiency using Power Management at System Scale
AU - Karimi, Ahmad Maroof
AU - Maiterth, Matthias
AU - Shin, Woong
AU - Sattar, Naw Safrin
AU - Lu, Hao
AU - Wang, Feiyi
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - In the face of surging power demands for exascale HPC systems, this work tackles the critical challenge of understanding the impact of software-driven power management techniques like Dynamic Voltage and Frequency Scaling (DVFS) and Power Capping. These techniques have been actively developed over the past few decades. By combining insights from GPU benchmarking to understand application power profiles, we present a telemetry data-driven approach for deriving energy savings projections. This approach has been demonstrably applied to the Frontier supercomputer at scale. Our findings based on three months of telemetry data indicate that, for certain resource-constrained jobs, significant energy savings (up to 8.5%) can be achieved without compromising performance. This translates to a substantial cost reduction, equivalent to 1438 MWh of energy saved. The key contribution of this work lies in the methodology for establishing an upper limit for these best-case scenarios and its successful application. This work enables HPC professionals to optimize the power-performance trade-off within constrained power budgets, not only for the exascale era but also beyond.
KW - Energy Projection
KW - HPC Energy Efficiency
KW - HPC Job Power Consumption
UR - http://www.scopus.com/inward/record.url?scp=85217181651&partnerID=8YFLogxK
U2 - 10.1109/SCW63240.2024.00230
DO - 10.1109/SCW63240.2024.00230
M3 - Conference contribution
AN - SCOPUS:85217181651
T3 - Proceedings of SC 2024-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis
SP - 1835
EP - 1844
BT - Proceedings of SC 2024-W
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC Workshops 2024
Y2 - 17 November 2024 through 22 November 2024
ER -