TY - GEN
T1 - Performance Impact and Trade-Offs for Tuning Key Architectural Parameters on CPU+GPU Systems
AU - Asifuzzaman, Kazi
AU - Miniskar, Narasinga Rao
AU - Godoy, William
AU - Hernandez Mendoza, Oscar
AU - Vetter, Jeffrey
N1 - Publisher Copyright:
© 2025 Copyright held by the owner/author(s).
PY - 2025/5/13
Y1 - 2025/5/13
N2 - In this work, we performed an initial design space exploration of an accelerated processing unit (APU) - a hybrid CPU+GPU architecture that integrates both compute units (CUs) and memory into a unified system. This integration aims to reduce data movement, enhance memory locality, and improve energy efficiency by enabling the CPU and GPU to share memory directly. This effort focused on the interplay of key design components - cache line size, the number of CUs, and main memory technology - and analyzed the trade-offs of each configuration. This paper highlights the impact of the various configurations on memory accesses, data reuse, and power utilization. The results provide valuable insights that can be leveraged to optimize APU architectures for high-performance, energy-efficient computing and thus create a balanced architecture. This optimization can be achieved by adopting dynamic cache management, runtime CU scaling, and advanced memory integration, highlighting the potential of APUs to address critical challenges in compute, data movement, and memory power consumption.
KW - General purpose GPU processing
KW - Performance benchmarking
KW - System simulation
UR - http://www.scopus.com/inward/record.url?scp=105007284660&partnerID=8YFLogxK
U2 - 10.1145/3725798.3725805
DO - 10.1145/3725798.3725805
M3 - Conference contribution
AN - SCOPUS:105007284660
T3 - GPGPU 2025 - 17th Workshop on General Purpose Processing Using GPU
SP - 42
EP - 47
BT - GPGPU 2025 - 17th Workshop on General Purpose Processing Using GPU
PB - Association for Computing Machinery, Inc
T2 - 17th Workshop on General Purpose Processing Using GPU, GPGPU 2025
Y2 - 1 March 2025
ER -