TY - GEN
T1 - Designing Algorithms for the EMU Migrating-threads-based Architecture
AU - Belviranli, Mehmet E.
AU - Lee, Seyong
AU - Vetter, Jeffrey S.
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/11/26
Y1 - 2018/11/26
N2 - The decades-old memory bottleneck problem for data-intensive applications is getting worse as the processor core counts continue to increase. Workloads with sparse memory access characteristics only achieve a fraction of a system's total memory bandwidth. EMU architecture provides a radical approach to the issue by migrating the computational threads to the location where the data resides. The system enables access to a large PGAS-type memory for hundreds of nodes via a Cilk-based multi-threaded execution scheme. EMU architecture brings brand new challenges in application design and development. Data distribution and thread creation strategies play a crucial role in achieving optimal performance in the EMU platform. In this work, we identify several design considerations that need to be taken care of while developing applications for the new architecture and we evaluate their performance effects on the EMU-chick hardware. We also present a modified BFS algorithm for the EMU system and give experimental results for its execution on the platform.
AB - The decades-old memory bottleneck problem for data-intensive applications is getting worse as the processor core counts continue to increase. Workloads with sparse memory access characteristics only achieve a fraction of a system's total memory bandwidth. EMU architecture provides a radical approach to the issue by migrating the computational threads to the location where the data resides. The system enables access to a large PGAS-type memory for hundreds of nodes via a Cilk-based multi-threaded execution scheme. EMU architecture brings brand new challenges in application design and development. Data distribution and thread creation strategies play a crucial role in achieving optimal performance in the EMU platform. In this work, we identify several design considerations that need to be taken care of while developing applications for the new architecture and we evaluate their performance effects on the EMU-chick hardware. We also present a modified BFS algorithm for the EMU system and give experimental results for its execution on the platform.
UR - http://www.scopus.com/inward/record.url?scp=85058654827&partnerID=8YFLogxK
U2 - 10.1109/HPEC.2018.8547571
DO - 10.1109/HPEC.2018.8547571
M3 - Conference contribution
AN - SCOPUS:85058654827
T3 - 2018 IEEE High Performance Extreme Computing Conference, HPEC 2018
BT - 2018 IEEE High Performance Extreme Computing Conference, HPEC 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE High Performance Extreme Computing Conference, HPEC 2018
Y2 - 25 September 2018 through 27 September 2018
ER -