TY - GEN
T1 - NoCMsg
T2 - 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2014
AU - Zimmer, Christopher
AU - Mueller, Frank
PY - 2014
Y1 - 2014
N2 - Current processor designs with ever more cores may ensure that theoretical compute performance keeps pace with past increases (following Moore's law), but they also increasingly present a challenge to hardware and software alike. As core counts have increased, the network-on-chip (NoC) topology has evolved from buses through rings and fully connected meshes to 2D meshes. Given that 2D meshes provide the most scalable design to date, the question is which programming paradigm provides the scalability needed to keep performance close to the theoretical peak. This work contributes NoCMsg, a low-level message passing abstraction over NoCs. NoCMsg is specifically designed for large core counts in 2D meshes. Its design ensures deadlock-free messaging for wormhole Manhattan-path routing over the NoC. Experimental results on the Tile Pro hardware platform show that NoCMsg can reduce communication times by up to 86% for single-packet messages and by up to 40% for larger messages compared to other NoC-based messaging approaches. Results further demonstrate the potential of NoC messaging to outperform shared memory abstractions by up to 93% as core counts and inter-process communication increase: we observe that shared memory scales up to about 16 cores, while message passing performs well beyond that threshold on this platform. To the best of our knowledge, this is the first head-on comparison of shared memory and advanced message passing specifically designed for NoCs on an actual hardware platform with larger core counts on a single socket.
AB - Current processor designs with ever more cores may ensure that theoretical compute performance keeps pace with past increases (following Moore's law), but they also increasingly present a challenge to hardware and software alike. As core counts have increased, the network-on-chip (NoC) topology has evolved from buses through rings and fully connected meshes to 2D meshes. Given that 2D meshes provide the most scalable design to date, the question is which programming paradigm provides the scalability needed to keep performance close to the theoretical peak. This work contributes NoCMsg, a low-level message passing abstraction over NoCs. NoCMsg is specifically designed for large core counts in 2D meshes. Its design ensures deadlock-free messaging for wormhole Manhattan-path routing over the NoC. Experimental results on the Tile Pro hardware platform show that NoCMsg can reduce communication times by up to 86% for single-packet messages and by up to 40% for larger messages compared to other NoC-based messaging approaches. Results further demonstrate the potential of NoC messaging to outperform shared memory abstractions by up to 93% as core counts and inter-process communication increase: we observe that shared memory scales up to about 16 cores, while message passing performs well beyond that threshold on this platform. To the best of our knowledge, this is the first head-on comparison of shared memory and advanced message passing specifically designed for NoCs on an actual hardware platform with larger core counts on a single socket.
KW - Message Passing
KW - Multicore Architectures
KW - Shared Memory
UR - http://www.scopus.com/inward/record.url?scp=84904556052&partnerID=8YFLogxK
U2 - 10.1109/CCGrid.2014.19
DO - 10.1109/CCGrid.2014.19
M3 - Conference contribution
AN - SCOPUS:84904556052
SN - 9781479927838
T3 - Proceedings - 14th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2014
SP - 186
EP - 195
BT - Proceedings - 14th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2014
PB - IEEE Computer Society
Y2 - 26 May 2014 through 29 May 2014
ER -