TY - GEN
T1 - Heterogeneous streaming
AU - Newburn, Chris J.
AU - Bansal, Gaurav
AU - Wood, Michael
AU - Crivelli, Luis
AU - Planas, Judit
AU - Duran, Alejandro
AU - Souza, Paulo
AU - Borges, Leonardo
AU - Luszczek, Piotr
AU - Tomov, Stanimire
AU - Dongarra, Jack
AU - Anzt, Hartwig
AU - Gates, Mark
AU - Haidar, Azzam
AU - Jia, Yulu
AU - Kabir, Khairul
AU - Yamazaki, Ichitaro
AU - Labarta, Jesus
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/7/18
Y1 - 2016/7/18
N2 - This paper introduces a new heterogeneous streaming library called hetero Streams (hStreams). We show how a simple FIFO streaming model can be applied to heterogeneous systems that include manycore coprocessors and multicore CPUs. This model supports concurrency across nodes, among tasks within a node, and between data transfers and computation. We give examples for different approaches, show how the implementation can be layered, analyze overheads among layers, and apply those models to parallelize applications using simple, intuitive interfaces. We compare the features and versatility of hStreams, OpenMP, CUDA Streams and OmpSs. We show how the use of hStreams makes it easier for scientists to identify tasks and easily expose concurrency among them, and how it enables tuning experts and runtime systems to tailor execution for different heterogeneous targets. Practical application examples are taken from the field of numerical linear algebra, commercial structural simulation software, and a seismic processing application.
AB - This paper introduces a new heterogeneous streaming library called hetero Streams (hStreams). We show how a simple FIFO streaming model can be applied to heterogeneous systems that include manycore coprocessors and multicore CPUs. This model supports concurrency across nodes, among tasks within a node, and between data transfers and computation. We give examples for different approaches, show how the implementation can be layered, analyze overheads among layers, and apply those models to parallelize applications using simple, intuitive interfaces. We compare the features and versatility of hStreams, OpenMP, CUDA Streams and OmpSs. We show how the use of hStreams makes it easier for scientists to identify tasks and easily expose concurrency among them, and how it enables tuning experts and runtime systems to tailor execution for different heterogeneous targets. Practical application examples are taken from the field of numerical linear algebra, commercial structural simulation software, and a seismic processing application.
KW - Concurrency
KW - Heterogeneous
KW - Offload
KW - Streaming
KW - Task parallelism
UR - http://www.scopus.com/inward/record.url?scp=84991721689&partnerID=8YFLogxK
U2 - 10.1109/IPDPSW.2016.217
DO - 10.1109/IPDPSW.2016.217
M3 - Conference contribution
AN - SCOPUS:84991721689
T3 - Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016
SP - 611
EP - 620
BT - Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 30th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016
Y2 - 23 May 2016 through 27 May 2016
ER -