Abstract
The objective of the PULSAR project was to design a programming model suitable for large-scale machines with complex memory hierarchies, and to deliver a prototype implementation of a runtime system supporting that model. PULSAR tackled the challenge by proposing a programming model based on systolic processing and virtualization. The PULSAR programming model is quite simple, with point-to-point channels as the main communication abstraction. The runtime implementation is very lightweight and fully distributed, and provides multithreading, message-passing, and multi-GPU offload capabilities. Performance evaluation shows good scalability up to one thousand nodes with one thousand GPU accelerators.
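The core idea named in the abstract (virtualized systolic processing, with point-to-point channels as the communication abstraction) can be illustrated with a small sketch. The code below is not the PULSAR API; the names `Channel`, `VirtualProcessor`, and `run` are hypothetical, and it only mimics, inside one process, the style of many virtual data processors exchanging packets over point-to-point channels.

```python
# Minimal sketch (assumed names, NOT the PULSAR API): many virtual data
# processors connected by point-to-point channels, all multiplexed onto a
# single thread by a tiny scheduling loop.
from collections import deque


class Channel:
    """Point-to-point FIFO channel between exactly two virtual processors."""
    def __init__(self):
        self._fifo = deque()

    def push(self, packet):
        self._fifo.append(packet)

    def pop(self):
        return self._fifo.popleft()

    def ready(self):
        return bool(self._fifo)


class VirtualProcessor:
    """One virtual data processor: holds local data (w, x), reads a partial
    sum from its input channel, updates it, and forwards it downstream."""
    def __init__(self, w, x, cin, cout):
        self.w, self.x = w, x
        self.cin, self.cout = cin, cout

    def can_fire(self):
        return self.cin.ready()

    def fire(self):
        acc = self.cin.pop()
        self.cout.push(acc + self.w * self.x)


def run(vdps):
    """Tiny scheduler: sweep over the virtual processors and fire any whose
    input is ready, until no packets remain in flight."""
    progress = True
    while progress:
        progress = False
        for vdp in vdps:
            if vdp.can_fire():
                vdp.fire()
                progress = True


# Build a 1D chain of virtual processors that computes the dot product w . x
# by streaming a partial sum through the point-to-point channels.
w = [1.0, 2.0, 3.0, 4.0]
x = [5.0, 6.0, 7.0, 8.0]
channels = [Channel() for _ in range(len(w) + 1)]
vdps = [VirtualProcessor(w[i], x[i], channels[i], channels[i + 1])
        for i in range(len(w))]

channels[0].push(0.0)      # inject the initial partial sum at the boundary
run(vdps)                  # execute the virtual systolic chain
print(channels[-1].pop())  # 70.0 == 1*5 + 2*6 + 3*7 + 4*8
```

In the actual runtime described by the paper, the same channel abstraction spans threads, nodes, and GPU accelerators; the single-process sketch above does not attempt to model that distribution.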
| Original language | English |
|---|---|
| Pages (from-to) | 4-26 |
| Number of pages | 23 |
| Journal | Supercomputing Frontiers and Innovations |
| Volume | 4 |
| Issue number | 1 |
| DOIs | |
| State | Published - 2017 |
| Externally published | Yes |
Funding
This work has been supported by the National Science Foundation under grant SHF-1117062, Parallel Unified Linear algebra with Systolic ARrays (PULSAR). The authors would also like to thank the National Institute for Computational Sciences, the Georgia Institute of Technology, and the Oak Ridge National Laboratory for generous computer allocations on their supercomputers. Yves Robert has been supported by the Institut Universitaire de France. This paper is distributed under the terms of the Creative Commons Attribution-NonCommercial 3.0 License, which permits non-commercial use, reproduction, and distribution of the work without further permission provided the original work is properly cited.
| Funders | Funder number |
|---|---|
| National Institute for Computational Sciences | |
| National Science Foundation | SHF-1117062 |
| Oak Ridge National Laboratory | |
| Georgia Institute of Technology | |
| Institut Universitaire de France | |
Keywords
- Dataflow scheduling
- Distributed computing
- Hardware accelerators
- Massively parallel computing
- Multicore processors
- Runtime scheduling
- Systolic arrays
- Virtualization