Abstract
This paper investigates the performance of different multigroup Monte Carlo transport algorithms on GPUs, covering both history-based and event-based approaches. Several algorithmic improvements are introduced for both approaches. A modified version of the history-based algorithm traditionally favored in CPU-based MC codes, which occasionally filters out dead particles to reduce thread divergence, outperforms both the pure history-based and the event-based approaches. The impacts of several algorithmic choices are discussed, including performance studies on Kepler and Pascal generation NVIDIA GPUs for fixed source and eigenvalue calculations. Single-device performance equivalent to 20–40 CPU cores on the K40 GPU and 60–80 CPU cores on the P100 GPU is achieved. In addition, nearly perfect multi-device parallel weak scaling is demonstrated on more than 16,000 nodes of the Titan supercomputer.
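The idea of periodically filtering dead particles out of a history-based loop can be illustrated with a small CPU-side toy model. This is only a sketch under stated assumptions, not the paper's GPU implementation: the function `run_histories`, its parameters, and the "survive with fixed probability" collision model are all hypothetical, and each list slot stands in for one GPU thread. The count of idle slots is a rough proxy for the work lost to thread divergence.

```python
import random


def run_histories(n_particles, survive_prob, steps_per_filter, seed=0):
    """Toy history-based transport with periodic dead-particle filtering.

    Hypothetical model: each collision leaves a particle alive with
    probability `survive_prob`; otherwise it is absorbed. Every
    `steps_per_filter` steps the dead particles are removed (compacted).
    Returns (collisions, idle_slots), where idle_slots counts steps spent
    by slots that still hold a dead particle.
    """
    rng = random.Random(seed)
    alive = [True] * n_particles  # one slot per "thread" in this toy picture
    collisions = 0
    idle_slots = 0
    while alive:
        for _ in range(steps_per_filter):
            for i in range(len(alive)):
                if alive[i]:
                    collisions += 1
                    if rng.random() >= survive_prob:
                        alive[i] = False  # absorbed
                else:
                    idle_slots += 1  # dead particle still occupying a slot
            if not any(alive):
                break
        # Filter step: keep only live particles, preserving their order.
        alive = [a for a in alive if a]
    return collisions, idle_slots


if __name__ == "__main__":
    for k in (10**9, 5, 1):  # never filter, filter every 5 steps, every step
        print(k, run_histories(1000, 0.9, k))
```

In this toy model the total number of collisions is the same for every filtering interval, while the number of idle slots shrinks as filtering becomes more frequent. On a real GPU the filter itself has a cost, which this sketch does not attempt to model, so filtering only occasionally is a trade-off rather than a free improvement.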
Original language | English |
---|---|
Pages (from-to) | 506-518 |
Number of pages | 13 |
Journal | Annals of Nuclear Energy |
Volume | 113 |
State | Published - Mar 2018 |
Bibliographical note
Publisher Copyright: © 2017 Elsevier Ltd
Keywords
- GPU
- Monte Carlo
- Radiation transport