TY - GEN
T1 - An Empirical Analysis of Parallel Random Permutation Algorithms on SMPs
AU - Cong, Guojing
AU - Bader, David A.
N1 - Publisher Copyright:
© 2005 18th ISCA International Conference on Parallel and Distributed Computing Systems 2005, PDCS 2005. All rights reserved.
PY - 2005
Y1 - 2005
N2 - We compare parallel algorithms for random permutation generation on symmetric multiprocessors (SMPs). Algorithms considered are the sorting-based algorithm, Anderson’s shuffling algorithm, the dart-throwing algorithm, and Sanders’ algorithm. We investigate the impact of synchronization method, memory access pattern, cost of generating random numbers and other parameters on the performance of the algorithms. Within the range of inputs used and processors employed, Anderson’s algorithm is preferable due to its simplicity when random number generation is relatively costly, while Sanders’ algorithm has superior performance due to good cache performance when a fast random number generator is available. There is no definite winner across all settings. In fact we predict our new dart-throwing algorithm performs best when synchronization among processors becomes costly and memory access is relatively fast. We also compare the performance of our parallel implementations with the sequential implementation. It is unclear without extensive experimental studies whether fast parallel algorithms beat efficient sequential algorithms due to mismatch between model and architecture. Our implementations achieve speedups up to 6 with 12 processors on the Sun E4500.
AB - We compare parallel algorithms for random permutation generation on symmetric multiprocessors (SMPs). Algorithms considered are the sorting-based algorithm, Anderson’s shuffling algorithm, the dart-throwing algorithm, and Sanders’ algorithm. We investigate the impact of synchronization method, memory access pattern, cost of generating random numbers and other parameters on the performance of the algorithms. Within the range of inputs used and processors employed, Anderson’s algorithm is preferable due to its simplicity when random number generation is relatively costly, while Sanders’ algorithm has superior performance due to good cache performance when a fast random number generator is available. There is no definite winner across all settings. In fact we predict our new dart-throwing algorithm performs best when synchronization among processors becomes costly and memory access is relatively fast. We also compare the performance of our parallel implementations with the sequential implementation. It is unclear without extensive experimental studies whether fast parallel algorithms beat efficient sequential algorithms due to mismatch between model and architecture. Our implementations achieve speedups up to 6 with 12 processors on the Sun E4500.
KW - High-Performance Algorithm Engineering
KW - Parallel Algorithms
KW - Random Permutation
KW - Shared Memory
UR - http://www.scopus.com/inward/record.url?scp=80053042068&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:80053042068
T3 - 18th ISCA International Conference on Parallel and Distributed Computing Systems 2005, PDCS 2005
SP - 27
EP - 34
BT - 18th ISCA International Conference on Parallel and Distributed Computing Systems 2005, PDCS 2005
PB - International Society for Computers and Their Applications (ISCA)
T2 - 18th International Conference on Parallel and Distributed Computing Systems, PDCS 2005
Y2 - 12 September 2005 through 14 September 2005
ER -