TY - GEN
T1 - On the architectural requirements for efficient execution of graph algorithms
AU - Bader, David A.
AU - Cong, Guojing
AU - Feo, John
PY - 2005
Y1 - 2005
N2 - Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to non-contiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical memory systems of symmetric multiprocessor (SMP) clusters optimize for local, contiguous memory accesses, and so are inefficient platforms for such algorithms. Few parallel graph algorithms outperform their best sequential implementation on SMP clusters due to long memory latencies and high synchronization costs. In this paper, we consider the performance and scalability of two graph algorithms, list ranking and connected components, on two classes of shared-memory computers: symmetric multiprocessors such as the Sun Enterprise servers and multithreaded architectures (MTA) such as the Cray MTA-2. While previous studies have shown that parallel graph algorithms can speed up on SMPs, the systems' reliance on cache microprocessors limits performance. The MTA's latency-tolerant processors and hardware support for fine-grain synchronization make performance a function of parallelism. Since parallel graph algorithms have an abundance of parallelism, they perform and scale significantly better on the MTA. We describe and give a performance model for each architecture. We analyze the performance of the two algorithms and discuss how the features of each architecture affect algorithm development, ease of programming, performance, and scalability.
AB - Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to non-contiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical memory systems of symmetric multiprocessor (SMP) clusters optimize for local, contiguous memory accesses, and so are inefficient platforms for such algorithms. Few parallel graph algorithms outperform their best sequential implementation on SMP clusters due to long memory latencies and high synchronization costs. In this paper, we consider the performance and scalability of two graph algorithms, list ranking and connected components, on two classes of shared-memory computers: symmetric multiprocessors such as the Sun Enterprise servers and multithreaded architectures (MTA) such as the Cray MTA-2. While previous studies have shown that parallel graph algorithms can speed up on SMPs, the systems' reliance on cache microprocessors limits performance. The MTA's latency-tolerant processors and hardware support for fine-grain synchronization make performance a function of parallelism. Since parallel graph algorithms have an abundance of parallelism, they perform and scale significantly better on the MTA. We describe and give a performance model for each architecture. We analyze the performance of the two algorithms and discuss how the features of each architecture affect algorithm development, ease of programming, performance, and scalability.
KW - Connected Components
KW - Graph Algorithms
KW - List ranking
KW - Multithreading
KW - Shared memory
UR - http://www.scopus.com/inward/record.url?scp=33745125067&partnerID=8YFLogxK
U2 - 10.1109/ICPP.2005.55
DO - 10.1109/ICPP.2005.55
M3 - Conference contribution
AN - SCOPUS:33745125067
SN - 0769523803
SN - 9780769523804
T3 - Proceedings of the International Conference on Parallel Processing
SP - 547
EP - 556
BT - Proceedings - 2005 International Conference on Parallel Processing
T2 - 2005 International Conference on Parallel Processing
Y2 - 14 June 2005 through 17 June 2005
ER -