TY - GEN
T1 - On-line automated performance diagnosis on thousands of processes
AU - Roth, Philip C.
AU - Miller, Barton P.
PY - 2006
Y1 - 2006
N2 - Performance analysis tools are critical for the effective use of large parallel computing resources, but existing tools have failed to address three problems that limit their scalability: (1) management and processing of the volume of performance data generated when monitoring a large number of application processes, (2) communication between a large number of tool components, and (3) presentation of performance data and analysis results for applications with a large number of processes. In this paper, we present a novel approach for finding performance problems in applications with a large number of processes that leverages our multicast and data aggregation infrastructure to address these three performance tool scalability barriers. First, we show how to design a scalable, distributed performance diagnosis facility. We demonstrate this design with an on-line, automated strategy for finding performance bottlenecks. Our strategy uses distributed, independent bottleneck search agents located in the tool agent processes that monitor running application processes. Second, we present a technique for constructing compact displays of the results of our bottleneck detection strategy. This technique, called the Sub-Graph Folding Algorithm, presents bottleneck search results using dynamic graphs that record the refinement of a bottleneck search. The complexity of the results graph is controlled by combining sub-graphs showing similar local application behavior into a composite sub-graph. Using an approach that combines these two synergistic parts, we performed bottleneck searches on programs with up to 1024 processes with no sign of tool resource saturation. With 1024 application processes, our visualization technique reduced a search results graph containing over 30,000 nodes to a single composite 44-node sub-graph showing the same qualitative performance information as the original graph.
AB - Performance analysis tools are critical for the effective use of large parallel computing resources, but existing tools have failed to address three problems that limit their scalability: (1) management and processing of the volume of performance data generated when monitoring a large number of application processes, (2) communication between a large number of tool components, and (3) presentation of performance data and analysis results for applications with a large number of processes. In this paper, we present a novel approach for finding performance problems in applications with a large number of processes that leverages our multicast and data aggregation infrastructure to address these three performance tool scalability barriers. First, we show how to design a scalable, distributed performance diagnosis facility. We demonstrate this design with an on-line, automated strategy for finding performance bottlenecks. Our strategy uses distributed, independent bottleneck search agents located in the tool agent processes that monitor running application processes. Second, we present a technique for constructing compact displays of the results of our bottleneck detection strategy. This technique, called the Sub-Graph Folding Algorithm, presents bottleneck search results using dynamic graphs that record the refinement of a bottleneck search. The complexity of the results graph is controlled by combining sub-graphs showing similar local application behavior into a composite sub-graph. Using an approach that combines these two synergistic parts, we performed bottleneck searches on programs with up to 1024 processes with no sign of tool resource saturation. With 1024 application processes, our visualization technique reduced a search results graph containing over 30,000 nodes to a single composite 44-node sub-graph showing the same qualitative performance information as the original graph.
KW - Automation
KW - Paradyn
KW - Performance diagnosis
KW - Scalability
KW - Tools
UR - http://www.scopus.com/inward/record.url?scp=33750231377&partnerID=8YFLogxK
U2 - 10.1145/1122971.1122984
DO - 10.1145/1122971.1122984
M3 - Conference contribution
AN - SCOPUS:33750231377
SN - 1595931899
SN - 9781595931894
T3 - Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP
SP - 69
EP - 80
BT - Proceedings of the 2006 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP'06
PB - Association for Computing Machinery (ACM)
T2 - 2006 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP'06
Y2 - 29 March 2006 through 31 March 2006
ER -