TY - JOUR
T1 - A methodology for developing high fidelity communication models for large-scale applications targeted on multicore systems
AU - Lively, Charles W.
AU - Taylor, Valerie E.
AU - Alam, Sadaf R.
AU - Vetter, Jeffrey S.
PY - 2008
Y1 - 2008
N2 - Resource sharing and the implementation of the software stack for emerging multicore processors introduce performance and scaling challenges for large-scale scientific applications, particularly on systems with thousands of processing elements. Traditional performance optimization, tuning, and modeling techniques that rely on a uniform representation of computation and communication requirements are only partially useful because of the complexity of the applications and of the underlying system and software architectures. In this paper, we propose a workload modeling methodology that allows application developers to capture and represent the hierarchical decomposition and distribution of their applications, thereby allowing them to explore and identify an optimal mapping of a workload onto a target system. We demonstrate the proposed methodology on a Teraflops-scale fusion application developed using the message-passing (MPI) programming paradigm. Using our analysis and projection results, we obtain insight into the performance characteristics of the application on a quad-core system and identify an optimal mapping on a Teraflops-scale platform.
AB - Resource sharing and the implementation of the software stack for emerging multicore processors introduce performance and scaling challenges for large-scale scientific applications, particularly on systems with thousands of processing elements. Traditional performance optimization, tuning, and modeling techniques that rely on a uniform representation of computation and communication requirements are only partially useful because of the complexity of the applications and of the underlying system and software architectures. In this paper, we propose a workload modeling methodology that allows application developers to capture and represent the hierarchical decomposition and distribution of their applications, thereby allowing them to explore and identify an optimal mapping of a workload onto a target system. We demonstrate the proposed methodology on a Teraflops-scale fusion application developed using the message-passing (MPI) programming paradigm. Using our analysis and projection results, we obtain insight into the performance characteristics of the application on a quad-core system and identify an optimal mapping on a Teraflops-scale platform.
UR - http://www.scopus.com/inward/record.url?scp=58049165089&partnerID=8YFLogxK
U2 - 10.1109/SBAC-PAD.2008.27
DO - 10.1109/SBAC-PAD.2008.27
M3 - Conference article
AN - SCOPUS:58049165089
SN - 1550-6533
SP - 55
EP - 62
JO - Proceedings - Symposium on Computer Architecture and High Performance Computing
JF - Proceedings - Symposium on Computer Architecture and High Performance Computing
M1 - 4685728
T2 - 20th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2008
Y2 - 29 October 2008 through 1 November 2008
ER -