TY - GEN
T1 - Python for development of OpenMP and CUDA kernels for multidimensional data
AU - Vacaliuc, Bogdan
AU - Patlolla, Dilip R.
AU - D'Azevedo, Ed
AU - Davidson, Greg G.
AU - Munro, John K.
AU - Evans, Thomas M.
AU - Joubert, Wayne
AU - Bell, Zane W.
PY - 2011
Y1 - 2011
N2 - Design of data structures for high performance computing (HPC) is one of the principal challenges facing researchers looking to utilize heterogeneous computing machinery. Heterogeneous systems derive cost, power, and speed efficiency by being composed of the appropriate hardware for the task. Yet each type of processor requires a specific organization of the application state in order to achieve peak performance. Discovering this organization and refactoring the code can be a challenging and time-consuming task for the researcher, as the data structures and the computational model must be co-designed. We present a methodology that uses Python as the environment in which to explore tradeoffs in both the data structure design and the code executing on the computation accelerator. Our method enables multidimensional arrays to be used effectively in any target environment. We have chosen to focus on OpenMP and CUDA environments, thus exploring the development of optimized kernels for the two most common classes of computing hardware available today: multi-core CPU and GPU. Python's large palette of file and network access routines, its associative indexing syntax, and its support for common HPC environments make it relevant for diverse hardware ranging from laptops through computing clusters to the highest performance supercomputers. Our work enables researchers to accelerate the development of their codes on the computing hardware of their choice.
AB - Design of data structures for high performance computing (HPC) is one of the principal challenges facing researchers looking to utilize heterogeneous computing machinery. Heterogeneous systems derive cost, power, and speed efficiency by being composed of the appropriate hardware for the task. Yet each type of processor requires a specific organization of the application state in order to achieve peak performance. Discovering this organization and refactoring the code can be a challenging and time-consuming task for the researcher, as the data structures and the computational model must be co-designed. We present a methodology that uses Python as the environment in which to explore tradeoffs in both the data structure design and the code executing on the computation accelerator. Our method enables multidimensional arrays to be used effectively in any target environment. We have chosen to focus on OpenMP and CUDA environments, thus exploring the development of optimized kernels for the two most common classes of computing hardware available today: multi-core CPU and GPU. Python's large palette of file and network access routines, its associative indexing syntax, and its support for common HPC environments make it relevant for diverse hardware ranging from laptops through computing clusters to the highest performance supercomputers. Our work enables researchers to accelerate the development of their codes on the computing hardware of their choice.
UR - http://www.scopus.com/inward/record.url?scp=80055024300&partnerID=8YFLogxK
U2 - 10.1109/SAAHPC.2011.26
DO - 10.1109/SAAHPC.2011.26
M3 - Conference contribution
AN - SCOPUS:80055024300
SN - 9780769544489
T3 - Proceedings - 2011 Symposium on Application Accelerators in High-Performance Computing, SAAHPC 2011
SP - 159
EP - 167
BT - Proceedings - 2011 Symposium on Application Accelerators in High-Performance Computing, SAAHPC 2011
T2 - 2011 Symposium on Application Accelerators in High-Performance Computing, SAAHPC 2011
Y2 - 19 July 2011 through 20 July 2011
ER -