TY - GEN
T1 - Mixed-tool performance analysis on hybrid multicore architectures
AU - Du, Peng
AU - Luszczek, Piotr
AU - Tomov, Stanimire
AU - Dongarra, Jack
PY - 2010
Y1 - 2010
N2 - This paper proposes a triangular solve algorithm with variable block size for graphics processing units (GPUs). By inverting the diagonal blocks with recursion, the algorithm works with a tunable block size to achieve the best performance. Various methods are shown for using existing profiling tools to measure and analyze the performance of this algorithm. We use some of the most popular CPU and GPU profiling tools for their advantages and overcome their disadvantages with several new techniques for analyzing the performance and relationships of different components of applications. The presented methodologies produce insight that helps to understand and tune the proposed algorithm and considerably improve the performance of the solver itself as well as the application using it.
AB - This paper proposes a triangular solve algorithm with variable block size for graphics processing units (GPUs). By inverting the diagonal blocks with recursion, the algorithm works with a tunable block size to achieve the best performance. Various methods are shown for using existing profiling tools to measure and analyze the performance of this algorithm. We use some of the most popular CPU and GPU profiling tools for their advantages and overcome their disadvantages with several new techniques for analyzing the performance and relationships of different components of applications. The presented methodologies produce insight that helps to understand and tune the proposed algorithm and considerably improve the performance of the solver itself as well as the application using it.
UR - http://www.scopus.com/inward/record.url?scp=78649877182&partnerID=8YFLogxK
U2 - 10.1109/ICPPW.2010.41
DO - 10.1109/ICPPW.2010.41
M3 - Conference contribution
AN - SCOPUS:78649877182
SN - 9780769541570
T3 - Proceedings of the International Conference on Parallel Processing Workshops
SP - 236
EP - 244
BT - Proceedings - 2010 39th International Conference on Parallel Processing Workshops, ICPPW 2010
T2 - 2010 39th International Conference on Parallel Processing Workshops, ICPPW 2010
Y2 - 13 September 2010 through 16 September 2010
ER -