TY - JOUR
T1 - Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures
AU - Haidar, Azzam
AU - Ltaief, Hatem
AU - Yarkhan, Asim
AU - Dongarra, Jack
PY - 2012/3/10
Y1 - 2012/3/10
N2 - The objective of this paper is to analyze the dynamic scheduling of dense linear algebra algorithms on shared-memory, multicore architectures. Current numerical libraries (e.g., linear algebra package) show clear limitations on such emerging systems mainly because of their coarse granularity tasks. Thus, many numerical algorithms need to be redesigned to better fit the architectural design of the multicore platform. The parallel linear algebra for scalable multicore architectures library developed at the University of Tennessee tackles this challenge by using tile algorithms to achieve a finer task granularity. These tile algorithms can then be represented by directed acyclic graphs, where nodes are the tasks and edges are the dependencies between the tasks. The paramount key to achieve high performance is to implement a runtime environment to efficiently schedule the execution of the directed acyclic graph across the multicore platform. This paper studies the impact on the overall performance of some parameters, both at the level of the scheduler (e.g., window size and locality) and the algorithms (e.g., left-looking and right-looking variants). We conclude that some commonly accepted rules for dense linear algebra algorithms may need to be revisited.
AB - The objective of this paper is to analyze the dynamic scheduling of dense linear algebra algorithms on shared-memory, multicore architectures. Current numerical libraries (e.g., linear algebra package) show clear limitations on such emerging systems mainly because of their coarse granularity tasks. Thus, many numerical algorithms need to be redesigned to better fit the architectural design of the multicore platform. The parallel linear algebra for scalable multicore architectures library developed at the University of Tennessee tackles this challenge by using tile algorithms to achieve a finer task granularity. These tile algorithms can then be represented by directed acyclic graphs, where nodes are the tasks and edges are the dependencies between the tasks. The paramount key to achieve high performance is to implement a runtime environment to efficiently schedule the execution of the directed acyclic graph across the multicore platform. This paper studies the impact on the overall performance of some parameters, both at the level of the scheduler (e.g., window size and locality) and the algorithms (e.g., left-looking and right-looking variants). We conclude that some commonly accepted rules for dense linear algebra algorithms may need to be revisited.
KW - dense linear algebra
KW - dynamic scheduling
KW - looking variants
KW - multicore architectures
UR - http://www.scopus.com/inward/record.url?scp=84860412769&partnerID=8YFLogxK
U2 - 10.1002/cpe.1829
DO - 10.1002/cpe.1829
M3 - Article
AN - SCOPUS:84860412769
SN - 1532-0626
VL - 24
SP - 305
EP - 321
JO - Concurrency and Computation: Practice and Experience
JF - Concurrency and Computation: Practice and Experience
IS - 3
ER -