TY - GEN
T1 - Out-of-core wavefront computations with reduced synchronization
AU - Clauss, Pierre Nicolas
AU - Gustedt, Jens
AU - Suter, Frédéric
PY - 2008
Y1 - 2008
N2 - Matrix computation algorithms often exhibit dependencies between neighboring elements inside loop nests such that the frontier between computed elements and those to be computed wanders in the form of a 'wave' through the matrix. Macro-pipelining techniques can achieve an efficient parallelization of such algorithms by overlapping communication and computation. Usually these techniques are limited to situations where all the data to be processed fits into main memory, whereas for larger data the I/O usage pattern for external storage requires special attention. The work [5] presented a first extension of the wavefront framework to these so-called out-of-core problems. The present paper proposes a redesign of their algorithm that minimizes both overhead and perturbations coming from communications. To tackle the issue of non-contiguous I/O, we also propose an optimized data layout. These two major modifications of the original algorithm eventually allow us to present a third improvement as our implementation shortens the transition phase between two consecutive iterations of the wavefront algorithm. Experiments performed with the PARXXL library show that we can significantly reduce the time lost during inefficient I/O operations and thus obtain faster computations.
AB - Matrix computation algorithms often exhibit dependencies between neighboring elements inside loop nests such that the frontier between computed elements and those to be computed wanders in the form of a 'wave' through the matrix. Macro-pipelining techniques can achieve an efficient parallelization of such algorithms by overlapping communication and computation. Usually these techniques are limited to situations where all the data to be processed fits into main memory, whereas for larger data the I/O usage pattern for external storage requires special attention. The work [5] presented a first extension of the wavefront framework to these so-called out-of-core problems. The present paper proposes a redesign of their algorithm that minimizes both overhead and perturbations coming from communications. To tackle the issue of non-contiguous I/O, we also propose an optimized data layout. These two major modifications of the original algorithm eventually allow us to present a third improvement as our implementation shortens the transition phase between two consecutive iterations of the wavefront algorithm. Experiments performed with the PARXXL library show that we can significantly reduce the time lost during inefficient I/O operations and thus obtain faster computations.
UR - http://www.scopus.com/inward/record.url?scp=47349124882&partnerID=8YFLogxK
U2 - 10.1109/PDP.2008.30
DO - 10.1109/PDP.2008.30
M3 - Conference contribution
AN - SCOPUS:47349124882
SN - 0769530893
SN - 9780769530895
T3 - Proceedings of the 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing, PDP 2008
SP - 293
EP - 300
BT - Proceedings of the 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing, PDP 2008
T2 - 16th Euromicro Conference on Parallel, Distributed and Network-Based Processing, PDP 2008
Y2 - 13 February 2008 through 15 February 2008
ER -