TY - GEN
T1 - Cholesky Factorization on Heterogeneous CPU and GPU Systems
AU - Chen, Jieyang
AU - Chen, Zizhong
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/10/30
Y1 - 2015/10/30
AB - General-purpose graphics processing units (GPGPUs) can bring large performance improvements to scientific and numerical computing. We presented two approaches that use hybrid CPU/GPU systems for Cholesky factorization. First, we analyzed the implementation of Cholesky factorization in MAGMA and identified its bottleneck: the use of a fixed block size that does not take any factors of the computing environment into account. We therefore designed an algorithm that determines the optimal block size for Cholesky factorization based on multiple factors (input matrix size, CPU/GPU performance, CPU/GPU bandwidth, etc.). We then presented an improvement to MAGMA's implementation that uses this algorithm. Test results showed that our approach is more efficient than MAGMA's fixed-block-size implementation under some circumstances. Combining our implementation with MAGMA's produced a new hybrid implementation that could outperform the current MAGMA implementation. Second, we observed that, to the best of our knowledge, none of the existing GPU-based implementations of Cholesky factorization fully utilize the multicore CPU. After studying other researchers' approaches, we designed a new algorithm that utilizes the multicore CPU and the GPU simultaneously in Cholesky factorization. Our approach dynamically adjusts the block size and the workload distribution between the CPU and the GPU. Test results identified the optimal data distribution ratio for our current implementation.
KW - Cholesky Factorization
KW - GPU
KW - Multicore
KW - Numerical algorithm
KW - Parallel algorithm
UR - http://www.scopus.com/inward/record.url?scp=84961753583&partnerID=8YFLogxK
U2 - 10.1109/FCST.2015.58
DO - 10.1109/FCST.2015.58
M3 - Conference contribution
AN - SCOPUS:84961753583
T3 - Proceedings - 2015 9th International Conference on Frontier of Computer Science and Technology, FCST 2015
SP - 19
EP - 26
BT - Proceedings - 2015 9th International Conference on Frontier of Computer Science and Technology, FCST 2015
A2 - Jia, Xiaohua
A2 - Zhang, Yong
A2 - Dillon, Tharam
A2 - Kato, Nei
A2 - Zhang, Yunquan
A2 - Li, Kuan Ching
A2 - Wu, Kui
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 9th International Conference on Frontier of Computer Science and Technology, FCST 2015
Y2 - 26 August 2015 through 28 August 2015
ER -