TY - GEN
T1 - Efficient parallelization of batch pattern training algorithm on many-core and cluster architectures
AU - Turchenko, Volodymyr
AU - Bosilca, George
AU - Bouteiller, Aurelien
AU - Dongarra, Jack
PY - 2013
Y1 - 2013
AB - This paper presents an experimental study of the parallel batch pattern back-propagation training algorithm, using a recirculation neural network as an example, on many-core high-performance computing systems. The choice of the recirculation neural network over the multilayer perceptron, recurrent, and radial basis function neural networks is justified. The model of the recirculation neural network and the usual sequential batch pattern algorithm for its training are described theoretically. An algorithmic description of the parallel version of the batch pattern training method is presented. The experiments are carried out using the Open MPI, MVAPICH, and Intel MPI message-passing libraries. The results obtained on a many-core AMD system and on the Intel MIC architecture are compared with those obtained on a cluster system. Our results show that the parallelization efficiency is about 95% on 12 cores located inside one physical AMD processor for the considered minimum and maximum scenarios, and about 70-75% on 48 AMD cores for the same scenarios. These results are 15-36% higher (depending on the MPI library used) than those obtained on 48 cores of a cluster system. The parallelization efficiency obtained on the Intel MIC architecture is surprisingly low, calling for deeper analysis.
KW - many-core system
KW - parallel batch pattern training
KW - parallelization efficiency
KW - recirculation neural network
UR - http://www.scopus.com/inward/record.url?scp=84892656652&partnerID=8YFLogxK
U2 - 10.1109/IDAACS.2013.6663014
DO - 10.1109/IDAACS.2013.6663014
M3 - Conference contribution
AN - SCOPUS:84892656652
SN - 9781479914265
T3 - Proceedings of the 2013 IEEE 7th International Conference on Intelligent Data Acquisition and Advanced Computing Systems, IDAACS 2013
SP - 692
EP - 698
BT - Proceedings of the 2013 IEEE 7th International Conference on Intelligent Data Acquisition and Advanced Computing Systems, IDAACS 2013
T2 - 2013 IEEE 7th International Conference on Intelligent Data Acquisition and Advanced Computing Systems, IDAACS 2013
Y2 - 12 September 2013 through 14 September 2013
ER -