TY - JOUR
T1 - Numerical Analysis for Convergence of a Sample-Wise Backpropagation Method for Training Stochastic Neural Networks
AU - Archibald, Richard
AU - Bao, Feng
AU - Cao, Yanzhao
AU - Sun, Hui
N1 - Publisher Copyright:
© 2024 Society for Industrial and Applied Mathematics. All rights reserved.
PY - 2024
Y1 - 2024
AB - The aim of this paper is to carry out the convergence analysis and algorithm implementation of a novel sample-wise backpropagation method for training a class of stochastic neural networks (SNNs). A preliminary discussion of this SNN framework was first presented in [Archibald et al., Discrete Contin. Dyn. Syst. Ser. S, 15 (2022), pp. 2807-2835]. The structure of the SNN is formulated as a discretization of a stochastic differential equation (SDE). A stochastic optimal control framework is introduced to model the training procedure, and a sample-wise approximation scheme for the adjoint backward SDE is applied to improve the efficiency of the stochastic optimal control solver, which is equivalent to backpropagation for training the SNN. The convergence analysis is derived by introducing a novel joint conditional expectation for the gradient process. Under a convexity assumption, our result indicates that the number of SNN training steps should be proportional to the square of the number of layers. In the implementation of the sample-based SNN algorithm on the benchmark MNIST dataset, we adopt the convolutional neural network (CNN) architecture and demonstrate that our sample-based SNN algorithm is more robust than the conventional CNN.
KW - backward stochastic differential equations
KW - convergence analysis
KW - probabilistic learning
KW - stochastic gradient descent
KW - stochastic neural networks
UR - http://www.scopus.com/inward/record.url?scp=85161345917&partnerID=8YFLogxK
U2 - 10.1137/22m1523765
DO - 10.1137/22m1523765
M3 - Article
AN - SCOPUS:85161345917
SN - 0036-1429
VL - 62
SP - 593
EP - 621
JO - SIAM Journal on Numerical Analysis
JF - SIAM Journal on Numerical Analysis
IS - 2
ER -