Abstract
The aim of this paper is to carry out the convergence analysis and algorithm implementation of a novel sample-wise backpropagation method for training a class of stochastic neural networks (SNNs). A preliminary discussion of this SNN framework was first introduced in [Archibald et al., Discrete Contin. Dyn. Syst. Ser. S, 15 (2022), pp. 2807-2835]. The structure of the SNN is formulated as a discretization of a stochastic differential equation (SDE). A stochastic optimal control framework is introduced to model the training procedure, and a sample-wise approximation scheme for the adjoint backward SDE is applied to improve the efficiency of the stochastic optimal control solver, which is equivalent to backpropagation for training the SNN. The convergence analysis is derived by introducing a novel joint conditional expectation for the gradient process. Under a convexity assumption, our result indicates that the number of SNN training steps should be proportional to the square of the number of layers. In the implementation of the sample-wise SNN algorithm on the benchmark MNIST dataset, we adopt a convolutional neural network (CNN) architecture and demonstrate that our sample-wise SNN algorithm is more robust than the conventional CNN.
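To make the architecture concrete, the following minimal PyTorch sketch (not the authors' implementation) treats each network layer as one Euler-Maruyama step of an SDE, X_{n+1} = X_n + f(X_n; theta_n) dt + sigma sqrt(dt) xi_n, and takes a gradient step through a single realization of the noise, which is the spirit of the sample-wise backpropagation scheme; here autodiff through one noise path stands in for the sample-wise adjoint BSDE solve, and the layer count, dimensions, tanh drift, constant diffusion, and quadratic loss are all illustrative assumptions.

```python
# A minimal sketch of an SDE-discretized network (illustrative, not the
# paper's code): the forward pass applies one Euler-Maruyama step per layer,
# and each optimization step backpropagates through a single noise
# realization, mimicking a sample-wise gradient of the adjoint BSDE.
import torch

N_LAYERS, DIM, DT, SIGMA = 10, 4, 0.1, 0.05  # hypothetical sizes

layers = torch.nn.ModuleList(
    [torch.nn.Linear(DIM, DIM) for _ in range(N_LAYERS)]
)
opt = torch.optim.SGD(layers.parameters(), lr=1e-2)

def forward(x):
    """One stochastic forward pass: Euler-Maruyama through all layers."""
    for layer in layers:
        drift = torch.tanh(layer(x))                  # f(X_n; theta_n)
        noise = torch.randn_like(x)                   # xi_n ~ N(0, I)
        x = x + drift * DT + SIGMA * DT**0.5 * noise  # one EM step
    return x

# One sample-wise training step: a single noise path per gradient.
x0 = torch.randn(32, DIM)       # mini-batch of inputs (placeholder data)
target = torch.zeros(32, DIM)   # placeholder labels
loss = ((forward(x0) - target) ** 2).mean()
opt.zero_grad()
loss.backward()                 # backprop through one noise realization
opt.step()
```

Because the noise is resampled on every forward pass, repeated gradient steps average over realizations of the SDE path, which is one intuition for the robustness reported in the MNIST experiments.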
Original language | English
---|---
Pages (from-to) | 593-621
Number of pages | 29
Journal | SIAM Journal on Numerical Analysis
Volume | 62
Issue number | 2
DOIs | https://doi.org/10.1137/22M1523765
State | Published - 2024
Funding
Received by the editors September 21, 2022; accepted for publication (in revised form) October 18, 2023; published electronically March 1, 2024. https://doi.org/10.1137/22M1523765

Funding: The first author's research was partially supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program through the FASTMath Institute under contract DE-AC02-05CH11231. The second author's research was partially supported by the U.S. National Science Foundation through project DMS-2142672 and by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under grant DE-SC0022297. The third author's research was partially supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program under grant DE-SC0022253.
Keywords
- backward stochastic differential equations
- convergence analysis
- probabilistic learning
- stochastic gradient descent
- stochastic neural networks