NUMERICAL ANALYSIS FOR CONVERGENCE OF A SAMPLE-WISE BACKPROPAGATION METHOD FOR TRAINING STOCHASTIC NEURAL NETWORKS

Richard Archibald, Feng Bao, Yanzhao Cao, Hui Sun

Research output: Contribution to journal › Article › peer-review


Abstract

The aim of this paper is to carry out convergence analysis and algorithm implementation of a novel sample-wise backpropagation method for training a class of stochastic neural networks (SNNs). A preliminary discussion of this SNN framework was first introduced in [Archibald et al., Discrete Contin. Dyn. Syst. Ser. S, 15 (2022), pp. 2807-2835]. The structure of the SNN is formulated as a discretization of a stochastic differential equation (SDE). A stochastic optimal control framework is introduced to model the training procedure, and a sample-wise approximation scheme for the adjoint backward SDE is applied to improve the efficiency of the stochastic optimal control solver, which is equivalent to backpropagation for training the SNN. The convergence analysis is derived by introducing a novel joint conditional expectation for the gradient process. Under the convexity assumption, our result indicates that the number of SNN training steps should be proportional to the square of the number of layers. In the implementation of the sample-based SNN algorithm on the benchmark MNIST dataset, we adopt the convolutional neural network (CNN) architecture and demonstrate that our sample-based SNN algorithm is more robust than the conventional CNN.
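To illustrate the structural idea in the abstract, the forward pass of an SNN formulated as a discretization of an SDE can be sketched as an Euler-Maruyama scheme, where each layer applies a drift step plus scaled Gaussian noise. This is a minimal sketch under assumed simplifications, not the paper's exact formulation: the `tanh` drift, the per-layer parameters, and the function name `snn_forward` are all illustrative choices.

```python
import numpy as np

def snn_forward(x0, weights, biases, sigma, dt, rng):
    """Sketch of an SNN forward pass as an Euler-Maruyama
    discretization of dX_t = f(X_t; theta_t) dt + sigma dW_t,
    with one discretization step per network layer."""
    x = x0
    for W, b in zip(weights, biases):
        drift = np.tanh(W @ x + b)                            # assumed layer drift f(x; theta)
        noise = sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
        x = x + dt * drift + noise                            # Euler-Maruyama update
    return x

# Toy usage: a 3-layer SNN acting on a 4-dimensional state.
rng = np.random.default_rng(0)
d, n_layers = 4, 3
weights = [0.1 * rng.standard_normal((d, d)) for _ in range(n_layers)]
biases = [np.zeros(d) for _ in range(n_layers)]
out = snn_forward(np.ones(d), weights, biases, sigma=0.1, dt=0.25, rng=rng)
```

In this view, training corresponds to a stochastic optimal control problem over the drift parameters, and the paper's sample-wise backpropagation replaces full conditional expectations in the adjoint backward SDE with single-sample approximations.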

Original language: English
Pages (from-to): 593-621
Number of pages: 29
Journal: SIAM Journal on Numerical Analysis
Volume: 62
Issue number: 2
DOIs
State: Published - 2024

Keywords

  • backward stochastic differential equations
  • convergence analysis
  • probabilistic learning
  • stochastic gradient descent
  • stochastic neural networks
