Revisiting the problem of learning long-term dependencies in recurrent neural networks

Liam Johnston, Vivak Patel, Yumian Cui, Prasanna Balaprakash

Research output: Contribution to journal › Article › peer-review

Abstract

Recurrent neural networks (RNNs) are an important class of models for learning sequential behavior. However, training RNNs to learn long-term dependencies is a tremendously difficult task, and this difficulty is widely attributed to the vanishing and exploding gradient (VEG) problem. Since it was first characterized 30 years ago, the belief that if VEG occurs during optimization then RNNs learn long-term dependencies poorly has become a central tenet in the RNN literature and has been steadily cited as motivation for a wide variety of research advancements. In this work, we revisit and interrogate this belief using a large factorial experiment where more than 40,000 RNNs were trained, and provide evidence contradicting this belief. Motivated by these findings, we re-examine the original discussion that analyzed latching behavior in RNNs by way of hyperbolic attractors, and ultimately demonstrate that these dynamics do not fully capture the learned characteristics of RNNs. Our findings suggest that these models are fully capable of learning dynamics that do not correspond to hyperbolic attractors, and that the choice of hyper-parameters, namely learning rate, has a substantial impact on the likelihood of whether an RNN will be able to learn long-term dependencies.
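
As a rough illustration of the vanishing and exploding gradient phenomenon the abstract refers to, the sketch below tracks the norm of the Jacobian of a hidden state with respect to the initial state. It is not the authors' experimental setup: it assumes a purely linear recurrence h_t = W h_{t-1} with W equal to a scaled random orthogonal matrix, so the norm after t steps is simply the scale raised to the power t. The function name `gradient_norms` and all parameter values are hypothetical choices made for this example.

```python
import numpy as np


def gradient_norms(scale, hidden_size=16, steps=50, seed=0):
    """Spectral norm of d h_t / d h_0 for the linear recurrence h_t = W h_{t-1}.

    Assumption for illustration: W = scale * Q with Q orthogonal, so every
    singular value of W equals `scale`.
    """
    rng = np.random.default_rng(seed)
    # QR decomposition of a Gaussian matrix yields an orthogonal factor Q.
    q, _ = np.linalg.qr(rng.standard_normal((hidden_size, hidden_size)))
    w = scale * q

    jac = np.eye(hidden_size)
    norms = []
    for _ in range(steps):
        jac = w @ jac  # accumulates W^t, the Jacobian d h_t / d h_0
        norms.append(np.linalg.norm(jac, 2))
    return np.array(norms)


vanishing = gradient_norms(0.9)  # singular values below 1: norm shrinks
exploding = gradient_norms(1.1)  # singular values above 1: norm grows
print(f"scale 0.9, step 50: {vanishing[-1]:.3e}")
print(f"scale 1.1, step 50: {exploding[-1]:.3e}")
```

With a nonlinear activation such as tanh, each step's Jacobian also depends on the hidden state, so the behavior is less clean than in this linear toy case. The abstract's point is that observing such gradient behavior during optimization does not, by itself, determine whether long-term dependencies are learned.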

Original language: English
Article number: 106887
Journal: Neural Networks
Volume: 183
State: Published - Mar 2025

Funding

Liam Johnston, Yumian Cui, and Vivak Patel are supported by the Wisconsin Alumni Research Foundation. Prasanna Balaprakash is supported by Advanced Scientific Computing Research, U.S. Department of Energy.

Keywords

  • Recurrent neural network
  • Vanishing and exploding gradient problem
