Video action recognition with an additional end-to-end trained temporal stream

Guojing Cong, Giacomo Domeniconi, Chih Chieh Yang, Joshua Shapiro, Barry Chen

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

5 Scopus citations

Abstract

Detecting actions in videos requires understanding the temporal relationships among frames. Typical action recognition approaches rely on optical flow estimation methods to convey temporal information to a CNN. Recent studies employ 3D convolutions in addition to optical flow to process the temporal information. While these models achieve slightly better results than two-stream 2D convolutional approaches, they are significantly more complex, requiring more data and time to be trained. We propose an efficient, adaptive batch size distributed training algorithm with customized optimizations for training the two 2D streams. We introduce a new 2D convolutional temporal stream that is trained end-to-end with a neural network. The flexibility to freeze some network layers from training in this temporal stream brings the possibility of ensemble learning with more than one temporal streams. Our architecture that combines three streams achieves the highest accuracies as we know of on UCF101 and HMDB51 by systems that do not pretrain on much larger datasets (e.g., Kinetics). We achieve these results while keeping our spatial and temporal streams 4.67x faster to train than the 3D convolution approaches.

Original languageEnglish
Title of host publicationProceedings - 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages51-60
Number of pages10
ISBN (Electronic)9781728119755
DOIs
StatePublished - Mar 4 2019
Externally publishedYes
Event19th IEEE Winter Conference on Applications of Computer Vision, WACV 2019 - Waikoloa Village, United States
Duration: Jan 7 2019Jan 11 2019

Publication series

NameProceedings - 2019 IEEE Winter Conference on Applications of Computer Vision, WACV 2019

Conference

Conference19th IEEE Winter Conference on Applications of Computer Vision, WACV 2019
Country/TerritoryUnited States
CityWaikoloa Village
Period01/7/1901/11/19

Funding

Part of this work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344 and was supported by the LLNL-LDRD Program under Project No. 17-SI-003.

FundersFunder number
Lawrence Livermore National Laboratory
U.S. Department of Energy
Lawrence Livermore National LaboratoryDE-AC52-07NA27344, 17-SI-003

    Fingerprint

    Dive into the research topics of 'Video action recognition with an additional end-to-end trained temporal stream'. Together they form a unique fingerprint.

    Cite this