Abstract
We adopt and analyze a synchronous K-step averaging stochastic gradient descent algorithm, which we call K-AVG, for solving large-scale machine learning problems. We establish convergence results for K-AVG with nonconvex objectives. Our analysis of K-AVG applies to many existing variants of synchronous SGD. We explain why the K-step delay is necessary and leads to better performance than traditional parallel stochastic gradient descent, which is equivalent to K-AVG with K = 1. We also show that K-AVG scales better with the number of learners than asynchronous stochastic gradient descent (ASGD). Another advantage of K-AVG over ASGD is that it allows larger stepsizes and facilitates faster convergence. On a cluster of 128 GPUs, K-AVG is faster than ASGD implementations and achieves better accuracies and faster convergence for training with the CIFAR-10 dataset.
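The abstract describes the K-AVG scheme at a high level: each learner runs K local SGD steps in parallel, after which the learners' parameters are averaged synchronously. The following is a minimal, illustrative sketch of that idea on a toy least-squares problem, assuming NumPy; the function names, the toy objective, and all hyperparameters are assumptions for illustration and do not reproduce the paper's exact formulation or stepsize schedule.

```python
# Minimal sketch of synchronous K-step averaging SGD (K-AVG) on a toy
# least-squares problem. With K = 1 this reduces to standard synchronous
# parallel SGD, as noted in the abstract.
import numpy as np

def local_sgd(w, X, y, lr, K, batch_size, rng):
    """Run K local SGD steps on one learner's shard of data (illustrative)."""
    w = w.copy()
    for _ in range(K):
        idx = rng.choice(len(X), size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        grad = Xb.T @ (Xb @ w - yb) / batch_size  # gradient of 0.5 * ||Xw - y||^2
        w -= lr * grad
    return w

def k_avg_sgd(X_shards, y_shards, dim, lr=0.05, K=4, rounds=50, batch_size=32, seed=0):
    """Each learner takes K local steps, then parameters are averaged synchronously."""
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    for _ in range(rounds):
        local_models = [local_sgd(w, Xs, ys, lr, K, batch_size, rng)
                        for Xs, ys in zip(X_shards, y_shards)]
        w = np.mean(local_models, axis=0)  # synchronous averaging across learners
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    w_true = rng.normal(size=10)
    P = 8  # number of learners (hypothetical toy setting)
    X_shards = [rng.normal(size=(500, 10)) for _ in range(P)]
    y_shards = [Xs @ w_true + 0.01 * rng.normal(size=500) for Xs in X_shards]
    w_hat = k_avg_sgd(X_shards, y_shards, dim=10)
    print("parameter error:", np.linalg.norm(w_hat - w_true))
```

In this sketch the only communication is the averaging step once every K local updates, which is the delay the abstract argues is beneficial relative to averaging after every step (K = 1).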
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 27th International Joint Conference on Artificial Intelligence, IJCAI 2018 |
| Editors | Jerome Lang |
| Publisher | International Joint Conferences on Artificial Intelligence |
| Pages | 3219-3227 |
| Number of pages | 9 |
| ISBN (Electronic) | 9780999241127 |
| DOIs | |
| State | Published - 2018 |
| Externally published | Yes |
| Event | 27th International Joint Conference on Artificial Intelligence, IJCAI 2018 - Stockholm, Sweden |
| Duration | Jul 13 2018 → Jul 19 2018 |
Publication series
| Name | IJCAI International Joint Conference on Artificial Intelligence |
|---|---|
| Volume | 2018-July |
| ISSN (Print) | 1045-0823 |
Conference
| Conference | 27th International Joint Conference on Artificial Intelligence, IJCAI 2018 |
|---|---|
| Country/Territory | Sweden |
| City | Stockholm |
| Period | 07/13/18 → 07/19/18 |
Funding
Supported in part by NSF Grants DMS-1509739 and CCF-1523768.