Preprint, Working Paper. Year: 2021

Towards Noise-adaptive, Problem-adaptive Stochastic Gradient Descent

Abstract

We design step-size schemes that make stochastic gradient descent (SGD) adaptive to (i) the noise σ² in the stochastic gradients and (ii) problem-dependent constants. When minimizing smooth, strongly-convex functions with condition number κ, we first prove that T iterations of SGD with Nesterov acceleration and exponentially decreasing step-sizes can achieve a near-optimal Õ(exp(−T/√κ) + σ²/T) convergence rate. Under a relaxed assumption on the noise, with the same step-size scheme and knowledge of the smoothness, we prove that SGD can achieve an Õ(exp(−T/κ) + σ²/T) rate. In order to be adaptive to the smoothness, we use a stochastic line-search (SLS) and show (via upper and lower bounds) that SGD converges at the desired rate, but only to a neighbourhood of the solution. Next, we use SGD with an offline estimate of the smoothness and prove convergence to the minimizer. However, its convergence is slowed down in proportion to the estimation error, and we prove a lower bound justifying this slowdown. Compared to other step-size schemes, we empirically demonstrate the effectiveness of exponential step-sizes coupled with a novel variant of SLS.
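To make the two schemes in the abstract concrete, here is a minimal Python sketch (not the authors' code) of SGD with exponentially decreasing step-sizes, together with a backtracking Armijo-style stochastic line-search of the kind SLS methods build on. The function names, the constants eta0, beta, c, and rho, and the noisy quadratic example are all illustrative assumptions; the exact decay constants in the paper may differ.

```python
import numpy as np

def sgd_exp_step(grad_fn, w0, T, eta0=1.0, beta=1.0, seed=0):
    """SGD with exponentially decreasing step-sizes (a minimal sketch).

    Uses eta_t = eta0 * alpha**t with alpha = (beta / T)**(1 / T), so the
    step-size decays smoothly from eta0 down to roughly eta0 * beta / T.
    grad_fn(w, rng) must return a stochastic gradient at w.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    alpha = (beta / T) ** (1.0 / T)  # per-iteration decay factor
    eta = eta0
    for _ in range(T):
        w = w - eta * grad_fn(w, rng)
        eta *= alpha
    return w

def armijo_sls(loss_fn, grad, w, eta_max=1.0, c=0.5, rho=0.8):
    """Backtracking stochastic line-search on the mini-batch loss:
    shrink eta until loss(w - eta*grad) <= loss(w) - c * eta * ||grad||^2.
    """
    eta = eta_max
    f0, g2 = loss_fn(w), np.dot(grad, grad)
    while loss_fn(w - eta * grad) > f0 - c * eta * g2 and eta > 1e-10:
        eta *= rho
    return eta

# Illustrative usage: f(w) = 0.5 * ||w||^2 with additive gradient noise.
noisy_grad = lambda w, rng: w + 0.1 * rng.standard_normal(w.shape)
w_final = sgd_exp_step(noisy_grad, w0=np.ones(5), T=1000)
```

With this decay, the early iterations take large steps (fast progress on the exp(−T/κ) term) while the final step-size is of order 1/T, which matches the σ²/T noise term; the line-search serves to pick eta_max without knowing the smoothness constant.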
Main file: Parameter_free.pdf (664.28 KB). Origin: files produced by the author(s).

Dates and versions

hal-03456663, version 1 (30-11-2021)

Identifiers

  • HAL Id: hal-03456663, version 1

Cite

Sharan Vaswani, Benjamin Dubois-Taine, Reza Babanezhad. Towards Noise-adaptive, Problem-adaptive Stochastic Gradient Descent. 2021. ⟨hal-03456663⟩