SGDR: Stochastic Gradient Descent with Warm Restarts. Restart techniques are common in gradient-free optimization to deal with multimodal functions. Partial warm restarts are also gaining popularity in gradient-based optimization, where they improve the rate of convergence of accelerated gradient schemes on ill-conditioned functions. In this paper, we propose a simple warm restart technique for stochastic gradient descent to improve its anytime performance when training deep neural networks. We empirically study its performance on the CIFAR-10 and CIFAR-100 datasets, where we demonstrate new state-of-the-art error rates of 3.14% and 16.21%, respectively. We also demonstrate its advantages on a dataset of EEG recordings and on a downsampled version of the ImageNet dataset. Our source code is available at
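The schedule proposed in the paper anneals the learning rate from a maximum eta_max down to a minimum eta_min with a cosine over each run of T_i epochs, then "restarts" at eta_max, with run lengths growing by a factor T_mult. A minimal sketch of that schedule (the function name and the default parameter values are illustrative, not from the paper):

```python
import math

def sgdr_lr(epoch, eta_min=0.0, eta_max=0.1, T_0=10, T_mult=2):
    """Learning rate under SGDR: cosine annealing with warm restarts.

    Within the i-th run of length T_i = T_0 * T_mult**i, the rate follows
    eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * T_cur / T_i)),
    where T_cur counts epochs since the last restart.
    """
    T_i, T_cur = T_0, epoch
    while T_cur >= T_i:      # locate the current run and the offset within it
        T_cur -= T_i
        T_i *= T_mult
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * T_cur / T_i))

# Epoch 0 and each restart epoch start at eta_max; just before a restart
# the rate approaches eta_min.
```

PyTorch ships an implementation of this schedule as `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`.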

References in zbMATH (referenced in 15 articles)


  1. Bakhtin, Anton; Deng, Yuntian; Gross, Sam; Ott, Myle; Ranzato, Marc'Aurelio; Szlam, Arthur: Residual energy-based models for text (2021)
  2. Chatigny, Philippe; Patenaude, Jean-Marc; Wang, Shengrui: Spatiotemporal adaptive neural network for long-term forecasting of financial time series (2021)
  3. Gajek, Sebastian; Schneider, Matti; Böhlke, Thomas: An FE-DMN method for the multiscale analysis of short fiber reinforced plastic components (2021)
  4. Wang, Qingzhong; Zhang, Pengfei; Xiong, Haoyi; Zhao, Jian: Face.evoLVe: A High-Performance Face Recognition Library (2021) arXiv
  5. Eimer, Theresa; Biedenkapp, André; Reimer, Maximilian; Adriaensen, Steven; Hutter, Frank; Lindauer, Marius: DACBench: A Benchmark Library for Dynamic Algorithm Configuration (2021) arXiv
  6. Yeo, Kyongmin; Grullon, Dylan E. C.; Sun, Fan-Keng; Boning, Duane S.; Kalagnanam, Jayant R.: Variational inference formulation for a model-free simulation of a dynamical system with unknown parameters by a recurrent neural network (2021)
  7. Banert, Sebastian; Ringh, Axel; Adler, Jonas; Karlsson, Johan; Öktem, Ozan: Data-driven nonsmooth optimization (2020)
  8. Chen, Yiming; Pan, Tianci; He, Cheng; Cheng, Ran: Efficient evolutionary deep neural architecture search (NAS) by noisy network morphism mutation (2020)
  9. Kang, Dongseok; Ahn, Chang Wook: Efficient neural network space with genetic search (2020)
  10. Sohn, Kihyuk; Berthelot, David; Li, Chun-Liang; Zhang, Zizhao; Carlini, Nicholas; Cubuk, Ekin D.; Kurakin, Alex; Zhang, Han; Raffel, Colin: FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence (2020) arXiv
  11. Mohamed, Shakir; Rosca, Mihaela; Figurnov, Michael; Mnih, Andriy: Monte Carlo gradient estimation in machine learning (2020)
  12. Sun, Ruo-Yu: Optimization for deep learning: an overview (2020)
  13. Tan, Hao; He, Cheng; Tang, Dexuan; Cheng, Ran: Efficient evolutionary neural architecture search (NAS) by modular inheritable crossover (2020)
  14. Zhou, Kaiyang; Xiang, Tao: Torchreid: A Library for Deep Learning Person Re-Identification in Pytorch (2019) arXiv
  15. Hendrycks, Dan; Gimpel, Kevin: Gaussian Error Linear Units (GELUs) (2016) arXiv