Adam: A Method for Stochastic Optimization. We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has low memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited to problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, by which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
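
To make "adaptive estimates of lower-order moments" concrete, here is a minimal sketch of the Adam update as it is commonly stated: exponential moving averages of the gradient and of its square, each bias-corrected, drive a per-parameter step. The names `adam_minimize` and `quadratic_grad`, the toy objective, the step count, and the learning rate are illustrative assumptions that do not come from this page. The defaults for `beta1`, `beta2`, and `eps` are the commonly cited ones.

```python
import math


def adam_minimize(grad, theta, steps=2000, lr=0.01,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Run Adam for a fixed number of steps (illustrative sketch).

    grad(theta) must return a list of partial derivatives, one per parameter.
    """
    theta = list(theta)
    m = [0.0] * len(theta)  # moving average of the gradient (first moment)
    v = [0.0] * len(theta)  # moving average of the squared gradient (second raw moment)
    for t in range(1, steps + 1):
        g = grad(theta)
        for i, g_i in enumerate(g):
            m[i] = beta1 * m[i] + (1 - beta1) * g_i
            v[i] = beta2 * v[i] + (1 - beta2) * g_i * g_i
            # Bias correction compensates for initializing m and v at zero.
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


def quadratic_grad(theta):
    # Gradient of the toy objective f(x, y) = (x - 3)^2 + 10 * (y + 1)^2,
    # whose minimum is at (3, -1). The two coordinates are scaled differently
    # on purpose.
    x, y = theta
    return [2 * (x - 3), 20 * (y + 1)]


result = adam_minimize(quadratic_grad, [0.0, 0.0])
print(result)
```

Because each step divides by the square root of the second-moment estimate, the factor of 10 between the two coordinates largely cancels out of the update. This is one way to read the abstract's claim of invariance to diagonal rescaling of the gradients.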

References in zbMATH (referenced in 48 articles)

Showing results 1 to 20 of 48.
Sorted by year


  1. Li, Lingge; Holbrook, Andrew; Shahbaba, Babak; Baldi, Pierre: Neural network gradient Hamiltonian Monte Carlo (2019)
  2. Powell, Warren B.: A unified framework for stochastic optimization (2019)
  3. Alaa, Ahmed M.; van der Schaar, Mihaela: A hidden absorbing semi-Markov model for informatively censored temporal data: learning and inference (2018)
  4. Zeyer, Albert; Alkhouli, Tamer; Ney, Hermann: RETURNN as a Generic Flexible Neural Toolkit with Application to Translation and Speech Recognition (2018) arXiv
  5. Bausch, Johannes: Classifying data using near-term quantum devices (2018)
  6. Baydin, Atılım Güneş; Pearlmutter, Barak A.; Radul, Alexey Andreyevich; Siskind, Jeffrey Mark: Automatic differentiation in machine learning: a survey (2018)
  7. Bottou, Léon; Curtis, Frank E.; Nocedal, Jorge: Optimization methods for large-scale machine learning (2018)
  8. Le, Canyu; Li, Xin: JigsawNet: Shredded Image Reassembly using Convolutional Neural Network and Loop-based Composition (2018) arXiv
  9. Chan, Shing; Elsheikh, Ahmed H.: A machine learning approach for efficient uncertainty quantification using multiscale methods (2018)
  10. Dai, Bin; Wang, Yu; Aston, John; Hua, Gang; Wipf, David: Connections with robust PCA and the role of emergent sparsity in variational autoencoder models (2018)
  11. de Bruin, Tim; Kober, Jens; Tuyls, Karl; Babuška, Robert: Experience selection in deep reinforcement learning for control (2018)
  12. Diveev, A. I.; Konstantinov, S. V.: Study of the practical convergence of evolutionary algorithms for the optimal program control of a wheeled robot (2018)
  13. Donner, Christian; Opper, Manfred: Efficient Bayesian inference of sigmoidal Gaussian Cox processes (2018)
  14. Dos Santos, Priscila G. M.; Sousa, Rodrigo S.; Araujo, Ismael C. S.; Da Silva, Adenilton J.: Quantum enhanced cross-validation for near-optimal neural networks architecture selection (2018)
  15. E, Weinan; Yu, Bing: The Deep Ritz Method: a deep learning-based numerical algorithm for solving variational problems (2018)
  16. Hubara, Itay; Courbariaux, Matthieu; Soudry, Daniel; El-Yaniv, Ran; Bengio, Yoshua: Quantized neural networks: training neural networks with low precision weights and activations (2018)
  17. Kojima, Ryosuke; Sato, Taisuke: Learning to rank in PRISM (2018)
  18. Lee, Seunghye; Ha, Jingwan; Zokhirova, Mehriniso; Moon, Hyeonjoon; Lee, Jaehong: Background information of deep learning for structural engineering (2018)
  19. Liao, Minghui; Shi, Baoguang; Bai, Xiang: TextBoxes++: a single-shot oriented scene text detector (2018)
  20. Li, Qianxiao; Chen, Long; Tai, Cheng; E, Weinan: Maximum principle based algorithms for deep learning (2018)
