Adam: A Method for Stochastic Optimization. We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has low memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and for problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms that inspired Adam are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
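The update rule summarized in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's reference implementation; the function names (`adam_step`, `adamax_step`) are ours, while the hyper-parameter defaults (step size 0.001, decay rates 0.9 and 0.999, epsilon 1e-8) follow the values suggested in the paper.

```python
import numpy as np

def adam_step(theta, grad, m, v, t,
              alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based timestep.

    Returns the updated (theta, m, v)."""
    m = beta1 * m + (1 - beta1) * grad        # biased first-moment estimate
    v = beta2 * v + (1 - beta2) * grad**2     # biased second-moment estimate
    m_hat = m / (1 - beta1**t)                # bias-corrected first moment
    v_hat = v / (1 - beta2**t)                # bias-corrected second moment
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

def adamax_step(theta, grad, m, u, t,
                alpha=0.002, beta1=0.9, beta2=0.999):
    """One AdaMax update: the second moment is replaced by an
    exponentially weighted infinity norm of past gradients."""
    m = beta1 * m + (1 - beta1) * grad
    u = np.maximum(beta2 * u, np.abs(grad))   # infinity-norm accumulator
    theta = theta - (alpha / (1 - beta1**t)) * m / u
    return theta, m, u
```

A typical driver initializes `m`, `v` (or `u`) to zero vectors and calls the step function once per minibatch gradient; on a simple convex objective such as f(θ) = θ², repeated steps drive θ toward the minimizer.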

References in zbMATH (referenced in 59 articles)

Showing results 1 to 20 of 59.
Sorted by year (citations)


  1. Ioannis T. Christou: Popt4jlib: A Parallel/Distributed Optimization Library for Java (2019) arXiv
  2. Kaiyang Zhou, Tao Xiang: Torchreid: A Library for Deep Learning Person Re-Identification in Pytorch (2019) arXiv
  3. Levy-Jurgenson, Alona; Tekpli, Xavier; Kristensen, Vessela N.; Yakhini, Zohar: Predicting methylation from sequence and gene expression using deep learning with attention (2019)
  4. Li, Lingge; Holbrook, Andrew; Shahbaba, Babak; Baldi, Pierre: Neural network gradient Hamiltonian Monte Carlo (2019)
  5. Powell, Warren B.: A unified framework for stochastic optimization (2019)
  6. Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Jamie Brew: HuggingFace’s Transformers: State-of-the-art Natural Language Processing (2019) arXiv
  7. Xue, Wang; Rujing, Wang; Yuanyuan, Wei; Yuanmiao, Gui: A novel conjoint triad auto covariance (CTAC) coding method for predicting protein-protein interaction based on amino acid sequence (2019)
  8. Yeo, Kyongmin; Melnyk, Igor: Deep learning algorithm for data-driven simulation of noisy dynamical system (2019)
  9. Yunqi Shao, Matti Hellström, Pavlin D. Mitev, Lisanne Knijff, Chao Zhang: PiNN: A Python Library for Building Atomic Neural Networks of Molecules and Materials (2019) arXiv
  10. Alaa, Ahmed M.; van der Schaar, Mihaela: A hidden absorbing semi-Markov model for informatively censored temporal data: learning and inference (2018)
  11. Albert Zeyer, Tamer Alkhouli, Hermann Ney: RETURNN as a Generic Flexible Neural Toolkit with Application to Translation and Speech Recognition (2018) arXiv
  12. Bausch, Johannes: Classifying data using near-term quantum devices (2018)
  13. Baydin, Atılım Güneş; Pearlmutter, Barak A.; Radul, Alexey Andreyevich; Siskind, Jeffrey Mark: Automatic differentiation in machine learning: a survey (2018)
  14. Bottou, Léon; Curtis, Frank E.; Nocedal, Jorge: Optimization methods for large-scale machine learning (2018)
  15. Canyu Le; Xin Li: JigsawNet: Shredded Image Reassembly using Convolutional Neural Network and Loop-based Composition (2018) arXiv
  16. Chan, Shing; Elsheikh, Ahmed H.: A machine learning approach for efficient uncertainty quantification using multiscale methods (2018)
  17. Dai, Bin; Wang, Yu; Aston, John; Hua, Gang; Wipf, David: Connections with robust PCA and the role of emergent sparsity in variational autoencoder models (2018)
  18. de Bruin, Tim; Kober, Jens; Tuyls, Karl; Babuška, Robert: Experience selection in deep reinforcement learning for control (2018)
  19. Diveev, A. I.; Konstantinov, S. V.: Study of the practical convergence of evolutionary algorithms for the optimal program control of a wheeled robot (2018)
  20. Donner, Christian; Opper, Manfred: Efficient Bayesian inference of sigmoidal Gaussian Cox processes (2018)
