Adam: A Method for Stochastic Optimization

We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has low memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, by which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
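
The abstract does not spell out the update rule, so the following is only a minimal sketch of the commonly published form of Adam: exponential moving averages of the gradient and of its elementwise square (the "lower-order moments"), each bias-corrected before the step. The function name `adam_minimize`, the toy quadratic objective in the usage note, and the step count are illustrative assumptions that do not come from this page; the default values of `lr`, `beta1`, `beta2`, and `eps` are the ones usually quoted for the method.

```python
import math


def adam_minimize(grad, x0, steps=2000, lr=0.001,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Sketch of Adam on a list of floats (hypothetical helper, not a library API).

    grad: callable mapping the current parameters to a gradient list.
    """
    x = list(x0)
    m = [0.0] * len(x)  # running average of gradients (first moment)
    v = [0.0] * len(x)  # running average of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(x)
        for i in range(len(x)):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] * g[i]
            # Both averages start at zero, so rescale them to remove that bias.
            m_hat = m[i] / (1.0 - beta1 ** t)
            v_hat = v[i] / (1.0 - beta2 ** t)
            # Per-coordinate step: dividing by sqrt(v_hat) normalizes the
            # gradient scale of each coordinate separately.
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective (an assumption for illustration):
# f(x) = (x[0] - 3)^2 + 10 * (x[1] + 1)^2, minimized at (3, -1).
def toy_grad(p):
    return [2.0 * (p[0] - 3.0), 20.0 * (p[1] + 1.0)]


x_min = adam_minimize(toy_grad, [0.0, 0.0], steps=2000, lr=0.05)
```

In this toy problem the two coordinates have gradients that differ in scale by a factor of ten, yet both move at a similar pace early on, because each step is normalized by that coordinate's own second-moment estimate. This is one informal way to read the abstract's claim of invariance to diagonal rescaling of the gradients.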

References in zbMATH (referenced in 59 articles)

Showing results 21 to 40 of 59.
  1. Dos Santos, Priscila G. M.; Sousa, Rodrigo S.; Araujo, Ismael C. S.; Da Silva, Adenilton J.: Quantum enhanced cross-validation for near-optimal neural networks architecture selection (2018)
  2. E, Weinan; Yu, Bing: The Deep Ritz Method: a deep learning-based numerical algorithm for solving variational problems (2018)
  3. Hubara, Itay; Courbariaux, Matthieu; Soudry, Daniel; El-Yaniv, Ran; Bengio, Yoshua: Quantized neural networks: training neural networks with low precision weights and activations (2018)
  4. Kojima, Ryosuke; Sato, Taisuke: Learning to rank in PRISM (2018)
  5. Lee, Seunghye; Ha, Jingwan; Zokhirova, Mehriniso; Moon, Hyeonjoon; Lee, Jaehong: Background information of deep learning for structural engineering (2018)
  6. Liao, Minghui; Shi, Baoguang; Bai, Xiang: TextBoxes++: a single-shot oriented scene text detector (2018)
  7. Lin, Xing; Rivenson, Yair; Yardimci, Nezih T.; Veli, Muhammed; Luo, Yi; Jarrahi, Mona; Ozcan, Aydogan: All-optical machine learning using diffractive deep neural networks (2018)
  8. Li, Qianxiao; Chen, Long; Tai, Cheng; E, Weinan: Maximum principle based algorithms for deep learning (2018)
  9. Li, Wuchen; Montúfar, Guido: Natural gradient via optimal transport (2018)
  10. Nolle, Timo; Luettgen, Stefan; Seeliger, Alexander; Mühlhäuser, Max: Analyzing business process anomalies using autoencoders (2018)
  11. Pan, Shaowu; Duraisamy, Karthik: Data-driven discovery of closure models (2018)
  12. Bhardwaj, Shikhar; Curtin, Ryan R.; Edel, Marcus; Mentekidis, Yannis; Sanderson, Conrad: ensmallen: a flexible C++ library for efficient function optimization (2018) arXiv
  13. Soudry, Daniel; Hoffer, Elad; Nacson, Mor Shpigel; Gunasekar, Suriya; Srebro, Nathan: The implicit bias of gradient descent on separable data (2018)
  14. Ueltzhöffer, Kai: Deep active inference (2018)
  15. Vlachas, Pantelis R.; Byeon, Wonmin; Wan, Zhong Y.; Sapsis, Themistoklis P.; Koumoutsakos, Petros: Data-driven forecasting of high-dimensional chaotic systems with long short-term memory networks (2018)
  16. Wong, Wee Chin; Chee, Ewan; Li, Jiali; Wang, Xiaonan: Recurrent neural network-based model predictive control for continuous pharmaceutical manufacturing (2018)
  17. Xu, Xiangyu; Pan, Jinshan; Zhang, Yu-Jin; Yang, Ming-Hsuan: Motion blur kernel estimation via deep learning (2018)
  18. Ye, Jong Chul; Han, Yoseob; Cha, Eunju: Deep convolutional framelets: a general deep learning framework for inverse problems (2018)
  19. Yoo, JaeJun; Wahab, Abdul; Ye, Jong Chul: A mathematical framework for deep learning in elastic source imaging (2018)
  20. Zhang, Junbo; Zheng, Yu; Qi, Dekang; Li, Ruiyuan; Yi, Xiuwen; Li, Tianrui: Predicting citywide crowd flows using deep spatio-temporal residual networks (2018)