Adam: A Method for Stochastic Optimization. We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has low memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms that inspired Adam are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
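The update rule summarized in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's reference implementation; the default hyper-parameter values (lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8) are the ones suggested in the paper, and the per-element division by sqrt(v_hat) is what makes the step invariant to diagonal rescaling of the gradients.

```python
import numpy as np

def adam_step(theta, grad, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters theta at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad        # biased first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # biased second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction for initialization at zero
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

As a toy usage example, minimizing the quadratic f(x) = x^2 (so grad = 2x) drives the parameter toward zero:

```python
theta, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
for t in range(1, 1001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t, lr=0.01)
# theta is now close to the minimizer 0
```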

References in zbMATH (referenced in 59 articles)

Showing results 41 to 59 of 59.
Sorted by year (citations)
  1. Zhu, Yinhao; Zabaras, Nicholas: Bayesian deep convolutional encoder-decoder networks for surrogate modeling and uncertainty quantification (2018)
  2. Bui, Thang D.; Yan, Josiah; Turner, Richard E.: A unifying framework for Gaussian process pseudo-point approximations using power expectation propagation (2017)
  3. Dery, Lucio Mwinmaarong; Nachman, Benjamin; Rubbo, Francesco; Schwartzman, Ariel: Weakly supervised classification in high energy physics (2017)
  4. E, Weinan; Han, Jiequn; Jentzen, Arnulf: Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations (2017)
  5. Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton, Matt Post: Sockeye: A Toolkit for Neural Machine Translation (2017) arXiv
  6. Han Wang, Linfeng Zhang, Jiequn Han, Weinan E: DeePMD-kit: A deep learning package for many-body potential energy representation and molecular dynamics (2017) arXiv
  7. Hasenclever, Leonard; Webb, Stefan; Lienart, Thibaut; Vollmer, Sebastian; Lakshminarayanan, Balaji; Blundell, Charles; Teh, Yee Whye: Distributed Bayesian learning with stochastic natural gradient expectation propagation and the posterior server (2017)
  8. Komiske, Patrick T.; Metodiev, Eric M.; Schwartz, Matthew D.: Deep learning in color: towards automated quark/gluon jet discrimination (2017)
  9. Lauly, Stanislas; Zheng, Yin; Allauzen, Alexandre; Larochelle, Hugo: Document neural autoregressive distribution estimation (2017)
  10. Loos, Sarah; Irving, Geoffrey; Szegedy, Christian; Kaliszyk, Cezary: Deep network guided proof search (2017)
  11. Mahsereci, Maren; Hennig, Philipp: Probabilistic line searches for stochastic optimization (2017)
  12. Mohammad Emtiyaz Khan, Zuozhu Liu, Voot Tangkaratt, Yarin Gal: Vprop: Variational Inference using RMSprop (2017) arXiv
  13. Ozan Caglayan, Mercedes García-Martínez, Adrien Bardet, Walid Aransa, Fethi Bougares, Loïc Barrault: NMTPY: A Flexible Toolkit for Advanced Neural Machine Translation Systems (2017) arXiv
  14. Rico Sennrich, Orhan Firat, Kyunghyun Cho, Alexandra Birch, Barry Haddow, Julian Hitschler, Marcin Junczys-Dowmunt, Samuel Läubli, Antonio Valerio Miceli Barone, Jozef Mokry, Maria Nădejde: Nematus: a Toolkit for Neural Machine Translation (2017) arXiv
  15. Ochs, Peter; Ranftl, René; Brox, Thomas; Pock, Thomas: Techniques for gradient-based bilevel optimization with non-smooth lower level problems (2016)
  16. Patrick Doetsch, Albert Zeyer, Paul Voigtlaender, Ilya Kulikov, Ralf Schlüter, Hermann Ney: RETURNN: The RWTH Extensible Training framework for Universal Recurrent Neural Networks (2016) arXiv
  17. Seppo Enarvi, Mikko Kurimo: TheanoLM - An Extensible Toolkit for Neural Network Language Modeling (2016) arXiv
  18. Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Lukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, Jeffrey Dean: Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation (2016) arXiv
  19. Diederik P. Kingma, Jimmy Ba: Adam: A Method for Stochastic Optimization (2014) arXiv