ADAGRAD: adaptive gradient algorithm

Adaptive subgradient methods for online learning and stochastic optimization. We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. Metaphorically, the adaptation allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm stems from recent advances in stochastic optimization and online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. We give several efficient algorithms for empirical risk minimization problems with common and important regularization functions and domain constraints. We complement our theoretical analysis with experiments showing that adaptive subgradient methods outperform state-of-the-art, yet non-adaptive, subgradient algorithms.
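The abstract describes the method only in words. The following is a minimal sketch, not the authors' reference code, assuming the commonly used diagonal variant: each coordinate keeps a running sum of its squared subgradients, and its step is scaled by eta / (delta + sqrt(sum)). Coordinates whose gradients are rarely nonzero keep a small sum and therefore a larger step, which is the "needles in haystacks" effect. The optional l1 term is handled by a soft-thresholding proximal step, as one example of the regularized problems the abstract mentions. The names `DiagonalAdaGrad`, `minimize_quadratic`, `eta`, `delta`, and `l1` are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence


class DiagonalAdaGrad:
    """Diagonal AdaGrad with an optional l1 (soft-thresholding) proximal step.

    Hypothetical illustration: class and parameter names are assumptions.
    """

    def __init__(self, dim: int, eta: float = 0.1, delta: float = 1e-8,
                 l1: float = 0.0) -> None:
        self.eta = eta              # base step size
        self.delta = delta          # keeps the denominator strictly positive
        self.l1 = l1                # l1 strength; 0.0 gives plain diagonal AdaGrad
        self.sq_sum = [0.0] * dim   # running sum of squared subgradients per coordinate

    def step(self, x: Sequence[float], g: Sequence[float]) -> List[float]:
        """Return the next iterate given the current point x and subgradient g."""
        new_x = []
        for i, (xi, gi) in enumerate(zip(x, g)):
            self.sq_sum[i] += gi * gi
            h = self.delta + math.sqrt(self.sq_sum[i])   # per-coordinate scale
            z = xi - self.eta * gi / h                   # adaptive gradient step
            # Proximal step for l1: shrink toward zero by eta * l1 / h, clip at 0.
            shrink = self.eta * self.l1 / h
            new_x.append(math.copysign(max(abs(z) - shrink, 0.0), z))
        return new_x


def minimize_quadratic(target: Sequence[float], steps: int = 200,
                       **kwargs: float) -> List[float]:
    """Toy run: minimize sum_i (x_i - target_i)^2 (+ l1 * ||x||_1) from x = 0."""
    opt = DiagonalAdaGrad(len(target), **kwargs)
    x = [0.0] * len(target)
    for _ in range(steps):
        # Gradient of the smooth quadratic part only; l1 is handled in step().
        grad = [2.0 * (xi - ci) for xi, ci in zip(x, target)]
        x = opt.step(x, grad)
    return x


if __name__ == "__main__":
    print(minimize_quadratic([1.0, -2.0], eta=0.5))
    print(minimize_quadratic([1.0, 0.05], steps=500, eta=0.5, l1=0.5))
```

In the second run the l1 step keeps the second coordinate at exactly 0.0, while the first approaches 0.75, the minimizer of (x - 1)^2 + 0.5|x|. This shows how the composite step yields sparse iterates rather than merely small ones.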

References in zbMATH (referenced in 53 articles, 1 standard article)

Showing results 1 to 20 of 53.

  1. Powell, Warren B.: A unified framework for stochastic optimization (2019)
  2. Yang, Shuoguang; Wang, Mengdi; Fang, Ethan X.: Multilevel stochastic gradient methods for nested composition optimization (2019)
  3. Achab, Massil; Bacry, Emmanuel; Gaïffas, Stéphane; Mastromatteo, Iacopo; Muzy, Jean-François: Uncovering causality from multivariate Hawkes integrated cumulants (2018)
  4. Baydin, Atılım Güneş; Pearlmutter, Barak A.; Radul, Alexey Andreyevich; Siskind, Jeffrey Mark: Automatic differentiation in machine learning: a survey (2018)
  5. Bottou, Léon; Curtis, Frank E.; Nocedal, Jorge: Optimization methods for large-scale machine learning (2018)
  6. Chan, Shing; Elsheikh, Ahmed H.: A machine learning approach for efficient uncertainty quantification using multiscale methods (2018)
  7. Chen, R.; Menickelly, M.; Scheinberg, K.: Stochastic optimization using a trust-region method and random models (2018)
  8. Duchi, John C.; Ruan, Feng: Stochastic methods for composite and weakly convex optimization problems (2018)
  9. Hu, Jiang; Milzarek, Andre; Wen, Zaiwen; Yuan, Yaxiang: Adaptive quadratically regularized Newton method for Riemannian optimization (2018)
  10. Lee, Seunghye; Ha, Jingwan; Zokhirova, Mehriniso; Moon, Hyeonjoon; Lee, Jaehong: Background information of deep learning for structural engineering (2018)
  11. Li, Qianxiao; Chen, Long; Tai, Cheng; E, Weinan: Maximum principle based algorithms for deep learning (2018)
  12. Liu, Zhe; Forouzanfar, Fahim: Ensemble clustering for efficient robust optimization of naturally fractured reservoirs (2018)
  13. Orabona, Francesco; Pál, Dávid: Scale-free online learning (2018)
  14. Peng, Xuan; Gao, Xunzhang; Li, Xiang: On better training the infinite restricted Boltzmann machines (2018)
  15. Shamir, Ohad: Distribution-specific hardness of learning neural networks (2018)
  16. Soudry, Daniel; Hoffer, Elad; Nacson, Mor Shpigel; Gunasekar, Suriya; Srebro, Nathan: The implicit bias of gradient descent on separable data (2018)
  17. Tan, Linda S. L.; Nott, David J.: Gaussian variational approximation with sparse precision matrices (2018)
  18. Yang, Jiyan; Chow, Yin-Lam; Ré, Christopher; Mahoney, Michael W.: Weighted SGD for ℓ_p regression with randomized preconditioning (2018)
  19. Agarwal, Naman; Bullins, Brian; Hazan, Elad: Second-order stochastic optimization for machine learning in linear time (2017)
  20. Chen, Ziyan; Huang, Yu; Liang, Yuexian; Wang, Yang; Fu, Xingyu; Fu, Kun: RGloVe: an improved approach of global vectors for distributional entity relation representation (2017)
