• Pegasos

  • Referenced in 103 articles [sw08752]
  • analyze a simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem ... example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require...
• HOGWILD

  • Referenced in 65 articles [sw28396]
  • Lock-Free Approach to Parallelizing Stochastic Gradient Descent. Stochastic Gradient Descent (SGD) is a popular...
• ADADELTA

  • Referenced in 59 articles [sw39429]
  • dimension learning rate method for gradient descent called ADADELTA. The method dynamically adapts over time ... minimal computational overhead beyond vanilla stochastic gradient descent. The method requires no manual tuning...
• SGD-QN

  • Referenced in 28 articles [sw19411]
  • careful quasi-Newton stochastic gradient descent. The SGD-QN algorithm is a stochastic gradient descent ... fast as a first-order stochastic gradient descent but requires fewer iterations to achieve...
• SGDR

  • Referenced in 17 articles [sw30752]
  • SGDR: Stochastic Gradient Descent with Warm Restarts. Restart techniques are common in gradient-free optimization ... gradient-based optimization to improve the rate of convergence in accelerated gradient schemes to deal ... simple warm restart technique for stochastic gradient descent to improve its anytime performance when training...
• CNTK

  • Referenced in 9 articles [sw21056]
  • recurrent networks (RNNs/LSTMs). It implements stochastic gradient descent (SGD, error backpropagation) learning with automatic differentiation...
• LargeVis

  • Referenced in 6 articles [sw34905]
  • effectively optimized through asynchronous stochastic gradient descent with a linear time complexity. The whole procedure...
• BudgetedSVM

  • Referenced in 4 articles [sw10893]
  • rank Linearization SVM, and Budgeted Stochastic Gradient Descent. BudgetedSVM trains models with accuracy comparable...
• ADMM-Softmax

  • Referenced in 4 articles [sw32744]
  • Krylov, a quasi-Newton, and a stochastic gradient descent method...
• gradDescentR

  • Referenced in 1 article [sw38962]
  • partially to reduce the computation load. Stochastic Gradient Descent (SGD), which is an optimization ... based algorithm to minimize stochastic step to average. Momentum Gradient Descent (MGD), which ... gradient-descent-based algorithm that uses mean and variance moments to do adaptive learning. Stochastic Variance ... converging by reducing the gradient. Semi Stochastic Gradient Descent (SSGD), which is a SGD-based...
• DeepTrack

  • Referenced in 2 articles [sw27576]
  • accumulation. Second, we enhance the ordinary Stochastic Gradient Descent approach in CNN training with...
• Jensen

  • Referenced in 1 article [sw26651]
  • algorithms (including Gradient Descent, L-BFGS, Stochastic Gradient Descent, Conjugate Gradient, etc.), and a family...
• libFM

  • Referenced in 1 article [sw29652]
  • implementation for factorization machines that features stochastic gradient descent (SGD) and alternating least squares...
• deepNN

  • Referenced in 1 article [sw38663]
  • perceptron, different activation functions, regularisation strategies, stochastic gradient descent and dropout. Thanks...
• ProPPR

  • Referenced in 1 article [sw32915]
  • learning can be performed using parallel stochastic gradient descent with a supervised personalized PageRank algorithm...
• MLitB

  • Referenced in 1 article [sw30254]
  • deep neural networks with synchronized, distributed stochastic gradient descent. MLitB offers several important opportunities...
• SINE

  • Referenced in 1 article [sw32344]
  • missing information on representation learning. A stochastic gradient descent based online algorithm is derived...
• DSelect-k

  • Referenced in 1 article [sw41672]
  • using first-order methods, such as stochastic gradient descent, and offers explicit control over...
• FIt-SNE

  • Referenced in 2 articles [sw34901]
  • Interpolation-based t-SNE (FIt-SNE). t-Stochastic Neighborhood Embedding (t-SNE) is a highly ... algorithm to approximate the gradient at each iteration of gradient descent. We accelerated this implementation...
• MetaGrad

  • Referenced in 1 article [sw40373]
  • also various types of stochastic and non-stochastic functions without any curvature. We prove this ... adapts automatically to the size of the gradients. Its main feature is that it simultaneously ... which they consistently outperform both online gradient descent and AdaGrad...