HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. Using novel theoretical analysis, algorithms, and implementation, this work shows that SGD can be implemented without any locking. We present an update scheme called HOGWILD!, which allows processors to access shared memory with the possibility of overwriting each other's work. We show that when the associated optimization problem is sparse, meaning most gradient updates modify only small parts of the decision variable, HOGWILD! achieves a nearly optimal rate of convergence. We demonstrate experimentally that HOGWILD! outperforms alternative schemes that use locking by an order of magnitude.
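The update scheme the abstract describes can be illustrated with a minimal sketch: several threads run SGD on a shared parameter vector with no locks, and each example touches only a few coordinates (the sparsity condition the convergence result relies on). The toy least-squares problem, variable names, and step size below are all illustrative assumptions, not taken from the paper; note also that CPython's GIL largely serializes these updates, so this demonstrates the scheme rather than its parallel speedup.

```python
import threading
import numpy as np

# Hypothetical sparse least-squares problem: each example touches only
# k of the d coordinates of w, so concurrent gradient updates rarely
# collide -- the sparsity setting HOGWILD! assumes.
rng = np.random.default_rng(0)
d, n, k = 50, 2000, 3
idx = np.array([rng.choice(d, size=k, replace=False) for _ in range(n)])
vals = rng.normal(size=(n, k))
w_true = rng.normal(size=d)
y = np.array([vals[i] @ w_true[idx[i]] for i in range(n)])

w = np.zeros(d)  # shared state, read and written without any locking

def worker(examples, lr=0.05, epochs=30):
    for _ in range(epochs):
        for i in examples:
            # Gradient of 0.5*(x_i @ w - y_i)^2 is nonzero only on idx[i].
            err = vals[i] @ w[idx[i]] - y[i]
            w[idx[i]] -= lr * err * vals[i]  # unsynchronized sparse write

# Four threads, each sweeping a disjoint slice of the data, all
# hammering the same w concurrently ("with the possibility of
# overwriting each other's work").
threads = [threading.Thread(target=worker, args=(range(t, n, 4),))
           for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_loss = np.mean([(vals[i] @ w[idx[i]] - y[i]) ** 2 for i in range(n)])
```

Because collisions between threads are rare under sparsity, the occasional lost or stale update perturbs the trajectory only slightly, and the shared iterate still drives the training loss down.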

References in zbMATH (referenced in 39 articles, 1 standard article)

Showing results 1 to 20 of 39.
Sorted by year (citations)


  1. Mishchenko, Konstantin; Iutzeler, Franck; Malick, Jérôme: A distributed flexible delay-tolerant proximal gradient algorithm (2020)
  2. Richtárik, Peter; Takáč, Martin: Stochastic reformulations of linear systems: algorithms and convergence theory (2020)
  3. Devarakonda, Aditya; Fountoulakis, Kimon; Demmel, James; Mahoney, Michael W.: Avoiding communication in primal and dual block coordinate descent methods (2019)
  4. Gao, Bin; Liu, Xin; Yuan, Ya-Xiang: Parallelizable algorithms for optimization problems with orthogonality constraints (2019)
  5. Karakus, Can; Sun, Yifan; Diggavi, Suhas; Yin, Wotao: Redundancy techniques for straggler mitigation in distributed optimization and learning (2019)
  6. Peng, Zhimin; Xu, Yangyang; Yan, Ming; Yin, Wotao: On the convergence of asynchronous parallel iteration with unbounded delays (2019)
  7. Xiao, Lin; Yu, Adams Wei; Lin, Qihang; Chen, Weizhu: DSCOVR: randomized primal-dual block coordinate algorithms for asynchronous distributed optimization (2019)
  8. Xu, Yangyang: Asynchronous parallel primal-dual block coordinate update methods for affinely constrained convex programs (2019)
  9. Bottou, Léon; Curtis, Frank E.; Nocedal, Jorge: Optimization methods for large-scale machine learning (2018)
  10. Chen, Ke; Li, Qin; Liu, Jian-Guo: Online learning in optical tomography: a stochastic approach (2018)
  11. Dutta, Haimonti; Srinivasan, Ashwin: Consensus-based modeling using distributed feature construction with ILP (2018)
  12. Hannah, Robert; Yin, Wotao: On unbounded delays in asynchronous parallel fixed-point algorithms (2018)
  13. Hook, James; Dingle, Nicholas: Performance analysis of asynchronous parallel Jacobi (2018)
  14. Jain, Prateek; Kakade, Sham M.; Kidambi, Rahul; Netrapalli, Praneeth; Sidford, Aaron: Parallelizing stochastic gradient descent for least squares regression: mini-batching, averaging, and model misspecification (2018)
  15. Leblond, Rémi; Pedregosa, Fabian; Lacoste-Julien, Simon: Improved asynchronous parallel optimization analysis for stochastic incremental methods (2018)
  16. Patrascu, Andrei; Necoara, Ion: Nonasymptotic convergence of stochastic proximal point methods for constrained convex optimization (2018)
  17. Bhardwaj, Shikhar; Curtin, Ryan R.; Edel, Marcus; Mentekidis, Yannis; Sanderson, Conrad: ensmallen: a flexible C++ library for efficient function optimization (2018) arXiv
  18. Smith, Virginia; Forte, Simone; Ma, Chenxin; Takáč, Martin; Jordan, Michael I.; Jaggi, Martin: CoCoA: a general framework for communication-efficient distributed optimization (2018)
  19. Vanli, N. D.; Gürbüzbalaban, Mert; Ozdaglar, A.: Global convergence rate of proximal incremental aggregated gradient methods (2018)
  20. Yang, Jiyan; Chow, Yin-Lam; Ré, Christopher; Mahoney, Michael W.: Weighted SGD for (\ell_p) regression with randomized preconditioning (2018)
