HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work shows, through novel theoretical analysis, algorithms, and implementation, that SGD can be implemented without any locking. We present an update scheme called HOGWILD!, which allows processors to access shared memory with the possibility of overwriting each other's work. We show that when the associated optimization problem is sparse, meaning most gradient updates modify only small parts of the decision variable, HOGWILD! achieves a nearly optimal rate of convergence. We demonstrate experimentally that HOGWILD! outperforms alternative schemes that use locking by an order of magnitude.
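The update scheme described in the abstract can be sketched as follows: several workers read and write a shared parameter vector with no locks, and each stochastic gradient step touches only the few coordinates involved in one sparse sample. This is a minimal illustrative sketch, not the paper's implementation; the synthetic least-squares problem, the hyperparameters, and all variable names are assumptions, and in CPython the interpreter lock serializes bytecode, so this demonstrates the lock-free semantics rather than a real parallel speedup.

```python
import threading
import numpy as np

rng = np.random.default_rng(0)
d, n, nnz = 50, 2000, 3          # dimension, samples, nonzeros per sample

# Sparse synthetic least-squares data: each sample touches only `nnz`
# distinct coordinates of the decision variable.
idx = np.array([rng.choice(d, size=nnz, replace=False) for _ in range(n)])
vals = rng.normal(size=(n, nnz))
w_true = rng.normal(size=d)
y = np.array([vals[i] @ w_true[idx[i]] for i in range(n)])

w = np.zeros(d)                  # shared decision variable, no lock around it

def worker(samples, lr=0.05, epochs=30):
    """Run plain SGD over `samples`, writing into shared `w` without locks."""
    for _ in range(epochs):
        for i in samples:
            j, x = idx[i], vals[i]
            grad = (x @ w[j] - y[i]) * x   # gradient touches only nnz coords
            w[j] -= lr * grad              # unsynchronized sparse write

# Four workers update the same vector concurrently; overwrites are allowed.
threads = [threading.Thread(target=worker, args=(range(t, n, 4),))
           for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

mse = np.mean((np.array([vals[i] @ w[idx[i]] for i in range(n)]) - y) ** 2)
print(f"final training MSE: {mse:.4f}")
```

Because each update writes only `nnz` of the `d` coordinates, two concurrent steps rarely collide, which is the sparsity condition under which the paper proves near-optimal convergence despite the races.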

References in zbMATH (referenced in 65 articles, 1 standard article)

Showing results 1 to 20 of 65, sorted by year (citations).


  1. Cenedese, Carlo; Belgioioso, Giuseppe; Grammatico, Sergio; Cao, Ming: An asynchronous distributed and scalable generalized Nash equilibrium seeking algorithm for strongly monotone games (2021)
  2. Hendrikx, Hadrien; Bach, Francis; Massoulié, Laurent: An optimal algorithm for decentralized finite-sum optimization (2021)
  3. Ma, Chenxin; Jaggi, Martin; Curtis, Frank E.; Srebro, Nathan; Takáč, Martin: An accelerated communication-efficient primal-dual optimization framework for structured machine learning (2021)
  4. Moorman, Jacob D.; Tu, Thomas K.; Molitor, Denali; Needell, Deanna: Randomized Kaczmarz with averaging (2021)
  5. Ramezani-Kebrya, Ali; Faghri, Fartash; Markov, Ilya; Aksenov, Vitalii; Alistarh, Dan; Roy, Daniel M.: NUQSGD: provably communication-efficient data-parallel SGD via nonuniform quantization (2021)
  6. Suda, Martin: Vampire with a brain is a good ITP hammer (2021)
  7. Vorontsova, E. A.; Gasnikov, A. V.; Dvurechensky, P. E.; Ivanova, A. S.; Pasechnyuk, D. A.: Numerical methods for the resource allocation problem in a computer network (2021)
  8. Xiao, Danyang; Mei, Yuan; Kuang, Di; Chen, Mengqiang; Guo, Binbin; Wu, Weigang: EGC: entropy-based gradient compression for distributed deep learning (2021)
  9. Zhao, Xing; Papagelis, Manos; An, Aijun; Chen, Bao Xin; Liu, Junfeng; Hu, Yonggang: Zipline: an optimized algorithm for the elastic bulk synchronous parallel model (2021)
  10. Boffi, Nicholas M.; Slotine, Jean-Jacques E.: A continuous-time analysis of distributed stochastic gradient (2020)
  11. Cannelli, Loris; Facchinei, Francisco; Kungurtsev, Vyacheslav; Scutari, Gesualdo: Asynchronous parallel algorithms for nonconvex optimization (2020)
  12. Erway, Jennifer B.; Griffin, Joshua; Marcia, Roummel F.; Omheni, Riadh: Trust-region algorithms for training responses: machine learning methods using indefinite Hessian approximations (2020)
  13. Feng, Weiming; Sun, Yuxin; Yin, Yitong: What can be sampled locally? (2020)
  14. Hong, Mingyi; Chang, Tsung-Hui; Wang, Xiangfeng; Razaviyayn, Meisam; Ma, Shiqian; Luo, Zhi-Quan: A block successive upper-bound minimization method of multipliers for linearly constrained convex optimization (2020)
  15. Kallus, Nathan; Udell, Madeleine: Dynamic assortment personalization in high dimensions (2020)
  16. Lee, Dongha; Oh, Jinoh; Yu, Hwanjo: OCam: out-of-core coordinate descent algorithm for matrix completion (2020)
  17. Li, Boyue; Cen, Shicong; Chen, Yuxin; Chi, Yuejie: Communication-efficient distributed optimization in networks with gradient tracking and variance reduction (2020)
  18. Li, Huan; Lin, Zhouchen: Revisiting EXTRA for smooth distributed optimization (2020)
  19. Liu, Yangwei; Ding, Hu; Huang, Ziyun; Xu, Jinhui: Distributed and robust support vector machine (2020)
  20. Mishchenko, Konstantin; Iutzeler, Franck; Malick, Jérôme: A distributed flexible delay-tolerant proximal gradient algorithm (2020)
