MLlib

MLlib: machine learning in Apache Spark. Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLlib, Spark’s open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Shipped with Spark, MLlib supports several languages and provides a high-level API that leverages Spark’s rich ecosystem to simplify the development of end-to-end machine learning pipelines. MLlib has grown rapidly thanks to its vibrant open-source community of over 140 contributors, and it includes extensive documentation to support further growth and to let users get up to speed quickly.


References in zbMATH (referenced in 19 articles, 1 standard article)

  1. Dahl, David B.: Integration of R and Scala using rscala (2020) not zbMATH
  2. Salehi, Abbas; Masoumi, Behrooz: KATZ centrality with biogeography-based optimization for influence maximization problem (2020)
  3. Raissi, Maziar; Babaee, Hessam; Karniadakis, George Em: Parametric Gaussian process regression for big data (2019)
  4. Tsamardinos, Ioannis; Borboudakis, Giorgos; Katsogridakis, Pavlos; Pratikakis, Polyvios; Christophides, Vassilis: A greedy feature selection algorithm for big data of high dimensionality (2019)
  5. Xiao, Lin; Yu, Adams Wei; Lin, Qihang; Chen, Weizhu: DSCOVR: randomized primal-dual block coordinate algorithms for asynchronous distributed optimization (2019)
  6. Yu, Hong; Chen, Yun; Lingras, Pawan; Wang, Guoyin: A three-way cluster ensemble approach for large-scale data (2019)
  7. Gudivada, Venkat N.; Arbabifard, Kamyar: Open-source libraries, application frameworks, and workflow systems for NLP (2018)
  8. Kordík, Pavel; Černý, Jan; Frýda, Tomáš: Discovering predictive ensembles for transfer learning and meta-learning (2018)
  9. Pelucchi, Mauro; Psaila, Giuseppe; Toccu, Maurizio: Hadoop vs. Spark: impact on performance of the Hammer query engine for open data corpora (2018)
  10. Smith, Virginia; Forte, Simone; Ma, Chenxin; Takáč, Martin; Jordan, Michael I.; Jaggi, Martin: CoCoA: a general framework for communication-efficient distributed optimization (2018)
  11. Esuli, Andrea; Fagni, Tiziano; Moreo Fernandez, Alejandro: JaTeCS an open-source JAva TExt Categorization System (2017) arXiv
  12. Bacciu, Davide; Carta, Antonio; Gnesi, Stefania; Semini, Laura: An experience in using machine learning for short-term predictions in smart transportation systems (2017)
  13. Ghesmoune, Mohammed; Azzag, Hanene; Benbernou, Salima; Lebbah, Mustapha; Duong, Tarn; Ouziri, Mourad: Big Data: from collection to visualization (2017)
  14. Kanavos, Andreas; Nodarakis, Nikolaos; Sioutas, Spyros; Tsakalidis, Athanasios; Tsolis, Dimitrios; Tzimas, Giannis: Large scale implementations for Twitter sentiment classification (2017)
  15. Masegosa, Andrés R.; Martinez, Ana M.; Langseth, Helge; Nielsen, Thomas D.; Salmerón, Antonio; Ramos-López, Darío; Madsen, Anders L.: Scaling up Bayesian variational inference using distributed computing clusters (2017)
  16. Mikut, Ralf; Bartschat, Andreas; Doneit, Wolfgang; Gonzalez Ordiano, Jorge Angel; Schott, Benjamin; Stegmaier, Johannes; Waczowicz, Simon; Reischl, Markus: The MATLAB Toolbox SciXMiner: User’s Manual and Programmer’s Guide (2017) arXiv
  17. Iwen, M. A.; Ong, B. W.: A distributed and incremental SVD algorithm for agglomerative data analysis on large networks (2016)
  18. Meng, Xiangrui; Bradley, Joseph; Yavuz, Burak; Sparks, Evan; Venkataraman, Shivaram; Liu, Davies; Freeman, Jeremy; Tsai, DB; Amde, Manish; Owen, Sean; Xin, Doris; Xin, Reynold; Franklin, Michael J.; Zadeh, Reza; Zaharia, Matei; Talwalkar, Ameet: MLlib: machine learning in Apache Spark (2016)
  19. Chen, Tianqi; Guestrin, Carlos: XGBoost: a scalable tree boosting system (2016) arXiv