MapReduce is a parallel programming model initially developed for large-scale web content processing. Data analysis faces the challenge of performing computations over extremely large datasets, and MapReduce offers a way to harness commodity hardware for massively parallel data analysis applications. Translating and optimizing relational algebra operators into MapReduce programs remains an open and active research field. In this paper, we focus on a particular type of data analysis query, namely the multiple group-by query. We first study the communication cost of the MapReduce model, then give an initial implementation of the multiple group-by query. We then propose an optimized version that reduces the communication cost. Our optimized version shows better acceleration and better scalability than the initial one.
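To make the communication-cost issue concrete, the following is a minimal sketch of a multiple group-by count in the MapReduce style. It is an illustration only, not the paper's actual implementation: the naive map phase emits one intermediate pair per grouping column for every record, so shuffle volume grows with the number of group-by attributes, and a local pre-aggregation step (a combiner) is one standard way to cut that cost, in the spirit of the optimization the abstract describes. All function and column names here are hypothetical.

```python
from collections import defaultdict

def map_phase(records, group_by_columns):
    # Naive multiple group-by: emit one ((column, value), 1) pair per
    # grouping attribute for every record. Intermediate data volume
    # scales with the number of grouping columns.
    for record in records:
        for col in group_by_columns:
            yield (col, record[col]), 1

def combine_phase(pairs):
    # Local pre-aggregation (a combiner): sums counts on the mapper
    # side before the shuffle, reducing communication cost.
    partial = defaultdict(int)
    for key, count in pairs:
        partial[key] += count
    return partial.items()

def reduce_phase(pairs):
    # Final aggregation over the shuffled (key, partial count) pairs.
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

# Hypothetical toy dataset with two grouping columns.
records = [
    {"region": "east", "product": "a"},
    {"region": "east", "product": "b"},
    {"region": "west", "product": "a"},
]
result = reduce_phase(combine_phase(map_phase(records, ["region", "product"])))
# e.g. result[("region", "east")] == 2 and result[("product", "a")] == 2
```

In a real MapReduce deployment the combiner runs per mapper and the reducer input is partitioned by key; the single-process pipeline above only illustrates the data flow.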

References in zbMATH (referenced in 253 articles, 1 standard article)

Showing results 1 to 20 of 253.
Sorted by year (citations)


  1. Brandt, Sebastian; Fischer, Manuela; Uitto, Jara: Breaking the linear-memory barrier in MPC: fast MIS on trees with strongly sublinear memory (2021)
  2. Dobriban, Edgar; Sheng, Yue: Distributed linear regression by averaging (2021)
  3. Hao, Rong-Xia; Tian, Zengxian: The vertex-pancyclicity of data center networks (2021)
  4. Zhang, Tonglin; Yang, Baijian: Accounting for factor variables in big data regression (2021)
  5. Ahmadi, Saba; Khuller, Samir; Purohit, Manish; Yang, Sheng: On scheduling coflows (2020)
  6. Audrito, Giorgio; Beal, Jacob; Damiani, Ferruccio; Pianini, Danilo; Viroli, Mirko: Field-based coordination with the share operator (2020)
  7. Czumaj, Artur; Łącki, Jakub; Mądry, Aleksander; Mitrović, Slobodan; Onak, Krzysztof; Sankowski, Piotr: Round compression for parallel matching algorithms (2020)
  8. Fotakis, Dimitris; Milis, Ioannis; Papadigenopoulos, Orestis; Vassalos, Vasilis; Zois, Georgios: Scheduling MapReduce jobs on identical and unrelated processors (2020)
  9. Genuzio, Marco; Ottaviano, Giuseppe; Vigna, Sebastiano: Fast scalable construction of ([compressed] static | minimal perfect hash) functions (2020)
  10. Ketsman, Bas; Albarghouthi, Aws; Koutris, Paraschos: Distribution policies for Datalog (2020)
  11. Montealegre, P.; Perez-Salazar, S.; Rapaport, I.; Todinca, I.: Graph reconstruction in the congested clique (2020)
  12. Saikia, Parikshit; Karmakar, Sushanta: Distributed approximation algorithms for Steiner tree in the CONGESTED CLIQUE (2020)
  13. Sambasivan, Rajiv; Das, Sourish; Sahu, Sujit K.: A Bayesian perspective of statistical machine learning for big data (2020)
  14. Tang, Lu; Zhou, Ling; Song, Peter X.-K.: Distributed simultaneous inference in generalized linear models via confidence distribution (2020)
  15. Zhang, Longxin; Zhou, Liqian; Salah, Ahmad: Efficient scientific workflow scheduling for deadline-constrained parallel tasks in cloud computing environments (2020)
  16. Agapito, Giuseppe; Guzzi, Pietro Hiram; Cannataro, Mario: Parallel extraction of association rules from genomics data (2019)
  17. Ali, Syed Muhammad Fawad; Mey, Johannes; Thiele, Maik: Parallelizing user-defined functions in the ETL workflow using orchestration style sheets (2019)
  18. Atar, Rami; Keslassy, Isaac; Mendelson, Gal: Subdiffusive load balancing in time-varying queueing systems (2019)
  19. Aydin, Kevin; Bateni, Mohammadhossein; Mirrokni, Vahab: Distributed balanced partitioning via linear embedding (2019)
  20. Biletskyy, Borys: Distributed Bayesian machine learning procedures (2019)
