A unified sparse matrix data format for efficient general sparse matrix-vector multiplication on modern processors with wide SIMD units.

Sparse matrix-vector multiplication (spMVM) is the most time-consuming kernel in many numerical algorithms and has been studied extensively on all modern processor and accelerator architectures. However, the optimal sparse matrix data storage format is highly hardware-specific, which could become an obstacle when using heterogeneous systems. Also, it is as yet unclear how the wide single instruction multiple data (SIMD) units in current multi- and many-core processors should be used most efficiently if there is no structure in the sparsity pattern of the matrix. We suggest SELL-C-σ, a variant of Sliced ELLPACK, as a SIMD-friendly data format which combines long-standing ideas from general-purpose graphics processing units and vector computer programming. We discuss the advantages of SELL-C-σ compared to established formats like compressed row storage and ELLPACK and show its suitability on a variety of hardware platforms (Intel Sandy Bridge, Intel Xeon Phi, and Nvidia Tesla K20) for a wide range of test matrices from different application areas. Using appropriate performance models, we develop deep insight into the data transfer properties of the SELL-C-σ spMVM kernel. SELL-C-σ comes with two tuning parameters whose performance impact across the range of test matrices is studied and for which reasonable choices are proposed. This leads to a hardware-independent (“catch-all”) sparse matrix format, which achieves very high efficiency for all test matrices across all hardware platforms.
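To make the two tuning parameters concrete, here is a minimal pure-Python sketch of a SELL-C-σ layout and the matching spMVM loop. It assumes the usual reading of the format: C is the chunk height (the number of rows processed together, matched to the SIMD width) and σ is the sorting scope (rows are sorted by descending length only within windows of σ rows). The function names `build_sell_c_sigma` and `spmv`, the list-of-pairs input, and the choice of column 0 for padding entries are illustrative assumptions, not the authors' implementation.

```python
def build_sell_c_sigma(rows, C, sigma):
    """Build a SELL-C-sigma layout (illustrative sketch, hypothetical helper).

    rows  : list of rows, each a list of (column, value) pairs
    C     : chunk height (rows per chunk)
    sigma : sorting scope (rows are sorted only within windows of sigma rows)
    """
    n = len(rows)
    # Sort rows by descending nonzero count inside each window of sigma rows.
    perm = []
    for start in range(0, n, sigma):
        window = list(range(start, min(start + sigma, n)))
        window.sort(key=lambda r: len(rows[r]), reverse=True)
        perm.extend(window)

    chunk_ptr, chunk_len, cols, vals = [0], [], [], []
    for start in range(0, n, C):
        chunk = perm[start:start + C]
        width = max(len(rows[r]) for r in chunk)  # longest row in this chunk
        chunk_len.append(width)
        # Column-major inside the chunk: the j-th entries of all C rows
        # are stored contiguously, which is what a SIMD unit wants to load.
        for j in range(width):
            for lane in range(C):
                if lane < len(chunk) and j < len(rows[chunk[lane]]):
                    c, v = rows[chunk[lane]][j]
                else:
                    c, v = 0, 0.0  # zero padding (column 0 is an assumption)
                cols.append(c)
                vals.append(v)
        chunk_ptr.append(len(vals))
    return chunk_ptr, chunk_len, cols, vals, perm


def spmv(fmt, x, C):
    """Compute y = A @ x from the layout above; y is in original row order."""
    chunk_ptr, chunk_len, cols, vals, perm = fmt
    y = [0.0] * len(perm)
    for k, width in enumerate(chunk_len):
        base = chunk_ptr[k]
        for lane in range(C):  # on real hardware this loop is the SIMD lane
            row = k * C + lane
            if row >= len(perm):
                break
            acc = 0.0
            for j in range(width):
                idx = base + j * C + lane
                acc += vals[idx] * x[cols[idx]]
            y[perm[row]] = acc  # undo the row permutation
    return y


# Small 4x4 example with row lengths 1, 3, 1, 2 (7 nonzeros).
A = [[(0, 1.0)],
     [(0, 2.0), (1, 3.0), (3, 4.0)],
     [(2, 5.0)],
     [(1, 6.0), (3, 7.0)]]
x = [1.0, 2.0, 3.0, 4.0]

sorted_fmt = build_sell_c_sigma(A, C=2, sigma=4)    # sort across all rows
unsorted_fmt = build_sell_c_sigma(A, C=2, sigma=1)  # no sorting
print(spmv(sorted_fmt, x, C=2))
print(len(sorted_fmt[3]), len(unsorted_fmt[3]))  # stored entries incl. padding
```

In this toy case, sorting with σ = 4 stores 8 entries for 7 nonzeros, σ = 1 (no sorting) stores 10, and plain ELLPACK would pad every row to length 3 and store 12. That is the trade-off the two parameters control: a larger σ reduces padding but moves rows further from their original order.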
References in zbMATH (referenced in 11 articles, 1 standard article)
- Alvermann, Andreas; Basermann, Achim; Bungartz, Hans-Joachim; Carbogno, Christian; Ernst, Dominik; Fehske, Holger; Futamura, Yasunori; Galgon, Martin; Hager, Georg; Huber, Sarah; Huckle, Thomas; Ida, Akihiro; Imakura, Akira; Kawai, Masatoshi; Köcher, Simone; Kreutzer, Moritz; Kus, Pavel; Lang, Bruno; Lederer, Hermann; Manin, Valeriy; Marek, Andreas; Nakajima, Kengo; Nemec, Lydia; Reuter, Karsten; Rippl, Michael; Röhrig-Zöllner, Melven; Sakurai, Tetsuya; Scheffler, Matthias; Scheurer, Christoph; Shahzad, Faisal; Simoes Brambila, Danilo; Thies, Jonas; Wellein, Gerhard: Benefits from using mixed precision computations in the ELPA-AEO and ESSEX-II eigensolver projects (2019)
- Krasnopolsky, B. I.: Optimal strategy for modelling turbulent flows with ensemble averaging on high performance computing systems (2018)
- Pikle, Nileshchandra K.; Sathe, Shailesh R.; Vyavhare, Arvind Y.: GPGPU-based parallel computing applied in the FEM using the conjugate gradient algorithm: a review (2018)
- Bauer, S.; Mohr, M.; Rüde, U.; Weismüller, J.; Wittmann, M.; Wohlmuth, B.: A two-scale approach for efficient on-the-fly operator assembly in massively parallel high performance multigrid codes (2017)
- Bernaschi, Massimo; Bisson, Mauro; Fantozzi, Carlo; Janna, Carlo: A factored sparse approximate inverse preconditioned conjugate gradient solver on graphics processing units (2016)
- Gao, Jiaquan; Qi, Panpan; He, Guixia: Efficient CSR-based sparse matrix-vector multiplication on GPU (2016)
- He, Guixia; Gao, Jiaquan: A novel CSR-based sparse matrix-vector multiplication on GPUs (2016)
- Rupp, Karl; Tillet, Philippe; Rudolf, Florian; Weinbub, Josef; Morhammer, Andreas; Grasser, Tibor; Jüngel, Ansgar; Selberherr, Siegfried: ViennaCL - linear algebra library for multi- and many-core architectures (2016)
- Mironowicz, P.; Dziekonski, A.; Mrozowski, M.: A task-scheduling approach for efficient sparse symmetric matrix-vector multiplication on a GPU (2015)
- Röhrig-Zöllner, Melven; Thies, Jonas; Kreutzer, Moritz; Alvermann, Andreas; Pieper, Andreas; Basermann, Achim; Hager, Georg; Wellein, Gerhard; Fehske, Holger: Increasing the performance of the Jacobi-Davidson method by blocking (2015)
- Kreutzer, Moritz; Hager, Georg; Wellein, Gerhard; Fehske, Holger; Bishop, Alan R.: A unified sparse matrix data format for efficient general sparse matrix-vector multiplication on modern processors with wide SIMD units (2014)