SUMMA
In this paper, we give a straightforward, highly efficient, scalable implementation of common matrix multiplication operations. The algorithms are much simpler than previously published methods, yield better performance, and require less workspace. MPI implementations are given, as are performance results on the Intel Paragon system.
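The abstract does not spell out the algorithm itself. As it is commonly described, SUMMA forms C = AB as a sum of panel outer products: at each step the current column panel of A is broadcast along process rows, the current row panel of B is broadcast along process columns, and every process updates its local block of C. The following is a minimal serial sketch of that idea in Python, not the MPI code from the paper. The function name `summa_simulated`, its parameters, and the requirement that the grid divide the matrix dimensions evenly are assumptions made for illustration; the "broadcasts" are plain slices of the global matrices.

```python
def summa_simulated(A, B, grid_rows, grid_cols, panel=1):
    """Serially simulate a SUMMA-style product C = A * B on a logical
    grid_rows x grid_cols process grid (hypothetical helper, no MPI)."""
    m, k, n = len(A), len(B), len(B[0])
    assert all(len(row) == k for row in A), "inner dimensions must match"
    # Assumption of this sketch: the grid divides the dimensions of C evenly.
    assert m % grid_rows == 0 and n % grid_cols == 0
    mb, nb = m // grid_rows, n // grid_cols  # size of each local block of C

    # Logical process (i, j) owns one mb x nb block of C, initially zero.
    C_local = {(i, j): [[0] * nb for _ in range(mb)]
               for i in range(grid_rows) for j in range(grid_cols)}

    # Sweep over the inner dimension one panel at a time.
    for k0 in range(0, k, panel):
        k1 = min(k0 + panel, k)
        width = k1 - k0
        for i in range(grid_rows):
            # Stand-in for the broadcast along grid row i: the rows of the
            # current column panel of A that belong to this grid row.
            a_panel = [A[r][k0:k1] for r in range(i * mb, (i + 1) * mb)]
            for j in range(grid_cols):
                # Stand-in for the broadcast along grid column j: the columns
                # of the current row panel of B that belong to this grid column.
                b_panel = [B[t][j * nb:(j + 1) * nb] for t in range(k0, k1)]
                # Local update: C_ij += a_panel * b_panel.
                block = C_local[(i, j)]
                for r in range(mb):
                    for c in range(nb):
                        block[r][c] += sum(a_panel[r][t] * b_panel[t][c]
                                           for t in range(width))

    # Gather the local blocks back into one matrix for inspection.
    C = [[0] * n for _ in range(m)]
    for (i, j), block in C_local.items():
        for r in range(mb):
            C[i * mb + r][j * nb:(j + 1) * nb] = block[r]
    return C
```

For example, `summa_simulated([[1, 2], [3, 4]], [[5, 6], [7, 8]], 2, 2)` gives the same result as an ordinary matrix product. In a distributed setting each process would hold only its own blocks and the two slices would be replaced by collective broadcasts within a process row and a process column.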
References in zbMATH (referenced in 30 articles; results 1 to 20 shown)
- Gorman, Christopher; Chávez, Gustavo; Ghysels, Pieter; Mary, Théo; Rouet, François-Henry; Li, Xiaoye Sherry: Robust and accurate stopping criteria for adaptive randomized sampling in matrix-free hierarchically semiseparable construction (2019)
- Huang, Jianyu; Matthews, Devin A.; van de Geijn, Robert A.: Strassen’s algorithm for tensor contraction (2018)
- Azad, Ariful; Ballard, Grey; Buluç, Aydin; Demmel, James; Grigori, Laura; Schwartz, Oded; Toledo, Sivan; Williams, Samuel: Exploiting multiple levels of parallelism in sparse matrix-matrix multiplication (2016)
- Ballard, Grey; Siefert, Christopher; Hu, Jonathan: Reducing communication costs for sparse matrix multiplication within algebraic multigrid (2016)
- Beliakov, Gleb; Matiyasevich, Yuri: A parallel algorithm for calculation of determinants and minors using arbitrary precision arithmetic (2016)
- Bock, Nicolas; Challacombe, Matt; Kalé, Laxmikant V.: Solvers for \(\mathcal{O}(N)\) electronic structure in the strong scaling limit (2016)
- Rouet, François-Henry; Li, Xiaoye S.; Ghysels, Pieter; Napov, Artem: A distributed-memory package for dense hierarchically semi-separable matrix computations using randomization (2016)
- Schatz, Martin D.; van de Geijn, Robert A.; Poulson, Jack: Parallel matrix multiplication: a systematic journey (2016)
- Ballard, G.; Carson, E.; Demmel, J.; Hoemmen, M.; Knight, N.; Schwartz, O.: Communication lower bounds and optimal algorithms for numerical linear algebra (2014)
- Bock, Nicolas; Challacombe, Matt: An optimized sparse approximate matrix multiply for matrices with decay (2013)
- Buluç, Aydin; Gilbert, John R.: Parallel sparse matrix-matrix multiplication and indexing: implementation and experiments (2012)
- Lu, Qingda; Gao, Xiaoyang; Krishnamoorthy, Sriram; Baumgartner, Gerald; Ramanujam, J.; Sadayappan, P.: Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions (2012)
- Buluç, Aydın; Gilbert, John: New ideas in sparse matrix matrix multiplication (2011)
- Nishtala, Rajesh; Zheng, Yili; Hargrove, Paul H.; Yelick, Katherine A.: Tuning collective communication for partitioned global address space programming models (2011)
- Auckenthaler, T.; Bader, M.; Huckle, T.; Spörl, A.; Waldherr, K.: Matrix exponentials and parallel prefix computation in a quantum control problem (2010)
- Gunnels, John; Lee, Jon; Margulies, Susan: Efficient high-precision matrix algebra on parallel architectures for nonlinear combinatorial optimization (2010)
- Kalinov, Alexey: Measuring the scalability of heterogeneous parallel systems (2006)
- Kalinov, A. Ya.: Scalability of heterogeneous parallel systems (2006)
- El-Qawasmeh, Eyas; Al-Ayyoub, Abdel-Elah; Abu-Ghazaleh, Nayef: Quick matrix multiplication on clusters of workstations (2004)
- Hunold, S.; Rauber, T.; Rünger, G.: Hierarchical matrix-matrix multiplication based on multiprocessor tasks (2004)