Algorithm 784: GEMM-based level 3 BLAS: portability and optimization issues. This companion article discusses portability and optimization issues of the GEMM-based level 3 BLAS model implementations and the performance evaluation benchmark. All software comes in all four data types (single- and double-precision, real and complex) and is designed to be easy to implement and use on different platforms. Each of the GEMM-based routines has a few machine-dependent parameters that specify internal block sizes, cache characteristics, and branch points for alternative code sections. These parameters provide a means of tuning the routines to the characteristics of a given memory hierarchy.
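As a rough illustration of how a block-size parameter enters a GEMM-based routine, the following minimal sketch computes the lower triangle of C := alpha*A*A^T + beta*C (a SYRK-style update), handling small diagonal blocks directly and delegating the rectangular blocks below them to a general matrix-multiply helper. It is not the Algorithm 784 code, which is written in Fortran 77; the names `BLOCK_SIZE`, `gemm_nt`, and `syrk_lower` are hypothetical and chosen only for this example.

```python
# Hypothetical tuning parameter: on a real machine it would be chosen to
# match the cache size, here it only controls how the work is partitioned.
BLOCK_SIZE = 2


def gemm_nt(alpha, A, B, beta, C, rows, cols):
    """Update C[i][j] := alpha * (row i of A) . (row j of B) + beta * C[i][j]
    for every i in rows and j in cols, i.e. a block of alpha*A*B^T + beta*C."""
    for i in rows:
        for j in cols:
            acc = sum(A[i][p] * B[j][p] for p in range(len(A[i])))
            C[i][j] = alpha * acc + beta * C[i][j]


def syrk_lower(alpha, A, beta, C, block_size=BLOCK_SIZE):
    """Lower triangle of C := alpha*A*A^T + beta*C, one column block at a time."""
    n = len(A)
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        # Diagonal block: update only the entries on or below the diagonal.
        for i in range(start, stop):
            for j in range(start, i + 1):
                acc = sum(A[i][p] * A[j][p] for p in range(len(A[i])))
                C[i][j] = alpha * acc + beta * C[i][j]
        # Everything below the diagonal block is a full rectangle, so it can
        # be handed to the general matrix-multiply helper.
        if stop < n:
            gemm_nt(alpha, A, A, beta, C, range(stop, n), range(start, stop))
    return C


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    C = [[0.0] * 3 for _ in range(3)]
    for row in syrk_lower(1.0, A, 0.0, C):
        print(row)
```

Changing `block_size` alters only how the work is split between the diagonal-block loop and the GEMM helper, not the result. That separation is what lets such a parameter be adjusted per machine without touching the algorithm.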
References in zbMATH (referenced in 4 articles)
- D’Alberto, Paolo; Bodrato, Marco; Nicolau, Alexandru: Exploiting parallelism in matrix-computation kernels for symmetric multiprocessor systems: matrix-multiplication and matrix-addition algorithm optimizations by software pipelining and threads allocation (2011)
- D’Alberto, Paolo; Nicolau, Alexandru: Adaptive Winograd’s matrix multiplications (2009)
- Kågström, Bo; Ling, Per; Van Loan, Charles: Algorithm 784: GEMM-based level 3 BLAS: Portability and optimization issues (1998)
- Kågström, Bo; Van Loan, Charles: Algorithm 784: GEMM-based level 3 BLAS: Portability and optimization issues (1998)