CholeskyQR2

CholeskyQR2: a simple and communication-avoiding algorithm for computing a tall-skinny QR factorization on a large-scale parallel system. Designing communication-avoiding algorithms is crucial for high performance computing on a large-scale parallel system. The TSQR algorithm is a communication-avoiding algorithm for computing a tall-skinny QR factorization, and TSQR is known to be much faster and as stable as the classical Householder QR algorithm. The Cholesky QR algorithm is another very simple and fast communication-avoiding algorithm, but rarely used in practice because of its numerical instability. Our recent work points out that an algorithm that simply repeats Cholesky QR twice, which we call CholeskyQR2, gives excellent accuracy for a wide range of matrices arising in practice. Although the communication cost of CholeskyQR2 is twice that of TSQR, it has an advantage that its reduction operation is addition whereas that of TSQR is a QR factorization, whose high-performance implementation is more difficult. Thus, CholeskyQR2 can potentially be significantly faster than TSQR. Indeed, in our experiments using 16384 nodes of the K computer, CholeskyQR2 ran about three times faster than TSQR for a 4194304 × 64 matrix.

This software is also peer reviewed by journal TOMS.


References in zbMATH (referenced in 10 articles )

Showing results 1 to 10 of 10.
Sorted by year (citations)

  1. Carson, Erin; Lund, Kathryn; Rozložník, Miroslav; Thomas, Stephen: Block Gram-Schmidt algorithms and their stability properties (2022)
  2. Liao, Zeyu; Hayami, Ken; Morikuni, Keiichi; Yin, Jun-Feng: A stabilized GMRES method for singular and severely ill-conditioned systems of linear equations (2022)
  3. Higham, Nicholas J.; Pranesh, Srikara: Exploiting lower precision arithmetic in solving symmetric positive definite linear systems and least squares problems (2021)
  4. Fukaya, Takeshi; Kannan, Ramaseshan; Nakatsukasa, Yuji; Yamamoto, Yusaku; Yanagisawa, Yuka: Shifted Cholesky QR for computing the QR factorization of ill-conditioned matrices (2020)
  5. Imakura, Akira; Yamamoto, Yusaku: Efficient implementations of the modified Gram-Schmidt orthogonalization with a non-standard inner product (2019)
  6. Tomás, Andrés E.; Quintana-Ortí, Enrique S.: Cholesky and Gram-Schmidt orthogonalization for tall-and-skinny QR factorizations on graphics processors (2019)
  7. Li, Huamin; Kluger, Yuval; Tygert, Mark: Randomized algorithms for distributed computation of principal component analysis and singular value decomposition (2018)
  8. Mach, T.; Reichel, L.; Van Barel, M.; Vandebril, R.: Adaptive cross approximation for ill-posed problems (2016)
  9. Yamamoto, Yusaku; Nakatsukasa, Yuji; Yanagisawa, Yuka; Fukaya, Takeshi: Roundoff error analysis of the CholeskyQR2 algorithm in an oblique inner product (2016)
  10. Yamamoto, Yusaku; Nakatsukasa, Yuji; Yanagisawa, Yuka; Fukaya, Takeshi: Roundoff error analysis of the CholeskyQR2 algorithm (2015)