$$ \newcommand{\RR}{\mathbb{R}} \newcommand{\GG}{\mathbb{G}} \newcommand{\PP}{\mathbb{P}} \newcommand{\PS}{\mathcal{P}} \newcommand{\SS}{\mathbb{S}} \newcommand{\NN}{\mathbb{N}} \newcommand{\ZZ}{\mathbb{Z}} \newcommand{\CC}{\mathbb{C}} \newcommand{\HH}{\mathbb{H}} \newcommand{\ones}{\mathbb{1\hspace{-0.4em}1}} \newcommand{\alg}[1]{\mathfrak{#1}} \newcommand{\mat}[1]{ \begin{pmatrix} #1 \end{pmatrix} } \renewcommand{\bar}{\overline} \renewcommand{\hat}{\widehat} \renewcommand{\tilde}{\widetilde} \newcommand{\inv}[1]{ {#1}^{-1} } \newcommand{\eqdef}{\overset{\text{def}}=} \newcommand{\block}[1]{\left(#1\right)} \newcommand{\set}[1]{\left\{#1\right\}} \newcommand{\abs}[1]{\left|#1\right|} \newcommand{\trace}[1]{\mathrm{tr}\block{#1}} \newcommand{\norm}[1]{ \left\| #1 \right\| } \newcommand{\argmin}[1]{ \underset{#1}{\mathrm{argmin}} } \newcommand{\argmax}[1]{ \underset{#1}{\mathrm{argmax}} } \newcommand{\st}{\ \mathrm{s.t.}\ } \newcommand{\sign}[1]{\mathrm{sign}\block{#1}} \newcommand{\half}{\frac{1}{2}} \newcommand{\inner}[1]{\langle #1 \rangle} \newcommand{\dd}{\mathrm{d}} \newcommand{\ddd}[2]{\frac{\partial #1}{\partial #2} } \newcommand{\db}{\dd^b} \newcommand{\ds}{\dd^s} \newcommand{\dL}{\dd_L} \newcommand{\dR}{\dd_R} \newcommand{\Ad}{\mathrm{Ad}} \newcommand{\ad}{\mathrm{ad}} \newcommand{\LL}{\mathcal{L}} \newcommand{\Krylov}{\mathcal{K}} \newcommand{\Span}[1]{\mathrm{Span}\block{#1}} \newcommand{\diag}{\mathrm{diag}} \newcommand{\tr}{\mathrm{tr}} \newcommand{\sinc}{\mathrm{sinc}} \newcommand{\cat}[1]{\mathcal{#1}} \newcommand{\Ob}[1]{\mathrm{Ob}\block{\cat{#1}}} \newcommand{\Hom}[1]{\mathrm{Hom}\block{\cat{#1}}} \newcommand{\op}[1]{\cat{#1}^{op}} \newcommand{\hom}[2]{\cat{#1}\block{#2}} \newcommand{\id}{\mathrm{id}} \newcommand{\Set}{\mathbb{Set}} \newcommand{\Cat}{\mathbb{Cat}} \newcommand{\Hask}{\mathbb{Hask}} \newcommand{\lim}{\mathrm{lim}\ } \newcommand{\funcat}[1]{\left[\cat{#1}\right]} \newcommand{\natsq}[6]{ \begin{matrix} & #2\block{#4} & \overset{#2\block{#6}}\longrightarrow & #2\block{#5} & \\ {#1}_{#4} \hspace{-1.5em} &\downarrow & & \downarrow & \hspace{-1.5em} {#1}_{#5}\\ & #3\block{#4} & \underset{#3\block{#6}}\longrightarrow & #3\block{#5} & \\ \end{matrix} } \newcommand{\comtri}[6]{ \begin{matrix} #1 & \overset{#4}\longrightarrow & #2 & \\ #6 \hspace{-1em} & \searrow & \downarrow & \hspace{-1em} #5 \\ & & #3 & \end{matrix} } \newcommand{\natism}[6]{ \begin{matrix} & #2\block{#4} & \overset{#2\block{#6}}\longrightarrow & #2\block{#5} & \\ {#1}_{#4} \hspace{-1.5em} &\downarrow \uparrow & & \downarrow \uparrow & \hspace{-1.5em} {#1}_{#5}\\ & #3\block{#4} & \underset{#3\block{#6}}\longrightarrow & #3\block{#5} & \\ \end{matrix} } \newcommand{\cone}[1]{\mathcal{#1}} $$

Krylov Methods

  1. Krylov Subspaces
  2. Tridiagonalization
  3. The Lanczos Method
    1. Gradient Methods
    2. Conjugate Gradients
    3. MINRES
  4. Notes & References

Krylov Subspaces

Given a matrix \(A \in \RR^{n \times n}\) and a vector \(b \in \RR^n\), the order-\(k\) Krylov subspace is defined as:

\[\Krylov_k(A, b) = \Span{b, Ab, A^2b, \ldots, A^k b}\]

The dimension of \(\Krylov_k(A, b)\) is related to the minimal polynomial of \(b\) with respect to \(A\), that is, the lowest-degree non-zero polynomial \(p\) such that \(p(A)b = 0\).
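
To make this concrete, here is a small numpy sketch (the matrix and vector below are arbitrary choices for illustration): the rank of the Krylov matrix \(\left[b, Ab, \ldots, A^kb\right]\) saturates at the degree of the minimal polynomial of \(b\).

```python
import numpy as np

# arbitrary example: A has only three distinct eigenvalues, so the minimal
# polynomial of b is (x - 1)(x - 2)(x - 3), of degree 3
A = np.diag([1.0, 1.0, 2.0, 2.0, 3.0])
b = np.ones(5)

# Krylov matrix [b, Ab, A^2 b, ..., A^4 b]
K = np.column_stack([np.linalg.matrix_power(A, i) @ b for i in range(5)])

# the rank of the first k columns saturates at 3: prints [1, 2, 3, 3, 3]
print([np.linalg.matrix_rank(K[:, :k]) for k in range(1, 6)])
```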

Tridiagonalization

Let us construct an orthonormal basis for the family \(\left\{b, Ab, \ldots, A^k b\right\}\) using the Gram-Schmidt process, and arrange the basis vectors \(q_0, q_1, \dots, q_k\) by column in a matrix \(Q_k\). If the matrix \(A\) is symmetric, then \(Q_k\) satisfies:

\[Q_k^T A Q_k = T_k\]

where \(T_k\) is tridiagonal. To see this, consider \(T_{ij}\) for \(i > j + 1\). Clearly, \(T_{ij} = q_i^T A q_j\), and \(Aq_j \in \Krylov_{j+1}(A, b)\), for which the columns of \(Q_{j+1}\) form an orthonormal basis. Since \(i > j + 1\), every column of \(Q_{j+1}\) is among \(q_0, \ldots, q_{i-1}\) and is therefore orthogonal to \(q_i\). Hence \(q_i^T A q_j = 0\) and \(T\) is upper-Hessenberg. But since \(T\) is also symmetric, \(T\) is tridiagonal. Now, an orthonormal basis \(Q \in O(n)\) for the complete space \(\Krylov_n(A, b)\) (assuming it has full dimension \(n\)) satisfies:

\[Q^T A Q = T\]

or, equivalently:

\[AQ = QT\]
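
As a quick numerical check of this factorization, here is a numpy sketch (the random symmetric matrix is an arbitrary choice; a QR factorization of the Krylov matrix stands in for the Gram-Schmidt process):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
S = rng.standard_normal((n, n))
A = S + S.T                               # arbitrary symmetric test matrix
b = rng.standard_normal(n)

# orthonormalize the Krylov vectors b, Ab, ..., A^{n-1} b (QR ~ Gram-Schmidt)
K = np.column_stack([np.linalg.matrix_power(A, i) @ b for i in range(n)])
Q, _ = np.linalg.qr(K)

T = Q.T @ A @ Q
# entries outside the three central diagonals are at round-off level
mask = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) > 1
print(np.abs(T[mask]).max())
```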

By the same token, if \(M\) defines a non-standard inner product (i.e. \(M\) is symmetric positive-definite) and \(A\) is self-adjoint with respect to \(M\) (i.e. \(MA\) is symmetric), then applying the Gram-Schmidt process in the \(M\)-inner product yields the following factorization:

\[Q_k^T MA Q_k = T_k\]

where \(T_k\) is tridiagonal. The complete factorization satisfies:

\[Q^T MA Q = T\]

where \(Q^T M Q = I\), and we again obtain:

\[A Q = Q T\]

The Lanczos Method

The Lanczos Method is a simple iterative procedure to compute the above tridiagonal factorization columnwise. Let us rewrite \(T\) as:

\[T = \mat{\alpha_1 & \beta_1 & \\ \beta_1 & \alpha_2 & \beta_2 \\ & \beta_2 &\alpha_3 & \beta_3 \\ & & \ddots & \ddots & \ddots \\ & & & \beta_{n-1} & \alpha_n }\]

The \(k\)-th column \(t_k\) of \(T\) satisfies:

\[Aq_k = Q t_k\]

In other words:

\[A q_k = \beta_{k-1} q_{k-1} + \alpha_k q_k + \beta_k q_{k+1}\]

where \(\alpha_k = q_k^T A q_k\). This provides a recursion scheme for computing \(q_{k+1}\) from the previous values \(q_k, q_{k-1}\). Introducing the convention \(\beta_0 q_0 = 0\), the Lanczos iteration is defined as:

\[\beta_k q_{k+1} = A q_k - \alpha_k q_k - \beta_{k-1} q_{k-1}\]

where \(\beta_k\) is chosen so that \(\norm{q_{k+1}} = 1\). Likewise, using a non-standard inner product \(M\) still yields:

\[\beta_k q_{k+1} = A q_k - \alpha_k q_k - \beta_{k-1} q_{k-1}\]

except this time \(\alpha_k\) is given by:

\[\alpha_k = q_k^T MA q_k\]

and \(\beta_k\) is chosen so that \(\norm{q_{k+1}}_M = 1\).
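
Here is a minimal numpy sketch of the standard (\(M = I\)) iteration above, starting from \(q_1 = b / \norm{b}\); the symmetric test matrix at the end is an arbitrary choice for the sanity check, and breakdown (\(\beta_k = 0\)) is not handled.

```python
import numpy as np

def lanczos(A, b, k):
    """k steps of the Lanczos iteration for a symmetric matrix A.

    Returns Q with orthonormal columns q_1, ..., q_k and the diagonal
    (alpha) and off-diagonal (beta) entries of the tridiagonal T_k.
    """
    n = b.shape[0]
    Q = np.zeros((n, k))
    alpha = np.zeros(k)
    beta = np.zeros(k - 1)

    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        w = A @ Q[:, j]
        alpha[j] = Q[:, j] @ w
        w -= alpha[j] * Q[:, j]
        if j > 0:
            w -= beta[j - 1] * Q[:, j - 1]
        if j + 1 < k:
            beta[j] = np.linalg.norm(w)   # breakdown (beta = 0) ignored here
            Q[:, j + 1] = w / beta[j]
    return Q, alpha, beta

# sanity check: Q_k^T A Q_k reproduces the tridiagonal T_k
rng = np.random.default_rng(0)
S = rng.standard_normal((8, 8))
A = S + S.T
b = rng.standard_normal(8)
Q, alpha, beta = lanczos(A, b, 5)
T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
print(np.abs(Q.T @ A @ Q - T).max())      # ~ round-off
```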

Gradient Methods

Iterative methods for solving a linear system \(Ax=b\) usually minimize some energy function \(E(x)\) by taking steps along its gradient. For \(E(x) = \half x^T A x - b^T x\), with \(A\) symmetric positive-definite, the gradient is \(A x - b\) and the update rule is of the form:

\[x_{k+1} = x_k - \alpha_k \block{A x_k - b}\]
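
For concreteness, here is a small steepest-descent sketch assuming \(A\) is symmetric positive-definite; the exact line-search step \(\alpha_k = r_k^T r_k / r_k^T A r_k\) is a standard choice, not one prescribed by the text above.

```python
import numpy as np

def steepest_descent(A, b, iters=50):
    """Minimize E(x) = 1/2 x^T A x - b^T x by stepping along -grad E."""
    x = np.zeros_like(b)
    for _ in range(iters):
        r = A @ x - b                        # gradient of E at x_k
        alpha = (r @ r) / (r @ (A @ r))      # exact line search (assumption)
        x = x - alpha * r
    return x

# small SPD example (arbitrary): compare against the direct solve
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(steepest_descent(A, b), np.linalg.solve(A, b))
```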

One can easily check that \(x_k \in \Krylov_k \Rightarrow x_{k+1} \in \Krylov_{k+1}\), and Krylov methods generalize gradient methods by projecting the solution onto these nested Krylov subspaces. If one can build a sufficiently nice basis for the subspaces, one can hope to obtain increasingly good approximations of the solution as the iteration proceeds. For instance, the Conjugate Gradient method satisfies:

\[x_k = \argmin{x \in \Krylov_k}\ \half x^T A x - b^T x\]

while the MINRES [1] method satisfies:

\[x_k = \argmin{x \in \Krylov_k}\ \norm{A x - b}^2\]

Conjugate Gradients

\[x_k = \argmin{x \in \Krylov_k}\ \half x^T A x - b^T x\]

Since we constructed an orthonormal basis for \(\Krylov_k\), let us use it by rewriting \(x_k = Q_k y_k\). We obtain the following problem:

\[y_k = \argmin{y}\ \half y^T Q_k^T A Q_k y - b^T Q_k y\]

That is:

\[y_k = \argmin{y}\ \half y^T T_k y - \underbrace{b^T Q_k}_{c_k^T} y\]

or equivalently: \(T_k y_k = c_k\). Now, \(c_k = Q_k^T b = \norm{b}\, e_1\), and since \(T_k\) is tridiagonal, we can maintain an incremental \(LDL^T\) factorization of \(T_k\) to efficiently update the coordinates \(y_k\).
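
Here is a minimal sketch of this projection, assuming an arbitrary symmetric positive-definite test matrix; the small tridiagonal system is solved directly rather than through the incremental \(LDL^T\) update, which only changes the cost per iteration.

```python
import numpy as np

def cg_iterate(A, b, k):
    """Galerkin iterate x_k = Q_k y_k with T_k y_k = Q_k^T b."""
    # orthonormal Krylov basis (QR stands in for the Lanczos recursion)
    K = np.column_stack([np.linalg.matrix_power(A, i) @ b for i in range(k)])
    Q, _ = np.linalg.qr(K)
    T = Q.T @ A @ Q                  # tridiagonal up to round-off
    y = np.linalg.solve(T, Q.T @ b)  # Q^T b has a single nonzero entry
    return Q @ y

# arbitrary SPD example; the residual reaches round-off at k = n
rng = np.random.default_rng(1)
S = rng.standard_normal((6, 6))
A = S @ S.T + 6 * np.eye(6)
b = rng.standard_normal(6)
for k in range(1, 7):
    print(k, np.linalg.norm(A @ cg_iterate(A, b, k) - b))
```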

MINRES

Substituting \(x_k = Q_k y_k\) into the residual and using \(A Q_k = Q_{k+1} \bar{T}_k\), where \(\bar{T}_k \in \RR^{(k+1) \times k}\) is \(T_k\) extended with the extra row \(\beta_k e_k^T\), we obtain:

\[y_k = \argmin{y}\ \norm{A Q_k y - b}^2 = \argmin{y}\ \norm{\bar{T}_k y - \norm{b}\, e_1}^2\]

since \(b = \norm{b}\, Q_{k+1} e_1\) and \(Q_{k+1}\) has orthonormal columns.

Here, an incremental QR factorization of \(\bar{T}_k\) is maintained. See Choi's thesis [2] for more details.
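
Here is a small sketch of this least-squares subproblem, with an arbitrary symmetric (possibly indefinite) test matrix; as before, a QR factorization of the Krylov matrix stands in for \(k+1\) Lanczos steps.

```python
import numpy as np

def minres_iterate(A, b, k):
    """MINRES iterate x_k = Q_k y_k minimizing ||A Q_k y - b||."""
    # orthonormal basis Q_{k+1} of the next Krylov subspace
    K = np.column_stack([np.linalg.matrix_power(A, i) @ b for i in range(k + 1)])
    Qk1, _ = np.linalg.qr(K)
    Qk = Qk1[:, :k]
    T_bar = Qk1.T @ A @ Qk                    # (k+1) x k extended tridiagonal
    y, *_ = np.linalg.lstsq(T_bar, Qk1.T @ b, rcond=None)
    return Qk @ y

# arbitrary symmetric indefinite example: residual norms decrease with k
rng = np.random.default_rng(2)
S = rng.standard_normal((6, 6))
A = S + S.T
b = rng.standard_normal(6)
for k in range(1, 7):
    print(k, np.linalg.norm(A @ minres_iterate(A, b, k) - b))
```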

Notes & References

  1. C. C. Paige and M. A. Saunders (1975). Solution of sparse indefinite systems of linear equations, SIAM J. Numerical Analysis 12, 617-629. 

  2. S.-C. Choi (2006). Iterative Methods for Singular Linear Equations and Least-Squares Problems, PhD thesis, Stanford University.