The dimension of \(\Krylov_k(A, b)\) is governed by the minimal polynomial of \(b\) with respect to \(A\), that is, the lowest-degree non-zero polynomial \(p\) such that \(p(A)b = 0\).
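For instance (a small NumPy demo with a hand-picked spectrum, not from the text above): if \(A\) has only three distinct eigenvalues, the minimal polynomial of any \(b\) has degree at most three, so the Krylov family stops gaining new directions after three vectors.

```python
import numpy as np

# A symmetric matrix with only three distinct eigenvalues: the minimal
# polynomial of A is (x - 1)(x - 2)(x - 3), so any Krylov family
# {b, Ab, A^2 b, ...} has rank at most 3.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
A = Q @ np.diag([1., 1., 1., 2., 2., 2., 3., 3.]) @ Q.T
b = rng.normal(size=8)

K = np.column_stack([np.linalg.matrix_power(A, j) @ b for j in range(6)])
print(np.linalg.matrix_rank(K))  # 3: the Krylov subspaces stop growing
```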
Let us construct an orthonormal basis for the family \(\left\{b, Ab, \ldots, A^{k-1} b\right\}\) using the Gram-Schmidt process, and arrange the basis vectors \(q_1, q_2, \dots, q_k\) by columns in a matrix \(Q_k\). We start with \(q_1 = \frac{b}{\norm{b}}\), then orthogonalize \(A q_1\) against \(q_1\) to get \(q_2\), then orthogonalize \(A q_2\) against \(q_1, q_2\) to get \(q_3\), and so on. It is easy to show by induction that \(q_1, \ldots, q_k\) span \(\Krylov_k(A, b)\), to which the subsequent \(\block{q_{k+i}}_{i > 0}\) are all orthogonal.
Let us consider two vectors \(q_i\) and \(q_j\) obtained by this process and assume \(i > j\); there are two cases:

- if \(i = j + 1\), then \(q_i^T A q_j\) may be non-zero: it is a coefficient computed during the orthogonalization;
- if \(i > j + 1\), then \(A q_j \in \Krylov_{j+1}(A, b)\) while \(q_i\) is orthogonal to \(\Krylov_{j+1}(A, b)\), so \(q_i^T A q_j = 0\).
In matrix terms, this means that \(H_k = Q_k^TAQ_k\) is upper-Hessenberg. Now since \(A\) is symmetric, \(H_k\) must also be symmetric, which means it is lower-Hessenberg as well, therefore \(Q_k^TAQ_k = T_k\) is tridiagonal. This means that we just need to orthogonalize \(Aq_i\) against the last two vectors \(q_i, q_{i-1}\) instead of the complete sequence of predecessors, which is a huge computational win (assuming exact arithmetic).
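This is easy to check numerically; the throwaway sketch below (sizes and the random matrix are my arbitrary choices) orthonormalizes the Krylov family with a dense QR, which plays the role of Gram-Schmidt, and verifies that \(Q_k^T A Q_k\) has no entries beyond the first sub- and super-diagonals:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 5
M = rng.normal(size=(n, n))
A = M + M.T                       # symmetric test matrix
b = rng.normal(size=n)

# Orthonormalize {b, Ab, ..., A^{k-1} b}
K = np.column_stack([np.linalg.matrix_power(A, j) @ b for j in range(k)])
Q, _ = np.linalg.qr(K)
H = Q.T @ A @ Q                   # should be tridiagonal
assert np.allclose(np.tril(H, -2), 0, atol=1e-6)
assert np.allclose(np.triu(H, 2), 0, atol=1e-6)
```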
The Lanczos Method is a simple iterative procedure to compute the above tridiagonal factorization columnwise. Let us rewrite \(T_k\) as:
\[T_k = \mat{\alpha_1 & \beta_2 & \\ \beta_2 & \alpha_2 & \beta_3 \\ & \beta_3 &\alpha_3 & \beta_4 \\ & & \ddots & \ddots & \ddots \\ & & & \beta_{k} & \alpha_k }\]As we’ve seen, the orthogonalization process only requires the last two vectors:
\[\gamma_{k + 1} q_{k+1} = A q_k - \underbrace{q_k^TAq_k}_{\alpha_k} q_k - \underbrace{q_k^TA q_{k-1}}_{\beta_k} q_{k - 1}\]where \(\gamma_{k + 1}\) normalizes \(q_{k+1}\). Taking the dot product with \(q_{k+1}\) shows that \(\gamma_{k + 1} = \beta_{k + 1}\) so that the iteration can be rewritten as:
\[\beta_{k + 1} q_{k+1} = A q_k - \alpha_k q_k - \beta_k q_{k - 1}\]where \(\beta_{k + 1}\) normalizes \(q_{k+1}\) and \(\alpha_k = q_k^TAq_k\). The iteration starts with \(\beta_1 q_1 = b\) and \(q_0 = 0\).
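In code, the whole method fits in a few lines; here is a minimal NumPy sketch (the function name and interface are mine, and there is no reorthogonalization or breakdown handling, so it is illustrative rather than robust):

```python
import numpy as np

def lanczos(A, b, k):
    """Plain Lanczos: returns Q (n x k) with orthonormal columns spanning
    K_k(A, b), plus the coefficients alpha (k,) and beta (k-1,) of T_k."""
    n = len(b)
    Q = np.zeros((n, k))
    alpha, beta = np.zeros(k), np.zeros(k - 1)
    q_prev, q = np.zeros(n), b / np.linalg.norm(b)
    for j in range(k):
        Q[:, j] = q
        w = A @ q
        alpha[j] = q @ w
        # three-term recurrence: orthogonalize against the last two vectors only
        w -= alpha[j] * q
        if j > 0:
            w -= beta[j - 1] * q_prev
        if j < k - 1:
            beta[j] = np.linalg.norm(w)   # a tiny beta signals (lucky) breakdown
            q_prev, q = q, w / beta[j]
    return Q, alpha, beta
```

In exact arithmetic the columns of \(Q\) stay orthonormal; in floating point they drift, which is why practical implementations reorthogonalize or restart.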
Iterative methods for solving linear systems \(Ax=b\) typically minimize some energy function \(E(x)\) by taking steps along the negative gradient of \(E\). For \(E(x) = \half x^T A x - b^T x\), the update rule is of the form:
\[x_{k+1} = x_k - \alpha_k \block{A x_k - b}\]One can easily check that \(x_k \in \Krylov_k \Rightarrow x_{k+1} \in \Krylov_{k+1}\), and Krylov methods seek to accelerate gradient methods by directly projecting the solution onto the nested Krylov subspaces, using the Lanczos method to construct an orthonormal basis. For instance, the Conjugate Gradient method satisfies:
\[x_k = \argmin{x \in \Krylov_k}\ \half x^T A x - b^T x\]while the MINRES [1] method satisfies:
\[x_k = \argmin{x \in \Krylov_k}\ \norm{A x - b}^2\]Since we constructed an orthonormal basis for \(\Krylov_k\), let us use it by rewriting \(x_k = Q_k y_k\). We obtain the following problem:
\[y_k = \argmin{y}\ \half y^T Q_k^T A Q_k y - b^T Q_k y\]That is:
\[y_k = \argmin{y}\ \half y^T T_k y - \underbrace{b^T Q_k}_{c_k^T} y\]or equivalently: \(T_k y_k = c_k\). Now \(Q_k^T b = \beta_1 e_1\), and since \(T_k\) is tridiagonal, we can maintain an incremental \(LDL^T\) factorization to efficiently compute \(y_k\)'s coordinates, then update the solution vector \(x_k\). More precisely, \(T_k\) factors as \(T_k = L_k D_k L_k^T\) where \(D_k = \diag\block{d_k}\) is diagonal and
\[L_k = \mat{ 1 & & & & \\ l_2 & 1 & & \\ & l_3 & 1 & \\ & & \ddots & \ddots \\ & & & l_{k} & 1 }\]is unit lower-triangular with only the first sub-diagonal filled. It is possible to update \(\block{l_k, d_k}\) from \(\block{l_{k-1}, d_{k-1}}\) and \(\block{\alpha_k, \beta_k}\), then update both the projected solution \(T_k y_k = c_k\) and the unprojected one \(x_k = Q_k y_k\) efficiently (in \(O(n)\) operations per iteration, on top of the matrix-vector product). One can show that this approach is indeed equivalent to the usual presentation of the CG algorithm [2].
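The \(LDL^T\) update itself is a two-term recurrence: \(d_1 = \alpha_1\), then \(l_k = \beta_k / d_{k-1}\) and \(d_k = \alpha_k - l_k^2 d_{k-1}\). A small sketch of this recurrence (names and the dense verification are mine):

```python
import numpy as np

def ldl_step(alpha_k, beta_k, d_prev):
    """One incremental LDL^T update for a symmetric tridiagonal matrix."""
    l = beta_k / d_prev
    return l, alpha_k - l * l * d_prev

# Dense verification that the recurrence reproduces T = L D L^T
rng = np.random.default_rng(1)
a = rng.normal(size=6) + 4.0      # shifted to keep the pivots d_k away from zero
b = rng.normal(size=5)
d, l = [a[0]], []
for k in range(1, 6):
    l_k, d_k = ldl_step(a[k], b[k - 1], d[-1])
    l.append(l_k); d.append(d_k)
L = np.eye(6) + np.diag(l, -1)
T = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)
assert np.allclose(L @ np.diag(d) @ L.T, T)
```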
For MINRES, we similarly substitute \(x_k = Q_k y_k\) into \(\norm{Ax - b}^2\), which reduces to:
\[y_k = \argmin{y}\ \norm{T_k y - \beta_1 e_1}^2\]The MINRES algorithm maintains an incremental QR factorization of \(T_k\) as:
\[T_k = U_k R_k\]so that we may solve:
\[y_k = \argmin{y}\ \norm{R_k y - U_k^T \beta_1 e_1}^2\]by maintaining \(t_k = U_k^T \beta_1 e_1\) then solving the normal equations
\[R_k^TR_k y_k = R_k^T t_k\]which reduce to simply
\[R_k y_k = t_k\]Since \(T_k\) is almost triangular except for one sub-diagonal, we may use a sequence of Givens rotations to cancel the successive subdiagonal terms \(\beta_k\), which produces a sparse \(R_k\) with only two non-zero super-diagonals. More precisely, we apply \(U_k^T = \prod_{i=1}^{k - 1} V_{k - i} = V_{k-1} \cdots V_1\) to \(T_k\), so that \(U_k^T T_k = R_k\) is upper triangular and \(T_k = U_k R_k\). The first iteration brings
\[T_2 = \mat{\alpha_1 & \beta_2 \\ \beta_2 & \alpha_2}\]to the form:
\[V_1 T_2 = \mat{\delta_1 & \gamma_2 \\ 0 & \eta_2}\]Then we transform
\[T_3 = \mat{\alpha_1 & \beta_2 & \\ \beta_2 & \alpha_2 & \beta_3 \\ & \beta_3 & \alpha_3}\]to the following successive forms:
\[V_1 T_3 = \mat{\delta_1 & \gamma_2 & \nu_2 \\ & \eta_2 & \mu_2 \\ & \beta_3 & \alpha_3}\] \[V_2 V_1 T_3 = \mat{\delta_1 & \gamma_2 & \nu_2 \\ & \delta_2 & \gamma_3 \\ & & \eta_3}\]Note how applying \(V_1\) to \(T_3\) changes \(\beta_3\) to \(\mu_2\) and introduces fill-in with a new non-zero term \(\nu_2\). Similarly, one can show by induction that once reduced by the first \(k - 1\) rotations, \(T_{k+1}\) is brought to the following form (showing only the bottom-right corner):
\[V_{k-1} \cdots V_1 T_{k+1} = \mat{ \delta_{k-1} & \gamma_k & \nu_k \\ 0 & \eta_k & \mu_k \\ & \beta_{k+1} & \alpha_{k + 1} \\ }\]We apply a Givens rotation \(V_k\) to cancel \(\beta_{k+1}\):
\[V_k \mat{\eta_k \\ \beta_{k+1}} = \mat{c_k & -s_k \\ s_k & c_k} \mat{\eta_k \\ \beta_{k+1}} = \mat{\delta_k \\ 0}\]where \(c_k^2 + s_k^2 = 1\), and we obtain
\[\mat{\gamma_{k+1} \\ \eta_{k+1}} = V_k \mat{\mu_k \\ \alpha_{k+1}}\]For the next iteration we will also need to compute:
\[\mat{\nu_{k+1} \\ \mu_{k+1}} = V_k \mat{0 \\ \beta_{k+2}}\]from the current rotation \(V_k\). We still need to update \(t_k\), then update \(y_k\) by one incremental triangular solve, as in the CG case. See Choi's thesis [2] for more details.
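Putting the rotation recurrences together, the QR factorization of \(T_k\) becomes a short loop; the sketch below (function name and the dense comparison are mine) tracks only the running scalars \(\eta, \mu\) and fills the three non-zero diagonals of \(R_k\):

```python
import numpy as np

def givens_qr_tridiag(alpha, beta):
    """QR of a symmetric tridiagonal matrix by successive Givens rotations.
    alpha: diagonal (n,), beta: sub/super-diagonal (n-1,).
    Returns R, upper triangular with two non-zero super-diagonals."""
    n = len(alpha)
    R = np.zeros((n, n))
    eta = alpha[0]
    mu = beta[0] if n > 1 else 0.0
    for k in range(n - 1):
        # rotation V_k cancelling beta[k] against the running diagonal eta
        delta = np.hypot(eta, beta[k])
        c, s = eta / delta, -beta[k] / delta
        R[k, k] = delta
        # column k+1: (mu, alpha[k+1]) -> (gamma, next eta)
        R[k, k + 1] = c * mu - s * alpha[k + 1]
        eta = s * mu + c * alpha[k + 1]
        # column k+2: (0, beta[k+1]) -> (nu, next mu), the fill-in term
        if k + 2 < n:
            R[k, k + 2] = -s * beta[k + 1]
            mu = c * beta[k + 1]
    R[n - 1, n - 1] = eta
    return R

# Matches a dense QR up to the sign of each row
rng = np.random.default_rng(2)
a, b = rng.normal(size=6), rng.normal(size=5)
T = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)
R = givens_qr_tridiag(a, b)
assert np.allclose(np.abs(R), np.abs(np.linalg.qr(T)[1]))
```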
When \(A\) is rank-deficient, \(T_k\) may become singular and MINRES may not be able to solve \(R_k y_k = t_k\) due to \(\delta_k\) being \(0\). In such situations, it is possible to obtain a minimum-length solution by maintaining a QLP factorization of \(T_k\) instead of a QR factorization. Again, see Choi's thesis [2] for details.
The Lanczos method can also be used to estimate the extremal eigenvalues of \(A\) by computing those of \(T_k\), known as the Ritz values. Given a full tridiagonalization \(A=QTQ^T\) and an eigenvector \(x\) of \(A\) with eigenvalue \(\lambda\), we see that
\[Ax = \lambda x \iff TQ^Tx = \lambda Q^Tx\]In other words, \(x\) is an eigenvector of \(A\) for eigenvalue \(\lambda\) if and only if \(Q^T x\) is an eigenvector of \(T\) for the same eigenvalue. If we can compute an eigendecomposition of \(T_k\) efficiently, it becomes possible to estimate \(A\)'s eigenvalues/eigenvectors through those of \(T_k\) as we build an increasingly precise approximation of \(A\) over nested Krylov subspaces. In practice though, only the extremal eigenvalues are well approximated early on; there is no reason the interior Ritz values should line up with \(A\)'s spectrum (TODO ref).
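As an illustration, reusing the `lanczos` sketch from earlier (SciPy's `eigh_tridiagonal` handles the small eigensolve; the sizes are arbitrary), a modest number of iterations already pins down the extremal eigenvalues:

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

rng = np.random.default_rng(3)
n, k = 200, 30
M = rng.normal(size=(n, n))
A = (M + M.T) / np.sqrt(n)
b = rng.normal(size=n)

Q, alpha, beta = lanczos(A, b, k)          # sketch defined above
ritz = eigh_tridiagonal(alpha, beta, eigvals_only=True)
exact = np.linalg.eigvalsh(A)
print(ritz[0], exact[0])     # smallest Ritz value vs smallest eigenvalue
print(ritz[-1], exact[-1])   # largest Ritz value vs largest eigenvalue
```

Without reorthogonalization, the interior Ritz values (and the duplicated "ghost" copies of converged ones that eventually appear) should be treated with caution.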
[1] C. C. Paige and M. A. Saunders (1975). Solution of sparse indefinite systems of linear equations, SIAM Journal on Numerical Analysis 12, 617-629.
[2] S.-C. Choi (2006). Iterative Methods for Singular Linear Equations and Least-Squares Problems, PhD thesis, Stanford University.