$$
\newcommand{\RR}{\mathbb{R}}
\newcommand{\GG}{\mathbb{G}}
\newcommand{\PP}{\mathbb{P}}
\newcommand{\PS}{\mathcal{P}}
\newcommand{\SS}{\mathbb{S}}
\newcommand{\NN}{\mathbb{N}}
\newcommand{\ZZ}{\mathbb{Z}}
\newcommand{\CC}{\mathbb{C}}
\newcommand{\HH}{\mathbb{H}}
\newcommand{\ones}{\mathbb{1\hspace{-0.4em}1}}
\newcommand{\alg}[1]{\mathfrak{#1}}
\newcommand{\mat}[1]{ \begin{pmatrix} #1 \end{pmatrix} }
\renewcommand{\bar}{\overline}
\renewcommand{\hat}{\widehat}
\renewcommand{\tilde}{\widetilde}
\newcommand{\inv}[1]{ {#1}^{-1} }
\newcommand{\eqdef}{\overset{\text{def}}=}
\newcommand{\block}[1]{\left(#1\right)}
\newcommand{\set}[1]{\left\{#1\right\}}
\newcommand{\abs}[1]{\left|#1\right|}
\newcommand{\trace}[1]{\mathrm{tr}\block{#1}}
\newcommand{\norm}[1]{ \left\| #1 \right\| }
\newcommand{\argmin}[1]{ \underset{#1}{\mathrm{argmin}} }
\newcommand{\argmax}[1]{ \underset{#1}{\mathrm{argmax}} }
\newcommand{\st}{\ \mathrm{s.t.}\ }
\newcommand{\sign}[1]{\mathrm{sign}\block{#1}}
\newcommand{\half}{\frac{1}{2}}
\newcommand{\inner}[1]{\langle #1 \rangle}
\newcommand{\dd}{\mathrm{d}}
\newcommand{\ddd}[2]{\frac{\partial #1}{\partial #2} }
\newcommand{\db}{\dd^b}
\newcommand{\ds}{\dd^s}
\newcommand{\dL}{\dd_L}
\newcommand{\dR}{\dd_R}
\newcommand{\Ad}{\mathrm{Ad}}
\newcommand{\ad}{\mathrm{ad}}
\newcommand{\LL}{\mathcal{L}}
\newcommand{\Krylov}{\mathcal{K}}
\newcommand{\Span}[1]{\mathrm{Span}\block{#1}}
\newcommand{\diag}{\mathrm{diag}}
\newcommand{\tr}{\mathrm{tr}}
\newcommand{\sinc}{\mathrm{sinc}}
\newcommand{\cat}[1]{\mathcal{#1}}
\newcommand{\Ob}[1]{\mathrm{Ob}\block{\cat{#1}}}
\newcommand{\Hom}[1]{\mathrm{Hom}\block{\cat{#1}}}
\newcommand{\op}[1]{\cat{#1}^{op}}
\newcommand{\hom}[2]{\cat{#1}\block{#2}}
\newcommand{\id}{\mathrm{id}}
\newcommand{\Set}{\mathbb{Set}}
\newcommand{\Cat}{\mathbb{Cat}}
\newcommand{\Hask}{\mathbb{Hask}}
\newcommand{\lim}{\mathrm{lim}\ }
\newcommand{\funcat}[1]{\left[\cat{#1}\right]}
\newcommand{\natsq}[6]{
\begin{matrix}
& #2\block{#4} & \overset{#2\block{#6}}\longrightarrow & #2\block{#5} & \\
{#1}_{#4} \hspace{-1.5em} &\downarrow & & \downarrow & \hspace{-1.5em} {#1}_{#5}\\
& #3\block{#4} & \underset{#3\block{#6}}\longrightarrow & #3\block{#5} & \\
\end{matrix}
}
\newcommand{\comtri}[6]{
\begin{matrix}
#1 & \overset{#4}\longrightarrow & #2 & \\
#6 \hspace{-1em} & \searrow & \downarrow & \hspace{-1em} #5 \\
& & #3 &
\end{matrix}
}
\newcommand{\natism}[6]{
\begin{matrix}
& #2\block{#4} & \overset{#2\block{#6}}\longrightarrow & #2\block{#5} & \\
{#1}_{#4} \hspace{-1.5em} &\downarrow \uparrow & & \downarrow \uparrow & \hspace{-1.5em} {#1}_{#5}\\
& #3\block{#4} & \underset{#3\block{#6}}\longrightarrow & #3\block{#5} & \\
\end{matrix}
}
\newcommand{\cone}[1]{\mathcal{#1}}
$$
Differential Geometry
- Implicit Function Theorem
- Cheat Sheet
- Chain Rule
- Inner Product
- Squared Norm
- Square Root
- Norm
- Inverse
- Normalization
- Cross Product
- Cross Product Norm
Implicit Function Theorem
Consider a smooth function \(g: \RR^m \to \RR^n\) and the preimage of \(0 \in
\RR^n\) by \(g\):
\[X = g^{-1}(0)\]
For instance, let us start with the case where \(g\) is linear, so that \(X =
\ker(g)\). Calling \(G\) the matrix of \(g\), \(X\) is defined as:
\[X = \set{x \in \RR^m, \quad Gx = 0}\]
When \(m > n\) and \(G\) has full rank (here, rank \(n\)), then \(G\) has \(n\)
linearly independent columns, so some \(n \times n\) submatrix of \(G\) is
invertible. In particular, we may reorder and partition coordinates so that
\[G = \mat{L & R}\]
where \(R \in GL(n)\) is invertible and \(L\) is \(n \times (m - n)\). In this
case we obtain that:
\[\begin{align}
G x = 0 &\iff L x_L + R x_R = 0 \\
&\iff x_R = -\inv{R}L x_L \\
\end{align}\]
In other words, \(x_R\) can be seen as a (linear) function of \(x_L\). The
implicit function theorem is an extension of this to the non-linear case.
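As a sanity check, here is a minimal NumPy sketch of the linear case (the partition \(G = \mat{L & R}\) is the one above; the dimensions and the random seed are arbitrary choices):

```python
import numpy as np

# Linear case: parametrize ker(G) by x_L via x_R = -R^{-1} L x_L.
rng = np.random.default_rng(0)
m, n = 5, 2
G = rng.standard_normal((n, m))        # full rank with probability 1
L, R = G[:, :m - n], G[:, m - n:]      # partition G = [L | R], R is n x n

x_L = rng.standard_normal(m - n)       # free coordinates
x_R = -np.linalg.solve(R, L @ x_L)     # dependent coordinates
x = np.concatenate([x_L, x_R])

print(np.allclose(G @ x, 0.0))         # True: x lies in ker(G)
```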
Coming back to the non-linear case, we will now assume that \(g\) is smooth and
that its differential \(\dd g\) has full rank (again, \(n\)) at some point
\(x \in X\). Since \(\dd g\) is continuous, there exists some open set
\(U \subset X\) containing \(x\) such that \(\dd g\) is everywhere full-rank
on \(U\). Since \(g\) is constant on \(X\), it is also constant on \(U\), so
for every tangent vector \(\dd u\) at a point \(u \in U\), we get:
\[\dd g(u).\dd u = 0\]
Again, since \(\dd g(u)\) is everywhere full-rank over \(U\), we may
reorder/partition coordinates globally on \(U\) such that
\[\dd g(u) = \mat{L(u) & R(u)}\]
with \(L(u), R(u)\) as in the linear case. In particular, we also have:
\[\begin{align}
\dd g(u).\dd u = 0 &\iff L(u).\dd u_L + R(u).\dd u_R = 0 \\
&\iff \dd u_R = -\inv{R}(u)L(u).\dd u_L \\
\end{align}\]
So that \(\dd u_R\) can be seen as a function of \(\dd u_L\). The key is that
\(L, R\) are both smooth functions of \(u\), and inverting \(R\) is a smooth
operation as well (see Lie groups) so that \(\dd u_R\) is a
smooth function of \(\dd u_L\). If we fix a tangent vector \(\dd u_L\) and an
initial condition \(u = \block{u_L, u_R}\), then we obtain the following
ordinary differential equation for \(u_R\):
\[\dot{u}_R = -\inv{R}\block{u_L, u_R} L\block{u_L, u_R}.\dd u_L\]
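Here is a small sketch integrating this ODE numerically for \(g(x) = x_1^2 + x_2^2 - 1\) on \(\RR^2\), so that \(X\) is the unit circle, \(L(u) = 2x_1\), \(R(u) = 2x_2\), and we fix \(\dd u_L = 1\) (the step size and starting point are arbitrary):

```python
# Trace the unit circle by integrating x2' = -R^{-1}(u) L(u) . du_L
# with L(u) = 2 x1, R(u) = 2 x2 and du_L = 1 (valid while x2 != 0).
x1, x2, h = 0.0, 1.0, 1e-3             # start at the top of the circle
for _ in range(500):                   # crude explicit Euler steps
    x2 += h * (-(2.0 * x1) / (2.0 * x2))
    x1 += h                            # u_L advances along du_L
print(abs(x1**2 + x2**2 - 1.0))        # small: we stayed on the circle
```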
(to be continued)
Cheat Sheet
Chain Rule
\[h(x) = f(g(x))\]
\[\dd h(x).\dd x = \dd f(g(x)).\dd g(x).\dd x\]
\[\dd^2 h(x).\dd x_1.\dd x_2 = \dd^2 f(g(x)).\dd g(x).\dd x_2.\dd g(x).\dd x_1 + \dd f(g(x)).\dd^2 g(x).\dd x_1.\dd x_2\]
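Both formulas are easy to check by finite differences; a sketch with arbitrary smooth choices for \(f\) and \(g\) (nothing below is canonical):

```python
import numpy as np

# f: R^2 -> R, g: R^2 -> R^2, h = f o g; check both chain-rule formulas.
f   = lambda y: np.sin(y[0]) * y[1]
df  = lambda y: np.array([np.cos(y[0]) * y[1], np.sin(y[0])])
d2f = lambda y: np.array([[-np.sin(y[0]) * y[1], np.cos(y[0])],
                          [ np.cos(y[0]),        0.0        ]])
g   = lambda x: np.array([x[0] * x[1], x[0] + x[1]])
dg  = lambda x: np.array([[x[1], x[0]], [1.0, 1.0]])
d2g = lambda x, u, v: np.array([u[0]*v[1] + u[1]*v[0], 0.0])  # constant Hessians

h = lambda x: f(g(x))
x, dx1, dx2, eps = np.array([0.7, -0.3]), np.array([1.0, 2.0]), np.array([-1.0, 0.5]), 1e-5

# dh(x).dx1 = df(g(x)).dg(x).dx1
fd1 = (h(x + eps*dx1) - h(x - eps*dx1)) / (2*eps)
cr1 = df(g(x)) @ (dg(x) @ dx1)

# d2h(x).dx1.dx2 = d2f(g(x)).(dg(x).dx2).(dg(x).dx1) + df(g(x)).d2g(x).dx1.dx2
dh1 = lambda x: df(g(x)) @ (dg(x) @ dx1)
fd2 = (dh1(x + eps*dx2) - dh1(x - eps*dx2)) / (2*eps)
cr2 = (dg(x) @ dx2) @ d2f(g(x)) @ (dg(x) @ dx1) + df(g(x)) @ d2g(x, dx1, dx2)

print(abs(fd1 - cr1), abs(fd2 - cr2))  # both ~1e-10
```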
Inner Product
\[f(x, y) = x^T y\]
\[\dd f(x, y).\block{\dd x, \dd y} = \dd x^T y + x^T \dd y\]
\[\nabla f(x, y) = \mat{y \\ x}\]
\[\dd^2 f(x, y).\block{\dd x_1, \dd y_1}.\block{\dd x_2, \dd y_2} = \dd x_1^T \dd y_2 + \dd x_2^T \dd y_1\]
\[\nabla^2 f = \mat{0 & I \\ I & 0}\]
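A quick numeric check of the gradient and Hessian (a sketch, stacking \(z = (x, y)\)):

```python
import numpy as np

n, eps = 3, 1e-4
rng = np.random.default_rng(1)
x, y = rng.standard_normal(n), rng.standard_normal(n)
z = np.concatenate([x, y])
f = lambda z: z[:n] @ z[n:]                       # f(x, y) = x^T y

grad = np.array([(f(z + eps*e) - f(z - eps*e)) / (2*eps) for e in np.eye(2*n)])
print(np.allclose(grad, np.concatenate([y, x])))  # nabla f = (y, x)

H = np.block([[np.zeros((n, n)), np.eye(n)],
              [np.eye(n),        np.zeros((n, n))]])
dz1, dz2 = rng.standard_normal(2*n), rng.standard_normal(2*n)
fd = (f(z + eps*(dz1 + dz2)) - f(z + eps*(dz1 - dz2))
      - f(z - eps*(dz1 - dz2)) + f(z - eps*(dz1 + dz2))) / (4*eps**2)
print(np.isclose(fd, dz2 @ H @ dz1))              # bilinear form matches
```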
Squared Norm
\[f(x, y) = \norm{x}^2 = x^T x\]
\[\dd f(x).\dd x = 2x^T \dd x\]
\[\nabla f(x) = 2x\]
\[\dd^2 f(x).\dd x_1.\dd x_2 = 2\dd x_2^T \dd x_1\]
\[\nabla^2 f(x) = 2I\]
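Numeric check, with the same finite-difference sketch as above:

```python
import numpy as np

x, eps = np.array([1.0, -2.0, 0.5]), 1e-4
f = lambda x: x @ x

grad = np.array([(f(x + eps*e) - f(x - eps*e)) / (2*eps) for e in np.eye(3)])
print(np.allclose(grad, 2 * x))                   # nabla f = 2x

H = np.array([[(f(x + eps*(e + g)) - f(x + eps*(e - g))
                - f(x - eps*(e - g)) + f(x - eps*(e + g))) / (4*eps**2)
               for g in np.eye(3)] for e in np.eye(3)])
print(np.allclose(H, 2 * np.eye(3)))              # nabla^2 f = 2I
```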
Square Root
\[f(x) = \sqrt{x}\]
\[\dd f(x) = \frac{1}{2 \sqrt{x}}\]
\[\dd^2 f(x) = -\frac{1}{4x\sqrt{x}}\]
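A one-line finite-difference check of both formulas (the point \(x = 2\) is arbitrary):

```python
x, eps = 2.0, 1e-4
fd1 = ((x + eps)**0.5 - (x - eps)**0.5) / (2*eps)
fd2 = ((x + eps)**0.5 - 2*x**0.5 + (x - eps)**0.5) / eps**2
print(abs(fd1 - 1/(2*x**0.5)), abs(fd2 + 1/(4*x*x**0.5)))  # both tiny
```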
Norm
\[f(x) = \norm{x} = \sqrt{\norm{x}^2}\]
\[\dd f(x).\dd x = \frac{x^T}{\norm{x}} \dd x\]
\[\nabla f(x) = \frac{x}{\norm{x}}\]
\[\dd^2 f(x).\dd x_1.\dd x_2 = \dd x_2^T \frac{1}{\norm{x}}\block{I - \frac{x}{\norm{x}}\frac{x^T}{\norm{x}}} \dd x_1\]
\[\nabla^2 f(x) = \frac{1}{\norm{x}} \block{I - \frac{x}{\norm{x}}\frac{x^T}{\norm{x}}}\]
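A sketch checking the Hessian, reading the columns of \(\nabla^2 f\) off finite differences of \(\nabla f = x/\norm{x}\):

```python
import numpy as np

x, eps = np.array([0.3, -1.2, 2.0]), 1e-5
n = np.linalg.norm(x)
H = (np.eye(3) - np.outer(x, x) / n**2) / n       # (I - x x^T / ||x||^2) / ||x||

grad = lambda x: x / np.linalg.norm(x)            # nabla f = x / ||x||
H_fd = np.column_stack([(grad(x + eps*e) - grad(x - eps*e)) / (2*eps)
                        for e in np.eye(3)])
print(np.allclose(H, H_fd))                       # True
```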
Inverse
\[f(x) = \frac{1}{x}\]
\[\dd f(x) = -\frac{1}{x^2}\]
\[\dd^2 f(x) = \frac{2}{x^3}\]
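And the same finite-difference check for the inverse:

```python
x, eps = 2.0, 1e-4
fd1 = (1/(x + eps) - 1/(x - eps)) / (2*eps)
fd2 = (1/(x + eps) - 2/x + 1/(x - eps)) / eps**2
print(abs(fd1 + 1/x**2), abs(fd2 - 2/x**3))       # both tiny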
Normalization
\[f(x) = \frac{x}{\norm{x}}\]
\[\begin{align}
\dd f(x).\dd x &= \frac{1}{\norm{x}}\dd x + x\block{-\frac{1}{\norm{x}^2}\frac{1}{\norm{x}} x^T \dd x}\\
&= \frac{1}{\norm{x}}\underbrace{\block{I - \frac{x}{\norm{x}}\frac{x^T}{\norm{x}}}}_{P(x)}.\dd x \\
\end{align}\]
\[\begin{align}
\lambda^T\dd f(x).\dd x_1 &= \frac{1}{\norm{x}}\underbrace{\block{\lambda^T\dd x_1 - \lambda^T f(x) f(x)^T \dd x_1}}_{K(x)^T\dd x_1 = \lambda^T P(x).\dd x_1}\\
\lambda^T \dd^2 f(x).\dd x_2.\dd x_1 &= -\frac{1}{\norm{x}^2}\frac{x^T\dd x_2}{\norm{x}} K(x)^T\dd x_1 + \frac{1}{\norm{x}}\dd K(x)^T.\dd x_2.\dd x_1 \\
&= -\frac{1}{\norm{x}^2}\dd x_2^T f(x) K(x)^T\dd x_1 + \frac{1}{\norm{x}} \dd K(x)^T.\dd x_2.\dd x_1 \\
\dd K(x)^T.\dd x_2.\dd x_1 &= -\block{\lambda^T \dd f(x).\dd x_2} f(x)^T \dd x_1 - \lambda^T f(x) \block{\dd f(x).\dd x_2}^T \dd x_1\\
&= -\frac{1}{\norm{x}}\block{K(x)^T\dd x_2} f(x)^T \dd x_1 - \frac{\lambda^T f(x)}{\norm{x}} \dd x_2^T P(x) \dd x_1 \\
\lambda^T \dd^2 f(x).\dd x_2.\dd x_1 &= -\frac{1}{\norm{x}^2} \dd x_2^T \block{f(x)K(x)^T + K(x)f(x)^T + \lambda^T f(x)\, P(x)} \dd x_1 \\
&= -\frac{1}{\norm{x}^3}\dd x_2^T \block{x\lambda^T P(x) + P(x)\lambda x^T + \block{\lambda^T x} P(x)}\dd x_1
\end{align}\]
(phhhew.)
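That final matrix formula is worth a numeric check; a sketch comparing it against a finite difference of \(\dd f\) (the test vectors are arbitrary):

```python
import numpy as np

x   = np.array([0.5, -1.0, 2.0])
lam = np.array([1.0, 0.3, -0.7])
dx1, dx2 = np.array([0.2, 0.1, -0.4]), np.array([-1.0, 0.5, 0.3])
n = np.linalg.norm(x)
P = np.eye(3) - np.outer(x, x) / n**2

# H such that lambda^T d2f(x).dx2.dx1 = dx2^T H dx1 (last line above)
H = -(np.outer(x, lam) @ P + np.outer(P @ lam, x) + (lam @ x) * P) / n**3

df = lambda x, v: (v - x * (x @ v) / (x @ x)) / np.linalg.norm(x)  # P(x) v / ||x||
eps = 1e-6
fd = lam @ (df(x + eps*dx2, dx1) - df(x - eps*dx2, dx1)) / (2*eps)
print(abs(fd - dx2 @ H @ dx1))                    # ~1e-10
```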
Cross Product
\[f(x, y) = x \times y\]
\[\begin{align}
\dd f(x, y).\dd x.\dd y &= \dd x \times y + x \times \dd y \\
&= \mat{-\hat{y} & \hat{x}} \mat{\dd x \\ \dd y}
\end{align}\]
\[\begin{align}
\lambda^T\dd^2 f(x, y) &= \lambda^T \block{\dd x_1 \times \dd y_2} + \lambda^T \block{\dd x_2 \times \dd y_1} \\
&= -\dd x_1^T \hat{\lambda} \dd y_2 - \dd x_2^T \hat{\lambda} \dd y_1 \\
&= \dd y_2^T \hat{\lambda} \dd x_1 - \dd x_2^T \hat{\lambda} \dd y_1 \\
&= \mat{\dd x_2^T & \dd y_2^T} \mat{0 & -\hat{\lambda} \\ \hat{\lambda} & 0} \mat{\dd x_1 \\ \dd y_1} \\
\end{align}\]
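Since the cross product is bilinear, its second derivative is exact; a quick sketch of the block form (the \(\hat{\cdot}\) below is the usual skew-symmetric matrix of a 3-vector):

```python
import numpy as np

hat = lambda v: np.array([[0, -v[2], v[1]],
                          [v[2], 0, -v[0]],
                          [-v[1], v[0], 0]])
rng = np.random.default_rng(2)
lam, dx1, dy1, dx2, dy2 = rng.standard_normal((5, 3))

lhs = lam @ (np.cross(dx1, dy2) + np.cross(dx2, dy1))
M = np.block([[np.zeros((3, 3)), -hat(lam)],
              [hat(lam),          np.zeros((3, 3))]])
rhs = np.concatenate([dx2, dy2]) @ M @ np.concatenate([dx1, dy1])
print(np.isclose(lhs, rhs))                       # True
```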
Cross Product Norm
\[h(x, y) = \norm{x \times y} = (f \circ g) (x, y)\]
where \(g(x, y) = x \times y\), \(f(z) = \norm{z}\), and we write \(z = g(x, y)\)
\[\begin{align}
\dd h(x, y).\dd x. \dd y &= \frac{z^T}{\norm{z}}.\mat{-\hat{y} & \hat{x}} \mat{\dd x \\ \dd y} \\
\end{align}\]
\[\begin{align}
\lambda^T \dd^2 h(x, y) &= \lambda^T \dd^2 f(z).\dd g_1.\dd g_2 + \lambda^T \dd f(z).\dd^2 g\\
&= \mat{\hat{y}^T K \hat{y} & -\hat{y}^T K \hat{x} \\
-\hat{x}^T K \hat{y} & \hat{x}^T K \hat{x} } + \mat{0 & -\hat{\tau} \\ \hat{\tau} & 0}
\end{align}\]
where \(K = \lambda^T \dd^2 f(z)\) and \(\tau^T = \lambda^T \dd f(z)\). Since \(h\) is real-valued, \(\lambda\) is just a scalar here.
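Finally, a sketch assembling this Hessian (with \(\lambda = 1\), so \(K\) and \(\tau\) come from the Norm section) and checking it against finite differences of \(h\):

```python
import numpy as np

hat = lambda v: np.array([[0, -v[2], v[1]],
                          [v[2], 0, -v[0]],
                          [-v[1], v[0], 0]])
x, y = np.array([1.0, 0.2, -0.5]), np.array([0.3, 2.0, 0.1])
z = np.cross(x, y)
nz = np.linalg.norm(z)
K = (np.eye(3) - np.outer(z, z) / nz**2) / nz     # lambda^T d2f(z), lambda = 1
tau = z / nz                                      # tau = z / ||z||

H = (np.block([[ hat(y).T @ K @ hat(y), -hat(y).T @ K @ hat(x)],
               [-hat(x).T @ K @ hat(y),  hat(x).T @ K @ hat(x)]])
     + np.block([[np.zeros((3, 3)), -hat(tau)],
                 [hat(tau),          np.zeros((3, 3))]]))

h = lambda w: np.linalg.norm(np.cross(w[:3], w[3:]))
w, eps = np.concatenate([x, y]), 1e-4
H_fd = np.array([[(h(w + eps*(e + f)) - h(w + eps*(e - f))
                   - h(w - eps*(e - f)) + h(w - eps*(e + f))) / (4*eps**2)
                  for f in np.eye(6)] for e in np.eye(6)])
print(np.allclose(H, H_fd, atol=1e-6))            # True
```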