$$ \newcommand{\RR}{\mathbb{R}} \newcommand{\GG}{\mathbb{G}} \newcommand{\PP}{\mathbb{P}} \newcommand{\PS}{\mathcal{P}} \newcommand{\SS}{\mathbb{S}} \newcommand{\NN}{\mathbb{N}} \newcommand{\ZZ}{\mathbb{Z}} \newcommand{\CC}{\mathbb{C}} \newcommand{\HH}{\mathbb{H}} \newcommand{\ones}{\mathbb{1\hspace{-0.4em}1}} \newcommand{\alg}[1]{\mathfrak{#1}} \newcommand{\mat}[1]{ \begin{pmatrix} #1 \end{pmatrix} } \renewcommand{\bar}{\overline} \renewcommand{\hat}{\widehat} \renewcommand{\tilde}{\widetilde} \newcommand{\inv}[1]{ {#1}^{-1} } \newcommand{\eqdef}{\overset{\text{def}}=} \newcommand{\block}[1]{\left(#1\right)} \newcommand{\set}[1]{\left\{#1\right\}} \newcommand{\abs}[1]{\left|#1\right|} \newcommand{\trace}[1]{\mathrm{tr}\block{#1}} \newcommand{\norm}[1]{ \left\| #1 \right\| } \newcommand{\argmin}[1]{ \underset{#1}{\mathrm{argmin}} } \newcommand{\argmax}[1]{ \underset{#1}{\mathrm{argmax}} } \newcommand{\st}{\ \mathrm{s.t.}\ } \newcommand{\sign}[1]{\mathrm{sign}\block{#1}} \newcommand{\half}{\frac{1}{2}} \newcommand{\inner}[1]{\langle #1 \rangle} \newcommand{\dd}{\mathrm{d}} \newcommand{\ddd}[2]{\frac{\partial #1}{\partial #2} } \newcommand{\db}{\dd^b} \newcommand{\ds}{\dd^s} \newcommand{\dL}{\dd_L} \newcommand{\dR}{\dd_R} \newcommand{\Ad}{\mathrm{Ad}} \newcommand{\ad}{\mathrm{ad}} \newcommand{\LL}{\mathcal{L}} \newcommand{\Krylov}{\mathcal{K}} \newcommand{\Span}[1]{\mathrm{Span}\block{#1}} \newcommand{\diag}{\mathrm{diag}} \newcommand{\tr}{\mathrm{tr}} \newcommand{\sinc}{\mathrm{sinc}} \newcommand{\cat}[1]{\mathcal{#1}} \newcommand{\Ob}[1]{\mathrm{Ob}\block{\cat{#1}}} \newcommand{\Hom}[1]{\mathrm{Hom}\block{\cat{#1}}} \newcommand{\op}[1]{\cat{#1}^{op}} \newcommand{\hom}[2]{\cat{#1}\block{#2}} \newcommand{\id}{\mathrm{id}} \newcommand{\Set}{\mathbb{Set}} \newcommand{\Cat}{\mathbb{Cat}} \newcommand{\Hask}{\mathbb{Hask}} \newcommand{\lim}{\mathrm{lim}\ } \newcommand{\funcat}[1]{\left[\cat{#1}\right]} \newcommand{\natsq}[6]{ \begin{matrix} & #2\block{#4} & \overset{#2\block{#6}}\longrightarrow & #2\block{#5} & \\ {#1}_{#4} \hspace{-1.5em} &\downarrow & & \downarrow & \hspace{-1.5em} {#1}_{#5}\\ & #3\block{#4} & \underset{#3\block{#6}}\longrightarrow & #3\block{#5} & \\ \end{matrix} } \newcommand{\comtri}[6]{ \begin{matrix} #1 & \overset{#4}\longrightarrow & #2 & \\ #6 \hspace{-1em} & \searrow & \downarrow & \hspace{-1em} #5 \\ & & #3 & \end{matrix} } \newcommand{\natism}[6]{ \begin{matrix} & #2\block{#4} & \overset{#2\block{#6}}\longrightarrow & #2\block{#5} & \\ {#1}_{#4} \hspace{-1.5em} &\downarrow \uparrow & & \downarrow \uparrow & \hspace{-1.5em} {#1}_{#5}\\ & #3\block{#4} & \underset{#3\block{#6}}\longrightarrow & #3\block{#5} & \\ \end{matrix} } \newcommand{\cone}[1]{\mathcal{#1}} $$

Differential Geometry

  1. Implicit Function Theorem
  2. Cheat Sheet
    1. Chain Rule
    2. Inner Product
    3. Squared Norm
    4. Square Root
    5. Norm
    6. Inverse
    7. Normalization
    8. Cross Product
    9. Cross Product Norm

Implicit Function Theorem

Consider a smooth function \(g: \RR^m \to \RR^n\) and the preimage of \(0 \in \RR^n\) by \(g\):

\[X = g^{-1}(0)\]

Let us start with the case where \(g\) is linear, so that \(X = \ker(g)\). Calling \(G\) the matrix of \(g\), \(X\) is defined as:

\[X = \set{x \in \RR^m \st Gx = 0}\]

When \(m > n\) and \(G\) is full-rank (in this case, of rank \(n\)), some \(n \times n\) submatrix of \(G\) is invertible. In particular, we may reorder and partition coordinates so that

\[G = \mat{L & R}\]

where \(R \in GL(n)\) is invertible and \(L\) is \(n \times (m - n)\). In this case we obtain that:

\[\begin{align} G x = 0 &\iff L x_L + R x_R = 0 \\ &\iff x_R = -\inv{R}L x_L \\ \end{align}\]

In other words, \(x_R\) can be seen as a (linear) function of \(x_L\). The implicit function theorem is an extension of this to the non-linear case.
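
As a sanity check of this linear picture, here is a minimal numpy sketch (the sizes, the seed, and the assumption that the last \(n\) columns form the invertible block \(R\) are all arbitrary choices):

```python
# Linear case: for G = [L R] with R invertible, the vector
# x = (x_L, -R^{-1} L x_L) lies in ker(G) for any choice of x_L.
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3                         # m > n, so ker(G) has dimension m - n
G = rng.standard_normal((n, m))     # full-rank with probability 1
L, R = G[:, :m - n], G[:, m - n:]   # partition the columns

x_L = rng.standard_normal(m - n)    # free coordinates
x_R = -np.linalg.solve(R, L @ x_L)  # dependent coordinates
x = np.concatenate([x_L, x_R])

assert np.allclose(G @ x, 0)        # x is indeed in ker(G)
```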

Coming back to the non-linear case, we now assume that \(g\) is smooth and that its differential \(\dd g\) is full-rank (again, of rank \(n\)) at some point \(x \in X\). Since \(\dd g\) is continuous and being full-rank is an open condition, there exists some open set \(U \subset X\) containing \(x\) such that \(\dd g\) is everywhere full-rank on \(U\). Since \(g\) is constant on \(X\), it is also constant on \(U\), so for every tangent vector \(\dd u\) at a point \(u \in U\), we get:

\[\dd g(u).\dd u = 0\]

Again, since \(\dd g(u)\) is everywhere full-rank over \(U\), we may reorder/partition coordinates globally on \(U\) (shrinking \(U\) if needed, so that the same \(n \times n\) block remains invertible everywhere) such that

\[\dd g(u) = \mat{L(u) & R(u)}\]

with \(L(u), R(u)\) as in the linear case. In particular, we also have:

\[\begin{align} \dd g(u).\dd u = 0 &\iff L(u).\dd u_L + R(u).\dd u_R = 0 \\ &\iff \dd u_R = -\inv{R}(u)L(u).\dd u_L \\ \end{align}\]

Hence \(\dd u_R\) can be seen as a function of \(\dd u_L\). The key point is that \(L\) and \(R\) are both smooth functions of \(u\), and inverting \(R\) is a smooth operation as well (see Lie groups), so that \(\dd u_R\) is a smooth function of \(\dd u_L\). If we now fix a tangent vector \(\dd u_L\) and an initial condition \(u = \block{u_L, u_R}\), we obtain the following ordinary differential equation for \(u_R\):

\[\dot{u}_R = -\inv{R}\block{u_L, u_R} L\block{u_L, u_R}.\dd u_L\]
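
As a concrete instance (an illustrative choice, not from the discussion above), take \(g(u) = u_1^2 + u_2^2 - 1\) on \(\RR^2\), so that \(X\) is the unit circle, \(L(u) = 2u_1\) and \(R(u) = 2u_2\). A crude Euler integration of the ODE then stays close to the circle:

```python
# Euler integration of the implicit-function ODE on the unit circle:
# g(u) = u1^2 + u2^2 - 1, dg(u) = [2 u1, 2 u2], so L = 2 u1, R = 2 u2.
u_L, u_R = 0.0, 1.0   # initial condition: top of the circle
du_L = 1.0            # fixed tangent direction for the free coordinate
dt = 1e-3

for _ in range(500):
    L, R = 2 * u_L, 2 * u_R
    du_R = -(L / R) * du_L    # du_R = -R^{-1} L du_L
    u_L += dt * du_L
    u_R += dt * du_R

g = u_L**2 + u_R**2 - 1
assert abs(g) < 1e-2          # zero up to O(dt) integration error
```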

(to be continued)

Cheat Sheet

Chain Rule

\[h(x) = f(g(x))\] \[\dd h(x).\dd x = \dd f(g(x)).\dd g(x).\dd x\] \[\dd^2 h(x).\dd x_1.\dd x_2 = \dd^2 f(g(x)).\dd g(x).\dd x_2.\dd g(x).\dd x_1 + \dd f(g(x)).\dd^2 g(x).\dd x_1.\dd x_2\]
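
The first-order rule is easy to check numerically; here is a sketch with arbitrary smooth choices of \(f\) and \(g\) (any pair would do):

```python
# Finite-difference check of dh(x).dx = df(g(x)).dg(x).dx for h = f o g,
# with f(y) = sin(y1) + y2^2 and g(x) = (x1 x2, x1 + x2) as arbitrary choices.
import numpy as np

f = lambda y: np.sin(y[0]) + y[1]**2
g = lambda x: np.array([x[0] * x[1], x[0] + x[1]])
h = lambda x: f(g(x))

df = lambda y: np.array([np.cos(y[0]), 2 * y[1]])    # Jacobian of f (1 x 2)
dg = lambda x: np.array([[x[1], x[0]],               # Jacobian of g (2 x 2)
                         [1.0, 1.0]])

x, dx, eps = np.array([0.3, -1.2]), np.array([1.0, 0.5]), 1e-6
analytic = df(g(x)) @ dg(x) @ dx
numeric = (h(x + eps * dx) - h(x - eps * dx)) / (2 * eps)
assert np.isclose(analytic, numeric)
```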

Inner Product

\[f(x, y) = x^T y\] \[\dd f(x, y).\dd x, \dd y = \dd x^Ty + x^T \dd y\] \[\nabla f(x, y) = \mat{y \\ x}\] \[\dd^2 f(x, y).\dd x_1, \dd y_1.\dd x_2, \dd y_2 = \dd x_1^T\dd y_2 + \dd x_2^T \dd y_1\] \[\nabla^2 f = \mat{0 & I \\ I & 0}\]
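
A finite-difference check of the gradient \(\nabla f(x, y) = (y, x)\), viewing \(f\) as a function of the stacked vector (dimension 3 is an arbitrary choice):

```python
# The gradient of f(x, y) = x^T y with respect to the stacked (x, y) is (y, x).
import numpy as np

rng = np.random.default_rng(1)
x, y, eps = rng.standard_normal(3), rng.standard_normal(3), 1e-6
f = lambda v: v[:3] @ v[3:]        # f on the stacked vector v = (x, y)
v = np.concatenate([x, y])

fd = np.array([(f(v + eps * e) - f(v - eps * e)) / (2 * eps) for e in np.eye(6)])
assert np.allclose(fd, np.concatenate([y, x]), atol=1e-5)
```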

Squared Norm

\[f(x) = \norm{x}^2 = x^T x\] \[\dd f(x).\dd x = 2x^T \dd x\] \[\nabla f(x) = 2x\] \[\dd^2 f(x).\dd x_1.\dd x_2 = 2\dd x_2^T \dd x_1\] \[\nabla^2 f(x) = 2I\]
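
Both derivatives can be verified by central differences (step size and tolerances are ad hoc):

```python
# Gradient 2x and Hessian 2I of the squared norm, by central differences.
import numpy as np

rng = np.random.default_rng(2)
n, eps = 4, 1e-4
x, I = rng.standard_normal(n), np.eye(n)
f = lambda x: x @ x

grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in I])
assert np.allclose(grad, 2 * x)

hess = np.array([[(f(x + eps * (ei + ej)) - f(x + eps * (ei - ej))
                 - f(x - eps * (ei - ej)) + f(x - eps * (ei + ej))) / (4 * eps**2)
                  for ej in I] for ei in I])
assert np.allclose(hess, 2 * I, atol=1e-4)
```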

Square Root

\[f(x) = \sqrt{x}\] \[\dd f(x) = \frac{1}{2 \sqrt{x}}\] \[\dd^2 f(x) = -\frac{1}{4x\sqrt{x}}\]
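
The same kind of check in one dimension (point, step size, and tolerances are arbitrary):

```python
# First and second derivative of sqrt at an arbitrary point, by central differences.
import math

x, eps = 2.0, 1e-5
d1 = (math.sqrt(x + eps) - math.sqrt(x - eps)) / (2 * eps)
d2 = (math.sqrt(x + eps) - 2 * math.sqrt(x) + math.sqrt(x - eps)) / eps**2
assert abs(d1 - 1 / (2 * math.sqrt(x))) < 1e-8
assert abs(d2 + 1 / (4 * x * math.sqrt(x))) < 1e-4
```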

Norm

\[f(x) = \norm{x} = \sqrt{\norm{x}^2}\] \[\dd f(x).\dd x = \frac{x^T}{\norm{x}} \dd x\] \[\nabla f(x) = \frac{x}{\norm{x}}\] \[\dd^2 f(x).\dd x_1.\dd x_2 = \dd x_2^T \frac{1}{\norm{x}}\block{I - \frac{x}{\norm{x}}\frac{x^T}{\norm{x}}} \dd x_1\] \[\nabla^2 f(x) = \frac{1}{\norm{x}} \block{I - \frac{x}{\norm{x}}\frac{x^T}{\norm{x}}}\]
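
The projection structure of the Hessian can be checked by differencing the gradient (sizes and tolerances ad hoc):

```python
# Hessian of the norm: (I - x x^T / ||x||^2) / ||x||, checked against
# finite differences of the gradient x / ||x||.
import numpy as np

rng = np.random.default_rng(3)
x, eps = rng.standard_normal(3), 1e-6
nx = np.linalg.norm(x)
grad = lambda x: x / np.linalg.norm(x)

hess = (np.eye(3) - np.outer(x, x) / nx**2) / nx
fd = np.array([(grad(x + eps * e) - grad(x - eps * e)) / (2 * eps) for e in np.eye(3)])
assert np.allclose(hess, fd, atol=1e-5)
```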

Inverse

\[f(x) = \frac{1}{x}\] \[\dd f(x) = -\frac{1}{x^2}\] \[\dd^2 f(x) = \frac{2}{x^3}\]
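
And the corresponding one-dimensional check:

```python
# First and second derivative of 1/x at an arbitrary point.
x, eps = 1.5, 1e-5
f = lambda x: 1 / x
d1 = (f(x + eps) - f(x - eps)) / (2 * eps)
d2 = (f(x + eps) - 2 * f(x) + f(x - eps)) / eps**2
assert abs(d1 + 1 / x**2) < 1e-8
assert abs(d2 - 2 / x**3) < 1e-4
```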

Normalization

\[f(x) = \frac{x}{\norm{x}}\] \[\begin{align} \dd f(x).\dd x &= \frac{1}{\norm{x}}\dd x + x\block{-\frac{1}{\norm{x}^2}\frac{1}{\norm{x}} x^T \dd x}\\ &= \frac{1}{\norm{x}}\underbrace{\block{I - \frac{x}{\norm{x}}\frac{x^T}{\norm{x}}}}_{P(x)}.\dd x \\ \end{align}\] \[\begin{align} \lambda^T\dd f(x).\dd x_1 &= \frac{1}{\norm{x}}\underbrace{\block{\lambda^T\dd x_1 - \lambda^T f(x) f(x)^T \dd x_1}}_{K(x)^T\dd x_1 = \lambda^T P(x).\dd x_1}\\ \lambda^T \dd^2 f(x).\dd x_2.\dd x_1 &= -\frac{1}{\norm{x}^2}\frac{x^T\dd x_2}{\norm{x}} K(x)^T\dd x_1 + \frac{1}{\norm{x}}\dd K(x)^T.\dd x_2.\dd x_1 \\ &= -\frac{1}{\norm{x}^2}\dd x_2^T f(x) K(x)^T\dd x_1 + \frac{1}{\norm{x}} \dd K(x)^T.\dd x_2.\dd x_1 \\ \dd K(x)^T.\dd x_2.\dd x_1 &= -\block{\lambda^T \dd f(x).\dd x_2} f(x)^T \dd x_1 - \lambda^T f(x) \block{\dd f(x).\dd x_2}^T\dd x_1\\ &= -\frac{1}{\norm{x}}\block{K(x)^T\dd x_2} f(x)^T \dd x_1 - \block{\lambda^T f(x)}\dd x_2^T \underbrace{\dd f(x)^T}_{=\,\dd f(x)}\dd x_1 \\ \lambda^T \dd^2 f(x).\dd x_2.\dd x_1 &= -\frac{1}{\norm{x}^2} \dd x_2^T \block{f(x)K(x)^T + K(x)f(x)^T} \dd x_1 - \dd x_2^T \frac{\lambda^T f(x)}{\norm{x}} \dd f(x)\, \dd x_1 \\ &= -\frac{1}{\norm{x}^3}\dd x_2^T \block{x\lambda^T P(x) + P(x) \lambda x^T + \block{\lambda^T x} P(x)}\dd x_1 \end{align}\]

(phhhew.)
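
Given how error-prone this derivation is, a numerical cross-check is reassuring; here is a sketch comparing the final expression against finite differences of \(\lambda^T \dd f\) (all test values arbitrary):

```python
# Contracted second derivative of f(x) = x / ||x||:
# dx2^T M dx1 with M = -(x lam^T P + P lam x^T + (lam^T x) P) / ||x||^3.
import numpy as np

rng = np.random.default_rng(4)
x, lam, eps = rng.standard_normal(3), rng.standard_normal(3), 1e-6
nx = np.linalg.norm(x)
I = np.eye(3)
P = I - np.outer(x, x) / nx**2

M = -(np.outer(x, lam) @ P + np.outer(P @ lam, x) + (lam @ x) * P) / nx**3

def g(x):                          # contracted first derivative lam^T df(x)
    n = np.linalg.norm(x)
    return lam @ (np.eye(3) - np.outer(x, x) / n**2) / n

fd = np.array([(g(x + eps * e) - g(x - eps * e)) / (2 * eps) for e in I])
assert np.allclose(M, fd, atol=1e-5)
```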

Cross Product

\[f(x, y) = x \times y\]

Denoting by \(\hat{x}\) the skew-symmetric matrix such that \(\hat{x}\dd y = x \times \dd y\):

\[\begin{align} \dd f(x, y).\dd x.\dd y &= \dd x \times y + x \times \dd y \\ &= \mat{-\hat{y} & \hat{x}} \mat{\dd x \\ \dd y} \end{align}\] \[\begin{align} \lambda^T\dd^2 f(x, y) &= \lambda^T \dd x_1 \times \dd y_2 + \lambda^T \dd x_2 \times \dd y_1 \\ &= -\dd x_1^T \hat{\lambda} \dd y_2 - \dd x_2^T \hat{\lambda} \dd y_1 \\ &= \dd y_2^T \hat{\lambda} \dd x_1 - \dd x_2^T \hat{\lambda} \dd y_1 \\ &= \mat{\dd x_2^T & \dd y_2^T} \mat{0 & -\hat{\lambda} \\ \hat{\lambda} & 0} \mat{\dd x_1 \\ \dd y_1} \\ \end{align}\]
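
A quick check of the first differential (the `hat` helper below builds the skew matrix; test vectors are arbitrary):

```python
# df(x, y) = [-hat(y)  hat(x)] for f(x, y) = x cross y, where
# hat(v) is the skew-symmetric matrix with hat(v) w = v cross w.
import numpy as np

def hat(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

rng = np.random.default_rng(5)
x, y, dx, dy = (rng.standard_normal(3) for _ in range(4))
eps = 1e-6

analytic = np.hstack([-hat(y), hat(x)]) @ np.concatenate([dx, dy])
numeric = (np.cross(x + eps * dx, y + eps * dy)
           - np.cross(x - eps * dx, y - eps * dy)) / (2 * eps)
assert np.allclose(analytic, numeric, atol=1e-5)
```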

Cross Product Norm

\[h(x, y) = \norm{x \times y} = (f \circ g) (x, y)\]

where \(g(x, y) = x \times y\), \(f(z) = \norm{z}\), and \(z = g(x, y)\).

\[\begin{align} \dd h(x, y).\dd x. \dd y &= \frac{z^T}{\norm{z}}.\mat{-\hat{y} & \hat{x}} \mat{\dd x \\ \dd y} \\ \end{align}\] \[\begin{align} \lambda^T \dd^2 h(x, y) &= \lambda^T \dd^2 f(z).\dd g_1.\dd g_2 + \lambda^T \dd f(z).\dd^2 g\\ &= \mat{\hat{y}^T K \hat{y} & -\hat{y}^T K \hat{x} \\ -\hat{x}^T K \hat{y} & \hat{x}^T K \hat{x} } + \mat{0 & -\hat{\tau} \\ \hat{\tau} & 0} \end{align}\]

where \(K = \lambda^T \dd^2 f(z)\) and \(\tau^T = \lambda^T \dd f(z)\) (here \(\lambda\) is a scalar, since \(h\) is scalar-valued).
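
Finally, a check of the first differential of \(h\) (same `hat` helper as above, arbitrary test vectors):

```python
# dh(x, y) = (z^T / ||z||) [-hat(y)  hat(x)] with z = x cross y.
import numpy as np

def hat(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

rng = np.random.default_rng(6)
x, y, eps = rng.standard_normal(3), rng.standard_normal(3), 1e-6
z = np.cross(x, y)
analytic = (z / np.linalg.norm(z)) @ np.hstack([-hat(y), hat(x)])

h = lambda v: np.linalg.norm(np.cross(v[:3], v[3:]))
v = np.concatenate([x, y])
fd = np.array([(h(v + eps * e) - h(v - eps * e)) / (2 * eps) for e in np.eye(6)])
assert np.allclose(analytic, fd, atol=1e-5)
```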