Differentials
- \(df = f_xdx + f_ydy+f_zdz\)
- \(df \neq \Delta f\)
- \(\Delta f \approx f_x\Delta x + f_y\Delta y + f_z\Delta z\) (a first-order approximation, not an equality; see the numerical check after this list)
- \(df\) is used to encode infinitesimal changes
- used to act as a placeholder value
- divide by \(dt\) to get the rate of change, \(\frac{df}{dt} = f_x\frac{dx}{dt} + f_y\frac{dy}{dt} + f_z\frac{dz}{dt}\) \(\rightarrow\) CHAIN RULE
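As a quick numerical sanity check of the approximation above, the sketch below (with an arbitrarily chosen function and point, not from the notes) compares the true change \(\Delta f\) with the linear estimate \(f_x\Delta x + f_y\Delta y + f_z\Delta z\).

```python
import numpy as np

# Hypothetical example function and its partial derivatives
f  = lambda x, y, z: x**2 * y + np.sin(z)
fx = lambda x, y, z: 2 * x * y
fy = lambda x, y, z: x**2
fz = lambda x, y, z: np.cos(z)

x, y, z = 1.0, 2.0, 0.5
dx, dy, dz = 1e-3, -2e-3, 5e-4

delta_f = f(x + dx, y + dy, z + dz) - f(x, y, z)            # true change Δf
df_lin  = fx(x, y, z)*dx + fy(x, y, z)*dy + fz(x, y, z)*dz  # linear estimate

print(delta_f, df_lin)  # close but not equal: df only approximates Δf
```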
Chain Rule with More Variables
Let \(w = f(x, y)\) where \(x = x(u, v)\), \(y = y(u, v)\); then
\(\dfrac{\partial w}{\partial u} = \dfrac{\partial w}{\partial x}\dfrac{\partial x}{\partial u} + \dfrac{\partial w}{\partial y}\dfrac{\partial y}{\partial u}\), and likewise \(\dfrac{\partial w}{\partial v} = \dfrac{\partial w}{\partial x}\dfrac{\partial x}{\partial v} + \dfrac{\partial w}{\partial y}\dfrac{\partial y}{\partial v}\)
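A small symbolic check of this formula, with hypothetical choices of \(f\), \(x(u, v)\), and \(y(u, v)\):

```python
import sympy as sp

x, y, u, v = sp.symbols('x y u v')
f = x**2 * y + sp.sin(y)      # hypothetical f(x, y)
x_expr = u**2 - v             # hypothetical x(u, v)
y_expr = u * v                # hypothetical y(u, v)

w = f.subs({x: x_expr, y: y_expr})   # w(u, v) by direct substitution
lhs = sp.diff(w, u)                  # ∂w/∂u computed directly
rhs = (sp.diff(f, x).subs({x: x_expr, y: y_expr}) * sp.diff(x_expr, u)
       + sp.diff(f, y).subs({x: x_expr, y: y_expr}) * sp.diff(y_expr, u))

print(sp.simplify(lhs - rhs))        # 0, so the chain rule formula checks out
```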
Gradient Vector
\(\vec{\nabla}w = \left\langle w_x, w_y, w_z \right\rangle\)
Note: \(\vec{\nabla}w \ \perp \text{ level surfaces}\) (i.e., it is normal to the tangent plane of the level surface at any given point)
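Example (illustrative): for \(w = x^2 + y^2 + z^2\), the point \((1, 2, 3)\) lies on the level surface \(w = 14\); there \(\vec{\nabla}w = \langle 2, 4, 6 \rangle\), so the tangent plane to that surface at the point is \(2(x-1) + 4(y-2) + 6(z-3) = 0\).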
Directional Derivatives
\(\dfrac{dw}{ds}\bigg|_{\hat u} = \vec{\nabla}w \cdot \hat u = |\vec{\nabla}w|\cos\theta\), the rate of change of \(w\) per unit distance in the direction of the unit vector \(\hat u\)
Implications
Direction of \(\vec{\nabla}w\) is the direction of fastest increase of \(w\), and \(|\vec{\nabla}w|\) is that maximum rate; the rate is \(0\) in directions tangent to a level surface
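A quick numerical check of \(\vec{\nabla}w \cdot \hat u\) against a finite-difference estimate along \(\hat u\) (function, point, and direction chosen arbitrarily):

```python
import numpy as np

w = lambda p: p[0]**2 * p[1] + p[2]                        # hypothetical w(x, y, z)
grad_w = lambda p: np.array([2*p[0]*p[1], p[0]**2, 1.0])   # its gradient

p0 = np.array([1.0, 2.0, 0.0])
u = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)                 # unit direction

h = 1e-6
dir_deriv = grad_w(p0) @ u                  # ∇w · û
fd = (w(p0 + h*u) - w(p0)) / h              # finite-difference estimate along û
print(dir_deriv, fd)                         # nearly equal
```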
Lagrange Multipliers
Goal: minimize/maximize a multi-variable function (\(min/max\ \ f(x, y, z)\)) where \(x, y, z\) are not independent but are constrained by \(g(x, y, z) = c\).
These are obtained by combining the given constraint \(g(x, y, z) = c\) with \(\vec{\nabla}f = \lambda\vec{\nabla}g\), i.e., solving the system \(f_x = \lambda g_x\), \(f_y = \lambda g_y\), \(f_z = \lambda g_z\).
Basic idea: to find \((x, y)\) where the level curves of \(f\) and \(g\) are tangent to each other (\(\vec{\nabla}f \parallel \vec{\nabla}g\)).
Note: Take care that the point is indeed a maximum or minimum as required and not just a saddle point (the second derivative test won't be applicable, so be vigilant).
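A minimal sketch of the procedure with sympy, on an arbitrary example (maximize \(f = xy\) subject to \(x + y = 4\)):

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
f = x * y            # hypothetical objective
g = x + y            # hypothetical constraint function, with g = 4

# ∇f = λ∇g together with the constraint g = c
eqs = [sp.Eq(sp.diff(f, x), lam * sp.diff(g, x)),
       sp.Eq(sp.diff(f, y), lam * sp.diff(g, y)),
       sp.Eq(g, 4)]
sols = sp.solve(eqs, [x, y, lam], dict=True)
print(sols)  # [{x: 2, y: 2, lambda: 2}] → candidate point (2, 2); check it is the max
```

Here the single candidate \((2, 2)\) is indeed the maximum of \(xy\) on the line \(x + y = 4\), which is exactly the check the note above asks for.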
Functions | Example | Value | First derivative | Second derivative |
---|---|---|---|---|
\(f: \mathbb R \to \mathbb R\) | \(x^2\) | \(\mathbb R\) | \(\mathbb R\) | \(\mathbb R\) |
\(f: \mathbb R^d \to \mathbb R\) | loss function | \(\mathbb R\) | \(\mathbb R^d [\text{gradient}]\) | \(\mathbb R^{d\times d} [\text{hessian}]\) |
\(f: \mathbb R^d \to \mathbb R^p\) | neural net layer | \(\mathbb R^p\) | \(\mathbb R^{p \times d} [\text{jacobian}]\) | \(\mathbb R^{p \times d \times d}\) |
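To make the shapes in the table concrete, here is a small sympy check with illustrative choices \(d = 3\), \(p = 2\) (the specific functions are arbitrary):

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
X = sp.Matrix([x1, x2, x3])                  # d = 3 inputs

# Scalar-valued f: R^3 -> R (a toy "loss")
f = x1**2 + x1*x2 + sp.exp(x3)
grad = sp.Matrix([f]).jacobian(X)            # 1 x 3 row of partials (the gradient)
H = sp.hessian(f, (x1, x2, x3))              # 3 x 3 Hessian

# Vector-valued F: R^3 -> R^2 (a toy "layer"), p = 2
F = sp.Matrix([x1*x2, x2 + x3**2])
J = F.jacobian(X)                            # 2 x 3 Jacobian (p x d)

print(grad.shape, H.shape, J.shape)          # (1, 3) (3, 3) (2, 3)
```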
Gradient
For \(f: \mathbb R^d \to \mathbb R\), the gradient collects all first-order partial derivatives into a vector: \(\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_d} \end{bmatrix} \in \mathbb R^d\)
Hessian
For \(f: \mathbb R^d \to \mathbb R\), the Hessian collects all second-order partial derivatives: \(\left(\nabla^2 f\right)_{ij} = \frac{\partial^2 f}{\partial x_i\, \partial x_j}\), so \(\nabla^2 f \in \mathbb R^{d \times d}\)
Note: Hessians are square symmetric matrices (when the mixed second partials are continuous).
Jacobian
We have \(f: \mathbb R^d \to \mathbb R^p\), thus \(f(x_1, \cdots, x_d) = \begin{bmatrix} f_1(x_1, \cdots, x_d)\\ \vdots \\ f_p(x_1, \cdots, x_d) \end{bmatrix}\)
The Jacobian stacks the component gradients as rows: \(J_f = \begin{bmatrix} \nabla^{\mathrm T} f_1 \\ \vdots \\ \nabla^{\mathrm T} f_p \end{bmatrix} \in \mathbb R^{p \times d}\),
where \(\nabla^{\mathrm T} f_i\) is the transpose (row vector) of the gradient of the \(i\)-th component.
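One way to connect the two objects, sketched symbolically with an arbitrary scalar function: the Hessian of \(f\) is the Jacobian of its gradient.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
X = sp.Matrix([x1, x2])
f = x1**3 * x2 + sp.cos(x2)                  # hypothetical scalar function

grad = sp.Matrix([sp.diff(f, v) for v in X]) # gradient as a column vector
H_via_jac = grad.jacobian(X)                 # Jacobian of the gradient
print(H_via_jac == sp.hessian(f, (x1, x2)))  # True: same matrix
```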
Examples
- \(\nabla_xb^Tx = b\)
- \(\nabla_x^2 b^Tx = 0\)
- \(\nabla_xx^TAx = 2Ax\), if \(A\) is symmetric
- \(\nabla_x^2x^TAx = 2A\), if \(A\) is symmetric
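These identities can be spot-checked symbolically; a small sketch with arbitrary \(b\), a symmetric \(A\), and \(d = 2\):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
X = sp.Matrix([x1, x2])
b = sp.Matrix([3, -1])                       # arbitrary b
A = sp.Matrix([[2, 1], [1, 4]])              # arbitrary symmetric A

lin = (b.T * X)[0]                           # b^T x   (scalar)
quad = (X.T * A * X)[0]                      # x^T A x (scalar)

grad_lin = sp.Matrix([sp.diff(lin, v) for v in X])
grad_quad = sp.Matrix([sp.diff(quad, v) for v in X])

print(grad_lin == b)                          # True:  ∇(b^T x) = b
print(grad_quad == 2 * A * X)                 # True:  ∇(x^T A x) = 2Ax
print(sp.hessian(quad, (x1, x2)) == 2 * A)    # True:  ∇²(x^T A x) = 2A
```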