Differentials
- \(df = f_xdx + f_ydy+f_zdz\)
- \(df \neq \Delta f\)
- \(\Delta f \approx f_x\Delta x + f_y\Delta y + f_z\Delta z\) (a first-order approximation, not an equality; see the numerical check after this list)
- \(df\) is used to encode infinitesimal changes
- used to act as a placeholder value
- divide by \(dt\) to get the rate of change, \(\frac{df}{dt} = f_x\frac{dx}{dt} + f_y\frac{dy}{dt} + f_z\frac{dz}{dt}\) \(\rightarrow\) CHAIN RULE
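As a quick numerical sanity check of the approximation above, the sketch below (with an arbitrarily chosen function and point, not from the notes) compares the true change \(\Delta f\) with the linear estimate \(f_x\Delta x + f_y\Delta y + f_z\Delta z\).

```python
import numpy as np

# Hypothetical example function and its partial derivatives
f  = lambda x, y, z: x**2 * y + np.sin(z)
fx = lambda x, y, z: 2 * x * y
fy = lambda x, y, z: x**2
fz = lambda x, y, z: np.cos(z)

x, y, z = 1.0, 2.0, 0.5
dx, dy, dz = 1e-3, -2e-3, 5e-4

delta_f = f(x + dx, y + dy, z + dz) - f(x, y, z)            # true change Δf
df_lin  = fx(x, y, z)*dx + fy(x, y, z)*dy + fz(x, y, z)*dz  # linear estimate

print(delta_f, df_lin)  # close but not equal: df only approximates Δf
```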
Chain Rule with More Variables
Let \(w = f(x, y)\) where \(x = x(u, v)\), \(y = y(u, v)\); then
\(\dfrac{\partial w}{\partial u} = \dfrac{\partial w}{\partial x}\dfrac{\partial x}{\partial u} + \dfrac{\partial w}{\partial y}\dfrac{\partial y}{\partial u}\), and likewise \(\dfrac{\partial w}{\partial v} = \dfrac{\partial w}{\partial x}\dfrac{\partial x}{\partial v} + \dfrac{\partial w}{\partial y}\dfrac{\partial y}{\partial v}\)
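A small symbolic check of this formula, with hypothetical choices of \(f\), \(x(u, v)\), and \(y(u, v)\):

```python
import sympy as sp

x, y, u, v = sp.symbols('x y u v')
f = x**2 * y + sp.sin(y)      # hypothetical f(x, y)
x_expr = u**2 - v             # hypothetical x(u, v)
y_expr = u * v                # hypothetical y(u, v)

w = f.subs({x: x_expr, y: y_expr})   # w(u, v) by direct substitution
lhs = sp.diff(w, u)                  # ∂w/∂u computed directly
rhs = (sp.diff(f, x).subs({x: x_expr, y: y_expr}) * sp.diff(x_expr, u)
       + sp.diff(f, y).subs({x: x_expr, y: y_expr}) * sp.diff(y_expr, u))

print(sp.simplify(lhs - rhs))        # 0, so the chain rule formula checks out
```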
Gradient Vector
\(\vec{\nabla}w = \left\langle w_x, w_y, w_z \right\rangle\)
Note: \(\vec{\nabla}w \ \perp \text{ level surfaces}\) (i.e., it is normal to the tangent plane of the level surface at any given point)
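Example (illustrative): for \(w = x^2 + y^2 + z^2\), the point \((1, 2, 3)\) lies on the level surface \(w = 14\); there \(\vec{\nabla}w = \langle 2, 4, 6 \rangle\), so the tangent plane to that surface at the point is \(2(x-1) + 4(y-2) + 6(z-3) = 0\).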
Directional Derivatives
\(\dfrac{dw}{ds}\bigg|_{\hat u} = \vec{\nabla}w \cdot \hat u = |\vec{\nabla}w|\cos\theta\), the rate of change of \(w\) per unit distance in the direction of the unit vector \(\hat u\)
Implications
Direction of \(\vec{\nabla}w\) is the direction of fastest increase of \(w\), and \(|\vec{\nabla}w|\) is that maximum rate; the rate is \(0\) in directions tangent to a level surface
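A quick numerical check of \(\vec{\nabla}w \cdot \hat u\) against a finite-difference estimate along \(\hat u\) (function, point, and direction chosen arbitrarily):

```python
import numpy as np

w = lambda p: p[0]**2 * p[1] + p[2]                        # hypothetical w(x, y, z)
grad_w = lambda p: np.array([2*p[0]*p[1], p[0]**2, 1.0])   # its gradient

p0 = np.array([1.0, 2.0, 0.0])
u = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)                 # unit direction

h = 1e-6
dir_deriv = grad_w(p0) @ u                  # ∇w · û
fd = (w(p0 + h*u) - w(p0)) / h              # finite-difference estimate along û
print(dir_deriv, fd)                         # nearly equal
```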
Lagrange Multipliers
Goal: minimize/maximize a multi-variable function (\(min/max\ \ f(x, y, z)\)) where \(x, y, z\) are not independent but are constrained by \(g(x, y, z) = c\).
These are obtained by combining the given constraint \(g(x, y, z) = c\) with \(\vec{\nabla}f = \lambda\vec{\nabla}g\), i.e., solving the system \(f_x = \lambda g_x\), \(f_y = \lambda g_y\), \(f_z = \lambda g_z\).
Basic idea: to find \((x, y)\) where the level curves of \(f\) and \(g\) are tangent to each other (\(\vec{\nabla}f \parallel \vec{\nabla}g\)).
Note: Take care that the point is indeed a maximum or minimum as required and not just a saddle point (the second derivative test won't be applicable, so be vigilant).
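A minimal sketch of the procedure with sympy, on an arbitrary example (maximize \(f = xy\) subject to \(x + y = 4\)):

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
f = x * y            # hypothetical objective
g = x + y            # hypothetical constraint function, with g = 4

# ∇f = λ∇g together with the constraint g = c
eqs = [sp.Eq(sp.diff(f, x), lam * sp.diff(g, x)),
       sp.Eq(sp.diff(f, y), lam * sp.diff(g, y)),
       sp.Eq(g, 4)]
sols = sp.solve(eqs, [x, y, lam], dict=True)
print(sols)  # [{x: 2, y: 2, lambda: 2}] → candidate point (2, 2); check it is the max
```

Here the single candidate \((2, 2)\) is indeed the maximum of \(xy\) on the line \(x + y = 4\), which is exactly the check the note above asks for.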
Functions | Example | Value | First derivative | Second derivative |
---|---|---|---|---|
\(f: \mathbb R \to \mathbb R\) | \(x^2\) | \(\mathbb R\) | \(\mathbb R\) | \(\mathbb R\) |
\(f: \mathbb R^d \to \mathbb R\) | loss function | \(\mathbb R\) | \(\mathbb R^d [\text{gradient}]\) | \(\mathbb R^{d\times d} [\text{hessian}]\) |
\(f: \mathbb R^d \to \mathbb R^p\) | neural net layer | \(\mathbb R^p\) | \(\mathbb R^{p \times d} [\text{jacobian}]\) | \(\mathbb R^{p \times d \times d}\) |
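To make the shapes in the table concrete, here is a small sympy check with illustrative choices \(d = 3\), \(p = 2\) (the specific functions are arbitrary):

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
X = sp.Matrix([x1, x2, x3])                  # d = 3 inputs

# Scalar-valued f: R^3 -> R (a toy "loss")
f = x1**2 + x1*x2 + sp.exp(x3)
grad = sp.Matrix([f]).jacobian(X)            # 1 x 3 row of partials (the gradient)
H = sp.hessian(f, (x1, x2, x3))              # 3 x 3 Hessian

# Vector-valued F: R^3 -> R^2 (a toy "layer"), p = 2
F = sp.Matrix([x1*x2, x2 + x3**2])
J = F.jacobian(X)                            # 2 x 3 Jacobian (p x d)

print(grad.shape, H.shape, J.shape)          # (1, 3) (3, 3) (2, 3)
```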
Gradient
For \(f: \mathbb R^d \to \mathbb R\), the gradient collects all first-order partial derivatives into a vector: \(\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_d} \end{bmatrix} \in \mathbb R^d\)
Hessian
For \(f: \mathbb R^d \to \mathbb R\), the Hessian collects all second-order partial derivatives: \(\left(\nabla^2 f\right)_{ij} = \frac{\partial^2 f}{\partial x_i\, \partial x_j}\), so \(\nabla^2 f \in \mathbb R^{d \times d}\)
Note: Hessians are square symmetric matrices (when the mixed second partials are continuous).
Jacobian
We have \(f: \mathbb R^d \to \mathbb R^p\), thus \(f(x_1, \cdots, x_d) = \begin{bmatrix} f_1(x_1, \cdots, x_d)\\ \vdots \\ f_p(x_1, \cdots, x_d) \end{bmatrix}\)
The Jacobian stacks the component gradients as rows: \(J_f = \begin{bmatrix} \nabla^{\mathrm T} f_1 \\ \vdots \\ \nabla^{\mathrm T} f_p \end{bmatrix} \in \mathbb R^{p \times d}\),
where \(\nabla^{\mathrm T} f_i\) is the transpose (row vector) of the gradient of the \(i\)-th component.
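One way to connect the two objects, sketched symbolically with an arbitrary scalar function: the Hessian of \(f\) is the Jacobian of its gradient.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
X = sp.Matrix([x1, x2])
f = x1**3 * x2 + sp.cos(x2)                  # hypothetical scalar function

grad = sp.Matrix([sp.diff(f, v) for v in X]) # gradient as a column vector
H_via_jac = grad.jacobian(X)                 # Jacobian of the gradient
print(H_via_jac == sp.hessian(f, (x1, x2)))  # True: same matrix
```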
Examples
- \(\nabla_xb^Tx = b\)
- \(\nabla_x^2 b^Tx = 0\)
- \(\nabla_xx^TAx = 2Ax\), if \(A\) is symmetric
- \(\nabla_x^2x^TAx = 2A\), if \(A\) is symmetric
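These identities can be spot-checked symbolically; a small sketch with arbitrary \(b\), a symmetric \(A\), and \(d = 2\):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
X = sp.Matrix([x1, x2])
b = sp.Matrix([3, -1])                       # arbitrary b
A = sp.Matrix([[2, 1], [1, 4]])              # arbitrary symmetric A

lin = (b.T * X)[0]                           # b^T x   (scalar)
quad = (X.T * A * X)[0]                      # x^T A x (scalar)

grad_lin = sp.Matrix([sp.diff(lin, v) for v in X])
grad_quad = sp.Matrix([sp.diff(quad, v) for v in X])

print(grad_lin == b)                          # True:  ∇(b^T x) = b
print(grad_quad == 2 * A * X)                 # True:  ∇(x^T A x) = 2Ax
print(sp.hessian(quad, (x1, x2)) == 2 * A)    # True:  ∇²(x^T A x) = 2A
```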