Multivariate Calculus

Differentials

  • $df = f_x\,dx + f_y\,dy + f_z\,dz$
  • $df \approx \Delta f$ (a numeric check follows this list)
  • $\Delta f \approx f_x\,\Delta x + f_y\,\Delta y + f_z\,\Delta z$
  • $df$ is used to encode infinitesimal changes
  • it acts as a placeholder value
  • divide by $dt$ (i.e. with respect to time) to get a rate of change; this is the chain rule
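
A minimal numeric sketch of $df \approx \Delta f$; the function $f(x, y, z) = xy + z^2$ and the step sizes below are illustrative choices, not from the notes.

```python
import numpy as np

# Illustrative function (not from the notes): f(x, y, z) = x*y + z**2
f = lambda x, y, z: x * y + z**2

# Partial derivatives: f_x = y, f_y = x, f_z = 2z
x, y, z = 1.0, 2.0, 3.0
fx, fy, fz = y, x, 2 * z

# Small changes in each variable
dx, dy, dz = 1e-3, -2e-3, 5e-4

# Linear approximation: df = f_x dx + f_y dy + f_z dz
df = fx * dx + fy * dy + fz * dz

# Actual change Delta f
delta_f = f(x + dx, y + dy, z + dz) - f(x, y, z)

print(df, delta_f)  # nearly equal: df ~ Delta f for small changes
```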

Chain Rule with More Variables

Let $w = f(x, y)$ where $x = x(u, v)$ and $y = y(u, v)$; then

$$dw = f_x\,dx + f_y\,dy = (f_x x_u + f_y y_u)\,du + (f_x x_v + f_y y_v)\,dv = f_u\,du + f_v\,dv$$
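
A quick numeric check of the chain rule above, comparing $f_u = f_x x_u + f_y y_u$ against a finite difference in $u$; the choices $w = x^2 + y$, $x = uv$, $y = u + v$ are illustrative, not from the notes.

```python
import numpy as np

# Illustrative choice (not from the notes): w = f(x, y) = x**2 + y,
# with x(u, v) = u*v and y(u, v) = u + v.
def w_of_uv(u, v):
    x, y = u * v, u + v
    return x**2 + y

u, v = 1.5, -0.7

# Chain rule: f_u = f_x * x_u + f_y * y_u
x, y = u * v, u + v
fx, fy = 2 * x, 1.0          # partials of f w.r.t. x and y
xu, yu = v, 1.0              # partials of x and y w.r.t. u
f_u_chain = fx * xu + fy * yu

# Finite-difference check of dw/du (v held fixed)
h = 1e-6
f_u_fd = (w_of_uv(u + h, v) - w_of_uv(u - h, v)) / (2 * h)

print(f_u_chain, f_u_fd)     # the two values agree closely
```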

Gradient Vector

$$\frac{dw}{dt} = w_x\frac{dx}{dt} + w_y\frac{dy}{dt} + w_z\frac{dz}{dt} = \nabla w \cdot \frac{d\mathbf{r}}{dt}$$

Note: $\nabla w \perp$ level surfaces (it is perpendicular to the tangent plane of the level surface at any given point).
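
A sketch of $\frac{dw}{dt} = \nabla w \cdot \frac{d\mathbf{r}}{dt}$ along a parametrized curve; the function $w = x^2 + y^2 + z^2$ and the curve $\mathbf{r}(t) = (\cos t, \sin t, t)$ are illustrative choices, not from the notes.

```python
import numpy as np

# Illustrative choice (not from the notes): w(x, y, z) = x**2 + y**2 + z**2
# along the curve r(t) = (cos t, sin t, t).
def grad_w(x, y, z):
    return np.array([2 * x, 2 * y, 2 * z])

t = 0.8
r = np.array([np.cos(t), np.sin(t), t])
dr_dt = np.array([-np.sin(t), np.cos(t), 1.0])

# dw/dt = grad(w) . dr/dt  (here this equals 2t, since w(r(t)) = 1 + t**2)
dw_dt = grad_w(*r) @ dr_dt

print(dw_dt, 2 * t)  # both print 1.6
```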

Directional Derivatives

$$\left.\frac{dw}{ds}\right|_{\hat{u}} = \nabla w \cdot \frac{d\mathbf{r}}{ds} = \nabla w \cdot \hat{u}$$

Implications

The direction of $\nabla w$ is the direction of fastest increase of $w$ (and $|\nabla w|$ is the rate of that increase).
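
A sketch of the directional derivative formula and the fastest-increase implication; the function $w = x^2 + 3xy$, the point, and the direction $\hat{u}$ below are illustrative choices, not from the notes.

```python
import numpy as np

# Illustrative choice (not from the notes): w(x, y) = x**2 + 3*x*y at (1, 2).
w = lambda p: p[0]**2 + 3 * p[0] * p[1]
p = np.array([1.0, 2.0])
grad = np.array([2 * p[0] + 3 * p[1], 3 * p[0]])   # (w_x, w_y) = (8, 3)

u_hat = np.array([3.0, 4.0]) / 5.0                 # a unit direction

# dw/ds along u_hat = grad(w) . u_hat
dw_ds = grad @ u_hat

# Finite-difference check along the same direction
h = 1e-6
dw_ds_fd = (w(p + h * u_hat) - w(p - h * u_hat)) / (2 * h)

# Fastest increase is along grad/|grad|, with rate |grad|
print(dw_ds, dw_ds_fd)                                             # ~7.2 for both
print(grad @ (grad / np.linalg.norm(grad)), np.linalg.norm(grad))  # both = |grad|
```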

Lagrange Multipliers

Goal: minimize/maximize a multi-variable function ($\min/\max\ f(x, y, z)$) where $x, y, z$ are not independent but are constrained by $g(x, y, z) = c$.

These can be obtained by combining the given constraints with the following condition.

$$\nabla f = \lambda \nabla g$$

Basic idea: find $(x, y)$ where the level curves of $f$ and $g$ are tangent to each other ($\nabla f \parallel \nabla g$).

Note: Take care that the point is indeed a maximum or minimum as required and not just a saddle point (the second derivative test won't be applicable, so be vigilant).
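
A sketch of the method on an illustrative problem (not from the notes): maximize $f(x, y) = xy$ subject to $g(x, y) = x^2 + y^2 = 1$, solving the system $\nabla f = \lambda \nabla g$, $g = c$ with scipy.optimize.fsolve. As the note above warns, the system also has roots that are constrained minima, so check the value of $f$ at whatever root the solver returns.

```python
import numpy as np
from scipy.optimize import fsolve

# Illustrative problem (not from the notes): maximize f(x, y) = x*y
# subject to g(x, y) = x**2 + y**2 = 1.
def lagrange_system(v):
    x, y, lam = v
    # grad f = lambda * grad g, together with the constraint g = c
    return [y - lam * 2 * x,        # f_x - lambda * g_x
            x - lam * 2 * y,        # f_y - lambda * g_y
            x**2 + y**2 - 1.0]      # g(x, y) - c

x, y, lam = fsolve(lagrange_system, [0.5, 0.5, 0.5])
print(x, y, lam)          # ~ (0.7071, 0.7071, 0.5)
print(x * y)              # constrained maximum of f: 0.5
```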

| Function | Example | Value | First derivative | Second derivative |
|----------|---------|-------|------------------|-------------------|
| $f : \mathbb{R} \to \mathbb{R}$ | $x^2$ | $\mathbb{R}$ | $\mathbb{R}$ | $\mathbb{R}$ |
| $f : \mathbb{R}^d \to \mathbb{R}$ | loss function | $\mathbb{R}$ | $\mathbb{R}^d$ (gradient) | $\mathbb{R}^{d \times d}$ (Hessian) |
| $f : \mathbb{R}^d \to \mathbb{R}^p$ | neural net layer | $\mathbb{R}^p$ | $\mathbb{R}^{p \times d}$ (Jacobian) | $\mathbb{R}^{p \times d \times d}$ |

Gradient

$$\nabla_x f(x) = \nabla_x f(x_1, \dots, x_d) = \begin{bmatrix} \frac{\partial f(x)}{\partial x_1} \\ \vdots \\ \frac{\partial f(x)}{\partial x_d} \end{bmatrix}$$

$$\nabla_A f(A) = \begin{bmatrix} \frac{\partial f(A)}{\partial A_{11}} & \cdots & \frac{\partial f(A)}{\partial A_{1n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f(A)}{\partial A_{m1}} & \cdots & \frac{\partial f(A)}{\partial A_{mn}} \end{bmatrix}$$
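
A finite-difference sketch of the two definitions above; `num_grad` is a hypothetical helper and the test functions are illustrative, not from the notes. The point is that the gradient of a scalar function has the same shape as its argument (a $d$-vector, or an $m \times n$ matrix).

```python
import numpy as np

# Central-difference gradient of a scalar function f at x (vector or matrix).
def num_grad(f, x, h=1e-6):
    g = np.zeros_like(x)
    it = np.nditer(x, flags=["multi_index"])
    for _ in it:
        idx = it.multi_index
        xp, xm = x.copy(), x.copy()
        xp[idx] += h
        xm[idx] -= h
        g[idx] = (f(xp) - f(xm)) / (2 * h)
    return g

# Gradient w.r.t. a vector: f(x) = sum(x**2)  ->  grad = 2x (shape (d,))
x = np.array([1.0, -2.0, 0.5])
print(num_grad(lambda v: np.sum(v**2), x))

# Gradient w.r.t. a matrix: f(A) = sum(A**2)  ->  grad = 2A (shape (m, n))
A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(num_grad(lambda M: np.sum(M**2), A))
```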

Hessian

We have $f : \mathbb{R}^d \to \mathbb{R}^p$; thus

$$f(x_1, \dots, x_d) = \begin{bmatrix} f_1(x_1, \dots, x_d) \\ \vdots \\ f_p(x_1, \dots, x_d) \end{bmatrix}$$

Note: Hessians are square, symmetric matrices.

$$\nabla_x^2 f(x) = \begin{bmatrix} \frac{\partial^2 f(x)}{\partial x_1^2} & \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_1 \partial x_d} \\ \vdots & \ddots & & \vdots \\ \frac{\partial^2 f(x)}{\partial x_d \partial x_1} & \cdots & & \frac{\partial^2 f(x)}{\partial x_d^2} \end{bmatrix}$$
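
A finite-difference sketch of the Hessian for a scalar $f$ of a $d$-vector; `num_hessian` is a hypothetical helper and the test function is illustrative, not from the notes. The result is a $d \times d$ symmetric matrix of second partials.

```python
import numpy as np

# Central-difference approximation of the (i, j) second partials of scalar f.
def num_hessian(f, x, h=1e-4):
    d = x.size
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            x_pp = x.copy(); x_pp[i] += h; x_pp[j] += h
            x_pm = x.copy(); x_pm[i] += h; x_pm[j] -= h
            x_mp = x.copy(); x_mp[i] -= h; x_mp[j] += h
            x_mm = x.copy(); x_mm[i] -= h; x_mm[j] -= h
            H[i, j] = (f(x_pp) - f(x_pm) - f(x_mp) + f(x_mm)) / (4 * h**2)
    return H

# Illustrative test function: f(x) = x1**2 * x2 + x3
f = lambda x: x[0]**2 * x[1] + x[2]
x = np.array([1.0, 2.0, 3.0])
H = num_hessian(f, x)
print(H)                               # 3 x 3, approx [[4, 2, 0], [2, 0, 0], [0, 0, 0]]
print(np.allclose(H, H.T, atol=1e-4))  # square and symmetric
```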

Jacobian

$$J = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \cdots & \frac{\partial f}{\partial x_d} \end{bmatrix} = \begin{bmatrix} \nabla^T f_1 \\ \vdots \\ \nabla^T f_p \end{bmatrix} = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_d} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_p}{\partial x_1} & \cdots & \frac{\partial f_p}{\partial x_d} \end{bmatrix}$$

where $\nabla^T f_i$ is the transpose (a row vector) of the gradient of the $i$-th component.
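
A finite-difference sketch of the Jacobian for $f : \mathbb{R}^d \to \mathbb{R}^p$; `num_jacobian` is a hypothetical helper and the test function is illustrative, not from the notes. The result has shape $p \times d$, with row $i$ holding $\nabla^T f_i$.

```python
import numpy as np

# Central-difference Jacobian: column j holds df/dx_j for all output components.
def num_jacobian(f, x, h=1e-6):
    p, d = f(x).size, x.size
    J = np.zeros((p, d))
    for j in range(d):
        xp, xm = x.copy(), x.copy()
        xp[j] += h
        xm[j] -= h
        J[:, j] = (f(xp) - f(xm)) / (2 * h)
    return J

# Illustrative f: R^3 -> R^2 with f(x) = (x1*x2, x2 + x3**2)
f = lambda x: np.array([x[0] * x[1], x[1] + x[2]**2])
x = np.array([1.0, 2.0, 3.0])
print(num_jacobian(f, x))   # 2 x 3, approx [[2, 1, 0], [0, 1, 6]]
```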

Examples

  • $\nabla_x\, b^T x = b$
  • $\nabla_x^2\, b^T x = 0$
  • $\nabla_x\, x^T A x = 2Ax$, if $A$ is symmetric
  • $\nabla_x^2\, x^T A x = 2A$, if $A$ is symmetric (see the numerical check below)
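
A numerical check of the first and third identities using central differences; the vectors $b$, $x$ and the symmetric matrix $A$ are random illustrative values, not from the notes.

```python
import numpy as np

# Central-difference gradient of a scalar function of a d-vector.
def num_grad(f, x, h=1e-6):
    g = np.zeros_like(x)
    for i in range(x.size):
        xp, xm = x.copy(), x.copy()
        xp[i] += h
        xm[i] -= h
        g[i] = (f(xp) - f(xm)) / (2 * h)
    return g

rng = np.random.default_rng(0)
d = 4
b = rng.normal(size=d)
x = rng.normal(size=d)
M = rng.normal(size=(d, d))
A = (M + M.T) / 2   # make A symmetric

# grad of b^T x is b; grad of x^T A x is 2Ax (A symmetric)
print(np.allclose(num_grad(lambda v: b @ v, x), b, atol=1e-6))
print(np.allclose(num_grad(lambda v: v @ A @ v, x), 2 * A @ x, atol=1e-6))
```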