
Probability

Axioms

  1. $P(A) \geq 0$
  2. $P(A \cup B) = P(A) + P(B)$, where $A$ and $B$ are disjoint sets
  3. $P(\Omega) = 1$

Properties

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$P(A) \leq P(B)$, if $A \subseteq B$

Conditional Probability

$P(A|B) = \dfrac{P(A \cap B)}{P(B)}$

Total Probability

$P(B) = P(A_1 \cap B) + \dots + P(A_n \cap B)$
$= \sum_i P(A_i)\, P(B|A_i)$

Here, the $A_i$'s form a partition of the sample space: they are disjoint and their union is $\Omega$.
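
A quick numeric sanity check of the total probability formula; the three-part partition and all the probabilities below are made up for illustration:

```python
# Total probability: P(B) = sum_i P(A_i) * P(B | A_i),
# where the A_i partition the sample space.
p_A = [0.5, 0.3, 0.2]          # P(A_1), P(A_2), P(A_3): must sum to 1
p_B_given_A = [0.9, 0.5, 0.1]  # P(B | A_i) for each part of the partition

p_B = sum(pa * pb for pa, pb in zip(p_A, p_B_given_A))
print(p_B)  # 0.5*0.9 + 0.3*0.5 + 0.2*0.1 = 0.62
```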

Independence

If A and B are independent, we have the following.

$P(A|B) = P(A)$
$P(A \cap B) = P(A)\, P(B)$

Conditional Independence

$P(A \cap B|C) = P(A|C)\, P(B|C)$

Expectation and Variance

$E[g(X)] = \sum_x g(x)\, p_X(x)$
$\mathrm{Var}[X] = E[(X - E[X])^2]$
$\mathrm{Var}[X] = E[X^2] - E[X]^2$
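
These sums are easy to check directly. A minimal sketch computing $E[X]$ and $\mathrm{Var}[X]$ from the PMF of a fair die:

```python
# Expectation and variance computed directly from a PMF (a fair die here).
xs = [1, 2, 3, 4, 5, 6]
p  = [1/6] * 6

E_X  = sum(x * px for x, px in zip(xs, p))     # E[X]
E_X2 = sum(x**2 * px for x, px in zip(xs, p))  # E[X^2]
var  = E_X2 - E_X**2                           # Var[X] = E[X^2] - E[X]^2

print(E_X, var)  # 3.5, 35/12 ≈ 2.9167
```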

Linearity

$E[aX + b] = aE[X] + b$
$\mathrm{Var}[aX + b] = a^2\, \mathrm{Var}[X]$
$E[aX + bY + c] = aE[X] + bE[Y] + c$

Probability Mass Functions

$p_{X,Y}(x,y) = P(X = x, Y = y)$

Marginal PMFs

$p_X(x) = \sum_y p_{X,Y}(x,y)$
$E[g(X,Y)] = \sum_x \sum_y g(x,y)\, p_{X,Y}(x,y)$
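
A small sketch, using a made-up $2 \times 2$ joint PMF, of how the marginals and $E[g(X,Y)]$ fall out of the joint table (assumes NumPy):

```python
import numpy as np

# A made-up joint PMF p_{X,Y}(x,y) as a table: rows index x, columns index y.
p_xy = np.array([[0.10, 0.20],
                 [0.30, 0.40]])

p_x = p_xy.sum(axis=1)  # marginal p_X(x) = sum over y of p_{X,Y}(x,y)
p_y = p_xy.sum(axis=0)  # marginal p_Y(y)

# E[g(X,Y)] as a double sum, here with g(x,y) = x * y and supports {0,1}.
xs, ys = [0, 1], [0, 1]
E_g = sum(x * y * p_xy[i, j] for i, x in enumerate(xs) for j, y in enumerate(ys))
print(p_x, p_y, E_g)  # [0.3 0.7] [0.4 0.6] 0.4
```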

Conditionals

$p_{X|A}(x) = P(X = x \mid A)$, such that $\sum_x p_{X|A}(x) = 1$
$p_X(x) = \sum_i P(A_i)\, p_{X|A_i}(x)$
$p_{X,Y}(x,y) = p_Y(y)\, p_{X|Y}(x|y)$
$p_X(x) = \sum_y p_Y(y)\, p_{X|Y}(x|y)$
$E[g(X)|A] = \sum_x g(x)\, p_{X|A}(x)$
$E[g(X)|Y = y] = \sum_x g(x)\, p_{X|Y}(x|y)$
$E[X] = \sum_i P(A_i)\, E[X|A_i]$

Independence strikes back

  • $p_{X|A}(x) = p_X(x)$
  • $E[XY] = E[X]\, E[Y]$
  • $\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y]$
  • $p_{X,Y}(x,y) = p_X(x)\, p_Y(y)$, for all $x$ and $y$
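
A Monte Carlo sanity check of the product and variance rules, using two independent dice (the sample size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(1, 7, size=1_000_000)  # two independent dice
Y = rng.integers(1, 7, size=1_000_000)

print((X * Y).mean(), X.mean() * Y.mean())   # E[XY] ≈ E[X]E[Y] ≈ 12.25
print(np.var(X + Y), np.var(X) + np.var(Y))  # Var[X+Y] ≈ Var[X]+Var[Y] ≈ 5.83
```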

Continuity and Lovely Curves

$P(X \in B) = \int_B f_X(x)\, dx$, where $f_X(x) \geq 0$
$E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx$

Cumulative Distributions

$F_X(x) = P(X \leq x)$
$F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt$

Conditionals

$\int_B f_{X|A}(x)\, dx = P(X \in B \mid A)$
$f_X(x) = \sum_i P(A_i)\, f_{X|A_i}(x)$
$f_{X,Y}(x,y) = f_Y(y)\, f_{X|Y}(x|y)$
$f_X(x) = \int_{-\infty}^{\infty} f_Y(y)\, f_{X|Y}(x|y)\, dy$

Conditional Expectation

$E[g(X)|A] = \int_{-\infty}^{\infty} g(x)\, f_{X|A}(x)\, dx$
$E[g(X)|Y = y] = \int_{-\infty}^{\infty} g(x)\, f_{X|Y}(x|y)\, dx$
$E[X] = \sum_i P(A_i)\, E[X|A_i]$

Bayes' Theorem

You can, of course, interchange $p$ with $f$ to account for continuous random variables.

$p_X(x)\, p_{Y|X}(y|x) = p_Y(y)\, p_{X|Y}(x|y)$
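
A worked Bayes example with a classic (made-up) screening test, where $X$ is disease status and $Y$ is the test result:

```python
# Bayes' theorem on a screening test.
p_disease = 0.01            # P(X = 1), prior
p_pos_given_disease = 0.95  # P(Y = 1 | X = 1), sensitivity
p_pos_given_healthy = 0.05  # P(Y = 1 | X = 0), false-positive rate

# Total probability for the evidence P(Y = 1).
p_pos = p_disease * p_pos_given_disease + (1 - p_disease) * p_pos_given_healthy

# Bayes: P(X = 1 | Y = 1) = P(X = 1) P(Y = 1 | X = 1) / P(Y = 1)
p_disease_given_pos = p_disease * p_pos_given_disease / p_pos
print(p_disease_given_pos)  # ≈ 0.161, despite the accurate-looking test
```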

Gimme More

$F_Y(y) = P(g(X) \leq y) = \int_{\{x \,\mid\, g(x) \leq y\}} f_X(x)\, dx$

where Y=g(X).

Also, if $Y = aX + b$, then we have $f_Y(y) = \frac{1}{|a|}\, f_X\!\left(\frac{y - b}{a}\right)$.
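
A simulation sketch of the linear-transform rule, comparing an empirical density estimate of $Y = aX + b$ against the formula (the point $y_0$ and the bandwidth $h$ are arbitrary choices):

```python
import numpy as np

# Check f_Y(y) = (1/|a|) f_X((y - b)/a) for Y = aX + b with X ~ N(0, 1).
rng = np.random.default_rng(0)
a, b = 2.0, 3.0
X = rng.standard_normal(1_000_000)
Y = a * X + b

# Empirical density of Y near y0 vs. the formula.
y0, h = 4.0, 0.05
empirical = np.mean(np.abs(Y - y0) < h) / (2 * h)
formula = (1 / abs(a)) * np.exp(-((y0 - b) / a) ** 2 / 2) / np.sqrt(2 * np.pi)
print(empirical, formula)  # both ≈ 0.176
```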

Correlations

$\mathrm{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]\, E[Y]$
$\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\, \mathrm{Cov}(X,Y)$
$\mathrm{Cor}(X,Y) = \rho(X,Y) = \dfrac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\, \mathrm{Var}(Y)}}$
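
These identities are straightforward to verify numerically; a sketch with $Y$ built to be correlated with $X$ by construction:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(100_000)
Y = 0.5 * X + rng.standard_normal(100_000)  # Cov(X, Y) = 0.5 by construction

cov = np.mean(X * Y) - X.mean() * Y.mean()  # Cov(X,Y) = E[XY] - E[X]E[Y]
rho = cov / np.sqrt(np.var(X) * np.var(Y))  # correlation coefficient
print(cov, rho)                 # ≈ 0.5, ≈ 0.447
print(np.corrcoef(X, Y)[0, 1])  # NumPy's built-in estimate agrees
```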

Law of iterated expectations

$E[E[X|Y]] = E[X]$

Law of total variance

$\mathrm{Var}[X] = E[\mathrm{Var}[X|Y]] + \mathrm{Var}[E[X|Y]]$
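
Both laws can be checked in a two-stage experiment where $Y$ picks one of two biased coins and $X$ is that coin's outcome (the biases 0.2 and 0.8 are made up):

```python
import numpy as np

# Verify E[E[X|Y]] = E[X] and Var[X] = E[Var[X|Y]] + Var[E[X|Y]].
rng = np.random.default_rng(0)
n = 1_000_000
Y = rng.integers(0, 2, size=n)  # fair choice between two biased coins
p = np.where(Y == 0, 0.2, 0.8)  # P(X = 1 | Y)
X = (rng.random(n) < p).astype(float)

E_X_given_Y = p              # E[X|Y] for a Bernoulli is its success probability
Var_X_given_Y = p * (1 - p)  # Var[X|Y] for a Bernoulli
print(E_X_given_Y.mean(), X.mean())                       # both ≈ 0.5
print(Var_X_given_Y.mean() + E_X_given_Y.var(), X.var())  # both ≈ 0.25
```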

Limits of the land

  1. (Markov) If $X$ takes only non-negative values, then $P(X \geq a) \leq E[X]/a$ for all $a > 0$.
  2. (Chebyshev) $P(|X - \mu| \geq c) \leq \sigma^2/c^2$ for all $c > 0$.
  3. (Convergence in probability) $X_n$ converges to $a$ if $\lim_{n \to \infty} P(|X_n - a| \geq \epsilon) = 0$ for every $\epsilon > 0$.
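
A Monte Carlo check of the first two bounds for $X \sim \mathrm{Exp}(1)$, which has $\mu = \sigma^2 = 1$ (the thresholds $a$ and $c$ are arbitrary):

```python
import numpy as np

# Markov and Chebyshev bounds vs. actual tail probabilities for Exp(1).
rng = np.random.default_rng(0)
X = rng.exponential(1.0, size=1_000_000)

a, c = 3.0, 2.0
print(np.mean(X >= a), 1.0 / a)                # ≈ 0.0498 <= E[X]/a = 0.333
print(np.mean(np.abs(X - 1) >= c), 1 / c**2)   # ≈ 0.0498 <= sigma^2/c^2 = 0.25
```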

Central Limit Theorem

Sum enough independent, identically distributed samples from (almost) any distribution with finite mean and variance, and the standardized sum converges in distribution to a standard normal.

$Z_n = \dfrac{X_1 + \dots + X_n - n\mu}{\sigma\sqrt{n}}$
$\lim_{n \to \infty} P(Z_n \leq z) = \Phi(z) = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2}\, dt$
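
A simulation sketch: standardize sums of uniform draws and compare a tail probability against $\Phi$ (the choice of uniform$(0,1)$ and the sample sizes are arbitrary):

```python
import numpy as np
from math import erf, sqrt

# CLT sanity check with sums of n uniform(0,1) draws.
rng = np.random.default_rng(0)
n, trials = 100, 200_000
mu, sigma = 0.5, 1 / np.sqrt(12)  # mean and std of uniform(0,1)

S = rng.random((trials, n)).sum(axis=1)
Z = (S - n * mu) / (sigma * np.sqrt(n))  # standardized sums

Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF
print(np.mean(Z <= 1.0), Phi(1.0))            # both ≈ 0.841
```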

Law of large numbers

If you repeat an experiment independently a large number of times and average the result, what you obtain should be close to the expected value.

$P\left(\lim_{n \to \infty} \dfrac{X_1 + \dots + X_n}{n} = \mu\right) = 1$
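
A quick illustration with die rolls, whose expected value is $\mu = 3.5$:

```python
import numpy as np

# Law of large numbers: the running mean of die rolls approaches mu = 3.5.
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=1_000_000)
running_mean = np.cumsum(rolls) / np.arange(1, len(rolls) + 1)
print(running_mean[[99, 9_999, 999_999]])  # drifts toward 3.5 as n grows
```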

Distributions

The Gaussian distribution shows up in nature a lot because there are many situations in which a large number of small, independent effects sum up to the thing you actually measure.

Poisson statistics describe situations where an event occurs randomly in time but at a constant average rate.

Poisson Distribution

  • To predict the number of events occurring in the future!
  • More formally, to predict the probability of a given number of events occurring in a fixed interval of time.
  • Unlike the binomial distribution, the Poisson distribution doesn’t require you to know $n$ or $p$: we assume $n$ is infinitely large and $p$ is infinitesimal. The only parameter of the Poisson distribution is the rate $\lambda$ (the expected value of $X$). In real life, only knowing the rate (e.g., during 2pm~4pm I received 3 phone calls) is much more common than knowing both $n$ and $p$.

This gives the probability of observing $k$ events in an interval. The average number of events in an interval is designated by $\lambda$.

$P(X = k) = \dfrac{\lambda^k}{k!}\, e^{-\lambda}$

  • Even though the Poisson distribution models rare events, the rate λ can be any number. It doesn’t always have to be small.
  • Assumptions:
    • The average rate of events per unit time is constant.
    • Events are independent.
  • If the number of events per unit time follows a Poisson distribution, then the amount of time between events follows the exponential distribution. The Poisson distribution is discrete and the exponential distribution is continuous, yet the two distributions are closely related.
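
A sketch checking the PMF against simulated counts, plus the exponential inter-arrival claim (the rate $\lambda = 3$ is arbitrary):

```python
import numpy as np
from math import exp, factorial

# Poisson PMF P(X = k) = lambda^k / k! * e^(-lambda), checked by simulation.
rng = np.random.default_rng(0)
lam = 3.0

counts = rng.poisson(lam, size=1_000_000)
print(np.mean(counts == 2), lam**2 / factorial(2) * exp(-lam))  # both ≈ 0.224

# If counts per unit time are Poisson(lam), gaps between events are Exp(lam).
gaps = rng.exponential(1 / lam, size=1_000_000)
print(gaps.mean())  # ≈ 1/lam ≈ 0.333
```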

Random Shit

  1. How do you test whether a data sample is normal or not? https://en.wikipedia.org/wiki/Jarque–Bera_test

  2. Optimal theoretical size for a bet is given by the https://en.wikipedia.org/wiki/Kelly_criterion.

  3. Random variable $X$ is distributed as $N(a, b)$ and random variable $Y$ is distributed as $N(c, d)$, where $b$ and $d$ are the standard deviations. What is the distribution of (1) $X+Y$, (2) $X-Y$, (3) $X \times Y$, (4) $X/Y$?

    (1) $X + Y \sim N(a + c,\; b^2 + d^2 + 2\rho bd)$

    (2) $X - Y \sim N(a - c,\; b^2 + d^2 - 2\rho bd)$

    (3) $X \times Y$ is not normal. For independent $X$ and $Y$ it has mean $ac$ and variance $a^2 d^2 + c^2 b^2 + b^2 d^2$.

    (4) $X/Y$ is not normal either; for independent zero-mean normals the ratio follows a Cauchy distribution.

  4. The Monty Hall Problem
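
The Monty Hall answer (switch; you win with probability 2/3) is easy to confirm by simulation. A minimal sketch:

```python
import random

# Monty Hall: switching wins ~2/3 of the time, staying wins ~1/3.
def play(switch, rng=random):
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

n = 100_000
print(sum(play(True) for _ in range(n)) / n)   # ≈ 0.667
print(sum(play(False) for _ in range(n)) / n)  # ≈ 0.333
```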