
Probability

Axioms

  1. $P(A) \geq 0$
  2. $P(A \cup B) = P(A) + P(B)$, where $A$ and $B$ are disjoint sets
  3. $P(\Omega) = 1$

Properties

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$P(A) \leq P(B)$, if $A \subseteq B$

Conditional Probability

$P(A|B) = \dfrac{P(A \cap B)}{P(B)}$

Total Probability

$P(B) = P(A_1 \cap B) + \dots + P(A_n \cap B)$
$= \sum_i P(A_i)\, P(B|A_i)$

Here, the $A_i$'s form a partition of the sample space: they are disjoint and their union is $\Omega$.
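
A quick numeric sanity check of the total probability formula; the three-part partition and all the probabilities below are made up for illustration:

```python
# Total probability: P(B) = sum_i P(A_i) * P(B | A_i),
# where the A_i partition the sample space.
p_A = [0.5, 0.3, 0.2]          # P(A_1), P(A_2), P(A_3): must sum to 1
p_B_given_A = [0.9, 0.5, 0.1]  # P(B | A_i) for each part of the partition

p_B = sum(pa * pb for pa, pb in zip(p_A, p_B_given_A))
print(p_B)  # 0.5*0.9 + 0.3*0.5 + 0.2*0.1 = 0.62
```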

Independence

If A and B are independent, we have the following.

$P(A|B) = P(A)$
$P(A \cap B) = P(A)\, P(B)$

Conditional Independence

$P(A \cap B|C) = P(A|C)\, P(B|C)$

Expectation and Variance

$E[g(X)] = \sum_x g(x)\, p_X(x)$
$\mathrm{Var}[X] = E[(X - E[X])^2]$
$\mathrm{Var}[X] = E[X^2] - E[X]^2$
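
These sums are easy to check directly. A minimal sketch computing $E[X]$ and $\mathrm{Var}[X]$ from the PMF of a fair die:

```python
# Expectation and variance computed directly from a PMF (a fair die here).
xs = [1, 2, 3, 4, 5, 6]
p  = [1/6] * 6

E_X  = sum(x * px for x, px in zip(xs, p))     # E[X]
E_X2 = sum(x**2 * px for x, px in zip(xs, p))  # E[X^2]
var  = E_X2 - E_X**2                           # Var[X] = E[X^2] - E[X]^2

print(E_X, var)  # 3.5, 35/12 ≈ 2.9167
```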

Linearity

$E[aX + b] = aE[X] + b$
$\mathrm{Var}[aX + b] = a^2\, \mathrm{Var}[X]$
$E[aX + bY + c] = aE[X] + bE[Y] + c$

Probability Mass Functions

$p_{X,Y}(x,y) = P(X = x, Y = y)$

Marginal PMFs

$p_X(x) = \sum_y p_{X,Y}(x,y)$
$E[g(X,Y)] = \sum_x \sum_y g(x,y)\, p_{X,Y}(x,y)$
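
A small sketch, using a made-up $2 \times 2$ joint PMF, of how the marginals and $E[g(X,Y)]$ fall out of the joint table (assumes NumPy):

```python
import numpy as np

# A made-up joint PMF p_{X,Y}(x,y) as a table: rows index x, columns index y.
p_xy = np.array([[0.10, 0.20],
                 [0.30, 0.40]])

p_x = p_xy.sum(axis=1)  # marginal p_X(x) = sum over y of p_{X,Y}(x,y)
p_y = p_xy.sum(axis=0)  # marginal p_Y(y)

# E[g(X,Y)] as a double sum, here with g(x,y) = x * y and supports {0,1}.
xs, ys = [0, 1], [0, 1]
E_g = sum(x * y * p_xy[i, j] for i, x in enumerate(xs) for j, y in enumerate(ys))
print(p_x, p_y, E_g)  # [0.3 0.7] [0.4 0.6] 0.4
```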

Conditionals

$p_{X|A}(x) = P(X = x \mid A)$, such that $\sum_x p_{X|A}(x) = 1$
$p_X(x) = \sum_i P(A_i)\, p_{X|A_i}(x)$
$p_{X,Y}(x,y) = p_Y(y)\, p_{X|Y}(x|y)$
$p_X(x) = \sum_y p_Y(y)\, p_{X|Y}(x|y)$
$E[g(X)|A] = \sum_x g(x)\, p_{X|A}(x)$
$E[g(X)|Y = y] = \sum_x g(x)\, p_{X|Y}(x|y)$
$E[X] = \sum_i P(A_i)\, E[X|A_i]$

Independence strikes back

  • $p_{X|A}(x) = p_X(x)$
  • $E[XY] = E[X]\, E[Y]$
  • $\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y]$
  • $p_{X,Y}(x,y) = p_X(x)\, p_Y(y)$, for all $x$ and $y$
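
A Monte Carlo sanity check of the product and variance rules, using two independent dice (the sample size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(1, 7, size=1_000_000)  # two independent dice
Y = rng.integers(1, 7, size=1_000_000)

print((X * Y).mean(), X.mean() * Y.mean())   # E[XY] ≈ E[X]E[Y] ≈ 12.25
print(np.var(X + Y), np.var(X) + np.var(Y))  # Var[X+Y] ≈ Var[X]+Var[Y] ≈ 5.83
```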

Continuity and Lovely Curves

$P(X \in B) = \int_B f_X(x)\, dx$, where $f_X(x) \geq 0$
$E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx$

Cumulative Distributions

$F_X(x) = P(X \leq x)$
$F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt$

Conditionals

$\int_B f_{X|A}(x)\, dx = P(X \in B \mid A)$
$f_X(x) = \sum_i P(A_i)\, f_{X|A_i}(x)$
$f_{X,Y}(x,y) = f_Y(y)\, f_{X|Y}(x|y)$
$f_X(x) = \int_{-\infty}^{\infty} f_Y(y)\, f_{X|Y}(x|y)\, dy$

Conditional Expectation

$E[g(X)|A] = \int_{-\infty}^{\infty} g(x)\, f_{X|A}(x)\, dx$
$E[g(X)|Y = y] = \int_{-\infty}^{\infty} g(x)\, f_{X|Y}(x|y)\, dx$
$E[X] = \sum_i P(A_i)\, E[X|A_i]$

Bayes' Theorem

You can, of course, interchange $p$ with $f$ to account for continuous random variables.

$p_X(x)\, p_{Y|X}(y|x) = p_Y(y)\, p_{X|Y}(x|y)$
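
A worked Bayes example with a classic (made-up) screening test, where $X$ is disease status and $Y$ is the test result:

```python
# Bayes' theorem on a screening test.
p_disease = 0.01            # P(X = 1), prior
p_pos_given_disease = 0.95  # P(Y = 1 | X = 1), sensitivity
p_pos_given_healthy = 0.05  # P(Y = 1 | X = 0), false-positive rate

# Total probability for the evidence P(Y = 1).
p_pos = p_disease * p_pos_given_disease + (1 - p_disease) * p_pos_given_healthy

# Bayes: P(X = 1 | Y = 1) = P(X = 1) P(Y = 1 | X = 1) / P(Y = 1)
p_disease_given_pos = p_disease * p_pos_given_disease / p_pos
print(p_disease_given_pos)  # ≈ 0.161, despite the accurate-looking test
```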

Gimme More

$F_Y(y) = P(g(X) \leq y) = \int_{\{x \,\mid\, g(x) \leq y\}} f_X(x)\, dx$

where Y=g(X).

Also, if $Y = aX + b$, then we have $f_Y(y) = \frac{1}{|a|}\, f_X\!\left(\frac{y - b}{a}\right)$.
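
A simulation sketch of the linear-transform rule, comparing an empirical density estimate of $Y = aX + b$ against the formula (the point $y_0$ and the bandwidth $h$ are arbitrary choices):

```python
import numpy as np

# Check f_Y(y) = (1/|a|) f_X((y - b)/a) for Y = aX + b with X ~ N(0, 1).
rng = np.random.default_rng(0)
a, b = 2.0, 3.0
X = rng.standard_normal(1_000_000)
Y = a * X + b

# Empirical density of Y near y0 vs. the formula.
y0, h = 4.0, 0.05
empirical = np.mean(np.abs(Y - y0) < h) / (2 * h)
formula = (1 / abs(a)) * np.exp(-((y0 - b) / a) ** 2 / 2) / np.sqrt(2 * np.pi)
print(empirical, formula)  # both ≈ 0.176
```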

Correlations

$\mathrm{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]\, E[Y]$
$\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\, \mathrm{Cov}(X,Y)$
$\mathrm{Cor}(X,Y) = \rho(X,Y) = \dfrac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\, \mathrm{Var}(Y)}}$
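
These identities are straightforward to verify numerically; a sketch with $Y$ built to be correlated with $X$ by construction:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(100_000)
Y = 0.5 * X + rng.standard_normal(100_000)  # Cov(X, Y) = 0.5 by construction

cov = np.mean(X * Y) - X.mean() * Y.mean()  # Cov(X,Y) = E[XY] - E[X]E[Y]
rho = cov / np.sqrt(np.var(X) * np.var(Y))  # correlation coefficient
print(cov, rho)                 # ≈ 0.5, ≈ 0.447
print(np.corrcoef(X, Y)[0, 1])  # NumPy's built-in estimate agrees
```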

Law of iterated expectations

$E[E[X|Y]] = E[X]$

Law of total variance

$\mathrm{Var}[X] = E[\mathrm{Var}[X|Y]] + \mathrm{Var}[E[X|Y]]$
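
Both laws can be checked in a two-stage experiment where $Y$ picks one of two biased coins and $X$ is that coin's outcome (the biases 0.2 and 0.8 are made up):

```python
import numpy as np

# Verify E[E[X|Y]] = E[X] and Var[X] = E[Var[X|Y]] + Var[E[X|Y]].
rng = np.random.default_rng(0)
n = 1_000_000
Y = rng.integers(0, 2, size=n)  # fair choice between two biased coins
p = np.where(Y == 0, 0.2, 0.8)  # P(X = 1 | Y)
X = (rng.random(n) < p).astype(float)

E_X_given_Y = p              # E[X|Y] for a Bernoulli is its success probability
Var_X_given_Y = p * (1 - p)  # Var[X|Y] for a Bernoulli
print(E_X_given_Y.mean(), X.mean())                       # both ≈ 0.5
print(Var_X_given_Y.mean() + E_X_given_Y.var(), X.var())  # both ≈ 0.25
```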

Limits of the land

  1. (Markov) If $X$ takes only non-negative values, then $P(X \geq a) \leq E[X]/a$ for all $a > 0$.
  2. (Chebyshev) $P(|X - \mu| \geq c) \leq \sigma^2/c^2$ for all $c > 0$.
  3. (Convergence in probability) $X_n$ converges to $a$ if $\lim_{n \to \infty} P(|X_n - a| \geq \epsilon) = 0$ for every $\epsilon > 0$.
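
A Monte Carlo check of the first two bounds for $X \sim \mathrm{Exp}(1)$, which has $\mu = \sigma^2 = 1$ (the thresholds $a$ and $c$ are arbitrary):

```python
import numpy as np

# Markov and Chebyshev bounds vs. actual tail probabilities for Exp(1).
rng = np.random.default_rng(0)
X = rng.exponential(1.0, size=1_000_000)

a, c = 3.0, 2.0
print(np.mean(X >= a), 1.0 / a)                # ≈ 0.0498 <= E[X]/a = 0.333
print(np.mean(np.abs(X - 1) >= c), 1 / c**2)   # ≈ 0.0498 <= sigma^2/c^2 = 0.25
```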

Central Limit Theorem

Sum enough independent, identically distributed samples from (almost) any distribution with finite mean and variance, and the standardized sum converges in distribution to a standard normal.

$Z_n = \dfrac{X_1 + \dots + X_n - n\mu}{\sigma\sqrt{n}}$
$\lim_{n \to \infty} P(Z_n \leq z) = \Phi(z) = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2}\, dt$
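
A simulation sketch: standardize sums of uniform draws and compare a tail probability against $\Phi$ (the choice of uniform$(0,1)$ and the sample sizes are arbitrary):

```python
import numpy as np
from math import erf, sqrt

# CLT sanity check with sums of n uniform(0,1) draws.
rng = np.random.default_rng(0)
n, trials = 100, 200_000
mu, sigma = 0.5, 1 / np.sqrt(12)  # mean and std of uniform(0,1)

S = rng.random((trials, n)).sum(axis=1)
Z = (S - n * mu) / (sigma * np.sqrt(n))  # standardized sums

Phi = lambda z: 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF
print(np.mean(Z <= 1.0), Phi(1.0))            # both ≈ 0.841
```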

Law of large numbers

If you repeat an experiment independently a large number of times and average the result, what you obtain should be close to the expected value.

$P\left(\lim_{n \to \infty} \dfrac{X_1 + \dots + X_n}{n} = \mu\right) = 1$
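
A quick illustration with die rolls, whose expected value is $\mu = 3.5$:

```python
import numpy as np

# Law of large numbers: the running mean of die rolls approaches mu = 3.5.
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=1_000_000)
running_mean = np.cumsum(rolls) / np.arange(1, len(rolls) + 1)
print(running_mean[[99, 9_999, 999_999]])  # drifts toward 3.5 as n grows
```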

Distributions

The Gaussian distribution shows up in nature a lot because there are many situations in which a large number of small, independent effects sum up to the thing you actually measure.

Poisson statistics describe situations where an event occurs randomly in time but at a constant average rate.

Poisson Distribution

  • To predict the number of events occurring in the future!
  • More formally, to predict the probability of a given number of events occurring in a fixed interval of time.
  • Unlike the binomial distribution, the Poisson distribution doesn’t require you to know $n$ or $p$: we assume $n$ is infinitely large and $p$ is infinitesimal. The only parameter of the Poisson distribution is the rate $\lambda$ (the expected value of $X$). In real life, only knowing the rate (e.g., during 2pm~4pm I received 3 phone calls) is much more common than knowing both $n$ and $p$.

This gives the probability of observing $k$ events in an interval. The average number of events in an interval is designated by $\lambda$.

$P(X = k) = \dfrac{\lambda^k}{k!}\, e^{-\lambda}$

  • Even though the Poisson distribution models rare events, the rate λ can be any number. It doesn’t always have to be small.
  • Assumptions:
    • The average rate of events per unit time is constant.
    • Events are independent.
  • If the number of events per unit time follows a Poisson distribution, then the amount of time between events follows the exponential distribution. The Poisson distribution is discrete and the exponential distribution is continuous, yet the two distributions are closely related.
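
A sketch checking the PMF against simulated counts, plus the exponential inter-arrival claim (the rate $\lambda = 3$ is arbitrary):

```python
import numpy as np
from math import exp, factorial

# Poisson PMF P(X = k) = lambda^k / k! * e^(-lambda), checked by simulation.
rng = np.random.default_rng(0)
lam = 3.0

counts = rng.poisson(lam, size=1_000_000)
print(np.mean(counts == 2), lam**2 / factorial(2) * exp(-lam))  # both ≈ 0.224

# If counts per unit time are Poisson(lam), gaps between events are Exp(lam).
gaps = rng.exponential(1 / lam, size=1_000_000)
print(gaps.mean())  # ≈ 1/lam ≈ 0.333
```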

Random Shit

  1. How do you test whether a data sample is normal or not? https://en.wikipedia.org/wiki/Jarque–Bera_test

  2. Optimal theoretical size for a bet is given by the https://en.wikipedia.org/wiki/Kelly_criterion.

  3. Random variable $X$ is distributed as $N(a, b)$ and random variable $Y$ is distributed as $N(c, d)$, where $b$ and $d$ are the standard deviations. What is the distribution of (1) $X+Y$, (2) $X-Y$, (3) $X \times Y$, (4) $X/Y$?

    (1) $X + Y \sim N(a + c,\; b^2 + d^2 + 2\rho bd)$

    (2) $X - Y \sim N(a - c,\; b^2 + d^2 - 2\rho bd)$

    (3) $X \times Y$ is not normal. For independent $X$ and $Y$ it has mean $ac$ and variance $a^2 d^2 + c^2 b^2 + b^2 d^2$.

    (4) $X/Y$ is not normal either; for independent zero-mean normals the ratio follows a Cauchy distribution.

  4. The Monty Hall Problem
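
The Monty Hall answer (switch; you win with probability 2/3) is easy to confirm by simulation. A minimal sketch:

```python
import random

# Monty Hall: switching wins ~2/3 of the time, staying wins ~1/3.
def play(switch, rng=random):
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

n = 100_000
print(sum(play(True) for _ in range(n)) / n)   # ≈ 0.667
print(sum(play(False) for _ in range(n)) / n)  # ≈ 0.333
```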