# Hypothesis Testing


# np.random.choice(...)

```
import numpy as np

# Simulate a population and draw a random sample from it without replacement
population = np.random.normal(loc=65, scale=3.5, size=300)
population_mean = np.mean(population)
sample_1 = np.random.choice(population, size=30, replace=False)
```

**Hypothesis Test Errors**

*Type I* errors, also known as *false positives*, occur when a null hypothesis is rejected even though it is actually true. This can be viewed as a miss being registered as a hit. The acceptable rate of this type of error is called the *significance level* and is usually set to `0.05` (5%) or `0.01` (1%).

*Type II* errors, also known as *false negatives*, occur when a null hypothesis is not rejected even though the alternative hypothesis is true. This can be viewed as a hit being registered as a miss.

Depending on the purpose of the test, testers decide which type of error to be more concerned about, but a Type I error is usually considered more serious than a Type II error.
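The significance level can be checked empirically. A minimal sketch (the population parameters and seed are illustrative assumptions): repeatedly sample from a population whose true mean equals the hypothesized mean, so every rejection is by definition a false positive; the rejection rate should come out close to `alpha`.

```
import numpy as np
from scipy.stats import ttest_1samp

# Hypothetical simulation of the Type I error rate: the true mean (65)
# equals the hypothesized mean, so any rejection is a false positive.
np.random.seed(42)
alpha = 0.05
n_trials = 1000
false_positives = 0

for _ in range(n_trials):
    sample = np.random.normal(loc=65, scale=3.5, size=30)
    _, p_val = ttest_1samp(sample, 65)
    if p_val <= alpha:
        false_positives += 1

type_i_rate = false_positives / n_trials
print(type_i_rate)  # should land near alpha (0.05)
```

Running more trials tightens the estimate around the chosen significance level.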

**Sample Vs. Population Mean**

In statistics, we often use the mean of a sample to estimate or infer the mean of the broader population from which the *sample* was taken. In other words, the *sample mean* is an estimation of the *population mean*.

**Central Limit Theorem**

The *central limit theorem* states that as samples of larger size are collected from a population, the distribution of sample means approaches a normal distribution with the same mean as the population. No matter the distribution of the population (uniform, binomial, etc.), the sampling distribution of the mean will approximate a normal distribution, and its mean equals the population mean.

The central limit theorem allows us to perform tests, make inferences, and solve problems using the normal distribution, even when the population is not normally distributed.
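The theorem can be illustrated with a quick simulation (the uniform population, sample size, and seed below are illustrative assumptions): draw many samples from a decidedly non-normal population and look at the distribution of their means.

```
import numpy as np

# Hypothetical demonstration of the central limit theorem:
# a uniform population is clearly not normal...
np.random.seed(0)
population = np.random.uniform(low=0, high=100, size=100000)

# ...but the means of many size-50 samples cluster tightly
# around the population mean.
sample_means = [np.mean(np.random.choice(population, size=50))
                for _ in range(2000)]

print(np.mean(population), np.mean(sample_means))
```

Plotting `sample_means` as a histogram would show the familiar bell shape even though the population itself is flat.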

**Hypothesis Test P-value**

Statistical *hypothesis tests* return a *p-value*: the probability of observing results at least as extreme as those measured, assuming the *null hypothesis* is true. If the p-value is less than or equal to the *significance level*, the null hypothesis is rejected in favor of the alternative hypothesis; if the p-value is greater than the significance level, the null hypothesis is not rejected.

**Univariate T-test**

A *univariate T-test* (or 1 Sample T-test) is a type of hypothesis test that compares a sample mean to a hypothetical population mean and determines the probability that the sample came from a distribution with the desired mean.

This can be performed in Python using the `ttest_1samp()` function of the `SciPy` library. The code block shows how to call `ttest_1samp()`. It requires two inputs, a sample distribution of values and an expected mean, and returns two outputs, the t-statistic and the p-value.

```
from scipy.stats import ttest_1samp, ttest_ind, f_oneway

# 1-sample t-test: compare a sample mean to an expected mean
t_stat, p_val = ttest_1samp(example_distribution, expected_mean)

# 2-sample t-test: compare the means of two independent samples
t_stat, p_val = ttest_ind(sample_a, sample_b)

# ANOVA: compare the means of three or more samples
f_stat, p_val = f_oneway(a, b, c)
```
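A worked example may help (the population parameters and seed are illustrative assumptions): one sample genuinely drawn from a mean-65 population and one drawn from a clearly shifted population, so the tests should flag the shifted one.

```
import numpy as np
from scipy.stats import ttest_1samp, ttest_ind

# Hypothetical samples: one matches the hypothesized mean, one does not.
np.random.seed(1)
matching = np.random.normal(loc=65, scale=3.5, size=30)
shifted = np.random.normal(loc=75, scale=3.5, size=30)

# 1-sample test: could `shifted` have come from a mean-65 population?
t_stat, p_val = ttest_1samp(shifted, 65)

# 2-sample test: do the two samples share a mean?
t_stat_2, p_val_2 = ttest_ind(matching, shifted)

print(p_val, p_val_2)  # both far below 0.05: reject the null hypothesis
```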

**Tukey’s Range Hypothesis Tests**

A *Tukey’s Range* hypothesis test compares every pair of groups in a dataset and checks whether the difference between each pair of group means is statistically significant.

The Tukey’s Range test can be performed in Python using the `StatsModels` library function `pairwise_tukeyhsd()`. The example code block shows how to call `pairwise_tukeyhsd()`. It accepts a list of data, a list of labels, and the desired significance level.

```
from statsmodels.stats.multicomp import pairwise_tukeyhsd
tukey_results = pairwise_tukeyhsd(data, labels, alpha=significance_level)
```
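A runnable sketch of the call above (the three groups, their parameters, and the seed are illustrative assumptions): groups `a` and `b` are similar, while `c` is clearly shifted, so the pairs involving `c` should be flagged.

```
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical scores for three groups; group "c" is clearly shifted.
np.random.seed(2)
a = np.random.normal(loc=60, scale=5, size=40)
b = np.random.normal(loc=61, scale=5, size=40)
c = np.random.normal(loc=75, scale=5, size=40)

data = np.concatenate([a, b, c])
labels = ["a"] * 40 + ["b"] * 40 + ["c"] * 40

tukey_results = pairwise_tukeyhsd(data, labels, alpha=0.05)
print(tukey_results)  # summary table with a reject column per pair
```

The result's `reject` attribute holds one boolean per pair, in the order (a, b), (a, c), (b, c).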

# P-Values

```
def reject_null_hypothesis(p_value):
    """
    Returns whether the null hypothesis can be rejected.
    Takes a p-value as its input and assumes p <= 0.05 is significant.
    """
    return p_value <= 0.05

# hypothesis_tests = [....] some array of p-values
for p_value in hypothesis_tests:
    reject_null_hypothesis(p_value)
```

# Binomial Test

```
from scipy.stats import binom_test  # renamed binomtest in newer SciPy versions
binom_test(x, n, p)
```

where:

- $x$ is the number of “successes” (`0.051 * 10000` in this case)
- $n$ is the number of samples (`10000` in this case)
- $p$ is the expected probability of success (`0.06` in this case)
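Plugging in those numbers gives a runnable example. Note this uses `binomtest` (the replacement for the older `binom_test`, which has been removed from recent SciPy releases); the specific counts come from the case above.

```
from scipy.stats import binomtest

# 510 observed successes out of 10000 trials, against an
# expected success rate of 6%.
result = binomtest(510, n=10000, p=0.06)
print(result.pvalue)  # well below 0.05: 5.1% is significantly different from 6%
```

`binomtest` returns a result object; the p-value is on its `pvalue` attribute rather than being returned directly.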

# Chi Square Test

```
from scipy.stats import chi2_contingency
# Contingency table
#          | harvester | leaf cutter
# ---------+-----------+------------
#   1st gr |        30 |          10
#   2nd gr |        35 |           5
#   3rd gr |        28 |          12
#   4th gr |        20 |          20
X = [[30, 10],
     [35, 5],
     [28, 12],
     [20, 20]]
chi2, pval, dof, expected = chi2_contingency(X)
print(pval)
```