AB Test

We have a website with a white background. The mean amount of time people spend on it is μ = 20 minutes.
We change the background to yellow and want to see whether it increases the time users spend on our site.

Null hypothesis H0
The change had no effect; there is no news here.

That would mean the mean is still μ = 20 minutes even after the change.

Alternative hypothesis Ha
μ > 20
People actually spend more time on the yellow site.

We set a threshold significance level α = 0.05

Take a sample of people visiting the yellow website and calculate its statistics, such as the sample mean and sample standard deviation.

If the null hypothesis is true, what is the probability of getting a sample with statistics at least as extreme as the ones we got? That probability is the p-value.

If that probability is lower than our significance level (less than 0.05, i.e. 5%), then we reject the null hypothesis and say we have evidence for the alternative.
However, if the probability is at the significance level or higher, then we can't reject the null hypothesis and we don't have evidence for the alternative.

Step 3: take a sample of 100 visitors to the yellow page and calculate the sample mean and sample standard deviation (X_bar = 25, STD), which are then used to calculate the p-value
Step 4: p-value: p(sample mean X_bar >= 25  |  H0 is true) 

Step 5: if p-value < α, we reject the null hypothesis
        if p-value >= α, we do not reject the null hypothesis
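
Steps 3 to 5 can be sketched in Python as below. The notes do not give a value for the sample standard deviation, so s = 20 minutes is purely an assumed placeholder. The sketch also assumes a normal (z) approximation for the sample mean, which is reasonable for n = 100 but is not stated in the notes.

```python
import math

mu0 = 20.0     # mean time under H0 (minutes)
x_bar = 25.0   # sample mean from the yellow page
n = 100        # sample size
s = 20.0       # ASSUMED sample standard deviation (not given in the notes)
alpha = 0.05   # significance level

# Standard error of the sample mean, then the z statistic
se = s / math.sqrt(n)
z = (x_bar - mu0) / se

# One-sided p-value: P(sample mean >= x_bar | H0 is true),
# i.e. the upper tail of the standard normal distribution
p_value = 0.5 * math.erfc(z / math.sqrt(2))

reject_h0 = p_value < alpha
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

With these assumed numbers z = 2.5 and the p-value is well below 0.05, so H0 would be rejected. A larger assumed s would shrink z and could change the decision.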

A neurologist is testing the effect of a drug on rat response time by injecting 100 rats with a unit dose of the drug.
The neurologist knows that
the mean response time of rats not injected with the drug is 1.2 seconds,
the mean response time of the 100 injected rats is 1.05 seconds, with a sample standard deviation of 0.5 seconds. Do you think that the drug has an effect on response time?

H0: drug has no effect: μ is 1.2 even with drug
Ha: drug has an effect: μ is not 1.2 when drug is given

Should we accept the alternative hypothesis or stick with the null hypothesis?

Let's assume the null hypothesis is true: what is the probability that we would have got a sample mean at least this far from 1.2?
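
That probability can be worked out with a short sketch like the one below. It assumes a normal (z) approximation for the sample mean, which seems reasonable for 100 rats, and a two-sided test because Ha only says μ is not 1.2.

```python
import math

mu0 = 1.2      # mean response time without the drug (seconds)
x_bar = 1.05   # sample mean of the 100 injected rats
s = 0.5        # sample standard deviation
n = 100        # number of rats
alpha = 0.05   # ASSUMED significance level (the example does not state one)

# Standard error of the sample mean, then the z statistic
se = s / math.sqrt(n)
z = (x_bar - mu0) / se

# Two-sided p-value: P(|Z| >= |z| | H0 is true)
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```

The sample mean is 3 standard errors below 1.2, which gives a p-value of about 0.003, so under this sketch we would reject H0 and conclude the drug has an effect.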

Chi-square goodness-of-fit tests (Khan Academy: very good!)

χ² = Σ (observed - expected)² / expected

degrees of freedom = number of options (categories) - 1

Then look up the chi-square table for that number of degrees of freedom to find the probability of getting a χ² at least as large as this one. That becomes your p-value.
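
The formula and table lookup above can be sketched in Python as follows. The observed and expected counts are made-up illustration data, and chi2_statistic and chi2_sf are hypothetical helper names. chi2_sf stands in for the table by computing the upper-tail probability directly, assuming an integer number of degrees of freedom.

```python
import math

def chi2_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square with integer df >= 1.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1)
    with y = x / 2, starting from Q(1, y) = exp(-y) for even df
    or Q(1/2, y) = erfc(sqrt(y)) for odd df.
    """
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Hypothetical data: 100 observations over 4 options, expected to be uniform
observed = [18, 22, 20, 40]
expected = [25, 25, 25, 25]

stat = chi2_statistic(observed, expected)
df = len(observed) - 1
p_value = chi2_sf(stat, df)

print(f"chi-square = {stat:.2f}, df = {df}, p-value = {p_value:.4f}")
```

As a sanity check against a printed table, chi2_sf(7.815, 3) should come out very close to 0.05, since 7.815 is the usual 5% critical value for 3 degrees of freedom.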

McNemar’s test operates on a 2x2 contingency table built from two classifiers evaluated on the same dataset, so you get two results per sample (one from each classifier).
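
A minimal sketch of the idea, using made-up correctness results for two hypothetical classifiers A and B. It uses the common continuity-corrected statistic (|b - c| - 1)² / (b + c), where b and c count the samples on which exactly one classifier is right, and compares it to a chi-square distribution with 1 degree of freedom. The function name mcnemar_test is an assumption for illustration, not a library API.

```python
import math

def mcnemar_test(correct_a, correct_b):
    """Continuity-corrected McNemar statistic and p-value.

    correct_a[i] / correct_b[i] say whether classifier A / B
    got sample i right. Only the disagreements matter.
    """
    b = sum(1 for a, bb in zip(correct_a, correct_b) if a and not bb)  # A right, B wrong
    c = sum(1 for a, bb in zip(correct_a, correct_b) if not a and bb)  # A wrong, B right
    if b + c == 0:
        return 0.0, 1.0  # classifiers never disagree: no evidence of a difference
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical results on 100 samples:
# 70 both right, 15 only A right, 5 only B right, 10 both wrong
correct_a = [True] * 70 + [True] * 15 + [False] * 5 + [False] * 10
correct_b = [True] * 70 + [False] * 15 + [True] * 5 + [False] * 10

stat, p_value = mcnemar_test(correct_a, correct_b)
print(f"McNemar statistic = {stat:.2f}, p-value = {p_value:.4f}")
```

The samples where both classifiers agree (both right or both wrong) do not enter the statistic at all; the test only asks whether the two kinds of disagreement are balanced.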

Approximate statistical tests for comparing supervised classification learning algorithms

AB test confidence online
