Ebrahim Alhamed · Frameworks Library

m.02 · I · Quantifying Uncertainty · Business Experimentation

A/B Testing & Hypothesis Logic

The only way to prove causation is to randomise.

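Correlation alone cannot establish causation. A randomised experiment can. By assigning users to treatment and control by coin-flip, you make the two groups identical on average, on measured and unmeasured traits alike, so any difference in outcome beyond chance variation can be attributed to the treatment. John Snow came close in his study of the 1854 cholera epidemic: a natural experiment in which neighbouring households were split between two water suppliers almost as if at random. R. A. Fisher later made deliberate randomisation the formal basis of experimental design. That logic is the backbone of modern A/B testing. (After Snow, Fisher, and every growth team since.)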
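A minimal simulation of why the coin-flip matters. Everything in it is an assumption made for illustration: a hidden "engaged" trait, the conversion rates, and a treatment that has no true effect at all. When users choose the treatment themselves, the hidden trait produces a gap; when a coin chooses, the gap shrinks to noise.

```python
import random

def simulate(assign, n=20000, seed=0):
    """Return treated rate minus control rate when the treatment has NO true effect."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        engaged = rng.random() < 0.5  # hidden confounder (illustrative)
        # The outcome depends only on the hidden trait, never on the treatment.
        converts = rng.random() < (0.30 if engaged else 0.10)
        (treated if assign(engaged, rng) else control).append(converts)
    return sum(treated) / len(treated) - sum(control) / len(control)

def self_selected(engaged, rng):
    # Engaged users are far more likely to opt in to the new feature.
    return rng.random() < (0.8 if engaged else 0.2)

def coin_flip(engaged, rng):
    # Assignment ignores every user trait.
    return rng.random() < 0.5

confounded_gap = simulate(self_selected)
randomised_gap = simulate(coin_flip)
print(f"self-selected gap: {confounded_gap:+.3f}")
print(f"randomised gap:    {randomised_gap:+.3f}")
```

The self-selected comparison reports a sizeable "effect" that does not exist. The randomised one stays close to zero, which is the correct answer here.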

Decision matrix.

Two axes: is the null actually true, and did you reject it?

2×2 matrix (rows: truth · columns: your decision)

                 fail to reject H₀        reject H₀
  H₀ true        Correct (specificity)    Type I (false +)
  H₀ false       Type II (false −)        Correct (power)
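The four quadrants can be estimated by simulation. The sketch below assumes normally distributed outcomes, 50 users per arm, a true effect of half a standard deviation when H₀ is false, and the rough |t| > 2 rejection rule. All of these numbers are illustrative choices, not recommendations.

```python
import random
import statistics

def t_stat(a, b):
    """Difference in means divided by its (unpooled) standard error."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / se

def rejection_rate(true_effect, n=50, runs=2000, seed=1):
    """Share of simulated experiments in which |t| > 2, so H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(runs):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        treated = [rng.gauss(true_effect, 1.0) for _ in range(n)]
        if abs(t_stat(treated, control)) > 2:
            rejections += 1
    return rejections / runs

type_i_rate = rejection_rate(true_effect=0.0)  # H0 true, rejected anyway
power = rejection_rate(true_effect=0.5)        # H0 false, correctly rejected
type_ii_rate = 1 - power                       # H0 false, missed

print(f"H0 true:  correct {1 - type_i_rate:.3f} | Type I  {type_i_rate:.3f}")
print(f"H0 false: Type II {type_ii_rate:.3f} | correct {power:.3f}")
```

Each printed row corresponds to one row of the matrix. The top row is governed by the threshold alone; the bottom row also depends on sample size and on how large the real effect is.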

Ideas that pay rent.

Hypothesis Test · Statistical inference
H₀ (default / null) · t-statistic = (observed − expected) / SE · p-value · decision threshold
If |t| > 2 (roughly, for a two-sided test at the 5% level with a large sample), reject H₀. Otherwise, fail to reject it.
A/B Testing (RCT) · Causal inference standard
random assignment · treatment vs control · difference in means estimates the average causal effect
Randomisation is the only antidote that also works against confounders you never measured.
Type I vs Type II Error · Decision theory
Type I: false positive (α) · Type II: false negative (β)
At a fixed sample size, every threshold trades one off against the other. Pick deliberately.
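A worked instance of the test statistic from the Hypothesis Test card, applied to a click-through comparison. The click counts are made up, the standard error is the usual unpooled one for two proportions, and the p-value uses a normal approximation. Treat it as a sketch under those assumptions.

```python
from statistics import NormalDist

def two_proportion_test(clicks_a, n_a, clicks_b, n_b):
    """Return (t, two-sided p-value) for H0: both arms share one click rate."""
    p_a = clicks_a / n_a
    p_b = clicks_b / n_b
    observed = p_b - p_a  # observed difference in click-through rate
    expected = 0.0        # difference expected if H0 is true
    se = (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b) ** 0.5
    t = (observed - expected) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p_value

# Hypothetical data: control 1,000 clicks, treatment 1,100 clicks, 10,000 users each.
t, p_value = two_proportion_test(1000, 10_000, 1100, 10_000)
print(f"t = {t:.2f}, p = {p_value:.3f}")

# The same evidence leads to different decisions at different thresholds.
for threshold in (1.64, 1.96, 2.58):
    alpha = 2 * (1 - NormalDist().cdf(threshold))  # Type I rate implied by the threshold
    decision = "reject H0" if abs(t) > threshold else "fail to reject H0"
    print(f"|t| > {threshold}: alpha = {alpha:.3f} -> {decision}")
```

A stricter threshold lowers the Type I rate and, with the sample size held fixed, raises the chance of missing a real effect.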

Running an experiment you will actually trust.

  1. State H₀ in one sentence. "The new button does not change click-through rate."
  2. Commit to sample size before starting. Peeking at results early and stopping as soon as they look significant inflates the false-positive rate.
  3. Randomise the assignment. Not by date. Not by region. By coin-flip.
  4. Report effect size alongside p-value. Statistical significance is not business significance.
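Steps 2 to 4 above can be sketched in a few lines. The baseline and target click rates, the 5% significance level and 80% power, the experiment name, and the hash-based stand-in for a coin-flip are all assumptions chosen for illustration. The sample-size formula is the standard normal approximation for comparing two proportions.

```python
import hashlib
import math
from statistics import NormalDist

def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Step 2: users needed in EACH arm to detect p_control vs p_treatment."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_treatment - p_control) ** 2)

def assign_variant(user_id, experiment):
    """Step 3: a repeatable coin-flip. Hashing keeps each user in one arm across visits."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 else "control"

def effect_with_ci(clicks_c, n_c, clicks_t, n_t, level=0.95):
    """Step 4: effect size (difference in rates) with a confidence interval."""
    p_c, p_t = clicks_c / n_c, clicks_t / n_t
    diff = p_t - p_c
    se = (p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t) ** 0.5
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return diff, diff - z * se, diff + z * se

# Hypothetical plan: detect a lift from 10% to 11% click-through.
n_needed = sample_size_per_arm(0.10, 0.11)
print(f"users per arm: {n_needed}")
print(assign_variant("user-42", "new-button"))

diff, low, high = effect_with_ci(1000, 10_000, 1100, 10_000)
print(f"lift: {diff:.3f} (95% CI {low:.4f} to {high:.4f})")
```

The interval is what a decision-maker needs: it shows how small the lift might plausibly be, which a bare p-value does not.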

Key reading · Uber Engineering + Snow (1854)

The power of randomised experiments.

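Snow's comparison of cholera deaths among households served by two competing London water companies, made alongside his better-known Broad Street pump investigation, is an early and famous use of quasi-random assignment to establish causation in public health. Every modern A/B testing stack is a digital re-run of the same logic.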

Randomise or you will never know.
