The core idea
Correlation tells you whether two variables move together. Regression quantifies how much, and in which direction, Y changes per unit of X. The regression coefficient is the single most useful number in business analytics: "for every one-unit change in X, Y changes by b units, holding the other predictors in the model constant." With that, you can predict, compare, and, if you controlled for the right variables, start to argue about causation.
After Galton & Fisher.
The hero diagram
Scatter, line and residual.
The line is what you fit. The residuals are what you missed.
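To make the picture concrete, here is a minimal sketch of fitting the line and reading off the residuals. The data points and the variable names (`x`, `y`, `a`, `b`) are invented for illustration and do not come from the course material; NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical data, made up for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares fit of y = a + b*x; polyfit returns the slope first.
b, a = np.polyfit(x, y, deg=1)

# The line is what you fit.
fitted = a + b * x

# The residuals are what you missed: observed minus fitted.
residuals = y - fitted

print(f"slope b = {b:.2f}, intercept a = {a:.2f}")
print("residuals:", np.round(residuals, 2))
```

With these made-up points the slope comes out close to 2, so each extra unit of X goes with roughly two extra units of Y. A least-squares line with an intercept always leaves residuals that sum to zero, so what matters is their pattern rather than their total.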
The tools on the bench
Ideas that pay rent.
How to apply
Building a usable regression.
- Plot before you fit. Scatter plots reveal non-linearity, outliers, and clustering that equations hide.
- Start simple. One predictor, then add. Stop adding when Adjusted R² stops rising.
- Check the signs. If a coefficient's sign contradicts business sense, look for a confounder.
- Look at residuals. Patterns in residuals mean the model is missing something important.
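The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not a prescribed workflow: the data are simulated, the variable names (`price`, `promo`, `sales`) and the helper functions `fit_ols` and `adjusted_r2` are hypothetical, and the true coefficients are chosen by hand so the signs can be checked.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y on X plus an intercept; returns (beta, residuals)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta, y - design @ beta

def adjusted_r2(y, residuals, n_predictors):
    """Adjusted R-squared: R-squared penalised for the number of predictors."""
    n = len(y)
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

# Simulated data (assumption): sales fall with price and rise with promotion.
rng = np.random.default_rng(0)
n = 200
price = rng.normal(size=n)
promo = rng.normal(size=n)
sales = 5.0 - 2.0 * price + 1.0 * promo + rng.normal(scale=0.5, size=n)

# Start simple: one predictor.
beta1, res1 = fit_ols(price.reshape(-1, 1), sales)
adj1 = adjusted_r2(sales, res1, n_predictors=1)

# Then add a second predictor and compare Adjusted R².
beta2, res2 = fit_ols(np.column_stack([price, promo]), sales)
adj2 = adjusted_r2(sales, res2, n_predictors=2)

# Check the signs: price should be negative, promo positive.
print("coefficients (intercept, price, promo):", np.round(beta2, 2))
print(f"Adjusted R²: {adj1:.3f} -> {adj2:.3f}")

# Look at residuals: in the one-predictor model they still track the
# omitted variable, which signals that the model is missing something.
corr_missing = np.corrcoef(res1, promo)[0, 1]
corr_after = np.corrcoef(res2, promo)[0, 1]
print(f"corr(residuals, promo): {corr_missing:.2f} -> {corr_after:.2f}")
```

In practice the "plot before you fit" step would come first, with a scatter plot of each predictor against the outcome; it is left out here to keep the sketch free of plotting dependencies.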
Key reading · Session 3 · Christodoulou
Confounding and omitted-variable bias.
If you regress exam score on study time alone, you might see a negative relationship, because students with higher IQ may study less and still score higher. Add IQ as a control, and the real (positive) relationship between study time and score appears. This is why multiple regression, with the right controls, is essential for any causal claim from observational data.
Correlation without controls is a ghost story.
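The exam-score example can be reproduced with a small simulation. Everything here is an assumption made for illustration: the effect sizes, the noise levels, and the variable names (`iq`, `study`, `score`) are invented, with `iq` treated as a standardised score. By construction, one extra unit of study raises the score by 2 points, while higher-IQ students study less.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Assumed data-generating process (hypothetical numbers):
iq = rng.normal(size=n)                                # standardised IQ
study = -0.8 * iq + rng.normal(scale=0.6, size=n)      # higher IQ, less study
score = 2.0 * study + 5.0 * iq + rng.normal(size=n)    # true study effect = +2

# Regression of score on study time alone (IQ omitted).
naive_slope = np.polyfit(study, score, deg=1)[0]

# Multiple regression: score on study time and IQ, with an intercept.
design = np.column_stack([np.ones(n), study, iq])
beta, *_ = np.linalg.lstsq(design, score, rcond=None)
controlled_slope = beta[1]

print(f"study coefficient without IQ: {naive_slope:.2f}")
print(f"study coefficient with IQ:    {controlled_slope:.2f}")
```

Without the control, the study coefficient comes out negative because it absorbs the effect of IQ. With IQ in the model, the coefficient lands near the true value of +2. The control only helps because IQ is measured and included; a confounder that is missing from the data would leave the bias in place.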