Supervised Learning

Linear Regression

Supervised Regression
Theory

What it does

Estimates a continuous output by fitting a hyperplane to the data that minimizes the mean squared error (MSE), or equivalently the sum of squared residuals. Each feature receives a coefficient β that quantifies its linear contribution to the prediction.
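
A minimal sketch of such a fit, assuming NumPy is available. The toy dataset and the variable names (X_design, beta) are made up for illustration and are not part of this page's tooling.

```python
import numpy as np

# Toy data (hypothetical): y = 1 + 2*x1 - 3*x2 exactly, with no noise.
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Prepend a column of ones so the intercept is estimated as beta[0].
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares solution: minimizes the sum of squared residuals.
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

y_hat = X_design @ beta
residuals = y - y_hat
print("coefficients:", np.round(beta, 6))
print("sum of squared residuals:", float(residuals @ residuals))
```

Because the toy target is an exact linear function of the features, the recovered coefficients should match the generating values (1, 2, -3) up to floating-point error.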

When to use

Target is continuous (yield, price, log-survival). Relationship between features and target is roughly linear. Check that the residuals are approximately normally distributed (Q-Q plot). Add regularization (Ridge/Lasso) when many features are correlated.
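
The residual check can be sketched numerically without a plotting library. The example below assumes NumPy and Python's standard statistics module, uses synthetic data, and summarizes the Q-Q plot by the correlation between sorted residuals and standard normal quantiles. That summary statistic is an illustrative shortcut, not a formal normality test.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)  # fixed seed, illustrative data only

# Hypothetical data: linear signal plus Gaussian noise.
x = np.linspace(0.0, 10.0, 200)
y = 3.0 + 0.5 * x + rng.normal(scale=1.0, size=x.size)

# Fit a straight line and compute the residuals.
X_design = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
residuals = y - X_design @ beta

# Q-Q coordinates: sorted standardized residuals vs. standard normal quantiles.
z = np.sort((residuals - residuals.mean()) / residuals.std(ddof=1))
probs = (np.arange(1, z.size + 1) - 0.5) / z.size
theoretical = np.array([NormalDist().inv_cdf(float(p)) for p in probs])

# For roughly normal residuals the points lie near a straight line,
# so their correlation is close to 1.
qq_corr = np.corrcoef(theoretical, z)[0, 1]
print("Q-Q correlation:", round(float(qq_corr), 4))
```

Plotting theoretical against z gives the usual Q-Q plot; strong curvature or heavy tails there suggests the normality assumption is doubtful.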

Key strength

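Fully interpretable: βi means "y changes by βi for every unit increase in xi, with the other features held fixed". Among the fastest models to train, and a good first baseline. Limitation: assumes linearity, so it underfits non-linear relationships, and it is sensitive to outliers because squared error weights large residuals heavily.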
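
Both points can be illustrated with a short sketch, assuming NumPy. The coefficient values, the helper predict, and the toy data are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

# Part 1: coefficient interpretation with hypothetical fitted values.
beta0 = 4.0
beta = np.array([1.5, -2.0])

def predict(x):
    """Linear prediction for a single feature vector x."""
    return beta0 + float(np.dot(beta, x))

x = np.array([3.0, 7.0])
x_plus = x + np.array([1.0, 0.0])  # raise feature 1 by one unit, keep feature 2 fixed
delta = predict(x_plus) - predict(x)
print("change in prediction:", delta)  # equals beta[0]

# Part 2: outlier sensitivity on a perfectly linear toy dataset.
x_line = np.arange(10.0)
y_clean = 1.0 + 2.0 * x_line
y_outlier = y_clean.copy()
y_outlier[-1] += 50.0  # corrupt a single point

# np.polyfit with degree 1 returns [slope, intercept].
slope_clean, intercept_clean = np.polyfit(x_line, y_clean, 1)
slope_outlier, intercept_outlier = np.polyfit(x_line, y_outlier, 1)
print("slope without outlier:", round(float(slope_clean), 3))
print("slope with outlier:", round(float(slope_outlier), 3))
```

In the first part the prediction moves by exactly the first coefficient (1.5). In the second, one corrupted point out of ten pulls the fitted slope well above the true value of 2.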

Regularization

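Ridge (L2) shrinks all coefficients toward zero, which stabilizes the estimates under multicollinearity. Lasso (L1) can set some coefficients exactly to zero, which acts as automatic feature selection. Both add a penalty scaled by λ to the loss: λ·‖β‖₂² (sum of squared coefficients) for Ridge and λ·‖β‖₁ (sum of absolute coefficients) for Lasso.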
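
A rough sketch of both penalties, assuming NumPy and synthetic data with two nearly identical features plus one irrelevant feature. The functions ridge, lasso and soft_threshold are illustrative helpers written for this example, not a library API. The Lasso solver is plain coordinate descent with a fixed iteration count, and the λ values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; the data below is synthetic

n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)   # nearly a copy of x1 (multicollinearity)
x3 = rng.normal(size=n)               # irrelevant feature
X = np.column_stack([x1, x2, x3])
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)

# Center the data so the (unpenalized) intercept can be left out of the sketch.
X = X - X.mean(axis=0)
y = y - y.mean()

def ridge(X, y, lam):
    """Minimize (1/n)*||y - X b||^2 + lam*||b||_2^2 in closed form."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ y)

def soft_threshold(z, t):
    """Shrink z toward zero by t, returning zero when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso(X, y, lam, n_iter=500):
    """Minimize (1/n)*||y - X b||^2 + lam*||b||_1 by coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ b + X[:, j] * b[j]  # residual with feature j left out
            b[j] = soft_threshold(X[:, j] @ r_j, n * lam / 2.0) / (X[:, j] @ X[:, j])
    return b

b_ols = ridge(X, y, 0.0)     # lam = 0 reduces to ordinary least squares
b_ridge = ridge(X, y, 0.1)
b_lasso = lasso(X, y, 0.5)

print("OLS:  ", np.round(b_ols, 3))
print("Ridge:", np.round(b_ridge, 3))
print("Lasso:", np.round(b_lasso, 3))
```

On data like this, the Ridge coefficients for the two correlated features tend to be similar and smaller in norm than the unregularized ones, while the Lasso sets the coefficient of the irrelevant feature to exactly zero.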

Hypothesis & Loss

ŷ = β₀ + β₁x₁ + β₂x₂ + … + βₚxₚ     (p features)
Loss (MSE) = (1/n) Σᵢ (yᵢ − ŷᵢ)²     (n samples)
β = (XᵀX)⁻¹ Xᵀy  [Normal Eq., minimizes the unregularized loss; X includes a column of ones for β₀]
Ridge: Loss + λ·Σβⱼ²   |   Lasso: Loss + λ·Σ|βⱼ|
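
The normal equation and the MSE loss can be checked with a short sketch, assuming NumPy and synthetic data. The names Xd and mse are illustrative. The linear system is solved directly rather than forming the inverse explicitly, which is usually the more numerically stable choice.

```python
import numpy as np

rng = np.random.default_rng(42)  # synthetic data for illustration

n = 50
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Design matrix with a leading column of ones so beta[0] is the intercept.
Xd = np.column_stack([np.ones(n), X])

# Normal equation: solve (X^T X) beta = X^T y.
beta = np.linalg.solve(Xd.T @ Xd, Xd.T @ y)

def mse(beta, Xd, y):
    """Mean squared error: (1/n) * sum((y_i - y_hat_i)^2)."""
    residuals = y - Xd @ beta
    return float(np.mean(residuals ** 2))

# Cross-check against NumPy's least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(Xd, y, rcond=None)

print("normal equation:", np.round(beta, 3))
print("lstsq:          ", np.round(beta_lstsq, 3))
print("MSE at solution:", round(mse(beta, Xd, y), 4))
```

Both routes should agree to floating-point precision, and perturbing beta in any direction should not lower the MSE, since the normal-equation solution is the minimizer of the unregularized loss.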
Dataset Selection
Synthetic Datasets — 2D
Real Datasets — Multi-feature
Model