Estimates a continuous output by fitting a hyperplane through the data that minimizes the sum of squared residuals (equivalently, the mean squared error, MSE). Each feature receives a coefficient β that quantifies its estimated linear contribution to the prediction, holding the other features fixed.
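A minimal sketch of the least-squares fit, assuming NumPy is available; `fit_ols` is a hypothetical helper name and the data are synthetic with known coefficients. `np.linalg.lstsq` minimizes the sum of squared residuals directly, and the intercept is handled by prepending a column of ones.

```python
import numpy as np

def fit_ols(X, y):
    """Hypothetical helper: least-squares fit of y ~ b0 + X @ beta."""
    # Prepend a column of ones so the intercept b0 is estimated with the slopes.
    X1 = np.column_stack([np.ones(len(X)), X])
    # lstsq returns the coefficients minimizing ||y - X1 @ coef||^2.
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef[0], coef[1:]

# Synthetic data with known coefficients: b0 = 1.5, beta = [2.0, -3.0].
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.5 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

b0, beta = fit_ols(X, y)
print(b0, beta)  # estimates close to 1.5 and [2.0, -3.0]
```

Dividing the sum of squares by n gives the MSE; the minimizer is the same, which is why the two are used interchangeably as the training objective.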
When to use
Target is continuous (yield, price, log-survival). Relationship between features and target is roughly linear. If you need confidence intervals or p-values, check that residuals are approximately normal (Q-Q plot). Add regularization (Ridge/Lasso) when there are many features or they are strongly correlated.
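A numeric version of the Q-Q check, as a sketch assuming NumPy plus the standard-library `statistics.NormalDist`; `residuals_of`, `qq_points` and `qq_correlation` are hypothetical helpers. In practice the points would usually be plotted; here the correlation between sorted residuals and normal quantiles summarizes how straight the plot is. Gaussian noise should give a correlation near 1, skewed noise a noticeably lower one.

```python
import numpy as np
from statistics import NormalDist

def residuals_of(X, y):
    """Hypothetical helper: residuals of an ordinary least-squares fit with intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return y - X1 @ coef

def qq_points(residuals):
    """Return (theoretical normal quantiles, sorted standardized residuals)."""
    r = np.sort((residuals - residuals.mean()) / residuals.std(ddof=1))
    n = len(r)
    # Plotting positions (i - 0.5) / n mapped through the standard normal inverse CDF.
    probs = (np.arange(1, n + 1) - 0.5) / n
    theo = np.array([NormalDist().inv_cdf(p) for p in probs])
    return theo, r

def qq_correlation(residuals):
    # Correlation near 1 means the Q-Q points lie close to a straight line.
    theo, sample = qq_points(residuals)
    return np.corrcoef(theo, sample)[0, 1]

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 1))
y_normal = 4.0 + 0.5 * X[:, 0] + rng.normal(size=300)        # Gaussian noise
y_skewed = 4.0 + 0.5 * X[:, 0] + rng.exponential(size=300)   # right-skewed noise

qq_corr = qq_correlation(residuals_of(X, y_normal))
qq_corr_skew = qq_correlation(residuals_of(X, y_skewed))
print(f"normal noise: {qq_corr:.3f}, skewed noise: {qq_corr_skew:.3f}")
```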
Key strength
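Fully interpretable: βi means "y changes by βi for every one-unit increase in xi, with the other features held fixed". Among the fastest models to train (it has a closed-form solution). A strong first baseline. Limitation: assumes linearity, so it underfits non-linear relationships unless the features are transformed, and the squared-error loss makes it sensitive to outliers.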
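A small sketch of both points, assuming NumPy and synthetic data; `fit_ols` and `predict` are hypothetical helpers. The first part checks the coefficient reading (a one-unit change in x1 with x2 fixed moves the prediction by exactly β1); the second appends a single high-leverage point with a badly wrong target and compares the fitted slope before and after.

```python
import numpy as np

def fit_ols(X, y):
    """Hypothetical helper: returns [b0, beta_1, ..., beta_p]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = 1.0 + 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.2, size=100)
coef = fit_ols(X, y)

# Interpretation: raise x1 by one unit with x2 held fixed; the prediction moves by beta_1.
x = np.array([[0.2, -1.0]])
x_plus = x + np.array([[1.0, 0.0]])
delta = predict(coef, x_plus)[0] - predict(coef, x)[0]
print(np.isclose(delta, coef[1]))  # True

# Outlier sensitivity: append one far-off point at x1 = 4 whose target is badly wrong.
X_out = np.vstack([X, [[4.0, 0.0]]])
y_out = np.append(y, -20.0)
coef_out = fit_ols(X_out, y_out)
print("slope on x1, clean vs. with outlier:", coef[1], coef_out[1])
```

Because the loss squares each residual, that one point's large error dominates the fit and drags the slope on x1 well below its clean value.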
Regularization
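Ridge (L2) adds λ‖β‖₂² to the loss, shrinking all coefficients toward zero without eliminating any; this stabilizes the estimates under multicollinearity. Lasso (L1) adds λ‖β‖₁ and can set coefficients exactly to zero, performing automatic feature selection. In both, λ ≥ 0 sets the penalty strength (λ = 0 recovers ordinary least squares), and the intercept is normally left unpenalized.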
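A sketch of both penalties on centered data, assuming NumPy; `ridge`, `lasso` and `soft_threshold` are hypothetical helpers, not a library API. Ridge uses its closed form; Lasso has no closed form, so this uses cyclic coordinate descent with soft-thresholding. The objectives follow the scaling scikit-learn documents for its `Ridge` and `Lasso` estimators (squared error summed for Ridge, divided by 2n for Lasso), so λ values are not comparable between the two. Centering lets the intercept stay unpenalized.

```python
import numpy as np

def center(X, y):
    # After centering, the intercept is recovered as y.mean() - X.mean(axis=0) @ beta.
    return X - X.mean(axis=0), y - y.mean()

def ridge(X, y, lam):
    """Minimize ||y - X b||^2 + lam * ||b||_2^2 via the closed form."""
    Xc, yc = center(X, y)
    p = X.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

def soft_threshold(z, t):
    # Shrink z toward zero by t; values with |z| <= t become exactly zero.
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso(X, y, lam, n_iter=1000):
    """Minimize (1/(2n)) ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    Xc, yc = center(X, y)
    n, p = Xc.shape
    beta = np.zeros(p)
    col_sq = (Xc ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: current residual with feature j's contribution added back.
            r_j = yc - Xc @ beta + Xc[:, j] * beta[j]
            rho = Xc[:, j] @ r_j / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)                   # unrelated to the target
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 + rng.normal(scale=0.5, size=n)

beta_ols = ridge(X, y, 0.0)     # lam = 0 is plain least squares
beta_ridge = ridge(X, y, 10.0)
beta_lasso = lasso(X, y, 0.2)
print("OLS  ", np.round(beta_ols, 2))
print("Ridge", np.round(beta_ridge, 2))
print("Lasso", np.round(beta_lasso, 2))
```

With the nearly duplicated x1 and x2, OLS can split the true effect between them erratically; Ridge pulls the pair toward similar values with a smaller overall norm, while Lasso drives the coefficient of the unrelated x3 to exactly zero.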