
Logistic Regression

Supervised Classification
Theory

What it does

Models the probability that an observation belongs to a class. The log-odds of the positive class are modelled as a linear combination of the features, and the sigmoid function maps this value into (0, 1) to give a probability. With the default threshold of 0.5, the decision boundary lies where the log-odds equal zero.
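Here is a minimal plain-Python sketch of the prediction step. It assumes the coefficients have already been fitted. The function names (`sigmoid`, `predict_proba`, `predict`) and the coefficient values are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    """Map a log-odds value z to a probability in (0, 1)."""
    # Branch so that exp() only ever sees a non-positive argument (no overflow).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(beta0, betas, x):
    """P(y=1 | x) = sigmoid(beta0 + beta1*x1 + ... + betan*xn)."""
    z = beta0 + sum(b * xj for b, xj in zip(betas, x))
    return sigmoid(z)

def predict(beta0, betas, x, threshold=0.5):
    """Predict class 1 when P(y=1 | x) >= threshold, else class 0."""
    return 1 if predict_proba(beta0, betas, x) >= threshold else 0

# Hypothetical fitted coefficients for a two-feature model.
beta0, betas = -1.0, [2.0, -0.5]
print(predict_proba(beta0, betas, [0.5, 0.0]))  # z = 0, so P = 0.5 (on the boundary)
print(predict(beta0, betas, [2.0, 1.0]))        # z = 2.5, P ≈ 0.924, class 1
```

Moving the threshold away from 0.5 trades false positives for false negatives. The fitted coefficients stay the same.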

When to use

Binary or multi-class classification where you need class probabilities rather than just labels (e.g. medical diagnosis, credit scoring). Assumes the log-odds are (roughly) linear in the features, so the classes are separated by a linear decision boundary. Add regularization (L1 / L2) when features are many or correlated.

Key strength

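Outputs probabilities, not just a label, and these are often well calibrated. Coefficients are interpretable as log-odds ratios: exp(βⱼ) means "the odds multiply by exp(βⱼ) for each unit increase in xⱼ, holding the other features fixed". Limitation: the decision boundary is linear, so the model fails on XOR-type data. To handle such data, add interaction or polynomial features, or switch to a kernel method or a neural network.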
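Why is exp(βⱼ) an odds multiplier? The odds of σ(z) are p/(1 − p) = eᶻ. Raising xⱼ by one unit adds βⱼ to z, which multiplies the odds by exp(βⱼ) regardless of the starting point. Below is a small numerical check using made-up coefficients for a hypothetical one-feature model.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Convert a probability p into odds p / (1 - p)."""
    return p / (1.0 - p)

# Hypothetical one-feature model: z = beta0 + beta1 * x
beta0, beta1 = -3.0, 0.7

for x in [0.0, 2.0, 5.0]:
    p_before = sigmoid(beta0 + beta1 * x)
    p_after = sigmoid(beta0 + beta1 * (x + 1.0))  # one-unit increase in x
    # The change in probability depends on x; the odds ratio does not.
    print(f"x={x}: dP={p_after - p_before:+.4f}, "
          f"odds ratio={odds(p_after) / odds(p_before):.4f}")

print(f"exp(beta1) = {math.exp(beta1):.4f}")
```

The probability change per unit step varies with x, because the sigmoid is steepest near z = 0. This is why coefficients are read on the odds scale rather than the probability scale.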

Regularization: note the inverted scale

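C = 1/λ is the inverse of the regularization strength λ, so the two move in opposite directions: smaller C → stronger regularization → smaller coefficients and a simpler boundary. L2 shrinks all coefficients toward zero. L1 can drive the coefficients of irrelevant features exactly to zero (built-in feature selection). ElasticNet blends both penalties.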
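This minimal plain-Python sketch shows how C changes a fitted model. It assumes the penalized objective is the mean cross-entropy plus either (1/(2C))·Σβⱼ² (L2) or (1/C)·Σ|βⱼ| (L1), with the intercept β₀ left unpenalized. `fit_logreg`, `soft_threshold` and the toy data are hypothetical.

The two penalties are handled differently:
- **L2** uses plain gradient descent.
- **L1** uses proximal gradient descent (soft-thresholding) instead. This is a standard technique that lets coefficients land exactly on zero.

Production libraries use more efficient solvers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(v, t):
    """Proximal step for t*|v|: shrink v toward 0 by t, landing exactly on 0 if |v| <= t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def fit_logreg(X, y, C, penalty="l2", lr=0.1, steps=5000):
    """Minimise mean cross-entropy + penalty by (proximal) gradient descent.

    penalty="l2": (1/(2C)) * sum(beta_j^2), handled with a plain gradient step.
    penalty="l1": (1/C) * sum(|beta_j|), handled by a gradient step on the
    cross-entropy followed by soft-thresholding (proximal gradient).
    The intercept beta0 is never penalized (an assumption of this sketch).
    """
    n, d = len(X), len(X[0])
    beta0, betas = 0.0, [0.0] * d
    for _ in range(steps):
        # Gradient of the mean cross-entropy: mean of (p_i - y_i) * x_i
        g0, g = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            err = sigmoid(beta0 + sum(b * v for b, v in zip(betas, xi))) - yi
            g0 += err / n
            for j in range(d):
                g[j] += err * xi[j] / n
        beta0 -= lr * g0
        if penalty == "l2":
            # d/d(beta_j) of (1/(2C)) * beta_j^2 is beta_j / C
            betas = [b - lr * (gj + b / C) for b, gj in zip(betas, g)]
        elif penalty == "l1":
            betas = [soft_threshold(b - lr * gj, lr / C) for b, gj in zip(betas, g)]
        else:
            raise ValueError(f"unknown penalty: {penalty!r}")
    return beta0, betas

# Made-up toy data with overlapping classes: (0, 0), (1, 0) and (0, 1)
# each appear once with label 0 and once with label 1.
X = [[-2.0, -1.0], [-1.0, 0.5], [-0.5, -0.5], [0.0, 0.0], [0.0, 0.0], [1.0, 0.0],
     [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0], [2.0, 1.0]]
y = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1]

for penalty in ("l2", "l1"):
    for C in (0.1, 1.0, 100.0):
        _, betas = fit_logreg(X, y, C, penalty)
        print(f"{penalty} C={C:<6} betas={[round(b, 3) for b in betas]}")
```

With C = 0.1, the L1 fit keeps both coefficients at exactly 0.0. The shrinkage per step (lr/C = 1.0) outweighs any gradient step this small dataset can produce. The L2 fit, by contrast, only shrinks the coefficients. As C grows, both penalties fade and the coefficients approach the unregularized fit.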

Hypothesis, Loss & Multi-class

σ(z) = 1 / (1 + e⁻ᶻ)    z = β₀ + β₁x₁ + … + βₙxₙ
P(y=1|x) = σ(z)     Decision boundary: z = 0  ⟺  P = 0.5
Loss (Cross-Entropy) = −(1/n) Σ [ yᵢ log ŷᵢ + (1−yᵢ) log(1−ŷᵢ) ]
L2: Loss + (1/2C)·Σβⱼ²  |  L1: Loss + (1/C)·Σ|βⱼ|  |  Multi-class: One-vs-Rest (OvR) or multinomial (softmax)
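The loss formulas above translate directly into plain Python. This sketch assumes `y_hat` already holds the predicted probabilities σ(z). It also assumes `betas` excludes the intercept β₀, which is a common convention but an assumption here. The function names are hypothetical. The small `eps` clip is an added safeguard against log(0) and is not part of the formula.

```python
import math

def cross_entropy(y, y_hat, eps=1e-12):
    """Mean binary cross-entropy: -(1/n) * sum[y*log(y_hat) + (1-y)*log(1-y_hat)]."""
    total = 0.0
    for yi, pi in zip(y, y_hat):
        pi = min(max(pi, eps), 1.0 - eps)  # clip so log() never sees 0
        total += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return -total / len(y)

def regularized_loss(y, y_hat, betas, C, penalty="l2"):
    """Cross-entropy plus (1/(2C))*sum(beta_j^2) for L2 or (1/C)*sum(|beta_j|) for L1."""
    base = cross_entropy(y, y_hat)
    if penalty == "l2":
        return base + sum(b * b for b in betas) / (2.0 * C)
    if penalty == "l1":
        return base + sum(abs(b) for b in betas) / C
    raise ValueError(f"unknown penalty: {penalty!r}")

# Illustrative labels, predicted probabilities and coefficients (intercept excluded).
y = [1, 0, 1]
y_hat = [0.9, 0.2, 0.6]
betas = [0.5, -1.0]
for C in (0.1, 1.0, 10.0):
    print(C, regularized_loss(y, y_hat, betas, C, "l2"),
          regularized_loss(y, y_hat, betas, C, "l1"))
```

For the same coefficients, a smaller C makes the penalty term dominate the total. This is why minimizing the objective pulls the coefficients toward zero.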