---
license: mit
library_name: numpy
tags:
- chest-xray
- medical-imaging
- from-scratch
- numpy
- education
pipeline_tag: image-classification
---

# CheXVision-mini: from-scratch NumPy neural network

A pure-**NumPy** multilayer perceptron (no autograd, no deep-learning framework), with every forward and backward pass derived and coded by hand, trained for binary chest X-ray screening (**normal vs abnormal**) on NIH ChestX-ray14.

Companion to [CheXVision](https://github.com/arudaev/chexvision) (PyTorch: a custom CNN + a DenseNet-121 transfer model). This model demonstrates the **fundamentals**: hand-written backprop verified by finite-difference gradient checking. The headline performance belongs to the PyTorch models (DenseNet binary AUC ≈ 0.787), not to this MLP.

## Results: held-out test set (final)

Metrics on an **untouched test split**, at an operating threshold of 0.389 chosen on the validation set only (by maximising Youden's J). ROC-AUC is threshold-independent.

| Metric | Test | Validation |
|---|---|---|
| ROC-AUC | **0.6502** | 0.6994 |
| Accuracy | 0.6467 | 0.6536 |
| Balanced accuracy | 0.5904 | 0.6517 |
| Precision | 0.6749 | 0.5803 |
| Recall (sensitivity) | 0.8277 | 0.6393 |
| Specificity | 0.3530 | 0.6640 |
| F1 | 0.7435 | 0.6084 |

Checkpoint selected by best validation AUC (epoch 176/200).
Samples: train 60000, val 8557, test 10000 (test positive rate 0.6187).
Test confusion matrix @ 0.389: TN=1346, FP=2467, FN=1066, TP=5121.

> **Note on the test split:** NIH ChestX-ray14's official `test` split is more
> positive-heavy (0.6187) than train/validation (0.4208). Because of that
> base-rate shift, plain accuracy can mislead. **ROC-AUC (threshold-independent)
> and balanced accuracy are the metrics to trust** for comparison.

## Architecture

MLP on 64×64 grayscale images: **4096 → 1024 → 256 → 64 → 1** logit, ReLU activations, dropout 0.3, He initialisation.

Loss: BCE-with-logits (+ label smoothing 0.05). Optimizer: Adam with cosine LR decay; L2 weight decay (weights only). Per-feature standardisation; augmentation: horizontal flip / noise / brightness.

## Files

- `model.npz`: best weights + normalisation stats (`_norm_mean`, `_norm_std`).
- `metrics.json`: test & validation metrics, ROC/PR curves, confusion matrices, config.
- `history.json`: per-epoch train/reg/val loss, val accuracy/AUC, learning rate.
- `val_scores.npy` / `val_labels.npy`, `test_scores.npy` / `test_labels.npy`: raw scores + labels.
- `loss_curve.png`: training curves + val AUC.

## Usage

```python
from chexvision_mini.inference import load_checkpoint, preprocess_image, predict_label

# Load the model, normalisation stats, and operating threshold.
model, mean, std, threshold = load_checkpoint("artifacts")

# Resize to 64x64 and standardise with the stored mean/std.
x = preprocess_image("xray.png", image_size=64, mean=mean, std=std)

prob, label = predict_label(model, x, threshold)  # P(abnormal), "normal"/"abnormal"
```

Or from the CLI: `python -m chexvision_mini predict --checkpoint artifacts --image xray.png`.

## Links

- Code: https://github.com/arudaev/chexvision-mini
- Parent project: https://github.com/arudaev/chexvision
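
## Appendix: re-deriving the test metrics

The threshold-dependent test metrics in the Results table follow directly from the reported test confusion matrix (TN=1346, FP=2467, FN=1066, TP=5121). The sketch below recomputes them with the standard textbook definitions. It is a sanity check rather than the project's own evaluation code, and the helper name `binary_metrics` is hypothetical, not part of `chexvision_mini`.

```python
# Confusion-matrix counts reported for the test split at threshold 0.389.
tn, fp, fn, tp = 1346, 2467, 1066, 5121


def binary_metrics(tn, fp, fn, tp):
    """Standard binary-classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration; not part of chexvision_mini.
    """
    total = tn + fp + fn + tp
    recall = tp / (tp + fn)          # sensitivity: abnormal cases caught
    specificity = tn / (tn + fp)     # normal cases correctly cleared
    precision = tp / (tp + fp)       # flagged cases that are truly abnormal
    return {
        "accuracy": (tp + tn) / total,
        "balanced_accuracy": (recall + specificity) / 2,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": 2 * tp / (2 * tp + fp + fn),
        "positive_rate": (tp + fn) / total,
    }


metrics = binary_metrics(tn, fp, fn, tp)
for name, value in metrics.items():
    print(f"{name:18s} {value:.4f}")
```

The gap between accuracy (0.6467) and balanced accuracy (0.5904) comes from the same counts: at this threshold the model catches most abnormal cases but clears only about a third of the normal ones, and the positive-heavy test split rewards that bias in plain accuracy.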