---
license: mit
library_name: numpy
tags:
- chest-xray
- medical-imaging
- from-scratch
- numpy
- education
pipeline_tag: image-classification
---

# CheXVision-mini — from-scratch NumPy neural network

A pure-**NumPy** multilayer perceptron (no autograd, no deep-learning framework),
with every forward and backward pass derived and coded by hand, trained for
binary chest X-ray screening (**normal vs abnormal**) on NIH ChestX-ray14.

Companion to [CheXVision](https://github.com/arudaev/chexvision) (PyTorch: a
custom CNN + a DenseNet-121 transfer model). This model demonstrates the
**fundamentals**: hand-written backprop verified by finite-difference gradient
checking. The headline performance belongs to the PyTorch models (DenseNet
binary AUC ≈ 0.787), not to this MLP.
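
A finite-difference gradient check compares a hand-derived gradient against a central-difference estimate of the same loss. The sketch below is illustrative only: it uses a one-layer logistic model rather than this repository's MLP, and the names `bce_with_logits` and `numerical_grad` are hypothetical, not taken from the `chexvision_mini` code.

```python
import numpy as np

def bce_with_logits(w, x, y):
    """Mean BCE of a one-layer logistic model and its hand-derived gradient."""
    z = x @ w
    # Numerically stable form: max(z, 0) - z*y + log(1 + exp(-|z|))
    loss = np.mean(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z))))
    p = 1.0 / (1.0 + np.exp(-z))       # sigmoid(z)
    grad = x.T @ (p - y) / len(y)      # dL/dw derived by hand
    return loss, grad

def numerical_grad(f, w, eps=1e-5):
    """Central-difference estimate of df/dw, one coordinate at a time."""
    g = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step.flat[i] = eps
        g.flat[i] = (f(w + step) - f(w - step)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 5))
y = rng.integers(0, 2, size=8).astype(float)
w = rng.normal(size=5)

_, analytic = bce_with_logits(w, x, y)
numeric = numerical_grad(lambda v: bce_with_logits(v, x, y)[0], w)
rel_err = np.linalg.norm(analytic - numeric) / (
    np.linalg.norm(analytic) + np.linalg.norm(numeric)
)
print(f"relative error: {rel_err:.2e}")
```

In float64, a relative error well below 1e-6 is usually taken as agreement between the two gradients; the same comparison can be repeated for each parameter array of a deeper network.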

## Results — held-out test set (final)

Metrics on an **untouched test split**, at an operating threshold of 0.389
chosen on the validation set only by maximizing Youden's J
(sensitivity + specificity - 1). ROC-AUC is threshold-independent.

| Metric | Test | Validation |
|---|---|---|
| ROC-AUC | **0.6502** | 0.6994 |
| Accuracy | 0.6467 | 0.6536 |
| Balanced accuracy | 0.5904 | 0.6517 |
| Precision | 0.6749 | 0.5803 |
| Recall (sensitivity) | 0.8277 | 0.6393 |
| Specificity | 0.3530 | 0.6640 |
| F1 | 0.7435 | 0.6084 |

- Checkpoint: selected by best validation AUC (epoch 176/200).
- Samples: train 60000, val 8557, test 10000 (test positive rate 0.6187).
- Test confusion matrix at threshold 0.389: TN=1346, FP=2467, FN=1066, TP=5121.
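
The test column of the table can be reproduced from the confusion matrix counts alone. The following is a small worked check using only the four counts reported above; the dictionary keys are illustrative names, not the keys of `metrics.json`.

```python
# Test confusion matrix at threshold 0.389, as reported above.
tn, fp, fn, tp = 1346, 2467, 1066, 5121
total = tn + fp + fn + tp

recall = tp / (tp + fn)           # sensitivity
specificity = tn / (tn + fp)
precision = tp / (tp + fp)

metrics = {
    "accuracy": (tp + tn) / total,
    "balanced_accuracy": (recall + specificity) / 2,
    "precision": precision,
    "recall": recall,
    "specificity": specificity,
    "f1": 2 * precision * recall / (precision + recall),
    "positive_rate": (tp + fn) / total,
}

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

The counts sum to 10000 test samples, and the positive rate (TP + FN) / total recovers the reported 0.6187.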

> **Note on the test split:** NIH ChestX-ray14's official `test` split is more
> positive-heavy (0.6187) than train/validation (0.4208). Because of that
> base-rate shift, plain accuracy can mislead. **ROC-AUC (threshold-independent)
> and balanced accuracy are the metrics to trust** for comparison.
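
To see why the base rate matters, accuracy can be written as sensitivity × prevalence + specificity × (1 − prevalence). The sketch below plugs the test-set sensitivity and specificity into that identity at both positive rates. Holding sensitivity and specificity fixed across populations is an idealizing assumption made here for illustration, and `accuracy_at` is a hypothetical helper.

```python
# Test-set operating point at threshold 0.389 (from the results table).
sensitivity, specificity = 0.8277, 0.3530

def accuracy_at(prevalence):
    """Accuracy implied by a fixed sensitivity/specificity at a given positive rate."""
    return sensitivity * prevalence + specificity * (1 - prevalence)

for name, p in [("official test split", 0.6187), ("train/validation", 0.4208)]:
    print(f"{name}: prevalence={p:.4f} accuracy={accuracy_at(p):.4f}")

# Balanced accuracy weights both classes equally, so prevalence drops out.
balanced_accuracy = (sensitivity + specificity) / 2
print(f"balanced accuracy: {balanced_accuracy:.5f}")
```

Under that assumption the same operating point scores roughly 0.647 accuracy at the test prevalence but only about 0.553 at the train/validation prevalence, while balanced accuracy does not move.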

## Architecture

MLP on 64×64 grayscale images: **4096 → 1024 → 256 → 64 → 1** logit,
ReLU activations, dropout 0.3, He initialisation.
Loss: BCE-with-logits (+ label smoothing 0.05).
Optimizer: Adam with cosine LR decay and L2 weight decay (weights only).
Inputs: per-feature standardisation; augmentation: h-flip / noise / brightness.
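
The description above can be sketched as a forward pass in plain NumPy. This is a minimal illustration, not the repository's implementation: the function names are hypothetical, dropout is assumed to be inverted dropout applied after each hidden ReLU, and label smoothing is assumed to map targets as `y * (1 - s) + s / 2`.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [4096, 1024, 256, 64, 1]   # 64x64 input flattened, down to one logit

# He initialisation: weights ~ N(0, 2 / fan_in), biases start at zero.
params = [
    (rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
    for m, n in zip(sizes[:-1], sizes[1:])
]

def forward(x, params, drop_p=0.3, train=False, rng=None):
    """Return one logit per row of x."""
    h = x
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:            # hidden layers only
            h = np.maximum(h, 0.0)         # ReLU
            if train:
                # Assumed inverted dropout: drop units, rescale the survivors.
                mask = rng.random(h.shape) >= drop_p
                h = h * mask / (1.0 - drop_p)
    return h[:, 0]

def bce_with_logits(z, y, smoothing=0.05):
    """Numerically stable BCE on logits with (assumed) symmetric label smoothing."""
    t = y * (1.0 - smoothing) + 0.5 * smoothing
    return np.mean(np.maximum(z, 0.0) - z * t + np.log1p(np.exp(-np.abs(z))))

x = rng.normal(size=(4, 4096))             # stand-in for standardised images
y = np.array([0.0, 1.0, 1.0, 0.0])
logits = forward(x, params)                # eval mode: dropout disabled
loss = bce_with_logits(logits, y)
n_params = sum(w.size + b.size for w, b in params)
print(logits.shape, n_params, float(loss))
```

With these layer sizes the network has 4,474,241 trainable parameters, almost all of them in the first 4096 → 1024 layer.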

## Files

- `model.npz` — best weights + normalisation stats (`_norm_mean`, `_norm_std`).
- `metrics.json` — test & validation metrics, ROC/PR curves, confusion matrices, config.
- `history.json` — per-epoch train/reg/val loss, val accuracy/AUC, learning rate.
- `val_scores.npy` / `val_labels.npy`, `test_scores.npy` / `test_labels.npy` — raw scores + labels.
- `loss_curve.png` — training curves + val AUC.
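
The raw score and label files make it possible to re-derive an operating threshold offline. Below is a hedged sketch of Youden's J selection; `youden_threshold` is a hypothetical helper, the rule "predict abnormal when score >= threshold" is an assumption, and the arrays are synthetic stand-ins for what `np.load("val_scores.npy")` and `np.load("val_labels.npy")` would return.

```python
import numpy as np

def youden_threshold(scores, labels):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    best_t, best_j = None, -np.inf
    for t in np.unique(scores):            # candidate thresholds, ascending
        pred = scores >= t                 # assumed decision rule
        sens = np.mean(pred[labels])       # true positive rate
        spec = np.mean(~pred[~labels])     # true negative rate
        j = sens + spec - 1.0
        if j > best_j:                     # ties keep the lowest threshold
            best_t, best_j = t, j
    return best_t, best_j

# Synthetic stand-in data; replace with the saved validation arrays.
scores = np.array([0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9])
labels = np.array([0, 0, 1, 0, 1, 1, 0, 1])

threshold, j = youden_threshold(scores, labels)
print(f"threshold={threshold:.3f}  J={j:.3f}")
```

The loop is quadratic in the number of samples, which is acceptable for a validation set of a few thousand scores; selecting the threshold on validation data only keeps the test split untouched.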

## Usage

```python
from chexvision_mini.inference import load_checkpoint, preprocess_image, predict_label
model, mean, std, threshold = load_checkpoint("artifacts")
x = preprocess_image("xray.png", image_size=64, mean=mean, std=std)
prob, label = predict_label(model, x, threshold)   # P(abnormal), "normal"/"abnormal"
```

Or from the CLI: `python -m chexvision_mini predict --checkpoint artifacts --image xray.png`.

## Links

- Code: https://github.com/arudaev/chexvision-mini
- Parent project: https://github.com/arudaev/chexvision