# model-clinic

Diagnose, treat, and understand neural network models. Like a doctor for your PyTorch checkpoints.

```bash
pip install model-clinic
```
Status: v0.3.0 on PyPI. v0.4.0 (deep repair) is in validation and not yet released. We're testing its repair capabilities across 72+ real checkpoints from a 645-hour training failure. Early results are promising but not conclusive.
## What it does
Finds problems in model weights, prescribes fixes, applies them with before/after testing, and rolls back if things get worse. Works on any PyTorch checkpoint: no model class, training code, or architecture knowledge needed.
Static analysis (no GPU needed, 22 detectors):
- Dead neurons, stuck gates, NaN/Inf
- Exploding/vanishing norms, LayerNorm drift
- Heavy-tailed distributions, saturated weights
- Duplicate rows, attention Q/K/V imbalance
- Mixed dtypes, weight corruption
- Head redundancy, positional encoding issues
- Token collapse, gradient noise, representation drift
- MoE router collapse, LoRA merge artifacts
- Quantization degradation, model aging/forgetting
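As a concrete illustration of what a dead-neuron detector looks for, here is a minimal sketch (plain NumPy for clarity, not model-clinic's internal code): a neuron whose entire weight row is numerically zero can never contribute to the output.

```python
import numpy as np

def find_dead_rows(weight, tol=1e-8):
    """Return indices of output rows whose weights are all ~zero.

    A 'dead' neuron contributes nothing downstream: every weight
    feeding its output is numerically zero.
    """
    norms = np.linalg.norm(weight, axis=1)
    return np.flatnonzero(norms < tol)

w = np.random.randn(4, 8)
w[2] = 0.0  # kill one neuron
print(find_dead_rows(w))  # -> [2]
```

The real detectors also account for biases, normalization scales, and dtype, but the core signal is this simple.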
Runtime analysis (needs model + tokenizer, 6 detectors):
- Generation collapse detection (entropy, top-1 probability)
- Coherence scoring across diverse prompts
- Activation health per layer (hooks)
- Residual stream growth tracking
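The entropy signal behind generation-collapse detection can be sketched as follows (an illustrative example, not the library's implementation): a collapsed model concentrates next-token probability on one token, driving entropy toward zero, while a healthy model spreads its mass.

```python
import numpy as np

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    p = np.asarray(probs)
    p = p[p > 0]  # 0 * log(0) is taken as 0
    return float(-(p * np.log(p)).sum())

healthy = np.full(100, 0.01)                    # uniform over 100 tokens
collapsed = np.array([0.97] + [0.03 / 99] * 99)  # mass piled on one token
print(token_entropy(healthy) > token_entropy(collapsed))  # -> True
```

Tracking this statistic (together with top-1 probability) across a batch of diverse prompts is enough to flag degenerate, repetitive generation.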
Deep repair (v0.4.0, in validation):
- Level 2: Spectral surgery – SVD-based denoising of weight matrices
- Level 3: Distillation repair – reset dead modules, train from working layers
- Level 4: Cross-checkpoint grafting – best-of-N merging per parameter
- Level 5: Activation-guided repair – detect and fix destructive layers at runtime
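The idea behind spectral surgery can be sketched with a plain NumPy SVD (an illustrative toy, not the library's actual repair code): keep the singular directions carrying most of the spectral energy and drop the tail, where elementwise noise tends to concentrate.

```python
import numpy as np

def spectral_denoise(weight, keep=0.9):
    """Reconstruct a weight matrix from its top singular directions.

    Keeps the smallest number of components whose singular values
    account for `keep` of the total spectral energy.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    energy = np.cumsum(s) / s.sum()
    k = int(np.searchsorted(energy, keep)) + 1
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k]

w = np.random.randn(64, 64)
w_clean = spectral_denoise(w)
print(w_clean.shape)  # -> (64, 64)
```

With `keep=1.0` the reconstruction is exact; lowering it trades fidelity for noise suppression, which is why repairs are validated with before/after testing and rolled back if they hurt.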
Health scoring:
- 0-100 score with letter grade (A-F)
- Per-category breakdown: weights, stability, output, activations
- Comparable across models and training runs
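The score-to-grade mapping has roughly this shape (the thresholds below are illustrative assumptions, not model-clinic's actual cutoffs):

```python
def letter_grade(score):
    """Map a 0-100 health score to a letter grade.

    Thresholds are illustrative assumptions for this sketch.
    """
    for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return grade
    return "F"

print(letter_grade(84))  # -> B
```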
## Quick start

```bash
# Examine any checkpoint
model-clinic exam checkpoint.pt

# HuggingFace model
model-clinic exam Qwen/Qwen2.5-0.5B-Instruct --hf

# Treat and save
model-clinic treat checkpoint.pt --save treated.pt

# Health score only
model-health checkpoint.pt

# HTML diagnostic report
model-clinic report checkpoint.pt --output report.html

# Compare two checkpoints
model-clinic compare before.pt after.pt

# Try with a synthetic broken model (no checkpoint needed)
model-clinic demo everything-broken
```
## Python API

```python
from model_clinic import load_state_dict, diagnose, prescribe, apply_treatment

# Load any checkpoint format
state_dict, meta = load_state_dict("checkpoint.pt")

# Diagnose
findings = diagnose(state_dict)
for f in findings:
    print(f"[{f.severity}] {f.condition}: {f.param_name}")

# Health score
from model_clinic import compute_health_score
health = compute_health_score(findings)
print(f"Score: {health.overall}/100 ({health.grade})")

# Training monitor
from model_clinic import ClinicMonitor
monitor = ClinicMonitor(check_every=500)
# In training loop: alerts = monitor.check(model)
```
## Real-world results
We used model-clinic to perform a forensic analysis of 72 checkpoints from a 645-hour training run. The results told the full story of what went wrong:
| Checkpoint | Score / grade | What happened |
|---|---|---|
| Pretrain step 16K | 84/B | Healthy backbone |
| Growth enabled | 56/D | Neural foam growth destroyed it |
| Fine-tuning (GRPO/Rho-1) | 65/C | Partial recovery |
| After repair (L1+L2+L3) | 76/C | Spectral surgery + distillation |
| After gate opening | 82/B | Memory system activated |
Full write-up: How We Mass-Produced Broken Models for 645 Hours
## All CLI tools

| Command | What it does |
|---|---|
| `model-clinic exam` | Diagnose model health |
| `model-clinic treat` | Diagnose and apply fixes |
| `model-clinic validate` | Verify checkpoint loads correctly |
| `model-clinic report` | HTML diagnostic report |
| `model-clinic compare` | Compare two checkpoints |
| `model-clinic demo` | Synthetic broken model demos |
| `model-xray` | Per-parameter weight stats |
| `model-diff` | Param-by-param comparison |
| `model-health` | Quick health check |
| `model-surgery` | Direct parameter modification |
| `model-ablate` | Systematic ablation |
| `model-neurons` | Neuron activation profiling |
| `model-attention` | Attention pattern analysis |
| `model-logit-lens` | Layer-by-layer prediction tracking |
## Conditions detected
22 static + 6 runtime detectors covering: NaN/Inf, dead neurons, stuck gates, exploding/vanishing norms, heavy tails, norm drift, saturated weights, identical rows, attention imbalance, dtype mismatch, weight corruption, head redundancy, positional encoding issues, token collapse, gradient noise, representation drift, MoE router collapse, LoRA merge artifacts, quantization degradation, model aging, generation collapse, low coherence, activation anomalies.
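One of these conditions, heavy tails, can be measured with excess kurtosis; the sketch below (illustrative, not model-clinic's code) separates Gaussian-like weights from a heavy-tailed Student-t sample.

```python
import numpy as np

def excess_kurtosis(x):
    """Excess kurtosis: ~0 for Gaussian data, large and positive
    for heavy-tailed weight distributions."""
    x = np.asarray(x, dtype=np.float64)
    z = (x - x.mean()) / x.std()
    return float((z ** 4).mean() - 3.0)

rng = np.random.default_rng(0)
gaussian = rng.standard_normal(100_000)
heavy = rng.standard_t(df=3, size=100_000)  # Student-t: heavy tails
print(excess_kurtosis(heavy) > excess_kurtosis(gaussian))  # -> True
```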
## Installation

```bash
pip install model-clinic       # Core (static analysis)
pip install model-clinic[hf]   # + HuggingFace runtime analysis
pip install model-clinic[all]  # Everything
```
## Stats
- 22 static detectors, 6 runtime detectors
- 699 tests passing (v0.4.0)
- 22 CLI commands
- ~50 public API exports
- Works on any `.pt`, `.pth`, `.safetensors`, or HuggingFace model
## License
MIT