# model-clinic

Diagnose, treat, and understand neural network models. Like a doctor for your PyTorch checkpoints.

```bash
pip install model-clinic
```
Status: v0.3.0 on PyPI. v0.4.0 (deep repair) is in validation and not yet released. We're testing its repair capabilities across 72+ real checkpoints from a 645-hour training failure. Early results are promising but not conclusive.
## What it does
Finds problems in model weights, prescribes fixes, applies them with before/after testing, and rolls back if things get worse. Works on any PyTorch checkpoint: no model class, training code, or architecture knowledge needed.
Static analysis (no GPU needed, 22 detectors):
- Dead neurons, stuck gates, NaN/Inf
- Exploding/vanishing norms, LayerNorm drift
- Heavy-tailed distributions, saturated weights
- Duplicate rows, attention Q/K/V imbalance
- Mixed dtypes, weight corruption
- Head redundancy, positional encoding issues
- Token collapse, gradient noise, representation drift
- MoE router collapse, LoRA merge artifacts
- Quantization degradation, model aging/forgetting
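As a concrete illustration of what a dead-neuron detector looks for, here is a minimal sketch (plain NumPy for clarity, not model-clinic's internal code): a neuron whose entire weight row is numerically zero can never contribute to the output.

```python
import numpy as np

def find_dead_rows(weight, tol=1e-8):
    """Return indices of output rows whose weights are all ~zero.

    A 'dead' neuron contributes nothing downstream: every weight
    feeding its output is numerically zero.
    """
    norms = np.linalg.norm(weight, axis=1)
    return np.flatnonzero(norms < tol)

w = np.random.randn(4, 8)
w[2] = 0.0  # kill one neuron
print(find_dead_rows(w))  # -> [2]
```

The real detectors also account for biases, normalization scales, and dtype, but the core signal is this simple.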
Runtime analysis (needs model + tokenizer, 6 detectors):
- Generation collapse detection (entropy, top-1 probability)
- Coherence scoring across diverse prompts
- Activation health per layer (hooks)
- Residual stream growth tracking
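The entropy signal behind generation-collapse detection can be sketched as follows (an illustrative example, not the library's implementation): a collapsed model concentrates next-token probability on one token, driving entropy toward zero, while a healthy model spreads its mass.

```python
import numpy as np

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    p = np.asarray(probs)
    p = p[p > 0]  # 0 * log(0) is taken as 0
    return float(-(p * np.log(p)).sum())

healthy = np.full(100, 0.01)                    # uniform over 100 tokens
collapsed = np.array([0.97] + [0.03 / 99] * 99)  # mass piled on one token
print(token_entropy(healthy) > token_entropy(collapsed))  # -> True
```

Tracking this statistic (together with top-1 probability) across a batch of diverse prompts is enough to flag degenerate, repetitive generation.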
Deep repair (v0.4.0, in validation):
- Level 2: Spectral surgery – SVD-based denoising of weight matrices
- Level 3: Distillation repair – reset dead modules, train from working layers
- Level 4: Cross-checkpoint grafting – best-of-N merging per parameter
- Level 5: Activation-guided repair – detect and fix destructive layers at runtime
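The idea behind spectral surgery can be sketched with a plain NumPy SVD (an illustrative toy, not the library's actual repair code): keep the singular directions carrying most of the spectral energy and drop the tail, where elementwise noise tends to concentrate.

```python
import numpy as np

def spectral_denoise(weight, keep=0.9):
    """Reconstruct a weight matrix from its top singular directions.

    Keeps the smallest number of components whose singular values
    account for `keep` of the total spectral energy.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    energy = np.cumsum(s) / s.sum()
    k = int(np.searchsorted(energy, keep)) + 1
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k]

w = np.random.randn(64, 64)
w_clean = spectral_denoise(w)
print(w_clean.shape)  # -> (64, 64)
```

With `keep=1.0` the reconstruction is exact; lowering it trades fidelity for noise suppression, which is why repairs are validated with before/after testing and rolled back if they hurt.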
Health scoring:
- 0-100 score with letter grade (A-F)
- Per-category breakdown: weights, stability, output, activations
- Comparable across models and training runs
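The score-to-grade mapping has roughly this shape (the thresholds below are illustrative assumptions, not model-clinic's actual cutoffs):

```python
def letter_grade(score):
    """Map a 0-100 health score to a letter grade.

    Thresholds are illustrative assumptions for this sketch.
    """
    for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return grade
    return "F"

print(letter_grade(84))  # -> B
```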
## Quick start

```bash
# Examine any checkpoint
model-clinic exam checkpoint.pt

# HuggingFace model
model-clinic exam Qwen/Qwen2.5-0.5B-Instruct --hf

# Treat and save
model-clinic treat checkpoint.pt --save treated.pt

# Health score only
model-health checkpoint.pt

# HTML diagnostic report
model-clinic report checkpoint.pt --output report.html

# Compare two checkpoints
model-clinic compare before.pt after.pt

# Try with a synthetic broken model (no checkpoint needed)
model-clinic demo everything-broken
```
## Python API

```python
from model_clinic import load_state_dict, diagnose, prescribe, apply_treatment

# Load any checkpoint format
state_dict, meta = load_state_dict("checkpoint.pt")

# Diagnose
findings = diagnose(state_dict)
for f in findings:
    print(f"[{f.severity}] {f.condition}: {f.param_name}")

# Health score
from model_clinic import compute_health_score
health = compute_health_score(findings)
print(f"Score: {health.overall}/100 ({health.grade})")

# Training monitor
from model_clinic import ClinicMonitor
monitor = ClinicMonitor(check_every=500)
# In training loop: alerts = monitor.check(model)
```
## Real-world results
We used model-clinic to perform a forensic analysis of 72 checkpoints from a 645-hour training run. The results told the full story of what went wrong:
| Checkpoint | Score / grade | What happened |
|---|---|---|
| Pretrain step 16K | 84/B | Healthy backbone |
| Growth enabled | 56/D | Neural foam growth destroyed it |
| Fine-tuning (GRPO/Rho-1) | 65/C | Partial recovery |
| After repair (L1+L2+L3) | 76/C | Spectral surgery + distillation |
| After gate opening | 82/B | Memory system activated |
Full write-up: How We Mass-Produced Broken Models for 645 Hours
## All CLI tools

| Command | What it does |
|---|---|
| `model-clinic exam` | Diagnose model health |
| `model-clinic treat` | Diagnose and apply fixes |
| `model-clinic validate` | Verify checkpoint loads correctly |
| `model-clinic report` | HTML diagnostic report |
| `model-clinic compare` | Compare two checkpoints |
| `model-clinic demo` | Synthetic broken model demos |
| `model-xray` | Per-parameter weight stats |
| `model-diff` | Param-by-param comparison |
| `model-health` | Quick health check |
| `model-surgery` | Direct parameter modification |
| `model-ablate` | Systematic ablation |
| `model-neurons` | Neuron activation profiling |
| `model-attention` | Attention pattern analysis |
| `model-logit-lens` | Layer-by-layer prediction tracking |
## Conditions detected
22 static + 6 runtime detectors covering: NaN/Inf, dead neurons, stuck gates, exploding/vanishing norms, heavy tails, norm drift, saturated weights, identical rows, attention imbalance, dtype mismatch, weight corruption, head redundancy, positional encoding issues, token collapse, gradient noise, representation drift, MoE router collapse, LoRA merge artifacts, quantization degradation, model aging, generation collapse, low coherence, activation anomalies.
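One of these conditions, heavy tails, can be measured with excess kurtosis; the sketch below (illustrative, not model-clinic's code) separates Gaussian-like weights from a heavy-tailed Student-t sample.

```python
import numpy as np

def excess_kurtosis(x):
    """Excess kurtosis: ~0 for Gaussian data, large and positive
    for heavy-tailed weight distributions."""
    x = np.asarray(x, dtype=np.float64)
    z = (x - x.mean()) / x.std()
    return float((z ** 4).mean() - 3.0)

rng = np.random.default_rng(0)
gaussian = rng.standard_normal(100_000)
heavy = rng.standard_t(df=3, size=100_000)  # Student-t: heavy tails
print(excess_kurtosis(heavy) > excess_kurtosis(gaussian))  # -> True
```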
## Installation

```bash
pip install model-clinic       # Core (static analysis)
pip install model-clinic[hf]   # + HuggingFace runtime analysis
pip install model-clinic[all]  # Everything
```
## Stats
- 22 static detectors, 6 runtime detectors
- 699 tests passing (v0.4.0)
- 22 CLI commands
- ~50 public API exports
- Works on any `.pt`, `.pth`, `.safetensors`, or HuggingFace model
## License
MIT