model-clinic


Diagnose, treat, and understand neural network models. Like a doctor for your PyTorch checkpoints.

pip install model-clinic

Status: v0.3.0 on PyPI. v0.4.0 (deep repair) is in validation and not yet released. We're testing repair capabilities across 72+ real checkpoints from a 645-hour training failure. Early results are promising but not conclusive.

What it does

Finds problems in model weights, prescribes fixes, applies them with before/after testing, and rolls back if things get worse. Works on any PyTorch checkpoint: no model class, training code, or architecture knowledge needed.

Static analysis (no GPU needed, 22 detectors):

  • Dead neurons, stuck gates, NaN/Inf
  • Exploding/vanishing norms, LayerNorm drift
  • Heavy-tailed distributions, saturated weights
  • Duplicate rows, attention Q/K/V imbalance
  • Mixed dtypes, weight corruption
  • Head redundancy, positional encoding issues
  • Token collapse, gradient noise, representation drift
  • MoE router collapse, LoRA merge artifacts
  • Quantization degradation, model aging/forgetting
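
Several of the static checks above reduce to simple per-tensor scans. Here is a minimal sketch of the dead-neuron and NaN/Inf checks; the helper names are illustrative, not model-clinic's actual API:

```python
import numpy as np

def find_dead_rows(weight: np.ndarray, tol: float = 1e-8) -> list[int]:
    """Indices of output rows whose weights are all ~zero (dead neurons)."""
    return [i for i, row in enumerate(weight) if np.abs(row).max() < tol]

def has_nan_or_inf(weight: np.ndarray) -> bool:
    """True if the tensor contains any NaN or Inf value."""
    return not bool(np.isfinite(weight).all())

# Toy linear layer: zero out one row, corrupt one entry
w = np.random.randn(4, 8)
w[2] = 0.0
w[0, 0] = np.nan
print(find_dead_rows(w))   # [2]
print(has_nan_or_inf(w))   # True
```

Checks like these run over raw weight tensors alone, which is why the static pass needs no GPU and no forward pass.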

Runtime analysis (needs model + tokenizer, 6 detectors):

  • Generation collapse detection (entropy, top-1 probability)
  • Coherence scoring across diverse prompts
  • Activation health per layer (hooks)
  • Residual stream growth tracking
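
The entropy and top-1 probability signals can be computed directly from a next-token distribution. A sketch of the idea (function name and thresholds are illustrative, not model-clinic's API):

```python
import numpy as np

def collapse_signals(probs: np.ndarray) -> tuple[float, float]:
    """Entropy (nats) and top-1 probability of a next-token distribution.
    Persistently low entropy with a high top-1 probability across many
    generation steps is a symptom of generation collapse."""
    p = probs / probs.sum()
    entropy = float(-(p * np.log(p + 1e-12)).sum())
    return entropy, float(p.max())

healthy = np.ones(100) / 100        # uniform over 100 tokens
collapsed = np.full(100, 1e-6)
collapsed[0] = 1.0                  # one token dominates

print(collapse_signals(healthy))    # high entropy (~4.61 nats), top-1 = 0.01
print(collapse_signals(collapsed))  # entropy near 0, top-1 near 1.0
```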

Deep repair (v0.4.0, in validation):

  • Level 2 (spectral surgery): SVD-based denoising of weight matrices
  • Level 3 (distillation repair): reset dead modules, train from working layers
  • Level 4 (cross-checkpoint grafting): best-of-N merging per parameter
  • Level 5 (activation-guided repair): detect and fix destructive layers at runtime
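
The core idea behind Level 2 can be sketched with NumPy: keep only the top singular directions of a weight matrix and discard the small singular values that mostly carry noise. This is an illustration with an assumed known target rank, not model-clinic's actual rank-selection logic:

```python
import numpy as np

def spectral_denoise(weight: np.ndarray, rank: int) -> np.ndarray:
    """Reconstruct the matrix from its top `rank` singular directions,
    discarding the small singular values that mostly carry noise."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

rng = np.random.default_rng(0)
clean = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 64))  # true rank 8
noisy = clean + 0.01 * rng.standard_normal((64, 64))
repaired = spectral_denoise(noisy, rank=8)

# Truncation removes the noise energy outside the signal subspace
print(np.linalg.norm(repaired - clean) < np.linalg.norm(noisy - clean))  # True
```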

Health scoring:

  • 0-100 score with letter grade (A-F)
  • Per-category breakdown: weights, stability, output, activations
  • Comparable across models and training runs
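
The letter grade follows from the numeric score. The cut-points below are assumptions chosen only to be consistent with the example scores later in this README (84 → B, 76 → C, 56 → D); model-clinic's actual thresholds may differ:

```python
def letter_grade(score: float) -> str:
    """Illustrative score-to-grade mapping; cut-points are assumptions."""
    for cut, grade in ((85, "A"), (80, "B"), (60, "C"), (50, "D")):
        if score >= cut:
            return grade
    return "F"

print(letter_grade(84), letter_grade(76), letter_grade(56))  # B C D
```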

Quick start

# Examine any checkpoint
model-clinic exam checkpoint.pt

# HuggingFace model
model-clinic exam Qwen/Qwen2.5-0.5B-Instruct --hf

# Treat and save
model-clinic treat checkpoint.pt --save treated.pt

# Health score only
model-health checkpoint.pt

# HTML diagnostic report
model-clinic report checkpoint.pt --output report.html

# Compare two checkpoints
model-clinic compare before.pt after.pt

# Try with a synthetic broken model (no checkpoint needed)
model-clinic demo everything-broken

Python API

from model_clinic import load_state_dict, diagnose, prescribe, apply_treatment

# Load any checkpoint format
state_dict, meta = load_state_dict("checkpoint.pt")

# Diagnose
findings = diagnose(state_dict)
for f in findings:
    print(f"[{f.severity}] {f.condition}: {f.param_name}")

# Health score
from model_clinic import compute_health_score
health = compute_health_score(findings)
print(f"Score: {health.overall}/100 ({health.grade})")

# Training monitor
from model_clinic import ClinicMonitor
monitor = ClinicMonitor(check_every=500)
# In training loop: alerts = monitor.check(model)

Real-world results

We used model-clinic to perform a forensic analysis of 72 checkpoints from a 645-hour training run. The results told the full story of what went wrong:

| Checkpoint | Score | What happened |
|---|---|---|
| Pretrain step 16K | 84 (B) | Healthy backbone |
| Growth enabled | 56 (D) | Neural foam growth destroyed it |
| Fine-tuning (GRPO/Rho-1) | 65 (C) | Partial recovery |
| After repair (L1+L2+L3) | 76 (C) | Spectral surgery + distillation |
| After gate opening | 82 (B) | Memory system activated |

Full write-up: How We Mass-Produced Broken Models for 645 Hours

All CLI tools

| Command | What it does |
|---|---|
| model-clinic exam | Diagnose model health |
| model-clinic treat | Diagnose and apply fixes |
| model-clinic validate | Verify checkpoint loads correctly |
| model-clinic report | HTML diagnostic report |
| model-clinic compare | Compare two checkpoints |
| model-clinic demo | Synthetic broken model demos |
| model-xray | Per-parameter weight stats |
| model-diff | Param-by-param comparison |
| model-health | Quick health check |
| model-surgery | Direct parameter modification |
| model-ablate | Systematic ablation |
| model-neurons | Neuron activation profiling |
| model-attention | Attention pattern analysis |
| model-logit-lens | Layer-by-layer prediction tracking |

Conditions detected

22 static + 6 runtime detectors covering: NaN/Inf, dead neurons, stuck gates, exploding/vanishing norms, heavy tails, norm drift, saturated weights, identical rows, attention imbalance, dtype mismatch, weight corruption, head redundancy, positional encoding issues, token collapse, gradient noise, representation drift, MoE router collapse, LoRA merge artifacts, quantization degradation, model aging, generation collapse, low coherence, activation anomalies.

Installation

pip install model-clinic          # Core (static analysis)
pip install model-clinic[hf]      # + HuggingFace runtime analysis
pip install model-clinic[all]     # Everything

Stats

  • 22 static detectors, 6 runtime detectors
  • 699 tests passing (v0.4.0)
  • 22 CLI commands
  • ~50 public API exports
  • Works on any .pt, .pth, .safetensors, or HuggingFace model

License

MIT
