PruneHeal-13M

A 13.2M-parameter language model trained with the Prune-Heal methodology on a single RTX 3090.

One of the smallest models published with lm-evaluation-harness benchmark scores.

Benchmark Results (lm-evaluation-harness, 0-shot)

Benchmark       Metric     Score    Random baseline
PIQA            acc        55.98%   50%
WinoGrande      acc        50.28%   50%
BoolQ           acc        46.02%   50%
ARC-Easy        acc        32.79%   25%
HellaSwag       acc_norm   25.22%   25%
ARC-Challenge   acc_norm   20.73%   25%

What is Prune-Heal?

Prune-Heal is a training method intended to decouple loss from perplexity. The claim is that low loss (accurate predictions) combined with high perplexity (broad token distributions) yields a model that reasons instead of memorizing.
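For reference, the conventional definition ties the two quantities together: perplexity is the exponential of the mean per-token cross-entropy (in nats), measured on the same data. The sketch below computes both from hypothetical per-token probabilities (the numbers are illustrative, not taken from this model). It also shows the "effective number of choices" reading of perplexity: a model that spreads probability uniformly over 20 tokens has a perplexity of exactly 20.

```python
import math

def mean_cross_entropy(target_probs):
    """Mean negative log-likelihood (nats) of the correct tokens."""
    return -sum(math.log(p) for p in target_probs) / len(target_probs)

def perplexity(target_probs):
    """Perplexity under the standard definition: exp(mean cross-entropy)."""
    return math.exp(mean_cross_entropy(target_probs))

# Hypothetical probabilities a model assigned to the correct next token.
probs = [0.5, 0.1, 0.02, 0.25]
loss = mean_cross_entropy(probs)
print(f"loss={loss:.3f}  perplexity={perplexity(probs):.2f}")

# Uniform over 20 candidates: perplexity equals the number of candidates.
uniform_20 = [1 / 20] * 8
print(f"uniform-20 perplexity={perplexity(uniform_20):.2f}")

# Perplexity implied by a cross-entropy of 2.8 nats on the same data.
print(f"exp(2.8)={math.exp(2.8):.2f}")
```

Under this definition a loss of 2.8 corresponds to a perplexity of about 16.4, so the loss and perplexity quoted in this card were presumably measured on different data or with a different formula; the card does not say which.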

Training Pipeline

  1. Pretrain on 72M tokens (Wikipedia + TinyStories + Plato)
  2. Prune: iterative magnitude pruning removes 37% of weights across 4 cycles
  3. Heal: retrain without masks, pruned weights regenerate from gradient signal
  4. Q&A: three-phase training (Q&A together, questions, answers), repeated for 3 rounds
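Steps 2 and 3 can be sketched in plain Python on a flat list of weights. This is an illustration under stated assumptions, not the author's training code: the linear sparsity schedule (9.25% of the original weights per cycle), the global (not per-layer) threshold, and the helper names `magnitude_mask`, `iterative_prune`, and `sgd_step` are all hypothetical, and the fine-tuning between pruning cycles is omitted.

```python
import random

def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that zeroes the `sparsity` fraction of smallest-|w| entries."""
    n_prune = int(round(sparsity * len(weights)))
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1.0] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0.0
    return mask

def iterative_prune(weights, target=0.37, cycles=4):
    """Raise sparsity linearly to `target` over `cycles` rounds (assumed schedule)."""
    weights = list(weights)
    mask = [1.0] * len(weights)
    for cycle in range(1, cycles + 1):
        sparsity = target * cycle / cycles
        mask = magnitude_mask(weights, sparsity)
        weights = [w * m for w, m in zip(weights, mask)]
        # A real run would fine-tune here with the mask held fixed.
    return weights, mask

def sgd_step(weights, grads, lr, mask=None):
    """One SGD update; passing a mask freezes pruned entries at zero."""
    if mask is None:
        mask = [1.0] * len(weights)
    return [w - lr * g * m for w, g, m in zip(weights, grads, mask)]

random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(1000)]
pruned, mask = iterative_prune(weights)

grads = [random.gauss(0.0, 1.0) for _ in range(1000)]  # stand-in gradients
masked = sgd_step(pruned, grads, lr=0.01, mask=mask)   # pruning phase: zeros stay zero
healed = sgd_step(pruned, grads, lr=0.01)              # heal phase: no mask applied

print("zeros after pruning:", sum(w == 0.0 for w in pruned))
print("zeros after masked step:", sum(w == 0.0 for w in masked))
print("zeros after heal step:", sum(w == 0.0 for w in healed))
```

The only difference between the two update calls is the mask. With it, pruned entries receive no update and stay at zero; without it, they pick up gradient and become nonzero again, which is the "regeneration" described in step 3.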

Key Numbers

  • 13,190,784 parameters (13.2M)
  • Loss: 2.8 with Perplexity: 21+ (decoupled)
  • Training time: ~45 minutes on a single RTX 3090
  • VRAM: <2GB
  • Training data: 72M tokens (Wikipedia, TinyStories, Plato)

Architecture

Standard LLaMA architecture:

  • 6 layers, d_model=192, 6 attention heads
  • SwiGLU activation, RMSNorm
  • GPT-2 BPE tokenizer (50,257 tokens)
  • 256 token context length
  • Weight-tied embeddings
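The stated parameter count can be reproduced from these hyperparameters. The sketch below assumes a bias-free LLaMA block with full multi-head attention and a SwiGLU hidden size of 768 (4 x d_model); the hidden size is not given in this card and is inferred here because it is the value that makes the arithmetic match. `llama_param_count` is a hypothetical helper name.

```python
def llama_param_count(vocab=50_257, d_model=192, n_layers=6, d_ff=768, tied=True):
    """Count parameters of a bias-free LLaMA-style decoder."""
    embed = vocab * d_model          # token embedding (shared with the output head if tied)
    attn = 4 * d_model * d_model     # Q, K, V and output projections
    mlp = 3 * d_model * d_ff         # SwiGLU: gate, up and down projections
    norms = 2 * d_model              # two RMSNorm weight vectors per layer
    total = embed + n_layers * (attn + mlp + norms) + d_model  # + final RMSNorm
    if not tied:
        total += vocab * d_model     # separate output projection
    return total

print(f"{llama_param_count():,}")  # 13,190,784
```

About 9.65M of the 13.19M parameters (roughly 73%) sit in the tied embedding matrix, leaving about 3.54M in the six transformer layers.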

The Prune-Heal Insight

Current LLMs chase low perplexity through massive scale. The Prune-Heal hypothesis is that high perplexity maintained alongside low loss is the signature of reasoning rather than memorization.

A model with perplexity 20+ considers 20+ plausible continuations and selects based on context. That is choice. That is the start of reasoning.

The prune-heal cycle is intended to achieve this as follows:

  • Pruning disrupts memorized pathways
  • Healing allows weights to regenerate into new, more general patterns
  • The result: same parameter count, but weights that encode structure instead of sequences

Usage
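A minimal loading sketch, assuming the checkpoint is published in Hugging Face `transformers` format (LLaMA architecture, Safetensors weights) with the GPT-2 tokenizer files bundled. `repo_id` is a placeholder for the model's Hub path, which this card does not state, and `generate_text` is a hypothetical helper name.

```python
def generate_text(repo_id, prompt, max_new_tokens=50):
    """Load the checkpoint at `repo_id` and greedily continue `prompt`."""
    # Imported inside the function so defining the helper does not require the libraries.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,                      # greedy decoding
            pad_token_id=tokenizer.eos_token_id,  # GPT-2's tokenizer defines no pad token
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Keep the prompt plus `max_new_tokens` within the 256-token context length.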

Hardware

  • Single NVIDIA RTX 3090 (24GB VRAM, <2GB used)
  • 32GB RAM
  • Trained by one person in spare time

Author

James, Bee Bytez
