Tiny-ML Leaderboard

Sub-100M-parameter language models, same eval harness, transparent methodology.

Why this exists. The community deserves a single place to compare tiny LMs fairly. We include every model with verifiable benchmarks: ours, our competitors', and yours. Submit a model via PR.

Detailed Results

Model              | Org          | Params  | WikiText-2 ↓ | BLiMP ↑      | ARC-Easy ↑   | Training Tokens
Supra-50M-Instruct | SupraLabs    | 51.8M   | not reported | 76.3%        | 52.2%        | 20B
GPT-S-5M           | Axiomic Labs | 5.16M   | 2.56         | 72.27%       | 35.69%       | 25B
Glint-1.3 (merged) | CompactAI    | 982K    | 3.08         | 68.7%        | 32.5%        | 100B
Supra-Mini-v5      | SupraLabs    | 7.87M   | 2.66         | 63.5%        | 34.4%        |
Glint-1            | CompactAI    | 1M      | 4.07         | 61.2%        | 32.0%        | 100B
Supra-Mini-v4      | SupraLabs    | 2.62M   | 3.17         | 60.7%        | 31.5%        |
Glint-0.4          | CompactAI    | 1M      | 5.24         | 58.5%        | 31.0%        | 10B
Supra-Mini-v3      | SupraLabs    | 468K    | 4.49         | 55.3%        | 27.3%        |
Supra-Mini-v2      | SupraLabs    | 168K    | 7.79         | 53.5%        | 26.8%        |
Glint-0.2          | CompactAI    | 1M      | TBD          | 49.8%        | 27.0%        | ~100M
Glint-0.3          | CompactAI    | 1M      | TBD          | 47.3%        | 25.5%        | ~100M
Glint-0.1          | CompactAI    | 1M      | TBD          | 46.7%        | 21.0%        | ~100M
Shard-1            | CompactAI    | 54.5M   | TBD          | TBD          | TBD          | ~20B
StorySupra-10M     | SupraLabs    | 12.6M   | not reported | not reported | not reported |
DistillSupra-0.2M  | SupraLabs    | 289K    | not reported | not reported | not reported |
MicroSupra-1k      | SupraLabs    | 1K      | not reported | not reported | not reported |
TrueMath           | CompactAI    | 1-layer |              |              |              | synthetic
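The rows above appear to be ordered by BLiMP, best first, with TBD and not-reported entries at the bottom. The sketch below reproduces that ordering under that assumption. The row format and field names are hypothetical; the metric directions mirror the ↑ / ↓ markers in the header, so the same function can rank by WikiText-2 with the lowest value first.

```python
# Hypothetical row format: None stands for "TBD" or "not reported".
ROWS = [
    {"model": "Glint-0.2", "blimp": 49.8, "arc_easy": 27.0, "wikitext2": None},
    {"model": "Supra-50M-Instruct", "blimp": 76.3, "arc_easy": 52.2, "wikitext2": None},
    {"model": "Shard-1", "blimp": None, "arc_easy": None, "wikitext2": None},
    {"model": "GPT-S-5M", "blimp": 72.27, "arc_easy": 35.69, "wikitext2": 2.56},
]

# Direction of each metric, following the ↑ / ↓ markers in the table header.
HIGHER_IS_BETTER = {"blimp": True, "arc_easy": True, "wikitext2": False}


def sort_key(row: dict, metric: str) -> tuple:
    """Key that sorts better scores first and missing scores last."""
    value = row[metric]
    if value is None:
        return (1, 0.0)  # unscored rows sink to the bottom
    # Negate higher-is-better metrics so an ascending sort puts the best first.
    return (0, -value if HIGHER_IS_BETTER[metric] else value)


def rank(rows: list, metric: str = "blimp") -> list:
    # sorted() is stable, so unscored rows keep their input order.
    return sorted(rows, key=lambda r: sort_key(r, metric))


for row in rank(ROWS):
    print(row["model"], row["blimp"])
```

Because missing values are pushed to the end instead of being treated as zero, a model with only one reported benchmark is never ranked below a model that actually scored worse on that benchmark.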

Benchmark Overview

[Charts: BLiMP ↑ (higher is better), ARC-Easy ↑ (higher is better), and WikiText-2 ↓ (lower is better), with a legend for CompactAI, SupraLabs, and Axiomic Labs.]

Add your model

Open a PR on this Space with your model's benchmark results and reproduction steps. We require the parameter count, training data provenance, the eval harness used, and scores for at least 2 of the 3 benchmarks (WikiText-2, BLiMP, ARC-Easy).
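A pre-submission check for these requirements might look like the minimal sketch below. The field names (`params`, `training_data_provenance`, `eval_harness`, `scores`) and the dict layout are hypothetical; this page does not specify a submission format, so adapt the keys to whatever the PR template actually asks for.

```python
# Hypothetical submission keys; this page does not define a schema.
REQUIRED_FIELDS = ("params", "training_data_provenance", "eval_harness")
BENCHMARKS = ("wikitext2", "blimp", "arc_easy")
MIN_BENCHMARKS = 2  # "scores for at least 2 of the 3 benchmarks"


def validate_submission(sub: dict) -> list[str]:
    """Return a list of problems; an empty list means the stated rules are met."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not sub.get(field):
            problems.append(f"missing required field: {field}")
    scores = sub.get("scores") or {}
    reported = [b for b in BENCHMARKS if scores.get(b) is not None]
    if len(reported) < MIN_BENCHMARKS:
        problems.append(
            f"need scores for at least {MIN_BENCHMARKS} of {len(BENCHMARKS)} "
            f"benchmarks, got {len(reported)}"
        )
    return problems
```

Returning every problem at once, rather than stopping at the first, lets a submitter fix a PR in a single pass.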