Tiny-ML Leaderboard

Sub-100M parameter language models, same eval harness, transparent methodology.

Why this exists. The community deserves a single place to compare tiny LMs fairly. We include every model with verifiable benchmarks: ours, our competitors', and yours. Submit a model via PR.

Detailed Results

Model	Org	Params	WikiText-2 ↓	BLiMP ↑	ARC-Easy ↑	Training Tokens	Links
Supra-50M-Instruct	SupraLabs	51.8M	not reported	76.3%	52.2%	20B	card base
GPT-S-5M	Axiomic Labs	5.16M	2.56	72.27%	35.69%	25B	card
Glint-1.3 (merged)	CompactAI	982K	3.08	68.7%	32.5%	100B	card
Supra-Mini-v5	SupraLabs	7.87M	2.66	63.5%	34.4%	—	card
Glint-1	CompactAI	1M	4.07	61.2%	32.0%	100B	card
Supra-Mini-v4	SupraLabs	2.62M	3.17	60.7%	31.5%	—	card
Glint-0.4	CompactAI	1M	5.24	58.5%	31.0%	10B	card
Supra-Mini-v3	SupraLabs	468K	4.49	55.3%	27.3%	—	card
Supra-Mini-v2	SupraLabs	168K	7.79	53.5%	26.8%	—	card
Glint-0.2	CompactAI	1M	TBD	49.8%	27.0%	~100M	card
Glint-0.3	CompactAI	1M	TBD	47.3%	25.5%	~100M	card
Glint-0.1	CompactAI	1M	TBD	46.7%	21.0%	~100M	card
Shard-1	CompactAI	54.5M	TBD	TBD	TBD	~20B	card
StorySupra-10M	SupraLabs	12.6M	not reported	not reported	not reported	—	card
DistillSupra-0.2M	SupraLabs	289K	not reported	not reported	not reported	—	card
MicroSupra-1k	SupraLabs	1K	not reported	not reported	not reported	—	card
TrueMath	CompactAI	1-layer	—	—	—	synthetic	card

Benchmark Overview

[Charts, one per benchmark: BLiMP ↑ (higher is better), ARC-Easy ↑ (higher is better), WikiText-2 ↓ (lower is better). Legend: CompactAI, SupraLabs, Axiomic Labs.]

Add your model

Open a PR on this Space with your model's benchmark results and reproduction steps. We require the parameter count, training data provenance, the eval harness used, and scores for at least 2 of the 3 benchmarks.
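
Before opening a PR, you may want to check your entry against the requirements above. The snippet below is a minimal sketch, not an official validator: the leaderboard does not define a submission schema, so the field names (`params`, `training_data_provenance`, `eval_harness`, `scores`) and the benchmark keys are hypothetical.

```python
from typing import Any

# Hypothetical field names; the leaderboard does not publish a submission schema.
REQUIRED_FIELDS = ("params", "training_data_provenance", "eval_harness")
BENCHMARKS = ("wikitext2", "blimp", "arc_easy")
MIN_BENCHMARKS = 2  # "at least 2 of the 3 benchmarks"


def check_submission(sub: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means nothing is missing."""
    problems = []

    # Each required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if sub.get(field) in (None, ""):
            problems.append(f"missing field: {field}")

    # Count only benchmarks that carry a numeric score.
    scores = sub.get("scores") or {}
    reported = [b for b in BENCHMARKS if isinstance(scores.get(b), (int, float))]
    if len(reported) < MIN_BENCHMARKS:
        problems.append(
            f"need scores for at least {MIN_BENCHMARKS} of "
            f"{len(BENCHMARKS)} benchmarks, got {len(reported)}"
        )
    return problems


if __name__ == "__main__":
    # Placeholder entry for illustration only; not a real model.
    example = {
        "params": 1_000_000,
        "training_data_provenance": "describe your training data here",
        "eval_harness": "name and version of the harness you ran",
        "scores": {"blimp": 50.0, "arc_easy": 25.0},
    }
    for problem in check_submission(example):
        print(problem)
```

The check only confirms that the required items are present. It does not verify that the scores are reproducible, which is what the reproduction steps in the PR are for.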