---
title: Outlier-Ai
---

# Outlier-Ai

**Ternary-quantized Mixture-of-Experts for consumer hardware. 3 patents filed. 14 days solo from zero to 150B.**

Outlier is a research project building dense-LLM-quality models on top of Qwen2.5 via ternary-quantized delta MoE experts. The architecture stores weights as `{-1, 0, +1}` (~1.58 bits) plus a per-row fp16 scale, achieving a 6×–8× memory reduction over fp16 while preserving accuracy.
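
As a concrete (if simplified) illustration of that storage format, the sketch below quantizes a weight matrix to signs plus a per-row fp16 scale. The fixed deadzone fraction and least-squares scale rule are assumptions for illustration, not the project's actual Tequila calibration:

```python
import torch

def ternary_quantize(w: torch.Tensor, deadzone: float = 0.75):
    """Quantize a 2-D weight matrix to signs in {-1, 0, +1} plus a per-row
    fp16 scale. Illustrative only: the deadzone here is a fixed fraction of
    the per-row mean |w|, whereas Tequila calibrates it adaptively."""
    mean_abs = w.abs().mean(dim=1, keepdim=True)
    signs = torch.where(w.abs() > deadzone * mean_abs, w.sign(), torch.zeros_like(w))
    # Least-squares scale for the chosen signs: argmin_s ||w - s * signs||^2.
    scale = (w * signs).sum(dim=1, keepdim=True) / signs.abs().sum(dim=1, keepdim=True).clamp(min=1)
    return signs.to(torch.int8), scale.to(torch.float16)

def ternary_dequantize(signs: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Broadcasts the per-row scale back over each row's signs.
    return signs.to(torch.float16) * scale
```

Note the int8 tensor is just the unpacked compute form; getting near 1.58 bits on disk requires packing (roughly five ternary values per byte).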

## Model lineup

| Model | MMLU | Context | Status | Effective params |
|---|---|---|---|---|
| [Outlier-10B-V3.2](https://huggingface.co/Outlier-Ai/Outlier-10B-V3.2) | – | 32K | research preview | ~23B |
| [Outlier-40B-V3.2](https://huggingface.co/Outlier-Ai/Outlier-40B-V3.2) | 77.80% | 32K | production | ~30B |
| [Outlier-70B-V3.3](https://huggingface.co/Outlier-Ai/Outlier-70B-V3.3) † | **83.10%** | **128K** | **production (new)** | ~40B |
| [Outlier-150B-V3.2](https://huggingface.co/Outlier-Ai/Outlier-150B-V3.2) | 84.46% | 32K | production | ~150B |

† V3.3 is V3.2 base weights + a 280-scalar trained alpha overlay (15 KB) + YaRN 4× context extension. **Same weights as V3.2, +1.61pp MMLU, 4× longer context.**
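
To make the overlay concrete, here is a minimal sketch of installing one trained scalar per (layer, expert) pair at load time. The module path and the 35-layer × 8-expert factorization of the 280 scalars are assumptions for illustration, not the repo's actual API:

```python
import torch

def apply_alpha_overlay(model, overlay: dict[tuple[int, int], float]) -> None:
    """Install trained per-expert contribution scalars onto a loaded model.

    `overlay` maps (layer_idx, expert_idx) -> alpha. A hypothetical layout of
    35 MoE layers x 8 experts yields the 280 scalars mentioned above; the
    attribute path `layers[i].moe.experts[j].alpha` is illustrative only.
    """
    for (layer_idx, expert_idx), alpha in overlay.items():
        expert = model.layers[layer_idx].moe.experts[expert_idx]
        expert.alpha = torch.tensor(alpha, dtype=torch.float16)
```

Because only these scalars differ from V3.2, shipping V3.3 means distributing a ~15 KB overlay file rather than a new checkpoint.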

## Architecture

- **Base:** Qwen2.5 family (7B / 14B / 32B / 72B for 10B / 40B / 70B / 150B respectively)
- **MoE delta:** Ternary-quantized expert weights stored as `int8 sign × fp16 per-row scale`, summed with the shared base FFN output via per-expert alpha contribution scalars (see the sketch after this list)
- **Routing:** Per-layer router (top-k = 2, n_experts = 8 typically)
- **150B special:** Cross-layer expert sharing (ReXMoE) – 88 unique experts shared across 44 routers via 11 groups × 4 PSR variants
- **Training:** CAKLD (combined adaptive knowledge distillation) loss, alpha-gated delta updates, frozen base
- **Quantization:** Tequila adaptive deadzone for ternary, LoTA-QAF for activation quantization
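
Putting the base path, routing, and alpha pieces together, here is a minimal sketch of one layer's forward pass under the assumptions above. It collapses each expert's delta to a single linear map for brevity (a real FFN delta would cover the up/gate/down projections), and every name is illustrative:

```python
import torch
import torch.nn.functional as F

def moe_delta_forward(x, base_ffn, router, experts, top_k: int = 2):
    """Shared dense FFN output plus routed, alpha-scaled ternary deltas.

    `experts` is a list of dicts holding int8 `signs`, fp16 per-row `scale`,
    and a scalar `alpha`. Shapes: x is [tokens, d_in]; each delta maps
    d_in -> d_out. Illustrative only: real kernels operate on packed trits
    and never materialize the dequantized delta matrix.
    """
    y = base_ffn(x)                                  # shared dense path
    logits = router(x)                               # [tokens, n_experts]
    vals, idx = logits.topk(top_k, dim=-1)
    gate = F.softmax(vals, dim=-1)                   # renormalize over top-k
    for k in range(top_k):
        for e, ex in enumerate(experts):
            mask = idx[:, k] == e                    # tokens routed to expert e
            if not mask.any():
                continue
            delta_w = ex["signs"].to(x.dtype) * ex["scale"].to(x.dtype)
            y[mask] = y[mask] + ex["alpha"] * gate[mask, k].unsqueeze(-1) * (x[mask] @ delta_w.T)
    return y
```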

## Patents (filed)

1. **Per-channel ternary scale recalibration** – adaptive per-output-channel scaling for ternary weights
2. **Cross-layer expert sharing (ReXMoE)** – used in Outlier-150B (see the index sketch after this list)
3. **Alpha contribution overlay** – the V3.3 fix; 280 trained scalars recover a 1.34pp MMLU regression on 70B with 250,000× fewer trainable parameters than a full LoRA (i.e., a full LoRA here would train on the order of 280 × 250,000 = 70M parameters)
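
The ReXMoE numbers are self-consistent: 44 routers = 11 groups × 4 PSR variants, and 88 unique experts = 11 groups × 8. One layout that satisfies this arithmetic is group-major numbering, sketched below; this mapping is a guess for illustration, not the patented construction:

```python
def router_to_shared_experts(router_idx: int, variants_per_group: int = 4,
                             experts_per_group: int = 8) -> list[int]:
    """Map one of 44 routers to its shared expert ids in an 88-expert pool.

    Guessed layout: routers 4*g .. 4*g+3 are the four PSR variants of
    group g, all routing over group g's 8 experts (11 * 8 = 88 unique).
    """
    group = router_idx // variants_per_group   # 44 routers -> 11 groups
    base = group * experts_per_group
    return list(range(base, base + experts_per_group))
```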

## Tagline

> Built in 14 days on $900 and a Mac Studio.

The full Outlier project went from a blank repo to a 150B model with verified MMLU in April 2026, built by a single developer running cloud sprints between Mac Studio sessions. Total cloud spend through V3.3: ~$300. Total wall clock: 14 days.

## Resources

- [Paper draft (arXiv)](#) – code 396SXN cs.LG (pending submission)
- [outlier.host](https://outlier.host)
- [GitHub: Outlier-host/outlier](https://github.com/Outlier-host/outlier)
- [v10 ground truth](https://github.com/Outlier-host/outlier/blob/main/OUTLIER_GROUND_TRUTH_v10.md) – single source of truth for every benchmark number

## License

All Outlier model weights and code are released under Apache 2.0.