Outlier-Ai

Ternary-quantized Mixture-of-Experts for consumer hardware. 3 patents filed. 14 days solo from zero to 150B.

Outlier is a research project building models of dense-LLM quality on top of Qwen2.5 via ternary-quantized delta MoE experts. The architecture stores expert weights as {-1, 0, +1} (~1.58 bits per weight) plus a per-row fp16 scale, achieving a 6×–8× memory reduction over fp16 while preserving accuracy.
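As a rough illustration of the storage scheme described above, the sketch below ternarizes one weight row and estimates the resulting compression. It assumes absmean-based thresholding with a fixed deadzone and 2-bit packing of the ternary values; the names `ternarize_row`, `dequantize_row`, and `compression_ratio` are hypothetical and are not taken from the Outlier code.

```python
def ternarize_row(row, deadzone=0.5):
    """Quantize one weight row to {-1, 0, +1} plus a single scale.

    Assumption: weights whose magnitude is at most deadzone * mean(|w|)
    become 0; the project's actual rule is not described in this card.
    """
    absmean = sum(abs(w) for w in row) / len(row)
    threshold = deadzone * absmean
    signs = [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in row]
    kept = [abs(w) for w, s in zip(row, signs) if s != 0]
    # One scale per row: the mean magnitude of the weights that survived.
    scale = sum(kept) / len(kept) if kept else 0.0
    return signs, scale


def dequantize_row(signs, scale):
    """Reconstruct an approximate row from its ternary signs and scale."""
    return [s * scale for s in signs]


def compression_ratio(cols, bits_per_weight, scale_bits=16, base_bits=16):
    """fp16 bits divided by ternary bits for one row, including its scale."""
    return (cols * base_bits) / (cols * bits_per_weight + scale_bits)


signs, scale = ternarize_row([0.9, -0.05, -1.1, 0.02])
approx = dequantize_row(signs, scale)
ratio_2bit = compression_ratio(cols=4096, bits_per_weight=2)
```

With 2-bit packing and a 4096-column row, the ratio comes out just under 8×, which is consistent with the upper end of the range quoted above. Any other overhead (unquantized layers, router weights, metadata) is not modeled here.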

Model lineup

Model                  MMLU     Context   Status             Effective params
Outlier-10B-V3.2       -        32K       research preview   ~23B
Outlier-40B-V3.2       77.80%   32K       production         ~30B
Outlier-70B-V3.3 ⭐    83.10%   128K      production (new)   ~40B
Outlier-150B-V3.2      84.46%   32K       production         ~150B

⭐ V3.3 is V3.2 base weights + a 280-scalar trained alpha overlay (15 KB) + YaRN 4× context extension. Same weights as V3.2, +1.61pp MMLU, 4× longer context.
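The context arithmetic behind the 4× extension can be sketched as follows. The `rope_scaling` dictionary uses the form that the upstream Qwen2.5 model cards describe for `config.json`; whether Outlier-70B-V3.3 ships this exact entry is an assumption.

```python
# Base context of V3.2 as listed in the model lineup: 32K tokens.
base_context = 32_768
yarn_factor = 4.0

# Assumed config entry, following the upstream Qwen2.5 convention for YaRN.
rope_scaling = {
    "type": "yarn",
    "factor": yarn_factor,
    "original_max_position_embeddings": base_context,
}

extended_context = int(base_context * yarn_factor)
print(extended_context)  # 131072 tokens, i.e. 128K
```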

Architecture

  • Base: Qwen2.5 family (7B / 14B / 32B / 72B for 10B / 40B / 70B / 150B respectively)
  • MoE delta: Ternary-quantized expert weights stored as int8 sign × fp16 per-row scale, summed with the shared base FFN output via per-expert alpha contribution scalars
  • Routing: Per-layer router (top-k = 2, n_experts = 8 typically)
  • 150B special: Cross-layer expert sharing (ReXMoE): 88 unique experts shared across 44 routers via 11 groups × 4 PSR variants
  • Training: CAKLD (combined adaptive knowledge distillation) loss, alpha-gated delta updates, frozen base
  • Quantization: Tequila adaptive deadzone for ternary, LoTA-QAF for activation quantization
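The way the MoE delta combines with the frozen base can be sketched for a single token as below. This is a minimal sketch under several assumptions: each expert is reduced to one ternary linear map (real experts are presumably full FFN blocks), gates are a softmax over the selected router logits, and all function names are hypothetical rather than taken from the Outlier code.

```python
import math


def matvec(matrix, vec):
    """Dense matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def ternary_matvec(signs, scales, vec):
    """Each row is a sign vector in {-1, 0, +1} times one per-row scale."""
    return [scale * sum(s * x for s, x in zip(row, vec))
            for row, scale in zip(signs, scales)]


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def moe_delta_forward(x, base_out, router_w, experts, alphas, top_k=2):
    """Return base_out + sum over top-k experts of alpha_e * gate_e * delta_e(x)."""
    logits = matvec(router_w, x)
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    gates = softmax([logits[i] for i in chosen])  # assumed gating rule
    out = list(base_out)
    for gate, idx in zip(gates, chosen):
        signs, scales = experts[idx]
        delta = ternary_matvec(signs, scales, x)
        out = [o + alphas[idx] * gate * d for o, d in zip(out, delta)]
    return out


# Toy example: 3 experts, 2-dimensional activations.
x = [1.0, 2.0]
base_out = [0.5, -0.5]  # stands in for the frozen base FFN output
router_w = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
experts = [
    ([[1, 1], [0, 1]], [0.25, 0.25]),
    ([[1, 0], [0, -1]], [0.5, 0.5]),
    ([[-1, 0], [1, 1]], [0.1, 0.1]),
]
y = moe_delta_forward(x, base_out, router_w, experts, alphas=[0.0, 2.0, 0.0], top_k=1)
```

One consequence of this structure is that setting every alpha to zero returns the base FFN output unchanged, so the alpha scalars control how much each expert is allowed to move the model away from the frozen base.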

Patents (filed)

  1. Per-channel ternary scale recalibration: adaptive per-output-channel scaling for ternary weights
  2. Cross-layer expert sharing (ReXMoE): used in Outlier-150B
  3. Alpha contribution overlay: the V3.3 fix; 280 trained scalars recover a 1.34pp MMLU regression on 70B with 250,000× fewer trainable parameters than full LoRA
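The counts quoted for the expert sharing and the alpha overlay can be checked with a few lines of arithmetic. The sketch assumes 8 experts per router (the "typical" value given under Architecture) and an even split of unique experts across groups; neither is confirmed for the 150B model.

```python
# Cross-layer expert sharing, using the figures quoted for Outlier-150B.
groups = 11
variants_per_group = 4
unique_experts = 88
experts_per_router = 8  # assumption: the "typical" n_experts value

routers = groups * variants_per_group               # 44 routers
experts_per_group = unique_experts // groups        # 8, assuming an even split
unshared_experts = routers * experts_per_router     # 352 without sharing
sharing_factor = unshared_experts / unique_experts  # 4.0

# Alpha overlay versus the LoRA size implied by the quoted 250,000x ratio.
overlay_scalars = 280
quoted_ratio = 250_000
implied_lora_params = overlay_scalars * quoted_ratio  # 70,000,000
```

Under these assumptions, sharing stores a quarter of the experts that 44 independent routers would need, and the quoted ratio implies a LoRA baseline of about 70M trainable parameters.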

Tagline

Built in 14 days on $900 and a Mac Studio.

The full Outlier project went from a blank repo to a 150B model with verified MMLU in April 2026, built by a single developer running cloud sprints between Mac Studio sessions. Total cloud spend through V3.3: ~$300. Total wall clock: 14 days.

Resources

License

All Outlier model weights and code are released under Apache 2.0.
