Outlier-10B-V2

⚠️ Legacy model — superseded by Outlier-10B-V3.2

This is an early Outlier release. It is kept publicly available for historical reference and reproducibility. New users should prefer the V3.2 model linked above.

What this was

Outlier-10B-V2 was an early Outlier release built on Qwen2.5-7B-Instruct, totalling 7.6B parameters in the legacy V2 dense configuration. It has been superseded by the V3.2 architecture, which adds significant improvements in training methodology and runtime efficiency.

What's new in V3.2

  • Zero-delta expert initialization (faster convergence)
  • CAKLD distillation training
  • Three-tier paged runtime
  • Cross-layer expert prefetch
  • Alpha-only TTT for personalization

See Outlier-10B-V3.2 for the latest.
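Of the items above, cross-layer expert prefetch is the most mechanical to illustrate: while layer i computes, the runtime loads layer i+1's expert weights from a slower tier in the background. The sketch below is an illustrative assumption, not the released runtime; the function names and timings are placeholders.

```python
import concurrent.futures as cf
import time


def load_experts(layer: int) -> str:
    """Stand-in for paging expert weights up from CPU/disk."""
    time.sleep(0.01)  # simulate transfer latency
    return f"experts[{layer}]"


def compute(layer: int, experts: str) -> str:
    """Stand-in for running the layer's forward pass."""
    time.sleep(0.01)
    return f"out[{layer}] using {experts}"


def forward_with_prefetch(n_layers: int) -> list[str]:
    """Overlap the next layer's weight load with the current layer's compute."""
    outputs = []
    with cf.ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load_experts, 0)
        for layer in range(n_layers):
            experts = pending.result()  # wait for this layer's weights
            if layer + 1 < n_layers:
                # Kick off the next layer's load while we compute this one.
                pending = pool.submit(load_experts, layer + 1)
            outputs.append(compute(layer, experts))
    return outputs
```

When the transfer and compute times are comparable, this roughly halves end-to-end latency relative to loading each layer's experts on demand.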

Architecture

Outlier uses a shared expert + ternary delta expert architecture:

  • Shared expert: The full base model serves as a shared dense expert
  • Ternary delta experts: Additional experts stored at 1.58 bits/weight using ternary quantization ({-1, 0, +1})
  • Dense-Sparse-Dense (DSD) layer pattern: Alternating dense and sparse layers for efficient compute
  • Zero-delta initialization: Experts initialized to zero so training begins from the base model
  • Top-2 routing: Each token activates the shared expert plus the top-2 ternary delta experts
  • Three-tier paged runtime: GPU → CPU → disk paging for consumer hardware deployment
  • Cross-layer expert prefetch: Prefetches next-layer experts during current-layer compute
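The shared-expert + ternary-delta design above can be sketched as a single layer: every token passes through the dense shared expert, a router picks the top-2 delta experts, and each delta's weights are quantized to {-1, 0, +1}. This is a minimal PyTorch sketch; the class name, the 0.5 quantization threshold, and the per-expert scale are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TernaryDeltaMoE(nn.Module):
    """Sketch: shared dense expert plus top-k ternary delta experts."""

    def __init__(self, d_model: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        # Shared expert: stands in for the base model's dense FFN.
        self.shared = nn.Linear(d_model, d_model)
        self.router = nn.Linear(d_model, n_experts)
        # Zero-delta initialization: deltas start at exactly zero,
        # so the layer's initial output equals the base model's.
        self.deltas = nn.Parameter(torch.zeros(n_experts, d_model, d_model))
        self.scales = nn.Parameter(torch.ones(n_experts))
        self.top_k = top_k

    def ternary(self, w: torch.Tensor) -> torch.Tensor:
        # Quantize delta weights to {-1, 0, +1} (~1.58 bits/weight);
        # a straight-through estimator keeps the deltas trainable.
        q = torch.sign(torch.round(w))
        return w + (q - w).detach()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Every token always activates the shared dense expert.
        out = self.shared(x)
        # Top-2 routing over the ternary delta experts.
        gates = F.softmax(self.router(x), dim=-1)
        top_g, top_i = gates.topk(self.top_k, dim=-1)
        w = self.ternary(self.deltas)          # (E, d, d), ternary values
        for k in range(self.top_k):
            idx = top_i[..., k]                # (batch, seq)
            # Gather each token's expert matrix and apply its learned scale.
            delta_w = w[idx] * self.scales[idx][..., None, None]
            out = out + top_g[..., k : k + 1] * torch.einsum(
                "...d,...de->...e", x, delta_w
            )
        return out
```

Because the deltas are zero at initialization, a freshly constructed layer reproduces the shared expert's output exactly, so training begins from the base model's behavior as the bullets above describe.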

License

Apache 2.0. The base model (Qwen2.5-7B-Instruct) was created by Alibaba Cloud and is used under its original license terms.

Built by

Matt Kerr · Kerr & Company LLC · Grand Rapids, MI
