# Outlier-40B-V3.2

**Status:** production. MMLU verified on the full sample.
A ternary-quantized Mixture-of-Experts model built as a delta overlay on Qwen/Qwen2.5-14B-Instruct. Production-grade for users who need a smaller-than-70B Outlier deployment.
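Conceptually, a delta overlay stores only the differences against the base checkpoint and merges them at load time. A minimal sketch of that merge step, assuming a simple `W_eff = W_base + dW` scheme (the function and tensor names here are hypothetical, not the actual loader):

```python
import torch

def apply_overlay(base: dict, delta: dict, alpha: float = 1.0) -> dict:
    """Merge a delta overlay into base weights: W_eff = W_base + alpha * dW.

    Tensors absent from the overlay pass through unchanged.
    """
    return {name: (w + alpha * delta[name]) if name in delta else w
            for name, w in base.items()}

# Toy example: one overlaid tensor, one untouched tensor.
base = {"layer.0.weight": torch.ones(2, 2), "layer.1.weight": torch.ones(2, 2)}
delta = {"layer.0.weight": torch.full((2, 2), 0.5)}
merged = apply_overlay(base, delta)
```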
## Model summary
| Field | Value |
|---|---|
| Base model | Qwen/Qwen2.5-14B-Instruct |
| Architecture | Outlier MoE (overlay on Qwen2.5) |
| Parameters | ~30B effective |
| Context length | 32,768 tokens |
| MoE layers | (see config.json — moe_layers field) |
| Experts per layer | 8 |
| Top-k routing | 2 |
| Expert quantization | Ternary (int8 + per-row fp16 scale) |
| MMLU (full sample) | 77.80% ± 0.33% |
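The expert-quantization row can be illustrated with a toy sketch: each expert weight row is snapped to values in {-1, 0, +1} stored as int8, with one fp16 scale per row. This is an assumed scheme for illustration, not the production kernel; the threshold heuristic below is mine.

```python
import torch

def ternary_quantize(w: torch.Tensor, threshold: float = 0.5):
    """Quantize each row of w to int8 in {-1, 0, +1} plus an fp16 per-row scale."""
    scale = w.abs().mean(dim=1, keepdim=True).to(torch.float16)  # per-row fp16 scale
    q = torch.zeros_like(w, dtype=torch.int8)
    mask = w.abs() > threshold * scale.float()  # small weights snap to 0
    q[mask] = torch.sign(w[mask]).to(torch.int8)
    return q, scale

def ternary_dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Reconstruct an fp32 approximation: sign * per-row scale."""
    return q.float() * scale.float()

w = torch.tensor([[0.9, -0.8, 0.01], [0.4, -0.5, 0.45]])
q, s = ternary_quantize(w)
w_hat = ternary_dequantize(q, s)
```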
## Provenance
| Metric | Value |
|---|---|
| MMLU | 77.80% ± 0.33% |
| Sample size (n) | 14,042 |
| Stderr | ±0.0033 |
| Harness | lm_eval 0.4.11 |
| Date measured | Day 12 cluster sprint (2026-04-12) |
| Source file | cloud_sprint_day12/results/40b_v3_2_mmlu_full.json |
See OUTLIER_GROUND_TRUTH_v10.md §2.2 for full provenance chain.
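The reported stderr can be sanity-checked against the simple binomial approximation `sqrt(p * (1 - p) / n)`. lm_eval computes stderr from per-item scores, so a small difference from this back-of-envelope figure is expected:

```python
import math

p, n = 0.7780, 14042  # accuracy and sample size from the table above
stderr = math.sqrt(p * (1 - p) / n)  # binomial approximation
# Comes out near 0.0035, in line with the reported ±0.0033.
print(f"{stderr:.4f}")
```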
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Outlier-Ai/Outlier-40B-V3.2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Outlier-Ai/Outlier-40B-V3.2",
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto",  # shard across available GPUs
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20, do_sample=False)  # greedy decoding
print(tokenizer.decode(out[0]))
```
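The 8-expert, top-2 routing listed in the model summary can be sketched as a toy gate (this is a generic top-k softmax gate for illustration, not the model's actual router code):

```python
import torch
import torch.nn.functional as F

def top2_route(logits: torch.Tensor):
    """Select 2 of 8 experts per token; renormalize their gate weights to sum to 1."""
    weights, experts = torch.topk(logits, k=2, dim=-1)  # best 2 experts per token
    weights = F.softmax(weights, dim=-1)                # renormalize over the chosen 2
    return weights, experts

logits = torch.randn(4, 8)  # 4 tokens, 8 experts
w, idx = top2_route(logits)
```

Each token's output is then the weighted sum of its two selected experts' outputs.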
## V3.3 status
The V3.3 alpha-fix overlay has not yet been trained for 40B. The recipe that worked for 70B (280 trained alpha scalars, ~30 minutes of cloud compute) is expected to transfer. A future sprint will train and verify it.
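For intuition, "280 trained alpha scalars" suggests one learnable scalar per overlay tensor, applied as `W_base + alpha * dW`. The sketch below is an assumed structure, not the actual V3.3 training code:

```python
import torch
import torch.nn as nn

class AlphaFix(nn.Module):
    """One learnable scalar per overlay tensor, applied as W_base + alpha * dW."""
    def __init__(self, num_tensors: int):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(num_tensors))  # init at identity merge

    def merged(self, base, delta):
        return [w + a * d for w, a, d in zip(base, self.alpha, delta)]

fix = AlphaFix(num_tensors=280)  # 280 scalars, matching the 70B recipe
base = [torch.zeros(2) for _ in range(280)]
delta = [torch.ones(2) for _ in range(280)]
out = fix.merged(base, delta)
```

Only the 280 alphas would be optimized, which is consistent with the very short training time quoted above.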
## License
Apache 2.0