# Outlier-40B-V3.2

Status: production. MMLU verified, full sample.

A ternary-quantized Mixture-of-Experts model built as a delta overlay on Qwen/Qwen2.5-14B-Instruct. Production-grade, intended for users who need an Outlier deployment smaller than 70B.
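The routing scheme listed in the summary below (8 experts per MoE layer, top-k = 2) can be sketched as follows. This is a minimal illustration assuming standard softmax gating over the two selected experts; the function name and shapes are assumptions, not the model's actual implementation:

```python
import numpy as np

def top2_route(router_logits):
    """Pick the top-2 experts per token and renormalize their gate weights.

    router_logits: (num_tokens, num_experts). Illustrative sketch only; the
    card states 8 experts per MoE layer with top-k = 2 routing.
    """
    # Indices of the two largest logits per token, largest first.
    top2 = np.argsort(router_logits, axis=1)[:, -2:][:, ::-1]
    picked = np.take_along_axis(router_logits, top2, axis=1)
    # Softmax over only the two selected experts.
    gates = np.exp(picked - picked.max(axis=1, keepdims=True))
    gates /= gates.sum(axis=1, keepdims=True)
    return top2, gates

logits = np.array([[0.1, 2.0, -1.0, 0.5]])
idx, gates = top2_route(logits)
```

Each token's output is then the gate-weighted sum of its two selected experts' outputs; the other six experts are skipped entirely, which is what keeps the effective compute well below the total parameter count.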

## Model summary

| Field | Value |
|---|---|
| Base model | Qwen/Qwen2.5-14B-Instruct |
| Architecture | Outlier MoE (overlay on Qwen2.5) |
| Parameters | ~30B effective |
| Context length | 32,768 tokens |
| MoE layers | see the `moe_layers` field in config.json |
| Experts per layer | 8 |
| Top-k routing | 2 |
| Expert quantization | Ternary (int8 + per-row fp16 scale) |
| MMLU (full sample) | 77.80% ± 0.33% |
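The ternary expert quantization (int8 codes plus a per-row fp16 scale) can be sketched as below. The thresholding rule and scale formula are assumptions for illustration; the card only states the storage format, not the quantization recipe:

```python
import numpy as np

np.random.seed(0)

def ternary_quantize(w, threshold=0.05):
    """Quantize a weight matrix row-wise to codes in {-1, 0, +1} (stored as
    int8) with one fp16 scale per row. Illustrative sketch only."""
    # Zero out small weights, keep the sign of the rest.
    codes = (np.sign(w) * (np.abs(w) > threshold)).astype(np.int8)
    mask = codes != 0
    # Per-row scale: mean magnitude of the surviving weights (0 if none survive).
    scale = np.where(mask.any(axis=1),
                     (np.abs(w) * mask).sum(axis=1) / np.maximum(mask.sum(axis=1), 1),
                     0.0).astype(np.float16)
    return codes, scale

def ternary_dequantize(codes, scale):
    """Reconstruct an fp32 approximation: code * row_scale."""
    return codes.astype(np.float32) * scale[:, None].astype(np.float32)

w = np.random.randn(4, 8).astype(np.float32) * 0.1
codes, scale = ternary_quantize(w)
w_hat = ternary_dequantize(codes, scale)
```

Storing one int8 code per weight plus one fp16 scale per row is what makes the expert tensors cheap relative to fp16 weights, at the cost of reconstruction error in `w_hat`.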

## Provenance

| Metric | Value |
|---|---|
| MMLU | 77.80% ± 0.33% |
| Sample size (n) | 14,042 |
| Stderr | ±0.0033 |
| Harness | lm_eval 0.4.11 |
| Date measured | Day 12 cluster sprint (2026-04-12) |
| Source file | cloud_sprint_day12/results/40b_v3_2_mmlu_full.json |

See OUTLIER_GROUND_TRUTH_v10.md §2.2 for full provenance chain.
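As a rough consistency check, the reported standard error can be compared against a simple binomial approximation from the accuracy and sample size. The figures need not match exactly, since lm_eval aggregates stderr per subtask rather than pooling all questions:

```python
import math

# Reported values from the provenance table above.
p, n = 0.7780, 14042

# Binomial approximation: sqrt(p * (1 - p) / n).
se = math.sqrt(p * (1 - p) / n)
print(f"binomial stderr ~ {se:.4f}")  # in the same ballpark as the reported ±0.0033
```

Agreement to within a few ten-thousandths is a reasonable sanity signal that the reported n and stderr belong to the same run.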

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Outlier-Ai/Outlier-40B-V3.2"

# The overlay architecture ships custom modeling code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20, do_sample=False)  # greedy decoding
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

## V3.3 status

The V3.3 alpha-fix overlay has not yet been trained for the 40B model. The recipe that worked for 70B (280 trained alpha scalars, roughly 30 minutes of cloud compute) is expected to transfer; a future sprint will train and verify it.
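In schematic form, the alpha-fix idea amounts to rescaling each overlay delta by a trained scalar. This is a hypothetical sketch; the card does not document the V3.3 recipe, so the function name and the exact formula `W_eff = W_base + alpha * Delta` are assumptions:

```python
import numpy as np

def apply_alpha_overlay(w_base, delta, alpha):
    """Form an effective weight as base plus a scalar-scaled delta overlay.

    Hypothetical: one trained alpha per overlay tensor (280 scalars for the
    70B recipe, per the card), applied as W_eff = W_base + alpha * Delta.
    """
    return w_base + alpha * delta

w_base = np.zeros((2, 2), dtype=np.float32)
delta = np.ones((2, 2), dtype=np.float32)
w_eff = apply_alpha_overlay(w_base, delta, alpha=0.5)
```

Training only the alpha scalars (one per tensor) rather than the deltas themselves would explain the very small compute budget quoted for the 70B fix.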

## License

Apache 2.0
