# Outlier-40B-V3.2

**Status:** production. MMLU verified on the full sample.
A ternary-quantized Mixture-of-Experts model built as a delta overlay on Qwen/Qwen2.5-14B-Instruct. Production-grade for users who need a smaller-than-70B Outlier deployment.
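Conceptually, a delta overlay stores only the differences against the base checkpoint and merges them at load time. A minimal sketch of that merge step, assuming a simple `W_eff = W_base + dW` scheme (the function and tensor names here are hypothetical, not the actual loader):

```python
import torch

def apply_overlay(base: dict, delta: dict, alpha: float = 1.0) -> dict:
    """Merge a delta overlay into base weights: W_eff = W_base + alpha * dW.

    Tensors absent from the overlay pass through unchanged.
    """
    return {name: (w + alpha * delta[name]) if name in delta else w
            for name, w in base.items()}

# Toy example: one overlaid tensor, one untouched tensor.
base = {"layer.0.weight": torch.ones(2, 2), "layer.1.weight": torch.ones(2, 2)}
delta = {"layer.0.weight": torch.full((2, 2), 0.5)}
merged = apply_overlay(base, delta)
```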
## Model summary
| Field | Value |
|---|---|
| Base model | Qwen/Qwen2.5-14B-Instruct |
| Architecture | Outlier MoE (overlay on Qwen2.5) |
| Parameters | ~30B effective |
| Context length | 32,768 tokens |
| MoE layers | (see config.json — moe_layers field) |
| Experts per layer | 8 |
| Top-k routing | 2 |
| Expert quantization | Ternary (int8 + per-row fp16 scale) |
| MMLU (full sample) | 77.80% ± 0.33% |
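The expert-quantization row can be illustrated with a toy sketch: each expert weight row is snapped to values in {-1, 0, +1} stored as int8, with one fp16 scale per row. This is an assumed scheme for illustration, not the production kernel; the threshold heuristic below is mine.

```python
import torch

def ternary_quantize(w: torch.Tensor, threshold: float = 0.5):
    """Quantize each row of w to int8 in {-1, 0, +1} plus an fp16 per-row scale."""
    scale = w.abs().mean(dim=1, keepdim=True).to(torch.float16)  # per-row fp16 scale
    q = torch.zeros_like(w, dtype=torch.int8)
    mask = w.abs() > threshold * scale.float()  # small weights snap to 0
    q[mask] = torch.sign(w[mask]).to(torch.int8)
    return q, scale

def ternary_dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Reconstruct an fp32 approximation: sign * per-row scale."""
    return q.float() * scale.float()

w = torch.tensor([[0.9, -0.8, 0.01], [0.4, -0.5, 0.45]])
q, s = ternary_quantize(w)
w_hat = ternary_dequantize(q, s)
```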
## Provenance
| Metric | Value |
|---|---|
| MMLU | 77.80% ± 0.33% |
| Sample size (n) | 14,042 |
| Stderr | ±0.0033 |
| Harness | lm_eval 0.4.11 |
| Date measured | Day 12 cluster sprint (2026-04-12) |
| Source file | cloud_sprint_day12/results/40b_v3_2_mmlu_full.json |
See OUTLIER_GROUND_TRUTH_v10.md §2.2 for full provenance chain.
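The reported stderr can be sanity-checked against the simple binomial approximation `sqrt(p * (1 - p) / n)`. lm_eval computes stderr from per-item scores, so a small difference from this back-of-envelope figure is expected:

```python
import math

p, n = 0.7780, 14042  # accuracy and sample size from the table above
stderr = math.sqrt(p * (1 - p) / n)  # binomial approximation
# Comes out near 0.0035, in line with the reported ±0.0033.
print(f"{stderr:.4f}")
```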
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Outlier-Ai/Outlier-40B-V3.2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Outlier-Ai/Outlier-40B-V3.2",
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto",  # shard across available GPUs
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20, do_sample=False)  # greedy decoding
print(tokenizer.decode(out[0]))
```
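The 8-expert, top-2 routing listed in the model summary can be sketched as a toy gate (this is a generic top-k softmax gate for illustration, not the model's actual router code):

```python
import torch
import torch.nn.functional as F

def top2_route(logits: torch.Tensor):
    """Select 2 of 8 experts per token; renormalize their gate weights to sum to 1."""
    weights, experts = torch.topk(logits, k=2, dim=-1)  # best 2 experts per token
    weights = F.softmax(weights, dim=-1)                # renormalize over the chosen 2
    return weights, experts

logits = torch.randn(4, 8)  # 4 tokens, 8 experts
w, idx = top2_route(logits)
```

Each token's output is then the weighted sum of its two selected experts' outputs.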
## V3.3 status
The V3.3 alpha-fix overlay has not yet been trained for 40B. The recipe that worked for 70B (280 trained alpha scalars, ~30 minutes of cloud compute) is expected to transfer. A future sprint will train and verify it.
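For intuition, "280 trained alpha scalars" suggests one learnable scalar per overlay tensor, applied as `W_base + alpha * dW`. The sketch below is an assumed structure, not the actual V3.3 training code:

```python
import torch
import torch.nn as nn

class AlphaFix(nn.Module):
    """One learnable scalar per overlay tensor, applied as W_base + alpha * dW."""
    def __init__(self, num_tensors: int):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(num_tensors))  # init at identity merge

    def merged(self, base, delta):
        return [w + a * d for w, a, d in zip(base, self.alpha, delta)]

fix = AlphaFix(num_tensors=280)  # 280 scalars, matching the 70B recipe
base = [torch.zeros(2) for _ in range(280)]
delta = [torch.ones(2) for _ in range(280)]
out = fix.merged(base, delta)
```

Only the 280 alphas would be optimized, which is consistent with the very short training time quoted above.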
## License
Apache 2.0