Outlier-70B-V3.2
⚠️ Status: SUPERSEDED BY Outlier-70B-V3.3.
Use V3.3 instead — same base weights, +1.61pp MMLU, 4× longer context, 15 KB upgrade overlay.
A ternary-quantized Mixture-of-Experts model built as a delta overlay on Qwen/Qwen2.5-32B-Instruct. This V3.2 repo is preserved as the base weight source for V3.3 — the V3.3 release uses these exact weights plus a small alpha overlay file.
Why upgrade to V3.3?
| | V3.2 | V3.3 (alpha-fixed + YaRN 4x) |
|---|---|---|
| MMLU | 81.49% (day12, possibly stale) | 83.10% ± 0.30% |
| Context length | 32,768 | 131,072 (4× longer) |
| Weight files to download | full | full + 15 KB overlay |
| Loading overhead | none | 1 line of Python |
The V3.3 upgrade consists of a trained overlay of 280 scalars (15 KB total) plus a rope_scaling config patch. The weights themselves are not retrained: V3.3 reads V3.2's weights from this repo and applies the overlay at load time.
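The two parts of the upgrade can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual loader: the `rope_scaling` key names follow the YaRN convention documented for Qwen2.5 and may differ from what V3.3 ships, the reading of "280 scalars" as one scalar per expert (35 MoE layers × 8 experts) is a guess that happens to match the model summary, and `apply_alpha` is a hypothetical helper.

```python
# Illustrative only: key names and the overlay layout are assumptions.
V32_CONTEXT = 32_768
YARN_FACTOR = 4.0

# YaRN-style rope_scaling patch (key names assumed, following Qwen2.5 docs).
rope_scaling = {
    "type": "yarn",
    "factor": YARN_FACTOR,
    "original_max_position_embeddings": V32_CONTEXT,
}
v33_context = int(
    rope_scaling["original_max_position_embeddings"] * rope_scaling["factor"]
)

MOE_LAYERS = range(14, 49)  # layer indices 14-48 inclusive
EXPERTS_PER_LAYER = 8

# Hypothetical overlay: one scalar per (layer, expert), initialised to 1.0.
alpha = {
    (layer, expert): 1.0
    for layer in MOE_LAYERS
    for expert in range(EXPERTS_PER_LAYER)
}


def apply_alpha(expert_output, layer, expert):
    """Scale one expert's output vector by its overlay scalar (assumed semantics)."""
    a = alpha[(layer, expert)]
    return [a * x for x in expert_output]


print(v33_context)  # context length after the 4x scaling factor
print(len(alpha))   # number of scalars in the hypothetical overlay
```

Under these assumptions the scaled context comes out at 131,072 tokens and the overlay holds 280 entries, which is consistent with the figures quoted in the comparison table.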
Model summary
| Field | Value |
|---|---|
| Base model | Qwen/Qwen2.5-32B-Instruct |
| Architecture | Outlier MoE (overlay on Qwen2.5) |
| Parameters | ~40B effective |
| Context length | 32,768 tokens |
| MoE layers | 35 (indices 14–48) |
| Experts per layer | 8 |
| Top-k routing | 2 |
| Expert quantization | Ternary (int8 + per-row fp16 scale) |
| MMLU (day12, [SUPERSEDED]) | 81.49% ± 0.31% |
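To make the "Ternary (int8 + per-row fp16 scale)" and "Top-k routing: 2" rows concrete, here is a minimal pure-Python sketch. It assumes that each expert weight row is stored as codes in {-1, 0, +1} with a single scale per row, and that the router softmax-normalises over the two selected experts only. Neither detail is confirmed by this card, and the function names are hypothetical.

```python
import math


def dequantize_row(codes, scale):
    """Recover approximate weights from ternary codes and one per-row scale."""
    assert all(c in (-1, 0, 1) for c in codes), "codes must be ternary"
    return [c * scale for c in codes]


def top2_route(router_logits):
    """Pick the two highest-scoring experts and softmax-normalise their weights."""
    ranked = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:2]
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(router_logits[i] for i in ranked)
    exps = [math.exp(router_logits[i] - m) for i in ranked]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(ranked, exps)]


# One weight row of four ternary codes with a per-row scale of 0.5.
row = dequantize_row([1, 0, -1, 1], scale=0.5)

# Router logits for the 8 experts of one MoE layer.
routes = top2_route([0.1, 2.0, -1.0, 2.0, 0.0, 0.3, -0.5, 1.0])
```

Storing one scale per row keeps the overhead small relative to the int8 codes, while top-2 routing means only two of the eight experts in a layer are evaluated for any given token.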
Migration to V3.3
from transformers import AutoModelForCausalLM
from outlier.runtime.alpha_loader import load_alpha_overlay

# V3.3: same V3.2 base weights + alpha overlay + YaRN 4x rope scaling
model = AutoModelForCausalLM.from_pretrained(
    "Outlier-Ai/Outlier-70B-V3.3",  # ← new repo, points back at V3.2 weights
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto",
)
load_alpha_overlay(model, model_tag="70b_v3.3")  # ← 280-scalar fix, 1 line
Provenance (V3.2 day12, now superseded)
| Metric | Value |
|---|---|
| MMLU | 81.49% ± 0.31% |
| Sample size (n) | 14,042 |
| Harness | lm_eval 0.4.11 |
| Date measured | Day 12 cluster sprint (2026-04-12) |
| Source file | cloud_sprint_day12/results/70b_v3_2_mmlu_full.json |
| Status | [SUPERSEDED BY V3.3] per OUTLIER_GROUND_TRUTH_v10.md §2.3 |
The day12 measurement is also suspected of being a measurement artifact (similar to the day12→day13 drift seen on 150B). It has not been re-measured under the current lm_eval 0.4.9.1. See OUTLIER_GROUND_TRUTH_v10.md §2.4 for context.
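As a rough sanity check on the ± figures quoted in this card, the sketch below computes a plain binomial standard error for the V3.2 score and a z-score for the V3.2 to V3.3 gap. It assumes a simple binomial model and treats the two measurements as independent, which is an approximation since both use the same test set. The harness may also aggregate per-subject errors differently, so a small mismatch with the reported ± 0.31% would not be surprising.

```python
import math


def binomial_stderr(accuracy, n):
    """Standard error of a proportion under a simple binomial model."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)


def z_score(a, se_a, b, se_b):
    """Gap between two scores in units of their combined standard error."""
    return (b - a) / math.sqrt(se_a**2 + se_b**2)


# V3.2: 81.49% on n = 14,042 questions.
se_v32 = binomial_stderr(0.8149, 14_042)

# V3.2 (81.49% ± 0.31%) versus V3.3 (83.10% ± 0.30%).
z = z_score(0.8149, 0.0031, 0.8310, 0.0030)

print(f"binomial stderr: {se_v32:.4f}")
print(f"z-score of the gap: {z:.2f}")
```

A gap of several combined standard errors would be hard to explain by sampling noise alone, but this calculation says nothing about the harness-version drift discussed above.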
License
Apache 2.0