Outlier-70B-V3.2

⚠️ Status: SUPERSEDED BY Outlier-70B-V3.3.

Use V3.3 instead — same base weights, +1.61pp MMLU, 4× longer context, 15 KB upgrade overlay.

A ternary-quantized Mixture-of-Experts model built as a delta overlay on Qwen/Qwen2.5-32B-Instruct. This V3.2 repo is preserved as the base weight source for V3.3 — the V3.3 release uses these exact weights plus a small alpha overlay file.
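As a rough illustration of what "ternary-quantized" storage can look like, the sketch below quantizes one weight row to codes in {-1, 0, +1} with a single fp16 scale. The absmean scaling rule and the helper names `ternarize_row` and `dequantize_row` are assumptions made for illustration; this card does not state which ternarization rule Outlier uses.

```python
import struct
from typing import List, Tuple


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest half-precision value."""
    return struct.unpack("e", struct.pack("e", value))[0]


def ternarize_row(row: List[float]) -> Tuple[List[int], float]:
    """Quantize one row to ternary codes plus one fp16 scale.

    Assumption: absmean scaling (scale = mean of absolute weights).
    """
    scale = sum(abs(w) for w in row) / len(row)
    if scale == 0.0:
        return [0] * len(row), 0.0
    scale = to_fp16(scale)
    # Round to the nearest integer, then clip into {-1, 0, +1}.
    codes = [max(-1, min(1, round(w / scale))) for w in row]
    return codes, scale


def dequantize_row(codes: List[int], scale: float) -> List[float]:
    """Reconstruct approximate weights from ternary codes and the row scale."""
    return [c * scale for c in codes]


row = [0.9, -0.05, -1.1, 0.4]
codes, scale = ternarize_row(row)
approx = dequantize_row(codes, scale)
print(codes)  # [1, 0, -1, 1]
```

Each code fits comfortably in an int8, which matches the "int8 + per-row fp16 scale" layout listed in the model summary.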

Why upgrade to V3.3?

| | V3.2 | V3.3 (alpha-fixed + YaRN 4x) |
| --- | --- | --- |
| MMLU | 81.49% (day12, possibly stale) | 83.10% ± 0.30% |
| Context length | 32,768 | 131,072 (4× longer) |
| Weight files to download | full | full + 15 KB overlay |
| Loading overhead | none | 1 line of Python |

The V3.3 upgrade is a 280-scalar trained overlay (15 KB total) plus a rope_scaling config patch. The base weights are not retrained: V3.3 reads V3.2's weights from this repo and applies the overlay at load time.
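To make the context-length arithmetic concrete, here is a minimal sketch of what a YaRN 4x `rope_scaling` entry might look like. The key names follow the YaRN example published for Qwen2.5 models; the actual contents of the V3.3 config patch are not shown in this card, so treat the dictionary as an assumption.

```python
ORIGINAL_CONTEXT = 32_768  # V3.2 context length, from the comparison table
YARN_FACTOR = 4.0          # "YaRN 4x", from the V3.3 column

# Assumed shape of the patch; key names follow Qwen2.5's published YaRN example.
rope_scaling = {
    "type": "yarn",
    "factor": YARN_FACTOR,
    "original_max_position_embeddings": ORIGINAL_CONTEXT,
}

extended_context = int(ORIGINAL_CONTEXT * YARN_FACTOR)
print(extended_context)  # 131072
```

The result matches the 131,072-token context length listed for V3.3.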

Model summary

| Field | Value |
| --- | --- |
| Base model | `Qwen/Qwen2.5-32B-Instruct` |
| Architecture | Outlier MoE (overlay on Qwen2.5) |
| Parameters | ~40B effective |
| Context length | 32,768 tokens |
| MoE layers | 35 (indices 14–48) |
| Experts per layer | 8 |
| Top-k routing | 2 |
| Expert quantization | Ternary (int8 + per-row fp16 scale) |
| MMLU (day12, [SUPERSEDED]) | 81.49% ± 0.31% |
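The routing figures in the summary can be illustrated with a small sketch of top-2 selection over 8 experts. The softmax over the selected logits is a common convention (used, for example, in Mixtral-style routers) and is an assumption here; this card does not describe Outlier's router. The helper name `route_top_k` is hypothetical.

```python
import math
from typing import List, Tuple

NUM_EXPERTS = 8                  # experts per layer, from the summary
TOP_K = 2                        # top-k routing, from the summary
MOE_LAYERS = range(14, 49)       # layer indices 14 through 48 inclusive


def route_top_k(logits: List[float], k: int = TOP_K) -> List[Tuple[int, float]]:
    """Return (expert_index, weight) pairs for the k largest router logits.

    Assumption: weights are a softmax over the selected logits only.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]  # shifted for stability
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


router_logits = [0.1, 2.0, -1.0, 0.3, 1.0, 0.0, -0.5, 0.2]
selected = route_top_k(router_logits)
total_experts = len(MOE_LAYERS) * NUM_EXPERTS
print([i for i, _ in selected])  # [1, 4]
print(total_experts)             # 280
```

The product of 35 MoE layers and 8 experts is 280, the same number as the scalars in the V3.3 overlay. That would be consistent with one scalar per expert, although this card does not say so.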

Migration to V3.3

```python
from transformers import AutoModelForCausalLM
from outlier.runtime.alpha_loader import load_alpha_overlay

# V3.3: same V3.2 base weights + alpha overlay + YaRN 4x rope scaling
model = AutoModelForCausalLM.from_pretrained(
    "Outlier-Ai/Outlier-70B-V3.3",  # ← new repo, points back at V3.2 weights
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto",
)
load_alpha_overlay(model, model_tag="70b_v3.3")  # ← 280-scalar fix, 1 line
```

Provenance (V3.2 day12, now superseded)

| Metric | Value |
| --- | --- |
| MMLU | 81.49% ± 0.31% |
| Sample size (n) | 14,042 |
| Harness | `lm_eval` 0.4.11 |
| Date measured | Day 12 cluster sprint (2026-04-12) |
| Source file | `cloud_sprint_day12/results/70b_v3_2_mmlu_full.json` |
| Status | `[SUPERSEDED BY V3.3]` per `OUTLIER_GROUND_TRUTH_v10.md` §2.3 |

The day12 measurement is also suspected of being a measurement artifact (similar to the day12→day13 drift seen on 150B). It has not been re-measured under the current lm_eval 0.4.9.1. See `OUTLIER_GROUND_TRUTH_v10.md` §2.4 for context.
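For readers who want to sanity-check the reported numbers, the sketch below computes the V3.2 to V3.3 gap and a rough z-score. It assumes the ± values are standard errors and treats the two runs as independent, which is a simplification since both use the same benchmark. It says nothing about the suspected measurement artifact itself.

```python
import math

# Figures as reported in this card.
v32_acc, v32_err = 81.49, 0.31
v33_acc, v33_err = 83.10, 0.30
n = 14_042  # V3.2 sample size

delta = v33_acc - v32_acc                            # gap in percentage points
combined_err = math.sqrt(v32_err ** 2 + v33_err ** 2)
z = delta / combined_err                             # rough z-score, about 3.7

# Plain binomial standard error for V3.2, for comparison with the reported value.
p = v32_acc / 100
binomial_err = 100 * math.sqrt(p * (1 - p) / n)      # about 0.33 pp

print(f"delta={delta:.2f}pp z={z:.2f} binomial_err={binomial_err:.2f}pp")
```

The plain binomial estimate comes out slightly above the reported ± 0.31%, so the harness presumably aggregates its error differently; that explanation is a guess, not something stated in this card.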

License

Apache 2.0
