Outlier-70B-V3.2

⚠️ Status: SUPERSEDED BY Outlier-70B-V3.3.

Use V3.3 instead — same base weights, +1.61pp MMLU, 4× longer context, 15 KB upgrade overlay.

A ternary-quantized Mixture-of-Experts model built as a delta overlay on Qwen/Qwen2.5-32B-Instruct. This V3.2 repo is preserved as the base weight source for V3.3 — the V3.3 release uses these exact weights plus a small alpha overlay file.
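To make the MoE side of this concrete, here is a minimal sketch of top-2-of-8 expert routing (matching the "Experts per layer: 8, Top-k routing: 2" figures below). This is a generic top-k softmax router, not the exact Outlier router implementation:

```python
import math

def top_k_route(logits, k=2):
    """Pick the top-k experts by router logit and renormalize their
    softmax probabilities so the routing weights sum to 1.
    Generic top-k MoE routing; a sketch, not the Outlier internals."""
    probs = [math.exp(l) for l in logits]
    total = sum(probs)
    probs = [p / total for p in probs]
    top = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

# 8 router logits -> token is dispatched to 2 experts, weights sum to 1.0
routes = top_k_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3], k=2)
```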

Why upgrade to V3.3?

| | V3.2 | V3.3 (alpha-fixed + YaRN 4×) |
| --- | --- | --- |
| MMLU | 81.49% (day12, possibly stale) | 83.10% ± 0.30% |
| Context length | 32,768 tokens | 131,072 tokens (4× longer) |
| Weight files to download | full | full + 15 KB overlay |
| Loading overhead | none | 1 line of Python |

The V3.3 upgrade is a 280-scalar trained overlay (15 KB total) plus a rope_scaling config patch. Zero retraining of the weights themselves — V3.3 reads V3.2's weights from this repo and applies the overlay at load time.
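The mechanism can be pictured as scaling frozen base tensors by trained scalars at load time. The helper below is a hypothetical illustration: the names, the per-tensor granularity, and the overlay file format are all assumptions; the real logic lives in `outlier.runtime.alpha_loader`.

```python
def apply_alpha_overlay(weights, overlay):
    """Scale each named tensor in-place by its trained alpha scalar.

    weights: dict of parameter name -> list of floats (stand-in for tensors)
    overlay: dict of parameter name -> float (one trained scalar each)

    Hypothetical sketch of a scalar-overlay patch; the base weights are
    never retrained, only multiplied by the overlay scalars.
    """
    for name, alpha in overlay.items():
        if name in weights:
            weights[name] = [w * alpha for w in weights[name]]
    return weights

# Illustrative parameter name, not the real checkpoint layout
base = {"layers.14.experts.0.down_proj.scale": [1.0, -0.5, 2.0]}
patched = apply_alpha_overlay(
    base, {"layers.14.experts.0.down_proj.scale": 0.9}
)
```

With only 280 such scalars, the overlay stays tiny (15 KB) while the multi-GB base weights are shared unchanged with this repo.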

Model summary

| Field | Value |
| --- | --- |
| Base model | Qwen/Qwen2.5-32B-Instruct |
| Architecture | Outlier MoE (overlay on Qwen2.5) |
| Parameters | ~40B effective |
| Context length | 32,768 tokens |
| MoE layers | 35 (indices 14–48) |
| Experts per layer | 8 |
| Top-k routing | 2 |
| Expert quantization | Ternary (int8 + per-row fp16 scale) |
| MMLU (day12, superseded) | 81.49% ± 0.31% |
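The "Ternary (int8 + per-row fp16 scale)" scheme above quantizes each expert weight row to codes in {-1, 0, +1} with one scale per row. The sketch below shows the general shape of such a scheme using a common magnitude-threshold heuristic; the threshold rule and scale estimator are assumptions, not the documented Outlier quantizer:

```python
def quantize_row(row, rel_threshold=0.5):
    """Ternary-quantize one weight row to codes in {-1, 0, +1} (storable
    as int8) plus a single per-row scale (fp16 in the real format).
    Threshold/scale heuristics here are illustrative assumptions."""
    thresh = rel_threshold * max(abs(w) for w in row)
    codes = [1 if w > thresh else -1 if w < -thresh else 0 for w in row]
    kept = [abs(w) for w, q in zip(row, codes) if q != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return codes, scale

def dequantize_row(codes, scale):
    """Reconstruct the row: every weight becomes -scale, 0, or +scale."""
    return [q * scale for q in codes]

codes, scale = quantize_row([0.8, -0.1, 0.05, -0.9])
deq = dequantize_row(codes, scale)
```

Storage cost per row is thus one int8 code per weight plus a single fp16 scale, which is where most of the expert-weight compression comes from.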

Migration to V3.3

```python
from transformers import AutoModelForCausalLM
from outlier.runtime.alpha_loader import load_alpha_overlay

# V3.3: same V3.2 base weights + alpha overlay + YaRN 4x rope scaling
model = AutoModelForCausalLM.from_pretrained(
    "Outlier-Ai/Outlier-70B-V3.3",  # ← new repo, points back at V3.2 weights
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto",
)
load_alpha_overlay(model, model_tag="70b_v3.3")  # ← 280-scalar fix, 1 line
```
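The other half of the upgrade is the rope_scaling config patch that extends the context window 4×. The fragment below shows the likely shape of that patch, following the common Hugging Face YaRN convention; the exact field names and values are assumptions, and V3.3's shipped `config.json` is the authoritative source:

```python
# Hypothetical rope_scaling patch for the YaRN 4x context extension.
# Field names follow the usual Hugging Face YaRN convention; verify
# against the actual V3.3 config.json before relying on them.
rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
extended_context = int(32768 * rope_scaling["factor"])  # 4x the V3.2 window
```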

Provenance (V3.2 day12, now superseded)

| Metric | Value |
| --- | --- |
| MMLU | 81.49% ± 0.31% |
| Sample size (n) | 14,042 |
| Harness | lm_eval 0.4.11 |
| Date measured | Day 12 cluster sprint (2026-04-12) |
| Source file | `cloud_sprint_day12/results/70b_v3_2_mmlu_full.json` |
| Status | SUPERSEDED BY V3.3, per OUTLIER_GROUND_TRUTH_v10.md §2.3 |

The day12 measurement is also suspected of being a measurement artifact (similar to the day12→day13 drift seen on the 150B model), and it has not been re-measured under the current lm_eval 0.4.9.1 harness. See v10 §2.4 for context.

License

Apache 2.0
