Outlier-150B-V3.2

Status: production. MMLU re-verified Day 13 at 84.46%. Flagship Outlier release.

The largest Outlier model. Built on Qwen/Qwen2.5-72B-Instruct with the ReXMoE architecture (cross-layer expert sharing): experts are shared across groups of layers via PSR (per-scale-residual) router variants. 88 unique experts are served by 44 routers (11 groups × 4 PSR variants).
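
The expert/router bookkeeping above can be sketched with simple index arithmetic. This is an illustrative sketch only: the function names, the contiguous grouping, and the assumption that each group owns 88 / 11 = 8 experts are ours, not the actual ReXMoE implementation.

```python
# Hypothetical sketch of ReXMoE-style cross-layer expert sharing, using the
# counts from this card: 88 experts, 44 routers, 11 groups, 4 PSR variants.
N_GROUPS = 11          # groups of consecutive layers
PSR_VARIANTS = 4       # per-scale-residual router variants per group
N_ROUTERS = N_GROUPS * PSR_VARIANTS          # 44 routers
N_EXPERTS = 88
EXPERTS_PER_GROUP = N_EXPERTS // N_GROUPS    # 8 unique experts per group (assumed even split)

def router_id(group: int, psr_variant: int) -> int:
    """Flat router index for a (group, PSR variant) pair."""
    return group * PSR_VARIANTS + psr_variant

def experts_for_group(group: int) -> range:
    """Expert indices visible to every router in a group (shared across its layers)."""
    start = group * EXPERTS_PER_GROUP
    return range(start, start + EXPERTS_PER_GROUP)
```

Under this assumed layout, all four PSR routers in a group route over the same 8 experts, which is what makes the experts "shared across groups of layers".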

Model summary

| Field | Value |
|---|---|
| Base model | Qwen/Qwen2.5-72B-Instruct |
| Architecture | Outlier ReXMoE (cross-layer expert sharing) |
| Parameters | ~150B effective |
| Context length | 32,768 tokens |
| MoE layers | (see config.json) |
| Unique experts | 88 (shared across 44 routers) |
| Expert groups | 11 |
| PSR variants | 4 |
| Expert quantization | Ternary (int8 + per-row fp16 scale) |
| MMLU (full sample, Day 13) | 84.46% ± 0.29% |
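
The ternary expert format in the table (int8 values plus one fp16 scale per row) can be illustrated with a small dequantization sketch. Array names and layout here are assumptions, not the card's actual storage format.

```python
import numpy as np

def dequantize_ternary(q: np.ndarray, row_scales: np.ndarray) -> np.ndarray:
    """Expand ternary weights to fp16.

    q:          int8 [rows, cols], values restricted to {-1, 0, +1}
    row_scales: fp16 [rows], one scale per output row
    """
    assert q.dtype == np.int8 and set(np.unique(q)).issubset({-1, 0, 1})
    return q.astype(np.float16) * row_scales[:, None]

q = np.array([[1, 0, -1],
              [0, 1, 1]], dtype=np.int8)
scales = np.array([0.5, 2.0], dtype=np.float16)
w = dequantize_ternary(q, scales)  # row 0 scaled by 0.5, row 1 by 2.0
```

Per-row scales keep the storage cost at roughly one byte per weight plus two bytes per row, which is how a ~150B-parameter expert pool stays tractable.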

Provenance

| Metric | Value |
|---|---|
| MMLU | 84.46% ± 0.29% |
| Sample size (n) | 14,042 |
| Stderr | ±0.0029 |
| Harness | lm_eval 0.4.9.1 |
| Date measured | 2026-04-14 (Day 13 cluster sprint) |
| Hardware | 2× NVIDIA B200 SXM6 |
| Source file | phase8_upgraded_150b_full.json |
| Source SHA256 | 5db066e5574e6bc1e3f1dec452098aa6d1be44333e7ea32f9561288babb3b228 |

Full provenance in OUTLIER_GROUND_TRUTH_v10.md §2.4.
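
To check a local copy of the source file against the SHA256 above, a standard chunked hash suffices (the expected digest is copied from the provenance table; the file path is whatever local copy you hold):

```python
import hashlib

EXPECTED = "5db066e5574e6bc1e3f1dec452098aa6d1be44333e7ea32f9561288babb3b228"

def sha256_of(path: str) -> str:
    """Hash a file in 1 MiB chunks to avoid loading it whole into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# sha256_of("phase8_upgraded_150b_full.json") == EXPECTED  # if the file is intact
```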

Day 12 → Day 13 measurement drift (documented, unresolved)

| Day | Harness | MMLU |
|---|---|---|
| Day 12 | lm_eval 0.4.11 | 83.16% ± 0.31% |
| Day 13 | lm_eval 0.4.9.1 | 84.46% ± 0.29% |
| Drift | — | +1.30 pp |

Day 13 is accepted as canonical. The 1.30pp drift between harness versions has not been root-caused — it could be an lm_eval version difference (known small differences in MMLU prompt formatting across versions), a transformers point-release difference, or a Day 12 measurement artifact. We did not rerun Day 12's exact pipeline to pin down the cause. See v10 §2.4 for the methodology notes.
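
As a rough gauge of whether the drift could be sampling noise alone (this check is ours, not from the provenance file), assume the two runs' standard errors are independent; that is a simplification, since the model and prompts are shared:

```python
import math

# Figures from the drift table above (percentage points).
day12, se12 = 83.16, 0.31   # lm_eval 0.4.11
day13, se13 = 84.46, 0.29   # lm_eval 0.4.9.1

drift = day13 - day12                  # +1.30 pp
combined_se = math.hypot(se12, se13)   # ~0.42 pp, assuming independent errors
sigmas = drift / combined_se           # ~3.1 standard errors
```

At roughly three combined standard errors, the drift is unlikely to be pure sampling variance, which is consistent with the harness-version or pipeline explanations suggested above.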

V3.3 status

A 150B alpha-fix overlay has not yet been trained. The recipe that worked for 70B (280 trained alpha scalars, ~30 minutes of cloud compute) is expected to transfer; it is planned for a future sprint. Until then, the production 150B is V3.2 with the Day 13 number above.

A 128K-context variant (150B + YaRN 4×) is pending separate release — the YaRN config patch is verified safe on 70B; the same patch should work on 150B identically but has not been published as a separate repo yet.
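
The shape of a YaRN 4× patch would follow the `rope_scaling` format that Qwen2.5 documents for long-context use. The exact patch used for the 70B verification is not reproduced here; this is a sketch of that documented format with the 4× factor named above:

```python
# Hypothetical config.json overrides for a YaRN 4x long-context variant,
# following Qwen2.5's documented rope_scaling format. Not the published patch.
patch = {
    "rope_scaling": {
        "type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
    },
    "max_position_embeddings": 131072,  # 32768 * 4 = 128K
}
```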

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Outlier-Ai/Outlier-150B-V3.2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Outlier-Ai/Outlier-150B-V3.2",
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto",  # 150B requires multi-GPU; ~280 GB bf16
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(out[0]))
```

Hardware notes: 150B at bf16 needs ~280 GB of VRAM. We've successfully run it on 2× NVIDIA B200 (180 GB each, 360 GB total). Single-GPU bf16 inference is not currently feasible without further quantization.
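
The budget behind those numbers is simple arithmetic, and `max_memory` (a standard transformers/accelerate option for `device_map="auto"`) can enforce a per-GPU cap. The 170 GiB figure below is our assumption to leave headroom on a 180 GB B200, not an official recommendation:

```python
# Rough memory budget for 2-GPU bf16 inference, using the figures quoted above.
WEIGHTS_GB = 280   # ~bf16 weight footprint from this card
GPU_GB = 180       # per NVIDIA B200
N_GPUS = 2
headroom_gb = N_GPUS * GPU_GB - WEIGHTS_GB  # GB left for activations + KV cache

# Hypothetical per-GPU caps for transformers' device_map="auto" sharding:
max_memory = {0: "170GiB", 1: "170GiB"}
# model = AutoModelForCausalLM.from_pretrained(..., device_map="auto",
#                                              max_memory=max_memory)
```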

Limitations

  • Secondary benchmarks (HellaSwag, ARC, TruthfulQA, WinoGrande) are [UNVERIFIED]. Day 13 sprint was trimmed to MMLU-only due to throughput constraints.
  • 128K context via YaRN 4× verified on 70B; not yet packaged as a separate 150B release.
  • V3.3 alpha-fix overlay not trained for 150B yet.

License

Apache 2.0

