Outlier-150B-V3.2

Status: production. MMLU re-verified Day 13 at 84.46%. Flagship Outlier release.

The largest Outlier model. Built on Qwen/Qwen2.5-72B-Instruct with the ReXMoE architecture (cross-layer expert sharing): experts are shared across groups of layers via PSR (per-scale-residual) router variants. 88 unique experts are served by 44 routers (11 groups × 4 PSR variants).
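
The expert/router bookkeeping above can be sketched with simple index arithmetic. This is an illustrative sketch only: the function names, the contiguous grouping, and the assumption that each group owns 88 / 11 = 8 experts are ours, not the actual ReXMoE implementation.

```python
# Hypothetical sketch of ReXMoE-style cross-layer expert sharing, using the
# counts from this card: 88 experts, 44 routers, 11 groups, 4 PSR variants.
N_GROUPS = 11          # groups of consecutive layers
PSR_VARIANTS = 4       # per-scale-residual router variants per group
N_ROUTERS = N_GROUPS * PSR_VARIANTS          # 44 routers
N_EXPERTS = 88
EXPERTS_PER_GROUP = N_EXPERTS // N_GROUPS    # 8 unique experts per group (assumed even split)

def router_id(group: int, psr_variant: int) -> int:
    """Flat router index for a (group, PSR variant) pair."""
    return group * PSR_VARIANTS + psr_variant

def experts_for_group(group: int) -> range:
    """Expert indices visible to every router in a group (shared across its layers)."""
    start = group * EXPERTS_PER_GROUP
    return range(start, start + EXPERTS_PER_GROUP)
```

Under this assumed layout, all four PSR routers in a group route over the same 8 experts, which is what makes the experts "shared across groups of layers".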

Model summary

| Field | Value |
|---|---|
| Base model | Qwen/Qwen2.5-72B-Instruct |
| Architecture | Outlier ReXMoE (cross-layer expert sharing) |
| Parameters | ~150B effective |
| Context length | 32,768 tokens |
| MoE layers | (see config.json) |
| Unique experts | 88 (shared across 44 routers) |
| Expert groups | 11 |
| PSR variants | 4 |
| Expert quantization | Ternary (int8 + per-row fp16 scale) |
| MMLU (full sample, Day 13) | 84.46% ± 0.29% |
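
The ternary expert format in the table (int8 values plus one fp16 scale per row) can be illustrated with a small dequantization sketch. Array names and layout here are assumptions, not the card's actual storage format.

```python
import numpy as np

def dequantize_ternary(q: np.ndarray, row_scales: np.ndarray) -> np.ndarray:
    """Expand ternary weights to fp16.

    q:          int8 [rows, cols], values restricted to {-1, 0, +1}
    row_scales: fp16 [rows], one scale per output row
    """
    assert q.dtype == np.int8 and set(np.unique(q)).issubset({-1, 0, 1})
    return q.astype(np.float16) * row_scales[:, None]

q = np.array([[1, 0, -1],
              [0, 1, 1]], dtype=np.int8)
scales = np.array([0.5, 2.0], dtype=np.float16)
w = dequantize_ternary(q, scales)  # row 0 scaled by 0.5, row 1 by 2.0
```

Per-row scales keep the storage cost at roughly one byte per weight plus two bytes per row, which is how a ~150B-parameter expert pool stays tractable.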

Provenance

| Metric | Value |
|---|---|
| MMLU | 84.46% ± 0.29% |
| Sample size (n) | 14,042 |
| Stderr | ±0.0029 |
| Harness | lm_eval 0.4.9.1 |
| Date measured | 2026-04-14 (Day 13 cluster sprint) |
| Hardware | 2× NVIDIA B200 SXM6 |
| Source file | phase8_upgraded_150b_full.json |
| Source SHA256 | 5db066e5574e6bc1e3f1dec452098aa6d1be44333e7ea32f9561288babb3b228 |

Full provenance in OUTLIER_GROUND_TRUTH_v10.md §2.4.
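
To check a local copy of the source file against the SHA256 above, a standard chunked hash suffices (the expected digest is copied from the provenance table; the file path is whatever local copy you hold):

```python
import hashlib

EXPECTED = "5db066e5574e6bc1e3f1dec452098aa6d1be44333e7ea32f9561288babb3b228"

def sha256_of(path: str) -> str:
    """Hash a file in 1 MiB chunks to avoid loading it whole into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# sha256_of("phase8_upgraded_150b_full.json") == EXPECTED  # if the file is intact
```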

Day 12 → Day 13 measurement drift (documented, unresolved)

| Day | Harness | MMLU |
|---|---|---|
| Day 12 | lm_eval 0.4.11 | 83.16% ± 0.31% |
| Day 13 | lm_eval 0.4.9.1 | 84.46% ± 0.29% |
| Drift | — | +1.30 pp |

Day 13 is accepted as canonical. The 1.30pp drift between harness versions has not been root-caused — it could be an lm_eval version difference (known small differences in MMLU prompt formatting across versions), a transformers point-release difference, or a Day 12 measurement artifact. We did not rerun Day 12's exact pipeline to pin down the cause. See v10 §2.4 for the methodology notes.
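
As a rough gauge of whether the drift could be sampling noise alone (this check is ours, not from the provenance file), assume the two runs' standard errors are independent; that is a simplification, since the model and prompts are shared:

```python
import math

# Figures from the drift table above (percentage points).
day12, se12 = 83.16, 0.31   # lm_eval 0.4.11
day13, se13 = 84.46, 0.29   # lm_eval 0.4.9.1

drift = day13 - day12                  # +1.30 pp
combined_se = math.hypot(se12, se13)   # ~0.42 pp, assuming independent errors
sigmas = drift / combined_se           # ~3.1 standard errors
```

At roughly three combined standard errors, the drift is unlikely to be pure sampling variance, which is consistent with the harness-version or pipeline explanations suggested above.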

V3.3 status

A 150B alpha-fix overlay has not yet been trained. The recipe that worked for 70B (280 trained alpha scalars, ~30 minutes of cloud compute) is expected to transfer; it is planned for a future sprint. Until then, the production 150B is V3.2 with the Day 13 number above.

A 128K-context variant (150B + YaRN 4×) is pending separate release — the YaRN config patch is verified safe on 70B; the same patch should work on 150B identically but has not been published as a separate repo yet.
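
The shape of a YaRN 4× patch would follow the `rope_scaling` format that Qwen2.5 documents for long-context use. The exact patch used for the 70B verification is not reproduced here; this is a sketch of that documented format with the 4× factor named above:

```python
# Hypothetical config.json overrides for a YaRN 4x long-context variant,
# following Qwen2.5's documented rope_scaling format. Not the published patch.
patch = {
    "rope_scaling": {
        "type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 32768,
    },
    "max_position_embeddings": 131072,  # 32768 * 4 = 128K
}
```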

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Outlier-Ai/Outlier-150B-V3.2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Outlier-Ai/Outlier-150B-V3.2",
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto",  # 150B requires multi-GPU; ~280 GB bf16
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(out[0]))
```

Hardware notes: 150B at bf16 needs ~280 GB of VRAM. We've successfully run it on 2× NVIDIA B200 (180 GB each, 360 GB total). Single-GPU bf16 inference is not currently feasible without further quantization.
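
The budget behind those numbers is simple arithmetic, and `max_memory` (a standard transformers/accelerate option for `device_map="auto"`) can enforce a per-GPU cap. The 170 GiB figure below is our assumption to leave headroom on a 180 GB B200, not an official recommendation:

```python
# Rough memory budget for 2-GPU bf16 inference, using the figures quoted above.
WEIGHTS_GB = 280   # ~bf16 weight footprint from this card
GPU_GB = 180       # per NVIDIA B200
N_GPUS = 2
headroom_gb = N_GPUS * GPU_GB - WEIGHTS_GB  # GB left for activations + KV cache

# Hypothetical per-GPU caps for transformers' device_map="auto" sharding:
max_memory = {0: "170GiB", 1: "170GiB"}
# model = AutoModelForCausalLM.from_pretrained(..., device_map="auto",
#                                              max_memory=max_memory)
```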

Limitations

  • Secondary benchmarks (HellaSwag, ARC, TruthfulQA, WinoGrande) are [UNVERIFIED]. Day 13 sprint was trimmed to MMLU-only due to throughput constraints.
  • 128K context via YaRN 4× verified on 70B; not yet packaged as a separate 150B release.
  • V3.3 alpha-fix overlay not trained for 150B yet.

License

Apache 2.0

