gpt-oss-220a20b

gpt-oss-220a20b is a 220B-class expanded MoE model derived from openai/gpt-oss-120b. It is a new release in this model line, built for agentic software engineering, repository work, OpenAI-compatible tool use, and math-assisted coding.

The model adds roughly 100B parameters of specialist MoE capacity to the GPT-OSS 120B base and ships configured for top-k=20 inference.

This release was developed on a two-GPU local system using a new training method targeting both continued pretraining and post-training for expanded MoE models.

Highlights

  • 220B-class expanded MoE model.
  • About 20B active parameters per token at the shipped top-k=20 setting.
  • 248 total expert rows: 128 base rows plus 120 added specialist rows.
  • Ships ready for top-20 MoE inference.
  • Tested primarily with vLLM OpenAI-compatible Chat Completions tool calling.
  • Developed for coding agents, repository exploration, SWE tasks, tool-using automation, and math-assisted coding.

Specialist Capacity Mix

The added specialist capacity is SWE-heavy, with agentic/tool-use conditioning across the specialist expansion. By primary expert role:

Specialist area                 Added expert rows   Share of added rows
SWE / repository / coding       72                  60.0%
Agentic / sequential tool-use   32                  26.7%
Math / reasoning                16                  13.3%

These percentages describe the high-level allocation of the 120 added expert rows by primary role.
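
The shares follow directly from the row counts; a quick check in Python:

# Added specialist expert rows by primary role (from the table above).
added_rows = {
    "SWE / repository / coding": 72,
    "Agentic / sequential tool-use": 32,
    "Math / reasoning": 16,
}

total_added = sum(added_rows.values())  # 120 added rows
assert 128 + total_added == 248         # 128 base rows -> 248 total rows

for role, rows in added_rows.items():
    print(f"{role}: {rows} rows ({rows / total_added:.1%})")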

Model Details

  • Base model: openai/gpt-oss-120b
  • Model type: expanded MoE
  • Quantization/runtime format: MXFP4
  • Total expert rows: 248
  • Added specialist rows: 120
  • Default active experts: top-k=20
  • Config fields: num_experts_per_tok=20, experts_per_token=20
  • Recommended serving stack: vLLM with GPT-OSS reasoning parser and OpenAI tool-call parser

Recommended Serving

vllm serve /path/to/gpt-oss-220a20b \
  --served-model-name vllm/doobee \
  --tensor-parallel-size 2 \
  --max-model-len 131072 \
  --gpu-memory-utilization 0.95 \
  --trust-remote-code \
  --reasoning-parser openai_gptoss \
  --tool-call-parser openai \
  --enable-auto-tool-choice

The model config sets num_experts_per_tok=20 and experts_per_token=20, so no top-k override is needed for runtimes that respect the config.
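
A minimal pre-flight sketch of checking this, assuming the checkpoint directory contains a standard config.json (the path below is the same placeholder used in the serve command):

import json
from pathlib import Path

# Placeholder path; point this at the local checkout of the weights.
config = json.loads(Path("/path/to/gpt-oss-220a20b/config.json").read_text())

# Both fields ship set to 20, so runtimes that respect the config
# need no top-k override.
print(config.get("num_experts_per_tok"), config.get("experts_per_token"))
assert config.get("num_experts_per_tok") == 20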

Recommended Sampling

  • Recommended temperature range: 0.7-1.0.
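
For example, a minimal Chat Completions request against the server configured above; the localhost endpoint and dummy API key are assumptions for a default local vLLM deployment:

from openai import OpenAI

# Default local vLLM endpoint; base URL and key are deployment-specific.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="vllm/doobee",  # must match --served-model-name
    messages=[{"role": "user", "content": "Explain what a MoE router does."}],
    temperature=0.8,      # inside the recommended 0.7-1.0 range
)
print(response.choices[0].message.content)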

Tool Use

This model was selected for release after local OpenAI-compatible tool-use testing at top-k=20.

For best results, use real OpenAI-compatible tool definitions through the vLLM Chat Completions or Responses-compatible path. Avoid relying on raw text parsing of tool calls.
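
A minimal sketch of that path, using the OpenAI Python client against the vLLM server above; the read_file tool is a hypothetical example, not something the model ships with:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical tool definition; any OpenAI-compatible JSON-schema tool works.
tools = [{
    "type": "function",
    "function": {
        "name": "read_file",
        "description": "Read a file from the repository being explored.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Repo-relative file path."}
            },
            "required": ["path"],
        },
    },
}]

response = client.chat.completions.create(
    model="vllm/doobee",
    messages=[{"role": "user", "content": "Open README.md and summarize it."}],
    tools=tools,
    tool_choice="auto",  # pairs with --enable-auto-tool-choice on the server
)

# With the server-side tool-call parser enabled, calls come back structured,
# so no raw text parsing is needed on the client.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)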

Intended Use

gpt-oss-220a20b is intended for:

  • agentic software engineering workflows
  • repository exploration and codebase summarization
  • SWE-style debugging and implementation tasks
  • OpenAI-compatible tool-use agents
  • math-assisted coding and reasoning
  • production agent and developer-tool deployments

Limitations

  • This is an independent expanded model derived from GPT-OSS 120B, not an official OpenAI release.
  • Tool-calling behavior depends strongly on the serving stack, chat template, tool schema, and sampling settings.

Development Note

This model was produced on a two-GPU local setup using a new training method for MoE expansion targeting continued pretraining and post-training.
