gpt-oss-180b-goomba

gpt-oss-180b-goomba is an agentic coding model derived from GPT-OSS 120B.

Goomba expands the GPT-OSS 120B base with additional specialist MoE capacity and is intended for agentic coding, repository work, SWE-style tasks, and tool-using automation.

Goomba is the first release in this line to feature a new post-training data formulation. It is completely different from the previous releases and is much stronger at tool calling, raw SWE-style coding, and math-assisted reasoning.

This model was trained on just two GPUs.

Overview

  • Base model: openai/gpt-oss-120b
  • Approx total parameters: 181B
  • Approx active parameters: 16.5B per token at top-k=16
  • Total expert rows: 200
  • Added specialist experts: 72
  • Format: MXFP4
  • Out-of-box active experts: top-k=16
  • Intended use: agentic coding, SWE-style workflows, repository exploration, tool-using automation, raw SWE coding, math-assisted coding
  • Status: research preview

Recommended vLLM

This model was primarily tested with vLLM using the GPT-OSS reasoning parser and OpenAI tool-call parser.

vllm serve /path/to/model \
  --served-model-name vllm/doobee \
  --tensor-parallel-size 2 \
  --max-model-len 60000 \
  --gpu-memory-utilization 0.88 \
  --enforce-eager \
  --trust-remote-code \
  --reasoning-parser openai_gptoss \
  --tool-call-parser openai \
  --enable-auto-tool-choice

Recommended parameters:

  • num_experts_per_tok=16 is already set in config.json
  • tensor-parallel-size=2
  • max-model-len=60000
  • gpu-memory-utilization=0.88
  • reasoning-parser=openai_gptoss
  • tool-call-parser=openai
  • enable-auto-tool-choice

The config ships with both num_experts_per_tok=16 and experts_per_token=16, so runtimes that respect the model config should use top-k 16 automatically. If your runtime overrides or ignores those fields, pass this explicitly:

--hf-overrides '{"num_experts_per_tok": 16}'

Tool Calling

Goomba was primarily tested as an agentic coding model. Basic OpenAI-compatible tool calling is expected to work best with the vLLM GPT-OSS reasoning parser and OpenAI tool-call parser enabled.

Suggested temperatures:

  • 0.3 for steady coding-agent work
  • 0.5 for broader agentic exploration

Recommended range: 0.3-0.5.

For repository exploration tasks, use an agent prompt that asks the model to inspect subdirectories, identify entry points, and summarize the project structure rather than stopping after a single directory listing.

License

Replace the placeholder license: other metadata with the actual license you want to publish under after confirming compatibility with the base model and your added weights.

Downloads last month
28
Safetensors
Model size
187B params
Tensor type
BF16
·
U8
·
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for LLMWildling/gpt-oss-180b-goomba

Quantized
(107)
this model