Gemma4-39B-A6B Preview

gemma4-39b-a6b-preview is a Gemma4 MoE expansion of unsloth/Gemma-4-26B-A4B-it, built with an internal pre-training/post-training pipeline targeting software engineering, repository reasoning, agentic workflows, and general instruction following.

This preview build was produced on a 2-GPU local training setup. A larger 52B build is planned to follow soon as the more stable release line.

Model Details

  • Base model: unsloth/Gemma-4-26B-A4B-it
  • Architecture: Gemma4 MoE
  • Total logical parameters: approximately 38.7B
  • Active parameters: approximately 5.9B at the default active expert budget
  • Expert layout: 128 base experts + 64 selected specialist experts
  • Context target: up to 131k tokens in vLLM serving
  • Primary focus: SWE, code/repository analysis, agentic traces, and reasoning
  • Recommended temperature: 0.0 to 0.7 for agentic/tool use

Serving

Use the included Gemma4 chat template. Thinking should be enabled for best agentic behavior.

CUDA_VISIBLE_DEVICES=0 \
VLLM_ALLOW_INSECURE_SERIALIZATION=1 \
vllm serve . \
  --served-model-name gemma4-39b-a6b-preview \
  --host 0.0.0.0 \
  --port 23333 \
  --max-model-len 131072 \
  --tensor-parallel-size 1 \
  --gpu-memory-utilization 0.90 \
  --trust-remote-code \
  --chat-template ./chat_template.jinja \
  --default-chat-template-kwargs '{"enable_thinking": true}' \
  --enable-auto-tool-choice \
  --tool-call-parser gemma4 \
  --reasoning-parser gemma4

OpenAI-compatible endpoint:

http://localhost:23333/v1
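
Once the server is running, any OpenAI-compatible client can talk to it. The snippet below is a minimal sketch using the openai Python package; the model name, host, and port mirror the serve command above, and the prompt and temperature value are illustrative.

# pip install openai
from openai import OpenAI

# Point the OpenAI-compatible client at the local vLLM server started above.
client = OpenAI(base_url="http://localhost:23333/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="gemma4-39b-a6b-preview",
    messages=[
        {"role": "user", "content": "Explain what a MoE router does in two sentences."},
    ],
    temperature=0.2,  # within the recommended 0.0-0.7 range for agentic/tool use
)

print(response.choices[0].message.content)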

Validation

Validated via Gemma4-native vLLM serving with thinking enabled, covering chat, SWE-style prompts, reasoning, parsed tool calls, and post-tool final answers.
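
To illustrate the tool-call and post-tool flow, here is a sketched round trip against the same endpoint. The get_file_list tool and its result are hypothetical and exist only for illustration; the server's --tool-call-parser is what turns the model's tool call into the structured tool_calls field read below.

import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:23333/v1", api_key="EMPTY")

# Hypothetical tool definition, purely for illustration.
tools = [{
    "type": "function",
    "function": {
        "name": "get_file_list",
        "description": "List files under a repository path.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

messages = [{"role": "user", "content": "Which files are in src/?"}]

# First turn: the model should emit a parsed tool call.
first = client.chat.completions.create(
    model="gemma4-39b-a6b-preview",
    messages=messages,
    tools=tools,
    temperature=0.0,
)
call = first.choices[0].message.tool_calls[0]

# Feed back the assistant turn plus a fabricated tool result, then ask for the
# post-tool final answer.
messages.append(first.choices[0].message)
messages.append({
    "role": "tool",
    "tool_call_id": call.id,
    "content": json.dumps(["src/app.py", "src/utils.py"]),
})
final = client.chat.completions.create(
    model="gemma4-39b-a6b-preview",
    messages=messages,
    tools=tools,
    temperature=0.0,
)
print(final.choices[0].message.content)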

Notes

This is a full model checkpoint, not a LoRA adapter.
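
Because it is a full checkpoint, it can also be loaded directly with transformers for local experiments. A minimal sketch, assuming the weights sit in the current directory and that the Gemma4 MoE architecture needs trust_remote_code, mirroring the serve command:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(".", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    ".",
    torch_dtype="auto",      # keep the checkpoint's stored dtypes
    device_map="auto",       # spread layers across available GPUs
    trust_remote_code=True,  # Gemma4 MoE modeling code, as in the serve flag
)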
