# Qwen3-8B ToolACE (Post-GRPO)

Qwen3-8B fine-tuned for precise function calling via SFT + GRPO on the ToolACE dataset.

## Training Pipeline

1. **SFT** (1 epoch): LoRA r=64, assistant-only loss masking, cosine LR schedule
2. **GRPO** (400 steps): DAPO loss, decomposed rewards (format 0.1, tool name 0.5, tool args 0.4); see the reward sketch below
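
The decomposed reward in step 2 can be pictured as a weighted sum of three checks: does the output parse, is the right tool named, and how many arguments match. A minimal sketch, assuming the model emits a JSON tool call scored against a single gold call; function and variable names here are hypothetical, and the actual reward code lives in the linked repo:

```python
import json

# Hypothetical reward decomposition mirroring the weights above:
# format 0.1, tool name 0.5, tool args 0.4. Parsing details are assumptions.
WEIGHTS = {"format": 0.1, "name": 0.5, "args": 0.4}

def tool_call_reward(completion: str, gold: dict) -> float:
    """Score one completion against a gold call {"name": ..., "arguments": {...}}."""
    reward = 0.0
    try:
        call = json.loads(completion)  # assumes the model emits a JSON tool call
    except json.JSONDecodeError:
        return 0.0                     # malformed output earns nothing
    reward += WEIGHTS["format"]        # parsed cleanly -> format credit
    if call.get("name") == gold["name"]:
        reward += WEIGHTS["name"]      # correct function selected
        got, want = call.get("arguments", {}), gold["arguments"]
        if want:
            # partial credit: fraction of gold arguments reproduced exactly
            hits = sum(1 for k, v in want.items() if got.get(k) == v)
            reward += WEIGHTS["args"] * hits / len(want)
        else:
            reward += WEIGHTS["args"]
    return reward

# Example: right tool, one of two arguments correct -> 0.1 + 0.5 + 0.2 = 0.8
print(tool_call_reward(
    '{"name": "get_weather", "arguments": {"city": "Paris", "unit": "F"}}',
    {"name": "get_weather", "arguments": {"city": "Paris", "unit": "C"}},
))
```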

## Serving

```bash
# BF16
vllm serve kenkaneki/Qwen3-8B-ToolACE --enable-auto-tool-choice --tool-call-parser hermes

# FP8 dynamic (recommended)
vllm serve kenkaneki/Qwen3-8B-ToolACE --quantization fp8 --enable-auto-tool-choice --tool-call-parser hermes

# With EAGLE-3 speculative decoding (~1.8x speedup)
vllm serve kenkaneki/Qwen3-8B-ToolACE --speculative-config '{"model":"kenkaneki/Qwen3-8B-ToolACE-speculator.eagle3","num_speculative_tokens":3,"method":"eagle3"}' --enable-auto-tool-choice --tool-call-parser hermes
```
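
Once a server is up, tool calls go through vLLM's OpenAI-compatible endpoint; with `--enable-auto-tool-choice` and the hermes parser, parsed calls land in `tool_calls`. A minimal client sketch (the tool schema is illustrative; the base URL assumes vLLM's default host/port):

```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible API; localhost:8000 is the default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Illustrative tool schema; any JSON-schema function definition works.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="kenkaneki/Qwen3-8B-ToolACE",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
# The hermes parser extracts structured calls from the model output.
print(resp.choices[0].message.tool_calls)
```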

## Latency (H100, vLLM, ToolACE prompts)

c = request concurrency; E2EL = end-to-end latency.

| Config | E2EL p50 (c=1) | tok/s (c=1) | tok/s (c=32) |
|---|---|---|---|
| BF16 | 323.9 ms | 150.4 | 2293 |
| FP8 dynamic | 222.9 ms | 217.7 | 3268 |
| EAGLE-3 FT | 175.4 ms | 271.3 | 4378 |


## Training

Hardware: NVIDIA H100 80GB. Training time: ~47 min (27 min SFT + 20 min GRPO).

Code: [github.com/aimedvedevq/toolaceqwen](https://github.com/aimedvedevq/toolaceqwen)
