toolася
Collection
toolcalling sft+grpo+specdecoding • 3 items • Updated
Qwen3-8B fine-tuned for precise function calling via SFT + GRPO on the ToolACE dataset.
# BF16
vllm serve kenkaneki/Qwen3-8B-ToolACE --enable-auto-tool-choice --tool-call-parser hermes
# FP8 dynamic (recommended)
vllm serve kenkaneki/Qwen3-8B-ToolACE --quantization fp8 --enable-auto-tool-choice --tool-call-parser hermes
# With EAGLE-3 speculative decoding (1.8x speedup)
vllm serve kenkaneki/Qwen3-8B-ToolACE --speculative-config '{"model":"kenkaneki/Qwen3-8B-ToolACE-speculator.eagle3","num_speculative_tokens":3,"method":"eagle3"}' --enable-auto-tool-choice --tool-call-parser hermes
| Config | c=1 E2EL p50 | c=1 tok/s | c=32 tok/s |
|---|---|---|---|
| BF16 | 323.9 ms | 150.4 | 2293 |
| FP8 dynamic | 222.9 ms | 217.7 | 3268 |
| EAGLE3 FT | 175.4 ms | 271.3 | 4378 |
Hardware: NVIDIA H100 80GB. Training time: ~47 min (27 min SFT + 20 min GRPO).
Base model
Qwen/Qwen3-8B-Base