ROAD: Reflective Optimization via Automated Debugging for Zero-Shot Agent Alignment
Paper
•
2512.24040
•
Published
A 4-billion parameter model fine-tuned for agentic customer service tasks, achieving 78.51% Pass@1 on the Tau2-Bench retail domain - outperforming models 50-400x its size.
| Metric | Value |
|---|---|
| Parameters | 4B |
| Tau2-Bench Retail Pass@1 | 78.51% |
| Base Model | Qwen3-4B-Thinking |
| Training | ROAD + GRPO Fine-tuning |
Tau2-Bench Retail Domain Pass@1 scores:
| Model | Size | Retail Pass@1 |
|---|---|---|
| Claude Opus 4 | ~200B | 81.4% |
| Claude Sonnet 4 | ~70-100B | 80.5% |
| Amity Sigma v3r | 4B | 78.51% |
| Qwen3-Max | ~340-407B | 75.44% |
| GPT-4.1 | ~1.7T | 74.0% |
| Claude-3.7-Sonnet | ~50-70B | 72.1% |
| o4-mini | ~25-50B | 68.3% |
| GPT-4.1-mini | ~8-20B | 61.4% |
| Domain | Pass@1 | Pass@2 | Pass@3 | Pass@4 |
|---|---|---|---|---|
| Retail | 78.51% | 67.40% | 60.53% | 56.14% |
| Airline | 55.50% | 45.00% | 38.50% | 34.00% |
| Telecom | 32.89% | 24.71% | 19.96% | 16.67% |
vllm serve amityco/amity-sigma-thinking-v3r \
--reasoning-parser deepseek_r1 \
--enable-auto-tool-choice \
--tool-call-parser hermes \
--max-model-len 45000 \
--trust-remote-code
For multi-GPU setups:
vllm serve amityco/amity-sigma-thinking-v3r \
--reasoning-parser deepseek_r1 \
--enable-auto-tool-choice \
--tool-call-parser hermes \
--max-model-len 45000 \
--trust-remote-code \
--tensor-parallel-size 8
| Setup | GPU Requirements | Notes |
|---|---|---|
| Minimum | 1x A100 40GB | Full precision |
| Alternative | 2x RTX 4090 | Consumer GPUs |
| Recommended | 1x A100 80GB | Best performance |
Qwen3-4B-Thinking (Base)
↓
ROAD Optimization (Policy Enhancement)
↓
Synthetic Data Generation (Amity Sigma)
↓
Human-in-the-Loop Feedback
↓
GRPO Fine-tuning
↓
Amity Sigma v3r
| Model | Estimated Cost | Comparison |
|---|---|---|
| Amity Sigma v3r (self-hosted) | ~$1-2 | Baseline |
| Claude Sonnet 4 | ~$6-10 | 4-6x more |
| GPT-4.1 | ~$15-25 | 10-15x more |
| Claude Opus 4 | ~$30-50 | 20-30x more |
This model is designed for:
@misc{amity-sigma-v3r,
title={Amity Sigma Thinking v3r: Efficient Agentic Model for Customer Service},
author={Amity},
year={2026},
url={https://huggingface.co/amityco/amity-sigma-thinking-v3r}
}
For questions or feedback: touchapon@amity.co