Amity Sigma Thinking v3r

A 4-billion-parameter model fine-tuned for agentic customer service tasks, achieving 78.51% Pass@1 on the Tau2-Bench retail domain and outperforming models 50-400x its size.

Model Highlights

| Metric | Value |
|---|---|
| Parameters | 4B |
| Tau2-Bench Retail Pass@1 | 78.51% |
| Base Model | Qwen3-4B-Thinking |
| Training | ROAD + GRPO fine-tuning |

Performance Comparison

Tau2-Bench Retail Domain Pass@1 scores:

| Model | Size | Retail Pass@1 |
|---|---|---|
| Claude Opus 4 | ~200B | 81.4% |
| Claude Sonnet 4 | ~70-100B | 80.5% |
| **Amity Sigma v3r** | **4B** | **78.51%** |
| Qwen3-Max | ~340-407B | 75.44% |
| GPT-4.1 | ~1.7T | 74.0% |
| Claude-3.7-Sonnet | ~50-70B | 72.1% |
| o4-mini | ~25-50B | 68.3% |
| GPT-4.1-mini | ~8-20B | 61.4% |

Key Achievements

  • Outperforms GPT-4.1 (1.7T parameters) by 4.5 percentage points
  • Outperforms Qwen3-Max (340-407B parameters) by 3.1 percentage points
  • Only 2.9 percentage points behind Claude Opus 4, which is approximately 50x larger

Full Tau2-Bench Results

| Domain | Pass@1 | Pass@2 | Pass@3 | Pass@4 |
|---|---|---|---|---|
| Retail | 78.51% | 67.40% | 60.53% | 56.14% |
| Airline | 55.50% | 45.00% | 38.50% | 34.00% |
| Telecom | 32.89% | 24.71% | 19.96% | 16.67% |

Note that scores decrease as k grows, consistent with Tau-Bench's pass^k metric, which counts a task as solved only if all k independent trials succeed; higher k is therefore a stricter reliability measure, not a best-of-k score.
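
Assuming the standard Tau-Bench estimator, pass^k for a task with c successes out of n trials is C(c, k) / C(n, k), averaged over tasks. A minimal sketch:

```python
from math import comb

def pass_hat_k(n: int, c: int, k: int) -> float:
    """Unbiased estimator of pass^k: the probability that k i.i.d.
    trials of a task all succeed, given c successes in n trials."""
    return comb(c, k) / comb(n, k)

# e.g. 4 trials of one task, 3 of which succeeded:
print(pass_hat_k(4, 3, 1))  # 0.75
print(pass_hat_k(4, 3, 4))  # 0.0 -- all four trials must pass
```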

Usage

vLLM Serving (Recommended)

vllm serve amityco/amity-sigma-thinking-v3r \
    --reasoning-parser deepseek_r1 \
    --enable-auto-tool-choice \
    --tool-call-parser hermes \
    --max-model-len 45000 \
    --trust-remote-code

For multi-GPU setups:

vllm serve amityco/amity-sigma-thinking-v3r \
    --reasoning-parser deepseek_r1 \
    --enable-auto-tool-choice \
    --tool-call-parser hermes \
    --max-model-len 45000 \
    --trust-remote-code \
    --tensor-parallel-size 8
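
Once the server is running, the model is reachable through vLLM's OpenAI-compatible API. Below is a minimal sketch of a tool-calling request, assuming the default endpoint at http://localhost:8000/v1; the get_order_status tool is a hypothetical example, not something shipped with the model:

```python
from openai import OpenAI

# vLLM exposes an OpenAI-compatible server; no real API key is needed.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical retail tool, for illustration only.
tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the status of a retail order by its ID.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

response = client.chat.completions.create(
    model="amityco/amity-sigma-thinking-v3r",
    messages=[{"role": "user", "content": "Where is my order 1234?"}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:  # the model chose to call the tool
    call = message.tool_calls[0]
    print(call.function.name, call.function.arguments)
else:
    print(message.content)
```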

Hardware Requirements

| Setup | GPU Requirements | Notes |
|---|---|---|
| Minimum | 1x A100 40GB | Unquantized (BF16) |
| Alternative | 2x RTX 4090 | Consumer GPUs |
| Recommended | 1x A100 80GB | Best performance |
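
For intuition behind these tiers: 4B parameters in BF16 take roughly 7.5 GB, so most of the VRAM budget goes to the KV cache at the 45k context length. A quick back-of-envelope check with round numbers:

```python
# BF16 stores 2 bytes per parameter; KV cache and activations come on top.
params = 4e9
weight_gb = params * 2 / 1024**3
print(f"~{weight_gb:.1f} GB for weights alone")  # ~7.5 GB
```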

Training Methodology

Qwen3-4B-Thinking (Base)
    ↓
ROAD Optimization (Policy Enhancement)
    ↓
Synthetic Data Generation (Amity Sigma)
    ↓
Human-in-the-Loop Feedback
    ↓
GRPO Fine-tuning
    ↓
Amity Sigma v3r

Training Details

  • Base Model: Qwen3-4B-Thinking
  • Optimization: ROAD (Auto Optimization for Agentic Tasks)
  • Fine-tuning: GRPO on multi-turn interaction data (see the sketch after this list)
  • Data: Synthetic customer service conversations with human-in-the-loop feedback
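
The training code itself is not published. Purely as an illustration of the core GRPO idea, the sketch below computes group-relative advantages: each sampled completion's reward is normalized against the mean and standard deviation of its group (all completions drawn for the same prompt), which removes the need for a learned value network:

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages over a batch of reward groups.

    rewards: shape (num_prompts, group_size), one reward per sampled
    completion; each row is normalized against its own group statistics.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled completions each.
rewards = torch.tensor([[1.0, 0.0, 1.0, 0.0],
                        [0.2, 0.9, 0.4, 0.5]])
print(grpo_advantages(rewards))
```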

Cost Efficiency

Estimated Cost per 1,000 Customer Interactions

| Model | Estimated Cost | vs. Baseline |
|---|---|---|
| Amity Sigma v3r (self-hosted) | ~$1-2 | Baseline |
| Claude Sonnet 4 | ~$6-10 | 4-6x more |
| GPT-4.1 | ~$15-25 | 10-15x more |
| Claude Opus 4 | ~$30-50 | 20-30x more |
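
The self-hosted figure is consistent with ordinary cloud GPU pricing. Here is a back-of-envelope model with clearly assumed inputs (the card does not publish the numbers behind its estimates):

```python
# All three inputs are ILLUSTRATIVE ASSUMPTIONS, not published figures.
GPU_COST_PER_HOUR = 1.50        # assumed A100 80GB cloud rate, USD
TOKENS_PER_SECOND = 1_000       # assumed aggregate serving throughput
TOKENS_PER_INTERACTION = 4_000  # assumed prompt + reasoning + reply

seconds_per_interaction = TOKENS_PER_INTERACTION / TOKENS_PER_SECOND
cost_per_1k = 1_000 * seconds_per_interaction / 3600 * GPU_COST_PER_HOUR
print(f"~${cost_per_1k:.2f} per 1,000 interactions")  # ~$1.67
```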

Self-Hosting Benefits

  • Run on modest hardware (single A100 or 2x RTX 4090)
  • Predictable costs without per-token pricing
  • Data privacy: no external API calls
  • Low latency with local inference

Intended Use

This model is designed for:

  • Customer service automation
  • Order management and support
  • Multi-turn conversational agents
  • Tool-calling and function execution

Limitations

  • Optimized primarily for retail domain; airline and telecom performance is lower
  • Requires vLLM with specific configuration for optimal tool-calling
  • Best suited for English language interactions

Citation

@misc{amity-sigma-v3r,
  title={Amity Sigma Thinking v3r: Efficient Agentic Model for Customer Service},
  author={Amity},
  year={2026},
  url={https://huggingface.co/amityco/amity-sigma-thinking-v3r}
}

Contact

For questions or feedback: touchapon@amity.co
