STT-Agent-RL

This repository contains STT-Agent-RL, a model trained with online reinforcement learning (RL) on top of STT-Agent-SFT.

📊 Performance on STT-Arena

Below is the overall Pass@1 performance of STT-Agent compared to other frontier models:

Ablation: Effect of Iterative Trajectory Refinement

| Model | Easy | Medium | Hard | Impossible | Overall | Avg. API Calls |
|---|---|---|---|---|---|---|
| Qwen-3-4B (baseline) | 18.31 | 9.46 | 2.82 | 10.00 | 10.57 | 7.63 |
| STT-Agent (w/o refine) | 28.17 | 16.92 | 11.86 | 47.01 | 23.10 | 32.70 |
| STT-Agent (with refine) | 26.76 | 17.41 | 13.56 | 61.11 | 25.11 | 15.30 |

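Trajectory refinement improves both accuracy and efficiency: the Overall score rises from 23.10 to 25.11 and the average number of API calls per task falls from 32.70 to 15.30. The gain is concentrated in the Hard and Impossible tiers; Easy-tier accuracy dips slightly (28.17 → 26.76).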
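To make the ablation comparison concrete, the short script below recomputes the changes from the table's own numbers. It also shows that the Overall column is not the unweighted mean of the four tiers (for example 10.15 vs. 10.57 for the baseline), which suggests Overall is pooled over all tasks. The card does not give per-tier task counts, so that reading is an assumption.

```python
# Scores copied verbatim from the ablation table (tiers: Easy, Medium, Hard, Impossible).
TIERS = ["Easy", "Medium", "Hard", "Impossible"]
ROWS = {
    "baseline": {"tiers": [18.31, 9.46, 2.82, 10.00], "overall": 10.57, "calls": 7.63},
    "w/o refine": {"tiers": [28.17, 16.92, 11.86, 47.01], "overall": 23.10, "calls": 32.70},
    "with refine": {"tiers": [26.76, 17.41, 13.56, 61.11], "overall": 25.11, "calls": 15.30},
}


def deltas(before: dict, after: dict) -> dict:
    """Absolute change (after - before) per tier, overall, and in average API calls."""
    out = {t: round(a - b, 2) for t, b, a in zip(TIERS, before["tiers"], after["tiers"])}
    out["Overall"] = round(after["overall"] - before["overall"], 2)
    out["Avg. API Calls"] = round(after["calls"] - before["calls"], 2)
    return out


def macro_mean(row: dict) -> float:
    """Unweighted mean of the four tier scores (not necessarily how Overall is computed)."""
    return round(sum(row["tiers"]) / len(row["tiers"]), 2)


d = deltas(ROWS["w/o refine"], ROWS["with refine"])
before_calls = ROWS["w/o refine"]["calls"]
call_cut = (before_calls - ROWS["with refine"]["calls"]) / before_calls
print(d)
print(f"relative reduction in API calls: {call_cut:.1%}")  # prints "relative reduction in API calls: 53.2%"
for name, row in ROWS.items():
    print(f"{name}: macro mean {macro_mean(row)} vs reported overall {row['overall']}")
```

In other words, refinement cuts tool usage by a little over half while adding about two points overall.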

🚀 Usage

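```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Miaow-Lab/STT-Agent-RL"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Load the weights in their stored dtype (BF16) instead of upcasting to FP32
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

# Example tool-use prompt
prompt = "User: Book the cheapest flight from PVG to CDG.\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Set an explicit budget; the default generation length can be very short
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, without special tokens
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```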
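The prompt in the usage example passes no tool schemas, so on its own it only yields text. In an agent loop, the caller has to extract tool calls from each completion, execute them, and feed the results back. The sketch below assumes the model emits Qwen-style `<tool_call>{"name": ..., "arguments": {...}}</tool_call>` JSON blocks (the convention of Qwen chat templates); this card does not state the model's actual output format, and `search_flights`, `TOOLS`, `parse_tool_calls`, and `dispatch` are hypothetical names used for illustration only.

```python
import json
import re
from typing import Any, Callable

# Assumption: tool calls look like <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict[str, Any]]:
    """Return every well-formed tool call found in a model completion."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            call = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of crashing the loop
        args = call.get("arguments", {}) if isinstance(call, dict) else None
        if isinstance(call, dict) and "name" in call and isinstance(args, dict):
            calls.append({"name": call["name"], "arguments": args})
    return calls


def search_flights(origin: str, destination: str) -> list[dict[str, Any]]:
    """Hypothetical stub standing in for a real flight-search API."""
    return [
        {"flight": "XX101", "origin": origin, "destination": destination, "price": 620},
        {"flight": "XX202", "origin": origin, "destination": destination, "price": 540},
    ]


TOOLS: dict[str, Callable[..., Any]] = {"search_flights": search_flights}


def dispatch(calls: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Execute each parsed call; results would be fed back to the model as tool messages."""
    results = []
    for call in calls:
        fn = TOOLS.get(call["name"])
        if fn is None:
            results.append({"name": call["name"], "error": "unknown tool"})
            continue
        try:
            results.append({"name": call["name"], "content": fn(**call["arguments"])})
        except TypeError as exc:  # wrong or missing arguments
            results.append({"name": call["name"], "error": str(exc)})
    return results


completion = (
    "I will search for flights first.\n"
    '<tool_call>{"name": "search_flights", '
    '"arguments": {"origin": "PVG", "destination": "CDG"}}</tool_call>'
)
calls = parse_tool_calls(completion)
results = dispatch(calls)
cheapest = min(results[0]["content"], key=lambda f: f["price"])
print(len(calls), cheapest["flight"])  # prints "1 XX202"
```

Each executed call counts toward the "Avg. API Calls" metric in the ablation table, which is why a loop like this is where efficiency differences would show up.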

🧪 Training Details

- Base model: Qwen-3-4B-Base
- SFT: 2,212 refined trajectories
- RL strategy: REINFORCE++
- Compute: 4× NVIDIA H200 GPUs
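The card names REINFORCE++ as the RL algorithm but gives no hyperparameters. As a rough illustration of what distinguishes it from vanilla REINFORCE, the sketch below follows our reading of the REINFORCE++ formulation: a sequence-level outcome reward minus a per-token KL penalty summed from each token to the end of the response, with advantages normalized by the mean and standard deviation of the whole batch instead of a learned critic. The KL coefficient `beta`, the `Rollout` container, and the toy log-probabilities are assumptions, not values from this work; the reference policy would presumably be the SFT checkpoint.

```python
import math
from dataclasses import dataclass


@dataclass
class Rollout:
    reward: float             # scalar outcome reward, e.g. 1.0 if the task was solved
    logp_policy: list[float]  # log-prob of each generated token under the current policy
    logp_ref: list[float]     # log-prob of the same tokens under the frozen reference model


def token_advantages(r: Rollout, beta: float) -> list[float]:
    """A_t = reward - beta * sum_{i>=t} KL_i, with KL_i = logp_policy_i - logp_ref_i."""
    kl = [p - q for p, q in zip(r.logp_policy, r.logp_ref)]
    advantages = []
    kl_to_go = 0.0
    for k in reversed(kl):  # accumulate the KL penalty from the last token backwards
        kl_to_go += k
        advantages.append(r.reward - beta * kl_to_go)
    advantages.reverse()
    return advantages


def reinforce_pp_advantages(batch: list[Rollout], beta: float = 0.01) -> list[list[float]]:
    """Normalize token-level advantages with the mean/std over all tokens in the batch."""
    per_rollout = [token_advantages(r, beta) for r in batch]
    flat = [a for adv in per_rollout for a in adv]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((a - mean) ** 2 for a in flat) / len(flat))
    return [[(a - mean) / (std + 1e-8) for a in adv] for adv in per_rollout]


batch = [
    Rollout(reward=1.0, logp_policy=[-0.5, -0.2, -0.1], logp_ref=[-0.6, -0.3, -0.1]),
    Rollout(reward=0.0, logp_policy=[-1.0, -0.8], logp_ref=[-0.9, -0.8]),
]
adv = reinforce_pp_advantages(batch)
print([[round(a, 3) for a in seq] for seq in adv])
```

Tokens from the solved rollout end up with positive normalized advantages and those from the failed rollout with negative ones; the KL term only shifts them slightly. Batch-level normalization is what lets the method skip a value network, which keeps memory use modest.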

📄 Citation

@misc{hui2026sttarenarealisticenvironmenttoolusing,
      title={STT-Arena: A More Realistic Environment for Tool-Using with Spatio-Temporal Dynamics}, 
      author={Tingfeng Hui and Hao Xu and Pengyu Zhu and Hongsheng Xin and Kun Zhan and Sen Su and Chunxiao Liu and Ning Miao},
      year={2026},
      eprint={2605.18548},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2605.18548}, 
}

