EnvScaler-Qwen3-4B
Model Description
EnvScaler-Qwen3-4B is a tool-enhanced language model based on Qwen3-4B (Thinking Mode), trained using the EnvScaler framework for tool-interactive agent tasks. This model has been trained through Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL).
Training Process
This model was trained using a two-stage approach:
Stage 1: Supervised Fine-Tuning (SFT)
- Training Data: 9,022 trajectories from agent-environment interactions
- Data Source: EnvScaler-SFT-Traj-9K
- Scenarios: 4,684 SFT scenarios from EnvScaler-SFT-Scenario
- Environments: 141 synthesized environments from EnvScaler-191-Env
Stage 2: Reinforcement Learning (RL)
- Training Data: 2,550 RL scenarios from EnvScaler-RL-Scenario
- Environments: 50 synthesized environments from EnvScaler-191-Env
- Framework: Based on the ROLL framework
The training process enables the model to learn from both demonstration trajectories (SFT) and reinforcement signals (RL), resulting in improved performance on complex tool-interactive tasks.
How to Use
Basic Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "XXHStudyHard/EnvScaler-Qwen3-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# Use with function calling interface
# See EnvScaler project for full interaction examples
With EnvScaler Framework
For full integration with tool-interactive environments, please refer to the EnvScaler project documentation.
Related Resources
- Project Homepage: EnvScaler GitHub
- Paper: arXiv
- Training Datasets:
- Other Models:
Citation
If you use this model, please cite our work:
@misc{song2026envscalerscalingtoolinteractiveenvironments,
title={EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis},
author={Xiaoshuai Song and Haofei Chang and Guanting Dong and Yutao Zhu and Zhicheng Dou and Ji-Rong Wen},
year={2026},
eprint={2601.05808},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2601.05808},
}
License
This model is licensed under the Apache 2.0 License, following the base Qwen3 model license.
Contact
For any questions or feedback, please contact: songxiaoshuai@ruc.edu.cn
- Downloads last month
- 18