EnvScaler-Qwen3-4B

Model Description

EnvScaler-Qwen3-4B is a tool-enhanced language model based on Qwen3-4B (Thinking Mode), trained using the EnvScaler framework for tool-interactive agent tasks. This model has been trained through Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL).

Training Process

This model was trained using a two-stage approach:

Stage 1: Supervised Fine-Tuning (SFT)

Training Data: 9,022 trajectories from agent-environment interactions
Data Source: EnvScaler-SFT-Traj-9K
Scenarios: 4,684 SFT scenarios from EnvScaler-SFT-Scenario
Environments: 141 synthesized environments from EnvScaler-191-Env

Stage 2: Reinforcement Learning (RL)

Training Data: 2,550 RL scenarios from EnvScaler-RL-Scenario
Environments: 50 synthesized environments from EnvScaler-191-Env
Framework: Based on the ROLL framework

The training process enables the model to learn from both demonstration trajectories (SFT) and reinforcement signals (RL), resulting in improved performance on complex tool-interactive tasks.

How to Use

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "XXHStudyHard/EnvScaler-Qwen3-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Use with function calling interface
# See EnvScaler project for full interaction examples

With EnvScaler Framework

For full integration with tool-interactive environments, please refer to the EnvScaler project documentation.

Related Resources

Citation

If you use this model, please cite our work:

@article{song2026envscaler,
  title={EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis},
  author={Song, Xiaoshuai and Chang, Haofei and Dong, Guanting and Zhu, Yutao and Dou, Zhicheng and Wen, Ji-Rong},
  journal={arXiv preprint arXiv:2601.05808},
  year={2026}
}