EnvScaler-Qwen3-4B

Model Description

EnvScaler-Qwen3-4B is a tool-enhanced language model based on Qwen3-4B (Thinking Mode), trained using the EnvScaler framework for tool-interactive agent tasks. This model has been trained through Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL).

Training Process

This model was trained using a two-stage approach:

Stage 1: Supervised Fine-Tuning (SFT)

Stage 2: Reinforcement Learning (RL)

The training process enables the model to learn from both demonstration trajectories (SFT) and reinforcement signals (RL), resulting in improved performance on complex tool-interactive tasks.

How to Use

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "XXHStudyHard/EnvScaler-Qwen3-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Use with function calling interface
# See EnvScaler project for full interaction examples

With EnvScaler Framework

For full integration with tool-interactive environments, please refer to the EnvScaler project documentation.

Related Resources

Citation

If you use this model, please cite our work:

@misc{song2026envscalerscalingtoolinteractiveenvironments,
      title={EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis}, 
      author={Xiaoshuai Song and Haofei Chang and Guanting Dong and Yutao Zhu and Zhicheng Dou and Ji-Rong Wen},
      year={2026},
      eprint={2601.05808},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2601.05808}, 
}

License

This model is licensed under the Apache 2.0 License, following the base Qwen3 model license.

Contact

For any questions or feedback, please contact: songxiaoshuai@ruc.edu.cn

Downloads last month
18
Safetensors
Model size
4B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for XXHStudyHard/EnvScaler-Qwen3-4B

Quantizations
1 model

Collection including XXHStudyHard/EnvScaler-Qwen3-4B

Paper for XXHStudyHard/EnvScaler-Qwen3-4B