+---
+license: mit
+library_name: transformers
+pipeline_tag: text-generation
+---
+# SWE-Master-4B-RL
+SWE-Master is an open-source, fully reproducible post-training framework for building effective software engineering (SWE) agents. This checkpoint is the **4B version** of the model, optimized with **reinforcement learning with verifiable rewards (RLVR)** driven by real execution feedback.
+- **Paper:** [SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training](https://huggingface.co/papers/2602.03411)
+- **Repository:** [RUCAIBox/SWE-Master](https://github.com/RUCAIBox/SWE-Master)
+## Model Description
+SWE-Master systematically explores the complete agent development pipeline to elicit strong long-horizon SWE task-solving abilities from open-source models. The framework includes:
+- **Teacher-trajectory synthesis and data curation:** Integrating multiple SWE datasets and filtering trajectories.
+- **Long-horizon Supervised Fine-Tuning (SFT):** Training on curated agentic trajectories.
+- **Reinforcement Learning (RLVR):** Utilizing execution feedback and Group Relative Policy Optimization (GRPO) to enhance performance.
+- **Inference framework design:** Supporting advanced tool use (LSP-integrated tools) and Test-Time Scaling (TTS).
+This model serves as a practical foundation for advancing reproducible research on software engineering agents.
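The RLVR stage listed above relies on GRPO, whose core idea is to score each sampled rollout relative to the other rollouts drawn for the same task. The snippet below is a minimal sketch of that group-relative advantage step only, not the SWE-Master training code. The function name `group_relative_advantages`, the binary pass/fail reward, the group size of four, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own sampling group.

    Hypothetical helper: subtracts the group mean and divides by the
    group standard deviation (population std is an assumption here).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every rollout gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Assumed setup: four rollouts for one issue, reward 1.0 if the
# repository's tests pass after the agent's patch, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Rollouts that passed get a positive advantage, the others a negative one.
for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.1f} advantage={advantage:+.3f}")
```

When every rollout in a group receives the same reward, all advantages are zero, so that group contributes no learning signal under this formulation.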
+## Citation
+If you find this work helpful, please cite the following paper:
+```bibtex
+@misc{song2026swemasterunleashingpotentialsoftware,
+      title={SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training},
+      author={Huatong Song and Lisheng Huang and Shuang Sun and Jinhao Jiang and Ran Le and Daixuan Cheng and Guoxin Chen and Yiwen Hu and Zongchao Chen and Wayne Xin Zhao and Yang Song and Tao Zhang and Ji-Rong Wen},
+      year={2026},
+      eprint={2602.03411},
+      archivePrefix={arXiv},
+      primaryClass={cs.SE},
+      url={https://arxiv.org/abs/2602.03411},
+}
+```