Add model card for SWE-Master-32B-RL
Hi! I'm Niels from the Hugging Face community science team. I'm opening this PR to add a model card for this repository.
This model was presented in the technical report [SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training](https://huggingface.co/papers/2602.03411). Adding this model card helps users understand the model's purpose, license, and how to use it, while also ensuring it is correctly indexed on the Hugging Face Hub.
I've added the appropriate pipeline tag, library name, and base model information based on the repository's configuration files and the accompanying paper.
Please review and merge if it looks good!
README.md
ADDED
@@ -0,0 +1,41 @@
---
license: mit
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-Coder-32B
---

# SWE-Master-32B-RL

This repository contains the 32B Reinforcement Learning (RL) checkpoint for **SWE-Master**, as described in the technical report [SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training](https://huggingface.co/papers/2602.03411).

- **Paper:** [SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training](https://huggingface.co/papers/2602.03411)
- **Repository:** [GitHub - RUCAIBox/SWE-Master](https://github.com/RUCAIBox/SWE-Master)

## Introduction

SWE-Master is an open-source and fully reproducible post-training framework for building effective software engineering (SWE) agents. It covers the complete agent development pipeline, including:

- Teacher-trajectory synthesis and data curation.
- Long-horizon Supervised Fine-Tuning (SFT).
- Reinforcement Learning with real execution feedback (RLVR) using GRPO.
- Inference framework design with LSP-integrated tools.

Starting from the Qwen2.5-Coder-32B base model, SWE-Master demonstrates how systematic post-training can elicit strong long-horizon SWE task-solving abilities, achieving a resolve rate of 61.4% on SWE-bench Verified.
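To make the RLVR step above concrete, here is an illustrative sketch (not the authors' code) of the group-relative advantage computation at the heart of GRPO: each rollout's reward, e.g. a binary "patch resolved the issue" signal from real execution, is normalized against the other rollouts sampled for the same task.

```python
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: (r - mean) / (std + eps).

    GRPO replaces a learned value baseline with the empirical mean
    reward of a group of rollouts for the same prompt.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Group of 4 rollouts for one issue: two resolved it, two did not.
advs = grpo_advantages([1.0, 0.0, 1.0, 0.0])
```

Rollouts that beat the group average get positive advantages and are reinforced; the rest are pushed down, with no critic network required.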

## Usage

For detailed instructions on installation, data preparation, and running inference (including integration with LSP tools and the R2E-Gym framework), please refer to the official [GitHub repository](https://github.com/RUCAIBox/SWE-Master).
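For plain text generation outside the agent harness, the checkpoint should load with standard `transformers` APIs. A minimal sketch follows; the Hub id below is an assumption inferred from this model card's title, so adjust it if the actual repository path differs. A 32B model in bf16 needs roughly 64 GB of accelerator memory.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RUCAIBox/SWE-Master-32B-RL"  # assumed Hub id; verify before use

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 32B weights: bf16 + multi-GPU recommended
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain what a flaky test is and one way to fix it."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```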

## Citation

```bibtex
@misc{song2026swemasterunleashingpotentialsoftware,
      title={SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training},
      author={Huatong Song and Lisheng Huang and Shuang Sun and Jinhao Jiang and Ran Le and Daixuan Cheng and Guoxin Chen and Yiwen Hu and Zongchao Chen and Wayne Xin Zhao and Yang Song and Tao Zhang and Ji-Rong Wen},
      year={2026},
      eprint={2602.03411},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2602.03411},
}
```