nielsr (HF Staff) committed
Commit 83dd42f · verified · 1 Parent(s): fcd73fd

Add model card for SWE-Master-32B-RL

Hi! I'm Niels from the Hugging Face community science team. I'm opening this PR to add a model card for this repository.

This model was presented in the technical report [SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training](https://huggingface.co/papers/2602.03411). Adding this model card helps users understand the model's purpose, license, and how to use it, while also ensuring it is correctly indexed on the Hugging Face Hub.

I've added the appropriate pipeline tag, library name, and base model information based on the repository's configuration files and the accompanying paper.

Please review and merge if it looks good!

Files changed (1)
  1. README.md +41 -0
README.md ADDED
@@ -0,0 +1,41 @@
+ ---
+ license: mit
+ library_name: transformers
+ pipeline_tag: text-generation
+ base_model: Qwen/Qwen2.5-Coder-32B
+ ---
+
+ # SWE-Master-32B-RL
+
+ This repository contains the 32B Reinforcement Learning (RL) checkpoint for **SWE-Master**, as described in the technical report [SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training](https://huggingface.co/papers/2602.03411).
+
+ - **Paper:** [SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training](https://huggingface.co/papers/2602.03411)
+ - **Repository:** [GitHub - RUCAIBox/SWE-Master](https://github.com/RUCAIBox/SWE-Master)
+
+ ## Introduction
+
+ SWE-Master is an open-source and fully reproducible post-training framework for building effective software engineering (SWE) agents. It explores the complete agent development pipeline, including:
+ - Teacher-trajectory synthesis and data curation.
+ - Long-horizon Supervised Fine-Tuning (SFT).
+ - Reinforcement Learning with real execution feedback (RLVR) using GRPO.
+ - Inference framework design with LSP-integrated tools.
+
+ Starting from the Qwen2.5-Coder-32B base model, SWE-Master demonstrates how systematic optimization can elicit strong long-horizon SWE task-solving abilities, achieving a resolve rate of 61.4% on SWE-bench Verified.
+
+ ## Usage
+
+ For detailed instructions on installation, data preparation, and running inference (including integration with LSP tools and the R2E-Gym framework), please refer to the official [GitHub repository](https://github.com/RUCAIBox/SWE-Master).
+
+ ## Citation
+
+ ```bibtex
+ @misc{song2026swemasterunleashingpotentialsoftware,
+ title={SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training},
+ author={Huatong Song and Lisheng Huang and Shuang Sun and Jinhao Jiang and Ran Le and Daixuan Cheng and Guoxin Chen and Yiwen Hu and Zongchao Chen and Wayne Xin Zhao and Yang Song and Tao Zhang and Ji-Rong Wen},
+ year={2026},
+ eprint={2602.03411},
+ archivePrefix={arXiv},
+ primaryClass={cs.SE},
+ url={https://arxiv.org/abs/2602.03411},
+ }
+ ```
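
Note for reviewers: the card's Introduction names GRPO with execution feedback but does not explain it. The sketch below illustrates the general group-relative advantage idea behind GRPO, not the SWE-Master implementation. It assumes a binary reward per rollout (1.0 if the repository's tests pass, 0.0 otherwise) and normalizes with the population standard deviation. The function name `group_relative_advantages`, the `eps` value, and the sample rewards are all hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each rollout relative to the other rollouts sampled for the same task.

    Generic GRPO-style normalization: subtract the group mean and divide by the
    group standard deviation. `eps` (an assumed value) avoids division by zero
    when every rollout in the group received the same reward.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four rollouts for one issue: 1.0 = tests pass, 0.0 = tests fail.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group where every rollout succeeds carries no learning signal under this scheme.
uniform = group_relative_advantages([1.0, 1.0, 1.0])
```

Under these assumptions, passing rollouts receive positive advantages and failing ones negative, while a group with identical rewards yields all-zero advantages.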