nielsr HF Staff committed
Commit f06821f · verified · 1 Parent(s): 258b363

Add model card and metadata
This PR adds a model card for SWE-Master-32B-RL. It includes:
- Metadata for `pipeline_tag`, `library_name`, `license`, and `base_model`.
- Links to the paper [SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training](https://huggingface.co/papers/2602.03411).
- A link to the official GitHub repository.
- Model description and citation information.

Files changed (1)
  1. README.md +47 -0
README.md ADDED
@@ -0,0 +1,47 @@
---
license: mit
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-Coder-32B
tags:
- code
- software-engineering
- agent
---

# SWE-Master-32B-RL

**SWE-Master** is an open-source and fully reproducible post-training framework for building effective software engineering agents. This repository contains the 32B model variant optimized via Reinforcement Learning with Execution Feedback (RLVR) using the Group Relative Policy Optimization (GRPO) algorithm.

- **Paper:** [SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training](https://huggingface.co/papers/2602.03411)
- **Repository:** [https://github.com/RUCAIBox/SWE-Master](https://github.com/RUCAIBox/SWE-Master)
- **Model Collection:** [SWE-Master Series](https://huggingface.co/collections/RUC-AIBOX/swe-agent-series)

## Model Description

SWE-Master systematically explores the complete agent development pipeline, including:
1. **Trajectory Synthesis & Curation:** Integrating multiple open-source SWE datasets and generating rollouts with teacher models.
2. **Long-Horizon Supervised Fine-Tuning (SFT):** Fine-tuning the base model on high-quality filtered trajectories.
3. **Reinforcement Learning with Execution Feedback (RLVR):** Optimizing the policy using real execution feedback and the GRPO algorithm to enhance task-solving stability.
4. **Test-Time Scaling (TTS):** Utilizing LLM-based environment feedback for simulated verification and ranking during inference.

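The group-relative advantage at the heart of GRPO (step 3 above) can be sketched as follows. This is the standard formulation from the GRPO literature, not code taken from the SWE-Master repository:

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own group's statistics.

    GRPO samples a group of rollouts per task and, instead of a learned
    value baseline, scores each rollout by how far its reward sits from
    the group mean, in units of the group's standard deviation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# With binary execution feedback (1 = patch passes the tests, 0 = it fails),
# passing rollouts get positive advantages and failing ones negative.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Because advantages are centered within each group, they sum to (approximately) zero, so the policy update pushes probability mass from failing rollouts toward passing ones.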
This specific checkpoint is the **32B RL** version, which achieves significant performance gains on software engineering benchmarks by leveraging long-horizon reasoning and deterministic tool use.

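Since the card declares `library_name: transformers` and `pipeline_tag: text-generation`, the checkpoint should be loadable with the standard `transformers` API. A minimal sketch follows; the repo id `RUC-AIBOX/SWE-Master-32B-RL` and the prompt shape are assumptions based on this card rather than something the diff confirms, and loading a 32B model requires substantial GPU memory:

```python
def build_prompt(issue_text: str) -> str:
    """Wrap a repository issue into a plain instruction prompt (illustrative
    only; the real agent scaffold formats prompts with tool-use context)."""
    return f"Resolve the following GitHub issue:\n\n{issue_text}\n"

if __name__ == "__main__":
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "RUC-AIBOX/SWE-Master-32B-RL"  # assumed repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )

    prompt = build_prompt("Off-by-one error in utils.py")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Note that single-turn generation like this exercises only the base capability; the benchmark numbers below come from running the model inside the agent scaffold from the GitHub repository.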
## Performance

On the **SWE-bench Verified** benchmark, SWE-Master-32B-RL achieves:
- **61.4%** resolve rate (Pass@1).
- **70.8%** resolve rate with Test-Time Scaling (TTS@8).

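TTS@8 corresponds to a best-of-n scheme: sample several candidate trajectories, score each with the LLM-based verifier, and keep the top-ranked one. A minimal selection sketch, where the scoring function is a stand-in for the paper's simulated environment feedback:

```python
def tts_select(candidates, verifier_score):
    """Return the candidate the verifier ranks highest.

    `candidates` is a list of candidate solutions (e.g. patches) and
    `verifier_score` maps each one to a scalar quality estimate.
    """
    return max(candidates, key=verifier_score)

# Stand-in verifier: prefer patches that mention a regression test.
patches = ["fix only", "fix plus comment", "fix plus regression test"]
best = tts_select(patches, lambda p: p.count("test"))
```

The gap between Pass@1 and TTS@8 (61.4% vs. 70.8%) is exactly the headroom such ranking recovers from the sampled rollouts.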
## Citation

If you find our work helpful, please cite our technical report:

```bibtex
@misc{song2026swemasterunleashingpotentialsoftware,
  title={SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training},
  author={Huatong Song and Lisheng Huang and Shuang Sun and Jinhao Jiang and Ran Le and Daixuan Cheng and Guoxin Chen and Yiwen Hu and Zongchao Chen and Wayne Xin Zhao and Yang Song and Tao Zhang and Ji-Rong Wen},
  year={2026},
  eprint={2602.03411},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
  url={https://arxiv.org/abs/2602.03411},
}
```