nielsr HF Staff committed
Commit f06821f · verified · 1 Parent(s): 258b363

Add model card and metadata
This PR adds a model card for SWE-Master-32B-RL. It includes:
- Metadata for `pipeline_tag`, `library_name`, `license`, and `base_model`.
- Links to the paper [SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training](https://huggingface.co/papers/2602.03411).
- A link to the official GitHub repository.
- Model description and citation information.

Files changed (1)
  1. README.md +47 -0
README.md ADDED
@@ -0,0 +1,47 @@
---
license: mit
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen2.5-Coder-32B
tags:
- code
- software-engineering
- agent
---

# SWE-Master-32B-RL

**SWE-Master** is an open-source and fully reproducible post-training framework for building effective software engineering agents. This repository contains the 32B model variant optimized via Reinforcement Learning with Execution Feedback (RLVR) using the Group Relative Policy Optimization (GRPO) algorithm.

- **Paper:** [SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training](https://huggingface.co/papers/2602.03411)
- **Repository:** [https://github.com/RUCAIBox/SWE-Master](https://github.com/RUCAIBox/SWE-Master)
- **Model Collection:** [SWE-Master Series](https://huggingface.co/collections/RUC-AIBOX/swe-agent-series)

## Model Description

SWE-Master systematically explores the complete agent development pipeline, including:
1. **Trajectory Synthesis & Curation:** Integrating multiple open-source SWE datasets and generating rollouts with teacher models.
2. **Long-Horizon Supervised Fine-Tuning (SFT):** Fine-tuning the base model on high-quality filtered trajectories.
3. **Reinforcement Learning with Execution Feedback (RLVR):** Optimizing the policy using real execution feedback and the GRPO algorithm to enhance task-solving stability.
4. **Test-Time Scaling (TTS):** Utilizing LLM-based environment feedback for simulated verification and ranking during inference.

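The group-relative advantage at the heart of GRPO (step 3 above) can be sketched as follows. This is the standard formulation from the GRPO literature, not code taken from the SWE-Master repository:

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own group's statistics.

    GRPO samples a group of rollouts per task and, instead of a learned
    value baseline, scores each rollout by how far its reward sits from
    the group mean, in units of the group's standard deviation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# With binary execution feedback (1 = patch passes the tests, 0 = it fails),
# passing rollouts get positive advantages and failing ones negative.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Because advantages are centered within each group, they sum to (approximately) zero, so the policy update pushes probability mass from failing rollouts toward passing ones.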
This specific checkpoint is the **32B RL** version, which achieves significant performance gains on software engineering benchmarks by leveraging long-horizon reasoning and deterministic tool use.

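Since the card declares `library_name: transformers` and `pipeline_tag: text-generation`, the checkpoint should be loadable with the standard `transformers` API. A minimal sketch follows; the repo id `RUC-AIBOX/SWE-Master-32B-RL` and the prompt shape are assumptions based on this card rather than something the diff confirms, and loading a 32B model requires substantial GPU memory:

```python
def build_prompt(issue_text: str) -> str:
    """Wrap a repository issue into a plain instruction prompt (illustrative
    only; the real agent scaffold formats prompts with tool-use context)."""
    return f"Resolve the following GitHub issue:\n\n{issue_text}\n"

if __name__ == "__main__":
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "RUC-AIBOX/SWE-Master-32B-RL"  # assumed repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )

    prompt = build_prompt("Off-by-one error in utils.py")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Note that single-turn generation like this exercises only the base capability; the benchmark numbers below come from running the model inside the agent scaffold from the GitHub repository.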
## Performance

On the **SWE-bench Verified** benchmark, SWE-Master-32B-RL achieves:
- **61.4%** resolve rate (Pass@1).
- **70.8%** resolve rate with Test-Time Scaling (TTS@8).

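TTS@8 corresponds to a best-of-n scheme: sample several candidate trajectories, score each with the LLM-based verifier, and keep the top-ranked one. A minimal selection sketch, where the scoring function is a stand-in for the paper's simulated environment feedback:

```python
def tts_select(candidates, verifier_score):
    """Return the candidate the verifier ranks highest.

    `candidates` is a list of candidate solutions (e.g. patches) and
    `verifier_score` maps each one to a scalar quality estimate.
    """
    return max(candidates, key=verifier_score)

# Stand-in verifier: prefer patches that mention a regression test.
patches = ["fix only", "fix plus comment", "fix plus regression test"]
best = tts_select(patches, lambda p: p.count("test"))
```

The gap between Pass@1 and TTS@8 (61.4% vs. 70.8%) is exactly the headroom such ranking recovers from the sampled rollouts.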
## Citation

If you find our work helpful, please cite our technical report:

```bibtex
@misc{song2026swemasterunleashingpotentialsoftware,
  title={SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training},
  author={Huatong Song and Lisheng Huang and Shuang Sun and Jinhao Jiang and Ran Le and Daixuan Cheng and Guoxin Chen and Yiwen Hu and Zongchao Chen and Wayne Xin Zhao and Yang Song and Tao Zhang and Ji-Rong Wen},
  year={2026},
  eprint={2602.03411},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
  url={https://arxiv.org/abs/2602.03411},
}
```