---
license: mit
---

# SWE-World Reward Model (SWR-32B-w-cot)
This repository provides the **SWE-World Reward Model (SWR-32B-w-cot)**, a reward model designed for evaluating candidate solutions to software engineering tasks.

## Overview

SWR is a key component of **SWE-World**, a Docker-free framework for training and evaluating software engineering agents. Instead of executing unit tests inside containerized environments, SWR simulates the final test feedback and produces a binary success signal indicating whether a candidate patch resolves the issue.

The **SWR-32B-w-cot** variant incorporates **Chain-of-Thought (CoT)** reasoning to make reward prediction more reliable: it first generates a structured reasoning process, then produces a simulated test report along with the final reward.

## Usage

The model is primarily used for:

- **Reward simulation** during reinforcement learning of SWE agents
- **Candidate ranking** in Test-Time Scaling (TTS)
- **Offline evaluation** of generated patches
Given a reward context (e.g., repository state, patch, and execution traces), the model outputs:

- A simulated test report
- A binary reward signal indicating success or failure
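The exact report format is not specified here, so the sketch below assumes a hypothetical convention in which the simulated report ends with a `resolved: yes/no` marker; it only illustrates how such a report can be reduced to a binary reward:

```python
import re


def parse_reward(report: str) -> int:
    """Reduce a simulated test report to a binary reward (1 or 0).

    The trailing `resolved: yes|no` marker is an assumed output
    convention for illustration, not the model's documented format.
    """
    match = re.search(r"resolved:\s*(yes|no)", report, re.IGNORECASE)
    if match is None:
        return 0  # treat unparseable reports as failure
    return 1 if match.group(1).lower() == "yes" else 0


report = (
    "### Simulated test report\n"
    "tests/test_core.py::test_patch PASSED\n"
    "resolved: yes"
)
print(parse_reward(report))  # 1
```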
## Relation to SWE-World

SWR works together with:

- **SWT (Transition Model)** – simulates step-level execution feedback
- **Sandbox environment** – handles navigation and file editing actions

Together they enable **fully Docker-free training and evaluation of SWE agents**.
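To make the division of labor concrete, here is a minimal sketch of a Docker-free rollout loop; every class, function, and action format below is a hypothetical stand-in, not the SWE-World API:

```python
class Sandbox:
    """Hypothetical stand-in for the sandbox: applies navigation and
    file-editing actions and accumulates the resulting patch."""

    def __init__(self):
        self.edits = []

    def step(self, action: str) -> str:
        self.edits.append(action)
        return f"applied: {action}"

    def final_patch(self) -> str:
        return "\n".join(self.edits)


def docker_free_rollout(agent, sandbox, swt_feedback, swr_reward, issue, max_steps=3):
    """One rollout without Docker: the sandbox handles file actions,
    SWT (`swt_feedback`) simulates step-level execution feedback, and
    SWR (`swr_reward`) scores the final patch with a binary reward."""
    obs = issue
    for _ in range(max_steps):
        action = agent(obs)
        if action.startswith("run:"):
            obs = swt_feedback(action)   # simulated execution feedback
        else:
            obs = sandbox.step(action)   # navigation / file editing
    return swr_reward(sandbox.final_patch())


# Toy rollout with stub components, for illustration only.
sandbox = Sandbox()
agent = lambda obs: "edit: apply one-line fix"
reward = docker_free_rollout(
    agent, sandbox, lambda a: "tests simulated", lambda p: 1, "issue: off-by-one"
)
print(reward)  # 1
```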
## More Information

For detailed methodology and the full framework, please refer to:

- SWE-World repository: https://github.com/RUCAIBox/SWE-World