---
license: mit
---

# SWE-World Reward Model (SWR-32B-w-cot)

This repository provides the **SWE-World Reward Model (SWR-32B-w-cot)**, a reward model for evaluating candidate solutions to software engineering tasks.

## Overview

SWR is a key component of **SWE-World**, a Docker-free framework for training and evaluating software engineering agents. Instead of executing unit tests inside containerized environments, SWR simulates the feedback of the final test run and produces a binary success signal indicating whether a candidate patch resolves the issue.
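
A common convention for collapsing a test report into a single success bit is the SWE-bench resolution rule: a patch counts as resolving the issue only if every test that should switch from failing to passing (`FAIL_TO_PASS`) now passes and every previously passing test (`PASS_TO_PASS`) still passes. The sketch below applies that rule to a per-test status dictionary. Whether SWR's simulated report follows exactly this rule is an assumption here, and `binary_reward` and its arguments are hypothetical names, not part of any released API.

```python
from typing import Dict, Iterable


def binary_reward(
    test_status: Dict[str, str],
    fail_to_pass: Iterable[str],
    pass_to_pass: Iterable[str],
) -> int:
    """Collapse a (simulated) per-test report into a 0/1 reward.

    Hypothetical helper: `test_status` maps test names to "PASSED" or
    "FAILED". A required test that is absent from the report counts as
    not passing.
    """
    required = list(fail_to_pass) + list(pass_to_pass)
    if not required:
        # No tests to check means no evidence that the issue is resolved.
        return 0
    # Resolved only if every target test now passes and no regression appears.
    return int(all(test_status.get(name) == "PASSED" for name in required))


if __name__ == "__main__":
    report = {"test_fix": "PASSED", "test_old": "PASSED", "test_flaky": "FAILED"}
    print(binary_reward(report, ["test_fix"], ["test_old"]))    # 1
    print(binary_reward(report, ["test_fix"], ["test_flaky"]))  # 0
```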

The **SWR-32B-w-cot** variant adds **Chain-of-Thought (CoT)** reasoning to make reward predictions more reliable. It first generates a structured reasoning trace, then produces a simulated test report together with the final reward.
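
To consume the CoT variant's output programmatically, the generation has to be split into its three parts. The parser below assumes a hypothetical template in which the reasoning, the simulated test report, and the reward are wrapped in `<think>`, `<report>`, and `<reward>` tags. The actual output format of SWR-32B-w-cot is defined by the SWE-World repository and may differ, so treat the tag names and the `parse_cot_output` helper as placeholders.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class RewardOutput:
    reasoning: str
    test_report: str
    reward: int  # 1 = patch judged to resolve the issue, 0 otherwise


def _extract(tag: str, text: str) -> Optional[str]:
    """Return the stripped content of the first <tag>...</tag> span, if any."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def parse_cot_output(text: str) -> RewardOutput:
    """Split a generation into reasoning, simulated test report, and reward.

    Tag names are placeholders for a hypothetical output template.
    """
    reward_text = _extract("reward", text)
    return RewardOutput(
        reasoning=_extract("think", text) or "",
        test_report=_extract("report", text) or "",
        # Anything other than an explicit "1" (missing, malformed) counts as failure.
        reward=1 if reward_text == "1" else 0,
    )
```

Defaulting to a reward of 0 whenever the reward field is missing or malformed is a deliberately conservative choice: in a reinforcement learning loop, a parsing failure should never be scored as a success.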

## Usage

The model is primarily used for:

- **Reward simulation** during reinforcement learning of SWE agents
- **Candidate ranking** in Test-Time Scaling (TTS)
- **Offline evaluation** of generated patches
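
For test-time scaling, the reward model acts as a verifier over several candidate patches. Because each verdict is binary, many candidates can tie. One simple way to break ties, assumed here rather than taken from SWE-World, is to query the reward model several times per patch and rank by the fraction of positive verdicts. In the sketch, `judge` is a hypothetical callable standing in for a single sampled SWR verdict.

```python
from typing import Callable, List, Sequence, Tuple

# Stand-in for one sampled SWR verdict on a patch: 1 = resolved, 0 = not.
Judge = Callable[[str], int]


def rank_candidates(
    patches: Sequence[str], judge: Judge, num_samples: int = 4
) -> List[Tuple[str, float]]:
    """Return (patch, score) pairs sorted best first.

    The score is the mean of `num_samples` binary verdicts. Python's sort is
    stable (also with reverse=True), so tied patches keep their input order.
    """
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    scored = []
    for patch in patches:
        verdicts = [judge(patch) for _ in range(num_samples)]
        scored.append((patch, sum(verdicts) / num_samples))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With `num_samples=1` this reduces to plain binary filtering, where the first accepted patch ranks highest.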

Given a reward context (e.g., the repository state, the candidate patch, and execution traces), the model outputs:

- A simulated test report
- A binary reward signal indicating success or failure
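
Before querying the model, the reward context has to be serialized into a prompt. The sketch below bundles the inputs listed above into a dataclass and renders them as labeled sections. The `issue` field, the section headers, and the `build_prompt` function are assumptions for illustration; use the prompt template published with SWE-World when working with the released checkpoint.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RewardContext:
    """Hypothetical bundle of the inputs the reward model conditions on."""

    issue: str       # assumed field: the issue text the patch should resolve
    repo_state: str  # e.g. relevant file contents or a directory summary
    patch: str       # candidate patch as a unified diff
    execution_traces: List[str] = field(default_factory=list)


def build_prompt(ctx: RewardContext) -> str:
    """Render a RewardContext as labeled sections (illustrative layout only)."""
    traces = "\n".join(ctx.execution_traces) if ctx.execution_traces else "(none)"
    sections = [
        ("Issue", ctx.issue),
        ("Repository state", ctx.repo_state),
        ("Candidate patch", ctx.patch),
        ("Execution traces", traces),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections)
```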

## Relation to SWE-World

SWR works together with:

- **SWT (Transition Model)** – simulates step-level execution feedback
- **Sandbox environment** – handles navigation and file-editing actions

Together, these components enable **fully Docker-free training and evaluation of SWE agents**.
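
The division of labor between the three components can be pictured as a dispatcher over agent actions: file navigation and edits go to the sandbox, commands that would normally need execution go to SWT, and the final submission goes to SWR for a simulated test report and reward. The command names, the `submit` action, and `route_action` below are hypothetical; a real agent scaffold defines its own action space and routing.

```python
from enum import Enum


class Backend(Enum):
    SANDBOX = "sandbox"  # operates on real files: navigation and edits
    SWT = "swt"          # simulates step-level execution feedback
    SWR = "swr"          # simulates the final test report and binary reward


# Hypothetical command names; a real agent defines its own action space.
NAVIGATION_OR_EDIT = {"ls", "cd", "cat", "find", "grep", "head", "tail", "sed", "str_replace_editor"}


def route_action(action: str) -> Backend:
    """Pick the SWE-World component that should answer an agent action."""
    parts = action.split()
    if not parts:
        raise ValueError("empty action")
    command = parts[0]
    if command == "submit":
        return Backend.SWR
    if command in NAVIGATION_OR_EDIT:
        return Backend.SANDBOX
    # Everything else (running scripts, tests, builds) needs execution feedback,
    # which SWE-World simulates with SWT instead of running it in Docker.
    return Backend.SWT
```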

## More Information

For detailed methodology and the full framework, please refer to:

- SWE-World repository: https://github.com/RUCAIBox/SWE-World