RUC-AIBOX/SWE-World-SWR-32B-w-cot


Create README.md #1

by SNHE, opened Mar 6

base: refs/heads/main ← from: refs/pr/1


Files changed (1)

README.md (added, +43 -0)

	@@ -0,0 +1,43 @@

+---
+license: mit
+---
+# SWE-World Reward Model (SWR-32B-w-cot)
+This repository provides the **SWE-World Reward Model (SWR-32B-w-cot)**, a reward model for evaluating candidate solutions to software engineering tasks.
+## Overview
+SWR is a key component of **SWE-World**, a Docker-free framework for training and evaluating software engineering (SWE) agents. Instead of executing unit tests inside containerized environments, SWR simulates the feedback of the final test run and produces a binary success signal indicating whether a candidate patch resolves the issue.
+
+The **SWR-32B-w-cot** variant adds **Chain-of-Thought (CoT)** reasoning to make reward prediction more reliable: it first generates structured reasoning, then produces a simulated test report, and finally outputs the reward.
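The model card does not specify the exact output format. The sketch below assumes, purely for illustration, that the CoT appears inside `<think>...</think>` tags and that the generation ends with a line of the form `REWARD: 0` or `REWARD: 1`; `SWRVerdict` and `parse_swr_output` are hypothetical names. Check the SWE-World repository for the actual prompt and output template before relying on anything like this.

```python
import re
from dataclasses import dataclass


@dataclass
class SWRVerdict:
    """Hypothetical container for one SWR-32B-w-cot generation."""
    reasoning: str    # chain-of-thought produced first
    test_report: str  # simulated test report
    reward: int       # binary reward: 1 = patch resolves the issue, 0 = it does not


# ASSUMED output layout (not taken from the model card):
#   <think> ...reasoning... </think>
#   ...simulated test report...
#   REWARD: 0 or 1   (last line)
_THINK = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_REWARD = re.compile(r"REWARD:\s*([01])\s*$")


def parse_swr_output(text: str) -> SWRVerdict:
    """Split a raw generation into reasoning, simulated test report and binary reward."""
    think = _THINK.search(text)
    reasoning = think.group(1).strip() if think else ""
    # Everything after the reasoning block holds the report and the verdict.
    rest = (text[think.end():] if think else text).strip()
    match = _REWARD.search(rest)
    if match is None:
        # Missing or malformed verdict: fall back to failure (conservative default).
        return SWRVerdict(reasoning, rest, 0)
    report = rest[: match.start()].strip()
    return SWRVerdict(reasoning, report, int(match.group(1)))
```

Defaulting to a reward of 0 when no verdict can be found is a conservative choice: during RL, a malformed generation then never earns credit for a patch that was not actually judged successful.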
+## Usage
+The model is primarily used for:
+- **Reward simulation** during reinforcement learning of SWE agents
+- **Candidate ranking** in Test-Time Scaling (TTS)
+- **Offline evaluation** of generated patches
+
+Given a reward context (e.g., the repository state, the candidate patch, and execution traces), the model outputs:
+- A simulated test report
+- A binary reward signal indicating success or failure
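For candidate ranking in TTS, one simple scheme (an assumption here, not necessarily what SWE-World does) is to query SWR several times per candidate patch and rank candidates by the fraction of positive verdicts. `RewardContext`, `Judge`, `score_candidate`, and `select_best` are hypothetical names; the fields of `RewardContext` mirror the inputs listed above, and a `judge` stands in for a single SWR call that returns 0 or 1.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class RewardContext:
    """Reward-context inputs named in the model card; field names are illustrative."""
    repo_state: str        # e.g. a summary of the repository at evaluation time
    patch: str             # candidate diff produced by the agent
    execution_traces: str  # agent trajectory and command outputs


# Hypothetical: wraps one SWR generation and returns its binary reward (0 or 1).
Judge = Callable[[RewardContext], int]


def score_candidate(judge: Judge, ctx: RewardContext, num_samples: int = 4) -> float:
    """Fraction of sampled SWR verdicts that report success (assumed aggregation)."""
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    return sum(judge(ctx) for _ in range(num_samples)) / num_samples


def select_best(judge: Judge, candidates: Sequence[RewardContext], num_samples: int = 4) -> int:
    """Return the index of the highest-scoring candidate; ties go to the earliest one."""
    if not candidates:
        raise ValueError("need at least one candidate")
    scores = [score_candidate(judge, c, num_samples) for c in candidates]
    # max() returns the first index among equal maxima.
    return max(range(len(candidates)), key=scores.__getitem__)
```

With greedy decoding every call returns the same verdict, so `num_samples=1` is enough; averaging only helps when SWR is sampled at a nonzero temperature. The same single-call judge can also serve as the episode reward during RL.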
+## Relation to SWE-World
+SWR works together with:
+- **SWT (Transition Model)** – simulates step-level execution feedback
+- **Sandbox environment** – handles navigation and file editing actions
+
+Together, SWR, SWT, and the sandbox enable **fully Docker-free training and evaluation of SWE agents**.
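The division of labor described above can be sketched as a single environment step that routes each agent action to the component responsible for it. The action kinds (`navigate`, `edit`, `execute`, `submit`) and the `Action` and `SimulatedEnv` classes are hypothetical; the three callables are placeholders for the real sandbox, the SWT transition model, and SWR.

```python
from dataclasses import dataclass
from typing import Callable, Literal, Optional, Tuple

ActionKind = Literal["navigate", "edit", "execute", "submit"]


@dataclass(frozen=True)
class Action:
    kind: ActionKind
    payload: str  # e.g. a path, an edit command, a shell command, or the final patch


@dataclass
class SimulatedEnv:
    """Hypothetical Docker-free wiring; all three backends are stand-ins."""
    sandbox: Callable[[Action], str]  # file navigation and editing
    swt: Callable[[Action], str]      # transition model: simulated execution feedback
    swr: Callable[[Action], int]      # reward model: binary reward for the final patch

    def step(self, action: Action) -> Tuple[str, Optional[int]]:
        """Return (observation, reward); reward stays None until the agent submits."""
        if action.kind in ("navigate", "edit"):
            return self.sandbox(action), None
        if action.kind == "execute":
            return self.swt(action), None
        if action.kind == "submit":
            reward = self.swr(action)
            return f"episode finished, reward={reward}", reward
        raise ValueError(f"unknown action kind: {action.kind}")
```

Only the final `submit` action consults SWR, matching its role as a simulator of the final test feedback; intermediate commands that would normally run inside a container receive simulated output from SWT instead.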
+## More Information
+For the detailed methodology and the full framework, see:
+- SWE-World repository: https://github.com/RUCAIBox/SWE-World