|
|
--- |
|
|
title: README |
|
|
emoji: 💻
|
|
colorFrom: indigo |
|
|
colorTo: purple |
|
|
sdk: static |
|
|
pinned: true |
|
|
thumbnail: >- |
|
|
https://cdn-uploads.huggingface.co/production/uploads/60cc389a0844fb1605fef405/CRHpoi7_GxVx7DhVCVK5e.png |
|
|
--- |
|
|
|
|
|
<h1 align="center"> R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents </h1> |
|
|
|
|
|
<p align="center"> |
|
|
<a href="https://naman-ntc.github.io/" style="text-decoration: none;">Naman Jain<sup>*,1</sup></a>, |
|
|
<a href="https://1jsingh.github.io/" style="text-decoration: none;">Jaskirat Singh<sup>*,2</sup></a>, |
|
|
<a href="https://manishs.org/" style="text-decoration: none;">Manish Shetty<sup>1</sup></a>, |
|
|
<a href="https://scholar.google.com/citations?user=vNHqr3oAAAAJ&hl=en" style="text-decoration: none;">Liang Zheng<sup>2</sup></a>, |
|
|
<a href="https://scholar.google.com/citations?user=Vn3L_ioAAAAJ&hl=en" style="text-decoration: none;">Koushik Sen<sup>1</sup></a>, |
|
|
<a href="https://scholar.google.com/citations?user=vN-is70AAAAJ&hl=en" style="text-decoration: none;">Ion Stoica<sup>1</sup></a> |
|
|
</p> |
|
|
|
|
|
<p align="center"> |
|
|
<sup>1</sup>UC Berkeley, <sup>2</sup>ANU <br/>
|
|
<sub><sup>*</sup>Equal contribution</sub>
|
|
</p> |
|
|
|
|
|
<!-- paper . data and models . project page --> |
|
|
<p align="center"> |
|
|
<a href="https://github.com/R2E-Gym/R2E-Gym">π» Code </a> |
|
|
β’ |
|
|
<a href="./docs/paper.pdf">π Paper</a> |
|
|
β’ |
|
|
<a href="https://huggingface.co/R2E-Gym" >π€ Data & Models</a> |
|
|
β’ |
|
|
<!-- project page --> |
|
|
<a href="https://r2e-gym.github.io/" >π Project Page</a> |
|
|
</p> |
|
|
|
|
|
--- |
|
|
|
|
|
We present **R2E-Gym**, the largest procedurally curated environment for training real-world SWE-Agents. |
|
|
We show that R2E-Gym enables more scalable training and test-time scaling, achieving **51% on the SWE-Bench Verified benchmark**, a new state of the art for open-weight SWE-agents that is, for the first time, competitive with proprietary models such as o1 and sonnet-3.5-v2 with tools.
|
|
|
|
|
|
|
|
<p align="center"> |
|
|
<img src="https://github.com/R2E-Gym/R2E-Gym/raw/main/assets/docs-teaser-v1.png" width="100%" alt="teaser"> |
|
|
</p> |
|
|
<p align="left"> |
|
|
<!-- <em> --> |
|
|
<!-- <small> --> |
|
|
<b>R2E-Gym</b> is powered by two main contributions: (a) <b>SWE-GEN: a synthetic data curation recipe</b> for curating executable training environments without relying on human-written tests and issues; and (b) <b>hybrid inference-time scaling</b>: while both execution-based and execution-free verifiers elicit inference-time gains, significantly better performance can be achieved by leveraging the strengths of both. (c) Overall, the final approach achieves <b>SOTA performance among open-weight SWE-agents</b>, while also remaining competitive with some proprietary model baselines.
|
|
<!-- </small> --> |
|
|
<!-- </em> --> |
|
|
</p> |
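To make the hybrid-verifier idea concrete, here is a minimal, purely illustrative Python sketch of best-of-k patch selection that combines an execution-based signal (e.g., fraction of regression tests passed) with an execution-free score (e.g., from a learned verifier model). All names, fields, and the linear weighting `alpha` are assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch: rank candidate patches by combining an execution-based
# signal with an execution-free verifier score. The weighting scheme and all
# field names are illustrative assumptions, not R2E-Gym's actual code.

def hybrid_score(exec_pass_rate: float, verifier_score: float, alpha: float = 0.5) -> float:
    """Linearly combine execution-based and execution-free signals."""
    return alpha * exec_pass_rate + (1 - alpha) * verifier_score

def select_best(candidates: list[dict]) -> dict:
    """Best-of-k selection: pick the candidate with the highest hybrid score."""
    return max(candidates, key=lambda c: hybrid_score(c["exec"], c["verifier"]))

candidates = [
    {"patch": "patch_a", "exec": 1.0, "verifier": 0.40},  # passes all tests, low model score
    {"patch": "patch_b", "exec": 0.5, "verifier": 0.80},  # partial pass, high model score
    {"patch": "patch_c", "exec": 0.0, "verifier": 0.95},  # fails tests entirely
]
best = select_best(candidates)
print(best["patch"])  # prints "patch_a" (hybrid score 0.70 vs 0.65 and 0.475)
```

The point of the sketch is that neither signal alone suffices: ranking by `verifier` alone would pick `patch_c`, which fails every test, while the hybrid score lets execution evidence and the learned verifier correct each other.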
|
|
|
|
|
--- |
|
|
|
|
|
<!-- ## Synthetic Data Enables Scalable Training |
|
|
|
|
|
We propose SWE-GEN β a novel synthetic data curation recipe that enables collection of a large number of executable training environments without reliance on human-written pull requests (PRs) or unit tests. We show that instead of using human-written PRs, good-quality execution environments can directly be curated from *commits*. |
|
|
Compared to PR-based data collection (SWE-Gym), this approach enables more scalable data curation and agent-training, resulting in a SOTA pass@1 performance of 34.4% on the challenging SWE-Bench Verified benchmark. |
|
|
|
|
|
<img src="https://github.com/R2E-Gym/R2E-Gym/raw/main/docs/docs-training-v1.png" alt="Synthetic Data Enables Scalable Training" width="80%"> |
|
|
|
|
|
## Hybrid Test-time Scaling |
|
|
|
|
|
We also propose Hybrid Test-time Scaling, a novel approach for scaling SWE-Agents at test-time. We show that while both execution-based and execution-free verifiers elicit inference-time gains; significantly better performance can be achieved by leveraging the strengths of both. |
|
|
|
|
|
<img src="https://github.com/R2E-Gym/R2E-Gym/raw/main/docs/bestk_plot_agent_nopass.png" alt="Hybrid Test-time Scaling" width="80%"> |
|
|
--> |
|
|
|
|
|
## Usage and Training |
|
|
|
|
|
Please refer to our [GitHub Repo](https://github.com/R2E-Gym/R2E-Gym) for detailed notes on gym environment usage, training, inference, and executable SWE environment generation.
|
|
|
|
|
## 📝 Citation
|
|
|
|
|
```bibtex |
|
|
@misc{jain2025r2e-gym, |
|
|
title={R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents}, |
|
|
author={Jain, Naman and Singh, Jaskirat and Shetty, Manish and Zheng, Liang and Sen, Koushik and Stoica, Ion},
|
|
year={2025}, |
|
|
eprint={xxx.xxxx}, |
|
|
archivePrefix={arXiv}, |
|
|
primaryClass={cs.SE}, |
|
|
url={https://arxiv.org/abs/xxx.xxxx}, |
|
|
} |
|
|
``` |