---
title: README
emoji: 💻
colorFrom: indigo
colorTo: purple
sdk: static
pinned: true
thumbnail: >-
https://cdn-uploads.huggingface.co/production/uploads/60cc389a0844fb1605fef405/CRHpoi7_GxVx7DhVCVK5e.png
---
<h1 align="center"> R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents </h1>
<p align="center">
<a href="https://naman-ntc.github.io/" style="text-decoration: none;">Naman Jain<sup>*,1</sup></a>,
<a href="https://1jsingh.github.io/" style="text-decoration: none;">Jaskirat Singh<sup>*,2</sup></a>,
<a href="https://manishs.org/" style="text-decoration: none;">Manish Shetty<sup>1</sup></a>,
<a href="https://scholar.google.com/citations?user=vNHqr3oAAAAJ&hl=en" style="text-decoration: none;">Liang Zheng<sup>2</sup></a>,
<a href="https://scholar.google.com/citations?user=Vn3L_ioAAAAJ&hl=en" style="text-decoration: none;">Koushik Sen<sup>1</sup></a>,
<a href="https://scholar.google.com/citations?user=vN-is70AAAAJ&hl=en" style="text-decoration: none;">Ion Stoica<sup>1</sup></a>
</p>
<p align="center">
<sup>1</sup>UC Berkeley, <sup>2</sup>ANU <br>
<sub><sup>*</sup>Equal contribution, <sup>^</sup>Equal supervision</sub>
</p>
<!-- paper . data and models . project page -->
<p align="center">
<a href="https://github.com/R2E-Gym/R2E-Gym">💻 Code </a>
•
<a href="./docs/paper.pdf">📃 Paper</a>
•
<a href="https://huggingface.co/R2E-Gym" >🤗 Data & Models</a>
•
<!-- project page -->
<a href="https://r2e-gym.github.io/" >🌐 Project Page</a>
</p>
---
We present **R2E-Gym**, the largest procedurally curated environment for training real-world SWE-Agents.
We show that R2E-Gym enables more scalable training and test-time scaling, achieving **51% on the SWE-Bench Verified benchmark**. This is a new state of the art for open-weight SWE-Agents and, for the first time, is competitive with proprietary models such as o1 and Sonnet-3.5-v2 (with tools).
<p align="center">
<img src="https://github.com/R2E-Gym/R2E-Gym/raw/main/assets/docs-teaser-v1.png" width="100%" alt="teaser">
</p>
<p align="left">
<!-- <em> -->
<!-- <small> -->
<b>R2E-Gym</b> is powered by two main contributions: (a) <b>SWE-GEN, a synthetic data curation recipe</b> for curating executable training environments without relying on human-written tests and issues; and (b) <b>Hybrid Inference Time Scaling</b>, which shows that while both execution-based and execution-free verifiers yield inference-time gains, significantly better performance can be achieved by leveraging the strengths of both. (c) Overall, the final approach achieves <b>SOTA performance for open-weight SWE-Agents</b>, while also being competitive with some proprietary model baselines.
<!-- </small> -->
<!-- </em> -->
</p>
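The README does not say how the two verifier signals are combined. As a minimal sketch of the idea behind hybrid verification, assuming each candidate patch carries an execution-based score (fraction of generated tests it passes) and an execution-free score (a verifier's estimated probability that it is correct), one simple option is a weighted average used to rerank candidates. The names `Candidate`, `hybrid_score`, `select_best`, and the weight `alpha` are hypothetical and are not part of the R2E-Gym API or necessarily the paper's exact method.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Candidate:
    """Final patch from one agent rollout plus two verifier signals (hypothetical fields)."""
    patch_id: str
    tests_passed: int     # execution-based: generated tests this patch passes
    tests_total: int      # execution-based: generated tests that were run
    verifier_prob: float  # execution-free: verifier's estimated P(patch is correct)


def hybrid_score(c: Candidate, alpha: float = 0.5) -> float:
    """Weighted average of the two signals; alpha=1 is execution-only, alpha=0 execution-free only."""
    exec_score = c.tests_passed / c.tests_total if c.tests_total else 0.0
    return alpha * exec_score + (1.0 - alpha) * c.verifier_prob


def select_best(cands: Sequence[Candidate], alpha: float = 0.5) -> Candidate:
    """Best-of-k selection: return the candidate with the highest hybrid score."""
    if not cands:
        raise ValueError("need at least one candidate")
    return max(cands, key=lambda c: hybrid_score(c, alpha))


# Illustrative (made-up) candidates: each single signal prefers a different patch.
cands = [
    Candidate("a", tests_passed=3, tests_total=4, verifier_prob=0.7),
    Candidate("b", tests_passed=4, tests_total=4, verifier_prob=0.1),
    Candidate("c", tests_passed=1, tests_total=4, verifier_prob=0.9),
]

if __name__ == "__main__":
    print(select_best(cands, alpha=1.0).patch_id)  # execution-only picks "b"
    print(select_best(cands, alpha=0.0).patch_id)  # execution-free only picks "c"
    print(select_best(cands).patch_id)             # hybrid picks "a"
```

In this toy example, tests alone favor `b` and the execution-free verifier alone favors `c`, while the combined score picks `a`, which is reasonably strong on both signals. Real systems may weight, threshold, or stage the two verifiers differently.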
---
<!-- ## Synthetic Data Enables Scalable Training
We propose SWE-GEN, a novel synthetic data curation recipe that enables collection of a large number of executable training environments without reliance on human-written pull requests (PRs) or unit tests. We show that instead of using human-written PRs, good-quality execution environments can be curated directly from *commits*.
Compared to PR-based data collection (SWE-Gym), this approach enables more scalable data curation and agent-training, resulting in a SOTA pass@1 performance of 34.4% on the challenging SWE-Bench Verified benchmark.
<img src="https://github.com/R2E-Gym/R2E-Gym/raw/main/docs/docs-training-v1.png" alt="Synthetic Data Enables Scalable Training" width="80%">
## Hybrid Test-time Scaling
We also propose Hybrid Test-time Scaling, a novel approach for scaling SWE-Agents at test time. We show that while both execution-based and execution-free verifiers yield inference-time gains, significantly better performance can be achieved by leveraging the strengths of both.
<img src="https://github.com/R2E-Gym/R2E-Gym/raw/main/docs/bestk_plot_agent_nopass.png" alt="Hybrid Test-time Scaling" width="80%">
-->
## Usage and Training
Please refer to our [GitHub repo](https://github.com/R2E-Gym/R2E-Gym) for detailed notes on Gym environment usage, training, inference, and executable SWE environment generation.
## 📚 Citation
```bibtex
@misc{jain2025r2e-gym,
title={R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents},
  author={Jain, Naman and Singh, Jaskirat and Shetty, Manish and Zheng, Liang and Sen, Koushik and Stoica, Ion},
year={2025},
eprint={xxx.xxxx},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/xxx.xxxx},
}
```