---
title: README
emoji: 💻
colorFrom: indigo
colorTo: purple
sdk: static
pinned: true
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/60cc389a0844fb1605fef405/CRHpoi7_GxVx7DhVCVK5e.png
---
Naman Jain\*¹, Jaskirat Singh\*², Manish Shetty¹, Liang Zheng², Koushik Sen¹, Ion Stoica¹

¹UC Berkeley, ²ANU

\*Equal contribution, ^Equal supervision
💻 Code • 📃 Paper • 🤗 Data & Models • 🌐 Project Page

---

We present **R2E-Gym**, the largest procedurally curated environment for training real-world SWE-Agents. We show that R2E-Gym enables better train-time and test-time scaling, achieving **51% on the SWE-Bench Verified benchmark**. This sets a new state of the art for open-weight SWE-Agents and is, for the first time, competitive with proprietary models such as o1 and sonnet-3.5-v2 with tools.
R2E-Gym is powered by two main contributions:

- **SWE-GEN**: a synthetic data curation recipe for building executable training environments without relying on human-written tests and issues.
- **Hybrid Inference-Time Scaling**: both execution-based and execution-free verifiers yield inference-time gains, but significantly better performance can be achieved by combining the strengths of both.

Overall, the final approach achieves state-of-the-art performance for open-weight SWE-Agents while also being competitive with some proprietary model baselines.

---

## Usage and Training

Please refer to our [GitHub Repo](https://github.com/R2E-Gym/R2E-Gym) for detailed notes on Gym Environment Usage, Training, Inference, and Executable SWE Environment Generation.

## 📚 Citation

```bibtex
@misc{jain2025r2e-gym,
  title={R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents},
  author={Jain, Naman and Singh, Jaskirat and Shetty, Manish and Zheng, Liang and Sen, Koushik and Stoica, Ion},
  year={2025},
  eprint={xxx.xxxx},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
  url={https://arxiv.org/abs/xxx.xxxx},
}
```