SWE-Next: Scalable Real-World Software Engineering Tasks for Agents
Abstract
SWE-Next is a framework that collects executable software engineering tasks at scale by executing commit pairs from real merged pull requests and reusing repository environments to reduce cost.
Executable software engineering data is valuable for training SWE agents, but scaling it remains difficult for two reasons: only a small fraction of real repository changes yield verifiable, high-signal task instances, and naively building repository-specific environments quickly becomes the dominant systems cost. We present SWE-Next, an execution-grounded framework for scalable SWE task and trajectory collection. On the data side, SWE-Next mines real merged pull requests, executes candidate base/merged commit pairs, and retains only those that produce strict test improvements without regressions, yielding self-verifying instances. It also applies strict submission gating so that collected trajectories remain evidence-driven rather than speculative. On the systems side, SWE-Next introduces reusable repo-quarter profiles, which reuse the same environment across nearby commits in time while keeping each task run separate and reproducible. Using only 30 hours and 639GB of environment storage, SWE-Next processes 3,971 seed repositories and 102,582 candidate commit pairs mined from real merged PRs to construct a dataset of 2,308 self-verifying instances. Experiments show that SWE-Next improves downstream pass@1 with fewer or comparable training trajectories, indicating that its gains come not from a stronger trajectory generator, but from higher-signal execution-grounded supervision and more efficient data collection.
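The abstract's retention rule, "strict test improvements without regressions," can be sketched as a simple predicate over per-test outcomes at the base and merged commits. This is a minimal illustrative sketch; the function and field names are hypothetical and the real harness executes the test suites inside the repository environment:

```python
def is_self_verifying(base_results: dict, merged_results: dict) -> bool:
    """Keep a commit pair only if tests strictly improve: at least one
    test flips fail->pass and no test flips pass->fail.
    (Hypothetical sketch; a test missing at the merged commit is not
    counted as a regression here, which is a simplifying assumption.)"""
    newly_passing = [t for t, ok in merged_results.items()
                     if ok and not base_results.get(t, False)]
    regressions = [t for t, ok in base_results.items()
                   if ok and not merged_results.get(t, True)]
    return bool(newly_passing) and not regressions

# One test flips fail->pass and none regress, so the pair is kept:
base   = {"test_a": False, "test_b": True}
merged = {"test_a": True,  "test_b": True}
print(is_self_verifying(base, merged))  # True
```

Pairs where every test already passed at base (no new signal) or where any test regresses are discarded, which is why only a minority of the 102,582 candidate pairs survive as instances.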
Community
Introducing SWE-Next: a scalable, execution-grounded framework for building SWE training data from real merged PRs. SWE-Next processes 3,971 repositories and 102K commit pairs to construct 2,308 verified instances, and collecting the full dataset takes just 30 hours and 639 GB.
Key idea: repo-quarter profiles. Reuse a single environment across temporally nearby commits, cutting storage from over 30 TB to just 639 GB.
SFT results: with only 3K+ high-quality trajectories, our models reach 17.4% on SWE-Bench Verified at 7B and 30.0% at 14B.
Building executable SWE environments is expensive:
- One Docker image per commit: storage explodes at scale
- Most real PRs don't yield verifiable training signal (~74.5% don't improve tests)
- Leaky prompts and weak submission gating produce low-quality trajectories
Repo-quarter profiles: instead of building a new environment per commit, we map each commit to a (repo, quarter) profile, a shared, reusable Docker image for that repo's dependency regime in that time window.
The image caches system packages + a venv but never bakes in source code.
At runtime: mount the commit snapshot, copy on start, run tests in isolation.
One image. Many commits. No rebuilding.
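The profile lookup above amounts to bucketing commits by repository and calendar quarter. A minimal sketch of that keying, assuming a hypothetical tag scheme (the paper's exact naming may differ):

```python
from datetime import datetime, timezone

def profile_key(repo: str, commit_timestamp: int) -> str:
    """Map a commit to its (repo, quarter) profile tag so that all
    commits from the same repo and calendar quarter resolve to one
    shared Docker image. (Hypothetical tag format for illustration.)"""
    dt = datetime.fromtimestamp(commit_timestamp, tz=timezone.utc)
    quarter = (dt.month - 1) // 3 + 1
    return f"{repo.replace('/', '__')}:{dt.year}q{quarter}"

# Commits from May and June 2023 resolve to the same profile image:
print(profile_key("TIGER-AI-Lab/SWE-Next", 1684000000))  # ...:2023q2
print(profile_key("TIGER-AI-Lab/SWE-Next", 1687000000))  # ...:2023q2
```

Each task run then starts a fresh container from the shared image and mounts its own commit snapshot, so the image is reused across commits while individual runs stay isolated and reproducible.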
Everything is open
Paper: arxiv.org/abs/2603.20691
Dataset: huggingface.co/datasets/TIGER-Lab/SWE-Next
SFT Trajectories: huggingface.co/datasets/TIGER-Lab/SWE-Next-SFT-Trajectories
SWE-Next-7B: huggingface.co/TIGER-Lab/SWE-Next-7B
SWE-Next-14B: huggingface.co/TIGER-Lab/SWE-Next-14B
Code: github.com/TIGER-AI-Lab/SWE-Next