# SWE-Next: Scalable Real-World Software Engineering Tasks for Agents

<p align="left">
<a href="https://arxiv.org/abs/2603.20691"><img alt="Paper" src="https://img.shields.io/badge/Paper-arXiv-b31b1b?style=for-the-badge&logo=arxiv&logoColor=white"></a>
<a href="https://tiger-ai-lab.github.io/SWE-Next/"><img alt="Project Page" src="https://img.shields.io/badge/Project%20Page-Website-4285F4?style=for-the-badge&logo=googlechrome&logoColor=white"></a>
<a href="https://github.com/TIGER-AI-Lab/SWE-Next"><img alt="Code" src="https://img.shields.io/badge/Code-GitHub-181717?style=for-the-badge&logo=github&logoColor=white"></a>
<a href="https://huggingface.co/datasets/TIGER-Lab/SWE-Next-SFT-Trajectories"><img alt="SFT Trajs" src="https://img.shields.io/badge/SFT%20Trajs-HuggingFace-FFD21E?style=for-the-badge&logo=huggingface&logoColor=000"></a>
<a href="https://huggingface.co/datasets/TIGER-Lab/SWE-Next"><img alt="Dataset" src="https://img.shields.io/badge/Dataset-HuggingFace-FFD21E?style=for-the-badge&logo=huggingface&logoColor=000"></a>
<a href="https://huggingface.co/TIGER-Lab/SWE-Next-7B"><img alt="Model 7B" src="https://img.shields.io/badge/Model%207B-HuggingFace-FFD21E?style=for-the-badge&logo=huggingface&logoColor=000"></a>
<a href="https://huggingface.co/TIGER-Lab/SWE-Next-14B"><img alt="Model 14B" src="https://img.shields.io/badge/Model%2014B-HuggingFace-FFD21E?style=for-the-badge&logo=huggingface&logoColor=000"></a>
</p>

## 📰 News

- **2026-04-07**: SWE-Next is now publicly released!

## Introduction

**SWE-Next** introduces **repo-quarter profiles**: reusable environments shared across commits that are close in time, while each task run stays isolated and reproducible. In only **30 hours**, using **639 GB** of environment storage, SWE-Next processes **3,971** seed repositories and **102,582** candidate commit pairs mined from real merged PRs, yielding a dataset of **2,308** self-verifying instances. Training on SWE-Next improves downstream pass@1 on SWE-Bench Verified and SWE-Bench Lite while using fewer or a comparable number of training trajectories, making large-scale executable data collection far more practical and accessible for research.
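
The repo-quarter idea can be illustrated with a short sketch: bucket candidate commits by repository and calendar quarter, then build one environment per bucket instead of one per commit. The `Commit` record, the use of calendar quarters as the time window, and all function names are assumptions made for illustration, not SWE-Next's actual implementation.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Commit:
    # Hypothetical record: repository name, commit SHA, and commit timestamp.
    repo: str
    sha: str
    committed_at: datetime


def quarter_key(commit: Commit) -> tuple[str, str]:
    # Assumption: one profile covers one repository in one calendar quarter.
    quarter = (commit.committed_at.month - 1) // 3 + 1
    return commit.repo, f"{commit.committed_at.year}Q{quarter}"


def group_into_profiles(commits: list[Commit]) -> dict[tuple[str, str], list[Commit]]:
    # Every commit in a bucket would share a single environment build.
    profiles: dict[tuple[str, str], list[Commit]] = defaultdict(list)
    for commit in commits:
        profiles[quarter_key(commit)].append(commit)
    return dict(profiles)


if __name__ == "__main__":
    commits = [
        Commit("owner/repo", "a1", datetime(2024, 1, 15)),
        Commit("owner/repo", "b2", datetime(2024, 3, 30)),
        Commit("owner/repo", "c3", datetime(2024, 4, 2)),
    ]
    for key, members in group_into_profiles(commits).items():
        print(key, [c.sha for c in members])
```

In this example three commits map to two profiles (`2024Q1` holding `a1` and `b2`, `2024Q2` holding `c3`), so two environment builds would serve all three tasks; the saving grows with the number of commits per quarter.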

## ✨ Highlights

- **Scaled Environment Generation** - SWE-Next is an execution-grounded framework that turns real merged-PR commits into self-verifying SWE tasks and pairs them with high-signal trajectories.
- **Repo-quarter Profiles** - A reusable environment mechanism that amortizes build and storage costs across temporally nearby commits, substantially reducing resource requirements and accelerating large-scale executable SWE data collection.
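
One common way to make a task self-verifying, borrowed from the SWE-bench convention, is to run the test suite on the commits before and after the PR and keep the tests that go from failing to passing. The sketch below assumes per-test pass/fail results are already available and that a task qualifies when at least one test is fixed and none regress; SWE-Next's exact acceptance criterion may differ, and the names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Verification:
    fail_to_pass: list[str]  # tests that fail on the old commit and pass on the new one
    pass_to_pass: list[str]  # tests that pass on both commits
    regressions: list[str]   # tests that pass on the old commit but fail on the new one

    @property
    def is_self_verifying(self) -> bool:
        # Assumed criterion: at least one fixed test and no regressions.
        return bool(self.fail_to_pass) and not self.regressions


def verify(old: dict[str, bool], new: dict[str, bool]) -> Verification:
    # `old` and `new` map test IDs to pass (True) or fail (False);
    # only tests present in both runs are compared.
    shared = sorted(old.keys() & new.keys())
    return Verification(
        fail_to_pass=[t for t in shared if not old[t] and new[t]],
        pass_to_pass=[t for t in shared if old[t] and new[t]],
        regressions=[t for t in shared if old[t] and not new[t]],
    )


if __name__ == "__main__":
    old_run = {"test_parse": False, "test_io": True, "test_cli": True}
    new_run = {"test_parse": True, "test_io": True, "test_cli": True}
    result = verify(old_run, new_run)
    print(result.fail_to_pass, result.is_self_verifying)
```

The fail-to-pass tests act as the task's built-in check: a candidate fix is accepted only if it makes them pass without breaking the pass-to-pass tests.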

## 🛠️ Setup

### Prerequisites

- Python 3.10+
- Docker (for environment execution)
- [uv](https://github.com/astral-sh/uv) package manager

### Installation

```bash
# Install uv and add it to the current shell's PATH
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env

# Clone the repository
git clone https://github.com/TIGER-AI-Lab/SWE-Next.git
cd SWE-Next

# Create and activate a virtual environment, then install dependencies and the package in editable mode
uv venv && source .venv/bin/activate
uv sync && uv pip install -e .
```
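
Before running anything heavier, a quick preflight check of the prerequisites listed above can save time. This is a minimal sketch that only checks the interpreter version and whether `docker` and `uv` are on `PATH`; it does not verify that the Docker daemon is running. The function name is hypothetical.

```python
import shutil
import sys


def check_prerequisites(min_python: tuple[int, int] = (3, 10)) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, found {sys.version.split()[0]}"
        )
    # Only checks that the executables can be found, not that they work.
    for tool in ("docker", "uv"):
        if shutil.which(tool) is None:
            problems.append(f"`{tool}` not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("all prerequisites found" if not issues else "\n".join(issues))
```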

## 🤗 Data & Models

Pre-built artifacts are available on Hugging Face. Download the datasets into `data/` before running the pipeline:

| Artifact | Description | Download |
|----------|-------------|----------|
| `packages_python_filtered` | List of 3,900+ Python packages used as pipeline input | `huggingface-cli download TIGER-Lab/packages_python_filtered --repo-type dataset --local-dir data/packages_python_filtered` |
| `new_commit_better_repos` | Repositories with confirmed `NEW_COMMIT_BETTER` commits | `huggingface-cli download TIGER-Lab/new_commit_better_repos --repo-type dataset --local-dir data/new_commit_better_repos` |
| `SWE-Next` | Final curated dataset (2,308 instances) | `huggingface-cli download TIGER-Lab/SWE-Next --repo-type dataset --local-dir data/SWE-Next` |
| `SWE-Next-SFT-Trajectories` | SFT training trajectories | `huggingface-cli download TIGER-Lab/SWE-Next-SFT-Trajectories --repo-type dataset --local-dir data/SWE-Next-SFT-Trajectories` |

Pre-trained models (downloaded to the same paths that SFT training writes to):

| Model | Download |
|-------|----------|
| SWE-Next-7B | `huggingface-cli download TIGER-Lab/SWE-Next-7B --repo-type model --local-dir LlamaFactory/saves/SWE_Next_7B` |
| SWE-Next-14B | `huggingface-cli download TIGER-Lab/SWE-Next-14B --repo-type model --local-dir LlamaFactory/saves/SWE_Next_14B` |
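
The `huggingface-cli` commands in the tables above can also be run from Python with `huggingface_hub.snapshot_download`, which accepts `repo_id`, `repo_type`, and `local_dir` arguments. The `ARTIFACTS` mapping and the helper names below are hypothetical; the repository IDs and target directories are copied from the tables.

```python
# Hypothetical mapping: repo_id -> (repo_type, local_dir), copied from the tables above.
ARTIFACTS = {
    "TIGER-Lab/packages_python_filtered": ("dataset", "data/packages_python_filtered"),
    "TIGER-Lab/new_commit_better_repos": ("dataset", "data/new_commit_better_repos"),
    "TIGER-Lab/SWE-Next": ("dataset", "data/SWE-Next"),
    "TIGER-Lab/SWE-Next-SFT-Trajectories": ("dataset", "data/SWE-Next-SFT-Trajectories"),
    "TIGER-Lab/SWE-Next-7B": ("model", "LlamaFactory/saves/SWE_Next_7B"),
    "TIGER-Lab/SWE-Next-14B": ("model", "LlamaFactory/saves/SWE_Next_14B"),
}


def plan(include_models: bool = False) -> list[tuple[str, str, str]]:
    # (repo_id, repo_type, local_dir) triples; model weights are skipped unless requested.
    return [
        (repo_id, repo_type, local_dir)
        for repo_id, (repo_type, local_dir) in ARTIFACTS.items()
        if include_models or repo_type == "dataset"
    ]


def download_all(include_models: bool = False) -> list[str]:
    # Imported lazily so `plan()` works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return [
        snapshot_download(repo_id=repo_id, repo_type=repo_type, local_dir=local_dir)
        for repo_id, repo_type, local_dir in plan(include_models)
    ]
```

Call `download_all()` for the four datasets, or `download_all(include_models=True)` to also fetch the 7B and 14B checkpoints, which are much larger.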

## 🐳 Environment Generation

SWE-Next extends environment generation to 3,900+ Python packages.

The supported package list is maintained in [`data/packages_python_filtered/packages_python_filtered.csv`](data/packages_python_filtered/packages_python_filtered.csv), and the target repositories are listed in [`data/new_commit_better_repos/new_commit_better_repos.csv`](data/new_commit_better_repos/new_commit_better_repos.csv).

## Data Pipeline (One-Click)

`run_pr_pipeline.zsh` automates the full data collection pipeline. It reads `data/packages_python_filtered/packages_python_filtered.csv`, clones the listed repositories automatically, and processes them end to end. If the CSV is not present, it falls back to the repositories already cloned under `outputs/upstream_repos/`.
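
The CSV-or-fallback input selection described above can be sketched as follows. This is not the logic of `run_pr_pipeline.zsh` itself: the assumptions that the CSV has a header row and that its first column identifies the repository, and that every sub-directory of `outputs/upstream_repos/` is one cloned repository, are hypothetical.

```python
import csv
from pathlib import Path

PACKAGES_CSV = Path("data/packages_python_filtered/packages_python_filtered.csv")
CLONED_DIR = Path("outputs/upstream_repos")


def select_repos(csv_path: Path = PACKAGES_CSV, cloned_dir: Path = CLONED_DIR) -> tuple[str, list[str]]:
    """Return (source, repos): repos from the CSV if it exists, else already-cloned directories."""
    if csv_path.is_file():
        with csv_path.open(newline="") as f:
            reader = csv.reader(f)
            next(reader, None)  # assumption: the first row is a header
            # Assumption: the first column identifies the repository.
            repos = [row[0].strip() for row in reader if row and row[0].strip()]
        return "csv", repos
    if cloned_dir.is_dir():
        # Fallback: treat every sub-directory as one cloned repository.
        return "cloned", sorted(p.name for p in cloned_dir.iterdir() if p.is_dir())
    return "none", []


if __name__ == "__main__":
    source, repos = select_repos()
    print(f"{len(repos)} repositories from {source}")
```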

**Prerequisites:** copy `.env.template` to `.env` and fill in your credentials:
```
OPENAI_API_KEY=... # required for synthetic issue generation
GITHUB_TOKEN=... # required for fetching PRs
DOCKERHUB_USERNAME=... # required for pushing Docker images
DOCKERHUB_TOKEN=... # Docker Hub access token
DOCKERHUB_NAMESPACE=... # your Docker Hub namespace
```
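
A small preflight check can catch credentials left at their placeholder values before a long run. The sketch below uses its own minimal `KEY=VALUE` parser (the pipeline may load `.env` differently) and treats empty values and the template placeholder `...` as unset; the function names are hypothetical.

```python
from pathlib import Path

# Keys listed in the .env template above.
REQUIRED_KEYS = (
    "OPENAI_API_KEY", "GITHUB_TOKEN", "DOCKERHUB_USERNAME", "DOCKERHUB_TOKEN", "DOCKERHUB_NAMESPACE",
)


def parse_env(text: str) -> dict[str, str]:
    """Minimal parser: skips blank lines and full-line comments, strips trailing ` # ...` comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.split(" #", 1)[0].strip()
    return values


def missing_keys(env: dict[str, str]) -> list[str]:
    # Empty values and the template placeholder "..." count as unset.
    return [k for k in REQUIRED_KEYS if env.get(k, "") in ("", "...")]


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.is_file():
        print("missing:", missing_keys(parse_env(env_file.read_text())))
    else:
        print(".env not found; copy .env.template first")
```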

**Option 1: Dataset only** (runs until `outputs/all_new_commit_better_pr.jsonl` is produced; no trajectories are generated):
```bash
PR_GEN_TRAJ=0 zsh run_pr_pipeline.zsh
```

**Option 2: Dataset + trajectories** (additionally runs GPT-5-mini on the collected instances to generate trajectories):
```bash
PR_GEN_TRAJ=1 PR_TRAJ_LLM_NAME=gpt-5-mini zsh run_pr_pipeline.zsh
```

To process a single repository only:
```bash
PR_GEN_TRAJ=0 zsh run_pr_pipeline.zsh owner/repo
```

## Training

### Step 1: Generate SFT Trajectories

Download the SWE-Next dataset first (see [Data & Models](#-data--models)), then collect trajectories with a frontier LLM:

```bash
python src/swenext/agenthub/run/edit.py runagent_multiple \
  --dataset "data/SWE-Next/SWE_Next_dataset.jsonl" \
  --traj_dir "./traj/swe_next_sft" \
  --max_workers 8 \
  --k -1 \
  --llm_name "gpt-5-mini" \
  --use_fn_calling True \
  --temperature 0.2 \
  --max_steps 40 \
  --backend "docker"
```

Or skip this step and use the pre-collected trajectories from Hugging Face (`SWE-Next-SFT-Trajectories` in [Data & Models](#-data--models)).
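
Before launching a long collection run, it can help to confirm that the dataset downloaded completely. The sketch below counts records in a JSON Lines file and tallies top-level keys; it assumes each non-blank line is one JSON object and makes no assumption about field names. The function name is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path: Path) -> tuple[int, Counter]:
    """Count records in a JSON Lines file and how often each top-level key appears."""
    n_records = 0
    key_counts: Counter = Counter()
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)  # assumption: each line is a JSON object
            n_records += 1
            key_counts.update(record.keys())
    return n_records, key_counts


if __name__ == "__main__":
    dataset = Path("data/SWE-Next/SWE_Next_dataset.jsonl")
    if dataset.is_file():
        n, keys = summarize_jsonl(dataset)
        print(f"{n} records")  # the dataset card reports 2,308 instances
        print(keys.most_common())
```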

### Step 2: SFT Training

Clone [LlamaFactory](https://github.com/hiyouga/LLaMA-Factory) into the project root first:

```bash
git clone https://github.com/hiyouga/LLaMA-Factory.git LlamaFactory
```

Install the LlamaFactory dependencies, then train (run from the project root):

```bash
# uv-created virtual environments do not include pip by default, so install through uv
cd LlamaFactory && uv pip install -e ".[torch,metrics]" && cd ..

# Train the 7B agent
llamafactory-cli train train/swe_next_7B.yaml

# Train the 14B agent
llamafactory-cli train train/swe_next_14B.yaml
```

Trained model checkpoints are saved to `LlamaFactory/saves/SWE_Next_7B` and `LlamaFactory/saves/SWE_Next_14B`.
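
A quick way to confirm that a checkpoint directory is ready for serving is to look for the configuration file that Hugging Face-format model directories contain. Whether the bundled YAML configs produce full weights or a LoRA adapter is not stated here, so this sketch checks for both `config.json` and `adapter_config.json`; the function name is hypothetical.

```python
from pathlib import Path

CHECKPOINTS = ("LlamaFactory/saves/SWE_Next_7B", "LlamaFactory/saves/SWE_Next_14B")


def checkpoint_kind(path: Path) -> str:
    """Classify a Hugging Face-style output directory by the config file it contains."""
    if (path / "config.json").is_file():
        return "full"     # full model directory that vLLM can load directly
    if (path / "adapter_config.json").is_file():
        return "adapter"  # PEFT/LoRA adapter: needs its base model, or merging, before serving
    return "missing"


if __name__ == "__main__":
    for ckpt in CHECKPOINTS:
        print(ckpt, checkpoint_kind(Path(ckpt)))
```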

### Step 3: Evaluate on SWE-Bench Verified

Start a vLLM server with the trained model, then run the evaluation:

```bash
# Start the vLLM server (in a separate terminal)
vllm serve LlamaFactory/saves/SWE_Next_7B \
  --served-model-name SWE-Next-7B \
  --port 8000

# Run evaluation on SWE-Bench Verified (8 parallel workers)
export LLM_BASE_URL="http://127.0.0.1:8000/v1"

python src/swenext/agenthub/run/edit.py runagent_multiple \
  --dataset "R2E-Gym/SWE-Bench-Verified" \
  --split "test" \
  --traj_dir "./traj/swe_bench_verified" \
  --max_workers 8 \
  --k -1 \
  --llm_name "openai/SWE-Next-7B" \
  --use_fn_calling False \
  --temperature 1 \
  --max_steps 40 \
  --backend "docker"
```

> Use the official [SWE-Bench evaluation harness](https://github.com/SWE-bench/SWE-bench) for final reported scores.
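
Evaluation fails late if the server is not up or was started with a different `--served-model-name`. This minimal stdlib sketch queries the OpenAI-compatible `/v1/models` endpoint that vLLM exposes and lists the served model IDs; the helper names are hypothetical.

```python
import json
import urllib.request


def served_model_ids(payload: dict) -> list[str]:
    """Extract model IDs from an OpenAI-compatible GET /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_served_models(base_url: str = "http://127.0.0.1:8000/v1", timeout: float = 5.0) -> list[str]:
    # Raises urllib.error.URLError (a subclass of OSError) if the server is unreachable.
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/models", timeout=timeout) as resp:
        return served_model_ids(json.load(resp))
```

Once the server is running, `fetch_served_models()` should include `SWE-Next-7B`, the value passed to `--served-model-name`. The `openai/` prefix in `--llm_name` is presumably a provider prefix for the agent's LLM client rather than part of the served name.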

## Citation

```bibtex
@misc{liang2026swenextscalablerealworldsoftware,
  title={SWE-Next: Scalable Real-World Software Engineering Tasks for Agents},
  author={Jiarong Liang and Zhiheng Lyu and Zijie Liu and Xiangchao Chen and Ping Nie and Kai Zou and Wenhu Chen},
  year={2026},
  eprint={2603.20691},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
  url={https://arxiv.org/abs/2603.20691},
}
```