# SWE-Next: Scalable Real-World Software Engineering Tasks for Agents

<p align="left">
  <a href="https://arxiv.org/abs/2603.20691"><img alt="Paper" src="https://img.shields.io/badge/Paper-arXiv-b31b1b?style=for-the-badge&logo=arxiv&logoColor=white"></a>
  <a href="https://tiger-ai-lab.github.io/SWE-Next/"><img alt="Project Page" src="https://img.shields.io/badge/Project%20Page-Website-4285F4?style=for-the-badge&logo=googlechrome&logoColor=white"></a>
  <a href="https://github.com/TIGER-AI-Lab/SWE-Next"><img alt="Code" src="https://img.shields.io/badge/Code-GitHub-181717?style=for-the-badge&logo=github&logoColor=white"></a>
  <a href="https://huggingface.co/datasets/TIGER-Lab/SWE-Next-SFT-Trajectories"><img alt="SFT Trajs" src="https://img.shields.io/badge/SFT%20Trajs-HuggingFace-FFD21E?style=for-the-badge&logo=huggingface&logoColor=000"></a>
  <a href="https://huggingface.co/datasets/TIGER-Lab/SWE-Next"><img alt="Dataset" src="https://img.shields.io/badge/Dataset-HuggingFace-FFD21E?style=for-the-badge&logo=huggingface&logoColor=000"></a>
  <a href="https://huggingface.co/TIGER-Lab/SWE-Next-7B"><img alt="Model 7B" src="https://img.shields.io/badge/Model%207B-HuggingFace-FFD21E?style=for-the-badge&logo=huggingface&logoColor=000"></a>
  <a href="https://huggingface.co/TIGER-Lab/SWE-Next-14B"><img alt="Model 14B" src="https://img.shields.io/badge/Model%2014B-HuggingFace-FFD21E?style=for-the-badge&logo=huggingface&logoColor=000"></a>
</p>

## 📰 News

- **2026-04-07**: SWE-Next is now publicly released!

## 📖 Introduction

**SWE-Next** introduces reusable **repo-quarter profiles**, which share one environment across temporally nearby commits while keeping each task run separate and reproducible. Using only **30 hours** and **639 GB** of environment storage, SWE-Next processes **3,971** seed repositories and **102,582** candidate commit pairs mined from real merged PRs to construct a dataset of **2,308** self-verifying instances. Models trained on SWE-Next improve downstream pass@1 on SWE-Bench Verified and SWE-Bench Lite with fewer or comparable training trajectories, which makes large-scale executable data collection far more practical and accessible for research.



## ✨ Highlights

- **Scaled Environment Generation** - SWE-Next is an execution-grounded framework that turns real merged-PR commits into self-verifying SWE tasks and pairs them with high-signal trajectories.

- **Repo-quarter Profiles** - A reusable environment mechanism that amortizes build and storage cost across temporally nearby commits, substantially reducing resource requirements and accelerating large-scale executable SWE data collection.
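
To make the amortization idea concrete, here is a minimal sketch that groups commits by repository and quarter so that each group could share one environment. It assumes "quarter" means a calendar quarter of the commit date, which this README does not specify; `quarter_key` and `group_commits` are hypothetical names, not part of the SWE-Next codebase, and the commits are made up for illustration.

```python
from collections import defaultdict
from datetime import date


def quarter_key(repo: str, commit_date: date) -> tuple[str, int, int]:
    """Return (repo, year, quarter); calendar quarters are an assumption."""
    quarter = (commit_date.month - 1) // 3 + 1
    return (repo, commit_date.year, quarter)


def group_commits(commits: list[tuple[str, str, date]]) -> dict:
    """Group (repo, sha, date) records so each group can share one environment."""
    groups = defaultdict(list)
    for repo, sha, commit_date in commits:
        groups[quarter_key(repo, commit_date)].append(sha)
    return dict(groups)


# Hypothetical commits, for illustration only.
commits = [
    ("owner/repo", "a1", date(2024, 1, 15)),
    ("owner/repo", "b2", date(2024, 3, 30)),
    ("owner/repo", "c3", date(2024, 4, 2)),
]
profiles = group_commits(commits)
print(len(profiles), "environments for", len(commits), "commits")
```

In this toy case the first two commits fall into the same quarter and would reuse one environment, while the third starts a new one.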


## 🛠️ Setup

### Prerequisites

- Python 3.10+
- Docker (for environment execution)
- [uv](https://github.com/astral-sh/uv) package manager

### Installation

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env

git clone https://github.com/TIGER-AI-Lab/SWE-Next.git
cd SWE-Next
uv venv && source .venv/bin/activate
uv sync && uv pip install -e .
```

## 🤗 Data & Models

Pre-built artifacts are available on HuggingFace. Download them into `data/` before running the pipeline:

| Artifact | Description | Download |
|----------|-------------|---------|
| `packages_python_filtered` | 3,900+ Python package list used as pipeline input | `huggingface-cli download TIGER-Lab/packages_python_filtered --repo-type dataset --local-dir data/packages_python_filtered` |
| `new_commit_better_repos` | Repos with confirmed NEW_COMMIT_BETTER commits | `huggingface-cli download TIGER-Lab/new_commit_better_repos --repo-type dataset --local-dir data/new_commit_better_repos` |
| `SWE-Next` | Final curated dataset (2,308 instances) | `huggingface-cli download TIGER-Lab/SWE-Next --repo-type dataset --local-dir data/SWE-Next` |
| `SWE-Next-SFT-Trajectories` | SFT training trajectories | `huggingface-cli download TIGER-Lab/SWE-Next-SFT-Trajectories --repo-type dataset --local-dir data/SWE-Next-SFT-Trajectories` |

Pre-trained models:

| Model | Download |
|-------|---------|
| SWE-Next-7B | `huggingface-cli download TIGER-Lab/SWE-Next-7B --repo-type model --local-dir LlamaFactory/saves/SWE_Next_7B` |
| SWE-Next-14B | `huggingface-cli download TIGER-Lab/SWE-Next-14B --repo-type model --local-dir LlamaFactory/saves/SWE_Next_14B` |
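
If you prefer to fetch everything in one go, the commands in the two tables can be generated from a small mapping. This sketch only builds the command lines and prints them; the helper name `download_command` is hypothetical, and the flags are exactly the ones shown in the tables above.

```python
import shlex

# (repo id, repo type, local directory), copied from the tables above.
ARTIFACTS = [
    ("TIGER-Lab/packages_python_filtered", "dataset", "data/packages_python_filtered"),
    ("TIGER-Lab/new_commit_better_repos", "dataset", "data/new_commit_better_repos"),
    ("TIGER-Lab/SWE-Next", "dataset", "data/SWE-Next"),
    ("TIGER-Lab/SWE-Next-SFT-Trajectories", "dataset", "data/SWE-Next-SFT-Trajectories"),
    ("TIGER-Lab/SWE-Next-7B", "model", "LlamaFactory/saves/SWE_Next_7B"),
    ("TIGER-Lab/SWE-Next-14B", "model", "LlamaFactory/saves/SWE_Next_14B"),
]


def download_command(repo_id: str, repo_type: str, local_dir: str) -> list[str]:
    """Build the argv list for one `huggingface-cli download` call."""
    return [
        "huggingface-cli", "download", repo_id,
        "--repo-type", repo_type,
        "--local-dir", local_dir,
    ]


for artifact in ARTIFACTS:
    print(shlex.join(download_command(*artifact)))
```

To actually run a download, each list can be passed to `subprocess.run(cmd, check=True)`.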

## 🐳 Environment Generation

SWE-Next extends environment generation to 3,900+ Python packages.

The supported package list is maintained in [`data/packages_python_filtered/packages_python_filtered.csv`](data/packages_python_filtered/packages_python_filtered.csv), and the target repositories are listed in [`data/new_commit_better_repos/new_commit_better_repos.csv`](data/new_commit_better_repos/new_commit_better_repos.csv).

## 🚀 Data Pipeline (One-Click)

`run_pr_pipeline.zsh` automates the full data collection pipeline. It reads `data/packages_python_filtered/packages_python_filtered.csv`, clones the repos automatically, and processes them end-to-end. If the CSV is not present, it falls back to repos already cloned under `outputs/upstream_repos/`.

**Prerequisites:** copy `.env.template` to `.env` and fill in your credentials:
```
OPENAI_API_KEY=...        # required for synthetic issue generation
GITHUB_TOKEN=...          # required for fetching PRs
DOCKERHUB_USERNAME=...    # required for pushing Docker images
DOCKERHUB_TOKEN=...
DOCKERHUB_NAMESPACE=...   # your Docker Hub namespace
```
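
Before launching a long run, it can help to confirm that `.env` is filled in. The following is a minimal sketch, not part of the repository: it treats all five variables above as required (an assumption, since the template only annotates some of them), parses plain `KEY=VALUE` lines, and flags keys that are absent, empty, or still set to the `...` placeholder. `parse_env` and `missing_keys` are hypothetical helper names.

```python
from pathlib import Path

# Keys listed in the template above; treating all of them as required is an assumption.
REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "GITHUB_TOKEN",
    "DOCKERHUB_USERNAME",
    "DOCKERHUB_TOKEN",
    "DOCKERHUB_NAMESPACE",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "value   # note".
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_keys(text: str) -> list[str]:
    """Return required keys that are absent, empty, or still the '...' placeholder."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if values.get(key, "") in ("", "...")]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    print("missing:", missing_keys(text))
```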

**Option 1: Dataset only** (runs until `outputs/all_new_commit_better_pr.jsonl` is produced, no trajectories):
```bash
PR_GEN_TRAJ=0 zsh run_pr_pipeline.zsh
```

**Option 2: Dataset + trajectories** (continues to run GPT-5-mini on the collected instances):
```bash
PR_GEN_TRAJ=1 PR_TRAJ_LLM_NAME=gpt-5-mini zsh run_pr_pipeline.zsh
```

To process a specific repo only:
```bash
PR_GEN_TRAJ=0 zsh run_pr_pipeline.zsh owner/repo
```
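
After a dataset-only run, a quick sanity check is to count the records in `outputs/all_new_commit_better_pr.jsonl`. The sketch below assumes one JSON object per non-empty line (the usual JSONL convention) and makes no assumption about field names; `count_jsonl_records` is a hypothetical helper, not a script shipped with the repository.

```python
import json
from pathlib import Path


def count_jsonl_records(path: Path) -> int:
    """Count non-empty lines that parse as JSON; raises on a malformed line."""
    count = 0
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                json.loads(line)  # validate the record before counting it
                count += 1
    return count


if __name__ == "__main__":
    output = Path("outputs/all_new_commit_better_pr.jsonl")
    if output.exists():
        print(f"{count_jsonl_records(output)} instances in {output}")
    else:
        print(f"{output} not found; has the pipeline finished?")
```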

## 🏋️ Training

### Step 1: Generate SFT Trajectories

Download the SWE-Next dataset first (see [Data & Models](#data--models)), then collect trajectories using a frontier LLM:

```bash
python src/swenext/agenthub/run/edit.py runagent_multiple \
  --dataset "data/SWE-Next/SWE_Next_dataset.jsonl" \
  --traj_dir "./traj/swe_next_sft" \
  --max_workers 8 \
  --k -1 \
  --llm_name "gpt-5-mini" \
  --use_fn_calling True \
  --temperature 0.2 \
  --max_steps 40 \
  --backend "docker"
```

Or skip this step and use the pre-collected trajectories from HuggingFace (download `SWE-Next-SFT-Trajectories` above).

### Step 2: SFT Training

Clone [LlamaFactory](https://github.com/hiyouga/LLaMA-Factory) into the project root first:

```bash
git clone https://github.com/hiyouga/LLaMA-Factory.git LlamaFactory
```

Install LlamaFactory dependencies, then train (run from the project root):

```bash
cd LlamaFactory && pip install -e ".[torch,metrics]" && cd ..

# Train 7B agent
llamafactory-cli train train/swe_next_7B.yaml

# Train 14B agent
llamafactory-cli train train/swe_next_14B.yaml
```

Trained model checkpoints will be saved to `LlamaFactory/saves/SWE_Next_7B` and `LlamaFactory/saves/SWE_Next_14B`.

### Step 3: Evaluate on SWE-Bench Verified

Start a vLLM server with the trained model, then run evaluation:

```bash
# Start vLLM server (in a separate terminal)
vllm serve LlamaFactory/saves/SWE_Next_7B \
  --served-model-name SWE-Next-7B \
  --port 8000

# Run evaluation on SWE-Bench Verified (8 parallel workers)
export LLM_BASE_URL="http://127.0.0.1:8000/v1"

python src/swenext/agenthub/run/edit.py runagent_multiple \
  --dataset "R2E-Gym/SWE-Bench-Verified" \
  --split "test" \
  --traj_dir "./traj/swe_bench_verified" \
  --max_workers 8 \
  --k -1 \
  --llm_name "openai/SWE-Next-7B" \
  --use_fn_calling False \
  --temperature 1 \
  --max_steps 40 \
  --backend "docker"
```
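
Before starting the evaluation, it may be worth confirming that the server is reachable and serving the expected model name. This is a minimal sketch that assumes the vLLM server exposes the OpenAI-style model listing under `LLM_BASE_URL` (that is, `/v1/models`); the function names are hypothetical and nothing here is part of the SWE-Next codebase.

```python
import json
import urllib.request


def models_url(base_url: str) -> str:
    """Build the model-listing URL from an OpenAI-style base URL."""
    return base_url.rstrip("/") + "/models"


def served_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style model-listing response."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_served_models(base_url: str, timeout: float = 10.0) -> list[str]:
    """Query the running server and return the model ids it reports."""
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as response:
        return served_model_ids(json.load(response))
```

Once the server is up, calling `fetch_served_models("http://127.0.0.1:8000/v1")` should return a list that includes the name passed to `--served-model-name`; a connection error usually means the server has not finished loading.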

> Use the official [SWE-Bench evaluation harness](https://github.com/SWE-bench/SWE-bench) for final reported scores.

## 📝 Citation

```bibtex
@misc{liang2026swenextscalablerealworldsoftware,
      title={SWE-Next: Scalable Real-World Software Engineering Tasks for Agents}, 
      author={Jiarong Liang and Zhiheng Lyu and Zijie Liu and Xiangchao Chen and Ping Nie and Kai Zou and Wenhu Chen},
      year={2026},
      eprint={2603.20691},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2603.20691}, 
}
```