File size: 4,290 Bytes
81516f4
 
3b90791
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
81516f4
 
3b90791
81516f4
3b90791
81516f4
3b90791
81516f4
3b90791
 
81516f4
3b90791
81516f4
3b90791
 
 
 
81516f4
3b90791
81516f4
3b90791
81516f4
 
3b90791
 
 
 
 
 
 
 
81516f4
 
3b90791
81516f4
3b90791
81516f4
3b90791
 
 
 
 
 
 
 
81516f4
3b90791
81516f4
3b90791
81516f4
3b90791
81516f4
3b90791
81516f4
3b90791
81516f4
3b90791
81516f4
3b90791
81516f4
3b90791
 
 
 
81516f4
3b90791
81516f4
 
3b90791
 
 
 
 
 
 
 
81516f4
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
---
library_name: transformers
license: apache-2.0
language:
- en
base_model: Qwen/Qwen3-1.7B
pipeline_tag: text-generation
tags:
- code-search
- code-localization
- reinforcement-learning
- agent
- software-engineering
- GSPO
- OpenHands
- SWE-Bench
datasets:
- OpenHands/SWE-smith-py-code-search
- OpenHands/SWE-Gym-code-search
- OpenHands/CodeScout_Training_Rollouts
---

# CodeScout-1.7B-RFT

[📄 Paper](https://arxiv.org/abs/2507.02875) • [💻 Code](https://github.com/OpenHands/codescout) • [🤗 Collection](https://huggingface.co/collections/OpenHands/codescout-69b9a6adcf21f348f4db937f)

**Pre-RL checkpoint — rejection fine-tuned on expert trajectories from CodeScout-14B.**

CodeScout-1.7B-RFT is part of the **CodeScout** family of open-source RL-trained code search agents.
CodeScout models achieve state-of-the-art repository-level code localization using *nothing more than a standard Unix terminal* — no static analysis, no repository graphs, no language-specific tooling.

## Key Highlights

- Warm-start checkpoint for [CodeScout-1.7B](https://huggingface.co/OpenHands/CodeScout-1.7B) RL training
- Distilled from CodeScout-14B expert trajectories with rejection sampling
- Useful for researchers studying the effect of RFT vs. RL in agent training pipelines
- Can be used as a base for custom RL experiments on code search

## Results

Performance on SWE-Bench code localization (instance-averaged F1 scores):


| Benchmark | CodeScout-1.7B | CodeScout-4B | CodeScout-14B |
|---|---|---|---|
| **SWE-Bench Verified** — File F1 | 55.46 | 68.52 | **68.57** |
| **SWE-Bench Verified** — Func F1 | 28.22 | 36.78 | **40.32** |
| **SWE-Bench Pro** — File F1 | 40.96 | 51.77 | **53.63** |
| **SWE-Bench Pro** — Func F1 | 18.24 | **29.03** | 28.74 |
| **SWE-Bench Lite** — File F1 | 56.57 | 67.03 | **71.84** |
| **SWE-Bench Lite** — Func F1 | 27.07 | 39.87 | **44.43** |


## Training

CodeScout-1.7B-RFT is the intermediate checkpoint produced by rejection fine-tuning (RFT) `Qwen3-1.7B` on expert trajectories from CodeScout-14B, before the final RL stage.

- **Teacher model:** [CodeScout-14B](https://huggingface.co/OpenHands/CodeScout-14B)
- **Source trajectories:** Rollouts from CodeScout-14B on 7,700 training instances
- **Filtered data:** 4K trajectories with perfect scores (F1 = 1.0 at file, module, and function level)
- **SFT epochs:** 1
- **Learning rate:** 5e-5 with cosine scheduler (warmup ratio 0.1)
- **Batch size:** 8
- **Optimizer:** AdamW
- **Framework:** [veRL](https://github.com/volcengine/verl)

This checkpoint serves as the starting point for RL training of [CodeScout-1.7B](https://huggingface.co/OpenHands/CodeScout-1.7B).

## How It Works

CodeScout uses the **OpenHands-Bash** scaffold — an agent equipped with only a `Terminal` tool (supporting standard Unix commands like `rg`, `find`, `grep`, `ls`) and a `LocalizationFinish` tool for structured output submission. The agent iteratively navigates the repository to identify relevant files, classes, and functions related to a given issue.

The model is trained with **GSPO** (Group Sequence Policy Optimization) using multi-level F1 rewards at the file, module, and function level.

## Intended Use

CodeScout-1.7B-RFT is designed for **repository-level code localization**: given a GitHub issue description and a code repository, it identifies the relevant files, classes, and functions that need to be modified. It is intended to be used as a localization subagent within larger coding agent pipelines.

## Limitations

- Trained and evaluated exclusively on **Python** repositories
- Designed for code *localization*, not code *editing* or issue resolution
- Performance may vary on repositories significantly different from the training distribution
- Requires the OpenHands-Bash scaffold for optimal performance

## Citation


```bibtex
@article{sutawika2025codescout,
  title={CodeScout: An Effective Recipe for Reinforcement Learning of Code Search Agents},
  author={Sutawika, Lintang and Soni, Aditya Bharat and R R, Bharath Sriraam and Gandhi, Apurva and Yassine, Taha and Vijayvargiya, Sanidhya and Li, Yuchen and Zhou, Xuhui and Zhang, Yilin and Maben, Leander Melroy and Neubig, Graham},
  journal={arXiv preprint arXiv:2507.02875},
  year={2025}
}
```