File size: 4,938 Bytes
262fd50 2de530d 262fd50 2de530d 262fd50 0c672a0 262fd50 2de530d 262fd50 469de86 2de530d 262fd50 2de530d 262fd50 2de530d 262fd50 2de530d 262fd50 2de530d 262fd50 2de530d 262fd50 469de86 2de530d 262fd50 2de530d 262fd50 2de530d 262fd50 2de530d 262fd50 2de530d 262fd50 2de530d 262fd50 2de530d 262fd50 2de530d 262fd50 2de530d 262fd50 2de530d 262fd50 2de530d 262fd50 2de530d 262fd50 2de530d 262fd50 2de530d 0c672a0 2de530d | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 | ---
library_name: transformers
license: apache-2.0
language:
- en
base_model: Qwen/Qwen3-1.7B
pipeline_tag: text-generation
tags:
- code-search
- code-localization
- reinforcement-learning
- agent
- software-engineering
- GSPO
- OpenHands
- SWE-Bench
datasets:
- OpenHands/SWE-smith-py-code-search
- OpenHands/SWE-Gym-code-search
- OpenHands/CodeScout_Training_Rollouts
---
# CodeScout-1.7B
[π Paper](https://arxiv.org/abs/2603.17829) β’ [π» Code](https://github.com/OpenHands/codescout) β’ [π€ Collection](https://huggingface.co/collections/OpenHands/codescout-69b9a6adcf21f348f4db937f)
**Compact yet powerful β outperforms 8Γ larger Qwen3-14B using only a Unix terminal.**
<p align="center">
<img src="codescout_overview.png" alt="CodeScout Overview" width="100%">
</p>
CodeScout-1.7B is part of the **CodeScout** family of open-source RL-trained code search agents.
CodeScout models achieve state-of-the-art repository-level code localization using *nothing more than a standard Unix terminal* β no static analysis, no repository graphs, no language-specific tooling.
## Key Highlights
- Outperforms **8Γ larger Qwen3-14B** with absolute F1 gains of 11β18% for files and 10β15% for functions
- Competitive with **18Γ larger Qwen3-32B (Thinking)**, surpassing it by 3β6% in function F1
- Matches RepoNavigator-7B performance while being **4Γ smaller**
- Demonstrates that RL + distillation can compress strong code search into a 1.7B model
## Results
Performance on SWE-Bench code localization (instance-averaged F1 scores):
| Benchmark | CodeScout-1.7B | CodeScout-4B | CodeScout-14B |
|---|---|---|---|
| **SWE-Bench Verified** β File F1 | 55.46 | 68.52 | **68.57** |
| **SWE-Bench Verified** β Func F1 | 28.22 | 36.78 | **40.32** |
| **SWE-Bench Pro** β File F1 | 40.96 | 51.77 | **53.63** |
| **SWE-Bench Pro** β Func F1 | 18.24 | **29.03** | 28.74 |
| **SWE-Bench Lite** β File F1 | 56.57 | 67.03 | **71.84** |
| **SWE-Bench Lite** β Func F1 | 27.07 | 39.87 | **44.43** |
<p align="center">
<img src="f1_vs_params_file.png" alt="File-level F1 vs Model Size" width="48%">
<img src="f1_vs_params_function.png" alt="Function-level F1 vs Model Size" width="48%">
</p>
<p align="center"><em>Code localization performance on SWE-Bench Verified. CodeScout (β) achieves superior or competitive results over larger open-source LLMs and narrows the gap with closed-source frontier models.</em></p>
## Training
CodeScout-1.7B is trained in two stages:
**Stage 1 β Rejection Fine-Tuning (RFT):** `Qwen3-1.7B` is warm-started via supervised fine-tuning on 4K perfect-score trajectories (F1 = 1.0 at all granularities) sampled from CodeScout-14B, yielding the [CodeScout-1.7B-RFT](https://huggingface.co/OpenHands/CodeScout-1.7B-RFT) checkpoint.
**Stage 2 β RL Training:** CodeScout-1.7B-RFT is further trained with GSPO reinforcement learning.
- **Training data (RL):** 800 instances (disjoint from RFT data)
- **RL steps:** 100
- **Batch size:** 8, with 8 rollouts per instance
- **Max context length:** 32K tokens
- **Max turns per episode:** 4
- **Reward:** Multi-level F1 (file + module + function)
- **Hardware:** 8ΓH100 GPUs
- **Learning rate:** 1e-6 (constant)
## How It Works
CodeScout uses the **OpenHands-Bash** scaffold β an agent equipped with only a `Terminal` tool (supporting standard Unix commands like `rg`, `find`, `grep`, `ls`) and a `LocalizationFinish` tool for structured output submission. The agent iteratively navigates the repository to identify relevant files, classes, and functions related to a given issue.
The model is trained with **GSPO** (Group Sequence Policy Optimization) using multi-level F1 rewards at the file, module, and function level.
## Intended Use
CodeScout-1.7B is designed for **repository-level code localization**: given a GitHub issue description and a code repository, it identifies the relevant files, classes, and functions that need to be modified. It is intended to be used as a localization subagent within larger coding agent pipelines.
## Limitations
- Trained and evaluated exclusively on **Python** repositories
- Designed for code *localization*, not code *editing* or issue resolution
- Performance may vary on repositories significantly different from the training distribution
- Requires the OpenHands-Bash scaffold for optimal performance
## Citation
```bibtex
@misc{sutawika2026codescouteffectiverecipereinforcement,
title={CodeScout: An Effective Recipe for Reinforcement Learning of Code Search Agents},
author={Lintang Sutawika and Aditya Bharat Soni and Bharath Sriraam R R and Apurva Gandhi and Taha Yassine and Sanidhya Vijayvargiya and Yuchen Li and Xuhui Zhou and Yilin Zhang and Leander Melroy Maben and Graham Neubig},
year={2026},
eprint={2603.17829},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2603.17829},
}
```
|