Add README with training recipe and usage instructions

README.md

@@ -1,85 +1,85 @@

Before:

---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen3-8B
tags:
- code
- agent
- bash
- grep
- code-search
- swe-agent
- tool-use
- sft
- lora
- trl
datasets:
- SWE-bench/SWE-smith-trajectories
pipeline_tag: text-generation
---
# Qwen3-8B Code Navigator

## Training

- **Base model:** Qwen/Qwen3-8B (8.2B params)
- **Training method:** SFT with LoRA (r=64, alpha=128)
- **Loss masking:** Assistant-only loss (following SWE-Master's multi-turn masking strategy)
- **Trainable parameters:** ~3.5% of total

- **[SWE-bench/SWE-smith-trajectories](https://

##

| Parameter | Value |
|-----------|-------|
| Epochs | 2 |
| Effective batch size | 8 (1 × 8 grad accum) |
| Learning rate | 1e-4 |
| LR scheduler | Cosine with 10% warmup |
| Max sequence length | 32,768 tokens |
| Weight decay | 0.01 |
| Precision | bf16 |

##

```python
messages = [
    {"role": "system", "content": "You are
    {"role": "user", "content": "Find all Python files
]
```

```
<function=bash>
<parameter=command>grep -
</function>
```

- SWE-agent style code understanding tasks

After:

# Qwen3-8B Code Navigator

Qwen3-8B fine-tuned to use `grep`, `find`, `bash`, and file-editing tools efficiently for code repository navigation.

## Training Recipe

Based on the **SWE-Master (2025)** and **SWE-Dev (2025)** methodologies:

- **Base model**: [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)
- **Dataset**: [SWE-bench/SWE-smith-trajectories](https://huggingface.co/datasets/SWE-bench/SWE-smith-trajectories) (tool split, resolved only → ~5K trajectories)
- **Method**: SFT with LoRA (r=64, α=128) + assistant-only loss masking
- **Tools**: `bash` (grep, find, cat, etc.) + `str_replace_editor` (view/edit files)
- **Context**: 16K tokens
- **Epochs**: 2
- **Hardware**: A100-80GB recommended
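
The LoRA settings above correspond to a specific adapter size. Each targeted linear layer with input size `in` and output size `out` gains two low-rank factors holding r·(in + out) trainable weights, and the adapter update is scaled by α/r (here 128/64 = 2). The sketch below sums this over the seven attention and MLP projections in every decoder layer. The Qwen3-8B shapes it uses (hidden size 4096, MLP size 12288, 36 layers, 32 query heads and 8 key/value heads with head dimension 128) are assumptions, so verify them against the model's `config.json` before relying on the numbers.

```python
from dataclasses import dataclass


@dataclass
class DecoderDims:
    # Assumed Qwen3-8B shapes; verify against the model's config.json
    hidden: int = 4096
    intermediate: int = 12288
    layers: int = 36
    q_heads: int = 32
    kv_heads: int = 8
    head_dim: int = 128


def lora_params_per_layer(d: DecoderDims, r: int) -> int:
    """LoRA adds A (in x r) and B (r x out) to each targeted linear: r * (in + out) weights."""
    q_out = d.q_heads * d.head_dim
    kv_out = d.kv_heads * d.head_dim
    shapes = {  # (in_features, out_features) for all attention + MLP projections
        "q_proj": (d.hidden, q_out),
        "k_proj": (d.hidden, kv_out),
        "v_proj": (d.hidden, kv_out),
        "o_proj": (q_out, d.hidden),
        "gate_proj": (d.hidden, d.intermediate),
        "up_proj": (d.hidden, d.intermediate),
        "down_proj": (d.intermediate, d.hidden),
    }
    return sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())


def lora_param_count(d: DecoderDims, r: int) -> int:
    return d.layers * lora_params_per_layer(d, r)


if __name__ == "__main__":
    r, alpha = 64, 128
    total = lora_param_count(DecoderDims(), r)
    print(f"LoRA scaling alpha/r = {alpha / r}")
    print(f"trainable LoRA params: {total:,} ({total / 8.2e9:.1%} of 8.2B)")
```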

## Key Design Decisions

1. **Assistant-only loss**: The loss covers only the model's reasoning and tool calls, not bash outputs or system prompts (following SWE-Master §3.3).
2. **Proper `tool_calls` format**: SWE-smith's XML `<function=bash>` format is converted to Qwen3's native `<tool_call>` format.
3. **Observation truncation**: Long bash outputs are truncated so they do not waste context on noise.
4. **LoRA targets**: All attention and MLP projections are adapted, for maximum expressiveness.
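
Assistant-only loss masking comes down to the labels. Every token from a system, user, or tool-output turn gets the label -100, which `torch.nn.CrossEntropyLoss` ignores by default (Hugging Face causal-LM models follow the same convention). Gradients therefore come only from assistant tokens. The minimal sketch below assumes the conversation has already been split into pre-tokenized `(role, token_ids)` segments. `build_masked_example` is a hypothetical helper, and a real pipeline would take the segment boundaries from the chat template.

```python
from typing import List, Sequence, Tuple

# Label value skipped by torch.nn.CrossEntropyLoss (its default ignore_index)
IGNORE_INDEX = -100


def build_masked_example(
    segments: Sequence[Tuple[str, List[int]]],
) -> Tuple[List[int], List[int]]:
    """Concatenate pre-tokenized (role, token_ids) turns into input_ids and labels.

    Only assistant turns (reasoning + tool calls) keep their token ids as labels;
    system prompts, user messages, and tool outputs (bash observations) are masked.
    Hugging Face causal-LM models shift labels by one internally, so labels stay
    aligned with input_ids here.
    """
    input_ids: List[int] = []
    labels: List[int] = []
    for role, token_ids in segments:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))
    return input_ids, labels


if __name__ == "__main__":
    # Toy token ids standing in for a tokenized multi-turn trajectory
    turns = [
        ("system", [1, 2]),
        ("user", [3]),
        ("assistant", [4, 5]),  # e.g. reasoning + <tool_call>
        ("tool", [6, 7, 8]),    # bash output: masked
        ("assistant", [9]),
    ]
    ids, labels = build_masked_example(turns)
    print(ids)
    print(labels)
```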

## How to Train

```bash
pip install transformers trl torch datasets trackio accelerate peft
# flash-attn builds against the already-installed torch, so install it separately
pip install flash-attn --no-build-isolation

# Single GPU (A100-80GB)
python train.py

# Multi-GPU with accelerate
accelerate launch train.py
```
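
`train.py` is not reproduced here. To plan a run, though, you can derive the number of optimizer steps and the learning-rate curve from the hyperparameters. The sketch below assumes ~5K trajectories, 2 epochs, and an effective batch of 8 (per-device batch 1 × 8 gradient-accumulation steps, as in the hyperparameter table). It also assumes a peak learning rate of 1e-4 with cosine decay after a 10% linear warmup. The schedule follows the shape of `transformers.get_cosine_schedule_with_warmup`, and all function names are illustrative.

```python
import math


def total_optimizer_steps(num_examples: int, epochs: int, per_device_batch: int,
                          grad_accum: int, num_gpus: int = 1) -> int:
    """Approximate optimizer steps; effective batch = per-device batch x grad accum x GPUs."""
    effective_batch = per_device_batch * grad_accum * num_gpus
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return steps_per_epoch * epochs


def warmup_steps(total_steps: int, warmup_ratio: float = 0.1) -> int:
    # Warmup length as a fraction of all steps, rounded up
    return math.ceil(warmup_ratio * total_steps)


def cosine_lr(step: int, total_steps: int, peak_lr: float = 1e-4,
              warmup_ratio: float = 0.1) -> float:
    """Linear warmup from 0 to peak_lr, then half-cosine decay back to 0."""
    warmup = warmup_steps(total_steps, warmup_ratio)
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # Assumed run: ~5K trajectories, 2 epochs, batch 1 x 8 grad accum on one GPU
    total = total_optimizer_steps(num_examples=5000, epochs=2, per_device_batch=1, grad_accum=8)
    w = warmup_steps(total)
    print(f"total steps: {total}, warmup steps: {w}")
    for step in (0, w // 2, w, total // 2, total):
        print(f"step {step:5d}: lr = {cosine_lr(step, total):.2e}")
```

When you run `accelerate launch` across N GPUs, the effective batch grows N-fold and the step count shrinks accordingly (pass `num_gpus=N`), unless you lower gradient accumulation to compensate.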

## How to Use

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in bf16, then attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "ShubhamRasal/qwen3-8b-code-navigator")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

# OpenAI-style tool schema; the chat template renders it into the prompt
tools = [
    {"type": "function", "function": {
        "name": "bash",
        "description": "Execute a bash command",
        "parameters": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]},
    }}
]

messages = [
    {"role": "system", "content": "You are an expert software engineer that navigates code repositories using bash commands."},
    {"role": "user", "content": "Find all Python files that implement authentication in this Django project at /repo"},
]

text = tokenizer.apply_chat_template(messages, tools=tools, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens (the assistant turn)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:]))
```
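
The model emits tool calls in Qwen3's `<tool_call>` format, so an agent loop has to extract the JSON from the decoded text before it can run the command. The minimal sketch below assumes each call is a single JSON object with `name` and `arguments` keys between `<tool_call>` and `</tool_call>` tags. `extract_tool_calls` is a hypothetical helper, not part of Transformers.

```python
import json
import re
from typing import Any, Dict, List

# Lazy match up to the first "}" that is followed by the closing tag,
# i.e. the outermost brace of the JSON object
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def extract_tool_calls(generated: str) -> List[Dict[str, Any]]:
    """Return [{"name": ..., "arguments": {...}}, ...] for each well-formed tool call."""
    calls: List[Dict[str, Any]] = []
    for match in TOOL_CALL_RE.finditer(generated):
        try:
            call = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed calls instead of crashing the agent loop
        if isinstance(call, dict) and "name" in call:
            calls.append({"name": call["name"], "arguments": call.get("arguments", {})})
    return calls


if __name__ == "__main__":
    sample = (
        "I'll search for auth-related modules.\n"
        '<tool_call>\n{"name": "bash", "arguments": {"command": "grep -rl auth /repo --include=*.py"}}\n</tool_call>'
    )
    for call in extract_tool_calls(sample):
        print(call["name"], call["arguments"]["command"])
```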

## Dataset Preprocessing

The SWE-smith trajectories use an XML-style function-call format:

```
<function=bash>
<parameter=command>grep -r "auth" /testbed --include="*.py"</parameter>
</function>
```

This is converted to Qwen3's native tool-calling format:

```
<tool_call>
{"name": "bash", "arguments": {"command": "grep -r \"auth\" /testbed --include=\"*.py\""}}
</tool_call>
```
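
The conversion can be sketched with two regular expressions: one for each `<function=NAME>` block and one for each `<parameter=KEY>VALUE</parameter>` pair. `json.dumps` then takes care of escaping the inner quotes, as in the example above. The sketch also includes observation truncation. The helper names, the stripping of surrounding newlines from parameter values, and the head-plus-tail strategy with a 4,000-character limit in `truncate_observation` are illustrative assumptions, not the exact preprocessing used to train this model.

```python
import json
import re

FUNC_RE = re.compile(r"<function=([\w\-]+)>(.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=([\w\-]+)>(.*?)</parameter>", re.DOTALL)


def xml_call_to_qwen3(text: str) -> str:
    """Rewrite every SWE-smith-style <function=...> block as a Qwen3 <tool_call> block."""
    def convert(match: "re.Match[str]") -> str:
        name, body = match.group(1), match.group(2)
        # Assumption: leading/trailing newlines around a value are formatting, not content
        args = {key: value.strip("\n") for key, value in PARAM_RE.findall(body)}
        payload = json.dumps({"name": name, "arguments": args})
        return f"<tool_call>\n{payload}\n</tool_call>"

    return FUNC_RE.sub(convert, text)


def truncate_observation(output: str, max_chars: int = 4000) -> str:
    """Keep the head and tail of a long tool output (hypothetical limit and strategy)."""
    if len(output) <= max_chars:
        return output
    half = max_chars // 2
    omitted = len(output) - 2 * half
    return f"{output[:half]}\n... [{omitted} characters truncated] ...\n{output[-half:]}"


if __name__ == "__main__":
    src = (
        "<function=bash>\n"
        '<parameter=command>grep -r "auth" /testbed --include="*.py"</parameter>\n'
        "</function>"
    )
    print(xml_call_to_qwen3(src))
```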

## References

- [SWE-Master: Multi-Turn SFT for SWE Agents](https://arxiv.org/abs/2504.11862) (2025)
- [SWE-smith: Scaling Data for Software Engineering Agents](https://arxiv.org/abs/2504.21798) (2025)
- [SWE-Dev: Training LLMs for SWE with Execution Feedback](https://arxiv.org/abs/2505.06641) (2025)