ShubhamRasal committed f87298e (verified)
1 parent: 2407ec5

Add README with training recipe and usage instructions

Files changed (1)
  1. README.md +62 -62
README.md CHANGED
@@ -1,85 +1,85 @@
- ---
- language:
- - en
- license: apache-2.0
- base_model: Qwen/Qwen3-8B
- tags:
- - code
- - agent
- - bash
- - grep
- - code-search
- - swe-agent
- - tool-use
- - sft
- - lora
- - trl
- datasets:
- - SWE-bench/SWE-smith-trajectories
- pipeline_tag: text-generation
- ---
-
  # Qwen3-8B Code Navigator

- A LoRA fine-tune of [Qwen/Qwen3-8B](https://hf.co/Qwen/Qwen3-8B) trained to use **bash, grep, and find** commands for efficient code navigation and search in repositories.

- ## Training Details

- ### Method
- - **Base model:** Qwen/Qwen3-8B (8.2B params)
- - **Training method:** SFT with LoRA (r=64, alpha=128)
- - **Loss masking:** Assistant-only loss (following SWE-Master's multi-turn masking strategy)
- - **Trainable parameters:** ~3.5% of total

- ### Dataset
- - **[SWE-bench/SWE-smith-trajectories](https://hf.co/datasets/SWE-bench/SWE-smith-trajectories)** (tool split)
- - ~5,000 successful (resolved) agent trajectories
- - Multi-turn conversations with bash/grep/find tool calls
- - Generated by Claude 3.7 Sonnet solving real GitHub issues

- ### Hyperparameters
- | Parameter | Value |
- |-----------|-------|
- | Epochs | 2 |
- | Effective batch size | 8 (1 × 8 grad accum) |
- | Learning rate | 1e-4 |
- | LR scheduler | Cosine with 10% warmup |
- | Max sequence length | 32,768 tokens |
- | Weight decay | 0.01 |
- | Precision | bf16 |
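The table fixes the effective batch size and warmup ratio, so the optimizer-step count follows by simple arithmetic. Below is a minimal sketch of that calculation. It assumes exactly 5,000 examples (the card only says "~5,000") and a single GPU, and `schedule_steps` is a hypothetical helper. It models a plain epoch-based schedule; a real trainer's rounding may differ slightly.

```python
import math

def schedule_steps(num_examples: int, per_device_batch: int, grad_accum: int,
                   num_devices: int, epochs: int, warmup_ratio: float) -> tuple[int, int]:
    """Return (total optimizer steps, warmup steps) for a simple epoch-based schedule."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)  # a partial last batch still counts
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)         # warmup rounded up
    return total_steps, warmup_steps

# 5,000 examples and one GPU are assumptions; the other values come from the table.
total, warmup = schedule_steps(5000, per_device_batch=1, grad_accum=8,
                               num_devices=1, epochs=2, warmup_ratio=0.10)
print(total, warmup)  # 1250 125
```

Launching on several GPUs raises the effective batch and shrinks the step count proportionally, unless gradient accumulation is lowered to compensate.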

- ### References
- - **SWE-Master** ([arXiv:2602.03411](https://arxiv.org/abs/2602.03411)): Post-training framework for SWE agents
- - **SWE-smith** ([arXiv:2504.21798](https://arxiv.org/abs/2504.21798)): Large-scale SWE training data
- - **TRL SFTTrainer** ([docs](https://huggingface.co/docs/trl/sft_trainer))

- ## Usage

- The model generates bash commands in XML function-call format:

- ```python
- from transformers import pipeline

- pipe = pipeline("text-generation", model="ShubhamRasal/qwen3-8b-code-navigator")

  messages = [
-     {"role": "system", "content": "You are a helpful assistant that can interact with a computer to solve tasks. You have access to bash."},
-     {"role": "user", "content": "Find all Python files in /project that contain the word 'authenticate'"}
  ]

- response = pipe(messages, max_new_tokens=512)
  ```

- The model will respond with commands like:
  ```
  <function=bash>
- <parameter=command>grep -rn "authenticate" /project --include="*.py"</parameter>
  </function>
  ```

- ## Intended Use

- - Code search and navigation in large repositories
- - Finding relevant files for bug fixes
- - Repository exploration using shell tools
- - SWE-agent style code understanding tasks
  # Qwen3-8B Code Navigator

+ Fine-tuned Qwen3-8B to efficiently use `grep`, `find`, `bash`, and file-editing tools for code repository navigation.

+ ## Training Recipe

+ Based on **SWE-Master (2025)** and **SWE-Dev (2025)** methodologies:

+ - **Base model**: [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)
+ - **Dataset**: [SWE-bench/SWE-smith-trajectories](https://huggingface.co/datasets/SWE-bench/SWE-smith-trajectories) (tool split, resolved only → ~5K trajectories)
+ - **Method**: SFT with LoRA (r=64, α=128) + assistant-only loss masking
+ - **Tools**: `bash` (grep, find, cat, etc.) + `str_replace_editor` (view/edit files)
+ - **Context**: 16K tokens
+ - **Epochs**: 2
+ - **Hardware**: A100-80GB recommended

+ ## Key Design Decisions

+ 1. **Assistant-only loss**: Trains only on the model's reasoning and tool calls, not on bash outputs or system prompts (following SWE-Master §3.3)
+ 2. **Proper tool_calls format**: Converts SWE-smith's XML `<function=bash>` format to Qwen3's native `<tool_call>` format
+ 3. **Observation truncation**: Long bash outputs are truncated to prevent wasting context on noise
+ 4. **LoRA targets**: All attention + MLP projections for maximum expressiveness
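Decisions 1 and 3 can be illustrated with a small, framework-free sketch. It rests on several assumptions that do not come from the recipe:

- A toy whitespace tokenizer (`toy_tokenize`, hypothetical) stands in for Qwen3's real tokenizer and chat template.
- `MAX_OBS_CHARS` is a hypothetical limit, and head-and-tail truncation is an assumed strategy.
- `-100` is used as the "ignore" label, which is the default `ignore_index` of PyTorch's cross-entropy loss.

Every token outside assistant turns gets that label, so only the model's own reasoning and tool calls contribute to the loss.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss
MAX_OBS_CHARS = 40   # hypothetical truncation limit, kept tiny for the demo

def truncate_observation(text: str, limit: int = MAX_OBS_CHARS) -> str:
    """Keep the head and tail of a long tool output and drop the middle (assumed strategy)."""
    if len(text) <= limit:
        return text
    half = limit // 2
    return text[:half] + "\n[... truncated ...]\n" + text[-half:]

def toy_tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Hypothetical stand-in for a real tokenizer: one id per whitespace-separated token."""
    return [vocab.setdefault(tok, len(vocab)) for tok in text.split()]

def build_example(messages: list[dict], vocab: dict[str, int]) -> tuple[list[int], list[int]]:
    """Concatenate all turns; only assistant content keeps real labels."""
    input_ids: list[int] = []
    labels: list[int] = []
    for msg in messages:
        content = msg["content"]
        if msg["role"] == "tool":
            content = truncate_observation(content)  # decision 3: cap observation length
        header = toy_tokenize(msg["role"] + ":", vocab)
        body = toy_tokenize(content, vocab)
        input_ids += header + body
        labels += [IGNORE_INDEX] * len(header)       # role header is masked in this sketch
        if msg["role"] == "assistant":
            labels += body                           # decision 1: learn reasoning + tool calls
        else:
            labels += [IGNORE_INDEX] * len(body)     # system, user, tool output: masked
    return input_ids, labels

vocab: dict[str, int] = {}
messages = [
    {"role": "system", "content": "You navigate code repositories with bash."},
    {"role": "user", "content": "Find authentication code in /repo"},
    {"role": "assistant", "content": "grep -rn authenticate /repo"},
    {"role": "tool", "content": "/repo/auth.py:12: def authenticate(user): " * 20},
]
input_ids, labels = build_example(messages, vocab)
```

A real pipeline would compute the same mask over the token ids produced by Qwen3's chat template. It would typically also keep the end-of-turn token of each assistant message trainable, so that the model learns when to stop.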

+ ## How to Train

+ ```bash
+ pip install transformers trl torch datasets trackio accelerate peft flash-attn

+ # Single GPU (A100-80GB)
+ python train.py
+
+ # Multi-GPU with accelerate
+ accelerate launch train.py
+ ```

+ ## How to Use
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from peft import PeftModel
+
+ base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype="bfloat16", device_map="auto")
+ model = PeftModel.from_pretrained(base_model, "ShubhamRasal/qwen3-8b-code-navigator")
+ tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
+
+ tools = [
+     {"type": "function", "function": {
+         "name": "bash",
+         "description": "Execute a bash command",
+         "parameters": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}
+     }}
+ ]

  messages = [
+     {"role": "system", "content": "You are an expert software engineer that navigates code repositories using bash commands."},
+     {"role": "user", "content": "Find all Python files that implement authentication in this Django project at /repo"}
  ]

+ text = tokenizer.apply_chat_template(messages, tools=tools, tokenize=False, add_generation_prompt=True)
+ inputs = tokenizer(text, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=512)
+ print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:]))
  ```
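The snippet above only prints the decoded text; nothing is executed. The adapter is trained to emit Qwen3's `<tool_call>` format, so an agent loop first has to pull the calls out of that text. Below is a minimal sketch under two assumptions: each call is a single JSON object wrapped in `<tool_call>`...`</tool_call>`, and the `sample` response string is hypothetical rather than real model output. `extract_tool_calls` is also a hypothetical helper.

```python
import json
import re

# One JSON object between the tags; DOTALL lets the object span several lines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(generated: str) -> list[dict]:
    """Return every well-formed tool call found in decoded model output."""
    calls = []
    for raw in TOOL_CALL_RE.findall(generated):
        try:
            calls.append(json.loads(raw))
        except json.JSONDecodeError:
            continue  # skip a malformed call rather than crash the agent loop
    return calls

# Hypothetical model response, for illustration only.
sample = (
    "I'll search for the authentication code.\n"
    "<tool_call>\n"
    '{"name": "bash", "arguments": {"command": "grep -rl authenticate /repo --include=*.py"}}\n'
    "</tool_call>"
)
for call in extract_tool_calls(sample):
    print(call["name"], "->", call["arguments"]["command"])
```

The sketch stops at parsing. To close the loop, an agent would execute the command, append its output as a `tool` message and generate again.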

+ ## Dataset Preprocessing
+
+ The SWE-smith trajectories use an XML function call format:
  ```
  <function=bash>
+ <parameter=command>grep -r "auth" /testbed --include="*.py"</parameter>
  </function>
  ```

+ This is converted to Qwen3's native tool calling format:
+ ```
+ <tool_call>
+ {"name": "bash", "arguments": {"command": "grep -r \"auth\" /testbed --include=\"*.py\""}}
+ </tool_call>
+ ```
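The conversion described above can be sketched with a regular expression plus `json.dumps`. This is a minimal illustration, not the repository's actual preprocessing code. `xml_call_to_qwen3` is a hypothetical name, and the sketch makes these assumptions:

- Every parameter value is treated as a string.
- Newlines hugging a value are treated as layout and stripped.
- A value that itself contains `</function>` or `</parameter>` would confuse the regex.

A production pipeline might instead fill each message's structured `tool_calls` field and let Qwen3's chat template render the text.

```python
import json
import re

FUNCTION_RE = re.compile(r"<function=([^>]+)>(.*?)</function>", re.DOTALL)
PARAM_RE = re.compile(r"<parameter=([^>]+)>(.*?)</parameter>", re.DOTALL)

def xml_call_to_qwen3(text: str) -> str:
    """Rewrite each SWE-smith <function=...> block as a Qwen3-style <tool_call> block."""
    def convert(match: re.Match) -> str:
        name, body = match.group(1).strip(), match.group(2)
        # Each <parameter=key>value</parameter> becomes one string-valued JSON argument.
        arguments = {key.strip(): value.strip("\n") for key, value in PARAM_RE.findall(body)}
        payload = json.dumps({"name": name, "arguments": arguments})  # escapes inner quotes
        return f"<tool_call>\n{payload}\n</tool_call>"

    return FUNCTION_RE.sub(convert, text)

swe_smith_call = (
    "<function=bash>\n"
    '<parameter=command>grep -r "auth" /testbed --include="*.py"</parameter>\n'
    "</function>"
)
print(xml_call_to_qwen3(swe_smith_call))
```

Run on the SWE-smith example, this prints the same `<tool_call>` block as the README. Multi-parameter calls such as `str_replace_editor` (`command`, `path`, …) map onto multiple keys of `arguments` in the same way.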
+
+ ## References

+ - [SWE-Master: Multi-Turn SFT for SWE Agents](https://arxiv.org/abs/2504.11862) (2025)
+ - [SWE-smith: Scaling Data for Software Engineering Agents](https://arxiv.org/abs/2504.21798) (2025)
+ - [SWE-Dev: Training LLMs for SWE with Execution Feedback](https://arxiv.org/abs/2505.06641) (2025)