ShubhamRasal
/

qwen3-8b-code-navigator

ShubhamRasal committed on Apr 22

Commit

f87298e

verified ·

1 Parent(s): 2407ec5

Add README with training recipe and usage instructions

Files changed (1)

README.md +62 -62

@@ -1,85 +1,85 @@
----
-language:
-- en
-license: apache-2.0
-base_model: Qwen/Qwen3-8B
-tags:
-- code
-- agent
-- bash
-- grep
-- code-search
-- swe-agent
-- tool-use
-- sft
-- lora
-- trl
-datasets:
-- SWE-bench/SWE-smith-trajectories
-pipeline_tag: text-generation
----
 # Qwen3-8B Code Navigator
-A LoRA fine-tune of [Qwen/Qwen3-8B](https://hf.co/Qwen/Qwen3-8B) trained to use **bash, grep, and find** commands for efficient code navigation and search in repositories.
-## Training Details
-### Method
-- **Base model:** Qwen/Qwen3-8B (8.2B params)
-- **Training method:** SFT with LoRA (r=64, alpha=128)
-- **Loss masking:** Assistant-only loss (following SWE-Master's multi-turn masking strategy)
-- **Trainable parameters:** ~3.5% of total
-### Dataset
-- **[SWE-bench/SWE-smith-trajectories](https://hf.co/datasets/SWE-bench/SWE-smith-trajectories)** (tool split)
-- ~5,000 successful (resolved) agent trajectories
-- Multi-turn conversations with bash/grep/find tool calls
-- Generated by Claude 3.7 Sonnet solving real GitHub issues
-### Hyperparameters
-| Parameter | Value |
-|-----------|-------|
-| Epochs | 2 |
-| Effective batch size | 8 (1 × 8 grad accum) |
-| Learning rate | 1e-4 |
-| LR scheduler | Cosine with 10% warmup |
-| Max sequence length | 32,768 tokens |
-| Weight decay | 0.01 |
-| Precision | bf16 |
-### References
-- **SWE-Master** ([arXiv:2602.03411](https://arxiv.org/abs/2602.03411)): Post-training framework for SWE agents
-- **SWE-smith** ([arXiv:2504.21798](https://arxiv.org/abs/2504.21798)): Large-scale SWE training data
-- **TRL SFTTrainer** ([docs](https://huggingface.co/docs/trl/sft_trainer))
-## Usage
-The model generates bash commands in XML function-call format:
-```python
-from transformers import pipeline
-pipe = pipeline("text-generation", model="ShubhamRasal/qwen3-8b-code-navigator")
 messages = [
-    {"role": "system", "content": "You are a helpful assistant that can interact with a computer to solve tasks. You have access to bash."},
-    {"role": "user", "content": "Find all Python files in /project that contain the word 'authenticate'"}
 ]
-response = pipe(messages, max_new_tokens=512)
 ```
-The model will respond with commands like:
 ```
 <function=bash>
-<parameter=command>grep -rn "authenticate" /project --include="*.py"</parameter>
 </function>
 ```
-## Intended Use
-- Code search and navigation in large repositories
-- Finding relevant files for bug fixes
-- Repository exploration using shell tools
-- SWE-agent style code understanding tasks

 # Qwen3-8B Code Navigator
+Fine-tuned Qwen3-8B to efficiently use `grep`, `find`, `bash`, and file editing tools for code repository navigation.
+## Training Recipe
+Based on **SWE-Master (2025)** and **SWE-Dev (2025)** methodologies:
+- **Base model**: [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)
+- **Dataset**: [SWE-bench/SWE-smith-trajectories](https://huggingface.co/datasets/SWE-bench/SWE-smith-trajectories) (tool split, resolved only → ~5K trajectories)
+- **Method**: SFT with LoRA (r=64, α=128) + assistant-only loss masking
+- **Tools**: `bash` (grep, find, cat, etc.) + `str_replace_editor` (view/edit files)
+- **Context**: 16K tokens
+- **Epochs**: 2
+- **Hardware**: A100-80GB recommended
+## Key Design Decisions
+1. **Assistant-only loss**: Only trains on model's reasoning + tool calls, not on bash outputs or system prompts (following SWE-Master §3.3)
+2. **Proper tool_calls format**: Converts SWE-smith's XML `<function=bash>` format to Qwen3's native `<tool_call>` format
+3. **Observation truncation**: Long bash outputs are truncated to prevent wasting context on noise
+4. **LoRA targets**: All attention + MLP projections for maximum expressiveness
+## How to Train
+```bash
+pip install transformers trl torch datasets trackio accelerate peft flash-attn
+# Single GPU (A100-80GB)
+python train.py
+# Multi-GPU with accelerate
+accelerate launch train.py
+```
+## How to Use
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from peft import PeftModel
+base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-8B", torch_dtype="bfloat16")
+model = PeftModel.from_pretrained(base_model, "ShubhamRasal/qwen3-8b-code-navigator")
+tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
+tools = [
+    {"type": "function", "function": {
+        "name": "bash",
+        "description": "Execute a bash command",
+        "parameters": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}
+    }}
+]
 messages = [
+    {"role": "system", "content": "You are an expert software engineer that navigates code repositories using bash commands."},
+    {"role": "user", "content": "Find all Python files that implement authentication in this Django project at /repo"}
 ]
+text = tokenizer.apply_chat_template(messages, tools=tools, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(text, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=512)
+print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:]))
 ```
+## Dataset Preprocessing
+The SWE-smith trajectories use an XML function call format:
 ```
 <function=bash>
+<parameter=command>grep -r "auth" /testbed --include="*.py"</parameter>
 </function>
 ```
+This is converted to Qwen3's native tool calling format:
+```
+<tool_call>
+{"name": "bash", "arguments": {"command": "grep -r \"auth\" /testbed --include=\"*.py\""}}
+</tool_call>
+```
+## References
+- [SWE-Master: Multi-Turn SFT for SWE Agents](https://arxiv.org/abs/2504.11862) (2025)
+- [SWE-smith: Scaling Data for Software Engineering Agents](https://arxiv.org/abs/2504.21798) (2025)
+- [SWE-Dev: Training LLMs for SWE with Execution Feedback](https://arxiv.org/abs/2505.06641) (2025)
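
Note on the added "Dataset Preprocessing" and "Observation truncation" items: the README describes the format conversion and the truncation step but does not show the code (presumably it lives in `train.py`, which is not part of this diff). The following is a minimal sketch of how both steps could look, not the author's implementation. The function names `xml_to_tool_call` and `truncate_observation`, the regular expressions, the head-and-tail truncation strategy, and the 2000-character default are all assumptions made for illustration. The sketch also assumes parameter values never contain a literal `</parameter>` or `</function>`.

```python
import json
import re

# Matches one SWE-smith style call: <function=NAME> ... </function>
FUNCTION_RE = re.compile(
    r"<function=(?P<name>[^>]+)>(?P<body>.*?)</function>", re.DOTALL
)
# Matches one <parameter=KEY>VALUE</parameter> pair inside a call body
PARAM_RE = re.compile(
    r"<parameter=(?P<key>[^>]+)>(?P<value>.*?)</parameter>", re.DOTALL
)


def xml_to_tool_call(text: str) -> str:
    """Rewrite every <function=...> block in `text` as a <tool_call> JSON block."""

    def _convert(match: re.Match) -> str:
        # Collect all parameters of this call into the "arguments" object.
        arguments = {
            p.group("key"): p.group("value").strip("\n")
            for p in PARAM_RE.finditer(match.group("body"))
        }
        payload = json.dumps({"name": match.group("name"), "arguments": arguments})
        return f"<tool_call>\n{payload}\n</tool_call>"

    return FUNCTION_RE.sub(_convert, text)


def truncate_observation(output: str, max_chars: int = 2000) -> str:
    """Keep the head and tail of a long tool output (hypothetical strategy).

    `max_chars` is an illustrative default; the README does not state a limit.
    """
    if max_chars < 2:
        raise ValueError("max_chars must be at least 2")
    if len(output) <= max_chars:
        return output
    half = max_chars // 2
    dropped = len(output) - 2 * half
    return f"{output[:half]}\n[... {dropped} characters truncated ...]\n{output[-half:]}"


if __name__ == "__main__":
    example = (
        "<function=bash>\n"
        '<parameter=command>grep -r "auth" /testbed --include="*.py"</parameter>\n'
        "</function>"
    )
    print(xml_to_tool_call(example))
```

Run on the XML example from the README, `xml_to_tool_call` reproduces the `<tool_call>` block shown in the diff, including the escaped quotes, because `json.dumps` handles the escaping. Keeping both the head and the tail of a long output is one plausible choice, since error messages often appear at the end of command output; the README does not say which part the author keeps.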