# Qwen3-8B CodeAgent 🤖💻
**A coding & agentic reasoning expert built on Qwen3-8B**
> Expert at coding, step-by-step reasoning, data visualization, tool calling, and research paper analysis
## 🎯 Capabilities
| Capability | How it was trained | Dataset |
|---|---|---|
| **Coding** (any language) | SFT on code instructions + competitions | CodeFeedback + Magicoder + OpenCodeReasoning |
| **Agentic Reasoning** | Chain-of-thought with `<think>` blocks | nvidia/OpenCodeReasoning (R1-style traces) |
| **Data Visualization** | Chart/graph code generation | TIGER-Lab/VisCode-200K |
| **Tool Calling** | Function calling with JSON schemas | glaive-function-calling-v2 |
| **Anti-hallucination** | Step-by-step verification, assistant-only loss masking | All datasets with system prompt enforcement |
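To make the tool-calling row concrete, here is a minimal sketch of describing a tool with a JSON schema and dispatching a call emitted by the model. The `get_weather` tool, the `REGISTRY` mapping, and the `<tool_call>...</tool_call>` wrapper are assumptions for illustration; check the tokenizer's chat template for the exact format this model emits.

```python
import json
import re

# Hypothetical tool: a stand-in for a real API call.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

# JSON-schema description of the tool (OpenAI-style layout, assumed here).
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Maps tool names to the Python callables that implement them.
REGISTRY = {"get_weather": get_weather}

def dispatch_tool_calls(model_output: str) -> list:
    """Run every <tool_call>{...}</tool_call> block found in the output."""
    results = []
    for raw in re.findall(r"<tool_call>(.*?)</tool_call>", model_output, re.DOTALL):
        call = json.loads(raw)  # expects {"name": ..., "arguments": {...}}
        func = REGISTRY[call["name"]]
        results.append(func(**call["arguments"]))
    return results

sample = '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Paris"}}\n</tool_call>'
print(dispatch_tool_calls(sample))  # ['Sunny in Paris']
```

The same `TOOLS` list can typically be passed as `tools=TOOLS` to `tokenizer.apply_chat_template` so the schema is rendered into the prompt.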
## 🏗️ Architecture
- **Base Model**: [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) (8.2B params, Apache 2.0)
- **Fine-tuning**: QLoRA (4-bit NF4, r=64, alpha=16, RSLoRA)
- **Target modules**: all-linear (attention + MLP)
- **Training**: SFT with assistant-only loss masking
- **Context**: 4096 tokens during fine-tuning (the base model natively supports 32K, extendable to 131K with YaRN)
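Assistant-only loss masking can be sketched as follows. This is an illustration of the general idea, not the code from the training script: the function name and the per-token `roles` list are hypothetical, and real pipelines usually derive the mask from the chat template.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's CrossEntropyLoss by default

def mask_non_assistant(token_ids, roles):
    """Build labels so that only assistant tokens contribute to the loss.

    token_ids: list of token ids for the whole conversation
    roles:     list of role names ("system", "user", "assistant"), one per token
    """
    if len(token_ids) != len(roles):
        raise ValueError("token_ids and roles must have the same length")
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(token_ids, roles)
    ]

# Toy conversation: one system token, two user tokens, two assistant tokens.
labels = mask_non_assistant(
    [11, 12, 13, 14, 15],
    ["system", "user", "user", "assistant", "assistant"],
)
print(labels)  # [-100, -100, -100, 14, 15]
```

Prompt tokens still feed the model as context; they are only excluded from the loss, so the model is not trained to reproduce system or user text.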
## Training Recipe
Based on research from:
- **Qwen3-Coder-Next** (arXiv: 2603.00729): agentic coding training pipeline
- **Qwen2.5-Coder** (arXiv: 2409.12186): coarse-to-fine SFT methodology
- **LoRA Without Regret**: high-rank LoRA with RSLoRA scaling
- **VisCoder** (arXiv: 2506.03930): visualization code generation
- **FLAME** (arXiv: 2405.01525): factuality-aware alignment
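As a worked example of why RSLoRA matters at high rank: standard LoRA scales the adapter update by `alpha / r`, while rank-stabilized LoRA scales it by `alpha / sqrt(r)`. The helper below is a hypothetical illustration using this model's `r=64` and `alpha=16`.

```python
import math

def lora_scaling(alpha: float, rank: int, rslora: bool) -> float:
    """Scaling factor applied to the low-rank update B @ A."""
    return alpha / math.sqrt(rank) if rslora else alpha / rank

print(lora_scaling(16, 64, rslora=False))  # 0.25
print(lora_scaling(16, 64, rslora=True))   # 2.0
```

With the standard rule the update would be scaled down to 0.25 at rank 64; the rank-stabilized rule keeps it at 2.0, which is one reason a relatively small alpha can be paired with a high rank.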
### Hyperparameters
| Parameter | Value |
|---|---|
| Learning rate | 2e-4 (10× base for LoRA) |
| LR scheduler | Cosine with 5% warmup |
| Epochs | 2 |
| Batch size | 16 (2 × 8 gradient accumulation steps) |
| Max sequence length | 4096 |
| LoRA rank | 64 |
| LoRA alpha | 16 |
| Weight decay | 0.01 |
| Optimizer | AdamW |
| Precision | BF16 + TF32 |
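The learning-rate schedule in the table can be sketched as a linear warmup followed by cosine decay. This is a minimal sketch assuming warmup starts at zero and decay ends at zero; the function name is hypothetical and the training script may differ in detail.

```python
import math

def lr_at_step(step, total_steps, peak_lr=2e-4, warmup_ratio=0.05):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup phase.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# With 1000 total steps, warmup covers the first 50 steps.
for step in (0, 25, 50, 525, 1000):
    print(step, lr_at_step(step, 1000))
```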
### Dataset Mix (~50K samples)
| Dataset | Samples | Purpose |
|---|---|---|
| TIGER-Lab/VisCode-200K | 12,000 | Visualization & chart generation |
| m-a-p/CodeFeedback-Filtered-Instruction | 10,000 | Code instruction following |
| nvidia/OpenCodeReasoning | 10,000 | Code reasoning with `<think>` traces |
| glaiveai/glaive-function-calling-v2 | 8,000 | Function/tool calling |
| ise-uiuc/Magicoder-OSS-Instruct-75K | 10,000 | Code generation |
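A quick sanity check on the mix: the sample counts sum to 50,000, and combined with the batch size and epoch count from the hyperparameter table they imply the number of optimizer steps. The step calculation assumes a single GPU, no sequence packing, and no dropped samples, none of which is stated in this card.

```python
import math

DATASET_MIX = {
    "TIGER-Lab/VisCode-200K": 12_000,
    "m-a-p/CodeFeedback-Filtered-Instruction": 10_000,
    "nvidia/OpenCodeReasoning": 10_000,
    "glaiveai/glaive-function-calling-v2": 8_000,
    "ise-uiuc/Magicoder-OSS-Instruct-75K": 10_000,
}

total = sum(DATASET_MIX.values())  # 50_000
shares = {name: count / total for name, count in DATASET_MIX.items()}

# Assumed: one GPU, so effective batch = 2 (per step) x 8 (accumulation) = 16.
effective_batch = 2 * 8
epochs = 2
steps_per_epoch = math.ceil(total / effective_batch)  # 3125
total_steps = steps_per_epoch * epochs                # 6250

for name, share in shares.items():
    print(f"{name}: {share:.0%}")
print("optimizer steps:", total_steps)
```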
## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "sukritvemula/Qwen3-8B-CodeAgent"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load in BF16 and let Accelerate place the weights on the available device(s).
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [
    {"role": "system", "content": "You are an expert coding assistant."},
    {"role": "user", "content": "Write a Python function to visualize a binary tree using matplotlib."},
]
# Render the conversation with the model's chat template and open the assistant turn.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True is set explicitly so that temperature/top_p/top_k take effect.
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.95, top_k=20)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
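Because the model is trained on `<think>` traces, the decoded output may contain a reasoning block before the final answer. The helper below is a small sketch for separating the two; `split_reasoning` is a hypothetical name, and it assumes the `<think>` tags survive decoding as plain text.

```python
import re

def split_reasoning(text: str):
    """Split a completion into (reasoning, answer).

    Returns an empty reasoning string when no <think>...</think> block is found.
    """
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning("<think>\nUse recursion.\n</think>\n\ndef f(): ...")
print(reasoning)  # Use recursion.
print(answer)     # def f(): ...
```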
## Inference Speed
| Hardware | Speed (tok/s) | Notes |
|---|---|---|
| A100 80GB (BF16) | ~100-150 | Full precision |
| A10G 24GB (BF16) | ~40-50 | Meets 40 tok/s target |
| RTX 4090 (BF16) | ~60-80 | Consumer GPU |
| Any GPU (AWQ INT4) | ~2× the BF16 figures above | Minimal quality loss |
**Recommended deployment**: [vLLM](https://github.com/vllm-project/vllm) or [SGLang](https://github.com/sgl-project/sglang)
```bash
# vLLM
vllm serve sukritvemula/Qwen3-8B-CodeAgent --enable-reasoning --reasoning-parser deepseek_r1
# SGLang
python -m sglang.launch_server --model-path sukritvemula/Qwen3-8B-CodeAgent --reasoning-parser qwen3
```
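Both servers expose an OpenAI-compatible HTTP API, so a client can be sketched with the standard library alone. The base URL below assumes vLLM's default port 8000 on localhost (adjust it for SGLang or a remote host), and the helper names are hypothetical.

```python
import json
import urllib.request

def build_chat_request(prompt, base_url="http://localhost:8000/v1",
                       model="sukritvemula/Qwen3-8B-CodeAgent"):
    """Build (but do not send) a chat-completions request."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are an expert coding assistant."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 2048,
        "temperature": 0.6,
        "top_p": 0.95,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send(request):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Usage (requires a server started with one of the commands above):
#   print(send(build_chat_request("Reverse a linked list in Python.")))
```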
## Training Script
See `train_coding_agent.py` in this repo for the full training pipeline.
## 🗺️ Roadmap (Next Steps)
1. **Stage 2 (GRPO)**: Reinforcement learning with code execution reward for improved reasoning
2. **Stage 3 (DPO)**: Factuality alignment using FLAME methodology
3. **Multimodal**: Fine-tune Qwen3-VL-7B variant for image understanding + code generation
4. **Scale up**: Increase to 200K+ training samples across all domains
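For the planned GRPO stage, the two core pieces can be sketched as a code-execution reward and a group-relative advantage. This is an illustrative sketch of the general technique, not the planned implementation: both function names are hypothetical, the reward is a simple pass/fail, and `exec` is used without a sandbox, which a real pipeline must not do.

```python
import statistics

def execution_reward(code: str, tests: str) -> float:
    """Return 1.0 if the candidate code passes assert-based tests, else 0.0.

    WARNING: exec runs arbitrary code; use an isolated sandbox in practice.
    """
    namespace = {}
    try:
        exec(code, namespace)   # define the candidate's functions
        exec(tests, namespace)  # run the asserts against them
    except Exception:
        return 0.0
    return 1.0

def group_relative_advantages(rewards):
    """Normalize rewards within one group of samples for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All samples scored the same, so there is no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

candidates = [
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    return a - b",
]
rewards = [execution_reward(c, "assert add(2, 3) == 5") for c in candidates]
print(rewards)                             # [1.0, 0.0]
print(group_relative_advantages(rewards))  # [1.0, -1.0]
```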
## License
Apache 2.0 (inherited from Qwen3-8B)