# Qwen3-8B CodeAgent 🚀💻

**A coding & agentic reasoning expert built on Qwen3-8B**

> Expert at coding, step-by-step reasoning, data visualization, tool calling, and research paper analysis

## 🎯 Capabilities

| Capability | How it was trained | Dataset |
|---|---|---|
| **Coding** (any language) | SFT on code instructions + competitive programming problems | CodeFeedback + Magicoder + OpenCodeReasoning |
| **Agentic Reasoning** | Chain-of-thought with `<think>` blocks | nvidia/OpenCodeReasoning (R1-style traces) |
| **Data Visualization** | Chart/graph code generation | TIGER-Lab/VisCode-200K |
| **Tool Calling** | Function calling with JSON schemas | glaiveai/glaive-function-calling-v2 |
| **Anti-hallucination** | Step-by-step verification, assistant-only loss masking | All datasets, with an enforced system prompt |

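The assistant-only loss masking listed above can be illustrated with a minimal sketch. It assumes the conversation is already tokenized into per-turn token-id lists and follows the PyTorch/Hugging Face convention that label `-100` is ignored by the cross-entropy loss; `build_labels` is a hypothetical helper, not the API of `train_coding_agent.py`.

```python
from typing import List, Tuple

# -100 is the default ignore_index of PyTorch's CrossEntropyLoss, which
# Hugging Face causal-LM models use to skip masked label positions.
IGNORE_INDEX = -100


def build_labels(turns: List[Tuple[str, List[int]]]) -> Tuple[List[int], List[int]]:
    """Concatenate tokenized turns and mask every non-assistant token.

    `turns` is a list of (role, token_ids) pairs in conversation order.
    Returns (input_ids, labels) of equal length.
    """
    input_ids: List[int] = []
    labels: List[int] = []
    for role, token_ids in turns:
        input_ids.extend(token_ids)
        if role == "assistant":
            # Supervised: label equals the token (the model shifts by one internally).
            labels.extend(token_ids)
        else:
            # System/user/tool tokens are context only and contribute no loss.
            labels.extend([IGNORE_INDEX] * len(token_ids))
    return input_ids, labels


if __name__ == "__main__":
    conversation = [
        ("system", [1, 2]),
        ("user", [3, 4, 5]),
        ("assistant", [6, 7]),
    ]
    ids, labels = build_labels(conversation)
    print(ids)     # [1, 2, 3, 4, 5, 6, 7]
    print(labels)  # [-100, -100, -100, -100, -100, 6, 7]
```

A real pipeline also has to decide whether role headers such as `<|im_start|>assistant` and end-of-turn tokens are supervised; this sketch supervises exactly the ids attached to the assistant turn.
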
## 🏗️ Architecture

- **Base Model**: [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) (8.2B params, Apache 2.0)
- **Fine-tuning**: QLoRA (4-bit NF4, r=64, alpha=16, RSLoRA)
- **Target modules**: all-linear (attention + MLP)
- **Training**: SFT with assistant-only loss masking
- **Context**: 4096-token training sequences (the base model supports 32K natively, extendable to 131K with YaRN)

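RSLoRA changes the scaling applied to the LoRA update from alpha / r to alpha / sqrt(r). For the r=64, alpha=16 setting above, that gives 16 / 8 = 2.0 instead of 16 / 64 = 0.25, which keeps the update from shrinking at high rank. A sketch of the arithmetic (`lora_scaling` is an illustrative name):

```python
import math


def lora_scaling(alpha: float, rank: int, use_rslora: bool) -> float:
    """Return the multiplier applied to the low-rank update B @ A.

    Standard LoRA scales by alpha / r; rank-stabilized LoRA (RSLoRA)
    scales by alpha / sqrt(r) so the update does not vanish as rank grows.
    """
    if rank <= 0:
        raise ValueError("rank must be positive")
    return alpha / math.sqrt(rank) if use_rslora else alpha / rank


if __name__ == "__main__":
    # Compare both scalings at alpha=16 for a few ranks.
    for r in (8, 16, 64):
        print(r, lora_scaling(16, r, use_rslora=False), lora_scaling(16, r, use_rslora=True))
```

In PEFT, this scaling is enabled with `LoraConfig(use_rslora=True)`.
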
## Training Recipe

Based on research from:
- **Qwen3-Coder-Next** (arXiv:2603.00729): agentic coding training pipeline
- **Qwen2.5-Coder** (arXiv:2409.12186): coarse-to-fine SFT methodology
- **LoRA Without Regret**: high-rank LoRA with RSLoRA scaling
- **VisCoder** (arXiv:2506.03930): visualization code generation
- **FLAME** (arXiv:2405.01525): factuality-aware alignment

### Hyperparameters

| Parameter | Value |
|---|---|
| Learning rate | 2e-4 (~10× a typical full fine-tuning LR, as recommended for LoRA) |
| LR scheduler | Cosine with 5% warmup |
| Epochs | 2 |
| Effective batch size | 16 (2 per device × 8 gradient accumulation steps) |
| Max sequence length | 4096 |
| LoRA rank | 64 |
| LoRA alpha | 16 |
| Weight decay | 0.01 |
| Optimizer | AdamW |
| Precision | BF16 + TF32 |

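The schedule can be sketched in plain Python. This assumes the ~50K-sample mix listed below, a single device (so 2 × 8 = 16 sequences per optimizer step), and no sequence packing; under those assumptions, 2 epochs come to 6,250 optimizer steps, of which 313 (5%, rounded up) are warmup. The shape mirrors the common linear-warmup-plus-cosine-decay schedule (as in `transformers.get_cosine_schedule_with_warmup`), but the helper names are hypothetical.

```python
import math


def training_steps(num_samples: int, effective_batch: int, epochs: int) -> int:
    """Optimizer steps, assuming no packing and that a partial last batch is kept."""
    return math.ceil(num_samples / effective_batch) * epochs


def cosine_lr(step: int, base_lr: float, total_steps: int, warmup_ratio: float) -> float:
    """Linear warmup from 0 to base_lr, then cosine decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    total = training_steps(num_samples=50_000, effective_batch=16, epochs=2)
    print("total steps:", total)                     # 6250
    print("warmup steps:", math.ceil(total * 0.05))  # 313
    for step in (0, 313, 3281, total):
        print(step, cosine_lr(step, 2e-4, total, 0.05))
```
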
### Dataset Mix (~50K samples)

| Dataset | Samples | Purpose |
|---|---|---|
| TIGER-Lab/VisCode-200K | 12,000 | Visualization & chart generation |
| m-a-p/CodeFeedback-Filtered-Instruction | 10,000 | Code instruction following |
| nvidia/OpenCodeReasoning | 10,000 | Code reasoning with `<think>` traces |
| glaiveai/glaive-function-calling-v2 | 8,000 | Function/tool calling |
| ise-uiuc/Magicoder-OSS-Instruct-75K | 10,000 | Code generation |
| **Total** | **50,000** | |

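One reproducible way to draw a fixed-size mix like this is to sample each source without replacement under a fixed seed, then shuffle the union. The sketch assumes every source has already been converted to a common chat-message format and is held in memory as a list; `build_mix`, the seed, and the toy sources are hypothetical, not taken from `train_coding_agent.py`.

```python
import random
from typing import Dict, List, Sequence


def build_mix(sources: Dict[str, Sequence[dict]], counts: Dict[str, int], seed: int = 42) -> List[dict]:
    """Sample counts[name] examples from each source without replacement, then shuffle."""
    rng = random.Random(seed)  # private RNG so the mix is reproducible
    mixed: List[dict] = []
    for name, n in counts.items():
        pool = sources[name]
        if n > len(pool):
            raise ValueError(f"{name}: requested {n}, only {len(pool)} available")
        for example in rng.sample(list(pool), n):
            # Tag each example with its source for later per-domain evaluation.
            mixed.append({**example, "source": name})
    rng.shuffle(mixed)  # interleave domains so batches are not single-source
    return mixed


if __name__ == "__main__":
    toy_sources = {
        "viscode": [{"id": i} for i in range(100)],
        "glaive": [{"id": i} for i in range(50)],
    }
    mix = build_mix(toy_sources, {"viscode": 12, "glaive": 8})
    print(len(mix))  # 20
```

Keeping the `source` tag on every example makes it straightforward to report loss or accuracy per domain after training.
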
## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "sukritvemula/Qwen3-8B-CodeAgent"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# BF16 weights (~16 GB), placed automatically on the available device(s)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [
    {"role": "system", "content": "You are an expert coding assistant."},
    {"role": "user", "content": "Write a Python function to visualize a binary tree using matplotlib."},
]

# Render the chat template and append the assistant header so the model starts its reply
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Qwen3's recommended thinking-mode sampling settings;
# do_sample=True is needed for temperature/top_p/top_k to take effect
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
)

# Decode only the newly generated tokens (skip the prompt)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```

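Qwen3-style output wraps reasoning in `<think>...</think>`, and Qwen3's chat template formats tool calls as JSON inside `<tool_call>...</tool_call>` tags. Assuming this fine-tune keeps those conventions (an assumption: the raw glaive data uses its own `<functioncall>` markup), a small post-processing helper can separate the three parts of a decoded completion. `parse_response` is a hypothetical helper, not part of `transformers`.

```python
import json
import re
from typing import Any, Dict, List

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_response(text: str) -> Dict[str, Any]:
    """Split a decoded completion into reasoning, tool calls, and the final answer."""
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    tool_calls: List[Dict[str, Any]] = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            tool_calls.append(json.loads(raw))
        except json.JSONDecodeError:
            # Keep malformed calls visible instead of silently dropping them.
            tool_calls.append({"error": "invalid JSON", "raw": raw})
    # Whatever remains after removing both kinds of blocks is the user-facing answer.
    answer = TOOL_CALL_RE.sub("", THINK_RE.sub("", text)).strip()
    return {"reasoning": reasoning, "tool_calls": tool_calls, "answer": answer}


if __name__ == "__main__":
    sample = (
        "<think>The user wants the weather, so call the tool.</think>\n"
        '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Paris"}}\n</tool_call>'
    )
    parsed = parse_response(sample)
    print(parsed["tool_calls"][0]["name"])  # get_weather
    print(repr(parsed["answer"]))           # ''
```

The Quick Start's decoded string can be passed straight to `parse_response`; in the Qwen3 tokenizer the `<think>` tags are not flagged as special tokens, so `skip_special_tokens=True` should leave them in place.
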
## 🔧 Inference Speed

| Hardware | Speed (tok/s) | Notes |
|---|---|---|
| A100 80GB (BF16) | ~100-150 | Unquantized BF16 weights |
| A10G 24GB (BF16) | ~40-50 | Meets a 40 tok/s target |
| RTX 4090 (BF16) | ~60-80 | Consumer GPU |
| Any GPU (AWQ INT4) | ~2× the BF16 figures above | Minimal quality loss |

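Single-stream decoding is usually memory-bandwidth bound: each new token reads every weight once, so peak bandwidth divided by weight size gives a rough ceiling on batch-size-1 speed. The sketch assumes vendor peak bandwidths (A100 80GB SXM ~2,039 GB/s, A10G ~600 GB/s, RTX 4090 ~1,008 GB/s) and 8.2B parameters, and ignores KV-cache reads and kernel overhead, so a single request lands below it; aggregate throughput across concurrent requests is not bounded this way.

```python
PARAMS = 8.2e9  # Qwen3-8B parameter count

# Assumed peak memory bandwidth in GB/s (vendor spec sheets; check your exact SKU).
BANDWIDTH_GBPS = {
    "A100 80GB SXM": 2039,
    "A10G 24GB": 600,
    "RTX 4090": 1008,
}


def decode_ceiling_tok_s(bandwidth_gbps: float, bytes_per_param: float, params: float = PARAMS) -> float:
    """Upper bound on batch-1 decode speed: bandwidth / bytes of weights read per token."""
    weight_bytes = params * bytes_per_param
    return bandwidth_gbps * 1e9 / weight_bytes


if __name__ == "__main__":
    for gpu, bw in BANDWIDTH_GBPS.items():
        bf16 = decode_ceiling_tok_s(bw, 2.0)  # 2 bytes per BF16 weight
        int4 = decode_ceiling_tok_s(bw, 0.5)  # ~0.5 bytes per INT4 weight, ignoring scales
        print(f"{gpu:>14}: BF16 <= {bf16:5.0f} tok/s, INT4 <= {int4:5.0f} tok/s")
```
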
**Recommended deployment**: [vLLM](https://github.com/vllm-project/vllm) or [SGLang](https://github.com/sgl-project/sglang)

```bash
# vLLM (older releases used: --enable-reasoning --reasoning-parser deepseek_r1)
vllm serve sukritvemula/Qwen3-8B-CodeAgent --reasoning-parser qwen3

# SGLang
python -m sglang.launch_server --model-path sukritvemula/Qwen3-8B-CodeAgent --reasoning-parser qwen3
```

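Both servers expose an OpenAI-compatible `/v1/chat/completions` endpoint. Below is a dependency-free client sketch using only the standard library. The base URL assumes vLLM's default port 8000 (SGLang defaults to 30000), and sending `top_k` relies on the server accepting sampling parameters beyond the OpenAI schema, which vLLM does; check your server's docs. `build_request` and `chat` are hypothetical helpers.

```python
import json
import urllib.request
from typing import Any, Dict, List

BASE_URL = "http://localhost:8000/v1"  # assumed vLLM default; SGLang uses port 30000
MODEL_ID = "sukritvemula/Qwen3-8B-CodeAgent"


def build_request(messages: List[Dict[str, str]], max_tokens: int = 2048) -> Dict[str, Any]:
    """Chat-completions payload with the same sampling settings as the Quick Start."""
    return {
        "model": MODEL_ID,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": 0.6,
        "top_p": 0.95,
        "top_k": 20,  # extra (non-OpenAI) sampling parameter
    }


def chat(messages: List[Dict[str, str]], base_url: str = BASE_URL, timeout: float = 300.0) -> str:
    """POST the request to a running server and return the assistant message content."""
    payload = json.dumps(build_request(messages)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # With a reasoning parser enabled, the <think> trace is returned in a separate
    # field, so `content` holds only the user-facing answer.
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Offline demo: print the payload; call chat(...) once a server is running.
    print(json.dumps(build_request([{"role": "user", "content": "Reverse a string in Python."}]), indent=2))
```
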
## Training Script

See `train_coding_agent.py` in this repo for the full training pipeline.

## 🗺️ Roadmap (Next Steps)

1. **Stage 2 (GRPO)**: Reinforcement learning with a code-execution reward to improve reasoning
2. **Stage 3 (DPO)**: Factuality alignment using the FLAME methodology
3. **Multimodal**: Fine-tune a Qwen3-VL-8B variant for image understanding + code generation
4. **Scale up**: Increase to 200K+ training samples across all domains

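GRPO samples a group of completions per prompt and standardizes each completion's reward against its own group, which removes the need for a learned value model. A minimal sketch of the group-relative advantage, assuming a binary code-execution reward (1.0 if the generated code passes its unit tests, else 0.0); the reward definition and function name are illustrative, not the planned Stage 2 implementation.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardize each reward against its own group.

    Uses the population standard deviation; some implementations use the
    sample (n - 1) version instead. If every completion gets the same reward,
    all advantages are zero and the group contributes no policy gradient.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical pass/fail rewards for 4 sampled solutions to one prompt.
    rewards = [1.0, 0.0, 0.0, 1.0]
    print([round(a, 3) for a in group_relative_advantages(rewards)])  # [1.0, -1.0, -1.0, 1.0]
```
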
## License

Apache 2.0 (inherited from Qwen3-8B)