	# Qwen3-8B CodeAgent 🤖💻

A coding and agentic-reasoning model fine-tuned from Qwen3-8B.

	> Expert at coding, step-by-step reasoning, data visualization, tool calling, and research paper analysis

	## 🎯 Capabilities

| Capability | How it was trained | Dataset(s) |
|---|---|---|
| Coding (any language) | SFT on code instructions and competition problems | CodeFeedback + Magicoder + OpenCodeReasoning |
| Agentic reasoning | Chain-of-thought with `<think>` blocks | nvidia/OpenCodeReasoning (R1-style traces) |
| Data visualization | Chart/graph code generation | TIGER-Lab/VisCode-200K |
| Tool calling | Function calling with JSON schemas | glaive-function-calling-v2 |
| Anti-hallucination | Step-by-step verification, assistant-only loss masking | All datasets, with an enforcing system prompt |
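The assistant-only loss masking listed in the table can be sketched as follows. This is a minimal illustration, not the code in `train_coding_agent.py`: the hypothetical `build_labels` takes already-tokenized turns and sets the label of every non-assistant token to -100, the default `ignore_index` of PyTorch's `CrossEntropyLoss`, so only the assistant's tokens contribute to the loss. Real chat templates also insert role-header tokens, which a production implementation would typically mask as well.

```python
from typing import List, Sequence, Tuple

# Label value skipped by torch.nn.CrossEntropyLoss (its default ignore_index).
IGNORE_INDEX = -100


def build_labels(turns: Sequence[Tuple[str, List[int]]]) -> Tuple[List[int], List[int]]:
    """Concatenate tokenized chat turns and mask every non-assistant token.

    `turns` is a list of (role, token_ids) pairs, e.g. ("user", [...]).
    Returns (input_ids, labels) of equal length.
    """
    input_ids: List[int] = []
    labels: List[int] = []
    for role, token_ids in turns:
        input_ids.extend(token_ids)
        if role == "assistant":
            # Assistant tokens are the training targets.
            labels.extend(token_ids)
        else:
            # System/user/tool tokens are context only: no loss on them.
            labels.extend([IGNORE_INDEX] * len(token_ids))
    return input_ids, labels


ids, labels = build_labels([("system", [1, 2]), ("user", [3, 4]), ("assistant", [5, 6, 7])])
print(ids)     # [1, 2, 3, 4, 5, 6, 7]
print(labels)  # [-100, -100, -100, -100, 5, 6, 7]
```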

	## 🏗️ Architecture

	- Base Model: [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) (8.2B params, Apache 2.0)
	- Fine-tuning: QLoRA (4-bit NF4, r=64, alpha=16, RSLoRA)
	- Target modules: all-linear (attention + MLP)
	- Training: SFT with assistant-only loss masking
- Training context: 4,096 tokens (the base model supports 32K natively, extendable to 131K with YaRN)
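With RSLoRA, the adapter update is scaled by α/√r instead of the standard LoRA factor α/r, which keeps the update from shrinking as the rank grows (in PEFT this is selected with `LoraConfig(use_rslora=True)`). A quick check of what r=64, alpha=16 implies under each rule:

```python
import math


def lora_scale(alpha: float, r: int) -> float:
    """Scaling factor applied to the LoRA update B @ A in standard LoRA."""
    return alpha / r


def rslora_scale(alpha: float, r: int) -> float:
    """Rank-stabilized LoRA scaling: alpha / sqrt(r)."""
    return alpha / math.sqrt(r)


alpha, r = 16, 64  # values from the Architecture list above
print(lora_scale(alpha, r))    # 0.25
print(rslora_scale(alpha, r))  # 2.0
```

Under standard scaling, alpha=16 at r=64 would multiply the update by only 0.25; with RSLoRA the same alpha gives 2.0, so the relatively small alpha does not dampen the high-rank adapter.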

	## 📊 Training Recipe

	Based on research from:
- Qwen3-Coder-Next (arXiv:2603.00729): agentic coding training pipeline
- Qwen2.5-Coder (arXiv:2409.12186): coarse-to-fine SFT methodology
- LoRA Without Regret: high-rank LoRA with RSLoRA scaling
- VisCoder (arXiv:2506.03930): visualization code generation
- FLAME (arXiv:2405.01525): factuality-aware alignment

	### Hyperparameters
| Parameter | Value |
|---|---|
| Learning rate | 2e-4 (≈10× a typical full fine-tuning LR, as recommended for LoRA) |
| LR scheduler | Cosine with 5% warmup |
| Epochs | 2 |
| Effective batch size | 16 (micro-batch 2 × 8 gradient accumulation steps) |
| Max sequence length | 4,096 tokens |
| LoRA rank (r) | 64 |
| LoRA alpha | 16 |
| Weight decay | 0.01 |
| Optimizer | AdamW |
| Precision | BF16 + TF32 |
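The step counts implied by these settings and the ~50K-sample mix can be worked out directly. This sketch assumes a single GPU (so the effective batch is micro-batch × accumulation), no sequence packing, and exactly 50,000 samples. It implements linear warmup followed by cosine decay to zero, which should match the shape of `transformers`' `get_cosine_schedule_with_warmup`, though a given trainer may round the warmup step count differently.

```python
import math

# Values from the Hyperparameters and Dataset Mix tables.
NUM_SAMPLES = 50_000
MICRO_BATCH = 2        # per-step batch on one GPU (assumption)
GRAD_ACCUM = 8
EPOCHS = 2
BASE_LR = 2e-4
WARMUP_RATIO = 0.05

effective_batch = MICRO_BATCH * GRAD_ACCUM                  # 16
steps_per_epoch = math.ceil(NUM_SAMPLES / effective_batch)  # 3125 optimizer steps
total_steps = steps_per_epoch * EPOCHS                      # 6250
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)        # 313


def lr_at(step: int) -> float:
    """Linear warmup from 0 to BASE_LR, then cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return BASE_LR * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{lr_at(step):.2e}")
```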

	### Dataset Mix (~50K samples)
| Dataset | Samples | Purpose |
|---|---|---|
| TIGER-Lab/VisCode-200K | 12,000 | Visualization & chart generation |
| m-a-p/CodeFeedback-Filtered-Instruction | 10,000 | Code instruction following |
| nvidia/OpenCodeReasoning | 10,000 | Code reasoning with `<think>` traces |
| glaiveai/glaive-function-calling-v2 | 8,000 | Function/tool calling |
| ise-uiuc/Magicoder-OSS-Instruct-75K | 10,000 | Code generation |
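One way to assemble a fixed-count mixture like this is to draw a seeded random subset from each source and shuffle the result. The sketch is illustrative only (the actual logic lives in `train_coding_agent.py`): `build_mixture` is a hypothetical helper working on in-memory lists that stand in for the loaded datasets. With the Hugging Face `datasets` library, the per-source step would typically be `ds.shuffle(seed=...).select(range(n))`, followed by `concatenate_datasets`.

```python
import random
from typing import Dict, List, Sequence

# Per-source sample counts from the table above.
MIX: Dict[str, int] = {
    "TIGER-Lab/VisCode-200K": 12_000,
    "m-a-p/CodeFeedback-Filtered-Instruction": 10_000,
    "nvidia/OpenCodeReasoning": 10_000,
    "glaiveai/glaive-function-calling-v2": 8_000,
    "ise-uiuc/Magicoder-OSS-Instruct-75K": 10_000,
}


def build_mixture(pools: Dict[str, Sequence], counts: Dict[str, int], seed: int = 42) -> List:
    """Draw counts[name] examples without replacement from each pool, then shuffle."""
    rng = random.Random(seed)
    mixed: List = []
    for name, n in counts.items():
        pool = pools[name]
        if n > len(pool):
            raise ValueError(f"{name}: requested {n} samples but only {len(pool)} available")
        mixed.extend(rng.sample(list(pool), n))
    # Shuffle so batches interleave sources instead of arriving in blocks.
    rng.shuffle(mixed)
    return mixed


total = sum(MIX.values())
print(total)  # 50000
for name, n in MIX.items():
    print(f"{name}: {n / total:.0%}")
```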

	## 🚀 Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "sukritvemula/Qwen3-8B-CodeAgent"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the weights in BF16 and let Accelerate place them on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [
    {"role": "system", "content": "You are an expert coding assistant."},
    {"role": "user", "content": "Write a Python function to visualize a binary tree using matplotlib."},
]

# Render the chat template and append the assistant header so the model starts its reply.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Qwen3's recommended thinking-mode sampling; do_sample=True makes temperature/top_p/top_k apply.
outputs = model.generate(
    **inputs, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.95, top_k=20
)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
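Because the model emits its reasoning inside `<think>...</think>` before the final answer, it is often convenient to separate the two after decoding. A minimal sketch, assuming the tags survive `skip_special_tokens=True` (the base Qwen3 tokenizer does not mark them as special, but verify this for your tokenizer); `split_thinking` is a hypothetical helper, not part of `transformers`:

```python
from typing import Tuple


def split_thinking(completion: str) -> Tuple[str, str]:
    """Split a decoded completion into (reasoning, answer).

    Everything before the last "</think>" is treated as reasoning; if the
    tag is absent (e.g. thinking was disabled), the whole text is the answer.
    """
    end_tag = "</think>"
    idx = completion.rfind(end_tag)
    if idx == -1:
        return "", completion.strip()
    reasoning = completion[:idx].replace("<think>", "", 1).strip()
    answer = completion[idx + len(end_tag):].strip()
    return reasoning, answer


reasoning, answer = split_thinking("<think>\nUse a recursive layout.\n</think>\n\ndef plot_tree(root): ...")
print(reasoning)  # Use a recursive layout.
print(answer)     # def plot_tree(root): ...
```

With the Quick Start variables, pass the decoded string directly: `split_thinking(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))`.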

	## 🔧 Inference Speed

| Hardware | Speed (tok/s) | Notes |
|---|---|---|
| A100 80GB (BF16) | ~100-150 | Unquantized BF16 weights |
| A10G 24GB (BF16) | ~40-50 | Meets the 40 tok/s target |
| RTX 4090 (BF16) | ~60-80 | Consumer GPU |
| Any GPU (AWQ INT4) | ~2× the BF16 figures above | Minimal quality loss |
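Throughput figures like these depend heavily on prompt length, batch size, and serving stack, so it is worth measuring on your own hardware. A minimal timing sketch: `measure_tokens_per_second` is a hypothetical helper that times any callable returning the number of newly generated tokens, with an untimed warm-up call so one-time initialization is not counted. The `fake_generate` stand-in only exists so the snippet runs without a GPU.

```python
import time
from typing import Callable


def measure_tokens_per_second(generate_fn: Callable[[], int], warmup_runs: int = 1, runs: int = 3) -> float:
    """Average generation speed over `runs` timed calls of `generate_fn`.

    `generate_fn` must perform one full generation and return how many new
    tokens it produced. Timing is end-to-end, so prompt processing (prefill)
    is included.
    """
    for _ in range(warmup_runs):
        generate_fn()  # untimed: warms up kernels, caches, allocator
    total_tokens = 0
    total_seconds = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        total_tokens += generate_fn()
        total_seconds += time.perf_counter() - start
    return total_tokens / total_seconds


def fake_generate() -> int:
    # Stand-in for a real model call: "produces" 50 tokens in about 10 ms.
    time.sleep(0.01)
    return 50


print(f"{measure_tokens_per_second(fake_generate):.0f} tok/s")
```

With the Quick Start objects, a real measurement could pass `lambda: model.generate(**inputs, max_new_tokens=256).shape[1] - inputs.input_ids.shape[1]`. This measures a single request; aggregate server throughput under concurrent load is a different number.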

	Recommended deployment: [vLLM](https://github.com/vllm-project/vllm) or [SGLang](https://github.com/sgl-project/sglang)

```bash
# vLLM (--enable-reasoning is deprecated in newer vLLM releases; drop it if your version rejects it)
vllm serve sukritvemula/Qwen3-8B-CodeAgent --enable-reasoning --reasoning-parser deepseek_r1

# SGLang
python -m sglang.launch_server --model-path sukritvemula/Qwen3-8B-CodeAgent --reasoning-parser qwen3
```
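Both servers expose an OpenAI-compatible `/v1/chat/completions` endpoint (vLLM listens on port 8000 by default, SGLang on 30000). A stdlib-only client sketch; the helper names are hypothetical, `top_k` is assumed to be accepted as an extra (non-OpenAI) sampling field, and the reasoning parser is assumed to return the thinking text in a `reasoning_content` field of the message, which may differ across server versions:

```python
import json
import urllib.request
from typing import Any, Dict, Optional, Tuple

MODEL_ID = "sukritvemula/Qwen3-8B-CodeAgent"


def build_chat_request(prompt: str, base_url: str = "http://localhost:8000/v1") -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        # Same sampling settings as the Quick Start example.
        "temperature": 0.6,
        "top_p": 0.95,
        "top_k": 20,  # not part of the OpenAI spec; assumed accepted as an extra parameter
        "max_tokens": 2048,
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_reply(response: Dict[str, Any]) -> Tuple[Optional[str], Optional[str]]:
    """Return (reasoning, answer) from a chat completion response body."""
    message = response["choices"][0]["message"]
    return message.get("reasoning_content"), message.get("content")


def ask(prompt: str, base_url: str = "http://localhost:8000/v1") -> Tuple[Optional[str], Optional[str]]:
    """Send one prompt to a running server and return (reasoning, answer)."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        return parse_reply(json.load(resp))
```

Against vLLM, `ask("Write a quicksort in Rust.")` uses the default URL; for SGLang, pass `base_url="http://localhost:30000/v1"`.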

	## 📝 Training Script

	See `train_coding_agent.py` in this repo for the full training pipeline.

	## 🗺️ Roadmap (Next Steps)

1. Stage 2 (GRPO): reinforcement learning with a code-execution reward to improve reasoning
2. Stage 3 (DPO): factuality alignment following the FLAME methodology
3. Multimodal: fine-tune a Qwen3-VL-8B variant for image understanding plus code generation
4. Scale up: increase to 200K+ training samples across all domains
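The planned GRPO stage scores several sampled completions per prompt and normalizes each reward against its own group, so no learned value model is needed. A minimal sketch under stated assumptions: the reward is the fraction of unit tests a candidate function passes, `exec` stands in for a real sandbox (it is not safe for untrusted code), and advantages use the population standard deviation plus a small epsilon; implementations differ on these details. All names here are hypothetical.

```python
import statistics
from typing import Any, List, Sequence, Tuple

TestCase = Tuple[Tuple[Any, ...], Any]  # (positional args, expected result)


def pass_rate_reward(source: str, func_name: str, tests: Sequence[TestCase]) -> float:
    """Fraction of tests passed by `func_name` defined in `source`.

    WARNING: exec() runs arbitrary code; a real pipeline needs a sandbox.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)
        func = namespace[func_name]
    except Exception:
        return 0.0  # code that fails to load earns no reward
    passed = 0
    for args, expected in tests:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing test counts as a failure
    return passed / len(tests)


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: (r - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


tests = [((1, 2), 3), ((2, 2), 4)]
candidates = [
    "def add(a, b):\n    return a + b",  # passes both tests
    "def add(a, b):\n    return a - b",  # passes neither
    "def add(a, b):\n    return a * b",  # passes only (2, 2) -> 4
]
rewards = [pass_rate_reward(src, "add", tests) for src in candidates]
print(rewards)  # [1.0, 0.0, 0.5]
print(group_relative_advantages(rewards))
```

Completions that beat their group's average get positive advantages and are reinforced; below-average ones are pushed down, regardless of the absolute reward scale.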

	## 📄 License

	Apache 2.0 (inherited from Qwen3-8B)
