# Qwen3-8B CodeAgent 🤖💻

A coding and agentic-reasoning expert built on Qwen3-8B: strong at coding, step-by-step reasoning, data visualization, tool calling, and research-paper analysis.
## 🎯 Capabilities
| Capability | How it was trained | Dataset |
|---|---|---|
| Coding (any language) | SFT on code instructions + competitions | CodeFeedback + Magicoder + OpenCodeReasoning |
| Agentic Reasoning | Chain-of-thought with `<think>` blocks | nvidia/OpenCodeReasoning (R1-style traces) |
| Data Visualization | Chart/graph code generation | TIGER-Lab/VisCode-200K |
| Tool Calling | Function calling with JSON schemas | glaive-function-calling-v2 |
| Anti-hallucination | Step-by-step verification, assistant-only loss masking | All datasets with system prompt enforcement |
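For tool calling, Qwen3-family models emit each call as a JSON payload wrapped in `<tool_call>…</tool_call>` tags. A minimal stdlib parser for that format (the helper name and sample payload below are illustrative, not part of this repo) might look like:

```python
import json
import re

# Matches each <tool_call>{...}</tool_call> block in a completion
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(text: str) -> list[dict]:
    """Pull every JSON tool-call payload out of a model completion."""
    return [json.loads(payload) for payload in TOOL_CALL_RE.findall(text)]

# Illustrative completion in Qwen3's tool-call format
completion = (
    "Let me check the weather first.\n"
    '<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>'
)
calls = extract_tool_calls(completion)
print(calls[0]["name"])  # get_weather
```

Each parsed dict can then be dispatched to the matching function and the result fed back as a `tool` role message.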
## 🏗️ Architecture
- Base Model: Qwen/Qwen3-8B (8.2B params, Apache 2.0)
- Fine-tuning: QLoRA (4-bit NF4, r=64, alpha=16, RSLoRA)
- Target modules: all-linear (attention + MLP)
- Training: SFT with assistant-only loss masking
- Context: 4096 tokens (native 32K, extendable to 131K with YaRN)
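Under the settings listed above, the quantization and adapter setup could be reproduced with `transformers`, `bitsandbytes`, and `peft` roughly as follows (a sketch of the stated configuration, not the exact training code from this repo):

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization of the frozen base model (QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# High-rank LoRA on all linear layers (attention + MLP)
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    use_rslora=True,            # scales updates by alpha/sqrt(r) instead of alpha/r
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
```

RSLoRA keeps the effective update magnitude stable at high ranks, which is why the relatively low alpha of 16 still works with r=64.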
## Training Recipe
Based on research from:
- Qwen3-Coder-Next (arXiv:2603.00729): agentic coding training pipeline
- Qwen2.5-Coder (arXiv:2409.12186): coarse-to-fine SFT methodology
- LoRA Without Regret: high-rank LoRA with RSLoRA scaling
- VisCoder (arXiv:2506.03930): visualization code generation
- FLAME (arXiv:2405.01525): factuality-aware alignment
### Hyperparameters
| Parameter | Value |
|---|---|
| Learning rate | 2e-4 (10× base for LoRA) |
| LR scheduler | Cosine with 5% warmup |
| Epochs | 2 |
| Batch size | 16 effective (2 per device × 8 grad accum) |
| Max sequence length | 4096 |
| LoRA rank | 64 |
| LoRA alpha | 16 |
| Weight decay | 0.01 |
| Optimizer | AdamW |
| Precision | BF16 + TF32 |
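Mapped onto TRL's `SFTConfig`, these values would look roughly like the following (a sketch; `output_dir` is a placeholder, and bookkeeping fields such as logging and checkpointing are omitted):

```python
from trl import SFTConfig

training_args = SFTConfig(
    output_dir="qwen3-8b-codeagent",  # placeholder path
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=2,
    per_device_train_batch_size=2,    # 2 x 8 accumulation = 16 effective
    gradient_accumulation_steps=8,
    max_seq_length=4096,
    weight_decay=0.01,
    optim="adamw_torch",
    bf16=True,
    tf32=True,
)
```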
### Dataset Mix (~50K samples)
| Dataset | Samples | Purpose |
|---|---|---|
| TIGER-Lab/VisCode-200K | 12,000 | Visualization & chart generation |
| m-a-p/CodeFeedback-Filtered-Instruction | 10,000 | Code instruction following |
| nvidia/OpenCodeReasoning | 10,000 | Code reasoning with `<think>` traces |
| glaiveai/glaive-function-calling-v2 | 8,000 | Function/tool calling |
| ise-uiuc/Magicoder-OSS-Instruct-75K | 10,000 | Code generation |
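The mix above can be assembled by sampling a fixed budget from each source and shuffling the union. A minimal stdlib sketch of that interleaving (placeholder records stand in for real prompt/response pairs loaded from the Hub):

```python
import random

# Per-dataset sample budgets from the mix table
MIX = {
    "TIGER-Lab/VisCode-200K": 12_000,
    "m-a-p/CodeFeedback-Filtered-Instruction": 10_000,
    "nvidia/OpenCodeReasoning": 10_000,
    "glaiveai/glaive-function-calling-v2": 8_000,
    "ise-uiuc/Magicoder-OSS-Instruct-75K": 10_000,
}

def build_mix(mix: dict[str, int], seed: int = 42) -> list[dict]:
    """Tag each sampled record with its source dataset, then shuffle the union."""
    records = [
        {"source": name, "index": i}  # stand-in for a real (prompt, response) pair
        for name, count in mix.items()
        for i in range(count)
    ]
    random.Random(seed).shuffle(records)
    return records

dataset = build_mix(MIX)
print(len(dataset))  # 50000
```

Shuffling across sources keeps each training batch a blend of all five capabilities rather than long single-domain runs.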
## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "sukritvemula/Qwen3-8B-CodeAgent"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are an expert coding assistant."},
    {"role": "user", "content": "Write a Python function to visualize a binary tree using matplotlib."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# do_sample=True is required for temperature/top_p/top_k to take effect
outputs = model.generate(
    **inputs, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.95, top_k=20
)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
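Because the model is trained to reason inside `<think>…</think>` blocks, it is often useful to separate the reasoning trace from the final answer before displaying it. A small stdlib helper for that (the function name is illustrative):

```python
import re

def split_reasoning(completion: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a completion that may contain a <think> block."""
    match = re.search(r"<think>(.*?)</think>", completion, re.DOTALL)
    if match is None:
        return "", completion.strip()
    reasoning = match.group(1).strip()
    answer = completion[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>Recurse over children, laying nodes out by depth.</think>\n"
    "Here is the function:"
)
print(answer)  # Here is the function:
```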
## 🔧 Inference Speed
| Hardware | Speed (tok/s) | Notes |
|---|---|---|
| A100 80GB (BF16) | ~100-150 | Full precision |
| A10G 24GB (BF16) | ~40-50 | Meets 40 tok/s target |
| RTX 4090 (BF16) | ~60-80 | Consumer GPU |
| Any GPU (AWQ INT4) | ~2× the BF16 numbers above | Minimal quality loss |
Recommended deployment: vLLM or SGLang.

```bash
# vLLM
vllm serve sukritvemula/Qwen3-8B-CodeAgent --enable-reasoning --reasoning-parser deepseek_r1

# SGLang
python -m sglang.launch_server --model-path sukritvemula/Qwen3-8B-CodeAgent --reasoning-parser qwen3
```
## Training Script
See `train_coding_agent.py` in this repo for the full training pipeline.
## 🗺️ Roadmap (Next Steps)
- Stage 2 (GRPO): reinforcement learning with a code-execution reward for improved reasoning
- Stage 3 (DPO): factuality alignment using the FLAME methodology
- Multimodal: fine-tune a Qwen3-VL-7B variant for image understanding + code generation
- Scale up: increase to 200K+ training samples across all domains
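The Stage-2 code-execution reward could be prototyped as a function that runs a candidate solution together with its unit tests in a subprocess and returns 1.0 on success, 0.0 otherwise (a sketch under that assumption; real pipelines add sandboxing and resource limits):

```python
import subprocess
import sys

def execution_reward(candidate_code: str, test_code: str, timeout: float = 5.0) -> float:
    """Binary reward: 1.0 if the candidate passes its tests in a subprocess, else 0.0."""
    program = candidate_code + "\n\n" + test_code
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return 0.0
    return 1.0 if result.returncode == 0 else 0.0

# Toy example: a correct and an incorrect candidate for the same spec
tests = "assert add(2, 3) == 5"
print(execution_reward("def add(a, b): return a + b", tests))  # 1.0
print(execution_reward("def add(a, b): return a - b", tests))  # 0.0
```

Running in a fresh interpreter process isolates the candidate from the trainer and makes timeouts enforceable.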
## License

Apache 2.0 (inherited from Qwen3-8B).