# Qwen3-8B CodeAgent 🤖💻

A coding and agentic-reasoning model built on Qwen3-8B, specialized in code generation, step-by-step reasoning, data visualization, tool calling, and research-paper analysis.

## 🎯 Capabilities

| Capability | How it was trained | Dataset |
|---|---|---|
| Coding (any language) | SFT on code instructions + competitions | CodeFeedback + Magicoder + OpenCodeReasoning |
| Agentic reasoning | Chain-of-thought with `<think>` blocks | nvidia/OpenCodeReasoning (R1-style traces) |
| Data visualization | Chart/graph code generation | TIGER-Lab/VisCode-200K |
| Tool calling | Function calling with JSON schemas | glaive-function-calling-v2 |
| Anti-hallucination | Step-by-step verification, assistant-only loss masking | All datasets, with system-prompt enforcement |
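To make the tool-calling row concrete, here is a minimal sketch of the JSON-schema function-calling pattern used by glaive-style data. The function name, parameters, and the exact string the model emits are illustrative assumptions, not the dataset's exact wire format:

```python
import json

# Hypothetical tool definition in JSON-schema form (illustrative only).
tool_schema = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def parse_tool_call(raw: str) -> dict:
    """Parse a JSON tool-call string emitted by the model into name + arguments."""
    call = json.loads(raw)
    return {"name": call["name"], "arguments": call.get("arguments", {})}

# Example of parsing a model response that requests a tool invocation.
call = parse_tool_call('{"name": "get_weather", "arguments": {"city": "Paris"}}')
```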

πŸ—οΈ Architecture

- **Base model:** Qwen/Qwen3-8B (8.2B params, Apache 2.0)
- **Fine-tuning:** QLoRA (4-bit NF4, r=64, alpha=16, RSLoRA)
- **Target modules:** all-linear (attention + MLP)
- **Training:** SFT with assistant-only loss masking
- **Context:** 4,096 tokens during training (the base model natively supports 32K, extendable to 131K with YaRN)
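The QLoRA setup above can be sketched with `transformers` and `peft` configs. This is illustrative, not the authoritative training script; exact arguments (dropout, double quantization, etc.) are assumptions:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization of the frozen base weights (QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # BF16 compute
    bnb_4bit_use_double_quant=True,         # assumption: double quantization enabled
)

# High-rank LoRA adapters on every linear layer, with RSLoRA scaling.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    use_rslora=True,              # scales by alpha/sqrt(r) instead of alpha/r
    target_modules="all-linear",  # attention + MLP projections
    task_type="CAUSAL_LM",
)
```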

## 📊 Training Recipe

Based on research from:

- **Qwen3-Coder-Next** (arXiv:2603.00729) — agentic coding training pipeline
- **Qwen2.5-Coder** (arXiv:2409.12186) — coarse-to-fine SFT methodology
- **LoRA Without Regret** — high-rank LoRA with RSLoRA scaling
- **VisCoder** (arXiv:2506.03930) — visualization code generation
- **FLAME** (arXiv:2405.01525) — factuality-aware alignment

### Hyperparameters

| Parameter | Value |
|---|---|
| Learning rate | 2e-4 (10× base, for LoRA) |
| LR scheduler | Cosine with 5% warmup |
| Epochs | 2 |
| Batch size | 16 (2 × 8 gradient accumulation) |
| Max sequence length | 4096 |
| LoRA rank | 64 |
| LoRA alpha | 16 |
| Weight decay | 0.01 |
| Optimizer | AdamW |
| Precision | BF16 + TF32 |
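The cosine-with-warmup schedule from the table can be written out explicitly. A minimal sketch, assuming linear warmup over the first 5% of steps then cosine decay to zero (the real run uses the HF Trainer's equivalent built-in schedule):

```python
import math

def lr_at_step(step: int, total_steps: int, peak_lr: float = 2e-4,
               warmup_frac: float = 0.05) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay to 0."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # linear warmup to 2e-4
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))  # cosine decay
```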

### Dataset Mix (~50K samples)

| Dataset | Samples | Purpose |
|---|---|---|
| TIGER-Lab/VisCode-200K | 12,000 | Visualization & chart generation |
| m-a-p/CodeFeedback-Filtered-Instruction | 10,000 | Code instruction following |
| nvidia/OpenCodeReasoning | 10,000 | Code reasoning with `<think>` traces |
| glaiveai/glaive-function-calling-v2 | 8,000 | Function/tool calling |
| ise-uiuc/Magicoder-OSS-Instruct-75K | 10,000 | Code generation |
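A sketch of how the mix above could be assembled. Only the per-dataset sample counts come from the table; the loading/subsampling approach in the comment is an illustrative assumption, not the actual pipeline:

```python
# Per-dataset sample budgets from the table above (sums to ~50K).
mix = {
    "TIGER-Lab/VisCode-200K": 12_000,
    "m-a-p/CodeFeedback-Filtered-Instruction": 10_000,
    "nvidia/OpenCodeReasoning": 10_000,
    "glaiveai/glaive-function-calling-v2": 8_000,
    "ise-uiuc/Magicoder-OSS-Instruct-75K": 10_000,
}

total = sum(mix.values())  # 50,000 samples overall
weights = {name: n / total for name, n in mix.items()}  # sampling fractions

# With the `datasets` library one could do roughly (illustrative):
#   subset = load_dataset(name, split="train").shuffle(seed=42).select(range(n))
# for each (name, n) in mix.items(), then concatenate and reshuffle the subsets.
```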

## 🚀 Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "sukritvemula/Qwen3-8B-CodeAgent"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are an expert coding assistant."},
    {"role": "user", "content": "Write a Python function to visualize a binary tree using matplotlib."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048, temperature=0.6, top_p=0.95, top_k=20)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
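The model emits R1-style reasoning inside `<think>...</think>` before the final answer. When a serving-side reasoning parser isn't available, a small helper (illustrative, not part of the model's API) can split the trace from the answer:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split generated text into (reasoning trace, final answer)."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()          # no reasoning block present
    thinking = match.group(1).strip()    # content inside <think>...</think>
    answer = text[match.end():].strip()  # everything after </think>
    return thinking, answer

thinking, answer = split_reasoning("<think>Check base cases first.</think>\ndef f(): pass")
```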

## 🔧 Inference Speed

| Hardware | Speed (tok/s) | Notes |
|---|---|---|
| A100 80GB (BF16) | ~100–150 | Full precision |
| A10G 24GB (BF16) | ~40–50 | Meets the 40 tok/s target |
| RTX 4090 (BF16) | ~60–80 | Consumer GPU |
| Any GPU (AWQ INT4) | ~2× the above | Minimal quality loss |

Recommended deployment: vLLM or SGLang

```bash
# vLLM
vllm serve sukritvemula/Qwen3-8B-CodeAgent --enable-reasoning --reasoning-parser deepseek_r1

# SGLang
python -m sglang.launch_server --model-path sukritvemula/Qwen3-8B-CodeAgent --reasoning-parser qwen3
```
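Both servers expose an OpenAI-compatible chat API. A minimal sketch of building a request payload; the endpoint path and port in the comment are vLLM defaults (assumptions), and the sampling values mirror the Quick Start settings:

```python
import json

def build_chat_request(prompt: str) -> dict:
    """Build an OpenAI-compatible chat-completions payload for the local server."""
    return {
        "model": "sukritvemula/Qwen3-8B-CodeAgent",  # must match the served model name
        "messages": [
            {"role": "system", "content": "You are an expert coding assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.6,
        "top_p": 0.95,
        "max_tokens": 2048,
    }

# POST this JSON body to http://localhost:8000/v1/chat/completions
# (e.g. with urllib.request, `requests`, or the `openai` client pointed at that base URL).
body = json.dumps(build_chat_request("Write a quicksort in Python."))
```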

πŸ“ Training Script

See `train_coding_agent.py` in this repo for the full training pipeline.

πŸ—ΊοΈ Roadmap (Next Steps)

1. **Stage 2 — GRPO:** reinforcement learning with a code-execution reward for improved reasoning
2. **Stage 3 — DPO:** factuality alignment using the FLAME methodology
3. **Multimodal:** fine-tune a Qwen3-VL-7B variant for image understanding + code generation
4. **Scale up:** increase to 200K+ training samples across all domains

## 📄 License

Apache 2.0 (inherited from Qwen3-8B)
