# Agentic Qwen GGUF Models

Quantized Qwen2.5-Coder-Instruct models optimized for agentic CLI tasks on 6GB VRAM GPUs.

## Models

| Model | Size | Context | Use Case |
|---|---|---|---|
| `qwen-0.5b-q4_k_m.gguf` | 380MB | 32k | Fast, simple tool calls |
| `qwen-1.5b-q4_k_m.gguf` | 941MB | 32k | Smarter, still fast |

## Quick Start with Ollama

```bash
# Download the model and its Modelfile
wget https://huggingface.co/antoniostepien/agentic-qwen-gguf/resolve/main/qwen-1.5b-q4_k_m.gguf
wget https://huggingface.co/antoniostepien/agentic-qwen-gguf/resolve/main/Modelfile-1.5b

# Create and run the Ollama model
ollama create agentic-1.5b -f Modelfile-1.5b
ollama run agentic-1.5b
```

## Use with Claude Code / OpenAI API

```bash
# Ollama serves an OpenAI-compatible API on port 11434
claude --model ollama/agentic-1.5b

# Or point any OpenAI client at the local endpoint
export OPENAI_API_BASE=http://localhost:11434/v1
```
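Because the endpoint is OpenAI-compatible, any OpenAI-style client can talk to it. A minimal sketch of the request body for Ollama's `POST /v1/chat/completions` endpoint — the system and user messages are illustrative assumptions, and no network call is made here:

```python
import json

# Request body for Ollama's OpenAI-compatible chat endpoint:
#   POST http://localhost:11434/v1/chat/completions
# The prompts below are illustrative, not part of the model card.
body = {
    "model": "agentic-1.5b",
    "messages": [
        {"role": "system", "content": "You are an agentic CLI assistant."},
        {"role": "user", "content": "List the files in the current directory."},
    ],
    "stream": False,
}
payload = json.dumps(body)
print(payload)
```

With `ollama serve` running, `payload` can be POSTed to `http://localhost:11434/v1/chat/completions` by any HTTP client.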

## Tool Calling Format

The models emit tool calls wrapped in `<tool_call>` XML tags:

```xml
<tool_call>{"name": "shell", "arguments": {"command": "ls -la"}}</tool_call>
```
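On the client side, these tags can be pulled out of a reply and decoded as JSON. A sketch of how a harness might do it — the regex-based extraction is an assumption, not part of the model card:

```python
import json
import re

def parse_tool_calls(reply: str) -> list[dict]:
    """Extract and decode every <tool_call>...</tool_call> payload in a reply."""
    return [
        json.loads(payload)
        for payload in re.findall(r"<tool_call>(.*?)</tool_call>", reply, re.DOTALL)
    ]

reply = (
    "Listing files now.\n"
    '<tool_call>{"name": "shell", "arguments": {"command": "ls -la"}}</tool_call>'
)
calls = parse_tool_calls(reply)
print(calls[0]["name"], calls[0]["arguments"]["command"])  # → shell ls -la
```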

## VRAM Usage (32k context)

- 0.5B Q4: ~2-3GB total
- 1.5B Q4: ~4-5GB total

Perfect for RTX 3060, RTX 4060, or similar 6GB cards.
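Much of the footprint beyond the quantized weights is the KV cache at 32k context. A back-of-envelope sketch — the layer/head counts below are assumed Qwen2.5-1.5B hyperparameters (verify against the model's `config.json`), and real usage adds compute buffers and runtime overhead on top:

```python
# Assumed Qwen2.5-1.5B architecture values (verify against config.json):
layers = 28          # transformer blocks
kv_heads = 2         # grouped-query attention KV heads
head_dim = 128       # dimension per attention head
context = 32768      # 32k context window
bytes_per_value = 2  # fp16 cache entries

# K and V caches: one entry per layer, KV head, head dim, and position.
kv_bytes = 2 * layers * kv_heads * head_dim * context * bytes_per_value
kv_gb = kv_bytes / 1024**3
print(f"KV cache at 32k: ~{kv_gb:.2f} GiB on top of the 941MB weights")
```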

## License

Apache 2.0 (same as the base Qwen2.5-Coder models).
