# Agentic Qwen GGUF Models
Quantized Qwen2.5-Coder-Instruct models optimized for agentic CLI tasks on 6GB VRAM GPUs.
## Models

| Model | Size | Context | Use Case |
|---|---|---|---|
| qwen-0.5b-q4_k_m.gguf | 380MB | 32k | Fast, simple tool calls |
| qwen-1.5b-q4_k_m.gguf | 941MB | 32k | Smarter, still fast |
## Quick Start with Ollama

```bash
# Download the model and its Modelfile, then register it with Ollama
wget https://huggingface.co/antoniostepien/agentic-qwen-gguf/resolve/main/qwen-1.5b-q4_k_m.gguf
wget https://huggingface.co/antoniostepien/agentic-qwen-gguf/resolve/main/Modelfile-1.5b
ollama create agentic-1.5b -f Modelfile-1.5b
ollama run agentic-1.5b
```
## Use with Claude Code / OpenAI API

```bash
# Ollama serves an OpenAI-compatible API on :11434
claude --model ollama/agentic-1.5b

# Or set the base URL for any OpenAI-compatible client
export OPENAI_API_BASE=http://localhost:11434/v1
```
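Any OpenAI-style client works against that endpoint. A minimal stdlib-only sketch, assuming the `agentic-1.5b` model created above and the default Ollama port:

```python
# Minimal sketch: query Ollama's OpenAI-compatible chat endpoint with only
# the standard library. The model name "agentic-1.5b" assumes the
# `ollama create` step above; adjust base/model for your setup.
import json
import urllib.request


def build_request(prompt, model="agentic-1.5b",
                  base="http://localhost:11434/v1"):
    """Build an OpenAI-style chat completion request for the local server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )


def chat(prompt):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```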
## Tool Calling Format

The models emit tool calls as JSON wrapped in `<tool_call>` XML tags:

```xml
<tool_call>{"name": "shell", "arguments": {"command": "ls -la"}}</tool_call>
```
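A client consuming model output can extract these calls with a small parser. A sketch, assuming the tag format shown above (the function name is illustrative):

```python
# Minimal sketch: pull the JSON payload out of each <tool_call> block
# in a model response, based on the tag format documented above.
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_tool_calls(text):
    """Return a list of parsed tool-call dicts found in `text`."""
    return [json.loads(payload) for payload in TOOL_CALL_RE.findall(text)]
```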
## VRAM Usage (32k context)
- 0.5B Q4: ~2-3GB total
- 1.5B Q4: ~4-5GB total
Both fit comfortably on an RTX 3060, RTX 4060, or similar 6GB cards.
## License
Apache 2.0 (same as base Qwen2.5-Coder models)