How to use from the llama-cpp-python library
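# !pip install llama-cpp-python

from llama_cpp import Llama

# Download one of the GGUF files listed in this repo and load it.
llm = Llama.from_pretrained(
	repo_id="rwiecekgmailcom/qwen35-claude-coder",
	filename="gguf/qwen35-claude-coder-9b.gguf",  # or gguf/qwen35-claude-coder-4b.gguf
)

# Run a single chat turn and print the reply text.
response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
print(response["choices"][0]["message"]["content"])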

Qwen3.5 Claude Coder

Custom Qwen3.5 models tuned to act as autonomous coding / sysadmin agents inside Claude Code, fully local. They run tools instead of guessing, write files instead of pasting code, report only real tool output (no hallucinated hosts or numbers), and stay terse with thinking suppressed so they act immediately. 64K context, native tool-calling, Anthropic-compatible API.

What is in this repo (GGUF)

| File | Base | Context | Notes |
|---|---|---|---|
| gguf/qwen35-claude-coder-4b.gguf | Qwen3.5 4B | 64K | Light, fast agent for 16GB Apple Silicon. ~30 tok/s. |
| gguf/qwen35-claude-coder-9b.gguf | Qwen3.5 9B | 64K | Stronger, production-quality code. ~17 tok/s on 32GB, ~14 tok/s on 16GB. |

Run via Ollama:

ollama run rafw007/qwen35-claude-coder:9b
ollama launch claude --model rafw007/qwen35-claude-coder:9b
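
Once the model has been pulled, it can also be called from a script through Ollama's local HTTP API. The following is a minimal sketch, not part of the original recipe: it assumes Ollama is listening on its default address (http://localhost:11434) and uses only the Python standard library. The helper names build_chat_request and chat are illustrative.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "rafw007/qwen35-claude-coder:9b"


def build_chat_request(prompt, model=MODEL):
    """Return the JSON body for a single-turn, non-streaming chat request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one complete JSON reply instead of a stream
    }


def chat(prompt, url=OLLAMA_URL):
    """Send the prompt to the Ollama server and return the reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["message"]["content"]


# Example (needs a running Ollama server with the model pulled):
# print(chat("Show the free disk space on this machine."))
```

The request body is built separately from the network call so it can be inspected or extended without contacting the server.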

⚠️ Note on the MLX variants

The MLX builds (*-mlx) exist ONLY inside a local Ollama install and were tested ONLY there. They are stored in Ollama's internal MLX format (nvfp4) and were not pushed to the ollama.com registry, which currently rejects MLX-format manifests. They are not provided here as standalone mlx_lm weights and were not validated outside Ollama. This HF repo ships the portable GGUF weights plus the Modelfiles (the full recipe) for every variant, including the MLX ones, so the builds are reproducible. The published, downloadable models are the GGUF ones on ollama.com (rafw007/qwen35-claude-coder:4b and :9b).

Tested on

Real-terminal agent runs through Claude Code on a Mac Studio M2 (32GB RAM) and a Mac Mini M4 (16GB / 32GB), with Ollama 0.24 and the Metal GPU backend. On disk and network agent tasks the models made correct tool calls, with zero emoji and zero hallucination.

Recipe

See modelfiles/. Sampling: temperature 0.2, top_p 0.9, top_k 20, repeat_penalty 1.05, num_ctx 65536. System prompt enforces: act with tools now, write files, ground in real output, be terse, one language, never drift to Chinese.
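
For anyone loading the GGUF files directly rather than through Ollama, the sampling values above can be passed to llama-cpp-python by hand. This is a rough sketch under a few assumptions: the Modelfile name num_ctx corresponds to the n_ctx constructor argument, the helper names load_model and ask are illustrative, and the system prompt from modelfiles/ is not applied here and would need to be added as a "system" message.

```python
# Sampling values copied from the Recipe; key names follow llama-cpp-python.
SAMPLING = {
    "temperature": 0.2,
    "top_p": 0.9,
    "top_k": 20,
    "repeat_penalty": 1.05,
}
N_CTX = 65536  # num_ctx in the Modelfile


def load_model(filename="gguf/qwen35-claude-coder-9b.gguf"):
    """Fetch the GGUF file from the Hub (or reuse the cache) and load it with 64K context."""
    # Imported lazily; requires `pip install llama-cpp-python`.
    from llama_cpp import Llama

    return Llama.from_pretrained(
        repo_id="rwiecekgmailcom/qwen35-claude-coder",
        filename=filename,
        n_ctx=N_CTX,
    )


def ask(llm, prompt):
    """Run one chat turn with the Recipe sampling settings and return the reply text."""
    result = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        **SAMPLING,
    )
    return result["choices"][0]["message"]["content"]


# Example (downloads several GB on first use):
# llm = load_model()
# print(ask(llm, "Write a shell one-liner that counts files in /tmp."))
```

Keeping the settings in one dictionary makes it easy to compare them against the Modelfile when the recipe changes.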

How they were made

Built and tested with the help of Claude Opus, following the idea that the best coding model should be able to create smaller models in its own image.

License

Apache 2.0 (inherited from base Qwen3.5).
