Instructions to use rwiecekgmailcom/qwen35-claude-coder with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use rwiecekgmailcom/qwen35-claude-coder with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("rwiecekgmailcom/qwen35-claude-coder")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

llama-cpp-python

How to use rwiecekgmailcom/qwen35-claude-coder with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="rwiecekgmailcom/qwen35-claude-coder",
	filename="gguf/qwen35-claude-coder-4b.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
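
The call above returns an OpenAI-style completion dictionary rather than a plain string, and the snippet discards it. Below is a minimal sketch of extracting the reply text, assuming the usual `choices[0].message.content` layout. The helper name `reply_text` is hypothetical and not part of llama-cpp-python.

```python
from typing import Any, Mapping


def reply_text(response: Mapping[str, Any]) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("completion contains no choices")
    # Tool-calling replies may carry no text, so fall back to an empty string.
    return choices[0]["message"].get("content") or ""


# Usage with the `llm` object created above (commented out: needs the model file):
# response = llm.create_chat_completion(
#     messages=[{"role": "user", "content": "What is the capital of France?"}]
# )
# print(reply_text(response))
```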

Local Apps

llama.cpp

How to use rwiecekgmailcom/qwen35-claude-coder with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf rwiecekgmailcom/qwen35-claude-coder
# Run inference directly in the terminal:
llama-cli -hf rwiecekgmailcom/qwen35-claude-coder

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf rwiecekgmailcom/qwen35-claude-coder
# Run inference directly in the terminal:
llama-cli -hf rwiecekgmailcom/qwen35-claude-coder

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf rwiecekgmailcom/qwen35-claude-coder
# Run inference directly in the terminal:
./llama-cli -hf rwiecekgmailcom/qwen35-claude-coder

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf rwiecekgmailcom/qwen35-claude-coder
# Run inference directly in the terminal:
./build/bin/llama-cli -hf rwiecekgmailcom/qwen35-claude-coder

Use Docker

docker model run hf.co/rwiecekgmailcom/qwen35-claude-coder

vLLM

How to use rwiecekgmailcom/qwen35-claude-coder with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "rwiecekgmailcom/qwen35-claude-coder"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rwiecekgmailcom/qwen35-claude-coder",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
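
The same request can be sent from Python using only the standard library. This is a sketch, not an official client. The function names `build_chat_request` and `chat` are hypothetical, and the response is assumed to follow the OpenAI chat-completions layout.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, user_message: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, user_message: str, timeout: float = 120.0) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, model, user_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Usage (commented out: requires `vllm serve` to be running on port 8000):
# print(chat("http://localhost:8000/v1",
#            "rwiecekgmailcom/qwen35-claude-coder",
#            "What is the capital of France?"))
```

Because the base URL is a parameter, the same helper should also work against any other OpenAI-compatible server by changing the port.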

Use Docker

docker model run hf.co/rwiecekgmailcom/qwen35-claude-coder

Ollama
How to use rwiecekgmailcom/qwen35-claude-coder with Ollama:
```
ollama run hf.co/rwiecekgmailcom/qwen35-claude-coder
```

Unsloth Studio

How to use rwiecekgmailcom/qwen35-claude-coder with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for rwiecekgmailcom/qwen35-claude-coder to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for rwiecekgmailcom/qwen35-claude-coder to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for rwiecekgmailcom/qwen35-claude-coder to start chatting

Pi

How to use rwiecekgmailcom/qwen35-claude-coder with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "rwiecekgmailcom/qwen35-claude-coder"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "rwiecekgmailcom/qwen35-claude-coder"
        }
      ]
    }
  }
}
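
If `~/.pi/agent/models.json` already lists other providers, pasting the JSON above over the file would drop them. Below is a minimal sketch of merging the entry instead. It assumes the file has the top-level `providers` object shown above, and the helper name `add_provider` is hypothetical.

```python
import json
from pathlib import Path


def add_provider(models_path: Path, name: str, provider: dict) -> dict:
    """Insert or replace one provider entry, keeping any other providers."""
    config = {}
    if models_path.exists():
        config = json.loads(models_path.read_text(encoding="utf-8"))
    config.setdefault("providers", {})[name] = provider
    models_path.parent.mkdir(parents=True, exist_ok=True)
    models_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Same values as the JSON fragment above.
MLX_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "rwiecekgmailcom/qwen35-claude-coder"}],
}

# Usage (commented out so the sketch does not touch your real config):
# add_provider(Path.home() / ".pi" / "agent" / "models.json", "mlx-lm", MLX_PROVIDER)
```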

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use rwiecekgmailcom/qwen35-claude-coder with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "rwiecekgmailcom/qwen35-claude-coder"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default rwiecekgmailcom/qwen35-claude-coder

Run Hermes

hermes

MLX LM

How to use rwiecekgmailcom/qwen35-claude-coder with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "rwiecekgmailcom/qwen35-claude-coder"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "rwiecekgmailcom/qwen35-claude-coder"
# Call the OpenAI-compatible server with curl (mlx_lm.server listens on port 8080 by default)
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "rwiecekgmailcom/qwen35-claude-coder",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'

Docker Model Runner
How to use rwiecekgmailcom/qwen35-claude-coder with Docker Model Runner:
```
docker model run hf.co/rwiecekgmailcom/qwen35-claude-coder
```

Lemonade

How to use rwiecekgmailcom/qwen35-claude-coder with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull rwiecekgmailcom/qwen35-claude-coder

Run and chat with the model

lemonade run user.qwen35-claude-coder-{{QUANT_TAG}}

List all available models

lemonade list

Qwen3.5 Claude Coder

Custom Qwen3.5 models tuned to act as autonomous coding / sysadmin agents inside Claude Code, fully local. They run tools instead of guessing, write files instead of pasting code, report only real tool output (no hallucinated hosts or numbers), and stay terse with thinking suppressed so they act immediately. 64K context, native tool-calling, Anthropic-compatible API.

What is in this repo (GGUF)

| File | Base | Context | Notes |
| --- | --- | --- | --- |
| `gguf/qwen35-claude-coder-4b.gguf` | Qwen3.5 4B | 64K | Light, fast agent for 16GB Apple Silicon. ~30 tok/s. |
| `gguf/qwen35-claude-coder-9b.gguf` | Qwen3.5 9B | 64K | Stronger, production-quality code. ~17 tok/s on 32GB, ~14 tok/s on 16GB. |

Run via Ollama:

ollama run rafw007/qwen35-claude-coder:9b
ollama launch claude --model rafw007/qwen35-claude-coder:9b

⚠️ Note on the MLX variants

The MLX builds (*-mlx) exist only inside a local Ollama install and were tested only there. They are stored in Ollama's internal MLX format (nvfp4) and were not pushed to the ollama.com registry, which currently rejects MLX-format manifests. They are not provided here as standalone mlx_lm weights and were not validated outside Ollama. This HF repo ships the portable GGUF weights plus the Modelfiles (the full recipe) for every variant, including the MLX ones, so the builds are reproducible. The published, downloadable models are the GGUF ones on ollama.com (rafw007/qwen35-claude-coder:4b and :9b).

Tested on

Real-terminal agent runs through Claude Code on a Mac Studio M2 (32GB RAM) and a Mac Mini M4 (16GB / 32GB), with Ollama 0.24 on the Metal GPU. On disk and network agent tasks: correct tool calls, zero emoji, zero hallucination.

Recipe

See modelfiles/. Sampling: temperature 0.2, top_p 0.9, top_k 20, repeat_penalty 1.05, num_ctx 65536. System prompt enforces: act with tools now, write files, ground in real output, be terse, one language, never drift to Chinese.
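
For readers running the GGUF files through llama-cpp-python instead of Ollama, the sampling values above map onto that library's keyword arguments roughly as sketched below. This is an assumption-laden sketch, not the author's recipe: Ollama's `num_ctx` is taken to correspond to `n_ctx`, the helper names `load_model` and `ask` are hypothetical, and the system prompt is passed in by the caller because this sketch does not read the Modelfiles.

```python
from typing import Optional

# Values copied from the Recipe line above; key names follow llama-cpp-python.
LOAD_KWARGS = {"n_ctx": 65536}
SAMPLING_KWARGS = {
    "temperature": 0.2,
    "top_p": 0.9,
    "top_k": 20,
    "repeat_penalty": 1.05,
}


def load_model(filename: str = "gguf/qwen35-claude-coder-4b.gguf"):
    """Download (if needed) and load a GGUF file with the 64K context window."""
    from llama_cpp import Llama  # imported lazily: optional dependency

    return Llama.from_pretrained(
        repo_id="rwiecekgmailcom/qwen35-claude-coder",
        filename=filename,
        **LOAD_KWARGS,
    )


def ask(llm, user_message: str, system_prompt: Optional[str] = None) -> str:
    """Run one chat turn with the recipe's sampling settings."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_message})
    response = llm.create_chat_completion(messages=messages, **SAMPLING_KWARGS)
    return response["choices"][0]["message"]["content"]


# Usage (commented out: downloads and loads the model):
# llm = load_model()
# print(ask(llm, "List the files in the current directory."))
```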

How they were made

Built and tested with the help of Claude Opus, following the idea that the best coding model should be able to create smaller models in its own image.

License

Apache 2.0 (inherited from base Qwen3.5).

GGUF

Model size: 5B params
Architecture: qwen35

Model tree for rwiecekgmailcom/qwen35-claude-coder

Base model: Qwen/Qwen3.5-9B-Base
Finetuned: Qwen/Qwen3.5-9B
Quantized (255): this model