Instructions to use rwiecekgmailcom/qwen35-claude-coder with libraries, inference providers, notebooks, and local apps.
- Libraries
- MLX
How to use rwiecekgmailcom/qwen35-claude-coder with MLX:
```python
# Make sure mlx-lm is installed:
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("rwiecekgmailcom/qwen35-claude-coder")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
```
- llama-cpp-python
How to use rwiecekgmailcom/qwen35-claude-coder with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="rwiecekgmailcom/qwen35-claude-coder",
    filename="gguf/qwen35-claude-coder-4b.gguf",
)

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
# Print only the assistant's reply text
print(response["choices"][0]["message"]["content"])
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use rwiecekgmailcom/qwen35-claude-coder with llama.cpp:
Install from brew
```bash
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf rwiecekgmailcom/qwen35-claude-coder

# Run inference directly in the terminal:
llama-cli -hf rwiecekgmailcom/qwen35-claude-coder
```
Install from WinGet (Windows)
```powershell
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf rwiecekgmailcom/qwen35-claude-coder

# Run inference directly in the terminal:
llama-cli -hf rwiecekgmailcom/qwen35-claude-coder
```
Use pre-built binary
```bash
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf rwiecekgmailcom/qwen35-claude-coder

# Run inference directly in the terminal:
./llama-cli -hf rwiecekgmailcom/qwen35-claude-coder
```
Build from source code
```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf rwiecekgmailcom/qwen35-claude-coder

# Run inference directly in the terminal:
./build/bin/llama-cli -hf rwiecekgmailcom/qwen35-claude-coder
```
Use Docker
```bash
docker model run hf.co/rwiecekgmailcom/qwen35-claude-coder
```
- LM Studio
- Jan
- vLLM
How to use rwiecekgmailcom/qwen35-claude-coder with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "rwiecekgmailcom/qwen35-claude-coder"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "rwiecekgmailcom/qwen35-claude-coder",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker
```bash
docker model run hf.co/rwiecekgmailcom/qwen35-claude-coder
```
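The vLLM, llama.cpp (`llama-server`) and `mlx_lm.server` options all expose an OpenAI-compatible `/v1/chat/completions` endpoint, so one small client can talk to any of them. Below is a minimal standard-library Python sketch. `build_chat_request`, `extract_reply` and `chat` are hypothetical helper names. The default base URL assumes vLLM's port 8000, while `llama-server` and `mlx_lm.server` listen on 8080 by default.

```python
import json
import urllib.request

MODEL = "rwiecekgmailcom/qwen35-claude-coder"


def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    """Hypothetical helper: OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def extract_reply(response: dict) -> str:
    """Hypothetical helper: assistant text from an OpenAI-style response."""
    return response["choices"][0]["message"]["content"]


def chat(prompt: str, base_url: str = "http://localhost:8000/v1") -> str:
    """POST one prompt to a running server and return the reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Local models can be slow, so allow a generous timeout.
    with urllib.request.urlopen(req, timeout=300) as resp:
        return extract_reply(json.load(resp))
```

With a server running, `chat("What is the capital of France?")` returns the reply text. For `llama-server` or `mlx_lm.server`, pass `base_url="http://localhost:8080/v1"`.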
- Ollama
How to use rwiecekgmailcom/qwen35-claude-coder with Ollama:
```bash
ollama run hf.co/rwiecekgmailcom/qwen35-claude-coder
```
- Unsloth Studio
How to use rwiecekgmailcom/qwen35-claude-coder with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```bash
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for rwiecekgmailcom/qwen35-claude-coder to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for rwiecekgmailcom/qwen35-claude-coder to start chatting
```
Using Hugging Face Spaces for Unsloth
No setup required: open https://huggingface.co/spaces/unsloth/studio in your browser and search for rwiecekgmailcom/qwen35-claude-coder to start chatting.
- Pi
How to use rwiecekgmailcom/qwen35-claude-coder with Pi:
Start the MLX server
```bash
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "rwiecekgmailcom/qwen35-claude-coder"
```
Configure the model in Pi
```bash
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add to `~/.pi/agent/models.json`:

```json
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "rwiecekgmailcom/qwen35-claude-coder" }
      ]
    }
  }
}
```

Run Pi
```bash
# Start Pi in your project directory:
pi
```
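If `~/.pi/agent/models.json` already exists, hand-editing it risks clobbering other providers. The Python sketch below merges in only the `mlx-lm` provider entry from the Pi step. It assumes the file is plain JSON with a top-level `providers` object, as in the configuration above. `add_mlx_provider` is a hypothetical helper, and it overwrites any existing `mlx-lm` entry.

```python
import json
from pathlib import Path

# Provider entry copied from the Pi configuration step.
MLX_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "rwiecekgmailcom/qwen35-claude-coder"}],
}


def add_mlx_provider(config_path: Path) -> dict:
    """Hypothetical helper: merge the mlx-lm provider into a Pi models.json."""
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    # Replace (or create) only the "mlx-lm" key; other providers are kept.
    config.setdefault("providers", {})["mlx-lm"] = dict(MLX_PROVIDER)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Usage: `add_mlx_provider(Path.home() / ".pi" / "agent" / "models.json")`.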
- Hermes Agent
How to use rwiecekgmailcom/qwen35-claude-coder with Hermes Agent:
Start the MLX server
```bash
# Install MLX LM:
uv tool install mlx-lm

# Start a local OpenAI-compatible server:
mlx_lm.server --model "rwiecekgmailcom/qwen35-claude-coder"
```
Configure Hermes
```bash
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default rwiecekgmailcom/qwen35-claude-coder
```
Run Hermes
```bash
hermes
```
- MLX LM
How to use rwiecekgmailcom/qwen35-claude-coder with MLX LM:
Generate or start a chat session
```bash
# Install MLX LM
uv tool install mlx-lm

# Interactive chat REPL
mlx_lm.chat --model "rwiecekgmailcom/qwen35-claude-coder"
```
Run an OpenAI-compatible server
```bash
# Install MLX LM
uv tool install mlx-lm

# Start the server (mlx_lm.server listens on port 8080 by default)
mlx_lm.server --model "rwiecekgmailcom/qwen35-claude-coder"

# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "rwiecekgmailcom/qwen35-claude-coder",
    "messages": [
      {"role": "user", "content": "Hello"}
    ]
  }'
```
- Docker Model Runner
How to use rwiecekgmailcom/qwen35-claude-coder with Docker Model Runner:
```bash
docker model run hf.co/rwiecekgmailcom/qwen35-claude-coder
```
- Lemonade
How to use rwiecekgmailcom/qwen35-claude-coder with Lemonade:
Pull the model
```bash
# Download Lemonade from https://lemonade-server.ai/
lemonade pull rwiecekgmailcom/qwen35-claude-coder
```
Run and chat with the model
```bash
lemonade run user.qwen35-claude-coder-{{QUANT_TAG}}
```
List all available models
```bash
lemonade list
```
Qwen3.5 Claude Coder
Custom Qwen3.5 models tuned to act as autonomous coding / sysadmin agents inside Claude Code, fully local. They run tools instead of guessing, write files instead of pasting code, report only real tool output (no hallucinated hosts or numbers), and stay terse with thinking suppressed so they act immediately. 64K context, native tool-calling, Anthropic-compatible API.
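To illustrate native tool-calling, the Python sketch below builds an OpenAI-style request that offers the model one tool and parses the `tool_calls` the model returns. It targets the OpenAI-compatible servers listed above. Claude Code itself uses its own tools over the Anthropic-compatible API, so this is not how Claude Code drives the model. The `run_shell` tool and both helper names are hypothetical. Whether a given server actually returns `tool_calls` depends on its tool-calling support and settings.

```python
import json

# Hypothetical tool definition in the OpenAI function-calling format.
RUN_SHELL_TOOL = {
    "type": "function",
    "function": {
        "name": "run_shell",
        "description": "Run a shell command and return its stdout.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}


def build_tool_request(prompt: str) -> dict:
    """Hypothetical helper: chat request body that lets the model call run_shell."""
    return {
        "model": "rwiecekgmailcom/qwen35-claude-coder",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [RUN_SHELL_TOOL],
    }


def parse_tool_calls(response: dict) -> list[tuple[str, dict]]:
    """Return (tool name, decoded arguments) for each requested tool call."""
    message = response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        # OpenAI-style servers send the arguments as a JSON-encoded string.
        calls.append((fn["name"], json.loads(fn["arguments"])))
    return calls
```

In this flow the caller runs the requested command and sends its real output back as a `"role": "tool"` message. That round trip is what lets an agent report actual tool output instead of guessing.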
What is in this repo (GGUF)
| File | Base | Context | Notes |
|---|---|---|---|
| `gguf/qwen35-claude-coder-4b.gguf` | Qwen3.5 4B | 64K | Light, fast agent for 16GB Apple Silicon. ~30 tok/s. |
| `gguf/qwen35-claude-coder-9b.gguf` | Qwen3.5 9B | 64K | Stronger, production-quality code. ~17 tok/s on 32GB, ~14 tok/s on 16GB. |
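The throughput figures in the table can be turned into rough wall-clock times. This is a back-of-the-envelope Python sketch. It assumes generation speed stays constant over the reply and ignores prompt processing, which grows with context length and can dominate near 64K. The 500-token reply length is an arbitrary example.

```python
# Approximate generation speeds (tokens/s) from the table above.
TOKENS_PER_SECOND = {
    "4B": 30.0,
    "9B on 32GB": 17.0,
    "9B on 16GB": 14.0,
}


def seconds_for(tokens: int, rate: float) -> float:
    """Time to generate `tokens` at a constant `rate` in tokens per second."""
    return tokens / rate


for name, rate in TOKENS_PER_SECOND.items():
    print(f"{name}: ~{seconds_for(500, rate):.1f} s for a 500-token reply")
```

This prints about 16.7 s, 29.4 s and 35.7 s respectively.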
Run via Ollama:
```bash
ollama run rafw007/qwen35-claude-coder:9b
ollama launch claude --model rafw007/qwen35-claude-coder:9b
```
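Once pulled, the model can also be reached through Ollama's local REST API, which listens on `http://localhost:11434` by default. The minimal standard-library Python sketch below uses the non-streaming `/api/chat` endpoint. The helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint
MODEL = "rafw007/qwen35-claude-coder:9b"


def build_ollama_request(prompt: str, model: str = MODEL) -> dict:
    """Hypothetical helper: /api/chat body; stream=False returns one JSON object."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def extract_ollama_reply(response: dict) -> str:
    """Hypothetical helper: assistant text from a non-streaming /api/chat response."""
    return response["message"]["content"]


def ollama_chat(prompt: str, url: str = OLLAMA_URL) -> str:
    """Send one prompt to a running Ollama server and return the reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_ollama_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return extract_ollama_reply(json.load(resp))
```

The request sends no sampling options, so Ollama applies the `PARAMETER` values stored with the model. Those values presumably match the recipe below.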
⚠️ Note on the MLX variants
The MLX builds (`*-mlx`) exist ONLY inside a local Ollama install and were tested ONLY there.
They are stored in Ollama's internal MLX format (nvfp4) and were not pushed to the ollama.com registry, which currently rejects MLX-format manifests. They are not provided here as standalone mlx_lm weights and were not validated outside Ollama. This HF repo ships the portable GGUF weights plus the Modelfiles (the full recipe) for every variant, including the MLX ones, so every build is reproducible. The published, downloadable models are the GGUF ones on ollama.com (`rafw007/qwen35-claude-coder:4b` and `:9b`).
Tested on
Real-terminal agent runs through Claude Code on a Mac Studio M2 (32GB RAM) and a Mac Mini M4 (16GB / 32GB), using Ollama 0.24 on the Metal GPU. On disk and network agent tasks: correct tool calls, zero emoji, zero hallucinations.
Recipe
See `modelfiles/`. Sampling: temperature 0.2, top_p 0.9, top_k 20, repeat_penalty 1.05, num_ctx 65536 (64K). The system prompt enforces: act with tools now, write files, ground answers in real output, be terse, use one language, and never drift into Chinese.
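Each sampling setting above corresponds to a `PARAMETER` line in an Ollama Modelfile. The Python sketch below renders such a file from the recipe values. It is not the contents of `modelfiles/`, which remain the reference. The GGUF path and the system prompt text are placeholders, and a real Modelfile may also need a `TEMPLATE` or other directives not shown here.

```python
# Sampling settings from the recipe above (65536 = 64 * 1024, i.e. 64K).
RECIPE_PARAMS = {
    "temperature": 0.2,
    "top_p": 0.9,
    "top_k": 20,
    "repeat_penalty": 1.05,
    "num_ctx": 65536,
}


def render_modelfile(gguf_path: str, params: dict, system: str) -> str:
    """Render a minimal Modelfile: FROM, one PARAMETER per setting, SYSTEM."""
    lines = [f"FROM {gguf_path}"]
    lines += [f"PARAMETER {name} {value}" for name, value in params.items()]
    lines.append(f'SYSTEM """{system}"""')
    return "\n".join(lines) + "\n"


# Placeholder system prompt (hypothetical); the real one is in modelfiles/.
print(render_modelfile(
    "./gguf/qwen35-claude-coder-9b.gguf",
    RECIPE_PARAMS,
    "Act with tools now. Be terse. Answer in one language.",
))
```

You could save the output as `Modelfile` and run `ollama create my-coder -f Modelfile` to build a local model. The model name `my-coder` is hypothetical.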
How they were made
Built and tested with the help of Claude Opus, following the idea that the best coding model should be able to create smaller models in its own image.
License
Apache 2.0 (inherited from base Qwen3.5).