Instructions to use unsloth/Qwen3-Coder-Next-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use unsloth/Qwen3-Coder-Next-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="unsloth/Qwen3-Coder-Next-GGUF",
	filename="BF16/Qwen3-Coder-Next-BF16-00001-of-00004.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)


llama.cpp

How to use unsloth/Qwen3-Coder-Next-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_M

Use Docker

docker model run hf.co/unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_M

vLLM

How to use unsloth/Qwen3-Coder-Next-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "unsloth/Qwen3-Coder-Next-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "unsloth/Qwen3-Coder-Next-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_M

Ollama
How to use unsloth/Qwen3-Coder-Next-GGUF with Ollama:
```
ollama run hf.co/unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_M
```

Unsloth Studio

How to use unsloth/Qwen3-Coder-Next-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for unsloth/Qwen3-Coder-Next-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for unsloth/Qwen3-Coder-Next-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for unsloth/Qwen3-Coder-Next-GGUF to start chatting

Pi

How to use unsloth/Qwen3-Coder-Next-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use unsloth/Qwen3-Coder-Next-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use unsloth/Qwen3-Coder-Next-GGUF with Docker Model Runner:
```
docker model run hf.co/unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_M
```

Lemonade

How to use unsloth/Qwen3-Coder-Next-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_M

Run and chat with the model

lemonade run user.Qwen3-Coder-Next-GGUF-UD-Q4_K_M

List all available models

lemonade list

Jinja template error with LM Studio + OpenCode, Qwen Code, or Kilo Code

by RGMC98 - opened Feb 4

Discussion

RGMC98

Feb 4

I get this error with the model: LM Studio API + coding agent

Garf

Feb 4

Using opencode, I see a different kind of error, not related to the "safe" thing:

invalid [tool=write, error=Invalid input for tool write: JSON parsing failed: Text: {"content":"use ...

SlavikF

Feb 4

Similar error with the native Qwen GGUF:

https://huggingface.co/Qwen/Qwen3-Coder-Next-GGUF/discussions/1

Anton8264

Feb 5

The write tool in opencode gives an error with this model:
invalid [tool=write, error=Invalid input for tool write: JSON parsing failed: Text: { ... }.
Error message: JSON Parse error: Unrecognized token '/']

dhencio29

Feb 5

HOW?

SlavikF

Feb 5

I checked today with the latest llama.cpp build, ghcr.io/ggml-org/llama.cpp:server-cuda12-b7941,
and everything appears to be working. Tools are working.

I think this PR fixed it:
https://github.com/ggml-org/llama.cpp/pull/19239

Tested with RooCode and opencode.

Garf

Feb 5

That fix is 3 days old so it would already have been in all my testing. It's definitely still broken with opencode.

Anton8264

Feb 5

•

edited Feb 5

The write tool is still broken for me in opencode with llama.cpp version 7948 (b828e18c7) and Qwen3-Coder-Next-UD-Q5_K_XL-00001-of-00003.gguf, downloaded Feb 5.

Example prompt:

use write tool to write "{
"first_name": "Sammy",
"last_name": "Shark",
"location": "Ocean",
"online": true,
"followers": 987
}" into test.txt

Result:
← Write test.txt
Error: The write tool was called with invalid arguments: [
{
"expected": "string",
"code": "invalid_type",
"path": [
"content"
],
"message": "Invalid input: expected string, received object"
}
].
Please rewrite the input so it satisfies the expected schema.
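
To check whether the problem reproduces outside opencode, a prompt like the one above can be sent straight to llama-server's OpenAI-compatible endpoint with a tool definition attached. The following is only a sketch under stated assumptions: the base URL is the llama-server default used elsewhere on this page, and the `write` tool schema (string fields `content` and `filePath`) is a guess modelled on the error messages in this thread, not opencode's real schema. The helper names are made up for illustration.

```python
import json
import urllib.request

# Assumption: llama-server is running locally on its default port.
BASE_URL = "http://localhost:8080/v1"

# Assumption: a simplified stand-in for a coding agent's "write" tool.
WRITE_TOOL = {
    "type": "function",
    "function": {
        "name": "write",
        "description": "Write text content to a file.",
        "parameters": {
            "type": "object",
            "properties": {
                "content": {"type": "string"},
                "filePath": {"type": "string"},
            },
            "required": ["content", "filePath"],
        },
    },
}


def build_request(prompt: str) -> dict:
    """Build a chat-completions payload that offers the write tool."""
    return {
        "model": "unsloth/Qwen3-Coder-Next-GGUF:UD-Q4_K_M",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WRITE_TOOL],
    }


def send(payload: dict) -> dict:
    """POST the payload to the local server and return the decoded reply."""
    request = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def raw_tool_arguments(response: dict) -> list:
    """Return the unparsed `arguments` string of every tool call in the reply."""
    calls = response["choices"][0]["message"].get("tool_calls") or []
    return [call["function"]["arguments"] for call in calls]


# Usage (needs a running server, so it is left commented out):
# reply = send(build_request('use write tool to write "{...}" into test.txt'))
# for arguments in raw_tool_arguments(reply):
#     print(arguments)
```

Printing the raw `arguments` strings, rather than parsing them right away, makes it easier to see whether the server or the client is producing the malformed input.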

SlavikF

Feb 5

Interesting. Below is a screenshot of how it works in VS Code + RooCode. I'm not sure why it works in some cases but not others:

SBoys3

Feb 5

I am also having this problem. It consistently fails with write tool calls in opencode, although other tool calls such as edit seem to work. It was failing with this error when I checked:
Invalid input for tool write: JSON parsing failed: Text: {"content":"valid code","filePath":"/path/to/file","filePath"/path/to/file"}.
Error message: JSON Parse error: Unrecognized token '/'
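
The reports so far show two distinct failure modes: an `arguments` string that is not valid JSON at all, and one that parses but carries `content` as an object instead of a string. A small sketch that tells them apart is below. The function name and the result labels are hypothetical, and this is not how opencode itself validates tool calls.

```python
import json


def check_write_arguments(raw_arguments: str) -> str:
    """Classify a write tool call's raw `arguments` string.

    Returns "invalid_json", "content_not_string", or "ok".
    """
    try:
        arguments = json.loads(raw_arguments)
    except json.JSONDecodeError:
        return "invalid_json"
    if not isinstance(arguments, dict):
        return "invalid_json"
    if not isinstance(arguments.get("content"), str):
        return "content_not_string"
    return "ok"


# The argument text quoted in the post above: a duplicated key with no colon.
broken = '{"content":"valid code","filePath":"/path/to/file","filePath"/path/to/file"}'

# The shape from the earlier report: content arrives as a nested object.
nested = '{"content": {"first_name": "Sammy"}, "filePath": "test.txt"}'

# What the tool expects: the JSON document passed as an escaped string.
good = json.dumps({"content": '{"first_name": "Sammy"}', "filePath": "test.txt"})

for label, raw in [("broken", broken), ("nested", nested), ("good", good)]:
    print(label, "->", check_write_arguments(raw))
```

The first case lines up with the "JSON Parse error" messages and the second with "expected string, received object", which suggests the argument text is being mangled before the client's schema check runs.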

sulpher

Feb 6

I got it working with this reverse proxy, which I wrote some time ago to connect a streaming client to llama-server back when it couldn't stream while tool calling was in use. So it's unlikely to be a model issue.
https://github.com/crashr/llama-stream

meadowair

Feb 16

The PR for the fix has been tested, but it hasn’t been merged into the main branch yet.
https://github.com/pwilkin/llama.cpp/tree/autoparser

Anton8264

Feb 16

> The PR for the fix has been tested, but it hasn’t been merged into the main branch yet.
> https://github.com/pwilkin/llama.cpp/tree/autoparser

Man, thanks for the hint, this autoparser branch works like a charm. No tool errors anymore. Kudos to the developer.

robert1968

Feb 16

•

edited Feb 17

> I got it working with this reverse proxy which I wrote some time ago to connect a streaming client to llama-server when it wasn't able to stream when tool calling was used so it's unlikely an model issue.
> https://github.com/crashr/llama-stream

Many thanks for this! :)
It fully solved the opencode tool call issue!

I now see that others are also getting random segfaults in the llama.cpp server, and that the autoparser solved them. So I don't recommend llama-stream as a full solution: it solved the file-save tool call issue I had earlier, but it does not solve the segfault.
I'm trying the autoparser now.

UPDATE (2026.02.17)
The autoparser solves not only the tool call issue but also the random llama.cpp server segfaults:
https://github.com/pwilkin/llama.cpp/tree/autoparser

Many thanks.

TimmyD21

Feb 17

Holy crap. After 2 straight days of trying to figure out why llama.cpp was segfaulting when using opencode and qwen3-coder-next, and a full rebuild of the Fedora 43 machine I dev on, pwilkin's autoparser branch solved all of my problems. Now up and running on llama.cpp with ROCm 7.2, and opencode is rock solid. THANK YOU!
