Use with the llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Zoed/Qwen3-Coder-30B-A3B-Instruct",
	filename="Qwen3-Coder-30B-A3B-Instruct-f16-Q4_K_M.gguf",
)
llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)

Qwen3-Coder-30B-A3B-Instruct · Q4_K_M GGUF

This is a Q4_K_M GGUF quantization of Qwen/Qwen3-Coder-30B-A3B-Instruct, produced from the f16 base.

| Property | Value |
|---|---|
| Base model | Qwen/Qwen3-Coder-30B-A3B-Instruct |
| Quantization | Q4_K_M |
| Format | GGUF |
| Parameters | 30B total (MoE, ~3B active per token) |

About the base model

Qwen3-Coder-30B-A3B-Instruct is a Mixture-of-Experts (MoE) code-focused instruction model developed by Qwen Team, Alibaba Cloud. It features 30B total parameters with ~3B active parameters per token.
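The sparsity figures above can be put in rough perspective; this is only back-of-the-envelope arithmetic using the parameter counts quoted in this card, not a measured benchmark:

```python
# Rough sketch: per-token compute fraction of a sparse MoE model
# relative to a dense model of the same total size.
# Figures are taken from the description above (30B total, ~3B active).
total_params = 30e9
active_params = 3e9

active_fraction = active_params / total_params
print(f"Active per token: {active_fraction:.0%}")
print(f"Approx. per-token compute reduction vs. dense: {1 / active_fraction:.0f}x")
```

Only about a tenth of the weights participate in each forward pass, which is why inference speed resembles a much smaller dense model even though the full weights must fit in memory.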

For full details, see the original model page.

Usage

llama.cpp

llama-cli \
  -m Qwen3-Coder-30B-A3B-Instruct-f16-Q4_K_M.gguf \
  --chat-template qwen3 \
  -p "Write a Python function that sorts a list of dictionaries by a given key." \
  -n 512

llama-server

llama-server \
  -m Qwen3-Coder-30B-A3B-Instruct-f16-Q4_K_M.gguf \
  --chat-template qwen3 \
  --port 8080
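Once llama-server is running, it exposes an OpenAI-compatible HTTP API. A minimal request sketch, assuming the default host and the port set above:

```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Write a Python function that reverses a string."}
    ],
    "max_tokens": 256
  }'
```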

Ollama (via Modelfile)

FROM ./Qwen3-Coder-30B-A3B-Instruct-f16-Q4_K_M.gguf
PARAMETER num_ctx 32768
TEMPLATE "{{ ... }}"  # use Qwen3 chat template
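With the Modelfile above saved alongside the GGUF file, the model can be registered and run locally; the model name `qwen3-coder-q4` here is an arbitrary example:

```shell
# Build a local Ollama model from the Modelfile in the current directory,
# then run a one-off prompt against it.
ollama create qwen3-coder-q4 -f Modelfile
ollama run qwen3-coder-q4 "Write a Python function that sorts a list of dictionaries by a given key."
```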

Quantization details

| File | Quant | Size (approx.) |
|---|---|---|
| Qwen3-Coder-30B-A3B-Instruct-f16-Q4_K_M.gguf | Q4_K_M | ~17 GB |

Q4_K_M applies 4-bit K-quant quantization to most layers, keeping a few sensitive tensors at higher precision, which gives a good balance between file size and output quality.
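The ~17 GB figure is consistent with a back-of-the-envelope estimate. The ~4.8 bits-per-weight average used below is an assumption (Q4_K_M mixes 4-bit blocks with some higher-precision tensors, so the true average varies by model):

```python
# Back-of-the-envelope GGUF file-size estimate for a Q4_K_M quant.
# bits_per_weight ~4.8 is an assumed average for Q4_K_M; the exact
# value depends on the tensor mix of the specific model.
params = 30e9            # total parameter count, per this card
bits_per_weight = 4.8

size_bytes = params * bits_per_weight / 8
size_gib = size_bytes / 2**30
print(f"~{size_gib:.1f} GiB")   # in the ballpark of the ~17 GB listed above
```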

License

This quantized model is derived from Qwen/Qwen3-Coder-30B-A3B-Instruct and is released under the same Apache 2.0 License.

Per Qwen's terms, appropriate credit is given to the original authors:

Qwen3-Coder-30B-A3B-Instruct is developed by Qwen Team, Alibaba Cloud. Original model: https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct

Citation

@misc{qwen3coder,
  title        = {Qwen3-Coder},
  author       = {Qwen Team},
  year         = {2025},
  organization = {Alibaba Cloud},
  url          = {https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct}
}