These are quantizations of the model Jackrong/Qwopus3.5-9B-Coder.
I've added the MTP (multi-token prediction) layer to it.
On my 7900 XTX with the Vulkan backend, generation speed went from ~80 tokens per second (tps) to ~120 tps. The imatrix was calculated on coding tasks, so these quantizations are specialized for coding.
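
To put the reported numbers in perspective, here is a small sketch of the arithmetic. The two throughput values are the approximate figures quoted above for one specific setup; the 1000-token reply length is an arbitrary assumption chosen for illustration, and results on other hardware will differ.

```python
def speedup(before_tps: float, after_tps: float) -> float:
    """Return the throughput ratio after/before."""
    return after_tps / before_tps


# Approximate figures reported above (7900 XTX, Vulkan backend).
baseline_tps = 80.0
mtp_tps = 120.0

ratio = speedup(baseline_tps, mtp_tps)

# Hypothetical reply length, used only to show the wall-clock difference.
tokens = 1000
baseline_seconds = tokens / baseline_tps
mtp_seconds = tokens / mtp_tps

print(f"speedup: {ratio:.2f}x")
print(f"time for {tokens} tokens: {baseline_seconds:.1f}s -> {mtp_seconds:.1f}s")
```

In other words, the reported change corresponds to roughly a 1.5x throughput gain on that machine.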

Quick Start

  1. Download the latest release of llama.cpp.
  2. Download your preferred model variant from below.
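
Once both downloads are in place, the model can be served with llama.cpp's `llama-server` binary. The sketch below only assembles the command line; it does not run anything. The GGUF file name is a hypothetical placeholder (use the name of the file you actually downloaded), and the port, context size, and GPU layer count are assumptions to adjust for your setup. No MTP-specific options are shown here, so check the llama.cpp documentation for whatever your build requires.

```python
import shlex
from pathlib import Path


def build_server_command(model_path, gpu_layers=99, ctx_size=8192, port=8080):
    """Build an argument list for llama-server (nothing is executed here)."""
    return [
        "llama-server",
        "-m", str(Path(model_path)),   # path to the downloaded GGUF file
        "-ngl", str(gpu_layers),       # number of layers to offload to the GPU
        "-c", str(ctx_size),           # context size in tokens
        "--port", str(port),           # HTTP port to listen on
    ]


# Hypothetical file name: replace with the variant you downloaded.
cmd = build_server_command("Qwopus3.5-9B-Coder-MTP-Q4_K_M.gguf")
print(shlex.join(cmd))
```

To actually start the server, the list can be passed to `subprocess.run(cmd, check=True)`, assuming `llama-server` is on your `PATH`.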
Format: GGUF
Model size: 9B params
Architecture: qwen35

Available quantizations: 3-bit, 4-bit, 6-bit, 8-bit

Model tree for phucngodev/Qwopus3.5-9B-Coder-MTP

Finetuned: Qwen/Qwen3.5-9B
Quantized (2): this model