This is a custom GGUF quantization of Qwen3-Coder-Next, using the unsloth imatrix data, with a specific focus on retaining quality in the embedding, output, and attention tensors.
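If you just want to use the quants rather than reproduce them, a single file can be fetched from the repo with the Hugging Face CLI. A minimal sketch; the exact filename is assumed to match the naming used by the quantization scripts below:

```shell
# Fetch one quant file from the repo (filename is an assumption based on
# the output names in the quantization scripts).
huggingface-cli download dinerburger/Qwen3-Coder-Next-GGUF \
  Qwen3-Coder-Next.IQ4_XS.gguf \
  --local-dir .
```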

IQ4_XS quantization script:

```shell
QUANT="IQ4_XS"
# Embeddings and the output tensor stay at q8_0; attention, SSM, and
# shared-expert tensors stay at full bf16. ffn_down_exps is pinned to
# iq4_nl, and the remaining (routed-expert) tensors fall back to IQ4_XS.
llama-quantize \
  --output-tensor-type q8_0 \
  --token-embedding-type q8_0 \
  --tensor-type attn_qkv=bf16 \
  --tensor-type attn_v=bf16 \
  --tensor-type attn_q=bf16 \
  --tensor-type attn_k=bf16 \
  --tensor-type attn_gate=bf16 \
  --tensor-type attn_output=bf16 \
  --tensor-type ssm_ba=bf16 \
  --tensor-type ssm_beta=bf16 \
  --tensor-type ssm_alpha=bf16 \
  --tensor-type ssm_out=bf16 \
  --tensor-type ffn_down_shexp=bf16 \
  --tensor-type ffn_gate_shexp=bf16 \
  --tensor-type ffn_up_shexp=bf16 \
  --tensor-type ffn_down_exps=iq4_nl \
  --imatrix Qwen-Coder-Next-imatrix.gguf_file \
  BF16/Qwen3-Coder-Next-BF16-00001-of-00004.gguf \
  Qwen3-Coder-Next.${QUANT}.gguf \
  ${QUANT}
```
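Once produced, the quant loads like any other GGUF. A minimal smoke test with llama.cpp's llama-cli; the context size and prompt here are arbitrary choices, not part of the recipe:

```shell
# Quick smoke test of the quantized model with llama.cpp
llama-cli \
  -m Qwen3-Coder-Next.IQ4_XS.gguf \
  -c 8192 \
  -p "Write a Python function that reverses a string."
```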

IQ3_S quantization script:

```shell
QUANT="IQ3_S"
# Same tensor protections as above, but embeddings and the output tensor
# drop to q6_k, ffn_down_exps is pinned to iq4_xs, and the remaining
# (routed-expert) tensors fall back to IQ3_S.
llama-quantize \
  --output-tensor-type q6_k \
  --token-embedding-type q6_k \
  --tensor-type attn_qkv=bf16 \
  --tensor-type attn_v=bf16 \
  --tensor-type attn_q=bf16 \
  --tensor-type attn_k=bf16 \
  --tensor-type attn_gate=bf16 \
  --tensor-type attn_output=bf16 \
  --tensor-type ssm_ba=bf16 \
  --tensor-type ssm_beta=bf16 \
  --tensor-type ssm_alpha=bf16 \
  --tensor-type ssm_out=bf16 \
  --tensor-type ffn_down_shexp=bf16 \
  --tensor-type ffn_gate_shexp=bf16 \
  --tensor-type ffn_up_shexp=bf16 \
  --tensor-type ffn_down_exps=iq4_xs \
  --imatrix Qwen-Coder-Next-imatrix.gguf_file \
  BF16/Qwen3-Coder-Next-BF16-00001-of-00004.gguf \
  Qwen3-Coder-Next.${QUANT}.gguf \
  ${QUANT}
```
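For reference, an importance matrix like the one passed via `--imatrix` above can be generated with llama.cpp's llama-imatrix tool. A hedged sketch: `calibration.txt` is a placeholder, since the calibration corpus behind the unsloth imatrix data is not specified in this card:

```shell
# Generate an importance matrix from a calibration text file.
# calibration.txt is a stand-in; the actual corpus used for the
# unsloth imatrix data is not documented here.
llama-imatrix \
  -m BF16/Qwen3-Coder-Next-BF16-00001-of-00004.gguf \
  -f calibration.txt \
  -o Qwen-Coder-Next-imatrix.gguf_file
```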
Model size: 80B params
Architecture: qwen3next