This is a custom GGUF quantization of Qwen3-Coder-Next, using unsloth's imatrix data, with a specific focus on retaining quality in the embedding, output, and attention tensors.
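The imatrix file referenced in the scripts below comes from unsloth's published data, but for reference, an importance matrix can also be generated locally with llama.cpp's `llama-imatrix` tool. This is only a sketch; `calibration.txt` is a placeholder for whatever representative text corpus you choose:

```shell
# Sketch: produce an importance matrix from a calibration corpus.
# calibration.txt is a hypothetical filename, not part of this repo.
llama-imatrix \
  -m BF16/Qwen3-Coder-Next-BF16-00001-of-00004.gguf \
  -f calibration.txt \
  -o Qwen-Coder-Next-imatrix.gguf
```

The resulting file is then passed to `llama-quantize` via `--imatrix`, as shown in the scripts below.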
IQ4_XS quantization script:

```shell
QUANT="IQ4_XS"
llama-quantize \
  --output-tensor-type q8_0 \
  --token-embedding-type q8_0 \
  --tensor-type attn_qkv=bf16 \
  --tensor-type attn_v=bf16 \
  --tensor-type attn_q=bf16 \
  --tensor-type attn_k=bf16 \
  --tensor-type attn_gate=bf16 \
  --tensor-type attn_output=bf16 \
  --tensor-type ssm_ba=bf16 \
  --tensor-type ssm_beta=bf16 \
  --tensor-type ssm_alpha=bf16 \
  --tensor-type ssm_out=bf16 \
  --tensor-type ffn_down_shexp=bf16 \
  --tensor-type ffn_gate_shexp=bf16 \
  --tensor-type ffn_up_shexp=bf16 \
  --tensor-type ffn_down_exps=iq4_nl \
  --imatrix Qwen-Coder-Next-imatrix.gguf_file \
  BF16/Qwen3-Coder-Next-BF16-00001-of-00004.gguf \
  Qwen3-Coder-Next.${QUANT}.gguf \
  ${QUANT}
```
IQ3_S quantization script:

```shell
QUANT="IQ3_S"
llama-quantize \
  --output-tensor-type q6_k \
  --token-embedding-type q6_k \
  --tensor-type attn_qkv=bf16 \
  --tensor-type attn_v=bf16 \
  --tensor-type attn_q=bf16 \
  --tensor-type attn_k=bf16 \
  --tensor-type attn_gate=bf16 \
  --tensor-type attn_output=bf16 \
  --tensor-type ssm_ba=bf16 \
  --tensor-type ssm_beta=bf16 \
  --tensor-type ssm_alpha=bf16 \
  --tensor-type ssm_out=bf16 \
  --tensor-type ffn_down_shexp=bf16 \
  --tensor-type ffn_gate_shexp=bf16 \
  --tensor-type ffn_up_shexp=bf16 \
  --tensor-type ffn_down_exps=iq4_xs \
  --imatrix Qwen-Coder-Next-imatrix.gguf_file \
  BF16/Qwen3-Coder-Next-BF16-00001-of-00004.gguf \
  Qwen3-Coder-Next.${QUANT}.gguf \
  ${QUANT}
```
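After quantizing, the per-tensor overrides can be verified by listing the tensor types in the output file. This sketch assumes the `gguf-dump` tool from llama.cpp's `gguf` Python package is installed (`pip install gguf`), and that the IQ4_XS script above was run:

```shell
# Sketch: list tensor names/types from the quantized file, filtering
# for the tensors the scripts keep at higher precision (bf16 / q8_0).
gguf-dump Qwen3-Coder-Next.IQ4_XS.gguf \
  | grep -E 'token_embd|output|attn_|shexp'
```

Each matched line should report the type requested by the corresponding `--tensor-type` override rather than the base IQ4_XS type.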
Base model: Qwen/Qwen3-Coder-Next