# Model Card for Qwen3-Coder-Next-GPTQ-Int4A16

This repository contains a 4-bit integer (INT4) quantization of the Qwen/Qwen3-Coder-Next model, produced with the GPTQ method.
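The exact quantization recipe for this checkpoint is not stated on this card. As a hedged illustration, a minimal GPTQ export with the GPTQModel library might look like the sketch below; the calibration samples, group size, and paths are assumptions, not the settings actually used here.

```python
# Hedged sketch: producing an INT4 GPTQ checkpoint with the GPTQModel library.
# Calibration data, group_size, and output path are illustrative assumptions.
from gptqmodel import GPTQModel, QuantizeConfig

calibration = [
    "def quicksort(arr): ...",   # tiny illustrative calibration set;
    "class LinkedList: ...",     # a real run needs hundreds of samples
]

quant_config = QuantizeConfig(bits=4, group_size=128)  # group_size=128 is a common default, assumed here

model = GPTQModel.load("Qwen/Qwen3-Coder-Next", quant_config)
model.quantize(calibration)
model.save("Qwen3-Coder-Next-GPTQ-Int4A16")
```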

Weights are quantized to INT4 while activations remain in float16 (W4A16). The primary goal of this quantization is to enable high-performance inference on AMD Instinct MI100 GPUs, and potentially on other accelerators that lack bfloat16 support.

As of vLLM v16 and ROCm 7.2, this model runs at roughly twice the speed of the bfloat16 model on an AMD MI100.
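For offline inference with vLLM, a minimal sketch follows. Only the repository id comes from this card; `dtype` and `max_model_len` are illustrative settings, with float16 forced so no bfloat16 kernels are selected, which is the point of this quant on MI100.

```python
# Hedged sketch: offline inference with vLLM.
# dtype/max_model_len are illustrative assumptions, not settings from this card.
from vllm import LLM, SamplingParams

llm = LLM(
    model="dazipe/Qwen3-Coder-Next-GPTQ-Int4A16",
    dtype="float16",      # keep activations in fp16; avoids bf16 code paths
    max_model_len=8192,   # illustrative context limit
)

params = SamplingParams(temperature=0.2, max_tokens=256)
outputs = llm.generate(["Write a function that parses an INI file."], params)
print(outputs[0].outputs[0].text)
```

Equivalently, `vllm serve dazipe/Qwen3-Coder-Next-GPTQ-Int4A16 --dtype float16` starts an OpenAI-compatible server with the same dtype override.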
