# Model Card for Qwen3-Coder-Next-GPTQ-Int4A16

This repository contains a 4-bit integer (INT4) quantization of the Qwen/Qwen3-Coder-Next model, produced with the GPTQ method.
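The exact quantization recipe for this checkpoint is not stated on this card. As a hedged illustration, a minimal GPTQ export with the GPTQModel library might look like the sketch below; the calibration samples, group size, and paths are assumptions, not the settings actually used here.

```python
# Hedged sketch: producing an INT4 GPTQ checkpoint with the GPTQModel library.
# Calibration data, group_size, and output path are illustrative assumptions.
from gptqmodel import GPTQModel, QuantizeConfig

calibration = [
    "def quicksort(arr): ...",   # tiny illustrative calibration set;
    "class LinkedList: ...",     # a real run needs hundreds of samples
]

quant_config = QuantizeConfig(bits=4, group_size=128)  # group_size=128 is a common default, assumed here

model = GPTQModel.load("Qwen/Qwen3-Coder-Next", quant_config)
model.quantize(calibration)
model.save("Qwen3-Coder-Next-GPTQ-Int4A16")
```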

Weights are quantized to INT4 while activations remain in float16 (W4A16). The primary goal of this quantization is to enable high-performance inference on AMD Instinct MI100 GPUs, and potentially on other accelerators that lack bfloat16 support.

As of vLLM v16 and ROCm 7.2, this model runs at roughly twice the speed of the bfloat16 model on an AMD MI100.
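For offline inference with vLLM, a minimal sketch follows. Only the repository id comes from this card; `dtype` and `max_model_len` are illustrative settings, with float16 forced so no bfloat16 kernels are selected, which is the point of this quant on MI100.

```python
# Hedged sketch: offline inference with vLLM.
# dtype/max_model_len are illustrative assumptions, not settings from this card.
from vllm import LLM, SamplingParams

llm = LLM(
    model="dazipe/Qwen3-Coder-Next-GPTQ-Int4A16",
    dtype="float16",      # keep activations in fp16; avoids bf16 code paths
    max_model_len=8192,   # illustrative context limit
)

params = SamplingParams(temperature=0.2, max_tokens=256)
outputs = llm.generate(["Write a function that parses an INI file."], params)
print(outputs[0].outputs[0].text)
```

Equivalently, `vllm serve dazipe/Qwen3-Coder-Next-GPTQ-Int4A16 --dtype float16` starts an OpenAI-compatible server with the same dtype override.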
