Model Card for Qwen3-Coder-Next-GPTQ-Int4A16
This repository contains a 4-bit integer (INT4) GPTQ-quantized version of the Qwen/Qwen3-Coder-Next model, with activations kept in float16 (W4A16). The primary goal of this quantization is to enable high-performance inference on AMD Instinct MI100 GPUs and other accelerators that lack bfloat16 support.
As of vLLM v16 and ROCm 7.2, this model runs at roughly 2x the speed of the bfloat16 model on an AMD MI100.
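A minimal offline-inference sketch with vLLM is shown below. The repository id comes from this card; the `quantization` and `dtype` arguments are assumptions based on the W4A16/float16 setup described above, and the snippet requires a compatible GPU and a local or downloadable copy of the weights, so it is not runnable as-is on CPU-only machines.

```python
from vllm import LLM, SamplingParams

# Load the INT4 GPTQ checkpoint; dtype="float16" keeps activations in
# fp16, which is the point of this quant on bf16-less hardware like MI100.
# (quantization="gptq" is an assumption; recent vLLM versions usually
# auto-detect GPTQ from the checkpoint config.)
llm = LLM(
    model="dazipe/Qwen3-Coder-Next-GPTQ-Int4A16",
    quantization="gptq",
    dtype="float16",
)

params = SamplingParams(temperature=0.2, max_tokens=256)
outputs = llm.generate(["Write a Python function that reverses a string."], params)
print(outputs[0].outputs[0].text)
```

Serving the model over an OpenAI-compatible API would use the same flags, e.g. `vllm serve dazipe/Qwen3-Coder-Next-GPTQ-Int4A16 --quantization gptq --dtype float16`.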
Base model: Qwen/Qwen3-Coder-Next