NVFP4
NVFP4 is a 4-bit floating-point format introduced with the NVIDIA Blackwell GPU architecture.
Quantized version of Qwen/Qwen3.5-9B using NVFP4A16 (4-bit floating point weights, 16-bit activations).
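To make "4-bit floating point weights" concrete, here is a minimal sketch of block-wise 4-bit quantization in the NVFP4 style: E2M1 elements that share one scale per block of 16 values. It is an illustration under stated simplifications, not the code used to produce this checkpoint. The scale is kept as an ordinary Python float rather than being rounded to FP8, the second-level per-tensor scale is omitted, and the names `quantize_block`, `dequantize_block`, and `weights` are hypothetical.

```python
# Illustrative sketch of E2M1 block quantization; not the author's pipeline.
# Simplification: the per-block scale stays a Python float (no FP8 rounding),
# and no per-tensor scale is applied.

# Non-negative magnitudes representable by a 4-bit E2M1 element.
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # one shared scale per 16 elements


def quantize_block(block):
    """Return (scale, codes), where codes are signed E2M1 values."""
    amax = max(abs(x) for x in block)
    # Map the largest magnitude in the block onto the largest E2M1 value.
    scale = amax / E2M1_VALUES[-1] if amax > 0 else 1.0
    codes = []
    for x in block:
        target = abs(x) / scale
        # Nearest representable magnitude; ties go to the smaller one here.
        nearest = min(E2M1_VALUES, key=lambda v: abs(v - target))
        codes.append(nearest if x >= 0 else -nearest)
    return scale, codes


def dequantize_block(scale, codes):
    """Reconstruct approximate weights from the scale and E2M1 codes."""
    return [scale * c for c in codes]


weights = [0.1 * i - 0.8 for i in range(BLOCK_SIZE)]  # toy values (hypothetical)
scale, codes = quantize_block(weights)
restored = dequantize_block(scale, codes)
```

In an A16 scheme such as the one described above, only the weights go through a step like this; activations stay in 16-bit, so the weights are expanded back to 16-bit values before or during the matrix multiply.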
Modules excluded from quantization: lm_head, visual encoder (re:.*visual.*), linear attention (re:.*linear_attn.*), MTP modules (re:.*mtp.*)

Serve with vLLM:
vllm serve 2imi9/Qwen3.5-9B-NVFP4A16
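The `re:` entries in the module list above read like regex-based exclusion patterns. The sketch below shows how such a list could partition module names, assuming that a `re:` prefix means "match the rest as a regular expression from the start of the name" and that an unprefixed entry must equal the name exactly. Those semantics, the helper `is_excluded`, and the sample module names are assumptions for illustration; the matching rules of the tool actually used for this checkpoint may differ.

```python
import re

# Entries as listed in the model card; "re:" is assumed to mark a regex.
IGNORE = ["lm_head", "re:.*visual.*", "re:.*linear_attn.*", "re:.*mtp.*"]


def is_excluded(module_name, ignore=IGNORE):
    """Return True if module_name matches any entry (assumed semantics)."""
    for entry in ignore:
        if entry.startswith("re:"):
            # Strip the prefix and match the pattern against the full name.
            if re.match(entry[3:], module_name):
                return True
        elif entry == module_name:
            return True
    return False


# Hypothetical module names, for illustration only.
names = [
    "lm_head",
    "model.visual.blocks.0.attn.qkv",
    "model.layers.0.linear_attn.in_proj",
    "model.layers.3.self_attn.q_proj",
    "mtp.layers.0.mlp.down_proj",
]
excluded = [n for n in names if is_excluded(n)]
quantized = [n for n in names if not is_excluded(n)]
```

With these sample names, only the full-attention projection falls outside every pattern, so it is the only one that would be quantized to 4-bit.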