Qwen3.5-27B-NVFP4
This is a quantized version of Qwen/Qwen3.5-27B using the NVFP4 quantization scheme. It supports MTP and multi-modal inputs. A nightly build of vLLM is required to run it.
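As a sketch, installing a nightly vLLM build and serving this checkpoint could look like the following (the nightly wheel index is vLLM's documented one; check `vllm serve --help` for flags relevant to your setup):

```shell
# Install a nightly vLLM build from the nightly wheel index.
pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly

# Serve the quantized checkpoint with an OpenAI-compatible API.
vllm serve Sehyo/Qwen3.5-27B-NVFP4
```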
Changelog
- 02/03/2026: Initial upload.
Calibration
- Samples: 128 (64 from each dataset). Since this is not a MoE model, a high sample count is unnecessary.
- Datasets:
  - HuggingFaceH4/ultrachat_200k (`train_sft` split)
  - nvidia/Nemotron-Post-Training-Dataset-v2 (`chat` split)
- Max sequence length: 4096
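A minimal sketch of how the calibration set above could be assembled: `sample_calibration` is a hypothetical helper (not from this card), and the real `datasets` loading calls are shown only as comments since they require network access; stand-in lists illustrate the 64 + 64 = 128 split.

```python
import random

def sample_calibration(datasets, per_dataset=64, seed=0):
    """Draw an equal number of samples from each calibration dataset."""
    rng = random.Random(seed)
    picked = []
    for rows in datasets:
        picked.extend(rng.sample(rows, per_dataset))
    return picked

# In practice the two corpora would be loaded with Hugging Face `datasets`, e.g.:
#   from datasets import load_dataset
#   ultrachat = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
#   nemotron  = load_dataset("nvidia/Nemotron-Post-Training-Dataset-v2", split="chat")
# Stand-in rows so the sketch runs without downloads:
ultrachat = [{"text": f"ultrachat-{i}"} for i in range(1000)]
nemotron = [{"text": f"nemotron-{i}"} for i in range(1000)]

calib = sample_calibration([ultrachat, nemotron])
print(len(calib))  # 128
```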
Creation
This model was created using vLLM's LLM Compressor. The quantization ignores the `lm_head`, `re:.*linear_attn.*`, and `re:model\.visual\..*` layers, preserving the vision encoder and linear attention modules at full precision. The model was loaded with `AutoModelForImageTextToText`.
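In LLM Compressor's convention, ignore-list entries prefixed with `re:` are regular expressions matched against module names, while other entries match exactly. The sketch below shows how the card's ignore list selects layers; `is_ignored` is an illustrative helper, not LLM Compressor's actual code, and the real quantization call appears only as a comment since it needs a GPU and a model download.

```python
import re

# Ignore list from the card: lm_head plus all linear-attention and vision modules.
IGNORE = ["lm_head", "re:.*linear_attn.*", r"re:model\.visual\..*"]

def is_ignored(name: str) -> bool:
    """Return True if a module name matches an ignore entry ('re:' prefix = regex)."""
    for pat in IGNORE:
        if pat.startswith("re:"):
            if re.match(pat[3:], name):
                return True
        elif name == pat:
            return True
    return False

print(is_ignored("lm_head"))                       # True
print(is_ignored("model.visual.patch_embed"))      # True
print(is_ignored("model.layers.0.mlp.down_proj"))  # False

# The actual run would look roughly like (requires GPU + model download):
#   from llmcompressor import oneshot
#   from llmcompressor.modifiers.quantization import QuantizationModifier
#   recipe = QuantizationModifier(targets="Linear", scheme="NVFP4", ignore=IGNORE)
#   oneshot(model=model, recipe=recipe, dataset=calibration_dataset)
```

Keeping the vision encoder and linear-attention modules out of the NVFP4 scheme trades a little model size for accuracy in the parts most sensitive to low-bit quantization.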