Qwen3.5-27B-NVFP4
This is a quantized version of Qwen/Qwen3.5-27B using the NVFP4 quantization scheme. It supports MTP and multi-modal inputs. A nightly build of vLLM is required to run it.
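As a sketch, installing a nightly vLLM build and serving this checkpoint could look like the following (the nightly wheel index is vLLM's documented one; check `vllm serve --help` for flags relevant to your setup):

```shell
# Install a nightly vLLM build from the nightly wheel index.
pip install -U vllm --pre --extra-index-url https://wheels.vllm.ai/nightly

# Serve the quantized checkpoint with an OpenAI-compatible API.
vllm serve Sehyo/Qwen3.5-27B-NVFP4
```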
Changelog
- 02/03/2026: Initial upload.
Calibration
- Samples: 128 (64 from each dataset). Since this is not a MoE model, a high sample count is unnecessary.
- Datasets:
  - HuggingFaceH4/ultrachat_200k (`train_sft` split)
  - nvidia/Nemotron-Post-Training-Dataset-v2 (`chat` split)
- Max sequence length: 4096
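A minimal sketch of how the calibration set above could be assembled: `sample_calibration` is a hypothetical helper (not from this card), and the real `datasets` loading calls are shown only as comments since they require network access; stand-in lists illustrate the 64 + 64 = 128 split.

```python
import random

def sample_calibration(datasets, per_dataset=64, seed=0):
    """Draw an equal number of samples from each calibration dataset."""
    rng = random.Random(seed)
    picked = []
    for rows in datasets:
        picked.extend(rng.sample(rows, per_dataset))
    return picked

# In practice the two corpora would be loaded with Hugging Face `datasets`, e.g.:
#   from datasets import load_dataset
#   ultrachat = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
#   nemotron  = load_dataset("nvidia/Nemotron-Post-Training-Dataset-v2", split="chat")
# Stand-in rows so the sketch runs without downloads:
ultrachat = [{"text": f"ultrachat-{i}"} for i in range(1000)]
nemotron = [{"text": f"nemotron-{i}"} for i in range(1000)]

calib = sample_calibration([ultrachat, nemotron])
print(len(calib))  # 128
```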
Creation
This model was created using vLLM's LLM Compressor. The quantization ignores the `lm_head`, `re:.*linear_attn.*`, and `re:model\.visual\..*` layers, preserving the vision encoder and linear attention modules at full precision. The model was loaded with `AutoModelForImageTextToText`.
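In LLM Compressor's convention, ignore-list entries prefixed with `re:` are regular expressions matched against module names, while other entries match exactly. The sketch below shows how the card's ignore list selects layers; `is_ignored` is an illustrative helper, not LLM Compressor's actual code, and the real quantization call appears only as a comment since it needs a GPU and a model download.

```python
import re

# Ignore list from the card: lm_head plus all linear-attention and vision modules.
IGNORE = ["lm_head", "re:.*linear_attn.*", r"re:model\.visual\..*"]

def is_ignored(name: str) -> bool:
    """Return True if a module name matches an ignore entry ('re:' prefix = regex)."""
    for pat in IGNORE:
        if pat.startswith("re:"):
            if re.match(pat[3:], name):
                return True
        elif name == pat:
            return True
    return False

print(is_ignored("lm_head"))                       # True
print(is_ignored("model.visual.patch_embed"))      # True
print(is_ignored("model.layers.0.mlp.down_proj"))  # False

# The actual run would look roughly like (requires GPU + model download):
#   from llmcompressor import oneshot
#   from llmcompressor.modifiers.quantization import QuantizationModifier
#   recipe = QuantizationModifier(targets="Linear", scheme="NVFP4", ignore=IGNORE)
#   oneshot(model=model, recipe=recipe, dataset=calibration_dataset)
```

Keeping the vision encoder and linear-attention modules out of the NVFP4 scheme trades a little model size for accuracy in the parts most sensitive to low-bit quantization.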