This is Qwen/Qwen3.5-27B quantized to NVFP4 with llm-compressor. The model is compatible with vLLM (tested with v0.16.1rc1 on an H200 GPU). Evaluation is still in progress.

Instructions

uv pip install vllm --torch-backend=auto --extra-index-url https://wheels.vllm.ai/nightly
uv pip install git+https://github.com/huggingface/transformers.git
vllm serve kaitchup/Qwen3.5-27B-NVFP4 --max-model-len 262144 --reasoning-parser qwen3
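Once the server is up, it exposes vLLM's OpenAI-compatible API. As a minimal sketch, the request below assumes the default port (8000) and served model name; adjust both if you passed `--port` or `--served-model-name` to `vllm serve`.

```python
import json

# Chat-completions request body for the vLLM OpenAI-compatible server
# started by the `vllm serve` command above. Model name and endpoint
# are assumptions based on the defaults.
payload = {
    "model": "kaitchup/Qwen3.5-27B-NVFP4",
    "messages": [
        {"role": "user", "content": "Summarize NVFP4 quantization in one sentence."}
    ],
    "max_tokens": 512,
}

# With the server running, send it with any HTTP client, e.g.:
#   curl http://localhost:8000/v1/chat/completions \
#        -H "Content-Type: application/json" \
#        -d '<the JSON below>'
print(json.dumps(payload, indent=2))
```

Any OpenAI-compatible client library can be pointed at `http://localhost:8000/v1` instead of crafting the JSON by hand.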

Acknowledgments

Thank you to Verda for providing the compute (H200 GPUs). Verda is a European, AI-focused cloud and GPU infrastructure provider built around sovereignty, sustainability, data privacy, and performance. Check them out if you are interested.
