mlx-community/Qwen3.5-4B-OptiQ-4bit
Any plans on releasing flashhead for qwen3.5 models?
```bash
pip install flash-head
vllm serve embedl/Qwen3-1.7B-FlashHead-W4A16
```

FlashHead is picked up through the `vllm.general_plugins` entry point. No source patches, no custom imports.

```bash
vllm bench latency --model embedl/Qwen3-1.7B-FlashHead-W4A16 --batch-size 1

# Baseline comparison
FLASHHEAD_ENABLED=0 vllm bench latency --model embedl/Qwen3-1.7B-FlashHead-W4A16 --batch-size 1
```