Qwen-Image-Bench — NVFP4 quantization

NVFP4 weight quantization of Qwen/Qwen-Image-Bench (Q-Judger) — a vision-language judge model fine-tuned by the Qwen team for automated evaluation of text-to-image generation quality.

What this is

  • Base model: Qwen/Qwen-Image-Bench (Apache 2.0)
  • Architecture: Qwen3.5 dense hybrid SSM + attention + vision tower (64 layers, 16 attention, 48 SSM, 0 experts, MRoPE)
  • Quantization: NVFP4 (4-bit) on language-model linears. The recipe syntax matches llm-compressor (vLLM project) — see recipe.yaml in the repo for the exact QuantizationModifier config.
  • Vision tower: preserved at higher precision (BF16), per the recipe ignore list
  • Size: ~19.7 GB safetensors
  • Target hardware: NVIDIA Blackwell GB10 (DGX Spark, SM121) with NVFP4 tensor-core support

Quantization recipe

default_stage:
  default_modifiers:
    QuantizationModifier:
      targets: [Linear]
      ignore: ['re:.*lm_head', 're:visual.*', 're:model.visual.*', 're:.*embed_tokens$']
      scheme: NVFP4
      bypass_divisibility_checks: false

Quantized on a RunPod node. The full recipe.yaml is included for reproducibility.

Intended use

Drop-in replacement for the BF16 base model on vLLM with NVFP4 support. Tested as the judge step of an image-generation evaluation pipeline (5-dim verdicts: overall_quality, prompt_match, aesthetic_appeal, lora_activation, confidence).

Known compatibility

  • vLLM: works (production validated)
  • Atlas (avarok/atlas-gb10): does NOT load as of 2026-06-14 due to missing (dense + vision) weight loader path in Atlas's qwen35_dense.rs factory branch. See [TODO: link to filed issue] for details.

License

Apache 2.0 — inherited from the upstream Qwen/Qwen-Image-Bench.

Citation

If you use this quant, please cite the original Qwen-Image-Bench paper:

@misc{li2026qwenimagebenchgenerationcreationtexttoimage,
      title={Qwen-Image-Bench: From Generation to Creation in Text-to-Image Evaluation},
      author={Niantong Li and Guangzheng Hu and Weixu Qiao and Ying Ba and Qichen Hong and Shijun Shen and Jinlin Wang and Fan Zhou and Jianye Kang and Xin Shang and Ziyi He and Wei Wang and Dalin Li and Jiahao Li and Jie Zhang and Kaiyuan Gao and Kun Yan and Lihan Jiang and Ningyuan Tang and Shengming Yin and Tianhe Wu and Xiao Xu and Xiaoyue Chen and Yuxiang Chen and Yan Shu and Yanran Zhang and Yilei Chen and Yixian Xu and Zekai Zhang and Zhendong Wang and Zihao Liu and Zikai Zhou and Hongzhu Shi and Yi Wang and Bing Zhao and Hu Wei and Lin Qu and Chenfei Wu},
      year={2026},
      eprint={2605.28091},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.28091},
}
Downloads last month
20
Safetensors
Model size
17B params
Tensor type
F32
·
BF16
·
F8_E4M3
·
U8
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for flukethoughts/Qwen-Image-Bench-NVFP4

Base model

Qwen/Qwen3.6-27B
Quantized
(3)
this model

Paper for flukethoughts/Qwen-Image-Bench-NVFP4