Daily Model Scout Report — 2026-05-08
Window: 2026-05-01 → 2026-05-08
Baseline (3.5k-hard weighted_score): granite4-vision-sft 1.0144 · qwen3-vl-8b-sft+grpo 0.9131 · qwen3-vl-2b-sft-grpo-v9 0.8948 · qwen3-vl-8b-sft-grpo-nvfp4 0.8945
TL;DR
Quiet week — no new base VLM architectures from the major orgs (Qwen, Google, OpenGVLab, NVIDIA, OpenBMB, Microsoft, Meta, DeepSeek, zai-org/THUDM, LiquidAI, Apple). The two items worth our time are:
- openbmb/MiniCPM-V-4.5-GPTQ — first official 4-bit quant of MiniCPM-V 4.5; we've never benchmarked the MiniCPM-V family on our 3.5k-hard set.
- vrfai/Cosmos-Reason2-8B-NVFP4 — third-party NVFP4 of the Qwen3-VL-derived NVIDIA reasoning VLM; LLM W4A4, vision tower kept BF16. Useful as a quant-recipe reference for our own Qwen3-VL-8B NVFP4 work.
Everything else is community fine-tunes, GGUF/MLX/AWQ ports of already-known bases, or text-only models.
New / notable models (last 7 days)
Medium — worth queuing
openbmb/MiniCPM-V-4.5-GPTQ — 2026-05-08
- 8.7B (Qwen3-8B backbone), W4A16 GPTQ-INT4, Apache 2.0
- Base model (bf16, Feb 2026) reports OpenCompass 77.0, a leading OCRBench score, and MMHal-Bench above GPT-4o; we have no MiniCPM-V evaluations on our 3.5k-hard set yet
- Comfortable fit on RTX PRO 6000 98 GB; fast to register in PeakBench since vLLM/transformers paths are well-trodden for MiniCPM
- Action: register a serve-script and queue an SFT run on `apparel-capture-8k-train` if base 4-bit eval lands above qwen3-vl-2b-sft-grpo-v9 (0.8948)
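As a sanity check on the "comfortable fit" claim, a back-of-envelope weight-footprint estimate (our sketch, not from the model card: assumes all 8.7B params quantized to 4-bit, with ~10% overhead for scales/zeros and BF16-kept modules; ignores KV cache and activations):

```python
# Rough VRAM estimate for MiniCPM-V-4.5-GPTQ weights.
# Assumptions (ours): 8.7B params, W4A16 -> 0.5 bytes/param for
# quantized linears, plus ~10% overhead for scales/zeros and any
# modules kept in higher precision.
def weight_footprint_gb(params_b: float, bits: int = 4, overhead: float = 0.10) -> float:
    bytes_total = params_b * 1e9 * bits / 8
    return bytes_total * (1 + overhead) / 1e9

print(f"~{weight_footprint_gb(8.7):.1f} GB of weights")  # ~4.8 GB
```

Even with generous KV-cache headroom this leaves most of the 98 GB card free, consistent with the bullet above.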
vrfai/Cosmos-Reason2-8B-NVFP4 — 2026-05-05
- Source: nvidia/Cosmos-Reason2-8B (Qwen3-VL family, ~17 GB → 7.1 GB)
- Quant layout: LLM layers W4A4 NVFP4; vision encoder + DeepStack merger + `lm_head` kept BF16; `compressed-tensors` format → native vLLM ≥ 0.19
- Requires Blackwell SM120+; matches our RTX PRO 6000 fleet
- Direct relevance: this is the same recipe pattern we documented for GLM-4.6V (vision tower stays bf16 because W4A4 breaks the Glm4v vision tower); a working NVFP4 reference for a Qwen3-VL-class model is useful when replicating on our own qwen3-vl-8b-sft+grpo
- Action: low-cost benchmark — pull, register, run on 3.5k-hard for a base reading; compare inference throughput against our qwen3-vl-8b-sft-grpo-nvfp4
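The recipe pattern above (LLM layers W4A4, vision tower / merger / `lm_head` kept BF16) amounts to an ignore-list over module names. A minimal sketch of that filter for our own replication attempt (the patterns are illustrative, not read from the actual checkpoint; verify module names before use):

```python
from fnmatch import fnmatch

# Illustrative module-name patterns to exclude from W4A4 NVFP4,
# mirroring the Cosmos-Reason2-8B-NVFP4 layout described above.
# Actual names in a given checkpoint may differ.
KEEP_BF16 = ["visual.*", "*merger*", "lm_head"]

def quantize_module(name: str) -> bool:
    """True if this module gets W4A4 NVFP4, False if it stays BF16."""
    return not any(fnmatch(name, pat) for pat in KEEP_BF16)

print(quantize_module("model.layers.3.mlp.gate_proj"))  # True  -> W4A4
print(quantize_module("visual.blocks.0.attn.qkv"))      # False -> BF16
print(quantize_module("lm_head"))                       # False -> BF16
```

Keeping the filter as glob patterns makes it easy to diff our ignore list against the one shipped in the Cosmos quant config.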
Low — awareness only
- inclusionAI/LLaDA2.0-Uni-FP8 (2026-05-06) — 16B MoE diffusion-based unified MM (image gen + understanding), Apache 2.0, FP8 32.5 GB. Wrong fit for pure JSON classification, but architecture is novel; ignore unless we want to explore diffusion-VLM for synthetic-data generation.
- allenai/MolmoAct2 family (2026-05-04) — vision-language-action robotics models (LIBERO, DROID, BimanualYAM heads). Not relevant to garment classification.
- ibm-granite/granite-switch-4.1-{3b,8b,30b}-preview (2026-05-01) — text-only, RAG/safety/explainability adapters via control tokens. Not a VLM, but a signal that IBM is still actively iterating the Granite 4.1 family — relevant context given `granite4-vision-sft` is our top scorer at 1.0144 (worth watching for a `granite-vision-4.2` or further `granite-vision-4.1-*` follow-ups).
What did NOT show up this week
No new releases or signals for: Qwen3.6-VL, Qwen3.7, InternVL3.6/4, Florence-3/4, PaliGemma3, SmolVLM3, Idefics4, LLaVA-NeXT-2, OneVision-2, Phi-5-Vision, DeepSeek-VL3, CogVLM3, GLM-4.7V, Pixtral-2, Apple Ferret-3, Reka Edge-2.
Recommended actions
- Benchmark MiniCPM-V-4.5-GPTQ on the 3.5k-hard set (base eval first; SFT only if it clears qwen3-vl-2b-sft-grpo-v9 at 0.8948).
- Pull Cosmos-Reason2-8B-NVFP4 quant config as a reference for our Qwen3-VL-8B-NVFP4 retraining; quick base eval to confirm vision-tower-bf16 + LLM-W4A4 doesn't tank OCR/text-in-image fields (relevant to the GLM-4.6V quant constraints we already documented).
- Re-scan in 7 days — if nothing significant ships, drop next week's report to a one-liner.
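The go/no-go rule in the first action item can be pinned down explicitly (a sketch; the threshold is the qwen3-vl-2b-sft-grpo-v9 weighted score from the baseline line at the top):

```python
BASELINE_V9 = 0.8948  # qwen3-vl-2b-sft-grpo-v9 weighted_score, 3.5k-hard

def queue_sft(base_eval_score: float, threshold: float = BASELINE_V9) -> bool:
    """Queue an SFT run only if the base 4-bit eval clears the baseline.

    A tie does not clear it: the quantized base must strictly beat
    our existing 2B checkpoint to justify the training spend.
    """
    return base_eval_score > threshold

print(queue_sft(0.91))    # True  -> queue SFT on the training set
print(queue_sft(0.8948))  # False -> ties the baseline, skip
```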
Generated by /hf-model-scout · automation: msudharsanan