Daily Model Scout Report — 2026-05-08

by msudharsanan · Denali Advanced Integration org

Window: 2026-05-01 → 2026-05-08
Baseline (3.5k-hard weighted_score): granite4-vision-sft 1.0144 · qwen3-vl-8b-sft+grpo 0.9131 · qwen3-vl-2b-sft-grpo-v9 0.8948 · qwen3-vl-8b-sft-grpo-nvfp4 0.8945

TL;DR

Quiet week — no new base VLM architectures from the major orgs (Qwen, Google, OpenGVLab, NVIDIA, OpenBMB, Microsoft, Meta, DeepSeek, zai-org/THUDM, LiquidAI, Apple). The two items worth our time are:

  1. openbmb/MiniCPM-V-4.5-GPTQ — first official 4-bit quant of MiniCPM-V 4.5; we've never benchmarked the MiniCPM-V family on our 3.5k-hard set.
  2. vrfai/Cosmos-Reason2-8B-NVFP4 — third-party NVFP4 of the Qwen3-VL-derived NVIDIA reasoning VLM; LLM W4A4, vision tower kept BF16. Useful as a quant-recipe reference for our own Qwen3-VL-8B NVFP4 work.

Everything else is community fine-tunes, GGUF/MLX/AWQ ports of already-known bases, or text-only models.

New / notable models (last 7 days)

Medium — worth queuing

openbmb/MiniCPM-V-4.5-GPTQ — 2026-05-08

  • 8.7B (Qwen3-8B backbone), W4A16 GPTQ-INT4, Apache 2.0
  • Base model (bf16, Feb 2026) reports OpenCompass 77.0, a leading OCRBench score, and MMHal-Bench above GPT-4o; we have no MiniCPM-V evaluations on our 3.5k-hard set yet
  • Comfortable fit on RTX PRO 6000 98 GB; fast to register in PeakBench since vLLM/transformers paths are well-trodden for MiniCPM
  • Action: register a serve-script and queue an SFT run on apparel-capture-8k-train if base 4-bit eval lands above qwen3-vl-2b-sft-grpo-v9 (0.8948)
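The fit-and-gate logic in the bullets above can be sketched as a small helper. This is only a sketch: the 0.8948 gate is qwen3-vl-2b-sft-grpo-v9's baseline weighted_score and 8.7B is the model size from this report, but the function names are hypothetical, not part of any existing tooling:

```python
# Hypothetical helpers for the queueing decision above. 0.8948 is
# qwen3-vl-2b-sft-grpo-v9's baseline weighted_score on 3.5k-hard.
SFT_GATE = 0.8948  # queue SFT on apparel-capture-8k-train only above this

def w4_weight_gib(params_billions: float) -> float:
    """Rough 4-bit weight footprint in GiB (0.5 bytes/param, ignoring
    quantization scales/zero-points and KV cache)."""
    return params_billions * 1e9 * 0.5 / 2**30

def should_queue_sft(base_eval_score: float, gate: float = SFT_GATE) -> bool:
    """True if the 4-bit base eval clears the best small-model baseline."""
    return base_eval_score > gate
```

At 8.7B parameters the W4 weights alone land around 4 GiB, far below the 98 GB card, which is why the fit above is called comfortable.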

vrfai/Cosmos-Reason2-8B-NVFP4 — 2026-05-05

  • Source: nvidia/Cosmos-Reason2-8B (Qwen3-VL family, ~17 GB → 7.1 GB)
  • Quant layout: LLM layers W4A4 NVFP4, vision encoder + DeepStack merger + lm_head kept BF16, compressed-tensors format → native vLLM ≥ 0.19
  • Requires Blackwell SM120+; matches our RTX PRO 6000 fleet
  • Direct relevance: this is the same recipe pattern we documented for GLM-4.6V (the vision tower stays BF16 because W4A4 breaks the Glm4v vision tower); a working NVFP4 reference for a Qwen3-VL-class model is useful when replicating the recipe on our own qwen3-vl-8b-sft+grpo
  • Action: low-cost benchmark — pull, register, run on 3.5k-hard for a base reading; compare inference throughput against our qwen3-vl-8b-sft-grpo-nvfp4
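The quant layout in the bullets above can be expressed as the kind of ignore-list a compressed-tensors-style recipe takes. A sketch only: the module-name patterns are assumptions based on typical Qwen3-VL-style naming, not read from the actual checkpoint:

```python
import re

# Sketch of the W4A4-with-BF16-islands layout described above. Pattern names
# are assumed (typical Qwen3-VL-style modules), not taken from the checkpoint.
NVFP4_SCHEME = "NVFP4"   # W4A4 for the LLM's linear layers
KEEP_BF16 = [
    "re:visual.*",       # vision encoder + DeepStack merger stay BF16
    "lm_head",           # output head stays BF16
]

def is_quantized(module_name: str) -> bool:
    """True if a module would get the NVFP4 scheme rather than stay BF16."""
    for pat in KEEP_BF16:
        regex = pat[3:] if pat.startswith("re:") else re.escape(pat) + "$"
        if re.match(regex, module_name):
            return False
    return True
```

Diffing our GLM-4.6V ignore-list against whatever vrfai shipped in the config is the cheap part of the comparison; the base eval on 3.5k-hard is what confirms the recipe holds on a Qwen3-VL-class model.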

Low — awareness only

  • inclusionAI/LLaDA2.0-Uni-FP8 (2026-05-06) — 16B MoE diffusion-based unified MM (image gen + understanding), Apache 2.0, FP8 32.5 GB. Wrong fit for pure JSON classification, but architecture is novel; ignore unless we want to explore diffusion-VLM for synthetic-data generation.
  • allenai/MolmoAct2 family (2026-05-04) — vision-language-action robotics models (LIBERO, DROID, BimanualYAM heads). Not relevant to garment classification.
  • ibm-granite/granite-switch-4.1-{3b,8b,30b}-preview (2026-05-01) — text-only, RAG/safety/explainability adapters via control tokens. Not a VLM, but a signal that IBM is still actively iterating the Granite 4.1 family — relevant context given granite4-vision-sft is our top scorer at 1.0144 (worth watching for a granite-vision-4.2 or further granite-vision-4.1-* follow-ups).

What did NOT show up this week

No new releases or signals for: Qwen3.6-VL, Qwen3.7, InternVL3.6/4, Florence-3/4, PaliGemma3, SmolVLM3, Idefics4, LLaVA-NeXT-2, OneVision-2, Phi-5-Vision, DeepSeek-VL3, CogVLM3, GLM-4.7V, Pixtral-2, Apple Ferret-3, Reka Edge-2.

Recommended actions

  1. Benchmark MiniCPM-V-4.5-GPTQ on the 3.5k-hard set (base eval first; SFT only if it clears qwen3-vl-2b-sft-grpo-v9 at 0.8948).
  2. Pull the Cosmos-Reason2-8B-NVFP4 quant config as a reference for our Qwen3-VL-8B-NVFP4 retraining; run a quick base eval to confirm that vision-tower-BF16 + LLM-W4A4 doesn't tank OCR/text-in-image fields (the same constraint we documented for GLM-4.6V).
  3. Re-scan in 7 days — if nothing significant ships, drop next week's report to a one-liner.

Generated by /hf-model-scout · automation: msudharsanan