Daily Model Scout Report — 2026-05-04
#21
by msudharsanan - opened
Window: 2026-04-27 → 2026-05-04 (last 7 days). Filtered for new VLM base/instruct releases (excluding GGUF quants, abliterated derivatives, reranker/embedding heads, and unrelated text-only LLMs).
Current Denali-AI baseline (3,500-sample hard eval, `_overall.weighted_score`)
| Model | Weighted score |
|---|---|
| qwen3-vl-8b-sft+grpo | 0.9131 (best overall) |
| qwen3-vl-2b-sft-grpo-v9 | 0.8948 (best small) |
| qwen3-vl-8b-sft-grpo-nvfp4 | 0.8945 (best quantized) |
| qwen3-vl-8b-instruct-base | 0.8751 |
| qwen35-2b-base | 0.8437 |
Note: `granite4-vision-sft` shows `weighted_score=1.0144` in `eval_all_results.json` — almost certainly an artifact (it exceeds the 1.0 cap) and should be re-verified before use as a comparison anchor.
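To catch this class of anomaly early, a minimal sanity check over the eval results could flag any score above the cap. This is a sketch under an assumed schema — `eval_all_results.json` mapping model names to metric dicts keyed by `_overall.weighted_score`; the real file layout may differ:

```python
def find_impossible_scores(results: dict, cap: float = 1.0) -> list[tuple[str, float]]:
    """Flag (model, weighted_score) pairs that exceed the documented cap.

    Schema assumption: model name -> {"_overall.weighted_score": float, ...}.
    """
    return [
        (model, metrics["_overall.weighted_score"])
        for model, metrics in results.items()
        if metrics.get("_overall.weighted_score", 0.0) > cap
    ]

# Values from the baseline table plus the suspect entry:
results = {
    "qwen3-vl-8b-sft+grpo": {"_overall.weighted_score": 0.9131},
    "granite4-vision-sft": {"_overall.weighted_score": 1.0144},
}
print(find_impossible_scores(results))  # → [('granite4-vision-sft', 1.0144)]
```

Running this over the full results file before each scout report would keep a bad anchor from silently skewing the comparisons.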
High Relevance — benchmark immediately
1. ibm-granite/granite-vision-4.1-4b
- Released: 2026-04-29 — 7,690 dl / 54 likes
- Size: ~4B params (2-shard safetensors), `granite4_vision` arch with `custom_code`
- Link: https://huggingface.co/ibm-granite/granite-vision-4.1-4b
- Why it matters: Direct successor to whatever Granite-4 base our `Granite4-Vision-SFT` was fine-tuned from. Granite-4.1 lands the same week as the new Granite-4.1-3b/8b/30b language line — sharing the new tokenizer + improved vision tower. Worth re-running our SFT recipe on this base.
- Risk: `custom_code` path; vLLM compat is already an issue for our existing granite4 SFT artifacts (`granite4-vision-sft-vllm` and `-deepstack` collapse to ~46% baseline in the 100-sample eval — the lift only appears in the HF-transformers path). Confirm vLLM/PeakBench serving works before training.
- Action: Register in PeakBench, run base eval on 3.5k-hard via `peakbench_start_benchmark`. If the lift is meaningful, queue SFT.
2. nvidia/Cosmos-Reason2-8B
- Released: 2026-04-30 — 221,405 dl / 175 likes
- Size: 8B (4-shard safetensors), `qwen3_vl` arch — fine-tune of `Qwen/Qwen3-VL-8B-Instruct`
- Link: https://huggingface.co/nvidia/Cosmos-Reason2-8B
- Why it matters: Exactly the same architecture as our champion `qwen3-vl-8b-sft+grpo` (so the PeakBench/vLLM path is already proven), but with NVIDIA's reasoning post-training. Could give us a stronger starting point for hard-sample garments where chain-of-thought helps disambiguate (closure type, fine pattern). Drop-in replacement candidate for the 8B base.
- Action: Register, run 3.5k-hard base eval. If `_overall.weighted_score` ≥ 0.88 zero-shot (vs 0.8751 for plain Qwen3-VL-8B-Instruct), it's our new SFT base.
3. ibm-granite/granite-4.0-3b-vision
- Released: 2026-04-30 — 162,908 dl / 109 likes
- Size: ~3B (2-shard safetensors + adapter shard), same `granite4_vision` arch
- Link: https://huggingface.co/ibm-granite/granite-4.0-3b-vision
- Why it matters: Smaller Granite variant (3B vs 4B). If 4.1-4b isn't enough of a lift, the 3B would be the small-model contender against `qwen3-vl-2b-sft-grpo-v9` (0.8948). Same vLLM caveat as #1.
- Action: Bench in the same pass as #1 — both share the load path.
Medium Relevance — worth watching
4. nvidia/Cosmos-Reason2-32B
- Released: 2026-04-30 — 788 dl / 7 likes
- Size: 32B (13 shards), Qwen3-VL-32B-Instruct fine-tune
- Link: https://huggingface.co/nvidia/Cosmos-Reason2-32B
- Why: Inference-only on RTX PRO 6000 98GB (BF16 won't fit, FP8/NVFP4 will). Useful as a quality ceiling reference, not a fine-tune target. Skip unless we need a teacher for distillation.
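The fit call above can be checked with weights-only arithmetic — a back-of-envelope sketch using the usual bytes-per-param figures (weights only; KV cache, activations, vision tower, and CUDA context come on top, which is presumably what rules BF16 out for serving on a 98 GB card):

```python
def weight_mem_gib(n_params: float, bytes_per_param: float) -> float:
    """Weight-only VRAM footprint in GiB; ignores all serving overhead."""
    return n_params * bytes_per_param / 2**30

for fmt, bpp in [("BF16", 2.0), ("FP8", 1.0), ("NVFP4", 0.5)]:
    print(f"{fmt:6s} ~{weight_mem_gib(32e9, bpp):5.1f} GiB")
# BF16 ~59.6 GiB, FP8 ~29.8 GiB, NVFP4 ~14.9 GiB against the 98 GB budget
```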
5. nvidia/Cosmos-Reason2-2B
- Released: 2026-04-30 — 144,674 dl / 70 likes
- Link: https://huggingface.co/nvidia/Cosmos-Reason2-2B
- Why: Already trained internally (job #740, sellability run). Pure-garment eval on this base hasn't been recorded in `eval_all_results.json` yet — worth a one-off PeakBench run for completeness.
6. lightonai/LightOnOCR-2-1B
- Released: 2026-05-04 — 784,707 dl / 677 likes (highest-traction VLM of the week)
- Size: 1B, single safetensors, `mistral3` arch
- Link: https://huggingface.co/lightonai/LightOnOCR-2-1B
- Why: Tagged `ocr` / `document-understanding` / `pdf` / `tables` / `forms` — primarily a document-OCR model. Garment attribute classification ≠ OCR, so the direct fit is weak. However, the brand field (currently 70% on our champion) could benefit from explicit OCR-tuned features, and at 1B it's our smallest viable fine-tune candidate. Lower priority unless we want to tackle brand recognition specifically.
7. sunjuice/Molmo2-8B
- Released: 2026-05-04 — 61 dl / 0 likes
- Size: 8B (8 shards), `molmo2` arch (OLMo backbone), uses official `allenai/Molmo2-*` datasets
- Link: https://huggingface.co/sunjuice/Molmo2-8B
- Why: Community port; allenai itself has only released `Molmo2-O-7B` and `Molmo2-4B` so far (Jan 2026). The architecture is novel for us — Molmo's pointing/grounding pretraining could help defect localization. Wait for an official 8B from allenai before investing.
8. hybridfree/HY-Embodied-0.5
- Released: 2026-05-04 — 13 dl / 0 likes
- Size: 2B, `hunyuan_vl_mot` arch (Mixture-of-Transformers)
- Link: https://huggingface.co/hybridfree/HY-Embodied-0.5
- Why: Brand-new architecture family from Tencent's Hunyuan VL line. Embodied/robotics framing, not classification. Low priority for garments but worth tracking the architecture.
Low Relevance — note and skip
- nvidia/nemotron-ocr-v2 (2026-04-28, 2,547 dl / 172 likes) — pure OCR pipeline, no classification head. https://huggingface.co/nvidia/nemotron-ocr-v2
- TP12123/Qwen3-VL-4B-Instruct — appears to be a re-upload of the existing Qwen3-VL-4B; 0 dl / 0 likes / no signal.
- llmvision/glimpse-v1 — Gemma-3-4B fine-tune for home security; wrong domain.
- FoolDev/janus-27b / FoolDev/janus — GGUF-only community uploads of `qwen3_6` arch; not a base for SFT.
- ADSKAILab/Zero-To-CAD-Qwen3-VL-2B — image-to-CAD task, irrelevant.
Notable absences
- No new Qwen3.5-VL or Qwen3.6-VL official base/instruct release. Qwen org released Qwen3.6-27B and Qwen3.6-35B-A3B (text) on 2026-04-24 — outside the 7-day window and text-only. Community uploads tagged "Qwen3.5-VL" / "Qwen3.6-VL" are MLX/AWQ quants of unreleased weights ("CRACK" suffixes), not legitimate first-party releases. Continue to monitor.
- No new InternVL, PaliGemma, Phi-Vision, or SmolVLM official releases this week.
- No fashion-/garment-/apparel-specific VLM hits in any search.
Recommended actions
- Register `ibm-granite/granite-vision-4.1-4b`, `ibm-granite/granite-4.0-3b-vision`, and `nvidia/Cosmos-Reason2-8B` in PeakBench; queue zero-shot 3.5k-hard benchmarks against our existing prompt set.
- If Cosmos-Reason2-8B beats `qwen3-vl-8b-instruct-base` (0.8751) zero-shot, it becomes the SFT base for the next 8B run — same recipe, single-line change in the train config.
- Re-verify the `granite4-vision-sft` 1.0144 weighted score in `eval_all_results.json` — that value is impossible under the documented scoring scheme and may be polluting our index ranking.
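The action items above can be sketched as a small queue-and-promote helper. Illustrative only: the job-spec shape and helper names are assumptions (the real registration goes through PeakBench); only the model names and thresholds come from this report:

```python
CANDIDATES = [
    "ibm-granite/granite-vision-4.1-4b",
    "ibm-granite/granite-4.0-3b-vision",
    "nvidia/Cosmos-Reason2-8B",
]
COSMOS_PROMOTION_BAR = 0.88   # zero-shot bar set for Cosmos-Reason2-8B above
PLAIN_8B_BASELINE = 0.8751    # qwen3-vl-8b-instruct-base, from the table

def benchmark_queue(candidates: list[str]) -> list[dict]:
    """One zero-shot 3.5k-hard job spec per candidate (hypothetical shape)."""
    return [{"model": m, "eval_set": "3.5k-hard", "mode": "zero-shot"}
            for m in candidates]

def promote_cosmos(zero_shot_score: float) -> bool:
    """Promotion rule: new 8B SFT base only if it clears the 0.88 bar."""
    return zero_shot_score >= COSMOS_PROMOTION_BAR

jobs = benchmark_queue(CANDIDATES)
print(len(jobs), "jobs queued")          # 3 jobs queued
print(promote_cosmos(0.8812))            # True
print(promote_cosmos(PLAIN_8B_BASELINE)) # False: beats old base, misses bar
```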
— Auto-generated scout (Claude Code, /hf-model-scout)