Daily Model Scout Report – 2026-04-07

Scope: New / recently-active VLMs on HuggingFace (last ~7 days), evaluated for our 9-field garment JSON extraction task. Hardware: RTX PRO 6000 (98 GB).

Current bests on the 3,500-sample hard eval (_overall.weighted_score):

  • qwen3-vl-8b-sft+grpo β€” 0.9131 (best overall)
  • qwen3-vl-8b-sft-grpo-nvfp4 β€” 0.8945 (best quantized)
  • qwen3-vl-2b-sft-grpo-v9 β€” 0.8948 (best small)
  • qwen35-2b-base β€” 0.8437 (best Qwen3.5 base)

HIGH relevance – benchmark this week

1. Qwen3.5-VL family

  • HF: https://huggingface.co/collections/Qwen/qwen3-vl
  • Sizes: 0.8B, 2B, 4B, 9B dense; 35B-A3B and 122B-A10B MoE; 397B-A17B flagship.
  • Architecture: Gated Delta Networks + sparse MoE, early-fusion vision tokens, 262K native context, 201 languages.
  • Why: Most likely candidate to beat 0.9131 with our existing SFT+GRPO recipe. 9B dense fits trivially; 35B-A3B MoE activates only 3B/token and also fits.

2. InternVL3.5-8B

MEDIUM

3. IBM Granite 4.0 3B Vision

  • HF: https://huggingface.co/ibm-granite/granite-4.0-3b-vision
  • 3.5B base + 0.5B LoRA adapter. Apache-2.0.
  • DeepStack injection; trained for table / chart / KVP extraction β€” closer to our 9-field JSON than generic VQA. 85.5% in-domain zero-shot exact-match; #3 in 2–4B class on VAREX.
  • Risk: document-centric pretraining may not transfer to natural garment photography. Cheap to probe.

4. GLM-4.6V-Flash (9B)

5. MiniCPM-V 4.5 (8B)

  • HF: https://huggingface.co/openbmb/MiniCPM-V-4_5
  • Qwen3-8B + SigLIP2-400M encoder.
  • Why: SigLIP2 is one of the strongest open encoders for fine-grained color/pattern. Same LLM as InternVL3.5-8B β†’ clean encoder ablation.

LOW

  • Molmo2 β€” listed in 2026 OS-VLM survey, no fresh HF checkpoints in the last 7 days. Watchlist.
  • Fashion / clothing fine-tunes β€” none new this week. FashionCLIP 2.0 / EMaghakyan/fashion-clip are CLIP-class embedders only β€” possible re-ranker / auxiliary loss.

Recommended actions

  1. Zero-shot benchmark on 3.5k-hard (HIGH): Qwen/Qwen3.5-VL-2B, Qwen/Qwen3.5-VL-9B, OpenGVLab/InternVL3_5-8B.
  2. Zero-shot benchmark (MEDIUM): zai-org/GLM-4.6V-Flash, openbmb/MiniCPM-V-4_5, ibm-granite/granite-4.0-3b-vision.
  3. Any model with zero-shot ≥ ~0.78 weighted → standard pipeline: SFT → GRPO → eval-3.5k → update JSON/wiki → upload to HF with full model card + charts (triage sketch below).
  4. Watchlist: Molmo2, Kimi-VL successors, additional Qwen3.5-VL-MoE checkpoints.
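
A small driver for the ≥0.78 gate in step 3, reusing the illustrative results layout from the leaderboard sketch above; the directory naming and candidate list are assumptions, not our actual harness:

```python
import json
from pathlib import Path

# Hypothetical driver: read each candidate's zero-shot results file and flag
# anything that clears the ~0.78 weighted-score gate for the SFT -> GRPO pipeline.
GATE = 0.78
RESULTS_DIR = Path("results")

CANDIDATES = [
    "Qwen/Qwen3.5-VL-2B",
    "Qwen/Qwen3.5-VL-9B",
    "OpenGVLab/InternVL3_5-8B",
    "zai-org/GLM-4.6V-Flash",
    "openbmb/MiniCPM-V-4_5",
    "ibm-granite/granite-4.0-3b-vision",
]

for repo_id in CANDIDATES:
    results_file = RESULTS_DIR / repo_id.replace("/", "__") / "eval_3500_hard.json"
    if not results_file.exists():
        print(f"  ----   {repo_id}  (no zero-shot results yet)")
        continue
    score = json.loads(results_file.read_text())["_overall"]["weighted_score"]
    verdict = "queue for SFT + GRPO" if score >= GATE else "drop / watchlist"
    print(f"{score:.4f}  {repo_id}  {verdict}")
```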
