Daily Model Scout Report - 2026-05-01

by msudharsanan (Denali Advanced Integration org)

VLM landscape for the last 7 days (cutoff 2026-04-24). Several major upstream releases this week. Comparing against our current best (qwen3-vl-8b-sft+grpo @ 0.9131 weighted on the 3.5k-hard set; small-class best qwen3-vl-2b-sft-grpo-v9 @ 0.8948; quantized best qwen3-vl-8b-sft-grpo-nvfp4 @ 0.8945).

🔥 High relevance - benchmark immediately

Google Gemma-4 (released 2026-04-28, Apache 2.0)

First-party VLM family from Google; all variants run the image-text-to-text pipeline.

| Model | Size | Notes |
|---|---|---|
| google/gemma-4-31B-it | 31B dense | Flagship; 7.4M downloads, 2460 likes; fits in FP8 on 98 GB |
| google/gemma-4-26B-A4B-it | 26B MoE / 4B active | Fast like the 2B class, capacity of 26B |
| google/gemma-4-E4B-it | ~4B (any-to-any) | Edge variant, multimodal in/out |
| google/gemma-4-E2B-it | ~2B (any-to-any) | Direct size-class match for our 2B small model |

Why investigate: Apache-2.0 license from a well-resourced lab. The 26B-A4B-it MoE in particular promises 2B-class inference speed with 26B-class quality, which could reset our small-model leaderboard. E2B-it is a direct size match against our qwen3-vl-2b-sft-grpo-v9 (0.8948).
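The "fits FP8 on 98 GB" calls in the table above can be sanity-checked with a back-of-envelope weight-memory estimate (a sketch only: weights at a given bit width, ignoring KV cache, activations, and vision-tower overhead):

```python
def weight_gb(params_b: float, bits: int) -> float:
    """Approximate weight memory in GB for a model with params_b billion
    parameters stored at the given bit width (weights only)."""
    return params_b * bits / 8

# gemma-4-31B-it: ~31 GB at FP8 vs ~62 GB at BF16. Both fit a 98 GB GPU,
# but FP8 leaves far more headroom for KV cache and batch size.
print(weight_gb(31, 8), weight_gb(31, 16))  # 31.0 62.0
```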

Qwen3.6 (released 2026-04-24, Apache 2.0)

Direct architectural successor to the Qwen3-VL/Qwen3.5-VL line we already use heavily.

| Model | Size | Notes |
|---|---|---|
| Qwen/Qwen3.6-35B-A3B | 35B MoE / 3B active | 2.2M downloads, 1543 likes |
| Qwen/Qwen3.6-35B-A3B-FP8 | FP8 quant | Fits on 98 GB |
| Qwen/Qwen3.6-27B | 27B dense | 906k downloads, 1049 likes |
| Qwen/Qwen3.6-27B-FP8 | FP8 quant | Fits on 98 GB |

Why investigate: the MoE 35B-A3B should give Qwen3-VL-8B-class throughput with 35B-class capacity. Our SFT+GRPO+GTPO pipeline is already validated on Qwen3-VL, so porting to 3.6 is a near drop-in. Strong candidate to surpass our 0.9131 ceiling.

Qwen3.5 flagship VLMs (released 2026-04-24, Apache 2.0)

Larger Qwen3.5-VL variants that we previously skipped due to size.

| Model | Size | Notes |
|---|---|---|
| Qwen/Qwen3.5-122B-A10B-FP8 | 122B MoE / 10B active, FP8 | ≈61 GB weights → fits 98 GB |
| Qwen/Qwen3.5-122B-A10B-GPTQ-Int4 | 122B MoE, Int4 | ≈30 GB → easily fits |
| Qwen/Qwen3.5-397B-A17B | 397B flagship | Too large for 98 GB even in Int4 |

Why investigate: We already eval the 2B and 9B Qwen3.5 sizes; the 122B-A10B FP8/Int4 is the missing scale-up data point. 10B active params ≈ Qwen3-VL-8B inference cost.
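The "10B active ≈ 8B dense inference cost" claim follows from the standard rough rule that decode compute per generated token scales with the *active* parameter count, not the total. A sketch (real throughput also depends on memory bandwidth, expert-routing overhead, and kernel efficiency):

```python
def decode_flops_per_token(active_params_b: float) -> float:
    """Rough decode cost: ~2 FLOPs per active parameter per generated token."""
    return 2 * active_params_b * 1e9

moe_122b = decode_flops_per_token(10)  # Qwen3.5-122B-A10B: 10B params active
dense_8b = decode_flops_per_token(8)   # Qwen3-VL-8B: all 8B params active
print(moe_122b / dense_8b)  # 1.25 -- same ballpark per-token compute
```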

⚠️ Medium relevance - worth watching

Mistral Medium 3.5 / Small 4 (2026-04-27 to 2026-04-30)

  • mistralai/Mistral-Medium-3.5-128B: 128B VLM (Mistral3ForConditionalGeneration, vision tower confirmed in config). License: other (Mistral non-commercial / research); a license blocker for production.
  • mistralai/Mistral-Small-4-119B-2603: 119B VLM, Apache 2.0. Likely too large at full precision for our 98 GB GPU; will need community FP8/Int4.

Why medium: vision-capable, and Apache-licensed in the case of Small 4, but unproven on structured JSON extraction and untested in our SFT+GRPO pipeline. Watch for community quants; defer until size and license fit are clear.

IBM Granite 4.0 Vision (released 2026-04-30, Apache 2.0)

Why medium: Direct size competitor to our 2B small-model class. We already have a Granite4-Vision-SFT entry in the eval results (weighted_score 1.0144, which is suspicious and needs validation). The 4.0-3B is a fresh upstream release; worth a clean eval on the 3.5k-hard set before committing to an SFT run.

❌ Low relevance / skip

  • DeepSeek-V4 family (DeepSeek-V4-Pro, -Flash, etc., 2026-04-27): text-generation only, no vision tower.
  • Qwen SAE-Res checkpoints: interpretability artifacts, not models.
  • Various community fine-tunes of the above (uncensored / GGUF / MLX): no benefit over upstream for our SFT pipeline.

Recommendation

Order of effort:

  1. Eval base Qwen/Qwen3.6-35B-A3B-FP8, google/gemma-4-26B-A4B-it, google/gemma-4-E2B-it, and Qwen/Qwen3.5-122B-A10B-GPTQ-Int4 on the 3.5k-hard set (no fine-tuning, just baselines). Cheap, and it tells us whether any beat our base qwen3-vl-8b-instruct (0.8751).
  2. If any baseline scores ≥0.85, kick off SFT+GRPO using our existing pipeline.
  3. Re-validate the Granite4-Vision-SFT 1.0144 score (out of range, possibly a scoring bug) before comparing against the new 4.0-3B vision base.
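The triage in steps 1-3 amounts to a range check plus a threshold gate, sketched below. The model names and scores are placeholders for illustration, not real eval results:

```python
def triage(baselines: dict[str, float], sft_gate: float = 0.85) -> list[str]:
    """Return the models worth an SFT+GRPO run: the weighted score must be a
    valid value in [0, 1] and clear the SFT gate."""
    winners = []
    for model, score in baselines.items():
        if not 0.0 <= score <= 1.0:
            # Catches anomalies like the 1.0144 Granite entry: re-check the
            # scorer before trusting any comparison against this number.
            raise ValueError(f"{model}: weighted score {score} out of range")
        if score >= sft_gate:
            winners.append(model)
    return winners

# Hypothetical baseline numbers, for illustration only:
print(triage({"candidate-a-fp8": 0.88, "candidate-b-e2b": 0.81}))
# ['candidate-a-fp8']
```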

Auto-generated by /hf-model-scout on 2026-05-01.
