Daily Model Scout Report – 2026-05-01
VLM landscape for the last 7 days (cutoff 2026-04-24). Several major upstream releases this week. Comparing against our current best (qwen3-vl-8b-sft+grpo @ 0.9131 weighted on the 3.5k-hard set; small-class best qwen3-vl-2b-sft-grpo-v9 @ 0.8948; quantized best qwen3-vl-8b-sft-grpo-nvfp4 @ 0.8945).
🔥 High relevance – benchmark immediately
Google Gemma-4 (released 2026-04-28, Apache 2.0)
First-party VLM family from Google; every variant is tagged for the image-text-to-text pipeline.
| Model | Size | Notes |
|---|---|---|
| `google/gemma-4-31B-it` | 31B dense | Flagship – 7.4M dl, 2460 likes, fits FP8 on 98 GB |
| `google/gemma-4-26B-A4B-it` | 26B MoE / 4B active | Fast like 2B-class, capacity of 26B |
| `google/gemma-4-E4B-it` | ~4B (any-to-any) | Edge variant, multimodal in/out |
| `google/gemma-4-E2B-it` | ~2B (any-to-any) | Direct size-class match for our 2B small model |
Why investigate: Apache-2.0, well-funded base. The 26B-A4B-it MoE in particular promises 2B-class inference speed with 26B-class quality – it could reset our small-model leaderboard. E2B-it is a direct size match against our qwen3-vl-2b-sft-grpo-v9 (0.8948).
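The "fits FP8 on 98 GB" calls in the table reduce to simple arithmetic; a minimal sketch of the check used throughout this report. The bytes-per-parameter figures are standard for each dtype, but the ~20% runtime overhead factor (activations, KV cache) is an assumption, not a measured number:

```python
# Rough VRAM fit check for candidate checkpoints.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

def fits(params_billion: float, dtype: str, gpu_gb: float = 98.0,
         overhead: float = 1.2) -> bool:
    """True if the weights (plus an assumed 20% overhead) fit in gpu_gb."""
    weight_gb = params_billion * BYTES_PER_PARAM[dtype]
    return weight_gb * overhead <= gpu_gb

# gemma-4-31B-it in FP8: ~31 GB of weights -> fits a 98 GB GPU
print(fits(31, "fp8"))    # True
# Qwen3.5-397B-A17B even in Int4: ~199 GB -> does not fit
print(fits(397, "int4"))  # False
```

The same helper reproduces the verdicts on the Qwen3.5 flagship rows below (FP8 and Int4 fit, the 397B does not).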
Qwen3.6 (released 2026-04-24, Apache 2.0)
Direct architectural successor to the Qwen3-VL/Qwen3.5-VL line we already use heavily.
| Model | Size | Notes |
|---|---|---|
| `Qwen/Qwen3.6-35B-A3B` | 35B MoE / 3B active | 2.2M dl, 1543 likes |
| `Qwen/Qwen3.6-35B-A3B-FP8` | FP8 quant | Fits on 98 GB |
| `Qwen/Qwen3.6-27B` | 27B dense | 906k dl, 1049 likes |
| `Qwen/Qwen3.6-27B-FP8` | FP8 quant | Fits on 98 GB |
Why investigate: MoE 35B-A3B should give Qwen3-VL-8B-class throughput with 35B-class capacity. Our SFT+GRPO+GTPO pipeline is already validated on Qwen3-VL – porting to 3.6 is a near drop-in. Strong candidate to surpass our 0.9131 ceiling.
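The throughput claim rests on MoE decode cost scaling with active, not total, parameters; a back-of-envelope sketch. The 2-FLOPs-per-active-parameter rule of thumb is a common approximation and ignores attention and vision-tower cost:

```python
def decode_flops_per_token(active_params_b: float) -> float:
    """Rough decode cost: ~2 FLOPs per active parameter per token."""
    return 2.0 * active_params_b * 1e9

# 35B-A3B activates only 3B params/token vs 8B for a dense 8B model,
# so per-token decode cost is a fraction of our current flagship's.
ratio = decode_flops_per_token(3) / decode_flops_per_token(8)
print(round(ratio, 2))  # 0.38
```

In other words, the 35B-A3B should decode cheaper than Qwen3-VL-8B while carrying far more total capacity, which is what makes it the lead candidate.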
Qwen3.5 flagship VLMs (released 2026-04-24, Apache 2.0)
Larger Qwen3.5-VL variants that we previously skipped due to size.
| Model | Size | Notes |
|---|---|---|
| `Qwen/Qwen3.5-122B-A10B-FP8` | 122B MoE / 10B active, FP8 | ≈61 GB weights – fits 98 GB |
| `Qwen/Qwen3.5-122B-A10B-GPTQ-Int4` | 122B MoE, Int4 | ≈30 GB – easily fits |
| `Qwen/Qwen3.5-397B-A17B` | 397B flagship | Too large for 98 GB even in Int4 |
Why investigate: We already eval the 2B and 9B Qwen3.5 sizes; the 122B-A10B FP8/Int4 is the missing scale-up data point. 10B active params ≈ Qwen3-VL-8B inference cost.
⚠️ Medium relevance – worth watching
Mistral Medium 3.5 / Small 4 (2026-04-27 to 2026-04-30)
- `mistralai/Mistral-Medium-3.5-128B` – 128B VLM (Mistral3ForConditionalGeneration, vision tower confirmed in config). License: other (Mistral non-commercial / research) – a license blocker for production.
- `mistralai/Mistral-Small-4-119B-2603` – 119B VLM, Apache 2.0. Likely too large at full precision for our 98 GB GPU; will need community FP8/Int4.
Why medium: Both are vision-capable and Small 4 is Apache-licensed, but they are unproven on structured JSON extraction and untested in our SFT+GRPO pipeline. Watch for community quants; defer until size and license fit is clear.
IBM Granite 4.0 Vision (released 2026-04-30, Apache 2.0)
- `ibm-granite/granite-4.0-3b-vision` – 3B VLM (149k dl, 108 likes)
Why medium: Direct size competitor to our 2B small-model class. We already have a Granite4-Vision-SFT entry in eval results (weighted_score 1.0144 – suspicious, needs validation). The 4.0-3b is a fresh upstream release; worth a clean eval on the 3.5k-hard set before committing to an SFT run.
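The out-of-range 1.0144 entry could be caught mechanically before it lands in any comparison; a minimal sanity check, assuming weighted scores are meant to lie in [0, 1] as every other entry in our results does:

```python
def validate_weighted_score(name: str, score: float) -> list[str]:
    """Flag scores outside [0, 1]; a weighted accuracy cannot exceed 1.0."""
    issues = []
    if not 0.0 <= score <= 1.0:
        issues.append(
            f"{name}: {score} is outside [0, 1] - likely a scoring bug"
        )
    return issues

print(validate_weighted_score("Granite4-Vision-SFT", 1.0144))  # flagged
print(validate_weighted_score("qwen3-vl-2b-sft-grpo-v9", 0.8948))  # []
```

Running this over the full eval results file before each scout report would surface any similar bug early.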
❌ Low relevance / skip
- DeepSeek-V4 family (`DeepSeek-V4-Pro`, `-Flash`, etc., 2026-04-27) – `text-generation` only, no vision tower.
- Qwen SAE-Res checkpoints – interpretability artifacts, not models.
- Various community fine-tunes of the above (uncensored / GGUF / MLX) – no benefit over upstream for our SFT pipeline.
Recommendation
Order of effort:
- Eval base `Qwen/Qwen3.6-35B-A3B-FP8`, `google/gemma-4-26B-A4B-it`, `google/gemma-4-E2B-it`, and `Qwen/Qwen3.5-122B-A10B-GPTQ-Int4` on the 3.5k-hard set (no fine-tuning, just baselines). Cheap, and tells us whether any beat our base `qwen3-vl-8b-instruct` (0.8751).
- If any baseline scores ≥0.85, kick off SFT+GRPO using our existing pipeline.
- Re-validate the `Granite4-Vision-SFT` 1.0144 score (out of range, possibly a scoring bug) before comparing against the new 4.0-3b vision base.
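The gating step in the plan above can be sketched as a small filter over baseline results; the scores below are illustrative placeholders, not real eval numbers:

```python
BASELINE_GATE = 0.85  # threshold from the recommendation above

def select_for_sft(baseline_scores: dict[str, float]) -> list[str]:
    """Return the models whose baseline clears the gate, best first."""
    passed = [(s, m) for m, s in baseline_scores.items() if s >= BASELINE_GATE]
    return [m for s, m in sorted(passed, reverse=True)]

# Hypothetical baseline outcomes on the 3.5k-hard set
print(select_for_sft({
    "Qwen/Qwen3.6-35B-A3B-FP8": 0.88,
    "google/gemma-4-E2B-it": 0.81,
}))  # ['Qwen/Qwen3.6-35B-A3B-FP8']
```

Models that fail the gate stay on the watchlist rather than consuming SFT+GRPO compute.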
Auto-generated by /hf-model-scout on 2026-05-01.