Daily Model Scout Report – 2026-04-02
Org: Denali-AI | Task: Garment attribute classification (9-field JSON extraction)
Current best: qwen3-vl-8b-sft+grpo @ 0.9131 weighted overall (3,500-sample hard eval)
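For context on the headline metric, "weighted overall" can be sketched as a weighted average of per-field exact-match accuracy over the 9 extracted fields. The field names and weights below are illustrative assumptions, not the actual eval config:

```python
# Sketch of a weighted-overall metric for 9-field JSON extraction.
# Field names and weights are illustrative assumptions; the real eval
# config may weight fields differently.
FIELD_WEIGHTS = {
    "category": 2.0, "color": 1.0, "pattern": 1.0,
    "sleeve": 1.0, "neckline": 1.0, "fit": 1.0,
    "material": 1.0, "closure": 1.0, "length": 1.0,
}

def weighted_overall(predictions: list[dict], labels: list[dict]) -> float:
    """Weighted per-field exact-match accuracy over all samples."""
    max_score = sum(FIELD_WEIGHTS.values()) * len(labels)
    score = 0.0
    for pred, gold in zip(predictions, labels):
        for field, weight in FIELD_WEIGHTS.items():
            if field in gold and pred.get(field) == gold[field]:
                score += weight
    return score / max_score
```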
🔥 New Releases This Week (March 26 – April 2, 2026)
1. Google Gemma 4 – Released April 2, 2026 (TODAY)
| Model | Params (effective) | Context | License | Vision |
|---|---|---|---|---|
| gemma-4-E2B-it | 2.3B | 128K | Apache 2.0 | ✅ + Audio |
| gemma-4-E4B-it | 4.5B | 128K | Apache 2.0 | ✅ + Audio |
| gemma-4-31B-it | 31B | 256K | Apache 2.0 | ✅ |
| gemma-4-26B-A4B-it | 4B active (26B total MoE) | 256K | Apache 2.0 | ✅ |
Architecture: Dense/MoE with hybrid sliding-window + global attention, Per-Layer Embeddings (PLE), learned 2D vision positions, variable aspect ratios (70–1120 image tokens)
Vision benchmarks: MMMU Pro 76.9% (31B), MATH-Vision 85.6% (31B), E4B scores MMMU Pro 52.6%
Why it matters for us:
- gemma-4-E4B (4.5B effective) is a strong candidate to replace/complement our Qwen3-VL-2B models – more capable yet still small
- gemma-4-E2B (2.3B) could be an edge deployment candidate with native vision
- gemma-4-26B-A4B (4B active MoE) – extremely interesting for production: MoE efficiency with only 4B active params but 26B total knowledge. Fits easily on our RTX PRO 6000
- Apache 2.0 license, Unsloth fine-tuning support confirmed, base models available for SFT
- Variable image token budgets could help with inference speed tuning
Relevance: 🔴 HIGH – Benchmark immediately. The E4B and 26B-A4B models are prime fine-tuning candidates.
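The variable image token budget noted above (70–1120 tokens per image) translates directly into prefill cost. A back-of-envelope sketch, assuming one image per request and a ~200-token prompt (both assumptions, not measurements):

```python
# Back-of-envelope prefill cost across Gemma 4 image token budgets.
# The 200-token prompt and one-image-per-request setup are assumptions.
PROMPT_TOKENS = 200

def prefill_tokens(image_tokens: int, n_images: int = 1) -> int:
    """Total tokens the model must prefill for one request."""
    return PROMPT_TOKENS + n_images * image_tokens

lo, hi = prefill_tokens(70), prefill_tokens(1120)
print(lo, hi, f"{hi / lo:.1f}x")  # cheapest vs. most expensive budget
```

Actual speedup depends on batch size and kernel efficiency, but a ~5x spread in prefill tokens is a meaningful tuning knob for our pipeline.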
2. Qwen3.5-Omni – Released March 30, 2026
| Variant | Modalities | Context | License |
|---|---|---|---|
| Qwen3.5-Omni-Plus | Text + Image + Audio + Video | 256K | Closed |
| Qwen3.5-Omni-Flash | Text + Image + Audio + Video | 256K | Closed |
| Qwen3.5-Omni-Light | Text + Image + Audio + Video | 256K | Closed |
Architecture: Native Thinker-Talker multimodal, unified text/audio/video processing, 10+ hrs audio, 400+ sec 720p video
Why it matters for us:
- SOTA on 215 audio and audio-visual benchmarks
- However, CLOSED SOURCE – breaks Alibaba's open-source streak
- Cannot fine-tune, cannot self-host without API costs
- We already have strong Qwen3.5-VL results with open weights
Relevance: 🟡 MEDIUM – Monitor for open-weight release. Not actionable for fine-tuning today.
📋 Other Recent Models Worth Noting
3. Qwen3.5 Base VLM Family – Released February 16, 2026
We already have these integrated and benchmarked. Current standings on our hard eval:
- qwen35-2b-base: 0.8437 weighted overall
- qwen35-2b-sft-v7: 0.6369 (SFT degraded – format issues)
- qwen35-2b-sft-grpo-gtpo-v8: 0.6535
These Qwen3.5 models underperform our Qwen3-VL-8B fine-tunes. The Qwen3.5 NVFP4 quantized models are broken (scoring ~0.43, similar to random baseline).
Relevance: 🟢 LOW – Already benchmarked. Qwen3.5 underperforms Qwen3-VL for our task.
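The "similar to random baseline" check used to flag the broken NVFP4 checkpoints can be made concrete with a majority-class baseline computed from the eval labels. A minimal sketch; the function names and the `margin` value are ours, not an existing tool:

```python
from collections import Counter

def majority_baseline(labels: list[dict], fields: list[str]) -> float:
    """Score achievable by always predicting each field's most common value.

    A checkpoint scoring near this level (like the ~0.43 NVFP4 runs)
    is effectively not reading the image.
    """
    per_field = []
    for field in fields:
        counts = Counter(example[field] for example in labels)
        per_field.append(counts.most_common(1)[0][1] / len(labels))
    return sum(per_field) / len(per_field)

def looks_broken(score: float, baseline: float, margin: float = 0.05) -> bool:
    """Flag checkpoints scoring within `margin` of blind guessing."""
    return score <= baseline + margin
```

Running this once over the 3,500-sample hard eval labels would let the benchmark harness auto-flag broken quantized exports instead of relying on eyeballing.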
4. InternVL3.5 – Released August 2025
| Model | Params | Vision Encoder | Language Model | License |
|---|---|---|---|---|
| InternVL3.5-2B | 2.3B | InternViT-300M | Qwen3-2B | Apache 2.0 |
| InternVL3.5-4B | 4.7B | InternViT-300M | Qwen3-4B | Apache 2.0 |
| InternVL3.5-8B | 8.5B | InternViT-300M | Qwen3-8B | Apache 2.0 |
Key improvement: +16% reasoning gain and 4.05x inference speedup over InternVL3. Cascade RL training (MPO + GSPO).
Why it matters: Our InternVL3-2B models scored only 0.72 weighted overall β significantly worse than Qwen3-VL. InternVL3.5's +16% reasoning improvement might close that gap. The 4B variant is new territory we haven't tested.
Relevance: 🟡 MEDIUM – InternVL has underperformed for us, but the 3.5 generation's improvements are substantial enough to re-evaluate the 4B and 8B variants.
5. MiniCPM-V 4.5 – Released September 2025
| Model | Params | Architecture | License |
|---|---|---|---|
| MiniCPM-V-4.5 | 8.7B | Qwen3-8B + SigLIP2-400M | Apache 2.0 |
Key features: 96x video token compression, surpasses GPT-4o on OCR tasks, 4x fewer visual tokens than competitors
Why it matters: Built on Qwen3-8B (same base as our best model), but with SigLIP2 vision encoder and innovative 3D-Resampler. The reduced visual token count could significantly improve inference speed.
Relevance: 🟡 MEDIUM – Interesting architecture but not a new release. Could be worth a base model comparison.
📊 Current Leaderboard (Denali-AI Hard Eval, 3,500 samples)
| Rank | Model | Weighted Overall | Notes |
|---|---|---|---|
| 1 | qwen3-vl-8b-sft+grpo | 0.9131 | Best overall |
| 2 | qwen3-vl-2b-sft-grpo-v9 | 0.8948 | Best small model |
| 3 | qwen3-vl-8b-sft-grpo-nvfp4 | 0.8945 | Best quantized 8B |
| 4 | qwen3-vl-8b-instruct-base | 0.8751 | No fine-tuning |
| 5 | qwen3-vl-8b-instruct-nvfp4 | 0.8716 | Quantized base |
| 6 | qwen35-2b-base | 0.8437 | Qwen3.5 zero-shot |
| 7 | qwen3-vl-2b-sft-grpo-v9-nvfp4 | 0.8422 | Quantized small |
| 8 | qwen3-vl-2b-instruct-base | 0.7642 | 2B zero-shot |
| 9 | internvl3-2b-grpo-gtpo-full | 0.7271 | InternVL3 best |
| 10 | moondream2-base | 0.6979 | Tiny model baseline |
🎯 Recommended Actions
IMMEDIATE: Benchmark Gemma 4 E4B and 26B-A4B base models on our hard eval set. These are brand new (released today), Apache 2.0, and architecturally novel. The E4B at 4.5B params and 26B-A4B with only 4B active params are sweet spots for our use case.
THIS WEEK: Evaluate Gemma 4 E2B (2.3B) as a potential edge/mobile model replacement for our Qwen3-VL-2B pipeline.
IF Gemma 4 base scores > 0.80: Adapt the SFT fine-tuning pipeline to the Gemma 4 architecture. Unsloth/TRL support is already confirmed, so no tooling blocker is expected.
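The go/no-go gate above can live directly in the eval harness so newly benchmarked base models are triaged automatically. A minimal sketch; the threshold constant and function name are ours, not an existing tool:

```python
SFT_GATE = 0.80  # weighted-overall threshold from the action item above

def triage(model_name: str, base_score: float) -> str:
    """Decide the next step for a newly benchmarked base model."""
    if base_score > SFT_GATE:
        return f"{model_name}: {base_score:.4f} > {SFT_GATE} -> adapt SFT pipeline"
    return f"{model_name}: {base_score:.4f} <= {SFT_GATE} -> hold"
```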
MONITOR: Qwen3.5-Omni for open-weight release. If weights drop, benchmark immediately.
OPTIONAL: Re-evaluate InternVL3.5-4B as a mid-size option if Gemma 4 doesn't pan out.
Report generated by Claude Code Model Scout – Denali-AI
Next report: 2026-04-03