Daily Model Scout Report: 2026-04-03
#4, opened by msudharsanan
Current Denali-AI Baselines
| Model | Weighted Score | Notes |
|---|---|---|
| qwen3-vl-8b-sft+grpo | 0.9131 | Best overall |
| qwen3-vl-8b-sft-grpo-nvfp4 | 0.8945 | Best quantized |
| qwen3-vl-2b-sft-grpo-v9 | 0.8948 | Best small model |
| qwen35-2b-base | 0.8437 | Best Qwen3.5 base |
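To make the bar for new candidates concrete, the sketch below compares a candidate's weighted score against a named baseline from the table. It does not reproduce how the weighted score itself is computed. `BASELINES`, `margin_vs_baseline`, and `best_baseline` are hypothetical helper names used only for illustration.

```python
# Hypothetical helpers; scores are copied from the baseline table above.
BASELINES: dict[str, float] = {
    "qwen3-vl-8b-sft+grpo": 0.9131,
    "qwen3-vl-8b-sft-grpo-nvfp4": 0.8945,
    "qwen3-vl-2b-sft-grpo-v9": 0.8948,
    "qwen35-2b-base": 0.8437,
}


def margin_vs_baseline(candidate_score: float, baseline: str) -> float:
    """Candidate minus baseline, rounded to 4 decimals; positive means the candidate leads."""
    return round(candidate_score - BASELINES[baseline], 4)


def best_baseline() -> tuple[str, float]:
    """Return the (name, score) pair with the highest weighted score."""
    name = max(BASELINES, key=BASELINES.get)
    return name, BASELINES[name]


if __name__ == "__main__":
    print(best_baseline())
    # Score cost of NVFP4 quantization on the 8B model.
    print(margin_vs_baseline(0.8945, "qwen3-vl-8b-sft+grpo"))
```

For example, `margin_vs_baseline(0.8945, "qwen3-vl-8b-sft+grpo")` gives -0.0186, the score cost of the NVFP4 quantization, while the 2B v9 model sits 0.0003 above the NVFP4 8B model.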
NEW MODELS (Last 7 Days: March 27 - April 3, 2026)
1. Google Gemma 4 (Released April 2, 2026) - HIGH RELEVANCE
- Sizes: E2B, E4B, 26B-A4B (MoE), 31B (Dense)
- HuggingFace: google/gemma-4-E2B, google/gemma-4-E4B, google/gemma-4-26B-A4B, google/gemma-4-31B
- Architecture: Native multimodal (text + image + video + audio on E2B/E4B), Apache 2.0 license
- Key features: OCR (multilingual), document/PDF parsing, chart comprehension, object detection, pointing. Variable aspect ratio. Context: 128K-256K.
- Why relevant: E2B/E4B are very small and natively multimodal, so they could replace our 0.8B-2B tier. 26B-A4B activates only ~4B parameters per token, so it may deliver large-model quality at roughly small-model compute cost (all 26B weights must still fit in memory). Strong OCR aligns with garment label reading.
- Relevance: HIGH - evaluate E4B and 26B-A4B immediately
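The "only 4B active" point is where MoE savings come from, and also where they stop. Below is a back-of-envelope sketch, assuming the nominal 26B total / 4B active and 31B dense counts taken from the model names, the common approximation of ~2 forward FLOPs per active parameter per token, and weight-only memory (no KV cache, activations, or quantization scale overhead). The helper names are hypothetical.

```python
# Back-of-envelope only; real figures depend on the runtime, KV cache,
# activations, exact checkpoint parameter counts, and quantization scales.

# Approximate bytes per parameter for common weight formats.
BYTES_PER_PARAM: dict[str, float] = {"bf16": 2.0, "fp8": 1.0, "fp4": 0.5}


def weight_memory_gb(total_params_b: float, dtype: str) -> float:
    """Weight memory in GB (decimal). Every expert must be resident, so use TOTAL params."""
    return total_params_b * BYTES_PER_PARAM[dtype]


def forward_gflops_per_token(active_params_b: float) -> float:
    """~2 FLOPs per ACTIVE parameter per token; billions of params * 2 = GFLOPs."""
    return 2 * active_params_b


if __name__ == "__main__":
    # Gemma 4 26B-A4B (MoE) vs 31B (dense), using nominal sizes from the names.
    print("26B-A4B bf16 weights (GB):", weight_memory_gb(26, "bf16"))
    print("26B-A4B GFLOPs/token:", forward_gflops_per_token(4))
    print("31B dense bf16 weights (GB):", weight_memory_gb(31, "bf16"))
    print("31B dense GFLOPs/token:", forward_gflops_per_token(31))
```

Under these assumptions the 26B-A4B model needs about 8 GFLOPs per token versus about 62 for the 31B dense model, yet in bf16 it still needs roughly 52 GB just for weights. The "small model cost" therefore applies to compute, not to memory.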
2. IBM Granite 4.0 3B Vision (Released April 1, 2026) - MEDIUM RELEVANCE
- Size: ~3.5B base + 0.5B LoRA adapter
- HuggingFace: ibm-granite/granite-4.0-3b-vision
- Architecture: LoRA adapter on Granite 4.0 Micro, specialized for document extraction
- Key features: Chart/table extraction, semantic key-value pair (KVP) extraction. Reported zero-shot exact-match accuracy: 85.5%.
- Why relevant: KVP extraction is directly analogous to our garment JSON extraction task.
- Relevance: MEDIUM - worth a quick eval
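Because KVP extraction maps onto our garment JSON task, a field-level exact-match score is one way to compare Granite against our baselines. This is a minimal sketch under stated assumptions: the field names (`category`, `color`, `sleeve_length`) are hypothetical, values are compared after trimming and case-folding, and unparseable output scores zero. IBM's reported 85.5% may use a different protocol.

```python
import json


def kvp_exact_match(pred_json: str, gold: dict[str, str]) -> float:
    """Fraction of gold keys whose predicted value matches after trim + casefold (assumed normalization)."""
    try:
        pred = json.loads(pred_json)
    except json.JSONDecodeError:
        return 0.0  # unparseable model output scores zero on every field
    if not isinstance(pred, dict) or not gold:
        return 0.0  # a non-object prediction (or empty gold) cannot be scored per field

    def norm(value: object) -> str:
        return str(value).strip().casefold()

    hits = sum(1 for key, value in gold.items() if key in pred and norm(pred[key]) == norm(value))
    return hits / len(gold)


if __name__ == "__main__":
    # Hypothetical garment fields for illustration only.
    gold = {"category": "shirt", "color": "navy", "sleeve_length": "long"}
    pred = '{"category": "Shirt", "color": "blue", "sleeve_length": "long"}'
    print(kvp_exact_match(pred, gold))  # 2 of 3 fields match
```

Scoring per field rather than per whole document keeps partial credit visible, which helps separate "wrong value" errors from "broken JSON" errors when comparing candidates.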
3. Z.ai GLM-5V-Turbo (Released April 1, 2026) - LOW
- API-only, coding/agentic focused. Not relevant for classification.
4. Qwen 3.6 Plus Preview (Released March 31, 2026) - MEDIUM (WATCH)
- API-only preview, 1M context, strong reasoning. Watch for open weights.
5. Qwen3.5-Omni (Released March 30, 2026) - LOW
- Closed source. Cannot fine-tune.
RECENT MODELS STILL WORTH EVALUATING
6. Phi-4-Reasoning-Vision-15B (March 4, 2026) - MEDIUM
- HuggingFace: microsoft/Phi-4-reasoning-vision-15B
- 15B params, SigLIP-2 vision encoder, strong visual reasoning
7. Keye-VL-1.5-8B (Recent) - MEDIUM
- HuggingFace: Kwai-Keye/Keye-VL-1_5-8B
- 8B params, RL-trained, strong image comprehension. Competitor to Qwen3-VL-8B.
RECOMMENDATIONS
Immediate (This Week):
- Evaluate Gemma 4 E4B and 26B-A4B as base models for garment classification
- Evaluate IBM Granite 4.0 3B Vision on garment JSON extraction
Watch List:
- Qwen 3.6 Plus: monitor for an open-weight release
- Keye-VL-1.5-8B: evaluate when bandwidth allows
No Action Needed: Qwen3.5-Omni (closed), GLM-5V-Turbo (API/coding)
Summary Table
| Model | Released | Size | Open | Relevance | Action |
|---|---|---|---|---|---|
| Gemma 4 (E4B, 26B-A4B) | Apr 2 | E4B / 26B MoE | Yes (Apache 2.0) | HIGH | Eval now |
| IBM Granite 4.0 3B Vision | Apr 1 | ~4B | Yes | MEDIUM | Eval this week |
| GLM-5V-Turbo | Apr 1 | Unknown | No (API) | LOW | Skip |
| Qwen 3.6 Plus Preview | Mar 31 | Large | No (API) | MEDIUM | Watch |
| Qwen3.5-Omni | Mar 30 | Multiple | No | LOW | Skip |
| Phi-4-Reasoning-Vision | Mar 4 | 15B | Yes | MEDIUM | Eval when free |
| Keye-VL-1.5-8B | Recent | 8B | Yes | MEDIUM | Eval when free |