Daily Model Scout Report — 2026-04-27
#16
by msudharsanan
Scouting window: 2026-04-20 → 2026-04-27 (last 7 days), with a few late-March / mid-April items pulled in when they're clearly the headline release of the cycle.
Baselines (3,500-sample hard eval, `_overall.weighted_score`)

| Model | Score | Role |
|---|---|---|
| `qwen3-vl-8b-sft+grpo` | 0.9131 | best overall |
| `qwen3-vl-8b-sft-grpo-nvfp4` | 0.8945 | best quantized |
| `qwen3-vl-2b-sft-grpo-v9` | 0.8948 | best small |
| `qwen35-2b-base` | 0.8437 | best Qwen3.5 base |
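The baseline metric can be read as a weighted mean over per-field accuracies. A toy sketch of that computation (the function name, field names, and weights here are hypothetical; only the `_overall.weighted_score` label comes from our pipeline):

```python
def overall_weighted_score(per_field_accuracy: dict, weights: dict) -> float:
    """Weighted mean of per-field accuracies, normalized by total weight."""
    total = sum(weights[field] for field in per_field_accuracy)
    return sum(
        per_field_accuracy[field] * weights[field]
        for field in per_field_accuracy
    ) / total

# Toy example with 3 of the 9 extraction fields (hypothetical weights):
acc = {"brand": 0.75, "size": 0.5, "material": 1.0}
w = {"brand": 2.0, "size": 1.0, "material": 1.0}
print(overall_weighted_score(acc, w))  # (1.5 + 0.5 + 1.0) / 4 -> 0.75
```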
High relevance — benchmark immediately
1. Qwen/Qwen3.6-27B (link)
- Released: 2026-04-21 · Likes: 899 · Downloads: ~400k
- Architecture: Dense 27B causal LM with vision encoder, 64 layers, hidden 5120
- Modalities: text + image + video
- License: Apache 2.0
- Context: 262k native, up to ~1M with YaRN
- Reported scores: VideoMME 87.7, V* 94.7, MMLU-Pro 86.2, GPQA-Diamond 87.8
- Why it matters: This is the natural successor to `Qwen3-VL-8B` (our best base). Same family lineage, bigger backbone, fresh post-training. With our existing SFT+GRPO recipe it should land above 0.9131 if the underlying base is stronger than Qwen3-VL-8B-Instruct.
- Cost note: 27B BF16 ≈ 54 GB weights — fits on a single RTX PRO 6000 98 GB for inference and SFT, but tighter than 8B. The FP8 variant below halves it.
2. Qwen/Qwen3.6-35B-A3B (link)
- Released: 2026-04-15 · Likes: 1,448 · Downloads: ~1.35M
- Architecture: MoE — 35B total / 3B activated per token, 256 experts (8 routed + 1 shared), Gated DeltaNet + Gated Attention hybrid layout
- Modalities: text + image + video (up to 224k video tokens)
- License: Apache 2.0
- Context: 262k native, ~1M with YaRN
- Reported scores: RealWorldQA 85.3, MMBench-EN-DEV-v1.1 92.8, OmniDocBench 89.9, VideoMMU 83.7
- Why it matters: Active-param footprint is ~3B, so inference cost is comparable to our 2B SFT model while quality should approach the 27B dense. Strong document-understanding numbers (OmniDoc 89.9) are directly relevant to apparel tag/label OCR fields. Thinking mode is on by default — for our 9-field JSON extraction task, force non-thinking mode at eval.
- Cost note: 35B BF16 ≈ 70 GB — fits, but NVFP4 is the obvious deployment target.
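The cost notes for the 27B and 35B checkpoints are straight bits-per-parameter arithmetic (decimal GB, weights only; KV cache and NVFP4's per-block scale factors add a little on top). A minimal sketch:

```python
def weight_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight memory in decimal GB: params * bits / 8 bytes."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

# Qwen3.6-27B dense: BF16 / FP8 / NVFP4
print(weight_gb(27, 16), weight_gb(27, 8), weight_gb(27, 4))  # 54.0 27.0 13.5
# Qwen3.6-35B-A3B: memory scales with the 35B total, not the 3B active
print(weight_gb(35, 16), weight_gb(35, 4))                    # 70.0 17.5
```

Note that only the total parameter count matters for weight memory; the 3B active count drives per-token compute, not footprint.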
3. RedHatAI/Qwen3.6-35B-A3B-NVFP4 (link)
- Released: ~2026-04-15+ · Likes: 106 · Downloads: ~525k
- Already-quantized NVFP4 build of #2. Matches our existing deployment format (we already run `qwen3-vl-8b-sft-grpo-nvfp4` at 0.8945). Should land in the ~17 GB weight range — easy fit.
- Why it matters: Skips the quantization step we'd otherwise have to redo ourselves. Good first-pass benchmark to see whether the 35B-A3B family is worth investing SFT cycles in.
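Since this build matches our existing NVFP4 serving path, the first-pass look can reuse the same launch. A sketch assuming a vLLM install recent enough to load NVFP4 checkpoints (treat the exact values as placeholders):

```shell
# Serve the pre-quantized build for the zero-shot pass; adjust
# --max-model-len to what the eval set actually needs.
vllm serve RedHatAI/Qwen3.6-35B-A3B-NVFP4 \
    --max-model-len 32768 \
    --gpu-memory-utilization 0.90
```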
4. Qwen/Qwen3.6-27B-FP8 and Qwen/Qwen3.6-35B-A3B-FP8
- Official Qwen FP8 quants released alongside the BF16 weights. Useful as a sanity-check rung between BF16 and our NVFP4 path.
Medium relevance — worth watching
5. moonshotai/Kimi-K2.6 (link)
- Released: 2026-04-14 · Likes: 1,093 · Downloads: ~443k
- Architecture: 1T total / 32B activated, MoonViT 400M vision encoder, 384 experts, 8 active per token
- License: Modified MIT
- Reported scores: MMMU-Pro 79.4, MathVision 87.4, SWE-Bench 80.2
- Why medium, not high: 1T total parameters do not fit on 98 GB even at NVFP4 (4-bit weights alone are ~500 GB). Vision is also somewhat secondary in this release (it is mostly an agentic-coding model). Track for distilled or smaller variants.
6. Community quants of Qwen3.6 (unsloth/*-GGUF, cyankiwi/*-AWQ, lmstudio-community/*)
- Useful for local CPU/Mac smoke tests but not production candidates given our NVFP4 + RTX PRO 6000 path.
Low relevance / context only
- `tencent/HY-Embodied-0.5-X` (2026-04-23, 4B/2B-active VLM): purpose-built for robotics / embodied planning, not general image classification. Skip.
- `kai-os/Carnice-V2-27b` (2026-04-25, 32 likes): community uncensored finetune in the Qwen3.6-27B family — irrelevant for our task.
- `Guilherme34/Darwin-36B-Opus-ABLITERATED-HERETIC` (2026-04-26): abliterated/distill chain, not a base candidate.
- `nvidia/Qwen3-VL-235B-A22B-Instruct-NVFP4-MLPerf-Inference-Closed-V6.1` (2026-04-07): NVIDIA's own NVFP4 of last-generation Qwen3-VL-235B — useful reference for NVFP4 calibration recipes, not a deployment target (235B is too big).
- No new Florence-3, InternVL4, PaliGemma3, Idefics4, MiniCPM-V-5, DeepSeek-VL3, LLaVA-OneVision-2, SmolVLM-3, Phi-5-Vision, or Molmo-2 releases in the window. The week is dominated by Qwen3.6.
- No fashion/apparel/garment-specific VLM finetunes of note this week. (`Denali-AI/granite4-vision-garment-classifier` from 2026-04-03 is our own.)
Recommended next actions
- Run zero-shot eval on the 3.5k-sample hard eval set for, in priority order:
  1. `Qwen/Qwen3.6-27B` (BF16 or FP8)
  2. `RedHatAI/Qwen3.6-35B-A3B-NVFP4` (cheapest first look at 35B-A3B)
  3. `Qwen/Qwen3.6-35B-A3B` (BF16) if NVFP4 looks promising
- If zero-shot meets or beats `qwen3-vl-8b-instruct-base` (0.8751), kick off SFT+GRPO with the standard 9-field recipe and full pipeline (eval on 3.5k → update JSON/wiki → upload to HF with model card + charts).
- Force non-thinking mode for the Qwen3.6-35B-A3B eval — JSON-extraction tasks don't benefit from `<think>` traces, and they inflate latency.
- Hold on Kimi-K2.6 until a smaller distilled variant lands — not deployable on current hardware.
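For the non-thinking requirement: Qwen3-family chat templates expose an `enable_thinking` switch in `apply_chat_template`. A sketch assuming Qwen3.6 keeps that switch (verify against the model card; if generations still open with `<think>`, fall back to the `/no_think` soft switch documented for Qwen3):

```python
# Build an eval prompt with thinking disabled. `tokenizer` is any object
# exposing the HF apply_chat_template interface; `messages` is the usual
# chat-format payload for the 9-field JSON extraction prompt.
def build_eval_prompt(tokenizer, messages):
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,  # suppress <think> traces at eval time
    )
```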
Compiled by /hf-model-scout · 2026-04-27 · sources: HF Hub image-text-to-text listings sorted by created_at, individual model cards for top candidates.