Daily Model Scout Report – 2026-04-24
Scope
Scan of HuggingFace for VLMs created or modified between 2026-04-17 and 2026-04-24, broad across architectures. Current baseline for comparison (weighted_score on our 3,500-sample hard eval):
| Model | Weighted Score | Note |
|---|---|---|
| qwen3-vl-8b-sft+grpo | 0.9131 | best overall |
| qwen3-vl-2b-sft-grpo-v9 | 0.8948 | best small |
| qwen3-vl-8b-sft-grpo-nvfp4 | 0.8945 | best quantized |
| qwen35-2b-base | 0.8437 | best Qwen3.5 base |
Yesterday's report (2026-04-23) flagged Qwen/Qwen3.6-27B (HIGH) and covered the full Qwen3.6 open-weight wave. Today's scan focuses on what's new since: official FP8 variants, third-party quantizations, a surprise Moonshot release, and Qwen3.6 community forks that look promising as bases.
Candidates
1. Qwen/Qwen3.6-27B-FP8 – Relevance: HIGH
- Link: https://huggingface.co/Qwen/Qwen3.6-27B-FP8
- Published: 2026-04-22 (official FP8 release, first-party)
- Quantization: FP8 (fine-grained, block size 128), F8_E4M3 + BF16 tails
- VRAM: ~28 GB weights, leaves >60 GB free on RTX PRO 6000 98GB for activations + long context
- Downloads: 183k in ~14 hours – highest first-day pull of any VL model this year
Why it matters for us:
- The official FP8 checkpoint removes the quant-divergence risk we've been absorbing by running our own `qwen3-vl-8b-sft-grpo-nvfp4` path. First-party FP8 typically matches BF16 within 0.3% on our eval set (validated with Qwen3-VL-8B).
- Drops cleanly into our existing vLLM + SGLang deployment.
- If Qwen3.6-27B (BF16) lands as our next base after the benchmark run flagged yesterday, this FP8 variant is our default production artifact – no conversion step.
Action: Include alongside BF16 in the zero-shot benchmark; target delta ≤ 0.005 weighted_score vs. BF16 before promoting.
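Before committing benchmark time, a load-and-decode smoke test on our card is cheap insurance. A minimal sketch assuming our standard vLLM stack – the model ID is from this report, every other parameter is a placeholder, and the real run goes through the multimodal eval harness:

```python
# Smoke test for the official FP8 checkpoint - a sketch, not the benchmark
# harness. Context length and memory fraction are placeholder assumptions;
# this only verifies the checkpoint loads and decodes on our card.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3.6-27B-FP8",
    quantization="fp8",           # assumption: redundant if the repo ships a
                                  # quantization_config, but explicit is safer
    max_model_len=32768,          # placeholder; raise once headroom is confirmed
    gpu_memory_utilization=0.90,
)

params = SamplingParams(temperature=0.0, max_tokens=128)  # greedy, like our eval
outputs = llm.generate(["List the nine garment fields we extract."], params)
print(outputs[0].outputs[0].text)
```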
2. Qwen/Qwen3.6-35B-A3B-FP8 – Relevance: HIGH
- Link: https://huggingface.co/Qwen/Qwen3.6-35B-A3B-FP8
- Published: 2026-04-22 (official FP8 release)
- Architecture: MoE, 35B total / 3B active, FP8 fine-grained
- VRAM: ~36 GB weights – plenty of headroom for 256K context on our card
- Downloads: 873k in ~14 hours (highest in the Qwen3.6 collection)
Why it matters for us:
- MoE 3B-active means inference latency in the ballpark of our 2B baselines while keeping 35B capacity – this is the most attractive speed/quality point for our hot-path garment serving.
- Official FP8 resolves the quantization risk that blocked `qwen3.5-122b-a10b-nvfp4` (0.4286 – total collapse, currently sitting at the random-guess floor in our eval).
- Expert-level LoRA SFT is well-supported in the Qwen3.6 reference training code, matching our existing GRPO/GTPO toolchain.
Action: Queue for the same zero-shot bench as the 27B. Pay close attention to brand + neckline fields (both have been MoE-sensitive historically).
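Since brand and neckline are the fields most likely to regress, a per-field delta check against the current champion is worth wiring in up front. A sketch, assuming a hypothetical per-model JSON dump mapping field name to accuracy (not the actual eval_all_results.json schema):

```python
# Per-field regression check for the MoE candidate. The input file layout
# (field name -> accuracy) is hypothetical; adapt to the real eval output.
import json

WATCH_FIELDS = ["brand", "neckline"]  # historically MoE-sensitive

def field_deltas(candidate_path: str, baseline_path: str) -> dict[str, float]:
    with open(candidate_path) as f:
        cand = json.load(f)
    with open(baseline_path) as f:
        base = json.load(f)
    return {field: cand[field] - base[field] for field in WATCH_FIELDS}

# Filenames are illustrative placeholders.
for field, delta in field_deltas("qwen3.6-35b-a3b-fp8.json",
                                 "qwen3-vl-8b-sft-grpo.json").items():
    flag = "REGRESSION" if delta < -0.01 else "ok"  # threshold is a placeholder
    print(f"{field}: {delta:+.4f} ({flag})")
```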
3. moonshotai/Kimi-K2.6 – Relevance: LOW (but notable)
- Link: https://huggingface.co/moonshotai/Kimi-K2.6
- Released: 2026-04-20
- Architecture: MoE, ~1T total / ~32B active, MoonViT 400M vision encoder, native multimodal
- Context: 256K
- License: Modified MIT
- Pretraining: ~15T mixed visual + text tokens
Why it's not a fit for us right now:
- Does not fit our RTX PRO 6000 98GB. At 1T params, even INT4 weights alone are ~500 GB (see the sizing sketch after this list). Minimum realistic deployment is 8× H100/H200 or a TPU pod.
- Even if we had the hardware, the model targets long-horizon coding agent workflows, not structured 9-field extraction. Most of the extra capacity would be dead weight for us.
- Fine-tuning is impractical at this scale for our dataset size.
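The sizing call above is plain weights-only arithmetic; a minimal sketch (decimal GB, ignoring activations, KV cache, and runtime overhead, so real requirements are strictly higher):

```python
# Weights-only VRAM estimate behind the "does not fit" verdict. Real
# deployments also need activations and KV cache, so these are lower bounds.
def weights_gb(params_billions: float, bits_per_param: int) -> float:
    return params_billions * bits_per_param / 8  # 1e9 params * bytes/param = GB

for name, params_b, bits in [
    ("Kimi-K2.6 @ INT4", 1000, 4),      # ~1T total parameters
    ("Qwen3.6-27B @ FP8", 27, 8),
    ("Qwen3.6-35B-A3B @ FP8", 35, 8),
]:
    print(f"{name}: ~{weights_gb(params_b, bits):.0f} GB weights")
# Kimi-K2.6 @ INT4: ~500 GB -> far beyond a single 98 GB card.
```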
Why we should still track it:
- Moonshot's MoonViT vision encoder may appear standalone or as a distilled variant. If they release a "Kimi-K2.6-Mini" (a `<=10B`-active variant), promote to HIGH.
- Sets a new public MMMU / CountBench ceiling that we can use as a soft upper bound when reading leaderboards.
Action: Watch-list only. No benchmark run.
4. Community Qwen3.6 quantization ecosystem – Relevance: MEDIUM
Major third-party quantizations published in the last 48 hours:
| Repo | Format | Intended use |
|---|---|---|
| `sakamakismile/Qwen3.6-27B-NVFP4` | NVFP4 | Matches our current NVFP4 deployment stack |
| `unsloth/Qwen3.6-35B-A3B-GGUF` | GGUF (Q3/Q4/Q5/Q8) | llama.cpp / edge reference |
| `unsloth/Qwen3.6-35B-A3B-UD-MLX-4bit` | MLX 4-bit | Apple Silicon dev boxes |
| `Chunity/Qwen3.6-35B-A3B-AutoRound-AWQ-4bit` | AWQ-4 | vLLM INT4 |
Why it matters for us:
- `sakamakismile/Qwen3.6-27B-NVFP4` is the one to watch: it matches our existing NVFP4 inference pipeline directly, and would let us compare "first-party FP8" vs. "community NVFP4" vs. "our own SFT+GRPO+NVFP4" on the 27B base.
- These are community outputs – do not trust the quantization quality blindly. If a community quant is within 0.01 weighted_score of official FP8 on our eval, we adopt it; otherwise we produce our own NVFP4 via the existing pipeline.
Action: Include `sakamakismile/Qwen3.6-27B-NVFP4` in the Qwen3.6-27B benchmark group as a third arm.
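The adoption rule in the second bullet above is mechanical enough to encode directly. A sketch, with illustrative placeholder scores rather than measured results:

```python
# Quant-adoption rule from this section: take the community quant only if it
# lands within 0.01 weighted_score of the official FP8 arm on our hard eval.
ADOPT_MARGIN = 0.01

def quant_decision(official_fp8: float, community: float) -> str:
    if official_fp8 - community <= ADOPT_MARGIN:
        return "adopt community NVFP4"
    return "produce our own NVFP4 via the existing pipeline"

# Placeholder scores - replace with the benchmark group's real results.
print(quant_decision(official_fp8=0.910, community=0.904))  # adopt
print(quant_decision(official_fp8=0.910, community=0.885))  # roll our own
```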
5. Qwen3.5 fine-tunes (community) – Relevance: LOW
The Qwen3.5 family continues to see heavy community fine-tuning this week (Qwen3.5-9B, Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2, Qwen3.5-35B-prune-mask-s1K, etc.). None are vision-tuned; all are language-only distillations or prunes of the Qwen3.5 text backbone. Not relevant to garment classification.
Action: Skip.
Summary & Recommended Next Steps
Zero-shot benchmark on our 3,500-sample hard eval this week (priority order):
1. `Qwen/Qwen3.6-27B` (BF16) – yesterday's HIGH, still top priority
2. `Qwen/Qwen3.6-27B-FP8` – official quant, our likely production artifact
3. `Qwen/Qwen3.6-35B-A3B-FP8` – MoE speed play
4. `sakamakismile/Qwen3.6-27B-NVFP4` – pipeline-compatible community quant
If any of the above beats 0.9131 zero-shot, kick off a full SFT+GRPO run on the 27B-BF16 base using the existing Qwen3-VL training recipe – the port should be near-trivial (same processor/chat template family). The 27B dense target is ~3.4× our current 8B champion and has ~40 GB headroom for LoRA r=16 on our card.
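The headroom estimate assumes a LoRA r=16 adapter. A minimal PEFT sketch; alpha, dropout, and target module names are placeholders following common Qwen-family attention naming, to be verified against the actual Qwen3.6 checkpoint:

```python
# LoRA r=16 adapter config assumed by the headroom estimate above.
from peft import LoraConfig

lora_cfg = LoraConfig(
    r=16,                        # rank from the headroom estimate
    lora_alpha=32,               # placeholder; tuned per run
    lora_dropout=0.05,           # placeholder
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # verify vs. Qwen3.6
    task_type="CAUSAL_LM",
)
# Attach with peft.get_peft_model(base_model, lora_cfg) inside the existing
# Qwen3-VL training recipe.
```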
Do not attempt: `Kimi-K2.6` – hardware-gated out.
Watch for: a "Qwen3.6-VL-8B" or "Qwen3.6-VL-2B" tier, which Qwen typically follows the flagship release with. Based on the Qwen3 → Qwen3-VL cadence, expect these within 2–3 weeks. An 8B-class Qwen3.6-VL would be the single highest-leverage release we could receive this quarter.
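The watch item is easy to automate. A sketch using huggingface_hub's list_models; the search string is an assumption about how Qwen will name the VL tier:

```python
# Daily watch-list poll for the predicted Qwen3.6-VL tier. The naming pattern
# is an assumption; widen the search if Qwen deviates from it.
from huggingface_hub import list_models

for model in list_models(author="Qwen", search="Qwen3.6-VL", limit=20):
    print(model.id, model.created_at)
```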
Report generated 2026-04-24. Baseline weighted scores are from wiki-models-contrib/models/eval_all_results.json.