Daily Model Scout Report – 2026-04-24
Scope
Scan of HuggingFace for VLMs created or modified between 2026-04-17 and 2026-04-24, broad across architectures. Current baseline for comparison (weighted_score on our 3,500-sample hard eval):
| Model | Weighted Score | Note |
|---|---|---|
| qwen3-vl-8b-sft+grpo | 0.9131 | best overall |
| qwen3-vl-2b-sft-grpo-v9 | 0.8948 | best small |
| qwen3-vl-8b-sft-grpo-nvfp4 | 0.8945 | best quantized |
| qwen35-2b-base | 0.8437 | best Qwen3.5 base |
Yesterday's report (2026-04-23) flagged Qwen/Qwen3.6-27B (HIGH) and covered the full Qwen3.6 open-weight wave. Today's scan focuses on what's new since: official FP8 variants, third-party quantizations, a surprise Moonshot release, and Qwen3.6 community forks that look promising as bases.
Candidates
1. Qwen/Qwen3.6-27B-FP8 – Relevance: HIGH
- Link: https://huggingface.co/Qwen/Qwen3.6-27B-FP8
- Published: 2026-04-22 (official FP8 release, first-party)
- Quantization: FP8 (fine-grained, block size 128), F8_E4M3 + BF16 tails
- VRAM: ~28 GB weights, leaves >60 GB free on RTX PRO 6000 98GB for activations + long context
- Downloads: 183k in ~14 hours – highest first-day pull of any VL model this year
Why it matters for us:
- The official FP8 checkpoint removes the quant-divergence risk we've been absorbing by running our own `qwen3-vl-8b-sft-grpo-nvfp4` path. First-party FP8 typically matches BF16 within 0.3% on our eval set (validated with Qwen3-VL-8B).
- Drops cleanly into our existing vLLM + SGLang deployment.
- If Qwen3.6-27B (BF16) lands as our next base after the benchmark run flagged yesterday, this FP8 variant is our default production artifact – no conversion step.
Action: Include alongside BF16 in the zero-shot benchmark; target delta ≤ 0.005 weighted_score vs. BF16 before promoting.
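Before committing benchmark time, a load-and-decode smoke test on our card is cheap insurance. A minimal sketch assuming our standard vLLM stack – the model ID is from this report, every other parameter is a placeholder, and the real run goes through the multimodal eval harness:

```python
# Smoke test for the official FP8 checkpoint - a sketch, not the benchmark
# harness. Context length and memory fraction are placeholder assumptions;
# this only verifies the checkpoint loads and decodes on our card.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3.6-27B-FP8",
    quantization="fp8",           # assumption: redundant if the repo ships a
                                  # quantization_config, but explicit is safer
    max_model_len=32768,          # placeholder; raise once headroom is confirmed
    gpu_memory_utilization=0.90,
)

params = SamplingParams(temperature=0.0, max_tokens=128)  # greedy, like our eval
outputs = llm.generate(["List the nine garment fields we extract."], params)
print(outputs[0].outputs[0].text)
```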
2. Qwen/Qwen3.6-35B-A3B-FP8 – Relevance: HIGH
- Link: https://huggingface.co/Qwen/Qwen3.6-35B-A3B-FP8
- Published: 2026-04-22 (official FP8 release)
- Architecture: MoE, 35B total / 3B active, FP8 fine-grained
- VRAM: ~36 GB weights – plenty of headroom for 256K context on our card
- Downloads: 873k in ~14 hours (highest in the Qwen3.6 collection)
Why it matters for us:
- MoE 3B-active means inference latency in the ballpark of our 2B baselines while keeping 35B capacity – this is the most attractive speed/quality point for our hot-path garment serving.
- Official FP8 resolves the quantization risk that blocked `qwen3.5-122b-a10b-nvfp4` (0.4286 – total collapse, currently sitting at the random-guess floor in our eval).
- Expert-level LoRA SFT is well-supported in the Qwen3.6 reference training code, matching our existing GRPO/GTPO toolchain.
Action: Queue for the same zero-shot bench as the 27B. Pay close attention to brand + neckline fields (both have been MoE-sensitive historically).
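Since brand and neckline are the fields most likely to regress, a per-field delta check against the current champion is worth wiring in up front. A sketch, assuming a hypothetical per-model JSON dump mapping field name to accuracy (not the actual eval_all_results.json schema):

```python
# Per-field regression check for the MoE candidate. The input file layout
# (field name -> accuracy) is hypothetical; adapt to the real eval output.
import json

WATCH_FIELDS = ["brand", "neckline"]  # historically MoE-sensitive

def field_deltas(candidate_path: str, baseline_path: str) -> dict[str, float]:
    with open(candidate_path) as f:
        cand = json.load(f)
    with open(baseline_path) as f:
        base = json.load(f)
    return {field: cand[field] - base[field] for field in WATCH_FIELDS}

# Filenames are illustrative placeholders.
for field, delta in field_deltas("qwen3.6-35b-a3b-fp8.json",
                                 "qwen3-vl-8b-sft-grpo.json").items():
    flag = "REGRESSION" if delta < -0.01 else "ok"  # threshold is a placeholder
    print(f"{field}: {delta:+.4f} ({flag})")
```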
3. moonshotai/Kimi-K2.6 – Relevance: LOW (but notable)
- Link: https://huggingface.co/moonshotai/Kimi-K2.6
- Released: 2026-04-20
- Architecture: MoE, ~1T total / ~32B active, MoonViT 400M vision encoder, native multimodal
- Context: 256K
- License: Modified MIT
- Pretraining: ~15T mixed visual + text tokens
Why it's not a fit for us right now:
- Does not fit our RTX PRO 6000 98GB. At 1T params, even INT4 weights alone are ~500 GB (see the sizing sketch after this list). Minimum realistic deployment is 8× H100/H200 or a TPU pod.
- Even if we had the hardware, the model targets long-horizon coding agent workflows, not structured 9-field extraction. Most of the extra capacity would be dead weight for us.
- Fine-tuning is impractical at this scale for our dataset size.
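The sizing call above is plain weights-only arithmetic; a minimal sketch (decimal GB, ignoring activations, KV cache, and runtime overhead, so real requirements are strictly higher):

```python
# Weights-only VRAM estimate behind the "does not fit" verdict. Real
# deployments also need activations and KV cache, so these are lower bounds.
def weights_gb(params_billions: float, bits_per_param: int) -> float:
    return params_billions * bits_per_param / 8  # 1e9 params * bytes/param = GB

for name, params_b, bits in [
    ("Kimi-K2.6 @ INT4", 1000, 4),      # ~1T total parameters
    ("Qwen3.6-27B @ FP8", 27, 8),
    ("Qwen3.6-35B-A3B @ FP8", 35, 8),
]:
    print(f"{name}: ~{weights_gb(params_b, bits):.0f} GB weights")
# Kimi-K2.6 @ INT4: ~500 GB -> far beyond a single 98 GB card.
```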
Why we should still track it:
- Moonshot's MoonViT vision encoder may appear standalone or as a distilled variant. If they release a "Kimi-K2.6-Mini" (a `<=10B`-active variant), promote to HIGH.
- Sets a new public MMMU / CountBench ceiling that we can use as a soft upper bound when reading leaderboards.
Action: Watch-list only. No benchmark run.
4. Community Qwen3.6 quantization ecosystem – Relevance: MEDIUM
Major third-party quantizations published in the last 48 hours:
| Repo | Format | Intended use |
|---|---|---|
| `sakamakismile/Qwen3.6-27B-NVFP4` | NVFP4 | Matches our current NVFP4 deployment stack |
| `unsloth/Qwen3.6-35B-A3B-GGUF` | GGUF (Q3/Q4/Q5/Q8) | llama.cpp / edge reference |
| `unsloth/Qwen3.6-35B-A3B-UD-MLX-4bit` | MLX 4-bit | Apple Silicon dev boxes |
| `Chunity/Qwen3.6-35B-A3B-AutoRound-AWQ-4bit` | AWQ-4 | vLLM INT4 |
Why it matters for us:
- `sakamakismile/Qwen3.6-27B-NVFP4` is the one to watch: it matches our existing NVFP4 inference pipeline directly, and would let us compare "first-party FP8" vs. "community NVFP4" vs. "our own SFT+GRPO+NVFP4" on the 27B base.
- These are community outputs – do not trust the quantization quality blindly. If a community quant is within 0.01 weighted_score of official FP8 on our eval, we adopt it; otherwise we produce our own NVFP4 via the existing pipeline.
Action: Include `sakamakismile/Qwen3.6-27B-NVFP4` in the Qwen3.6-27B benchmark group as a third arm.
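The adoption rule in the second bullet above is mechanical enough to encode directly. A sketch, with illustrative placeholder scores rather than measured results:

```python
# Quant-adoption rule from this section: take the community quant only if it
# lands within 0.01 weighted_score of the official FP8 arm on our hard eval.
ADOPT_MARGIN = 0.01

def quant_decision(official_fp8: float, community: float) -> str:
    if official_fp8 - community <= ADOPT_MARGIN:
        return "adopt community NVFP4"
    return "produce our own NVFP4 via the existing pipeline"

# Placeholder scores - replace with the benchmark group's real results.
print(quant_decision(official_fp8=0.910, community=0.904))  # adopt
print(quant_decision(official_fp8=0.910, community=0.885))  # roll our own
```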
5. Qwen3.5 fine-tunes (community) – Relevance: LOW
The Qwen3.5 family continues to see heavy community fine-tuning this week (Qwen3.5-9B, Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2, Qwen3.5-35B-prune-mask-s1K, etc.). None are vision-tuned; all are language-only distillations or prunes of the Qwen3.5 text backbone. Not relevant to garment classification.
Action: Skip.
Summary & Recommended Next Steps
Zero-shot benchmark on our 3,500-sample hard eval this week (priority order):
1. `Qwen/Qwen3.6-27B` (BF16) – yesterday's HIGH, still top priority
2. `Qwen/Qwen3.6-27B-FP8` – official quant, our likely production artifact
3. `Qwen/Qwen3.6-35B-A3B-FP8` – MoE speed play
4. `sakamakismile/Qwen3.6-27B-NVFP4` – pipeline-compatible community quant
If any of the above beats 0.9131 zero-shot, kick off a full SFT+GRPO run on the 27B-BF16 base using the existing Qwen3-VL training recipe – the port should be near-trivial (same processor/chat template family). The 27B dense target is ~3.4× our current 8B champion and has ~40 GB headroom for LoRA r=16 on our card.
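The headroom estimate assumes a LoRA r=16 adapter. A minimal PEFT sketch; alpha, dropout, and target module names are placeholders following common Qwen-family attention naming, to be verified against the actual Qwen3.6 checkpoint:

```python
# LoRA r=16 adapter config assumed by the headroom estimate above.
from peft import LoraConfig

lora_cfg = LoraConfig(
    r=16,                        # rank from the headroom estimate
    lora_alpha=32,               # placeholder; tuned per run
    lora_dropout=0.05,           # placeholder
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # verify vs. Qwen3.6
    task_type="CAUSAL_LM",
)
# Attach with peft.get_peft_model(base_model, lora_cfg) inside the existing
# Qwen3-VL training recipe.
```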
Do not attempt: `Kimi-K2.6` – hardware-gated out.
Watch for: a "Qwen3.6-VL-8B" or "Qwen3.6-VL-2B" tier, which Qwen typically follows the flagship release with. Based on the Qwen3 → Qwen3-VL cadence, expect these within 2–3 weeks. An 8B-class Qwen3.6-VL would be the single highest-leverage release we could receive this quarter.
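The watch item is easy to automate. A sketch using huggingface_hub's list_models; the search string is an assumption about how Qwen will name the VL tier:

```python
# Daily watch-list poll for the predicted Qwen3.6-VL tier. The naming pattern
# is an assumption; widen the search if Qwen deviates from it.
from huggingface_hub import list_models

for model in list_models(author="Qwen", search="Qwen3.6-VL", limit=20):
    print(model.id, model.created_at)
```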
Report generated 2026-04-24. Baseline weighted scores are from wiki-models-contrib/models/eval_all_results.json.