Daily Model Scout Report – 2026-04-24


Scope

Scan of HuggingFace for VLMs created or modified between 2026-04-17 and 2026-04-24, broad across architectures. Current baseline for comparison (weighted_score on our 3,500-sample hard eval):

| Model | Weighted score | Note |
|---|---|---|
| qwen3-vl-8b-sft+grpo | 0.9131 | best overall |
| qwen3-vl-2b-sft-grpo-v9 | 0.8948 | best small |
| qwen3-vl-8b-sft-grpo-nvfp4 | 0.8945 | best quantized |
| qwen35-2b-base | 0.8437 | best Qwen3.5 base |

Yesterday's report (2026-04-23) flagged Qwen/Qwen3.6-27B (HIGH) and covered the full Qwen3.6 open-weight wave. Today's scan focuses on what's new since: official FP8 variants, third-party quantizations, a surprise Moonshot release, and Qwen3.6 community forks that look promising as bases.


Candidates

1. Qwen/Qwen3.6-27B-FP8 – Relevance: HIGH

  • Link: https://huggingface.co/Qwen/Qwen3.6-27B-FP8
  • Published: 2026-04-22 (official FP8 release, first-party)
  • Quantization: FP8 (fine-grained, block size 128), F8_E4M3 + BF16 tails
  • VRAM: ~28 GB weights, leaves >60 GB free on RTX PRO 6000 98GB for activations + long context
  • Downloads: 183k in ~14 hours – highest first-day pull of any VL model this year

Why it matters for us:

  • The official FP8 checkpoint removes the quant-divergence risk we've been absorbing by running our own qwen3-vl-8b-sft-grpo-nvfp4 path. First-party FP8 typically matches BF16 within 0.3% on our eval set (validated with Qwen3-VL-8B).
  • Drops cleanly into our existing vLLM + SGLang deployment (see the load sketch after this list).
  • If Qwen3.6-27B (BF16) lands as our next base after the benchmark run flagged yesterday, this FP8 variant is our default production artifact – no conversion step.
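
A minimal vLLM load sketch for the deployment bullet above. Only the model ID comes from this report; the context length and prompt are placeholders, and a real garment request would go through our multimodal serving path rather than a text-only prompt:

```python
# Smoke test: confirm the pre-quantized FP8 checkpoint loads and generates under vLLM.
# vLLM reads the FP8 quantization config from the checkpoint itself; no flags needed.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3.6-27B-FP8",
    max_model_len=8192,  # placeholder; production uses our real context budget
)
out = llm.generate(["Describe the garment."], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```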

Action: Include alongside BF16 in the zero-shot benchmark; target delta ≤ 0.005 weighted_score vs. BF16 before promoting.
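
A sketch of that promotion gate, assuming we read both runs' weighted_score from the eval output; the 0.005 threshold is from the Action above, the scores are made up:

```python
# Gate from the Action above: promote the FP8 artifact only if it stays
# within 0.005 weighted_score of the BF16 run on the 3,500-sample hard eval.
def fp8_promotable(bf16_score: float, fp8_score: float, max_delta: float = 0.005) -> bool:
    return (bf16_score - fp8_score) <= max_delta

# Illustrative numbers only:
print(fp8_promotable(bf16_score=0.9180, fp8_score=0.9151))  # True (delta 0.0029)
print(fp8_promotable(bf16_score=0.9180, fp8_score=0.9100))  # False (delta 0.0080)
```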


2. Qwen/Qwen3.6-35B-A3B-FP8 – Relevance: HIGH

  • Link: https://huggingface.co/Qwen/Qwen3.6-35B-A3B-FP8
  • Published: 2026-04-22 (official FP8 release)
  • Architecture: MoE, 35B total / 3B active, FP8 fine-grained
  • VRAM: ~36 GB weights → plenty of headroom for 256K context on our card
  • Downloads: 873k in ~14 hours (highest in the Qwen3.6 collection)

Why it matters for us:

  • MoE 3B-active means inference latency in the ballpark of our 2B baselines while keeping 35B capacity – this is the most attractive speed/quality point for our hot-path garment serving.
  • Official FP8 resolves the quantization risk that blocked qwen3.5-122b-a10b-nvfp4 (0.4286 – total collapse, currently sitting at the random-guess floor in our eval).
  • Expert-level LoRA SFT is well-supported in the Qwen3.6 reference training code, matching our existing GRPO/GTPO toolchain.

Action: Queue for the same zero-shot bench as the 27B. Pay close attention to brand + neckline fields (both have been MoE-sensitive historically).
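
To make that field-level check concrete, a hedged harness sketch. The record layout and helper are ours to define here; only the brand/neckline focus and the 9-field schema come from this report:

```python
# Per-field exact-match accuracy, so MoE-sensitive fields (brand, neckline)
# are visible separately from the aggregate weighted_score.
from collections import Counter

WATCH_FIELDS = ["brand", "neckline"]  # extend to all 9 extraction fields in the real harness

def per_field_accuracy(preds: list[dict], golds: list[dict]) -> dict[str, float]:
    hits = Counter()
    for pred, gold in zip(preds, golds):
        for field in WATCH_FIELDS:
            hits[field] += int(pred.get(field) == gold.get(field))
    return {field: hits[field] / len(golds) for field in WATCH_FIELDS}

# Toy example with two samples:
print(per_field_accuracy(
    [{"brand": "acme", "neckline": "crew"}, {"brand": "acme", "neckline": "v-neck"}],
    [{"brand": "acme", "neckline": "crew"}, {"brand": "zeta", "neckline": "v-neck"}],
))  # {'brand': 0.5, 'neckline': 1.0}
```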


3. moonshotai/Kimi-K2.6 – Relevance: LOW (but notable)

  • Link: https://huggingface.co/moonshotai/Kimi-K2.6
  • Released: 2026-04-20
  • Architecture: MoE, ~1T total / ~32B active, MoonViT 400M vision encoder, native multimodal
  • Context: 256K
  • License: Modified MIT
  • Pretraining: ~15T mixed visual + text tokens

Why it's not a fit for us right now:

  • Does not fit our RTX PRO 6000 98GB. At 1T params, even INT4 weights alone are ~500 GB (arithmetic sketch after this list). Minimum realistic deployment is 8× H100/H200 or a TPU pod.
  • Even if we had the hardware, the model targets long-horizon coding agent workflows, not structured 9-field extraction. Most of the extra capacity would be dead weight for us.
  • Fine-tuning is impractical at this scale for our dataset size.
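
The arithmetic sketch referenced in the first bullet; nothing model-specific here beyond standard bytes-per-parameter figures:

```python
# Back-of-envelope weight memory: parameters x bytes per parameter.
# Ignores KV cache, activations, and runtime overhead.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

def weight_gb(params_billion: float, fmt: str) -> float:
    return params_billion * BYTES_PER_PARAM[fmt]  # 1B params at 1 byte = 1 GB

print(weight_gb(1000, "int4"))  # Kimi-K2.6, ~1T params: 500.0 GB in weights alone
print(weight_gb(27, "fp8"))     # consistent with the ~28 GB quoted for Qwen3.6-27B-FP8 above
```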

Why we should still track it:

  • Moonshot's MoonViT vision encoder may be released standalone or as a distilled variant. If a "Kimi-K2.6-Mini" (≤10B active) appears, promote to HIGH.
  • Sets a new public MMMU / CountBench ceiling that we can use as a soft upper bound when reading leaderboards.

Action: Watch-list only. No benchmark run.


4. Community Qwen3.6 quantization ecosystem – Relevance: MEDIUM

Major third-party quantizations published in the last 48 hours:

| Repo | Format | Intended use |
|---|---|---|
| sakamakismile/Qwen3.6-27B-NVFP4 | NVFP4 | Matches our current NVFP4 deployment stack |
| unsloth/Qwen3.6-35B-A3B-GGUF | GGUF (Q3/Q4/Q5/Q8) | llama.cpp / edge reference |
| unsloth/Qwen3.6-35B-A3B-UD-MLX-4bit | MLX 4-bit | Apple Silicon dev boxes |
| Chunity/Qwen3.6-35B-A3B-AutoRound-AWQ-4bit | AWQ-4 | vLLM INT4 |

Why it matters for us:

  • sakamakismile/Qwen3.6-27B-NVFP4 is the one to watch: matches our existing NVFP4 inference pipeline directly, and would let us compare "first-party FP8" vs "community NVFP4" vs "our own SFT+GRPO+NVFP4" on the 27B base.
  • These are community artifacts – do not trust the quantization quality blindly. If a community quant lands within 0.01 weighted_score of official FP8 on our eval, we adopt it; otherwise we produce our own NVFP4 via the existing pipeline (decision sketch below).

Action: Include sakamakismile/Qwen3.6-27B-NVFP4 in the Qwen3.6-27B benchmark group as a third arm.
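
The same gating pattern as the FP8 check in candidate 1, with the looser 0.01 adoption threshold from the bullets above; the scores below are placeholders:

```python
# Adoption rule: take the community NVFP4 only if it lands within 0.01
# weighted_score of the official FP8 arm; otherwise run our own NVFP4 pipeline.
ADOPT_DELTA = 0.01

def quant_decision(official_fp8: float, community_nvfp4: float) -> str:
    delta = official_fp8 - community_nvfp4
    return "adopt community quant" if delta <= ADOPT_DELTA else "produce in-house NVFP4"

print(quant_decision(0.912, 0.905))  # adopt community quant (delta 0.007)
print(quant_decision(0.912, 0.895))  # produce in-house NVFP4 (delta 0.017)
```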


5. Qwen3.5 fine-tunes (community) – Relevance: LOW

The Qwen3.5 family continues to see heavy community fine-tuning this week (Qwen3.5-9B, Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2, Qwen3.5-35B-prune-mask-s1K, etc.). None are vision-tuned; all are language-only distillations or prunes of the Qwen3.5 text backbone. Not relevant to garment classification.

Action: Skip.


Summary & Recommended Next Steps

  1. Zero-shot benchmark on our 3,500-sample hard eval this week (priority order):

    • Qwen/Qwen3.6-27B (BF16) – yesterday's HIGH, still top priority
    • Qwen/Qwen3.6-27B-FP8 – official quant, our likely production artifact
    • Qwen/Qwen3.6-35B-A3B-FP8 – MoE speed play
    • sakamakismile/Qwen3.6-27B-NVFP4 – pipeline-compatible community quant
  2. If any of the above beats 0.9131 zero-shot, kick off a full SFT+GRPO run on the 27B-BF16 base using the existing Qwen3-VL training recipe – the port should be near-trivial (same processor/chat-template family). The 27B dense target is ~3.4× our current 8B champion and leaves ~40 GB of headroom for LoRA r=16 on our card (see the LoRA sketch after this list).

  3. Do not attempt Kimi-K2.6 – hardware-gated out.

  4. Watch for a "Qwen3.6-VL-8B" or "Qwen3.6-VL-2B" tier; Qwen typically follows a flagship release with VL variants. Based on the Qwen3 → Qwen3-VL cadence, expect these within 2–3 weeks. An 8B-class Qwen3.6-VL would be the single highest-leverage release we could receive this quarter.
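
The LoRA sketch referenced in step 2. This is an assumption-heavy starting point, not our vetted recipe: the target module names follow the usual Qwen projection naming and should be verified against the actual Qwen3.6 module tree before launch:

```python
# Hypothetical LoRA r=16 adapter config for the 27B-BF16 base (peft).
from peft import LoraConfig

lora_cfg = LoraConfig(
    r=16,              # rank sized to the ~40 GB headroom estimate in step 2
    lora_alpha=32,     # common 2x-rank default; tune as usual
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # assumed Qwen-style names
    task_type="CAUSAL_LM",
)
```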


Report generated 2026-04-24. Baseline weighted scores are from wiki-models-contrib/models/eval_all_results.json.
