============================================================
Eval Flow Test (replicates training eval)
============================================================

[1] Loading model...
[Auto-detect] Qwen3-Omni MoE thinker (30.5B total, ~3.3B active)
[FireEcho] Loading /run/media/echo/Echo/ECHO/training/Prototype Fireecho/model/Qwen3-Omni-30B-A3B-Instruct...
[FireEcho] AutoConfig failed ('Qwen3OmniMoeTalkerCodePredictorConfig' object has no attribute 'use_sliding_window'), loading config.json directly
Qwen3-Omni: will stream-load from 15 shards
[Qwen3 Streaming] Loaded shard index: 28010 keys across 15 shards
[Qwen3 Streaming] Building engine skeleton...
[Qwen3 Streaming] Global params on GPU: 1.2 GB
  Layer 4/48: 393 weights, VRAM 2.8 GB, CPU 1.4 GB
  Layer 8/48: 393 weights, VRAM 4.3 GB, CPU 1.6 GB
  Layer 12/48: 393 weights, VRAM 5.8 GB, CPU 1.7 GB
  Layer 16/48: 393 weights, VRAM 7.3 GB, CPU 1.9 GB
  Layer 20/48: 393 weights, VRAM 8.9 GB, CPU 2.0 GB
  Layer 24/48: 393 weights, VRAM 10.4 GB, CPU 2.2 GB
  Layer 28/48: 393 weights, VRAM 11.9 GB, CPU 2.3 GB
  Layer 32/48: 393 weights, VRAM 13.4 GB, CPU 2.5 GB
  Layer 36/48: 393 weights, VRAM 15.0 GB, CPU 2.6 GB
  Layer 40/48: 393 weights, VRAM 16.5 GB, CPU 2.8 GB
  Layer 44/48: 393 weights, VRAM 18.0 GB, CPU 2.9 GB
  Layer 48/48: 393 weights, VRAM 19.5 GB, CPU 3.1 GB
[Qwen3 Streaming] Final VRAM: 19.5 GB (FP4 quantized)
[Qwen3 Streaming] Done: 1571.8M params, 18867 weights loaded
Total params: 1.57B
Frozen params: 1.54B (base model, FP4)
Trainable params: 30.2M (Hebbian only)
[Flat KV] Enabled: 4096 tokens, 403 MB
[Packed MoE] 48 layers packed (6144 experts → contiguous)

[2] Enabling EAGLE (no checkpoint)...
[FE-XT] Draft head: D=8, 356.5M params, 713 MB, capture layers [8, 24, 47] + Hebbian memory

[3] Loading checkpoint separately (like training script)...
[EAGLE] Loaded legacy D=2 checkpoint. 54 new layer params initialized randomly.
Loaded checkpoint (step 4000)
VRAM: 21.25 GB

[4a] Running manual speculation test WITHOUT warmup...
--- Manual speculation test ---
Prefill logits: has_nan=True
FATAL: NaN in prefill! Cannot continue.

[4b] Warmup (3x generate)...
Warmup done

[4c] Running manual speculation test AFTER warmup...
--- Manual speculation test ---
Prefill logits: has_nan=True
FATAL: NaN in prefill! Cannot continue.

[5] Running full speculative_generate eval...
[EAGLE-3] 9 rounds, 43 drafted, 43 accepted (100%), avg 4.8/round
Prompt 0: 61 tokens, 21.3 tok/s
Output: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
WARNING: All tokens are the same (0) — likely NaN bug!
[EAGLE-3] 9 rounds, 43 drafted, 43 accepted (100%), avg 4.8/round
Prompt 1: 61 tokens, 32.5 tok/s
Output: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
WARNING: All tokens are the same (0) — likely NaN bug!
[EAGLE-3] 9 rounds, 43 drafted, 43 accepted (100%), avg 4.8/round
Prompt 2: 61 tokens, 31.7 tok/s
Output: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
WARNING: All tokens are the same (0) — likely NaN bug!

============================================================
Done
============================================================