| ============================================================ | |
| D=8 NaN Isolation | |
| ============================================================ | |
| [1] Loading model... | |
| [Auto-detect] Qwen3-Omni MoE thinker (30.5B total, ~3.3B active) | |
| [FireEcho] Loading /run/media/echo/Echo/ECHO/training/Prototype Fireecho/model/Qwen3-Omni-30B-A3B-Instruct... | |
| [FireEcho] AutoConfig failed ('Qwen3OmniMoeTalkerCodePredictorConfig' object has no attribute 'use_sliding_window'), loading config.json directly | |
| Qwen3-Omni: will stream-load from 15 shards | |
| [Qwen3 Streaming] Loaded shard index: 28010 keys across 15 shards | |
| [Qwen3 Streaming] Building engine skeleton... | |
| [Qwen3 Streaming] Global params on GPU: 1.2 GB | |
| Layer 4/48: 393 weights, VRAM 2.8 GB, CPU 1.4 GB | |
| Layer 8/48: 393 weights, VRAM 4.3 GB, CPU 1.6 GB | |
| Layer 12/48: 393 weights, VRAM 5.8 GB, CPU 1.7 GB | |
| Layer 16/48: 393 weights, VRAM 7.4 GB, CPU 1.9 GB | |
| Layer 20/48: 393 weights, VRAM 8.9 GB, CPU 2.0 GB | |
| Layer 24/48: 393 weights, VRAM 10.4 GB, CPU 2.2 GB | |
| Layer 28/48: 393 weights, VRAM 11.9 GB, CPU 2.3 GB | |
| Layer 32/48: 393 weights, VRAM 13.5 GB, CPU 2.5 GB | |
| Layer 36/48: 393 weights, VRAM 15.0 GB, CPU 2.6 GB | |
| Layer 40/48: 393 weights, VRAM 16.5 GB, CPU 2.8 GB | |
| Layer 44/48: 393 weights, VRAM 18.0 GB, CPU 2.9 GB | |
| Layer 48/48: 393 weights, VRAM 19.6 GB, CPU 3.1 GB | |
| [Qwen3 Streaming] Final VRAM: 19.6 GB (FP4 quantized) | |
| [Qwen3 Streaming] Done: 1571.8M params, 18867 weights loaded | |
| Total params: 1.57B | |
| Frozen params: 1.54B (base model, FP4) | |
| Trainable params: 30.2M (Hebbian only) | |
| [Packed MoE] 48 layers packed (6144 experts -> contiguous) | |
| [Flat KV] Enabled: 4096 tokens, 403 MB | |
| [2] Warmup... | |
| VRAM baseline: 19.96 GB | |
| [3] Baseline (no eagle)... | |
| [baseline] OK - top=13048 ('Hi') | |
| [4] D=2 eagle head... | |
| [EAGLE] Loaded legacy D=2 checkpoint. 0 new layer params initialized randomly. | |
| [EAGLE-3] Draft head: D=2, 104.9M params, 210 MB, capture layers [8, 24, 47] + Hebbian memory | |
| VRAM: 20.17 GB (+0.21) | |
| [D=2] OK - top=13048 ('Hi') | |
| [5] D=8 eagle head (random init, no checkpoint)... | |
| [FE-XT] Draft head: D=8, 356.5M params, 713 MB, capture layers [8, 24, 47] + Hebbian memory | |
| VRAM: 20.67 GB (+0.72) | |
| [D=8 random] OK - top=13048 ('Hi') | |
| [6] D=8 eagle head (with checkpoint)... | |
| [EAGLE] Loaded legacy D=2 checkpoint. 54 new layer params initialized randomly. | |
| [FE-XT] Draft head: D=8, 356.5M params, 713 MB, capture layers [8, 24, 47] + Hebbian memory | |
| VRAM: 20.67 GB (+0.72) | |
| [D=8 with ckpt] OK - top=13048 ('Hi') | |
| [7] D=8 eagle head (allocated, NOT registered on engine)... | |
| VRAM: 20.67 GB (+0.72) | |
| [D=8 unregistered] OK - top=13048 ('Hi') | |
| [8] D=4 eagle head (checkpoint)... | |
| [EAGLE] Loaded legacy D=2 checkpoint. 18 new layer params initialized randomly. | |
| [FE-XT] Draft head: D=4, 188.8M params, 378 MB, capture layers [8, 24, 47] + Hebbian memory | |
| VRAM: 20.34 GB (+0.38) | |
| [D=4] OK - top=13048 ('Hi') | |
| [9] D=8 eagle head, but _eagle_enabled=False... | |
| [EAGLE] Loaded legacy D=2 checkpoint. 54 new layer params initialized randomly. | |
| [FE-XT] Draft head: D=8, 356.5M params, 713 MB, capture layers [8, 24, 47] + Hebbian memory | |
| VRAM: 20.67 GB (+0.72) | |
| [D=8 flag OFF] OK - top=13048 ('Hi') | |
| ============================================================ | |
| RESULTS | |
| ============================================================ | |
| D=8 random: OK | |
| D=8 with ckpt: OK | |
| D=8 unregistered: OK | |
| D=4: OK | |
| D=8 flag OFF: OK | |