============================================================
NaN Isolation Test
============================================================
[1/6] Loading model...
[Auto-detect] Qwen3-Omni MoE thinker (30.5B total, ~3.3B active)
[FireEcho] Loading /run/media/echo/Echo/ECHO/training/Prototype Fireecho/model/Qwen3-Omni-30B-A3B-Instruct...
[FireEcho] AutoConfig failed ('Qwen3OmniMoeTalkerCodePredictorConfig' object has no attribute 'use_sliding_window'), loading config.json directly
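[Note] The config fallback above follows a try/except pattern like the sketch below; this is a minimal illustration assuming the model directory from the log, with load_config as a hypothetical helper, not the FireEcho loader's actual code.

    import json
    import os

    from transformers import AutoConfig

    def load_config(model_dir: str):
        """Try transformers' AutoConfig first, fall back to the raw config.json."""
        try:
            # trust_remote_code is needed for the custom Qwen3-Omni config classes
            return AutoConfig.from_pretrained(model_dir, trust_remote_code=True)
        except AttributeError as err:
            # e.g. 'Qwen3OmniMoeTalkerCodePredictorConfig' object has no attribute 'use_sliding_window'
            print(f"[FireEcho] AutoConfig failed ({err}), loading config.json directly")
            with open(os.path.join(model_dir, "config.json")) as f:
                return json.load(f)  # plain dict instead of a config object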
Qwen3-Omni: will stream-load from 15 shards
[Qwen3 Streaming] Loaded shard index: 28010 keys across 15 shards
[Qwen3 Streaming] Building engine skeleton...
[Qwen3 Streaming] Global params on GPU: 1.2 GB
Layer 4/48: 393 weights, VRAM 2.8 GB, CPU 1.4 GB
Layer 8/48: 393 weights, VRAM 4.3 GB, CPU 1.6 GB
Layer 12/48: 393 weights, VRAM 5.8 GB, CPU 1.7 GB
Layer 16/48: 393 weights, VRAM 7.4 GB, CPU 1.9 GB
Layer 20/48: 393 weights, VRAM 8.9 GB, CPU 2.0 GB
Layer 24/48: 393 weights, VRAM 10.4 GB, CPU 2.2 GB
Layer 28/48: 393 weights, VRAM 11.9 GB, CPU 2.3 GB
Layer 32/48: 393 weights, VRAM 13.5 GB, CPU 2.5 GB
Layer 36/48: 393 weights, VRAM 15.0 GB, CPU 2.6 GB
Layer 40/48: 393 weights, VRAM 16.5 GB, CPU 2.8 GB
Layer 44/48: 393 weights, VRAM 18.0 GB, CPU 2.9 GB
Layer 48/48: 393 weights, VRAM 19.6 GB, CPU 3.1 GB
[Qwen3 Streaming] Final VRAM: 19.6 GB (FP4 quantized)
[Qwen3 Streaming] Done: 1571.8M params, 18867 weights loaded
Total params: 1.57B
Frozen params: 1.54B (base model, FP4)
Trainable params: 30.2M (Hebbian only)
[Packed MoE] 48 layers packed (6144 experts → contiguous)
[Flat KV] Enabled: 4096 tokens, 403 MB
VRAM after load: 19.95 GB
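[Note] The shard streaming reported above can be approximated with the sketch below: read the safetensors index, then load one shard at a time and copy each tensor onto a pre-built module skeleton. This is a minimal sketch assuming a standard model.safetensors.index.json; the FP4 quantization and per-layer GPU/CPU split shown in the log are omitted.

    import json
    import os
    from collections import defaultdict

    import torch
    from safetensors import safe_open

    def stream_load(model_dir: str, model: torch.nn.Module) -> None:
        """Load weights shard by shard so only one shard sits in CPU RAM at a time."""
        with open(os.path.join(model_dir, "model.safetensors.index.json")) as f:
            weight_map = json.load(f)["weight_map"]  # param name -> shard filename

        # Group parameter names by the shard file that holds them.
        by_shard = defaultdict(list)
        for name, shard in weight_map.items():
            by_shard[shard].append(name)

        params = dict(model.named_parameters())
        for shard, names in sorted(by_shard.items()):
            with safe_open(os.path.join(model_dir, shard), framework="pt", device="cpu") as f:
                for name in names:
                    if name in params:
                        # copy straight onto the (GPU-resident) skeleton parameter
                        params[name].data.copy_(f.get_tensor(name))
            torch.cuda.empty_cache()  # drop the shard before touching the next one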
[2/6] Warmup...
[3/6] Test BEFORE enable_eagle()...
[before eagle] OK — top token=13048 ('Hi'), max=26.88
[4/6] Test: just set _eagle_enabled=True (no head creation)...
[flag only] OK — top token=13048 ('Hi'), max=26.88
[5/6] Test: create eagle head + assign as submodule...
VRAM after eagle head: 20.17 GB (+0.22 GB)
[with head (no ckpt)] OK — top token=13048 ('Hi'), max=26.88
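[Note] The VRAM delta in step [5/6] can be measured around the submodule assignment roughly as sketched below, assuming torch.cuda.memory_allocated is the metric behind the log's VRAM lines; the head's real class and constructor are not shown in the log.

    import torch

    def vram_gb() -> float:
        """Current CUDA allocation in GB (assumed metric behind the log's VRAM lines)."""
        return torch.cuda.memory_allocated() / 1024**3

    def attach_eagle_head(model: torch.nn.Module, head: torch.nn.Module) -> None:
        """Assign the head as a submodule and report the VRAM delta, as in step [5/6]."""
        before = vram_gb()
        model.eagle_head = head.to(next(model.parameters()).device)
        print(f"VRAM after eagle head: {vram_gb():.2f} GB (+{vram_gb() - before:.2f} GB)")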
[6/6] Test: load checkpoint into eagle head...
[EAGLE] Loaded legacy D=2 checkpoint. 0 new layer params initialized randomly.
[with ckpt] OK — top token=13048 ('Hi'), max=26.88
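[Note] Each "OK — top token=..., max=..." line corresponds to one forward-pass sanity check on the logits. Below is a minimal sketch of such a check, assuming a Hugging Face-style model/tokenizer interface; the actual harness and prompt used by this test are not shown in the log.

    import torch

    @torch.no_grad()
    def nan_check(model, tokenizer, label: str, prompt: str = "Hello") -> bool:
        """Run one forward pass and verify the next-token logits are all finite."""
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        logits = model(**inputs).logits[0, -1]  # logits for the next token
        if not torch.isfinite(logits).all():
            print(f"[{label}] FAILED: NaN/Inf in logits")
            return False
        top = logits.argmax().item()
        print(f"[{label}] OK — top token={top} ({tokenizer.decode([top])!r}), max={logits.max().item():.2f}")
        return True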
============================================================
RESULTS
============================================================
Before eagle: OK
Flag only: OK
With head (no ckpt): OK
With checkpoint: OK
All tests passed — no NaN detected!