| ============================================================ |
| EAGLE-3 Draft Head Training — OFFLINE mode |
| ============================================================ |
| Epochs: 5 |
| Max samples: 10000 |
| Max seq len: 512 |
| LR: 0.0003, warmup: 300 |
| Draft depth (K): 5 |
| Grad accum: 2, clip: 0.5 |
| Capture layers: (8, 24, 47) |
| Head layers: 8 |
| Loss type: fwd_kl |
| Focal gamma: 2.0 |
| Top-K logits: 64 |
| Flatness filter: 100% |
| Precompute dir: /run/media/echo/Echo/ECHO/training/Prototype Fireecho/tool/kernel/FireEcho Engine/eagle_precomputed |
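The header fixes LR 3e-4, warmup 300, and grad accum 2. With accumulation, each optimizer step covers two logged steps, which matches the ramp below from 5.00e-05 at logged step 4100 (50 optimizer steps after the step-4000 resume) to 3.00e-04 at step 4600. A minimal sketch of a linear-warmup + cosine-decay schedule consistent with those values; the `total_steps` count and the exact decay shape are assumptions, not read from the log:

```python
import math

def lr_at(opt_step, base_lr=3e-4, warmup=300, total_steps=6943):
    """Linear warmup then cosine decay to zero (assumed schedule shape).

    total_steps is a guess: 5 epochs x 2777 samples / grad-accum 2.
    """
    if opt_step < warmup:
        return base_lr * opt_step / warmup
    progress = (opt_step - warmup) / max(1, total_steps - warmup)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

print(f"{lr_at(50):.2e}")   # 5.00e-05, matching the step-4100 log line
print(f"{lr_at(300):.2e}")  # 3.00e-04, matching the step-4600 log line
```

The later logged values (2.99e-04 at step 5000, decaying toward 2.04e-04 by step 10000) are roughly, though not exactly, consistent with this curve, so the real decay horizon likely differs somewhat.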
|
|
| [1/4] Loading model... |
| [Auto-detect] Qwen3-Omni MoE thinker (30.5B total, ~3.3B active) |
| [FireEcho] Loading /run/media/echo/Echo/ECHO/training/Prototype Fireecho/model/Qwen3-Omni-30B-A3B-Instruct... |
| [FireEcho] AutoConfig failed ('Qwen3OmniMoeTalkerCodePredictorConfig' object has no attribute 'use_sliding_window'), loading config.json directly |
| Qwen3-Omni: will stream-load from 15 shards |
| [Qwen3 Streaming] Loaded shard index: 28010 keys across 15 shards |
| [Qwen3 Streaming] Building engine skeleton... |
| [Qwen3 Streaming] Global params on GPU: 1.2 GB |
| Layer 4/48: 393 weights, VRAM 2.8 GB, CPU 1.4 GB |
| Layer 8/48: 393 weights, VRAM 4.3 GB, CPU 1.6 GB |
| Layer 12/48: 393 weights, VRAM 5.8 GB, CPU 1.7 GB |
| Layer 16/48: 393 weights, VRAM 7.4 GB, CPU 1.9 GB |
| Layer 20/48: 393 weights, VRAM 8.9 GB, CPU 2.0 GB |
| Layer 24/48: 393 weights, VRAM 10.4 GB, CPU 2.2 GB |
| Layer 28/48: 393 weights, VRAM 11.9 GB, CPU 2.3 GB |
| Layer 32/48: 393 weights, VRAM 13.5 GB, CPU 2.5 GB |
| Layer 36/48: 393 weights, VRAM 15.0 GB, CPU 2.6 GB |
| Layer 40/48: 393 weights, VRAM 16.5 GB, CPU 2.8 GB |
| Layer 44/48: 393 weights, VRAM 18.0 GB, CPU 2.9 GB |
| Layer 48/48: 393 weights, VRAM 19.6 GB, CPU 3.1 GB |
| [Qwen3 Streaming] Final VRAM: 19.6 GB (FP4 quantized) |
| [Qwen3 Streaming] Done: 1571.8M params, 18867 weights loaded |
| Total params: 1.57B |
| Frozen params: 1.54B (base model, FP4) |
| Trainable params: 30.2M (Hebbian only) |
| [Flat KV] Enabled: 4096 tokens, 403 MB |
| [Packed MoE] 48 layers packed (6144 experts → contiguous) |
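The 15-shard stream load above walks a safetensors-style weight index (28010 keys) shard by shard, so only one shard needs to be resident at a time. A sketch of the grouping step, assuming the standard `model.safetensors.index.json` layout (`weight_map`: tensor name → shard file); the filenames in the comment are illustrative:

```python
import json
from collections import defaultdict

def group_keys_by_shard(index_path):
    # safetensors index format:
    # {"weight_map": {"model.layers.0.q_proj.weight": "model-00001-of-00015.safetensors", ...}}
    with open(index_path) as f:
        weight_map = json.load(f)["weight_map"]
    shards = defaultdict(list)
    for tensor_name, shard_file in weight_map.items():
        shards[shard_file].append(tensor_name)
    # load each shard once, materialize (and quantize) its tensors, then free it
    return shards
```

Processing shards one at a time keeps peak host memory near a single shard instead of the full checkpoint, which is consistent with the CPU column above staying near 3 GB.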
|
|
| [2/4] Enabling EAGLE-3 draft head... |
| [FE-XT] Draft head: D=8, 356.5M params, 713 MB, capture layers [8, 24, 47] + Hebbian memory |
| Trainable eagle params: 356.5M |
| [EAGLE] Loaded legacy D=2 checkpoint. 54 new layer params initialized randomly. |
| [Checkpoint] Optimizer state mismatch (head resized?), skipping. |
| [Checkpoint] Resumed from step 4000 (loss=5.0967) |
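The resume above keeps the draft-head weights but drops the optimizer state because the head grew from D=2 to D=8, invalidating per-parameter moments (e.g. Adam m/v). A minimal sketch of that guard using plain dicts keyed by parameter name; the shape-comparison rule is an assumption about what the real (presumably torch-based) code checks:

```python
def try_restore_optimizer(live_shapes, ckpt_shapes):
    """Return True if the checkpointed optimizer state is safe to load.

    live_shapes / ckpt_shapes: {param_name: tuple_of_dims}. Any resized or
    renamed parameter makes the saved per-parameter moments meaningless.
    """
    if live_shapes != ckpt_shapes:
        print("[Checkpoint] Optimizer state mismatch (head resized?), skipping.")
        return False
    return True
```

Restarting the optimizer from scratch also explains why the LR warms up again from 5.00e-05 right after the resume.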
|
|
| [3/4] Loading external dataset... |
| Loading cached dataset from /run/media/echo/Echo/ECHO/training/Prototype Fireecho/tool/kernel/FireEcho Engine/eagle_data_codemix_cache.pt... |
| Loaded 10000 samples. |
|
|
| [OFFLINE] Loading precomputed features from /run/media/echo/Echo/ECHO/training/Prototype Fireecho/tool/kernel/FireEcho Engine/eagle_precomputed... |
| 2777 samples available |
|
|
| [OFFLINE] Starting training... |
| VRAM before training: 20.66 GB |
| [EAGLE-3] 27 rounds, 131 drafted, 5 accepted (4%), avg 0.2/round |
| [EAGLE-3] 29 rounds, 141 drafted, 1 accepted (1%), avg 0.0/round |
| [EAGLE-3] 29 rounds, 139 drafted, 2 accepted (1%), avg 0.1/round |
| [Eval @ step 4000] 180 tokens in 17.2s = 10.5 tok/s |
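Each [EAGLE-3] line above summarizes one eval generation: e.g. 5 accepted out of 131 drafted over 27 rounds gives 4% and 0.2 accepted per round. A sketch of the arithmetic behind the line (the rounding/formatting choices are inferred from the logged values):

```python
def eagle_stats_line(rounds, drafted, accepted):
    # assumed formatting: percentage rounded to an int, per-round average to 1 decimal
    pct = round(100 * accepted / drafted)
    per_round = accepted / rounds
    return (f"[EAGLE-3] {rounds} rounds, {drafted} drafted, "
            f"{accepted} accepted ({pct}%), avg {per_round:.1f}/round")

print(eagle_stats_line(27, 131, 5))
# [EAGLE-3] 27 rounds, 131 drafted, 5 accepted (4%), avg 0.2/round
```

At these acceptance rates (0-4% with draft depth K=5), speculative decoding is barely paying off yet, which is consistent with the modest 10-17 tok/s eval throughput.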
| Step 4100 | epoch 1/5 | loss=2.8709 | avg=4.6042 | acc=31.2% | lr=5.00e-05 | pos=64 |
| Step 4200 | epoch 1/5 | loss=3.2780 | avg=4.6526 | acc=35.3% | lr=1.00e-04 | pos=64 |
| Step 4300 | epoch 1/5 | loss=5.3967 | avg=4.6339 | acc=17.5% | lr=1.50e-04 | pos=64 |
| Step 4400 | epoch 1/5 | loss=5.6657 | avg=4.7462 | acc=12.8% | lr=2.00e-04 | pos=64 |
| Step 4500 | epoch 1/5 | loss=5.9773 | avg=4.8205 | acc=9.4% | lr=2.50e-04 | pos=64 |
| Step 4600 | epoch 1/5 | loss=5.4029 | avg=4.8950 | acc=16.9% | lr=3.00e-04 | pos=64 |
| Step 4700 | epoch 1/5 | loss=5.2982 | avg=4.9767 | acc=9.4% | lr=3.00e-04 | pos=64 |
| Step 4800 | epoch 1/5 | loss=5.0728 | avg=5.0216 | acc=12.2% | lr=3.00e-04 | pos=64 |
| Step 4900 | epoch 1/5 | loss=6.8400 | avg=5.0394 | acc=13.1% | lr=3.00e-04 | pos=64 |
| Step 5000 | epoch 1/5 | loss=5.1369 | avg=5.0459 | acc=16.2% | lr=2.99e-04 | pos=64 |
| [EAGLE-3] 30 rounds, 144 drafted, 1 accepted (1%), avg 0.0/round |
| [EAGLE-3] 29 rounds, 141 drafted, 1 accepted (1%), avg 0.0/round |
| [EAGLE-3] 30 rounds, 144 drafted, 0 accepted (0%), avg 0.0/round |
| [Eval @ step 5000] 181 tokens in 10.9s = 16.6 tok/s |
| [Checkpoint] Saved step 5000 (loss=5.1369) → /run/media/echo/Echo/ECHO/training/Prototype Fireecho/tool/kernel/FireEcho Engine/eagle_checkpoints/eagle_best.pt |
| [Best] New best tok/s: 16.6 (step 5000) |
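Note that eagle_best.pt is selected by eval tok/s, not by loss: at step 6000 below it is overwritten despite a higher loss (6.3429 vs 5.1369). A sketch of that selection rule as it appears from the log; the criterion is inferred, not confirmed:

```python
def update_best(best_tok_s, tok_s, step):
    # best checkpoint tracks decode throughput, not training loss (inferred from the log)
    if tok_s > best_tok_s:
        print(f"[Best] New best tok/s: {tok_s} (step {step})")
        return tok_s, True   # caller then saves eagle_best.pt
    return best_tok_s, False
```

Throughput is the sensible target here: the draft head exists to speed decoding, and its loss correlates only loosely with acceptance rate.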
| Step 5100 | epoch 1/5 | loss=5.3802 | avg=5.0351 | acc=16.2% | lr=2.99e-04 | pos=64 |
| Step 5200 | epoch 1/5 | loss=4.6753 | avg=4.9773 | acc=20.3% | lr=2.99e-04 | pos=64 |
| Step 5300 | epoch 1/5 | loss=4.3068 | avg=4.9713 | acc=24.4% | lr=2.98e-04 | pos=64 |
| Step 5400 | epoch 1/5 | loss=3.0352 | avg=4.9536 | acc=30.0% | lr=2.98e-04 | pos=64 |
| Step 5500 | epoch 1/5 | loss=4.8197 | avg=4.9954 | acc=21.9% | lr=2.97e-04 | pos=64 |
| Step 5600 | epoch 1/5 | loss=3.4431 | avg=5.0006 | acc=26.2% | lr=2.96e-04 | pos=64 |
| Step 5700 | epoch 1/5 | loss=3.6114 | avg=5.0065 | acc=22.8% | lr=2.95e-04 | pos=64 |
| Step 5800 | epoch 1/5 | loss=5.0362 | avg=4.9796 | acc=17.8% | lr=2.95e-04 | pos=64 |
| Step 5900 | epoch 1/5 | loss=5.8618 | avg=4.9976 | acc=8.4% | lr=2.94e-04 | pos=64 |
| Step 6000 | epoch 1/5 | loss=6.3429 | avg=4.9858 | acc=11.2% | lr=2.93e-04 | pos=64 |
| [EAGLE-3] 30 rounds, 144 drafted, 0 accepted (0%), avg 0.0/round |
| [EAGLE-3] 29 rounds, 141 drafted, 1 accepted (1%), avg 0.0/round |
| [EAGLE-3] 29 rounds, 141 drafted, 1 accepted (1%), avg 0.0/round |
| [Eval @ step 6000] 180 tokens in 10.5s = 17.1 tok/s |
| [Checkpoint] Saved step 6000 (loss=6.3429) → /run/media/echo/Echo/ECHO/training/Prototype Fireecho/tool/kernel/FireEcho Engine/eagle_checkpoints/eagle_best.pt |
| [Best] New best tok/s: 17.1 (step 6000) |
| [Checkpoint] Saved step 6000 (loss=6.3429) → /run/media/echo/Echo/ECHO/training/Prototype Fireecho/tool/kernel/FireEcho Engine/eagle_checkpoints/eagle_step6000.pt |
| Step 6100 | epoch 1/5 | loss=6.3301 | avg=4.9179 | acc=11.6% | lr=2.92e-04 | pos=64 |
| Step 6200 | epoch 1/5 | loss=4.4811 | avg=4.8956 | acc=19.4% | lr=2.90e-04 | pos=64 |
| Step 6300 | epoch 1/5 | loss=5.5715 | avg=4.9178 | acc=16.9% | lr=2.89e-04 | pos=64 |
| Step 6400 | epoch 1/5 | loss=3.3082 | avg=4.8940 | acc=28.7% | lr=2.88e-04 | pos=64 |
| Step 6500 | epoch 1/5 | loss=4.5000 | avg=4.9460 | acc=20.0% | lr=2.87e-04 | pos=64 |
| Step 6600 | epoch 1/5 | loss=4.0213 | avg=4.9359 | acc=18.8% | lr=2.85e-04 | pos=64 |
| Step 6700 | epoch 1/5 | loss=4.2572 | avg=4.9256 | acc=31.2% | lr=2.84e-04 | pos=64 |
| --- Epoch 1/5 complete (step 6777) --- |
| Step 6800 | epoch 2/5 | loss=3.7218 | avg=4.8991 | acc=24.1% | lr=2.82e-04 | pos=64 |
| Step 6900 | epoch 2/5 | loss=4.7880 | avg=4.8843 | acc=19.7% | lr=2.81e-04 | pos=64 |
| Step 7000 | epoch 2/5 | loss=5.4015 | avg=4.8636 | acc=9.7% | lr=2.79e-04 | pos=64 |
| [EAGLE-3] 29 rounds, 141 drafted, 1 accepted (1%), avg 0.0/round |
| [EAGLE-3] 30 rounds, 144 drafted, 0 accepted (0%), avg 0.0/round |
| [FE-MX] Expert tiers: 26 cold(FP4) / 61 warm(FP6) / 41 hot(FP8) |
| [FE-MX] Expert tiers: 24 cold(FP4) / 66 warm(FP6) / 38 hot(FP8) |
| [FE-MX] Expert tiers: 45 cold(FP4) / 43 warm(FP6) / 40 hot(FP8) |
| [FE-MX] Expert tiers: 40 cold(FP4) / 53 warm(FP6) / 35 hot(FP8) |
| [FE-MX] Expert tiers: 48 cold(FP4) / 46 warm(FP6) / 34 hot(FP8) |
| [FE-MX] Expert tiers: 47 cold(FP4) / 46 warm(FP6) / 35 hot(FP8) |
| [FE-MX] Expert tiers: 66 cold(FP4) / 32 warm(FP6) / 30 hot(FP8) |
| [FE-MX] Expert tiers: 67 cold(FP4) / 29 warm(FP6) / 32 hot(FP8) |
| [FE-MX] Expert tiers: 55 cold(FP4) / 42 warm(FP6) / 31 hot(FP8) |
| [FE-MX] Expert tiers: 50 cold(FP4) / 48 warm(FP6) / 30 hot(FP8) |
| [FE-MX] Expert tiers: 46 cold(FP4) / 47 warm(FP6) / 35 hot(FP8) |
| [FE-MX] Expert tiers: 40 cold(FP4) / 52 warm(FP6) / 36 hot(FP8) |
| [FE-MX] Expert tiers: 49 cold(FP4) / 48 warm(FP6) / 31 hot(FP8) |
| [FE-MX] Expert tiers: 49 cold(FP4) / 43 warm(FP6) / 36 hot(FP8) |
| [FE-MX] Expert tiers: 46 cold(FP4) / 42 warm(FP6) / 40 hot(FP8) |
| [FE-MX] Expert tiers: 51 cold(FP4) / 46 warm(FP6) / 31 hot(FP8) |
| [FE-MX] Expert tiers: 54 cold(FP4) / 39 warm(FP6) / 35 hot(FP8) |
| [FE-MX] Expert tiers: 51 cold(FP4) / 45 warm(FP6) / 32 hot(FP8) |
| [FE-MX] Expert tiers: 69 cold(FP4) / 30 warm(FP6) / 29 hot(FP8) |
| [FE-MX] Expert tiers: 77 cold(FP4) / 25 warm(FP6) / 26 hot(FP8) |
| [FE-MX] Expert tiers: 53 cold(FP4) / 45 warm(FP6) / 30 hot(FP8) |
| [FE-MX] Expert tiers: 52 cold(FP4) / 45 warm(FP6) / 31 hot(FP8) |
| [FE-MX] Expert tiers: 52 cold(FP4) / 41 warm(FP6) / 35 hot(FP8) |
| [FE-MX] Expert tiers: 47 cold(FP4) / 50 warm(FP6) / 31 hot(FP8) |
| [FE-MX] Expert tiers: 52 cold(FP4) / 47 warm(FP6) / 29 hot(FP8) |
| [FE-MX] Expert tiers: 49 cold(FP4) / 49 warm(FP6) / 30 hot(FP8) |
| [FE-MX] Expert tiers: 52 cold(FP4) / 40 warm(FP6) / 36 hot(FP8) |
| [FE-MX] Expert tiers: 54 cold(FP4) / 45 warm(FP6) / 29 hot(FP8) |
| [FE-MX] Expert tiers: 52 cold(FP4) / 42 warm(FP6) / 34 hot(FP8) |
| [FE-MX] Expert tiers: 55 cold(FP4) / 41 warm(FP6) / 32 hot(FP8) |
| [FE-MX] Expert tiers: 71 cold(FP4) / 30 warm(FP6) / 27 hot(FP8) |
| [FE-MX] Expert tiers: 77 cold(FP4) / 23 warm(FP6) / 28 hot(FP8) |
| [FE-MX] Expert tiers: 55 cold(FP4) / 41 warm(FP6) / 32 hot(FP8) |
| [FE-MX] Expert tiers: 49 cold(FP4) / 48 warm(FP6) / 31 hot(FP8) |
| [FE-MX] Expert tiers: 45 cold(FP4) / 48 warm(FP6) / 35 hot(FP8) |
| [FE-MX] Expert tiers: 40 cold(FP4) / 52 warm(FP6) / 36 hot(FP8) |
| [FE-MX] Expert tiers: 53 cold(FP4) / 44 warm(FP6) / 31 hot(FP8) |
| [FE-MX] Expert tiers: 44 cold(FP4) / 52 warm(FP6) / 32 hot(FP8) |
| [FE-MX] Expert tiers: 51 cold(FP4) / 39 warm(FP6) / 38 hot(FP8) |
| [FE-MX] Expert tiers: 51 cold(FP4) / 41 warm(FP6) / 36 hot(FP8) |
| [FE-MX] Expert tiers: 57 cold(FP4) / 29 warm(FP6) / 42 hot(FP8) |
| [FE-MX] Expert tiers: 55 cold(FP4) / 38 warm(FP6) / 35 hot(FP8) |
| [FE-MX] Expert tiers: 55 cold(FP4) / 33 warm(FP6) / 40 hot(FP8) |
| [FE-MX] Expert tiers: 53 cold(FP4) / 38 warm(FP6) / 37 hot(FP8) |
| [FE-MX] Expert tiers: 61 cold(FP4) / 31 warm(FP6) / 36 hot(FP8) |
| [FE-MX] Expert tiers: 58 cold(FP4) / 34 warm(FP6) / 36 hot(FP8) |
| [FE-MX] Expert tiers: 46 cold(FP4) / 48 warm(FP6) / 34 hot(FP8) |
| [FE-MX] Expert tiers: 41 cold(FP4) / 51 warm(FP6) / 36 hot(FP8) |
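Each [FE-MX] line above re-tiers one layer's 128 experts (48 layers × 128 = the 6144 packed experts reported at load time); the three counts always sum to 128, with frequently routed experts promoted to FP8 and rarely routed ones demoted to FP4. A sketch of one plausible rank-based policy; the actual statistic and thresholds are assumptions, and the drifting logged counts suggest the real policy uses dynamic thresholds rather than fixed ranks:

```python
def tier_layer_experts(route_counts, hot=32, warm=48):
    """Assign experts to precision tiers by routing frequency (assumed policy).

    Top `hot` experts -> FP8, next `warm` -> FP6, remainder -> FP4 (cold).
    """
    order = sorted(range(len(route_counts)),
                   key=lambda i: route_counts[i], reverse=True)
    tiers = {}
    for rank, idx in enumerate(order):
        tiers[idx] = "FP8" if rank < hot else "FP6" if rank < hot + warm else "FP4"
    return tiers
```

With 128 experts and these defaults the split is 32 hot / 48 warm / 48 cold, i.e. hot experts keep the most precision while cold ones stay at the FP4 baseline the model was loaded in.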
| [EAGLE-3] 30 rounds, 144 drafted, 0 accepted (0%), avg 0.0/round |
| [Eval @ step 7000] 180 tokens in 10.7s = 16.9 tok/s |
| Step 7100 | epoch 2/5 | loss=3.9199 | avg=4.8484 | acc=32.5% | lr=2.77e-04 | pos=64 |
| Step 7200 | epoch 2/5 | loss=4.4965 | avg=4.6926 | acc=23.1% | lr=2.75e-04 | pos=64 |
| Step 7300 | epoch 2/5 | loss=4.1791 | avg=4.6618 | acc=20.9% | lr=2.73e-04 | pos=64 |
| Step 7400 | epoch 2/5 | loss=3.6816 | avg=4.6057 | acc=22.2% | lr=2.71e-04 | pos=64 |
| Step 7500 | epoch 2/5 | loss=5.8260 | avg=4.5923 | acc=5.9% | lr=2.69e-04 | pos=64 |
| Step 7600 | epoch 2/5 | loss=4.9514 | avg=4.5939 | acc=18.4% | lr=2.67e-04 | pos=64 |
| Step 7700 | epoch 2/5 | loss=3.7191 | avg=4.6118 | acc=22.8% | lr=2.65e-04 | pos=64 |
| Step 7800 | epoch 2/5 | loss=4.6762 | avg=4.5979 | acc=19.1% | lr=2.63e-04 | pos=64 |
| Step 7900 | epoch 2/5 | loss=5.7284 | avg=4.5778 | acc=15.6% | lr=2.61e-04 | pos=64 |
| Step 8000 | epoch 2/5 | loss=5.9431 | avg=4.5689 | acc=4.7% | lr=2.59e-04 | pos=64 |
| [EAGLE-3] 29 rounds, 141 drafted, 1 accepted (1%), avg 0.0/round |
| [EAGLE-3] 29 rounds, 141 drafted, 1 accepted (1%), avg 0.0/round |
| [EAGLE-3] 29 rounds, 141 drafted, 1 accepted (1%), avg 0.0/round |
| [Eval @ step 8000] 180 tokens in 10.7s = 16.8 tok/s |
| [Checkpoint] Saved step 8000 (loss=5.9431) → /run/media/echo/Echo/ECHO/training/Prototype Fireecho/tool/kernel/FireEcho Engine/eagle_checkpoints/eagle_step8000.pt |
| Step 8100 | epoch 2/5 | loss=3.5748 | avg=4.4854 | acc=27.5% | lr=2.56e-04 | pos=64 |
| Step 8200 | epoch 2/5 | loss=3.9363 | avg=4.5077 | acc=32.5% | lr=2.54e-04 | pos=64 |
| Step 8300 | epoch 2/5 | loss=2.7494 | avg=4.4987 | acc=37.8% | lr=2.52e-04 | pos=64 |
| Step 8400 | epoch 2/5 | loss=4.1517 | avg=4.5172 | acc=25.0% | lr=2.49e-04 | pos=64 |
| Step 8500 | epoch 2/5 | loss=5.5557 | avg=4.4605 | acc=10.9% | lr=2.47e-04 | pos=64 |
| Step 8600 | epoch 2/5 | loss=2.5267 | avg=4.4706 | acc=31.6% | lr=2.44e-04 | pos=64 |
| Step 8700 | epoch 2/5 | loss=5.7917 | avg=4.4517 | acc=12.5% | lr=2.41e-04 | pos=64 |
| Step 8800 | epoch 2/5 | loss=5.8896 | avg=4.4381 | acc=12.5% | lr=2.39e-04 | pos=64 |
| Step 8900 | epoch 2/5 | loss=4.0428 | avg=4.4427 | acc=24.4% | lr=2.36e-04 | pos=64 |
| Step 9000 | epoch 2/5 | loss=5.2436 | avg=4.4426 | acc=9.7% | lr=2.33e-04 | pos=64 |
| [EAGLE-3] 30 rounds, 144 drafted, 0 accepted (0%), avg 0.0/round |
| [EAGLE-3] 30 rounds, 144 drafted, 0 accepted (0%), avg 0.0/round |
| [EAGLE-3] 30 rounds, 144 drafted, 0 accepted (0%), avg 0.0/round |
| [Eval @ step 9000] 180 tokens in 10.9s = 16.6 tok/s |
| Step 9100 | epoch 2/5 | loss=5.9143 | avg=4.2725 | acc=7.2% | lr=2.30e-04 | pos=64 |
| Step 9200 | epoch 2/5 | loss=5.3081 | avg=4.2707 | acc=12.8% | lr=2.28e-04 | pos=64 |
| Step 9300 | epoch 2/5 | loss=5.3774 | avg=4.3151 | acc=14.7% | lr=2.25e-04 | pos=64 |
| Step 9400 | epoch 2/5 | loss=5.7517 | avg=4.3221 | acc=17.8% | lr=2.22e-04 | pos=64 |
| Step 9500 | epoch 2/5 | loss=2.6826 | avg=4.3317 | acc=34.1% | lr=2.19e-04 | pos=64 |
| --- Epoch 2/5 complete (step 9554) --- |
| Step 9600 | epoch 3/5 | loss=4.7292 | avg=4.2845 | acc=20.9% | lr=2.16e-04 | pos=64 |
| Step 9700 | epoch 3/5 | loss=4.1688 | avg=4.2683 | acc=24.1% | lr=2.13e-04 | pos=64 |
| Step 9800 | epoch 3/5 | loss=4.5375 | avg=4.2397 | acc=21.9% | lr=2.10e-04 | pos=64 |
| Step 9900 | epoch 3/5 | loss=5.2854 | avg=4.2331 | acc=14.1% | lr=2.07e-04 | pos=64 |
| Step 10000 | epoch 3/5 | loss=4.0904 | avg=4.2228 | acc=25.3% | lr=2.04e-04 | pos=64 |
| [EAGLE-3] 29 rounds, 141 drafted, 1 accepted (1%), avg 0.0/round |
| [EAGLE-3] 29 rounds, 141 drafted, 1 accepted (1%), avg 0.0/round |
| [EAGLE-3] 29 rounds, 141 drafted, 1 accepted (1%), avg 0.0/round |
| [Eval @ step 10000] 180 tokens in 10.7s = 16.9 tok/s |
| [Checkpoint] Saved step 10000 (loss=4.0904) → /run/media/echo/Echo/ECHO/training/Prototype Fireecho/tool/kernel/FireEcho Engine/eagle_checkpoints/eagle_step10000.pt |
| Step 10100 | epoch 3/5 | loss=3.7871 | avg=3.9878 | acc=30.9% | lr=2.01e-04 | pos=64 |
| Step 10200 | epoch 3/5 | loss=2.2971 | avg=4.0641 | acc=37.8% | lr=1.98e-04 | pos=64 |
| Step 10300 | epoch 3/5 | loss=5.0256 | avg=4.0141 | acc=10.6% | lr=1.95e-04 | pos=64 |
| Step 10400 | epoch 3/5 | loss=5.8723 | avg=4.0130 | acc=10.3% | lr=1.92e-04 | pos=64 |
| Step 10500 | epoch 3/5 | loss=2.2164 | avg=3.9910 | acc=37.5% | lr=1.89e-04 | pos=64 |
|
|