Chickaboo/Pulse88-E-40M-Alpha-Preview

+Warmup configured to 2,110 steps (~0.75 epoch, <= 1.00 epoch cap).
+Step checkpoints configured every 1,000 optimizer steps.
+Launching DDP run:
+OMP_NUM_THREADS per rank: 2
+/usr/local/bin/torchrun --standalone --nnodes=1 --nproc_per_node 2 /kaggle/input/datasets/chickaboomcmurtrie/magic-midi/training/train_variant_e_40m_ddp.py --pretokenized_manifest /kaggle/working/outputs/sub100m_e_40m_100k/source_pretokenized/manifest.json --pretokenized_root /kaggle/input/datasets/chickaboomcmurtrie/pulse88-tokenize-100k --output_dir /kaggle/working/outputs/sub100m_e_40m_100k --max_pieces 100000 --seed 42 --seed_length 512 --continuation_length 1536 --max_sequence_length 2048 --epochs 20 --batch_size 2 --grad_accumulation_steps 8 --learning_rate 0.0002 --warmup_ratio 0.03750444365446143 --warmup_steps 2110 --min_lr_ratio 0.1 --weight_decay 0.01 --label_smoothing 0.1 --max_grad_norm 1.0 --num_workers 4 --log_every_n_steps 50 --save_every_n_steps 1000 --save_every_n_epochs 5 --resume_mode remaining --no_auto_resume
+[W412 20:47:10.890281599 socket.cpp:207] [c10d] The hostname of the client socket cannot be retrieved. err=-3
+/kaggle/input/datasets/chickaboomcmurtrie/magic-midi/model/mamba_block.py:17: UserWarning: mamba-ssm not available (requires CUDA). Using torch reference fallback for development. Install mamba-ssm on Colab/Kaggle for full performance.
+  warnings.warn(
+/kaggle/input/datasets/chickaboomcmurtrie/magic-midi/model/mamba_block.py:21: UserWarning: mamba-ssm import details: No module named 'mamba_ssm'
+  warnings.warn(f"mamba-ssm import details: {exc}")
+/kaggle/input/datasets/chickaboomcmurtrie/magic-midi/model/mamba_block.py:17: UserWarning: mamba-ssm not available (requires CUDA). Using torch reference fallback for development. Install mamba-ssm on Colab/Kaggle for full performance.
+  warnings.warn(
+/kaggle/input/datasets/chickaboomcmurtrie/magic-midi/model/mamba_block.py:21: UserWarning: mamba-ssm import details: No module named 'mamba_ssm'
+  warnings.warn(f"mamba-ssm import details: {exc}")
+[W412 20:47:19.363074762 socket.cpp:207] [c10d] The hostname of the client socket cannot be retrieved. err=-3
+[W412 20:47:19.369976685 socket.cpp:207] [c10d] The hostname of the client socket cannot be retrieved. err=-3
+/usr/local/lib/python3.12/dist-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
+  return func(*args, **kwargs)
+[rank0]:[W412 20:47:19.400861194 ProcessGroupNCCL.cpp:5138] Guessing device ID based on global rank. This can cause a hang if rank to GPU mapping is heterogeneous. You can specify device_id in init_process_group()
+Variant-E 40M DDP model params: 40,784,528 (40.78M)
+Backend status: {'gdn_using_fallback': False, 'cfc_using_fallback': False}
+[20:50:56] [INFO] Loaded pre-tokenized manifest: kept=86199 skipped_unresolved=0 skipped_short=13796
+[20:50:56] [INFO] Loaded pre-tokenized manifest: kept=86199 skipped_unresolved=0 skipped_short=13796
+/usr/local/lib/python3.12/dist-packages/torch/autograd/graph.py:865: UserWarning: Memory Efficient attention defaults to a non-deterministic algorithm. To explicitly enable determinism call torch.use_deterministic_algorithms(True, warn_only=False). (Triggered internally at /pytorch/aten/src/ATen/native/transformers/cuda/attention_backward.cu:897.)
+  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
+/usr/local/lib/python3.12/dist-packages/torch/autograd/graph.py:865: UserWarning: Memory Efficient attention defaults to a non-deterministic algorithm. To explicitly enable determinism call torch.use_deterministic_algorithms(True, warn_only=False). (Triggered internally at /pytorch/aten/src/ATen/native/transformers/cuda/attention_backward.cu:897.)
+  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
+epoch=001 step=000050 train_loss=5.1351 lr=4.739336e-06
+epoch=001 step=000100 train_loss=5.1077 lr=9.478673e-06
+epoch=001 step=000150 train_loss=5.0852 lr=1.421801e-05
+epoch=001 step=000200 train_loss=5.0700 lr=1.895735e-05
+epoch=001 step=000250 train_loss=5.0517 lr=2.369668e-05
+epoch=001 step=000300 train_loss=5.0310 lr=2.843602e-05
+epoch=001 step=000350 train_loss=5.0036 lr=3.317536e-05
+epoch=001 step=000400 train_loss=4.9425 lr=3.791469e-05
+epoch=001 step=000450 train_loss=4.8817 lr=4.265403e-05
+epoch=001 step=000500 train_loss=4.8229 lr=4.739336e-05
+epoch=001 step=000550 train_loss=4.7634 lr=5.213270e-05
+epoch=001 step=000600 train_loss=4.7001 lr=5.687204e-05
+epoch=001 step=000650 train_loss=4.6303 lr=6.161137e-05
+epoch=001 step=000700 train_loss=4.5589 lr=6.635071e-05
+epoch=001 step=000750 train_loss=4.4806 lr=7.109005e-05
+epoch=001 step=000800 train_loss=4.4025 lr=7.582938e-05
+epoch=001 step=000850 train_loss=4.3220 lr=8.056872e-05
+epoch=001 step=000900 train_loss=4.2364 lr=8.530806e-05
+epoch=001 step=000950 train_loss=4.1570 lr=9.004739e-05
+Step checkpoint saved: epoch=001 step=001000 loss~4.0672
+epoch=001 step=001000 train_loss=4.0672 lr=9.478673e-05
+epoch=001 step=001050 train_loss=3.9799 lr=9.952607e-05
+epoch=001 step=001100 train_loss=3.8986 lr=1.042654e-04
+epoch=001 step=001150 train_loss=3.8111 lr=1.090047e-04
+epoch=001 step=001200 train_loss=3.7314 lr=1.137441e-04
+epoch=001 step=001250 train_loss=3.6563 lr=1.184834e-04
+epoch=001 step=001300 train_loss=3.5761 lr=1.232227e-04
+epoch=001 step=001350 train_loss=3.5077 lr=1.279621e-04
+epoch=001 step=001400 train_loss=3.4228 lr=1.327014e-04
+epoch=001 step=001450 train_loss=3.3561 lr=1.374408e-04
+epoch=001 step=001500 train_loss=3.2874 lr=1.421801e-04
+epoch=001 step=001550 train_loss=3.2238 lr=1.469194e-04
+epoch=001 step=001600 train_loss=3.1665 lr=1.516588e-04
+epoch=001 step=001650 train_loss=3.0983 lr=1.563981e-04
+epoch=001 step=001700 train_loss=3.0406 lr=1.611374e-04
+epoch=001 step=001750 train_loss=2.9943 lr=1.658768e-04
+epoch=001 step=001800 train_loss=2.9474 lr=1.706161e-04
+epoch=001 step=001850 train_loss=2.8880 lr=1.753555e-04
+epoch=001 step=001900 train_loss=2.8516 lr=1.800948e-04
+epoch=001 step=001950 train_loss=2.8169 lr=1.848341e-04
+Step checkpoint saved: epoch=001 step=002000 loss~2.7854
+epoch=001 step=002000 train_loss=2.7854 lr=1.895735e-04
+epoch=001 step=002050 train_loss=2.7574 lr=1.943128e-04
+epoch=001 step=002100 train_loss=2.7148 lr=1.990521e-04
+epoch=001 step=002150 train_loss=2.6977 lr=1.999997e-04
+epoch=001 step=002200 train_loss=2.6649 lr=1.999983e-04
+epoch=001 step=002250 train_loss=2.6285 lr=1.999960e-04
+epoch=001 step=002300 train_loss=2.6148 lr=1.999925e-04
+epoch=001 step=002350 train_loss=2.5961 lr=1.999881e-04
+epoch=001 step=002400 train_loss=2.5664 lr=1.999826e-04
+Epoch 001 | train_loss=3.7736 | val_loss=2.5591 | ppl=12.92
+/usr/local/lib/python3.12/dist-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
+  return func(*args, **kwargs)
+epoch=002 step=002450 train_loss=2.5690 lr=1.999761e-04
+epoch=002 step=002500 train_loss=2.5478 lr=1.999686e-04
+epoch=002 step=002550 train_loss=2.5284 lr=1.999600e-04
+epoch=002 step=002600 train_loss=2.5071 lr=1.999505e-04
+epoch=002 step=002650 train_loss=2.4980 lr=1.999398e-04
+epoch=002 step=002700 train_loss=2.4797 lr=1.999282e-04
+epoch=002 step=002750 train_loss=2.4798 lr=1.999155e-04
+epoch=002 step=002800 train_loss=2.4632 lr=1.999018e-04
+epoch=002 step=002850 train_loss=2.4440 lr=1.998870e-04
+epoch=002 step=002900 train_loss=2.4474 lr=1.998712e-04
+epoch=002 step=002950 train_loss=2.4423 lr=1.998544e-04
+Step checkpoint saved: epoch=002 step=003000 loss~2.4265
+#NOTE: This model was trained in two separate sessions. This is where session two began.
+epoch=003 step=003050 train_loss=2.4215 lr=1.998177e-04
+epoch=003 step=003100 train_loss=2.4032 lr=1.997978e-04
+epoch=003 step=003150 train_loss=2.4000 lr=1.997769e-04
+epoch=003 step=003200 train_loss=2.3878 lr=1.997549e-04
+epoch=003 step=003250 train_loss=2.3915 lr=1.997319e-04
+epoch=003 step=003300 train_loss=2.3817 lr=1.997079e-04
+epoch=003 step=003350 train_loss=2.3815 lr=1.996829e-04
+epoch=003 step=003400 train_loss=2.3769 lr=1.996568e-04
+epoch=003 step=003450 train_loss=2.3592 lr=1.996297e-04
+epoch=003 step=003500 train_loss=2.3582 lr=1.996016e-04
+epoch=003 step=003550 train_loss=2.3433 lr=1.995724e-04
+epoch=003 step=003600 train_loss=2.3343 lr=1.995422e-04
+epoch=003 step=003650 train_loss=2.3458 lr=1.995110e-04
+epoch=003 step=003700 train_loss=2.3448 lr=1.994788e-04
+epoch=003 step=003750 train_loss=2.3167 lr=1.994455e-04
+epoch=003 step=003800 train_loss=2.3137 lr=1.994112e-04
+epoch=003 step=003850 train_loss=2.3265 lr=1.993759e-04
+epoch=003 step=003900 train_loss=2.3250 lr=1.993396e-04
+epoch=003 step=003950 train_loss=2.3083 lr=1.993022e-04
+Step checkpoint saved: epoch=003 step=004000 loss~2.3077
+epoch=003 step=004000 train_loss=2.3077 lr=1.992638e-04
+epoch=003 step=004050 train_loss=2.3036 lr=1.992244e-04
+epoch=003 step=004100 train_loss=2.2895 lr=1.991840e-04
+epoch=003 step=004150 train_loss=2.2808 lr=1.991425e-04
+epoch=003 step=004200 train_loss=2.2893 lr=1.991000e-04
+epoch=003 step=004250 train_loss=2.2891 lr=1.990565e-04
+epoch=003 step=004300 train_loss=2.2760 lr=1.990120e-04
+epoch=003 step=004350 train_loss=2.2777 lr=1.989665e-04
+epoch=003 step=004400 train_loss=2.2768 lr=1.989199e-04
+epoch=003 step=004450 train_loss=2.2673 lr=1.988723e-04
+epoch=003 step=004500 train_loss=2.2637 lr=1.988237e-04
+epoch=003 step=004550 train_loss=2.2659 lr=1.987741e-04
+epoch=003 step=004600 train_loss=2.2515 lr=1.987235e-04
+epoch=003 step=004650 train_loss=2.2331 lr=1.986718e-04
+epoch=003 step=004700 train_loss=2.2511 lr=1.986191e-04
+epoch=003 step=004750 train_loss=2.2188 lr=1.985655e-04
+epoch=003 step=004800 train_loss=2.2482 lr=1.985108e-04
+epoch=003 step=004850 train_loss=2.2448 lr=1.984550e-04
+epoch=003 step=004900 train_loss=2.2246 lr=1.983983e-04
+epoch=003 step=004950 train_loss=2.2188 lr=1.983406e-04
+Step checkpoint saved: epoch=003 step=005000 loss~2.2113
+epoch=003 step=005000 train_loss=2.2113 lr=1.982818e-04
+epoch=003 step=005050 train_loss=2.2207 lr=1.982220e-04
+epoch=003 step=005100 train_loss=2.2103 lr=1.981613e-04
+epoch=003 step=005150 train_loss=2.1865 lr=1.980995e-04
+epoch=003 step=005200 train_loss=2.1997 lr=1.980367e-04
+epoch=003 step=005250 train_loss=2.2105 lr=1.979729e-04
+epoch=003 step=005300 train_loss=2.1968 lr=1.979080e-04
+epoch=003 step=005350 train_loss=2.1893 lr=1.978422e-04
+epoch=003 step=005400 train_loss=2.1870 lr=1.977754e-04
+Epoch 003 | train_loss=2.2880 | val_loss=2.1674 | ppl=8.74
+Epoch bundle saved: /kaggle/working/outputs/sub100m_e_40m_100k/kaggle_model_exports/epoch_003
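
The `ppl` column in the epoch summaries appears to be the exponential of `val_loss`, which is the usual definition of perplexity when the loss is a mean cross-entropy in nats. The sketch below checks that reading against the two summaries logged so far. The helper name `perplexity` is hypothetical, and whether the training script computes the value this way (or includes label smoothing in the validation loss) is an assumption, not something the log states.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity for a mean cross-entropy loss measured in nats (assumed)."""
    return math.exp(mean_cross_entropy)


# (val_loss, ppl) pairs copied from the "Epoch 001" and "Epoch 003" summary lines.
logged_summaries = [(2.5591, 12.92), (2.1674, 8.74)]

for val_loss, logged_ppl in logged_summaries:
    recomputed = perplexity(val_loss)
    print(f"val_loss={val_loss:.4f} recomputed_ppl={recomputed:.2f} logged_ppl={logged_ppl:.2f}")
```

Both recomputed values agree with the logged ones to two decimal places, so the two columns carry the same information.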
+/usr/local/lib/python3.12/dist-packages/torch/distributed/c10d_logger.py:83: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
+  return func(*args, **kwargs)
+epoch=004 step=005450 train_loss=2.1949 lr=1.977075e-04
+epoch=004 step=005500 train_loss=2.1765 lr=1.976387e-04
+epoch=004 step=005550 train_loss=2.1790 lr=1.975688e-04
+epoch=004 step=005600 train_loss=2.1691 lr=1.974980e-04
+epoch=004 step=005650 train_loss=2.1764 lr=1.974261e-04
+epoch=004 step=005700 train_loss=2.1610 lr=1.973533e-04
+epoch=004 step=005750 train_loss=2.1606 lr=1.972794e-04
+epoch=004 step=005800 train_loss=2.1540 lr=1.972045e-04
+epoch=004 step=005850 train_loss=2.1448 lr=1.971287e-04
+epoch=004 step=005900 train_loss=2.1464 lr=1.970518e-04
+epoch=004 step=005950 train_loss=2.1550 lr=1.969739e-04
+Step checkpoint saved: epoch=004 step=006000 loss~2.1383
+epoch=004 step=006000 train_loss=2.1383 lr=1.968951e-04
+epoch=004 step=006050 train_loss=2.1607 lr=1.968152e-04
+epoch=004 step=006100 train_loss=2.1325 lr=1.967344e-04
+epoch=004 step=006150 train_loss=2.1383 lr=1.966525e-04
+epoch=004 step=006200 train_loss=2.1425 lr=1.965697e-04
+epoch=004 step=006250 train_loss=2.1288 lr=1.964859e-04
+epoch=004 step=006300 train_loss=2.1408 lr=1.964011e-04
+epoch=004 step=006350 train_loss=2.1292 lr=1.963152e-04
+epoch=004 step=006400 train_loss=2.1468 lr=1.962284e-04
+epoch=004 step=006450 train_loss=2.1370 lr=1.961406e-04
+epoch=004 step=006500 train_loss=2.1126 lr=1.960519e-04
+epoch=004 step=006550 train_loss=2.1325 lr=1.959621e-04
+epoch=004 step=006600 train_loss=2.1084 lr=1.958714e-04
+epoch=004 step=006650 train_loss=2.1124 lr=1.957796e-04
+epoch=004 step=006700 train_loss=2.1193 lr=1.956869e-04
+epoch=004 step=006750 train_loss=2.1147 lr=1.955932e-04
+epoch=004 step=006800 train_loss=2.1131 lr=1.954985e-04
+epoch=004 step=006850 train_loss=2.0983 lr=1.954029e-04
+epoch=004 step=006900 train_loss=2.1052 lr=1.953062e-04
+epoch=004 step=006950 train_loss=2.1126 lr=1.952086e-04
+Step checkpoint saved: epoch=004 step=007000 loss~2.1097
+epoch=004 step=007000 train_loss=2.1097 lr=1.951100e-04
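
The `lr` column above is consistent with a linear warmup over `--warmup_steps 2110` up to `--learning_rate 0.0002`, followed by a cosine decay toward `--min_lr_ratio 0.1` of the peak. The sketch below reproduces that shape so individual log lines can be spot-checked. It is not the scheduler from `train_variant_e_40m_ddp.py`: the function name `lr_at` is hypothetical, the cosine form is an assumption, and `TOTAL_STEPS = 48_500` is an estimate fitted to the logged values rather than a number printed in the log.

```python
import math

PEAK_LR = 2e-4        # from --learning_rate
WARMUP_STEPS = 2110   # from --warmup_steps
MIN_LR_RATIO = 0.1    # from --min_lr_ratio
TOTAL_STEPS = 48_500  # ASSUMPTION: fitted to the logged lr values, not stated in the log


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step under the assumed schedule."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Cosine decay from the peak down to MIN_LR_RATIO * PEAK_LR.
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return PEAK_LR * (MIN_LR_RATIO + (1.0 - MIN_LR_RATIO) * cosine)


# Print a few steps in the same format as the log for side-by-side comparison.
for step in (50, 1000, 2100, 2150, 7000):
    print(f"step={step:06d} lr={lr_at(step):.6e}")
```

The warmup portion depends only on flags shown in the launch command, so it should match the log closely. The decay portion depends on the assumed total step count and will drift from the log if that estimate is off.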