W0320 22:33:11.396000 687 torch/distributed/run.py:852]
W0320 22:33:11.396000 687 torch/distributed/run.py:852] *****************************************
W0320 22:33:11.396000 687 torch/distributed/run.py:852] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0320 22:33:11.396000 687 torch/distributed/run.py:852] *****************************************
logs/893fa756-e373-4d35-9677-3a642348d9d7.txt
val_bpb:enabled tokenizer_kind=sentencepiece tokenizer_path=./data/tokenizers/fineweb_1024_bpe.model
train_loader:dataset:fineweb10B_sp1024 train_shards:10
val_loader:shards pattern=./data/datasets/fineweb10B_sp1024/fineweb_val_*.bin tokens:62021632
model_params:17059912
world_size:8 grad_accum_steps:1
sdp_backends:cudnn=False flash=True mem_efficient=False math=False
attention_mode:gqa num_heads:8 num_kv_heads:4
tie_embeddings:True embed_lr:0.05 head_lr:0.0 matrix_lr:0.04 scalar_lr:0.04
train_batch_tokens:524288 train_seq_len:1024 iterations:20000 warmup_steps:20 max_wallclock_seconds:600.000
seed:1337
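The `attention_mode:gqa num_heads:8 num_kv_heads:4` line indicates grouped-query attention. Here 8 query heads share 4 key/value heads, so each KV head serves 2 query heads. Below is a minimal sketch of the head mapping. It assumes contiguous grouping, where query heads 0 and 1 read KV head 0 and so on, as happens when KV heads are expanded with a repeat-interleave. The log does not show which grouping the training script actually uses, and `kv_head_for` is a hypothetical helper.

```python
# Values from the log line: attention_mode:gqa num_heads:8 num_kv_heads:4
NUM_HEADS = 8
NUM_KV_HEADS = 4


def kv_head_for(query_head: int, num_heads: int, num_kv_heads: int) -> int:
    """Hypothetical helper: return the KV head a query head reads,
    assuming contiguous grouping (consecutive query heads share one KV head)."""
    if num_heads % num_kv_heads != 0:
        raise ValueError("num_heads must be a multiple of num_kv_heads")
    group_size = num_heads // num_kv_heads  # query heads per KV head (2 here)
    return query_head // group_size


mapping = {h: kv_head_for(h, NUM_HEADS, NUM_KV_HEADS) for h in range(NUM_HEADS)}
print(mapping)  # {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}

# K/V projection widths shrink by this factor vs. full multi-head attention,
# assuming the per-head dimension is the same for query and KV heads.
kv_fraction = NUM_KV_HEADS / NUM_HEADS
print(kv_fraction)  # 0.5
```

Attention scores are still computed once per query head. Only the key and value tensors are shared, so under the stated assumption the K and V projections are half as wide as they would be with 8 full heads.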
warmup_step:1/20
warmup_step:2/20
warmup_step:3/20
warmup_step:4/20
warmup_step:5/20
warmup_step:6/20
warmup_step:7/20
warmup_step:8/20
warmup_step:9/20
warmup_step:10/20
warmup_step:11/20
warmup_step:12/20
warmup_step:13/20
warmup_step:14/20
warmup_step:15/20
warmup_step:16/20
warmup_step:17/20
warmup_step:18/20
warmup_step:19/20
warmup_step:20/20
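Each training step processes `train_batch_tokens:524288` tokens across `world_size:8` ranks with `grad_accum_steps:1`. Below is a sketch of the per-rank arithmetic. It assumes the global batch is split evenly across ranks and accumulation steps; the log does not show how the loader divides it. `per_rank_micro_batch` is a hypothetical helper.

```python
# Values from the log: world_size:8 grad_accum_steps:1 train_batch_tokens:524288 train_seq_len:1024
WORLD_SIZE = 8
GRAD_ACCUM_STEPS = 1
TRAIN_BATCH_TOKENS = 524288
TRAIN_SEQ_LEN = 1024


def per_rank_micro_batch(batch_tokens, world_size, grad_accum_steps, seq_len):
    """Hypothetical helper: tokens and sequences each rank handles per micro-step,
    assuming an even split across ranks and accumulation steps."""
    shards = world_size * grad_accum_steps
    if batch_tokens % shards != 0:
        raise ValueError("train_batch_tokens must divide evenly by world_size * grad_accum_steps")
    local_tokens = batch_tokens // shards
    if local_tokens % seq_len != 0:
        raise ValueError("per-rank tokens must be a whole number of sequences")
    return local_tokens, local_tokens // seq_len


local_tokens, local_seqs = per_rank_micro_batch(
    TRAIN_BATCH_TOKENS, WORLD_SIZE, GRAD_ACCUM_STEPS, TRAIN_SEQ_LEN
)
print(local_tokens, local_seqs)  # 65536 64
```

Under that assumption, each GPU sees 64 sequences of 1024 tokens per step. With `grad_accum_steps:1`, the whole global batch is consumed in a single forward/backward pass per rank.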
step:0/20000 val_loss:6.9357 val_bpb:4.1077 train_time:0ms step_avg:0.02ms
step:1/20000 train_loss:6.9370 train_time:22ms step_avg:22.23ms
step:2/20000 train_loss:16.8367 train_time:62ms step_avg:30.79ms
step:3/20000 train_loss:8.7608 train_time:100ms step_avg:33.44ms
step:4/20000 train_loss:6.6384 train_time:139ms step_avg:34.83ms
step:5/20000 train_loss:6.6117 train_time:178ms step_avg:35.58ms
step:6/20000 train_loss:7.4217 train_time:217ms step_avg:36.15ms
step:7/20000 train_loss:6.3501 train_time:256ms step_avg:36.56ms
step:8/20000 train_loss:6.1581 train_time:295ms step_avg:36.82ms
step:9/20000 train_loss:6.0679 train_time:334ms step_avg:37.13ms
step:10/20000 train_loss:5.9747 train_time:373ms step_avg:37.29ms
step:200/20000 train_loss:2.8484 train_time:7925ms step_avg:39.62ms
step:400/20000 train_loss:2.3576 train_time:15876ms step_avg:39.69ms
step:600/20000 train_loss:2.5429 train_time:23828ms step_avg:39.71ms
step:800/20000 train_loss:2.2935 train_time:31781ms step_avg:39.73ms
step:1000/20000 train_loss:2.3741 train_time:39753ms step_avg:39.75ms
step:1000/20000 val_loss:2.3353 val_bpb:1.3831 train_time:39775ms step_avg:39.77ms
step:1200/20000 train_loss:2.3843 train_time:47750ms step_avg:39.79ms
step:1400/20000 train_loss:2.4309 train_time:55747ms step_avg:39.82ms
step:1600/20000 train_loss:2.0958 train_time:63767ms step_avg:39.85ms
step:1800/20000 train_loss:2.1992 train_time:71784ms step_avg:39.88ms
step:2000/20000 train_loss:2.2314 train_time:79870ms step_avg:39.94ms
step:2000/20000 val_loss:2.2370 val_bpb:1.3249 train_time:79892ms step_avg:39.95ms
step:2200/20000 train_loss:2.3417 train_time:87918ms step_avg:39.96ms
step:2400/20000 train_loss:2.3566 train_time:95969ms step_avg:39.99ms
step:2600/20000 train_loss:2.2132 train_time:104022ms step_avg:40.01ms
step:2800/20000 train_loss:2.1651 train_time:112089ms step_avg:40.03ms
step:3000/20000 train_loss:3.1850 train_time:120148ms step_avg:40.05ms
step:3000/20000 val_loss:2.1977 val_bpb:1.3016 train_time:120170ms step_avg:40.06ms
step:3200/20000 train_loss:2.2635 train_time:128197ms step_avg:40.06ms
step:3400/20000 train_loss:2.0977 train_time:136258ms step_avg:40.08ms
step:3600/20000 train_loss:2.2027 train_time:144318ms step_avg:40.09ms
step:3800/20000 train_loss:2.1499 train_time:152355ms step_avg:40.09ms
step:4000/20000 train_loss:2.2729 train_time:160416ms step_avg:40.10ms
step:4000/20000 val_loss:2.1705 val_bpb:1.2855 train_time:160438ms step_avg:40.11ms
step:4200/20000 train_loss:2.2190 train_time:168536ms step_avg:40.13ms
step:4400/20000 train_loss:2.1711 train_time:176589ms step_avg:40.13ms
step:4600/20000 train_loss:2.1991 train_time:184621ms step_avg:40.14ms
step:4800/20000 train_loss:2.1399 train_time:192674ms step_avg:40.14ms
step:5000/20000 train_loss:2.2225 train_time:200725ms step_avg:40.14ms
step:5000/20000 val_loss:2.1569 val_bpb:1.2774 train_time:200747ms step_avg:40.15ms
step:5200/20000 train_loss:2.2842 train_time:208780ms step_avg:40.15ms
step:5400/20000 train_loss:2.2347 train_time:216841ms step_avg:40.16ms
step:5600/20000 train_loss:2.1364 train_time:224878ms step_avg:40.16ms
step:5800/20000 train_loss:2.1868 train_time:232942ms step_avg:40.16ms
step:6000/20000 train_loss:2.1038 train_time:240997ms step_avg:40.17ms
step:6000/20000 val_loss:2.1440 val_bpb:1.2698 train_time:241019ms step_avg:40.17ms
step:6200/20000 train_loss:2.0893 train_time:249050ms step_avg:40.17ms
step:6400/20000 train_loss:1.8733 train_time:257108ms step_avg:40.17ms
step:6600/20000 train_loss:2.0848 train_time:265151ms step_avg:40.17ms
step:6800/20000 train_loss:2.1318 train_time:273197ms step_avg:40.18ms
step:7000/20000 train_loss:2.0892 train_time:281250ms step_avg:40.18ms
step:7000/20000 val_loss:2.1354 val_bpb:1.2647 train_time:281272ms step_avg:40.18ms
step:7200/20000 train_loss:1.9715 train_time:289302ms step_avg:40.18ms
step:7400/20000 train_loss:1.9067 train_time:297364ms step_avg:40.18ms
step:7600/20000 train_loss:2.1656 train_time:305406ms step_avg:40.18ms
step:7800/20000 train_loss:2.1265 train_time:313466ms step_avg:40.19ms
step:8000/20000 train_loss:2.0572 train_time:321523ms step_avg:40.19ms
step:8000/20000 val_loss:2.1280 val_bpb:1.2603 train_time:321545ms step_avg:40.19ms
step:8200/20000 train_loss:2.1927 train_time:329577ms step_avg:40.19ms
step:8400/20000 train_loss:2.1853 train_time:337700ms step_avg:40.20ms
step:8600/20000 train_loss:2.2110 train_time:345760ms step_avg:40.20ms
step:8800/20000 train_loss:2.0730 train_time:353821ms step_avg:40.21ms
step:9000/20000 train_loss:2.0997 train_time:361880ms step_avg:40.21ms
step:9000/20000 val_loss:2.1218 val_bpb:1.2567 train_time:361901ms step_avg:40.21ms
step:9200/20000 train_loss:2.1788 train_time:369929ms step_avg:40.21ms
step:9400/20000 train_loss:2.0175 train_time:377954ms step_avg:40.21ms
step:9600/20000 train_loss:2.0124 train_time:386008ms step_avg:40.21ms
step:9800/20000 train_loss:2.0677 train_time:394052ms step_avg:40.21ms
step:10000/20000 train_loss:2.0207 train_time:402115ms step_avg:40.21ms
step:10000/20000 val_loss:2.1180 val_bpb:1.2544 train_time:402137ms step_avg:40.21ms
step:10200/20000 train_loss:2.1329 train_time:410164ms step_avg:40.21ms
step:10400/20000 train_loss:2.0956 train_time:418207ms step_avg:40.21ms
step:10600/20000 train_loss:2.0725 train_time:426263ms step_avg:40.21ms
step:10800/20000 train_loss:2.1240 train_time:434309ms step_avg:40.21ms
step:11000/20000 train_loss:2.0970 train_time:442345ms step_avg:40.21ms
step:11000/20000 val_loss:2.1134 val_bpb:1.2516 train_time:442367ms step_avg:40.22ms
step:11200/20000 train_loss:2.1306 train_time:450397ms step_avg:40.21ms
step:11400/20000 train_loss:2.2157 train_time:458436ms step_avg:40.21ms
step:11600/20000 train_loss:2.0888 train_time:466478ms step_avg:40.21ms
step:11800/20000 train_loss:2.1709 train_time:474538ms step_avg:40.22ms
step:12000/20000 train_loss:2.0764 train_time:482591ms step_avg:40.22ms
step:12000/20000 val_loss:2.1079 val_bpb:1.2484 train_time:482613ms step_avg:40.22ms
step:12200/20000 train_loss:2.2435 train_time:490655ms step_avg:40.22ms
step:12400/20000 train_loss:2.0507 train_time:498788ms step_avg:40.22ms
step:12600/20000 train_loss:2.0905 train_time:506835ms step_avg:40.23ms
step:12800/20000 train_loss:2.0603 train_time:514893ms step_avg:40.23ms
step:13000/20000 train_loss:1.9274 train_time:522961ms step_avg:40.23ms
step:13000/20000 val_loss:2.1074 val_bpb:1.2481 train_time:522982ms step_avg:40.23ms
step:13200/20000 train_loss:2.0786 train_time:530994ms step_avg:40.23ms
step:13400/20000 train_loss:2.1084 train_time:539057ms step_avg:40.23ms
step:13600/20000 train_loss:2.0703 train_time:547114ms step_avg:40.23ms
step:13800/20000 train_loss:2.0812 train_time:555170ms step_avg:40.23ms
step:14000/20000 train_loss:1.9203 train_time:563231ms step_avg:40.23ms
step:14000/20000 val_loss:2.0953 val_bpb:1.2410 train_time:563254ms step_avg:40.23ms
step:14200/20000 train_loss:1.9832 train_time:571288ms step_avg:40.23ms
step:14400/20000 train_loss:1.9830 train_time:579347ms step_avg:40.23ms
step:14600/20000 train_loss:1.9586 train_time:587404ms step_avg:40.23ms
step:14800/20000 train_loss:2.0545 train_time:595467ms step_avg:40.23ms
step:14912/20000 val_loss:2.0629 val_bpb:1.2218 train_time:600032ms step_avg:40.24ms
stopping_early: wallclock_cap train_time:600032ms step:14912/20000
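The run hit the 600 s wallclock cap (`max_wallclock_seconds:600.000`) at step 14912, short of the 20000 scheduled iterations. The arithmetic below reproduces the final `step_avg` from the logged values. It also estimates how many training tokens were consumed, assuming every step processed the full `train_batch_tokens`.

```python
# Values from the log: final step, its train_time, and the configured batch size.
TRAIN_BATCH_TOKENS = 524288
FINAL_STEP = 14912
TRAIN_TIME_MS = 600032
ITERATIONS = 20000

step_avg_ms = TRAIN_TIME_MS / FINAL_STEP       # should match the logged step_avg:40.24ms
tokens_seen = FINAL_STEP * TRAIN_BATCH_TOKENS  # assumes every step used the full batch
throughput = tokens_seen / (TRAIN_TIME_MS / 1000)  # training tokens per second
fraction_done = FINAL_STEP / ITERATIONS

print(f"step_avg={step_avg_ms:.2f}ms")
print(f"tokens_seen={tokens_seen:,}")
print(f"throughput={throughput / 1e6:.2f}M tokens/s")
print(f"completed {fraction_done:.1%} of the scheduled iterations")
```

At roughly 40 ms per step, the 600 s budget allows about 14.9k steps. This is why the loop stopped on the wallclock cap rather than the iteration count.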
peak memory allocated: 10185 MiB reserved: 10396 MiB
Serialized model: 67224983 bytes
Code size: 58509 bytes
Total submission size: 67283492 bytes
Serialized model int8+zlib: 15823490 bytes (payload:17178912 raw_torch:17224025 payload_ratio:3.91x)
Total submission size int8+zlib: 15881999 bytes
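The size lines can be cross-checked. Each total equals the corresponding serialized model plus the 58509-byte code size. The logged `payload_ratio:3.91x` is consistent with dividing the unquantized `Serialized model` size by the int8 `payload` size. That reading of `payload_ratio`, and the reading of `raw_torch` as the pre-zlib file size, are assumptions; the log defines neither.

```python
# Byte counts copied from the log.
MODEL_BYTES = 67224983      # Serialized model
CODE_BYTES = 58509          # Code size
PAYLOAD_BYTES = 17178912    # payload (int8 tensors)
RAW_TORCH_BYTES = 17224025  # raw_torch (assumed: torch-serialized size before zlib)
INT8_ZLIB_BYTES = 15823490  # Serialized model int8+zlib

total_fp = MODEL_BYTES + CODE_BYTES
total_int8 = INT8_ZLIB_BYTES + CODE_BYTES
payload_ratio = MODEL_BYTES / PAYLOAD_BYTES  # assumed definition of payload_ratio
zlib_ratio = INT8_ZLIB_BYTES / RAW_TORCH_BYTES  # compressed size relative to raw_torch

print(total_fp, total_int8)  # 67283492 15881999
print(f"payload_ratio={payload_ratio:.2f}x zlib_ratio={zlib_ratio:.3f}")
```

Most of the size reduction comes from int8 quantization itself. zlib then removes a further few percent from the int8 file.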
final_int8_zlib_roundtrip val_loss:2.0760 val_bpb:1.2295 eval_time:1258ms
final_int8_zlib_roundtrip_exact val_loss:2.07596963 val_bpb:1.22950615
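The `final_int8_zlib_roundtrip` lines evaluate the model after it has been quantized to int8, compressed with zlib, and restored. Over that roundtrip, val_loss rose from 2.0629 to 2.0760. The log does not say which quantization scheme was used. The sketch below assumes per-row symmetric int8 quantization with one float scale per row, which is a common choice but may not match this run; all function names are hypothetical. In this sketch zlib is lossless, so all reconstruction error comes from rounding to int8 and is bounded by half a scale step per weight.

```python
import zlib


def quantize_row_int8(row):
    """Hypothetical per-row symmetric int8 quantization: q = round(x / scale), scale = max|x| / 127."""
    amax = max(abs(x) for x in row)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [max(-127, min(127, round(x / scale))) for x in row]
    return q, scale


def dequantize_row_int8(q, scale):
    return [v * scale for v in q]


def pack_int8(rows):
    # Store each int8 value as one byte (two's complement), then compress losslessly with zlib.
    raw = bytes((v & 0xFF) for q in rows for v in q)
    return zlib.compress(raw, level=9)


def unpack_int8(blob, ncols):
    raw = zlib.decompress(blob)
    vals = [b - 256 if b > 127 else b for b in raw]  # back to signed int8
    return [vals[i:i + ncols] for i in range(0, len(vals), ncols)]


weights = [[0.12, -0.5, 0.33, 0.0], [1.5, -1.2, 0.01, 0.7]]
quantized = [quantize_row_int8(r) for r in weights]
blob = pack_int8([q for q, _ in quantized])
restored_q = unpack_int8(blob, ncols=4)
restored = [dequantize_row_int8(q, s) for q, (_, s) in zip(restored_q, quantized)]

max_err = max(abs(a - b) for r, rr in zip(weights, restored) for a, b in zip(r, rr))
print(f"max abs reconstruction error: {max_err:.5f}")
```

Because each scale is set by its row's largest magnitude, a single outlier weight coarsens the grid for the whole row. This is one plausible source of the small loss increase seen after the roundtrip.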
final_int8_ttt_lora val_loss:2.0166 val_bpb:1.1943 eval_time:54010ms
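`val_bpb` normalizes the validation loss by bytes of raw text instead of tokens, which makes results comparable across tokenizers. Assume the usual definition, bpb = (val_loss / ln 2) * (tokens / bytes), with `val_loss` in nats per token. Then every logged (val_loss, val_bpb) pair should imply the same tokens-per-byte ratio for the fixed validation set. The check below recovers that ratio from pairs copied from the log.

```python
import math

# (val_loss, val_bpb) pairs copied from the log.
LOGGED = [
    (6.9357, 4.1077),          # step 0
    (2.3353, 1.3831),          # step 1000
    (2.0629, 1.2218),          # step 14912, before quantization
    (2.07596963, 1.22950615),  # final_int8_zlib_roundtrip_exact
    (2.0166, 1.1943),          # final_int8_ttt_lora
]


def implied_tokens_per_byte(val_loss, val_bpb):
    """Invert the assumed definition val_bpb = (val_loss / ln 2) * (tokens / bytes)."""
    bits_per_token = val_loss / math.log(2)
    return val_bpb / bits_per_token


ratios = [implied_tokens_per_byte(loss, bpb) for loss, bpb in LOGGED]
for (loss, bpb), r in zip(LOGGED, ratios):
    print(f"val_loss={loss:.4f} val_bpb={bpb:.4f} tokens/byte={r:.4f}")
print(f"~{1 / (sum(ratios) / len(ratios)):.2f} bytes per token")
```

Under that assumption, the ratio is essentially constant at about 0.4105 tokens per byte, or roughly 2.44 bytes of text per SentencePiece token. This supports reading val_bpb as a fixed rescaling of val_loss on this validation set.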