| W0320 11:40:16.629000 316 site-packages/torch/distributed/run.py:774] | |
| W0320 11:40:16.629000 316 site-packages/torch/distributed/run.py:774] ***************************************** | |
| W0320 11:40:16.629000 316 site-packages/torch/distributed/run.py:774] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
| W0320 11:40:16.629000 316 site-packages/torch/distributed/run.py:774] ***************************************** | |
| logs/1a179c2d-1386-4cb8-a527-6bec7eade73d.txt | |
| val_bpb:enabled tokenizer_kind=sentencepiece tokenizer_path=./data/tokenizers/fineweb_1024_bpe.model | |
| train_loader:dataset:fineweb10B_sp1024 train_shards:10 | |
| val_loader:shards pattern=./data/datasets/fineweb10B_sp1024/fineweb_val_*.bin tokens:62021632 | |
| model_params:17059912 | |
| world_size:8 grad_accum_steps:1 | |
| sdp_backends:cudnn=False flash=True mem_efficient=False math=False | |
| attention_mode:gqa num_heads:8 num_kv_heads:4 | |
| tie_embeddings:True embed_lr:0.05 head_lr:0.0 matrix_lr:0.04 scalar_lr:0.04 | |
| train_batch_tokens:524288 train_seq_len:1024 iterations:20000 warmup_steps:20 max_wallclock_seconds:600.000 | |
| seed:1337 | |
| warmup_step:1/20 | |
| warmup_step:2/20 | |
| warmup_step:3/20 | |
| warmup_step:4/20 | |
| warmup_step:5/20 | |
| warmup_step:6/20 | |
| warmup_step:7/20 | |
| warmup_step:8/20 | |
| warmup_step:9/20 | |
| warmup_step:10/20 | |
| warmup_step:11/20 | |
| warmup_step:12/20 | |
| warmup_step:13/20 | |
| warmup_step:14/20 | |
| warmup_step:15/20 | |
| warmup_step:16/20 | |
| warmup_step:17/20 | |
| warmup_step:18/20 | |
| warmup_step:19/20 | |
| warmup_step:20/20 | |
| step:0/20000 val_loss:6.9357 val_bpb:4.1077 train_time:0ms step_avg:0.02ms | |
| step:1/20000 train_loss:6.9370 train_time:21ms step_avg:21.47ms | |
| step:2/20000 train_loss:16.8366 train_time:62ms step_avg:30.94ms | |
| step:3/20000 train_loss:8.7600 train_time:101ms step_avg:33.78ms | |
| step:4/20000 train_loss:6.6381 train_time:141ms step_avg:35.29ms | |
| step:5/20000 train_loss:6.6113 train_time:181ms step_avg:36.19ms | |
| step:6/20000 train_loss:7.4219 train_time:221ms step_avg:36.83ms | |
| step:7/20000 train_loss:6.3505 train_time:261ms step_avg:37.29ms | |
| step:8/20000 train_loss:6.1582 train_time:301ms step_avg:37.63ms | |
| step:9/20000 train_loss:6.0680 train_time:341ms step_avg:37.93ms | |
| step:10/20000 train_loss:5.9744 train_time:381ms step_avg:38.10ms | |
| step:200/20000 train_loss:2.8530 train_time:8130ms step_avg:40.65ms | |
| step:400/20000 train_loss:2.3537 train_time:16309ms step_avg:40.77ms | |
| step:600/20000 train_loss:2.5516 train_time:24514ms step_avg:40.86ms | |
| step:800/20000 train_loss:2.2995 train_time:32714ms step_avg:40.89ms | |
| step:1000/20000 train_loss:2.3690 train_time:40914ms step_avg:40.91ms | |
| step:1000/20000 val_loss:2.3344 val_bpb:1.3826 train_time:40939ms step_avg:40.94ms | |
| step:1200/20000 train_loss:2.3870 train_time:49134ms step_avg:40.95ms | |
| step:1400/20000 train_loss:2.4326 train_time:57350ms step_avg:40.96ms | |
| step:1600/20000 train_loss:2.0961 train_time:65600ms step_avg:41.00ms | |
| step:1800/20000 train_loss:2.1996 train_time:73851ms step_avg:41.03ms | |
| step:2000/20000 train_loss:2.2356 train_time:82130ms step_avg:41.06ms | |
| step:2000/20000 val_loss:2.2363 val_bpb:1.3244 train_time:82154ms step_avg:41.08ms | |
| step:2200/20000 train_loss:2.3374 train_time:90414ms step_avg:41.10ms | |
| step:2400/20000 train_loss:2.3511 train_time:98719ms step_avg:41.13ms | |
| step:2600/20000 train_loss:2.2148 train_time:107007ms step_avg:41.16ms | |
| step:2800/20000 train_loss:2.1654 train_time:115280ms step_avg:41.17ms | |
| step:3000/20000 train_loss:3.1759 train_time:123527ms step_avg:41.18ms | |
| step:3000/20000 val_loss:2.1959 val_bpb:1.3005 train_time:123552ms step_avg:41.18ms | |
| step:3200/20000 train_loss:2.2647 train_time:131781ms step_avg:41.18ms | |
| step:3400/20000 train_loss:2.0971 train_time:140027ms step_avg:41.18ms | |
| step:3600/20000 train_loss:2.2035 train_time:148307ms step_avg:41.20ms | |
| step:3800/20000 train_loss:2.1494 train_time:156590ms step_avg:41.21ms | |
| step:4000/20000 train_loss:2.2699 train_time:164898ms step_avg:41.22ms | |
| step:4000/20000 val_loss:2.1700 val_bpb:1.2852 train_time:164922ms step_avg:41.23ms | |
| step:4200/20000 train_loss:2.2158 train_time:173241ms step_avg:41.25ms | |
| step:4400/20000 train_loss:2.1686 train_time:181487ms step_avg:41.25ms | |
| step:4600/20000 train_loss:2.2009 train_time:189725ms step_avg:41.24ms | |
| step:4800/20000 train_loss:2.1390 train_time:197977ms step_avg:41.25ms | |
| step:5000/20000 train_loss:2.2237 train_time:206248ms step_avg:41.25ms | |
| step:5000/20000 val_loss:2.1563 val_bpb:1.2771 train_time:206273ms step_avg:41.25ms | |
| step:5200/20000 train_loss:2.2860 train_time:214534ms step_avg:41.26ms | |
| step:5400/20000 train_loss:2.2321 train_time:222829ms step_avg:41.26ms | |
| step:5600/20000 train_loss:2.1371 train_time:231094ms step_avg:41.27ms | |
| step:5800/20000 train_loss:2.1859 train_time:239348ms step_avg:41.27ms | |
| step:6000/20000 train_loss:2.1024 train_time:247576ms step_avg:41.26ms | |
| step:6000/20000 val_loss:2.1443 val_bpb:1.2700 train_time:247600ms step_avg:41.27ms | |
| step:6200/20000 train_loss:2.0876 train_time:255834ms step_avg:41.26ms | |
| step:6400/20000 train_loss:1.8724 train_time:264081ms step_avg:41.26ms | |
| step:6600/20000 train_loss:2.0873 train_time:272339ms step_avg:41.26ms | |
| step:6800/20000 train_loss:2.1358 train_time:280586ms step_avg:41.26ms | |
| step:7000/20000 train_loss:2.0861 train_time:288861ms step_avg:41.27ms | |
| step:7000/20000 val_loss:2.1348 val_bpb:1.2643 train_time:288885ms step_avg:41.27ms | |
| step:7200/20000 train_loss:1.9738 train_time:297123ms step_avg:41.27ms | |
| step:7400/20000 train_loss:1.9076 train_time:305399ms step_avg:41.27ms | |
| step:7600/20000 train_loss:2.1622 train_time:313660ms step_avg:41.27ms | |
| step:7800/20000 train_loss:2.1278 train_time:321918ms step_avg:41.27ms | |
| step:8000/20000 train_loss:2.0545 train_time:330166ms step_avg:41.27ms | |
| step:8000/20000 val_loss:2.1276 val_bpb:1.2601 train_time:330190ms step_avg:41.27ms | |
| step:8200/20000 train_loss:2.1967 train_time:338409ms step_avg:41.27ms | |
| step:8400/20000 train_loss:2.1854 train_time:346724ms step_avg:41.28ms | |
| step:8600/20000 train_loss:2.2127 train_time:354970ms step_avg:41.28ms | |
| step:8800/20000 train_loss:2.0716 train_time:363247ms step_avg:41.28ms | |
| step:9000/20000 train_loss:2.1050 train_time:371507ms step_avg:41.28ms | |
| step:9000/20000 val_loss:2.1220 val_bpb:1.2568 train_time:371531ms step_avg:41.28ms | |
| step:9200/20000 train_loss:2.1821 train_time:379790ms step_avg:41.28ms | |
| step:9400/20000 train_loss:2.0183 train_time:388070ms step_avg:41.28ms | |
| step:9600/20000 train_loss:2.0162 train_time:396344ms step_avg:41.29ms | |
| step:9800/20000 train_loss:2.0704 train_time:404590ms step_avg:41.28ms | |
| step:10000/20000 train_loss:2.0226 train_time:412835ms step_avg:41.28ms | |
| step:10000/20000 val_loss:2.1183 val_bpb:1.2546 train_time:412860ms step_avg:41.29ms | |
| step:10200/20000 train_loss:2.1271 train_time:421067ms step_avg:41.28ms | |
| step:10400/20000 train_loss:2.1001 train_time:429329ms step_avg:41.28ms | |
| step:10600/20000 train_loss:2.0759 train_time:437581ms step_avg:41.28ms | |
| step:10800/20000 train_loss:2.1200 train_time:445858ms step_avg:41.28ms | |
| step:11000/20000 train_loss:2.0973 train_time:454140ms step_avg:41.29ms | |
| step:11000/20000 val_loss:2.1137 val_bpb:1.2519 train_time:454165ms step_avg:41.29ms | |
| step:11200/20000 train_loss:2.1343 train_time:462416ms step_avg:41.29ms | |
| step:11400/20000 train_loss:2.2197 train_time:470662ms step_avg:41.29ms | |
| step:11600/20000 train_loss:2.0828 train_time:478910ms step_avg:41.29ms | |
| step:11800/20000 train_loss:2.1688 train_time:487135ms step_avg:41.28ms | |
| step:12000/20000 train_loss:2.0737 train_time:495387ms step_avg:41.28ms | |
| step:12000/20000 val_loss:2.1079 val_bpb:1.2484 train_time:495412ms step_avg:41.28ms | |
| step:12200/20000 train_loss:2.2438 train_time:503646ms step_avg:41.28ms | |
| step:12400/20000 train_loss:2.0510 train_time:511978ms step_avg:41.29ms | |
| step:12600/20000 train_loss:2.0913 train_time:520269ms step_avg:41.29ms | |
| step:12800/20000 train_loss:2.0641 train_time:528544ms step_avg:41.29ms | |
| step:13000/20000 train_loss:1.9192 train_time:536816ms step_avg:41.29ms | |
| step:13000/20000 val_loss:2.1074 val_bpb:1.2481 train_time:536840ms step_avg:41.30ms | |
| step:13200/20000 train_loss:2.0736 train_time:545043ms step_avg:41.29ms | |
| step:13400/20000 train_loss:2.1090 train_time:553291ms step_avg:41.29ms | |
| step:13600/20000 train_loss:2.0642 train_time:561531ms step_avg:41.29ms | |
| step:13800/20000 train_loss:2.0711 train_time:569801ms step_avg:41.29ms | |
| step:14000/20000 train_loss:1.9067 train_time:578077ms step_avg:41.29ms | |
| step:14000/20000 val_loss:2.0823 val_bpb:1.2333 train_time:578102ms step_avg:41.29ms | |
| step:14200/20000 train_loss:1.9695 train_time:586371ms step_avg:41.29ms | |
| step:14400/20000 train_loss:1.9728 train_time:594632ms step_avg:41.29ms | |
| step:14530/20000 val_loss:2.0637 val_bpb:1.2223 train_time:600049ms step_avg:41.30ms | |
| stopping_early: wallclock_cap train_time:600049ms step:14530/20000 | |
| peak memory allocated: 10247 MiB reserved: 10460 MiB | |
| Serialized model: 67224983 bytes | |
| Code size: 58509 bytes | |
| Total submission size: 67283492 bytes | |
| Serialized model int8+zlib: 15813548 bytes (payload:17178912 raw_torch:17224025 payload_ratio:3.91x) | |
| Total submission size int8+zlib: 15872057 bytes | |
| final_int8_zlib_roundtrip val_loss:2.0773 val_bpb:1.2303 eval_time:1300ms | |
| final_int8_zlib_roundtrip_exact val_loss:2.07729490 val_bpb:1.23029105 | |
| final_int8_ttt_lora val_loss:2.0183 val_bpb:1.1954 eval_time:73029ms | |