| W0323 09:26:38.186000 715 torch/distributed/run.py:852] | |
| W0323 09:26:38.186000 715 torch/distributed/run.py:852] ***************************************** | |
| W0323 09:26:38.186000 715 torch/distributed/run.py:852] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. | |
| W0323 09:26:38.186000 715 torch/distributed/run.py:852] ***************************************** | |
| logs/1d087ef9-4b9c-42f5-a5a4-daeebf0d3141.txt | |
| val_bpb:enabled tokenizer_kind=sentencepiece tokenizer_path=./data/tokenizers/fineweb_1024_bpe.model | |
| train_loader:dataset:fineweb10B_sp1024 train_shards:10 | |
| val_loader:shards pattern=./data/datasets/fineweb10B_sp1024/fineweb_val_*.bin tokens:62021632 | |
| model_params:17059912 | |
| world_size:8 grad_accum_steps:1 | |
| sdp_backends:cudnn=False flash=True mem_efficient=False math=False | |
| attention_mode:gqa num_heads:8 num_kv_heads:4 | |
| tie_embeddings:True embed_lr:0.05 head_lr:0.0 matrix_lr:0.04 scalar_lr:0.04 | |
| train_batch_tokens:524288 train_seq_len:1024 iterations:20000 warmup_steps:20 max_wallclock_seconds:600.000 | |
| seed:1337 | |
| warmup_step:1/20 | |
| warmup_step:2/20 | |
| warmup_step:3/20 | |
| warmup_step:4/20 | |
| warmup_step:5/20 | |
| warmup_step:6/20 | |
| warmup_step:7/20 | |
| warmup_step:8/20 | |
| warmup_step:9/20 | |
| warmup_step:10/20 | |
| warmup_step:11/20 | |
| warmup_step:12/20 | |
| warmup_step:13/20 | |
| warmup_step:14/20 | |
| warmup_step:15/20 | |
| warmup_step:16/20 | |
| warmup_step:17/20 | |
| warmup_step:18/20 | |
| warmup_step:19/20 | |
| warmup_step:20/20 | |
| step:0/20000 val_loss:6.9357 val_bpb:4.1077 train_time:0ms step_avg:0.02ms | |
| step:1/20000 train_loss:6.9370 train_time:23ms step_avg:23.04ms | |
| step:2/20000 train_loss:16.8366 train_time:61ms step_avg:30.56ms | |
| step:3/20000 train_loss:8.7609 train_time:99ms step_avg:33.13ms | |
| step:4/20000 train_loss:6.6385 train_time:138ms step_avg:34.45ms | |
| step:5/20000 train_loss:6.6117 train_time:176ms step_avg:35.27ms | |
| step:6/20000 train_loss:7.4219 train_time:215ms step_avg:35.85ms | |
| step:7/20000 train_loss:6.3504 train_time:253ms step_avg:36.18ms | |
| step:8/20000 train_loss:6.1580 train_time:292ms step_avg:36.48ms | |
| step:9/20000 train_loss:6.0678 train_time:330ms step_avg:36.64ms | |
| step:10/20000 train_loss:5.9744 train_time:368ms step_avg:36.85ms | |
| step:200/20000 train_loss:2.8515 train_time:7777ms step_avg:38.89ms | |
| step:400/20000 train_loss:2.3627 train_time:15582ms step_avg:38.96ms | |
| step:600/20000 train_loss:2.5524 train_time:23394ms step_avg:38.99ms | |
| step:800/20000 train_loss:2.2935 train_time:31212ms step_avg:39.01ms | |
| step:1000/20000 train_loss:2.3722 train_time:39028ms step_avg:39.03ms | |
| step:1000/20000 val_loss:2.3361 val_bpb:1.3836 train_time:39048ms step_avg:39.05ms | |
| step:1200/20000 train_loss:2.3900 train_time:46872ms step_avg:39.06ms | |
| step:1400/20000 train_loss:2.4339 train_time:54724ms step_avg:39.09ms | |
| step:1600/20000 train_loss:2.1009 train_time:62595ms step_avg:39.12ms | |
| step:1800/20000 train_loss:2.1977 train_time:70451ms step_avg:39.14ms | |
| step:2000/20000 train_loss:2.2363 train_time:78336ms step_avg:39.17ms | |
| step:2000/20000 val_loss:2.2366 val_bpb:1.3246 train_time:78357ms step_avg:39.18ms | |
| step:2200/20000 train_loss:2.3429 train_time:86311ms step_avg:39.23ms | |
| step:2400/20000 train_loss:2.3578 train_time:94258ms step_avg:39.27ms | |
| step:2600/20000 train_loss:2.2132 train_time:102215ms step_avg:39.31ms | |
| step:2800/20000 train_loss:2.1680 train_time:110180ms step_avg:39.35ms | |
| step:3000/20000 train_loss:3.1765 train_time:118152ms step_avg:39.38ms | |
| step:3000/20000 val_loss:2.1971 val_bpb:1.3013 train_time:118174ms step_avg:39.39ms | |
| step:3200/20000 train_loss:2.2656 train_time:126118ms step_avg:39.41ms | |
| step:3400/20000 train_loss:2.0951 train_time:134081ms step_avg:39.44ms | |
| step:3600/20000 train_loss:2.2049 train_time:142023ms step_avg:39.45ms | |
| step:3800/20000 train_loss:2.1506 train_time:149958ms step_avg:39.46ms | |
| step:4000/20000 train_loss:2.2749 train_time:157880ms step_avg:39.47ms | |
| step:4000/20000 val_loss:2.1707 val_bpb:1.2856 train_time:157903ms step_avg:39.48ms | |
| step:4200/20000 train_loss:2.2180 train_time:165885ms step_avg:39.50ms | |
| step:4400/20000 train_loss:2.1667 train_time:173809ms step_avg:39.50ms | |
| step:4600/20000 train_loss:2.2021 train_time:181738ms step_avg:39.51ms | |
| step:4800/20000 train_loss:2.1372 train_time:189681ms step_avg:39.52ms | |
| step:5000/20000 train_loss:2.2252 train_time:197633ms step_avg:39.53ms | |
| step:5000/20000 val_loss:2.1573 val_bpb:1.2777 train_time:197655ms step_avg:39.53ms | |
| step:5200/20000 train_loss:2.2864 train_time:205591ms step_avg:39.54ms | |
| step:5400/20000 train_loss:2.2367 train_time:213551ms step_avg:39.55ms | |
| step:5600/20000 train_loss:2.1411 train_time:221477ms step_avg:39.55ms | |
| step:5800/20000 train_loss:2.1883 train_time:229389ms step_avg:39.55ms | |
| step:6000/20000 train_loss:2.1000 train_time:237304ms step_avg:39.55ms | |
| step:6000/20000 val_loss:2.1444 val_bpb:1.2700 train_time:237325ms step_avg:39.55ms | |
| step:6200/20000 train_loss:2.0895 train_time:245233ms step_avg:39.55ms | |
| step:6400/20000 train_loss:1.8691 train_time:253178ms step_avg:39.56ms | |
| step:6600/20000 train_loss:2.0886 train_time:261139ms step_avg:39.57ms | |
| step:6800/20000 train_loss:2.1338 train_time:269114ms step_avg:39.58ms | |
| step:7000/20000 train_loss:2.0947 train_time:277078ms step_avg:39.58ms | |
| step:7000/20000 val_loss:2.1353 val_bpb:1.2647 train_time:277100ms step_avg:39.59ms | |
| step:7200/20000 train_loss:1.9731 train_time:285003ms step_avg:39.58ms | |
| step:7400/20000 train_loss:1.9078 train_time:292928ms step_avg:39.58ms | |
| step:7600/20000 train_loss:2.1631 train_time:300826ms step_avg:39.58ms | |
| step:7800/20000 train_loss:2.1269 train_time:308754ms step_avg:39.58ms | |
| step:8000/20000 train_loss:2.0576 train_time:316707ms step_avg:39.59ms | |
| step:8000/20000 val_loss:2.1290 val_bpb:1.2609 train_time:316730ms step_avg:39.59ms | |
| step:8200/20000 train_loss:2.1947 train_time:324663ms step_avg:39.59ms | |
| step:8400/20000 train_loss:2.1891 train_time:332686ms step_avg:39.61ms | |
| step:8600/20000 train_loss:2.2069 train_time:340601ms step_avg:39.60ms | |
| step:8800/20000 train_loss:2.0772 train_time:348567ms step_avg:39.61ms | |
| step:9000/20000 train_loss:2.1063 train_time:356480ms step_avg:39.61ms | |
| step:9000/20000 val_loss:2.1227 val_bpb:1.2572 train_time:356501ms step_avg:39.61ms | |
| step:9200/20000 train_loss:2.1819 train_time:364422ms step_avg:39.61ms | |
| step:9400/20000 train_loss:2.0182 train_time:372339ms step_avg:39.61ms | |
| step:9600/20000 train_loss:2.0148 train_time:380276ms step_avg:39.61ms | |
| step:9800/20000 train_loss:2.0697 train_time:388209ms step_avg:39.61ms | |
| step:10000/20000 train_loss:2.0241 train_time:396138ms step_avg:39.61ms | |
| step:10000/20000 val_loss:2.1184 val_bpb:1.2546 train_time:396161ms step_avg:39.62ms | |
| step:10200/20000 train_loss:2.1331 train_time:404089ms step_avg:39.62ms | |
| step:10400/20000 train_loss:2.1017 train_time:412025ms step_avg:39.62ms | |
| step:10600/20000 train_loss:2.0756 train_time:419950ms step_avg:39.62ms | |
| step:10800/20000 train_loss:2.1248 train_time:427887ms step_avg:39.62ms | |
| step:11000/20000 train_loss:2.1002 train_time:435814ms step_avg:39.62ms | |
| step:11000/20000 val_loss:2.1139 val_bpb:1.2520 train_time:435837ms step_avg:39.62ms | |
| step:11200/20000 train_loss:2.1328 train_time:443762ms step_avg:39.62ms | |
| step:11400/20000 train_loss:2.2185 train_time:451695ms step_avg:39.62ms | |
| step:11600/20000 train_loss:2.0862 train_time:459659ms step_avg:39.63ms | |
| step:11800/20000 train_loss:2.1698 train_time:467605ms step_avg:39.63ms | |
| step:12000/20000 train_loss:2.0754 train_time:475556ms step_avg:39.63ms | |
| step:12000/20000 val_loss:2.1082 val_bpb:1.2486 train_time:475578ms step_avg:39.63ms | |
| step:12200/20000 train_loss:2.2423 train_time:483522ms step_avg:39.63ms | |
| step:12400/20000 train_loss:2.0525 train_time:491519ms step_avg:39.64ms | |
| step:12600/20000 train_loss:2.0900 train_time:499453ms step_avg:39.64ms | |
| step:12800/20000 train_loss:2.0601 train_time:507372ms step_avg:39.64ms | |
| step:13000/20000 train_loss:1.9239 train_time:515328ms step_avg:39.64ms | |
| step:13000/20000 val_loss:2.1079 val_bpb:1.2484 train_time:515341ms step_avg:39.64ms | |
| step:13200/20000 train_loss:2.0757 train_time:523268ms step_avg:39.64ms | |
| step:13400/20000 train_loss:2.1089 train_time:531220ms step_avg:39.64ms | |
| step:13600/20000 train_loss:2.0693 train_time:539151ms step_avg:39.64ms | |
| step:13800/20000 train_loss:2.0852 train_time:547103ms step_avg:39.65ms | |
| step:14000/20000 train_loss:1.9272 train_time:555051ms step_avg:39.65ms | |
| step:14000/20000 val_loss:2.1021 val_bpb:1.2450 train_time:555073ms step_avg:39.65ms | |
| step:14200/20000 train_loss:1.9886 train_time:562996ms step_avg:39.65ms | |
| step:14400/20000 train_loss:1.9930 train_time:570920ms step_avg:39.65ms | |
| step:14600/20000 train_loss:1.9616 train_time:578860ms step_avg:39.65ms | |
| step:14800/20000 train_loss:2.0628 train_time:586789ms step_avg:39.65ms | |
| step:15000/20000 train_loss:2.1184 train_time:594735ms step_avg:39.65ms | |
| step:15000/20000 val_loss:2.0662 val_bpb:1.2237 train_time:594757ms step_avg:39.65ms | |
| step:15133/20000 val_loss:2.0631 val_bpb:1.2219 train_time:600032ms step_avg:39.65ms | |
| stopping_early: wallclock_cap train_time:600032ms step:15133/20000 | |
| peak memory allocated: 10184 MiB reserved: 10478 MiB | |
| Serialized model: 67224983 bytes | |
| Code size: 47686 bytes | |
| Total submission size: 67272669 bytes | |
| Serialized model int8+zlib: 15820150 bytes (payload:17178912 raw_torch:17224025 payload_ratio:3.91x) | |
| Total submission size int8+zlib: 15867836 bytes | |
| final_int8_zlib_roundtrip val_loss:2.0765 val_bpb:1.2298 eval_time:1235ms | |
| final_int8_zlib_roundtrip_exact val_loss:2.07651564 val_bpb:1.22982953 | |