W0320 22:33:11.396000 687 torch/distributed/run.py:852]
W0320 22:33:11.396000 687 torch/distributed/run.py:852] *****************************************
W0320 22:33:11.396000 687 torch/distributed/run.py:852] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0320 22:33:11.396000 687 torch/distributed/run.py:852] *****************************************
logs/893fa756-e373-4d35-9677-3a642348d9d7.txt
val_bpb:enabled tokenizer_kind=sentencepiece tokenizer_path=./data/tokenizers/fineweb_1024_bpe.model
train_loader:dataset:fineweb10B_sp1024 train_shards:10
val_loader:shards pattern=./data/datasets/fineweb10B_sp1024/fineweb_val_*.bin tokens:62021632
model_params:17059912
world_size:8 grad_accum_steps:1
sdp_backends:cudnn=False flash=True mem_efficient=False math=False
attention_mode:gqa num_heads:8 num_kv_heads:4
tie_embeddings:True embed_lr:0.05 head_lr:0.0 matrix_lr:0.04 scalar_lr:0.04
train_batch_tokens:524288 train_seq_len:1024 iterations:20000 warmup_steps:20 max_wallclock_seconds:600.000
seed:1337
warmup_step:1/20
warmup_step:2/20
warmup_step:3/20
warmup_step:4/20
warmup_step:5/20
warmup_step:6/20
warmup_step:7/20
warmup_step:8/20
warmup_step:9/20
warmup_step:10/20
warmup_step:11/20
warmup_step:12/20
warmup_step:13/20
warmup_step:14/20
warmup_step:15/20
warmup_step:16/20
warmup_step:17/20
warmup_step:18/20
warmup_step:19/20
warmup_step:20/20
step:0/20000 val_loss:6.9357 val_bpb:4.1077 train_time:0ms step_avg:0.02ms
step:1/20000 train_loss:6.9370 train_time:22ms step_avg:22.23ms
step:2/20000 train_loss:16.8367 train_time:62ms step_avg:30.79ms
step:3/20000 train_loss:8.7608 train_time:100ms step_avg:33.44ms
step:4/20000 train_loss:6.6384 train_time:139ms step_avg:34.83ms
step:5/20000 train_loss:6.6117 train_time:178ms step_avg:35.58ms
step:6/20000 train_loss:7.4217 train_time:217ms step_avg:36.15ms
step:7/20000 train_loss:6.3501 train_time:256ms step_avg:36.56ms
step:8/20000 train_loss:6.1581 train_time:295ms step_avg:36.82ms
step:9/20000 train_loss:6.0679 train_time:334ms step_avg:37.13ms
step:10/20000 train_loss:5.9747 train_time:373ms step_avg:37.29ms
step:200/20000 train_loss:2.8484 train_time:7925ms step_avg:39.62ms
step:400/20000 train_loss:2.3576 train_time:15876ms step_avg:39.69ms
step:600/20000 train_loss:2.5429 train_time:23828ms step_avg:39.71ms
step:800/20000 train_loss:2.2935 train_time:31781ms step_avg:39.73ms
step:1000/20000 train_loss:2.3741 train_time:39753ms step_avg:39.75ms
step:1000/20000 val_loss:2.3353 val_bpb:1.3831 train_time:39775ms step_avg:39.77ms
step:1200/20000 train_loss:2.3843 train_time:47750ms step_avg:39.79ms
step:1400/20000 train_loss:2.4309 train_time:55747ms step_avg:39.82ms
step:1600/20000 train_loss:2.0958 train_time:63767ms step_avg:39.85ms
step:1800/20000 train_loss:2.1992 train_time:71784ms step_avg:39.88ms
step:2000/20000 train_loss:2.2314 train_time:79870ms step_avg:39.94ms
step:2000/20000 val_loss:2.2370 val_bpb:1.3249 train_time:79892ms step_avg:39.95ms
step:2200/20000 train_loss:2.3417 train_time:87918ms step_avg:39.96ms
step:2400/20000 train_loss:2.3566 train_time:95969ms step_avg:39.99ms
step:2600/20000 train_loss:2.2132 train_time:104022ms step_avg:40.01ms
step:2800/20000 train_loss:2.1651 train_time:112089ms step_avg:40.03ms
step:3000/20000 train_loss:3.1850 train_time:120148ms step_avg:40.05ms
step:3000/20000 val_loss:2.1977 val_bpb:1.3016 train_time:120170ms step_avg:40.06ms
step:3200/20000 train_loss:2.2635 train_time:128197ms step_avg:40.06ms
step:3400/20000 train_loss:2.0977 train_time:136258ms step_avg:40.08ms
step:3600/20000 train_loss:2.2027 train_time:144318ms step_avg:40.09ms
step:3800/20000 train_loss:2.1499 train_time:152355ms step_avg:40.09ms
step:4000/20000 train_loss:2.2729 train_time:160416ms step_avg:40.10ms
step:4000/20000 val_loss:2.1705 val_bpb:1.2855 train_time:160438ms step_avg:40.11ms
step:4200/20000 train_loss:2.2190 train_time:168536ms step_avg:40.13ms
step:4400/20000 train_loss:2.1711 train_time:176589ms step_avg:40.13ms
step:4600/20000 train_loss:2.1991 train_time:184621ms step_avg:40.14ms
step:4800/20000 train_loss:2.1399 train_time:192674ms step_avg:40.14ms
step:5000/20000 train_loss:2.2225 train_time:200725ms step_avg:40.14ms
step:5000/20000 val_loss:2.1569 val_bpb:1.2774 train_time:200747ms step_avg:40.15ms
step:5200/20000 train_loss:2.2842 train_time:208780ms step_avg:40.15ms
step:5400/20000 train_loss:2.2347 train_time:216841ms step_avg:40.16ms
step:5600/20000 train_loss:2.1364 train_time:224878ms step_avg:40.16ms
step:5800/20000 train_loss:2.1868 train_time:232942ms step_avg:40.16ms
step:6000/20000 train_loss:2.1038 train_time:240997ms step_avg:40.17ms
step:6000/20000 val_loss:2.1440 val_bpb:1.2698 train_time:241019ms step_avg:40.17ms
step:6200/20000 train_loss:2.0893 train_time:249050ms step_avg:40.17ms
step:6400/20000 train_loss:1.8733 train_time:257108ms step_avg:40.17ms
step:6600/20000 train_loss:2.0848 train_time:265151ms step_avg:40.17ms
step:6800/20000 train_loss:2.1318 train_time:273197ms step_avg:40.18ms
step:7000/20000 train_loss:2.0892 train_time:281250ms step_avg:40.18ms
step:7000/20000 val_loss:2.1354 val_bpb:1.2647 train_time:281272ms step_avg:40.18ms
step:7200/20000 train_loss:1.9715 train_time:289302ms step_avg:40.18ms
step:7400/20000 train_loss:1.9067 train_time:297364ms step_avg:40.18ms
step:7600/20000 train_loss:2.1656 train_time:305406ms step_avg:40.18ms
step:7800/20000 train_loss:2.1265 train_time:313466ms step_avg:40.19ms
step:8000/20000 train_loss:2.0572 train_time:321523ms step_avg:40.19ms
step:8000/20000 val_loss:2.1280 val_bpb:1.2603 train_time:321545ms step_avg:40.19ms
step:8200/20000 train_loss:2.1927 train_time:329577ms step_avg:40.19ms
step:8400/20000 train_loss:2.1853 train_time:337700ms step_avg:40.20ms
step:8600/20000 train_loss:2.2110 train_time:345760ms step_avg:40.20ms
step:8800/20000 train_loss:2.0730 train_time:353821ms step_avg:40.21ms
step:9000/20000 train_loss:2.0997 train_time:361880ms step_avg:40.21ms
step:9000/20000 val_loss:2.1218 val_bpb:1.2567 train_time:361901ms step_avg:40.21ms
step:9200/20000 train_loss:2.1788 train_time:369929ms step_avg:40.21ms
step:9400/20000 train_loss:2.0175 train_time:377954ms step_avg:40.21ms
step:9600/20000 train_loss:2.0124 train_time:386008ms step_avg:40.21ms
step:9800/20000 train_loss:2.0677 train_time:394052ms step_avg:40.21ms
step:10000/20000 train_loss:2.0207 train_time:402115ms step_avg:40.21ms
step:10000/20000 val_loss:2.1180 val_bpb:1.2544 train_time:402137ms step_avg:40.21ms
step:10200/20000 train_loss:2.1329 train_time:410164ms step_avg:40.21ms
step:10400/20000 train_loss:2.0956 train_time:418207ms step_avg:40.21ms
step:10600/20000 train_loss:2.0725 train_time:426263ms step_avg:40.21ms
step:10800/20000 train_loss:2.1240 train_time:434309ms step_avg:40.21ms
step:11000/20000 train_loss:2.0970 train_time:442345ms step_avg:40.21ms
step:11000/20000 val_loss:2.1134 val_bpb:1.2516 train_time:442367ms step_avg:40.22ms
step:11200/20000 train_loss:2.1306 train_time:450397ms step_avg:40.21ms
step:11400/20000 train_loss:2.2157 train_time:458436ms step_avg:40.21ms
step:11600/20000 train_loss:2.0888 train_time:466478ms step_avg:40.21ms
step:11800/20000 train_loss:2.1709 train_time:474538ms step_avg:40.22ms
step:12000/20000 train_loss:2.0764 train_time:482591ms step_avg:40.22ms
step:12000/20000 val_loss:2.1079 val_bpb:1.2484 train_time:482613ms step_avg:40.22ms
step:12200/20000 train_loss:2.2435 train_time:490655ms step_avg:40.22ms
step:12400/20000 train_loss:2.0507 train_time:498788ms step_avg:40.22ms
step:12600/20000 train_loss:2.0905 train_time:506835ms step_avg:40.23ms
step:12800/20000 train_loss:2.0603 train_time:514893ms step_avg:40.23ms
step:13000/20000 train_loss:1.9274 train_time:522961ms step_avg:40.23ms
step:13000/20000 val_loss:2.1074 val_bpb:1.2481 train_time:522982ms step_avg:40.23ms
step:13200/20000 train_loss:2.0786 train_time:530994ms step_avg:40.23ms
step:13400/20000 train_loss:2.1084 train_time:539057ms step_avg:40.23ms
step:13600/20000 train_loss:2.0703 train_time:547114ms step_avg:40.23ms
step:13800/20000 train_loss:2.0812 train_time:555170ms step_avg:40.23ms
step:14000/20000 train_loss:1.9203 train_time:563231ms step_avg:40.23ms
step:14000/20000 val_loss:2.0953 val_bpb:1.2410 train_time:563254ms step_avg:40.23ms
step:14200/20000 train_loss:1.9832 train_time:571288ms step_avg:40.23ms
step:14400/20000 train_loss:1.9830 train_time:579347ms step_avg:40.23ms
step:14600/20000 train_loss:1.9586 train_time:587404ms step_avg:40.23ms
step:14800/20000 train_loss:2.0545 train_time:595467ms step_avg:40.23ms
step:14912/20000 val_loss:2.0629 val_bpb:1.2218 train_time:600032ms step_avg:40.24ms
stopping_early: wallclock_cap train_time:600032ms step:14912/20000
peak memory allocated: 10185 MiB reserved: 10396 MiB
Serialized model: 67224983 bytes
Code size: 58509 bytes
Total submission size: 67283492 bytes
Serialized model int8+zlib: 15823490 bytes (payload:17178912 raw_torch:17224025 payload_ratio:3.91x)
Total submission size int8+zlib: 15881999 bytes
final_int8_zlib_roundtrip val_loss:2.0760 val_bpb:1.2295 eval_time:1258ms
final_int8_zlib_roundtrip_exact val_loss:2.07596963 val_bpb:1.22950615
final_int8_ttt_lora val_loss:2.0166 val_bpb:1.1943 eval_time:54010ms
