W0320 14:08:03.383000 201 site-packages/torch/distributed/run.py:774]
W0320 14:08:03.383000 201 site-packages/torch/distributed/run.py:774] *****************************************
W0320 14:08:03.383000 201 site-packages/torch/distributed/run.py:774] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0320 14:08:03.383000 201 site-packages/torch/distributed/run.py:774] *****************************************
logs/0be57f57-ecc5-437f-9704-92a56669b314.txt
val_bpb:enabled tokenizer_kind=sentencepiece tokenizer_path=./data/tokenizers/fineweb_1024_bpe.model
train_loader:dataset:fineweb10B_sp1024 train_shards:10
val_loader:shards pattern=./data/datasets/fineweb10B_sp1024/fineweb_val_*.bin tokens:62021632
model_params:17059912
world_size:8 grad_accum_steps:1
sdp_backends:cudnn=False flash=True mem_efficient=False math=False
attention_mode:gqa num_heads:8 num_kv_heads:4
tie_embeddings:True embed_lr:0.05 head_lr:0.0 matrix_lr:0.04 scalar_lr:0.04
train_batch_tokens:524288 train_seq_len:1024 iterations:20000 warmup_steps:20 max_wallclock_seconds:600.000
seed:1337
warmup_step:1/20
warmup_step:2/20
warmup_step:3/20
warmup_step:4/20
warmup_step:5/20
warmup_step:6/20
warmup_step:7/20
warmup_step:8/20
warmup_step:9/20
warmup_step:10/20
warmup_step:11/20
warmup_step:12/20
warmup_step:13/20
warmup_step:14/20
warmup_step:15/20
warmup_step:16/20
warmup_step:17/20
warmup_step:18/20
warmup_step:19/20
warmup_step:20/20
step:0/20000 val_loss:6.9357 val_bpb:4.1077 train_time:0ms step_avg:0.02ms
step:1/20000 train_loss:6.9370 train_time:43ms step_avg:42.53ms
step:2/20000 train_loss:16.8372 train_time:112ms step_avg:55.83ms
step:3/20000 train_loss:8.7578 train_time:197ms step_avg:65.70ms
step:4/20000 train_loss:6.6381 train_time:283ms step_avg:70.81ms
step:5/20000 train_loss:6.6135 train_time:370ms step_avg:73.91ms
step:6/20000 train_loss:7.4195 train_time:456ms step_avg:76.05ms
step:7/20000 train_loss:6.3500 train_time:542ms step_avg:77.40ms
step:8/20000 train_loss:6.1583 train_time:628ms step_avg:78.52ms
step:9/20000 train_loss:6.0681 train_time:714ms step_avg:79.31ms
step:10/20000 train_loss:5.9742 train_time:800ms step_avg:79.96ms
step:200/20000 train_loss:2.8597 train_time:17263ms step_avg:86.31ms
step:400/20000 train_loss:2.3606 train_time:34602ms step_avg:86.50ms
step:600/20000 train_loss:2.5491 train_time:52025ms step_avg:86.71ms
step:800/20000 train_loss:2.2949 train_time:69482ms step_avg:86.85ms
step:1000/20000 train_loss:2.3716 train_time:87109ms step_avg:87.11ms
step:1000/20000 val_loss:2.3347 val_bpb:1.3827 train_time:87170ms step_avg:87.17ms
step:1200/20000 train_loss:2.3898 train_time:104759ms step_avg:87.30ms
step:1400/20000 train_loss:2.4342 train_time:122472ms step_avg:87.48ms
step:1600/20000 train_loss:2.0993 train_time:140158ms step_avg:87.60ms
step:1800/20000 train_loss:2.2019 train_time:157891ms step_avg:87.72ms
step:2000/20000 train_loss:2.2352 train_time:175573ms step_avg:87.79ms
step:2000/20000 val_loss:2.2369 val_bpb:1.3248 train_time:175634ms step_avg:87.82ms
step:2200/20000 train_loss:2.3416 train_time:193361ms step_avg:87.89ms
step:2400/20000 train_loss:2.3567 train_time:211092ms step_avg:87.95ms
step:2600/20000 train_loss:2.2096 train_time:228777ms step_avg:87.99ms
step:2800/20000 train_loss:2.1666 train_time:246501ms step_avg:88.04ms
step:3000/20000 train_loss:3.1795 train_time:264197ms step_avg:88.07ms
step:3000/20000 val_loss:2.1965 val_bpb:1.3009 train_time:264258ms step_avg:88.09ms
step:3200/20000 train_loss:2.2668 train_time:281896ms step_avg:88.09ms
step:3400/20000 train_loss:2.0981 train_time:299636ms step_avg:88.13ms
step:3600/20000 train_loss:2.2011 train_time:317336ms step_avg:88.15ms
step:3800/20000 train_loss:2.1505 train_time:335073ms step_avg:88.18ms
step:4000/20000 train_loss:2.2753 train_time:352772ms step_avg:88.19ms
step:4000/20000 val_loss:2.1711 val_bpb:1.2859 train_time:352834ms step_avg:88.21ms
step:4200/20000 train_loss:2.2171 train_time:370547ms step_avg:88.23ms
step:4400/20000 train_loss:2.1655 train_time:388292ms step_avg:88.25ms
step:4600/20000 train_loss:2.2046 train_time:405972ms step_avg:88.25ms
step:4800/20000 train_loss:2.1370 train_time:423730ms step_avg:88.28ms
step:5000/20000 train_loss:2.2243 train_time:441431ms step_avg:88.29ms
step:5000/20000 val_loss:2.1566 val_bpb:1.2773 train_time:441491ms step_avg:88.30ms
step:5200/20000 train_loss:2.2862 train_time:459160ms step_avg:88.30ms
step:5400/20000 train_loss:2.2339 train_time:476891ms step_avg:88.31ms
step:5600/20000 train_loss:2.1392 train_time:494622ms step_avg:88.33ms
step:5800/20000 train_loss:2.1788 train_time:512397ms step_avg:88.34ms
step:6000/20000 train_loss:2.0872 train_time:530012ms step_avg:88.34ms
step:6000/20000 val_loss:2.1290 val_bpb:1.2609 train_time:530075ms step_avg:88.35ms
step:6200/20000 train_loss:2.0678 train_time:547723ms step_avg:88.34ms
step:6400/20000 train_loss:1.8506 train_time:565459ms step_avg:88.35ms
step:6600/20000 train_loss:2.0477 train_time:583137ms step_avg:88.35ms
step:6791/20000 val_loss:2.0926 val_bpb:1.2394 train_time:600109ms step_avg:88.37ms
stopping_early: wallclock_cap train_time:600109ms step:6791/20000
peak memory allocated: 10199 MiB reserved: 10248 MiB
Serialized model: 67224983 bytes
Code size: 58509 bytes
Total submission size: 67283492 bytes
Serialized model int8+zlib: 15810699 bytes (payload:17178912 raw_torch:17224025 payload_ratio:3.91x)
Total submission size int8+zlib: 15869208 bytes
final_int8_zlib_roundtrip val_loss:2.1023 val_bpb:1.2451 eval_time:2945ms
final_int8_zlib_roundtrip_exact val_loss:2.10231862 val_bpb:1.24511151
final_int8_ttt_lora val_loss:2.0439 val_bpb:1.2105 eval_time:142925ms
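Aside: the logged (val_loss, val_bpb) pairs above are mutually consistent with a single fixed tokens-per-byte ratio, i.e. val_bpb = (val_loss / ln 2) × tokens_per_byte. A minimal sketch that recovers the ratio from the log (the ~0.4105 value is inferred from these pairs, not stated anywhere in the log):

```python
import math

# (val_loss in nats/token, val_bpb) pairs copied from the log above
pairs = [
    (6.9357, 4.1077), (2.3347, 1.3827), (2.2369, 1.3248),
    (2.1965, 1.3009), (2.1711, 1.2859), (2.1566, 1.2773),
    (2.1290, 1.2609), (2.0926, 1.2394),
]

# If val_bpb = (loss / ln 2) * tokens_per_byte, this ratio is constant.
ratios = [bpb * math.log(2) / loss for loss, bpb in pairs]
tokens_per_byte = sum(ratios) / len(ratios)
print(f"tokens/byte ~= {tokens_per_byte:.4f}")  # ~0.4105, i.e. ~2.44 bytes/token

# Every logged pair matches the fitted ratio to within rounding error.
for loss, bpb in pairs:
    assert abs(loss / math.log(2) * tokens_per_byte - bpb) < 1e-3
```

This implies the SentencePiece tokenizer used here averages roughly 2.44 bytes per token on the FineWeb validation shards, which is why val_bpb tracks val_loss so tightly throughout the run.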
