W0320 14:08:03.383000 201 site-packages/torch/distributed/run.py:774]
W0320 14:08:03.383000 201 site-packages/torch/distributed/run.py:774] *****************************************
W0320 14:08:03.383000 201 site-packages/torch/distributed/run.py:774] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0320 14:08:03.383000 201 site-packages/torch/distributed/run.py:774] *****************************************
logs/0be57f57-ecc5-437f-9704-92a56669b314.txt
val_bpb:enabled tokenizer_kind=sentencepiece tokenizer_path=./data/tokenizers/fineweb_1024_bpe.model
train_loader:dataset:fineweb10B_sp1024 train_shards:10
val_loader:shards pattern=./data/datasets/fineweb10B_sp1024/fineweb_val_*.bin tokens:62021632
model_params:17059912
world_size:8 grad_accum_steps:1
sdp_backends:cudnn=False flash=True mem_efficient=False math=False
attention_mode:gqa num_heads:8 num_kv_heads:4
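The attention_mode:gqa line means each key/value head is shared by a group of query heads. A minimal sketch of the query-to-KV head mapping, assuming the usual contiguous-group layout (the repo's exact grouping is not shown in the log):

```python
def kv_head_for_query(q_head: int, num_heads: int, num_kv_heads: int) -> int:
    """Map a query head to the KV head it shares under grouped-query attention.

    Assumes contiguous groups (the common GQA layout); the actual code
    behind this log may arrange heads differently.
    """
    assert num_heads % num_kv_heads == 0
    group_size = num_heads // num_kv_heads
    return q_head // group_size

# With the logged config (num_heads:8 num_kv_heads:4), each KV head
# serves a group of 2 query heads:
mapping = [kv_head_for_query(h, 8, 4) for h in range(8)]
print(mapping)  # [0, 0, 1, 1, 2, 2, 3, 3]
```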
tie_embeddings:True embed_lr:0.05 head_lr:0.0 matrix_lr:0.04 scalar_lr:0.04
train_batch_tokens:524288 train_seq_len:1024 iterations:20000 warmup_steps:20 max_wallclock_seconds:600.000
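The batch-geometry line above pins down how many sequences each rank processes per optimizer step (grad_accum_steps:1, so there is no accumulation factor). A quick arithmetic check:

```python
# Values taken directly from the logged config lines.
train_batch_tokens = 524288
train_seq_len = 1024
world_size = 8

seqs_per_step = train_batch_tokens // train_seq_len  # sequences per optimizer step
seqs_per_rank = seqs_per_step // world_size          # per-GPU micro-batch
print(seqs_per_step, seqs_per_rank)  # 512 64
```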
seed:1337
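The warmup_steps:20 setting corresponds to the 20 warmup_step lines that follow before step 0. A minimal sketch, assuming a plain linear ramp to the configured peak rate (the log does not show the actual schedule shape):

```python
def warmup_lr(step: int, peak_lr: float, warmup_steps: int = 20) -> float:
    """Linearly ramp the learning rate over the first `warmup_steps` steps.

    Plain linear warmup is an assumption; the log only shows that 20
    warmup steps run before training step 0.
    """
    if step + 1 >= warmup_steps:
        return peak_lr
    return peak_lr * (step + 1) / warmup_steps

# e.g. for the logged matrix_lr of 0.04: ~0.002 at step 0, full 0.04 from step 19 on
print(warmup_lr(0, 0.04), warmup_lr(19, 0.04))
```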
warmup_step:1/20
warmup_step:2/20
warmup_step:3/20
warmup_step:4/20
warmup_step:5/20
warmup_step:6/20
warmup_step:7/20
warmup_step:8/20
warmup_step:9/20
warmup_step:10/20
warmup_step:11/20
warmup_step:12/20
warmup_step:13/20
warmup_step:14/20
warmup_step:15/20
warmup_step:16/20
warmup_step:17/20
warmup_step:18/20
warmup_step:19/20
warmup_step:20/20
step:0/20000 val_loss:6.9357 val_bpb:4.1077 train_time:0ms step_avg:0.02ms
step:1/20000 train_loss:6.9370 train_time:43ms step_avg:42.53ms
step:2/20000 train_loss:16.8372 train_time:112ms step_avg:55.83ms
step:3/20000 train_loss:8.7578 train_time:197ms step_avg:65.70ms
step:4/20000 train_loss:6.6381 train_time:283ms step_avg:70.81ms
step:5/20000 train_loss:6.6135 train_time:370ms step_avg:73.91ms
step:6/20000 train_loss:7.4195 train_time:456ms step_avg:76.05ms
step:7/20000 train_loss:6.3500 train_time:542ms step_avg:77.40ms
step:8/20000 train_loss:6.1583 train_time:628ms step_avg:78.52ms
step:9/20000 train_loss:6.0681 train_time:714ms step_avg:79.31ms
step:10/20000 train_loss:5.9742 train_time:800ms step_avg:79.96ms
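The val_bpb figures are consistent with converting mean cross-entropy (nats per token) into bits per byte via the validation set's token-to-byte ratio. A sketch of that conversion (the exact byte count used by this codebase is an assumption; only the token count 62021632 appears in the log):

```python
import math

def loss_to_bpb(mean_loss_nats: float, num_tokens: int, num_bytes: int) -> float:
    """Convert mean cross-entropy (nats/token) to bits/byte.

    bits/token = loss / ln(2); multiplying by tokens-per-byte gives bits/byte.
    """
    bits_per_token = mean_loss_nats / math.log(2)
    return bits_per_token * num_tokens / num_bytes

# Sanity check with round numbers: a loss of ln(2) nats on text averaging
# exactly one token per byte is 1 bit per byte.
print(loss_to_bpb(math.log(2), 1000, 1000))  # 1.0
```

The logged pairs fit a ratio of roughly 0.41 tokens per byte (about 2.4 bytes per token for this 1024-entry SentencePiece vocab): e.g. 2.3347 / ln 2 × 0.41 ≈ 1.38, matching val_bpb:1.3827 at step 1000.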
step:200/20000 train_loss:2.8597 train_time:17263ms step_avg:86.31ms
step:400/20000 train_loss:2.3606 train_time:34602ms step_avg:86.50ms
step:600/20000 train_loss:2.5491 train_time:52025ms step_avg:86.71ms
step:800/20000 train_loss:2.2949 train_time:69482ms step_avg:86.85ms
step:1000/20000 train_loss:2.3716 train_time:87109ms step_avg:87.11ms
step:1000/20000 val_loss:2.3347 val_bpb:1.3827 train_time:87170ms step_avg:87.17ms
step:1200/20000 train_loss:2.3898 train_time:104759ms step_avg:87.30ms
step:1400/20000 train_loss:2.4342 train_time:122472ms step_avg:87.48ms
step:1600/20000 train_loss:2.0993 train_time:140158ms step_avg:87.60ms
step:1800/20000 train_loss:2.2019 train_time:157891ms step_avg:87.72ms
step:2000/20000 train_loss:2.2352 train_time:175573ms step_avg:87.79ms
step:2000/20000 val_loss:2.2369 val_bpb:1.3248 train_time:175634ms step_avg:87.82ms
step:2200/20000 train_loss:2.3416 train_time:193361ms step_avg:87.89ms
step:2400/20000 train_loss:2.3567 train_time:211092ms step_avg:87.95ms
step:2600/20000 train_loss:2.2096 train_time:228777ms step_avg:87.99ms
step:2800/20000 train_loss:2.1666 train_time:246501ms step_avg:88.04ms
step:3000/20000 train_loss:3.1795 train_time:264197ms step_avg:88.07ms
step:3000/20000 val_loss:2.1965 val_bpb:1.3009 train_time:264258ms step_avg:88.09ms
step:3200/20000 train_loss:2.2668 train_time:281896ms step_avg:88.09ms
step:3400/20000 train_loss:2.0981 train_time:299636ms step_avg:88.13ms
step:3600/20000 train_loss:2.2011 train_time:317336ms step_avg:88.15ms
step:3800/20000 train_loss:2.1505 train_time:335073ms step_avg:88.18ms
step:4000/20000 train_loss:2.2753 train_time:352772ms step_avg:88.19ms
step:4000/20000 val_loss:2.1711 val_bpb:1.2859 train_time:352834ms step_avg:88.21ms
step:4200/20000 train_loss:2.2171 train_time:370547ms step_avg:88.23ms
step:4400/20000 train_loss:2.1655 train_time:388292ms step_avg:88.25ms
step:4600/20000 train_loss:2.2046 train_time:405972ms step_avg:88.25ms
step:4800/20000 train_loss:2.1370 train_time:423730ms step_avg:88.28ms
step:5000/20000 train_loss:2.2243 train_time:441431ms step_avg:88.29ms
step:5000/20000 val_loss:2.1566 val_bpb:1.2773 train_time:441491ms step_avg:88.30ms
step:5200/20000 train_loss:2.2862 train_time:459160ms step_avg:88.30ms
step:5400/20000 train_loss:2.2339 train_time:476891ms step_avg:88.31ms
step:5600/20000 train_loss:2.1392 train_time:494622ms step_avg:88.33ms
step:5800/20000 train_loss:2.1788 train_time:512397ms step_avg:88.34ms
step:6000/20000 train_loss:2.0872 train_time:530012ms step_avg:88.34ms
step:6000/20000 val_loss:2.1290 val_bpb:1.2609 train_time:530075ms step_avg:88.35ms
step:6200/20000 train_loss:2.0678 train_time:547723ms step_avg:88.34ms
step:6400/20000 train_loss:1.8506 train_time:565459ms step_avg:88.35ms
step:6600/20000 train_loss:2.0477 train_time:583137ms step_avg:88.35ms
step:6791/20000 val_loss:2.0926 val_bpb:1.2394 train_time:600109ms step_avg:88.37ms
stopping_early: wallclock_cap train_time:600109ms step:6791/20000
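The step_avg column is just cumulative train_time divided by the step count, which the stopping_early line lets us check directly:

```python
# step_avg as reported in the log is cumulative wallclock / steps taken.
train_time_ms = 600109  # from the stopping_early line
steps = 6791
step_avg = train_time_ms / steps
print(round(step_avg, 2))  # 88.37, matching the last logged step_avg
```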
peak memory allocated: 10199 MiB reserved: 10248 MiB
Serialized model: 67224983 bytes
Code size: 58509 bytes
Total submission size: 67283492 bytes
Serialized model int8+zlib: 15810699 bytes (payload:17178912 raw_torch:17224025 payload_ratio:3.91x)
Total submission size int8+zlib: 15869208 bytes
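The int8+zlib figures are consistent with quantizing each 4-byte float32 parameter to a single int8 byte (17,059,912 params ≈ the 17,178,912-byte payload, hence the 3.91x ratio against the 67,224,983-byte fp32 serialization) and then zlib-compressing. A minimal sketch of per-tensor symmetric int8 quantization plus zlib, as an assumed scheme (the repo's actual serialization format is not shown in the log):

```python
import struct
import zlib

def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: the largest |v| maps to 127."""
    scale = max(abs(v) for v in values) / 127 or 1.0
    q = bytes(round(v / scale) & 0xFF for v in values)  # one byte per weight
    return scale, q

def dequantize_int8(scale, q):
    # Reinterpret each byte as a signed int8 ("b" format) and rescale.
    return [struct.unpack("b", q[i:i + 1])[0] * scale for i in range(len(q))]

weights = [0.5, -1.27, 0.0, 1.27]
scale, q = quantize_int8(weights)
payload = zlib.compress(q)                      # the zlib stage from the log
restored = dequantize_int8(scale, zlib.decompress(payload))
```

Round-tripping through this scheme bounds the per-weight error by half a quantization step (scale / 2), which is why the log reports a separate roundtrip validation loss below.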
final_int8_zlib_roundtrip val_loss:2.1023 val_bpb:1.2451 eval_time:2945ms
final_int8_zlib_roundtrip_exact val_loss:2.10231862 val_bpb:1.24511151
final_int8_ttt_lora val_loss:2.0439 val_bpb:1.2105 eval_time:142925ms
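The final_int8_ttt_lora line suggests a test-time-training pass that learns low-rank (LoRA) deltas on top of the dequantized weights before the final eval. Purely as an illustration of the merge step (the rank, scaling, and the training loop itself are assumptions; the log shows only the resulting metrics):

```python
def merge_lora(W, A, B, alpha, rank):
    """Return W + (alpha / rank) * (B @ A) for plain list-of-lists matrices.

    W is rows x cols, B is rows x rank, A is rank x cols; alpha / rank is
    the conventional LoRA scaling factor.
    """
    s = alpha / rank
    rows, cols = len(W), len(W[0])
    return [[W[i][j] + s * sum(B[i][k] * A[k][j] for k in range(rank))
             for j in range(cols)] for i in range(rows)]

# Tiny 2x2 example with rank 1:
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]            # rank x cols
B = [[0.5], [0.0]]          # rows x rank
print(merge_lora(W, A, B, alpha=1.0, rank=1))  # [[1.5, 0.5], [0.0, 1.0]]
```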