W0323 09:26:38.186000 715 torch/distributed/run.py:852]
W0323 09:26:38.186000 715 torch/distributed/run.py:852] *****************************************
W0323 09:26:38.186000 715 torch/distributed/run.py:852] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0323 09:26:38.186000 715 torch/distributed/run.py:852] *****************************************
logs/1d087ef9-4b9c-42f5-a5a4-daeebf0d3141.txt
val_bpb:enabled tokenizer_kind=sentencepiece tokenizer_path=./data/tokenizers/fineweb_1024_bpe.model
train_loader:dataset:fineweb10B_sp1024 train_shards:10
val_loader:shards pattern=./data/datasets/fineweb10B_sp1024/fineweb_val_*.bin tokens:62021632
model_params:17059912
world_size:8 grad_accum_steps:1
sdp_backends:cudnn=False flash=True mem_efficient=False math=False
attention_mode:gqa num_heads:8 num_kv_heads:4
tie_embeddings:True embed_lr:0.05 head_lr:0.0 matrix_lr:0.04 scalar_lr:0.04
train_batch_tokens:524288 train_seq_len:1024 iterations:20000 warmup_steps:20 max_wallclock_seconds:600.000
seed:1337
warmup_step:1/20
warmup_step:2/20
warmup_step:3/20
warmup_step:4/20
warmup_step:5/20
warmup_step:6/20
warmup_step:7/20
warmup_step:8/20
warmup_step:9/20
warmup_step:10/20
warmup_step:11/20
warmup_step:12/20
warmup_step:13/20
warmup_step:14/20
warmup_step:15/20
warmup_step:16/20
warmup_step:17/20
warmup_step:18/20
warmup_step:19/20
warmup_step:20/20
step:0/20000 val_loss:6.9357 val_bpb:4.1077 train_time:0ms step_avg:0.02ms
step:1/20000 train_loss:6.9370 train_time:23ms step_avg:23.04ms
step:2/20000 train_loss:16.8366 train_time:61ms step_avg:30.56ms
step:3/20000 train_loss:8.7609 train_time:99ms step_avg:33.13ms
step:4/20000 train_loss:6.6385 train_time:138ms step_avg:34.45ms
step:5/20000 train_loss:6.6117 train_time:176ms step_avg:35.27ms
step:6/20000 train_loss:7.4219 train_time:215ms step_avg:35.85ms
step:7/20000 train_loss:6.3504 train_time:253ms step_avg:36.18ms
step:8/20000 train_loss:6.1580 train_time:292ms step_avg:36.48ms
step:9/20000 train_loss:6.0678 train_time:330ms step_avg:36.64ms
step:10/20000 train_loss:5.9744 train_time:368ms step_avg:36.85ms
step:200/20000 train_loss:2.8515 train_time:7777ms step_avg:38.89ms
step:400/20000 train_loss:2.3627 train_time:15582ms step_avg:38.96ms
step:600/20000 train_loss:2.5524 train_time:23394ms step_avg:38.99ms
step:800/20000 train_loss:2.2935 train_time:31212ms step_avg:39.01ms
step:1000/20000 train_loss:2.3722 train_time:39028ms step_avg:39.03ms
step:1000/20000 val_loss:2.3361 val_bpb:1.3836 train_time:39048ms step_avg:39.05ms
step:1200/20000 train_loss:2.3900 train_time:46872ms step_avg:39.06ms
step:1400/20000 train_loss:2.4339 train_time:54724ms step_avg:39.09ms
step:1600/20000 train_loss:2.1009 train_time:62595ms step_avg:39.12ms
step:1800/20000 train_loss:2.1977 train_time:70451ms step_avg:39.14ms
step:2000/20000 train_loss:2.2363 train_time:78336ms step_avg:39.17ms
step:2000/20000 val_loss:2.2366 val_bpb:1.3246 train_time:78357ms step_avg:39.18ms
step:2200/20000 train_loss:2.3429 train_time:86311ms step_avg:39.23ms
step:2400/20000 train_loss:2.3578 train_time:94258ms step_avg:39.27ms
step:2600/20000 train_loss:2.2132 train_time:102215ms step_avg:39.31ms
step:2800/20000 train_loss:2.1680 train_time:110180ms step_avg:39.35ms
step:3000/20000 train_loss:3.1765 train_time:118152ms step_avg:39.38ms
step:3000/20000 val_loss:2.1971 val_bpb:1.3013 train_time:118174ms step_avg:39.39ms
step:3200/20000 train_loss:2.2656 train_time:126118ms step_avg:39.41ms
step:3400/20000 train_loss:2.0951 train_time:134081ms step_avg:39.44ms
step:3600/20000 train_loss:2.2049 train_time:142023ms step_avg:39.45ms
step:3800/20000 train_loss:2.1506 train_time:149958ms step_avg:39.46ms
step:4000/20000 train_loss:2.2749 train_time:157880ms step_avg:39.47ms
step:4000/20000 val_loss:2.1707 val_bpb:1.2856 train_time:157903ms step_avg:39.48ms
step:4200/20000 train_loss:2.2180 train_time:165885ms step_avg:39.50ms
step:4400/20000 train_loss:2.1667 train_time:173809ms step_avg:39.50ms
step:4600/20000 train_loss:2.2021 train_time:181738ms step_avg:39.51ms
step:4800/20000 train_loss:2.1372 train_time:189681ms step_avg:39.52ms
step:5000/20000 train_loss:2.2252 train_time:197633ms step_avg:39.53ms
step:5000/20000 val_loss:2.1573 val_bpb:1.2777 train_time:197655ms step_avg:39.53ms
step:5200/20000 train_loss:2.2864 train_time:205591ms step_avg:39.54ms
step:5400/20000 train_loss:2.2367 train_time:213551ms step_avg:39.55ms
step:5600/20000 train_loss:2.1411 train_time:221477ms step_avg:39.55ms
step:5800/20000 train_loss:2.1883 train_time:229389ms step_avg:39.55ms
step:6000/20000 train_loss:2.1000 train_time:237304ms step_avg:39.55ms
step:6000/20000 val_loss:2.1444 val_bpb:1.2700 train_time:237325ms step_avg:39.55ms
step:6200/20000 train_loss:2.0895 train_time:245233ms step_avg:39.55ms
step:6400/20000 train_loss:1.8691 train_time:253178ms step_avg:39.56ms
step:6600/20000 train_loss:2.0886 train_time:261139ms step_avg:39.57ms
step:6800/20000 train_loss:2.1338 train_time:269114ms step_avg:39.58ms
step:7000/20000 train_loss:2.0947 train_time:277078ms step_avg:39.58ms
step:7000/20000 val_loss:2.1353 val_bpb:1.2647 train_time:277100ms step_avg:39.59ms
step:7200/20000 train_loss:1.9731 train_time:285003ms step_avg:39.58ms
step:7400/20000 train_loss:1.9078 train_time:292928ms step_avg:39.58ms
step:7600/20000 train_loss:2.1631 train_time:300826ms step_avg:39.58ms
step:7800/20000 train_loss:2.1269 train_time:308754ms step_avg:39.58ms
step:8000/20000 train_loss:2.0576 train_time:316707ms step_avg:39.59ms
step:8000/20000 val_loss:2.1290 val_bpb:1.2609 train_time:316730ms step_avg:39.59ms
step:8200/20000 train_loss:2.1947 train_time:324663ms step_avg:39.59ms
step:8400/20000 train_loss:2.1891 train_time:332686ms step_avg:39.61ms
step:8600/20000 train_loss:2.2069 train_time:340601ms step_avg:39.60ms
step:8800/20000 train_loss:2.0772 train_time:348567ms step_avg:39.61ms
step:9000/20000 train_loss:2.1063 train_time:356480ms step_avg:39.61ms
step:9000/20000 val_loss:2.1227 val_bpb:1.2572 train_time:356501ms step_avg:39.61ms
step:9200/20000 train_loss:2.1819 train_time:364422ms step_avg:39.61ms
step:9400/20000 train_loss:2.0182 train_time:372339ms step_avg:39.61ms
step:9600/20000 train_loss:2.0148 train_time:380276ms step_avg:39.61ms
step:9800/20000 train_loss:2.0697 train_time:388209ms step_avg:39.61ms
step:10000/20000 train_loss:2.0241 train_time:396138ms step_avg:39.61ms
step:10000/20000 val_loss:2.1184 val_bpb:1.2546 train_time:396161ms step_avg:39.62ms
step:10200/20000 train_loss:2.1331 train_time:404089ms step_avg:39.62ms
step:10400/20000 train_loss:2.1017 train_time:412025ms step_avg:39.62ms
step:10600/20000 train_loss:2.0756 train_time:419950ms step_avg:39.62ms
step:10800/20000 train_loss:2.1248 train_time:427887ms step_avg:39.62ms
step:11000/20000 train_loss:2.1002 train_time:435814ms step_avg:39.62ms
step:11000/20000 val_loss:2.1139 val_bpb:1.2520 train_time:435837ms step_avg:39.62ms
step:11200/20000 train_loss:2.1328 train_time:443762ms step_avg:39.62ms
step:11400/20000 train_loss:2.2185 train_time:451695ms step_avg:39.62ms
step:11600/20000 train_loss:2.0862 train_time:459659ms step_avg:39.63ms
step:11800/20000 train_loss:2.1698 train_time:467605ms step_avg:39.63ms
step:12000/20000 train_loss:2.0754 train_time:475556ms step_avg:39.63ms
step:12000/20000 val_loss:2.1082 val_bpb:1.2486 train_time:475578ms step_avg:39.63ms
step:12200/20000 train_loss:2.2423 train_time:483522ms step_avg:39.63ms
step:12400/20000 train_loss:2.0525 train_time:491519ms step_avg:39.64ms
step:12600/20000 train_loss:2.0900 train_time:499453ms step_avg:39.64ms
step:12800/20000 train_loss:2.0601 train_time:507372ms step_avg:39.64ms
step:13000/20000 train_loss:1.9239 train_time:515328ms step_avg:39.64ms
step:13000/20000 val_loss:2.1079 val_bpb:1.2484 train_time:515341ms step_avg:39.64ms
step:13200/20000 train_loss:2.0757 train_time:523268ms step_avg:39.64ms
step:13400/20000 train_loss:2.1089 train_time:531220ms step_avg:39.64ms
step:13600/20000 train_loss:2.0693 train_time:539151ms step_avg:39.64ms
step:13800/20000 train_loss:2.0852 train_time:547103ms step_avg:39.65ms
step:14000/20000 train_loss:1.9272 train_time:555051ms step_avg:39.65ms
step:14000/20000 val_loss:2.1021 val_bpb:1.2450 train_time:555073ms step_avg:39.65ms
step:14200/20000 train_loss:1.9886 train_time:562996ms step_avg:39.65ms
step:14400/20000 train_loss:1.9930 train_time:570920ms step_avg:39.65ms
step:14600/20000 train_loss:1.9616 train_time:578860ms step_avg:39.65ms
step:14800/20000 train_loss:2.0628 train_time:586789ms step_avg:39.65ms
step:15000/20000 train_loss:2.1184 train_time:594735ms step_avg:39.65ms
step:15000/20000 val_loss:2.0662 val_bpb:1.2237 train_time:594757ms step_avg:39.65ms
step:15133/20000 val_loss:2.0631 val_bpb:1.2219 train_time:600032ms step_avg:39.65ms
stopping_early: wallclock_cap train_time:600032ms step:15133/20000
peak memory allocated: 10184 MiB reserved: 10478 MiB
Serialized model: 67224983 bytes
Code size: 47686 bytes
Total submission size: 67272669 bytes
Serialized model int8+zlib: 15820150 bytes (payload:17178912 raw_torch:17224025 payload_ratio:3.91x)
Total submission size int8+zlib: 15867836 bytes
final_int8_zlib_roundtrip val_loss:2.0765 val_bpb:1.2298 eval_time:1235ms
final_int8_zlib_roundtrip_exact val_loss:2.07651564 val_bpb:1.22982953
