W0320 23:44:38.019000 399 torch/distributed/run.py:852]
W0320 23:44:38.019000 399 torch/distributed/run.py:852] *****************************************
W0320 23:44:38.019000 399 torch/distributed/run.py:852] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0320 23:44:38.019000 399 torch/distributed/run.py:852] *****************************************
logs/09cd52fd-2ccf-4dbc-9c86-78bfce55cc3e.txt
val_bpb:enabled tokenizer_kind=sentencepiece tokenizer_path=./data/tokenizers/fineweb_1024_bpe.model
train_loader:dataset:fineweb10B_sp1024 train_shards:10
val_loader:shards pattern=./data/datasets/fineweb10B_sp1024/fineweb_val_*.bin tokens:62021632
model_params:17059912
world_size:8 grad_accum_steps:1
sdp_backends:cudnn=False flash=True mem_efficient=False math=False
attention_mode:gqa num_heads:8 num_kv_heads:4
tie_embeddings:True embed_lr:0.05 head_lr:0.0 matrix_lr:0.04 scalar_lr:0.04
train_batch_tokens:524288 train_seq_len:1024 iterations:20000 warmup_steps:20 max_wallclock_seconds:600.000
seed:1337
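The batch configuration above is internally consistent: `train_batch_tokens` factors into the logged `world_size`, `grad_accum_steps`, and `train_seq_len`, which implies a per-device micro-batch that is not itself printed. A minimal sketch of that arithmetic, with the per-device batch size back-solved rather than read from the code:

```python
# Back-solve the per-device micro-batch from the logged run config.
# train_batch_tokens = world_size * grad_accum_steps * per_device_batch * seq_len
# The per-device batch (64) is inferred from these numbers, not logged directly.
world_size = 8
grad_accum_steps = 1
train_seq_len = 1024
train_batch_tokens = 524288

per_device_batch = train_batch_tokens // (world_size * grad_accum_steps * train_seq_len)
print(per_device_batch)  # 64
assert per_device_batch * world_size * grad_accum_steps * train_seq_len == train_batch_tokens
```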
warmup_step:1/20
warmup_step:2/20
warmup_step:3/20
warmup_step:4/20
warmup_step:5/20
warmup_step:6/20
warmup_step:7/20
warmup_step:8/20
warmup_step:9/20
warmup_step:10/20
warmup_step:11/20
warmup_step:12/20
warmup_step:13/20
warmup_step:14/20
warmup_step:15/20
warmup_step:16/20
warmup_step:17/20
warmup_step:18/20
warmup_step:19/20
warmup_step:20/20
step:0/20000 val_loss:6.9357 val_bpb:4.1077 train_time:0ms step_avg:0.02ms
step:1/20000 train_loss:6.9370 train_time:43ms step_avg:43.22ms
step:2/20000 train_loss:16.8373 train_time:111ms step_avg:55.52ms
step:3/20000 train_loss:8.7589 train_time:195ms step_avg:65.16ms
step:4/20000 train_loss:6.6379 train_time:280ms step_avg:70.04ms
step:5/20000 train_loss:6.6127 train_time:365ms step_avg:72.93ms
step:6/20000 train_loss:7.4194 train_time:449ms step_avg:74.85ms
step:7/20000 train_loss:6.3501 train_time:534ms step_avg:76.22ms
step:8/20000 train_loss:6.1582 train_time:618ms step_avg:77.23ms
step:9/20000 train_loss:6.0680 train_time:702ms step_avg:78.01ms
step:10/20000 train_loss:5.9742 train_time:786ms step_avg:78.63ms
step:200/20000 train_loss:2.8471 train_time:16997ms step_avg:84.99ms
step:400/20000 train_loss:2.3598 train_time:34076ms step_avg:85.19ms
step:600/20000 train_loss:2.5472 train_time:51250ms step_avg:85.42ms
step:800/20000 train_loss:2.2967 train_time:68504ms step_avg:85.63ms
step:1000/20000 train_loss:2.3740 train_time:86101ms step_avg:86.10ms
step:1000/20000 val_loss:2.3353 val_bpb:1.3831 train_time:86157ms step_avg:86.16ms
step:1200/20000 train_loss:2.3870 train_time:103504ms step_avg:86.25ms
step:1400/20000 train_loss:2.4340 train_time:120853ms step_avg:86.32ms
step:1600/20000 train_loss:2.1002 train_time:138339ms step_avg:86.46ms
step:1800/20000 train_loss:2.2006 train_time:155716ms step_avg:86.51ms
step:2000/20000 train_loss:2.2334 train_time:173092ms step_avg:86.55ms
step:2000/20000 val_loss:2.2366 val_bpb:1.3246 train_time:173148ms step_avg:86.57ms
step:2200/20000 train_loss:2.3407 train_time:190446ms step_avg:86.57ms
step:2400/20000 train_loss:2.3557 train_time:207873ms step_avg:86.61ms
step:2600/20000 train_loss:2.2090 train_time:225224ms step_avg:86.62ms
step:2800/20000 train_loss:2.1677 train_time:242644ms step_avg:86.66ms
step:3000/20000 train_loss:3.1737 train_time:260008ms step_avg:86.67ms
step:3000/20000 val_loss:2.1967 val_bpb:1.3010 train_time:260066ms step_avg:86.69ms
step:3200/20000 train_loss:2.2643 train_time:277275ms step_avg:86.65ms
step:3400/20000 train_loss:2.0970 train_time:294627ms step_avg:86.65ms
step:3600/20000 train_loss:2.2103 train_time:312035ms step_avg:86.68ms
step:3800/20000 train_loss:2.1482 train_time:329402ms step_avg:86.68ms
step:4000/20000 train_loss:2.2680 train_time:346811ms step_avg:86.70ms
step:4000/20000 val_loss:2.1703 val_bpb:1.2854 train_time:346868ms step_avg:86.72ms
step:4200/20000 train_loss:2.2184 train_time:364292ms step_avg:86.74ms
step:4400/20000 train_loss:2.1702 train_time:381637ms step_avg:86.74ms
step:4600/20000 train_loss:2.1995 train_time:398977ms step_avg:86.73ms
step:4800/20000 train_loss:2.1370 train_time:416407ms step_avg:86.75ms
step:5000/20000 train_loss:2.2227 train_time:433748ms step_avg:86.75ms
step:5000/20000 val_loss:2.1569 val_bpb:1.2774 train_time:433804ms step_avg:86.76ms
step:5200/20000 train_loss:2.2872 train_time:451117ms step_avg:86.75ms
step:5400/20000 train_loss:2.2349 train_time:468514ms step_avg:86.76ms
step:5600/20000 train_loss:2.1406 train_time:485858ms step_avg:86.76ms
step:5800/20000 train_loss:2.1819 train_time:503216ms step_avg:86.76ms
step:6000/20000 train_loss:2.0902 train_time:520652ms step_avg:86.78ms
step:6000/20000 val_loss:2.1336 val_bpb:1.2637 train_time:520708ms step_avg:86.78ms
step:6200/20000 train_loss:2.0713 train_time:538046ms step_avg:86.78ms
step:6400/20000 train_loss:1.8496 train_time:555393ms step_avg:86.78ms
step:6600/20000 train_loss:2.0520 train_time:572799ms step_avg:86.79ms
step:6800/20000 train_loss:2.0919 train_time:590142ms step_avg:86.79ms
step:6913/20000 val_loss:2.0915 val_bpb:1.2387 train_time:600059ms step_avg:86.80ms
stopping_early: wallclock_cap train_time:600059ms step:6913/20000
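The paired `val_loss`/`val_bpb` values above follow the usual bits-per-byte conversion: nats per token divided by ln 2, scaled by the validation set's tokens-to-bytes ratio. A sketch of that relationship, with the ratio back-solved from the logged pairs rather than taken from the training code:

```python
import math

# Sketch of the val_loss -> val_bpb conversion implied by the log:
#   bpb = (nats_per_token / ln 2) * (tokens / bytes)
# The tokens/bytes ratio is back-solved from the step-1000 pair
# (val_loss 2.3353, val_bpb 1.3831), so treat it as an estimate.
def loss_to_bpb(val_loss, tokens_per_byte):
    return val_loss / math.log(2) * tokens_per_byte

tokens_per_byte = 1.3831 / (2.3353 / math.log(2))  # ~0.4105 tokens per byte

# Cross-check against an independent logged pair (step 3000: 2.1967 -> 1.3010).
assert abs(loss_to_bpb(2.1967, tokens_per_byte) - 1.3010) < 2e-4
```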
peak memory allocated: 10135 MiB reserved: 10184 MiB
Serialized model: 67224983 bytes
Code size: 58509 bytes
Total submission size: 67283492 bytes
Serialized model int8+zlib: 15809041 bytes (payload:17178912 raw_torch:17224025 payload_ratio:3.91x)
Total submission size int8+zlib: 15867550 bytes
final_int8_zlib_roundtrip val_loss:2.1015 val_bpb:1.2446 eval_time:2846ms
final_int8_zlib_roundtrip_exact val_loss:2.10153730 val_bpb:1.24464876
final_int8_ttt_lora val_loss:2.0430 val_bpb:1.2100 eval_time:103552ms
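The `int8+zlib` lines above report a quantize-compress round-trip of the weights (with only a small val_loss penalty: 2.1015 vs. 2.0915). A minimal sketch of such a round-trip, assuming symmetric per-tensor int8 scaling; the actual quantization scheme used for this submission is not shown in the log:

```python
import zlib
import numpy as np

# Minimal int8+zlib weight round-trip sketch. Symmetric per-tensor scaling
# is an assumption; the real submission's scheme may differ (e.g. per-channel).
def quantize_compress(w: np.ndarray):
    scale = float(np.abs(w).max()) / 127.0 or 1.0  # avoid zero scale
    q = np.round(w / scale).astype(np.int8)
    return zlib.compress(q.tobytes()), scale

def decompress_dequantize(blob: bytes, scale: float, shape):
    q = np.frombuffer(zlib.decompress(blob), dtype=np.int8).reshape(shape)
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
blob, scale = quantize_compress(w)
w_hat = decompress_dequantize(blob, scale, w.shape)

# Round-trip error is bounded by half a quantization step.
assert float(np.abs(w - w_hat).max()) <= scale / 2 + 1e-6
```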
