['./train_muon_ki.py', '-traindir', '/root/katago_project/data/train/b32c512h16tfrs', '-datadir', '/root/katago_project/data/shuffleddata/current', '-exportdir', '/root/katago_project/data/torchmodels_toexport_extra', '-exportprefix', 'b32c512h16tfrs', '-max-epochs-this-instance', '100', '-pos-len', '19', '-samples-per-epoch', '5000000', '-lr-scale', '1.0', '-swa-scales', '32.0', '-use-fp16', '-symmetry-type', 'xyt', '-batch-size', '256', '-model-kind', 'b32c512h16tfrs-bng-silu', '-swa-period-samples', '500000', '-max-val-samples', '20000', '-enable-history-matrices', '-multi-gpus', '0,1,2,3,4,5,6,7', '-gnorm-clip-scale', '1.0', '-wd-scale', '1.0', '-export-prob', '0.003', '-master-port', '23456', '-lr-scale-auto-type', 'custom']
Running torch.distributed.init_process_group
Returned from torch.distributed.init_process_group, my rank = 0, world_size=8
Using GPU device: Tesla V100-SXM2-32GB
Seeding torch with 51081365751662798
No preexisting checkpoint found at: /root/katago_project/data/train/b32c512h16tfrs/checkpoint.ckpt
Initializing new model!
['./train_muon_ki.py', '-traindir', '/root/katago_project/data/train/b32c512h16tfrs', '-datadir', '/root/katago_project/data/shuffleddata/current', '-exportdir', '/root/katago_project/data/torchmodels_toexport_extra', '-exportprefix', 'b32c512h16tfrs', '-max-epochs-this-instance', '100', '-pos-len', '19', '-samples-per-epoch', '5000000', '-lr-scale', '1.0', '-swa-scales', '32.0', '-use-fp16', '-symmetry-type', 'xyt', '-batch-size', '256', '-model-kind', 'b32c512h16tfrs-bng-silu', '-swa-period-samples', '500000', '-max-val-samples', '20000', '-enable-history-matrices', '-multi-gpus', '0,1,2,3,4,5,6,7', '-gnorm-clip-scale', '1.0', '-wd-scale', '1.0', '-export-prob', '0.003', '-master-port', '23456', '-lr-scale-auto-type', 'custom']
Running torch.distributed.init_process_group
Returned from torch.distributed.init_process_group, my rank = 0, world_size=8
Using GPU device: Tesla V100-SXM2-32GB
Seeding torch with 22393081809424076
No preexisting checkpoint found at: /root/katago_project/data/train/b32c512h16tfrs/checkpoint.ckpt
Initializing new model!
{'version': 15, 'norm_kind': 'bnorm', 'bnorm_epsilon': 0.0001, 'bnorm_running_avg_momentum': 0.001, 'initial_conv_1x1': False, 'trunk_num_channels': 512, 'mid_num_channels': 512, 'gpool_num_channels': 64, 'transformer_ffn_channels': 1536, 'transformer_heads': 16, 'transformer_kv_heads': 16, 'use_attention_pool': False, 'num_attention_pool_heads': 4, 'block_kind': [['rconv1', 'transformerropesg'], ['rconv2', 'transformerropesg'], ['rconv3', 'transformerropesg'], ['rconv4', 'transformerropesg'], ['rconv5', 'transformerropesg'], ['rconv6', 'transformerropesg'], ['rconv7', 'transformerropesg'], ['rconv8', 'transformerropesg'], ['rconv9', 'transformerropesg'], ['rconv10', 'transformerropesg'], ['rconv11', 'transformerropesg'], ['rconv12', 'transformerropesg'], ['rconv13', 'transformerropesg'], ['rconv14', 'transformerropesg'], ['rconv15', 'transformerropesg'], ['rconv16', 'transformerropesg'], ['rconv17', 'transformerropesg'], ['rconv18', 'transformerropesg'], ['rconv19', 'transformerropesg'], ['rconv20', 'transformerropesg'], ['rconv21', 'transformerropesg'], ['rconv22', 'transformerropesg'], ['rconv23', 'transformerropesg'], ['rconv24', 'transformerropesg'], ['rconv25', 'transformerropesg'], ['rconv26', 'transformerropesg'], ['rconv27', 'transformerropesg'], ['rconv28', 'transformerropesg'], ['rconv29', 'transformerropesg'], ['rconv30', 'transformerropesg'], ['rconv31', 'transformerropesg'], ['rconv32', 'transformerropesg']], 'p1_num_channels': 64, 'g1_num_channels': 64, 'v1_num_channels': 128, 'sbv2_num_channels': 128, 'num_scorebeliefs': 8, 'v2_size': 144, 'bnorm_use_gamma': True, 'activation': 'silu'}
Model norm normal baseline computed: 19628.654296875
swa_period_samples 500000.0
swa_scales [32.0]
lookahead_alpha None
lookahead_k None
soft_policy_weight_scale 8.0
disable_optimistic_policy False
meta_kata_only_soft_policy False
value_loss_scale 0.6
td_value_loss_scales [0.6, 0.6, 0.6]
seki_loss_scale 1.0
variance_time_loss_scale 1.0
main_loss_scale None
intermediate_loss_scale None
Parameters in model:
Total num params: 109468997
Total trainable params: 109468997
Training in FP16! Creating scaler
Shuffled data train.json file does not exist, there seems to be no shuffled data yet, waiting and trying again later: /root/katago_project/data/shuffleddata/current/train.json
Shuffled data train.json file does not exist, there seems to be no shuffled data yet, waiting and trying again later: /root/katago_project/data/shuffleddata/current/train.json
Shuffled data train.json file does not exist, there seems to be no shuffled data yet, waiting and trying again later: /root/katago_project/data/shuffleddata/current/train.json
Shuffled data train.json file does not exist, there seems to be no shuffled data yet, waiting and trying again later: /root/katago_project/data/shuffleddata/current/train.json
Shuffled data train.json file does not exist, there seems to be no shuffled data yet, waiting and trying again later: /root/katago_project/data/shuffleddata/current/train.json
Updated training data: /root/katago_project/data/shuffleddata/current
Train steps since last reload: 0 -> 0
Skipping 0/0 files in: /root/katago_project/data/shuffleddata/current/train as already used first pass
No new training files found in: /root/katago_project/data/shuffleddata/current/train, waiting 30s and trying again
GC collect
=========================================================================
BEGINNING NEXT EPOCH 0
=========================================================================
Current time: 2026-02-28 06:37:03.338473
Global step: 0 samples
Currently up to data row 95431473
Training dir: /root/katago_project/data/train/b32c512h16tfrs
Export dir: /root/katago_project/data/torchmodels_toexport_extra
Current grad scale: 65536.0
['./train_muon_ki.py', '-traindir', '/root/katago_project/data/train/b32c512h16tfrs', '-datadir', '/root/katago_project/data/shuffleddata/current', '-exportdir', '/root/katago_project/data/torchmodels_toexport_extra', '-exportprefix', 'b32c512h16tfrs', '-max-epochs-this-instance', '100', '-pos-len', '19', '-samples-per-epoch', '5000000', '-lr-scale', '1.0', '-swa-scales', '32.0', '-use-fp16', '-symmetry-type', 'xyt', '-batch-size', '64', '-model-kind', 'b32c512h16tfrs-bng-silu', '-swa-period-samples', '500000', '-max-val-samples', '20000', '-enable-history-matrices', '-multi-gpus', '0,1,2,3,4,5,6,7', '-gnorm-clip-scale', '1.0', '-wd-scale', '1.0', '-export-prob', '0.003', '-master-port', '23456', '-lr-scale-auto-type', 'custom']
Running torch.distributed.init_process_group
Returned from torch.distributed.init_process_group, my rank = 0, world_size=8
Using GPU device: Tesla V100-SXM2-32GB
Seeding torch with 57946386230100039
No preexisting checkpoint found at: /root/katago_project/data/train/b32c512h16tfrs/checkpoint.ckpt
Initializing new model!
{'version': 15, 'norm_kind': 'bnorm', 'bnorm_epsilon': 0.0001, 'bnorm_running_avg_momentum': 0.001, 'initial_conv_1x1': False, 'trunk_num_channels': 512, 'mid_num_channels': 512, 'gpool_num_channels': 64, 'transformer_ffn_channels': 1536, 'transformer_heads': 16, 'transformer_kv_heads': 16, 'use_attention_pool': False, 'num_attention_pool_heads': 4, 'block_kind': [['rconv1', 'transformerropesg'], ['rconv2', 'transformerropesg'], ['rconv3', 'transformerropesg'], ['rconv4', 'transformerropesg'], ['rconv5', 'transformerropesg'], ['rconv6', 'transformerropesg'], ['rconv7', 'transformerropesg'], ['rconv8', 'transformerropesg'], ['rconv9', 'transformerropesg'], ['rconv10', 'transformerropesg'], ['rconv11', 'transformerropesg'], ['rconv12', 'transformerropesg'], ['rconv13', 'transformerropesg'], ['rconv14', 'transformerropesg'], ['rconv15', 'transformerropesg'], ['rconv16', 'transformerropesg'], ['rconv17', 'transformerropesg'], ['rconv18', 'transformerropesg'], ['rconv19', 'transformerropesg'], ['rconv20', 'transformerropesg'], ['rconv21', 'transformerropesg'], ['rconv22', 'transformerropesg'], ['rconv23', 'transformerropesg'], ['rconv24', 'transformerropesg'], ['rconv25', 'transformerropesg'], ['rconv26', 'transformerropesg'], ['rconv27', 'transformerropesg'], ['rconv28', 'transformerropesg'], ['rconv29', 'transformerropesg'], ['rconv30', 'transformerropesg'], ['rconv31', 'transformerropesg'], ['rconv32', 'transformerropesg']], 'p1_num_channels': 64, 'g1_num_channels': 64, 'v1_num_channels': 128, 'sbv2_num_channels': 128, 'num_scorebeliefs': 8, 'v2_size': 144, 'bnorm_use_gamma': True, 'activation': 'silu'}
Model norm normal baseline computed: 19629.177734375
swa_period_samples 500000.0
swa_scales [32.0]
lookahead_alpha None
lookahead_k None
soft_policy_weight_scale 8.0
disable_optimistic_policy False
meta_kata_only_soft_policy False
value_loss_scale 0.6
td_value_loss_scales [0.6, 0.6, 0.6]
seki_loss_scale 1.0
variance_time_loss_scale 1.0
main_loss_scale None
intermediate_loss_scale None
Parameters in model:
Total num params: 109468997
Total trainable params: 109468997
Training in FP16! Creating scaler
Updated training data: /root/katago_project/data/shuffleddata/current
Train steps since last reload: 0 -> 0
Skipping 0/0 files in: /root/katago_project/data/shuffleddata/current/train as already used first pass
No new training files found in: /root/katago_project/data/shuffleddata/current/train, waiting 30s and trying again
GC collect
=========================================================================
BEGINNING NEXT EPOCH 0
=========================================================================
Current time: 2026-02-28 06:49:17.488797
Global step: 0 samples
Currently up to data row 95431473
Training dir: /root/katago_project/data/train/b32c512h16tfrs
Export dir: /root/katago_project/data/torchmodels_toexport_extra
Current grad scale: 65536.0
['./train_muon_ki.py', '-traindir', '/root/katago_project/data/train/b32c512h16tfrs', '-datadir', '/root/katago_project/data/shuffleddata/current', '-exportdir', '/root/katago_project/data/torchmodels_toexport_extra', '-exportprefix', 'b32c512h16tfrs', '-max-epochs-this-instance', '100', '-pos-len', '19', '-samples-per-epoch', '5000000', '-lr-scale', '1.0', '-swa-scales', '32.0', '-use-fp16', '-symmetry-type', 'xyt', '-batch-size', '64', '-model-kind', 'b32c512h16tfrs-bng-silu', '-swa-period-samples', '500000', '-max-val-samples', '20000', '-enable-history-matrices', '-multi-gpus', '0,1,2,3,4,5,6,7', '-gnorm-clip-scale', '1.0', '-wd-scale', '1.0', '-export-prob', '0.003', '-master-port', '23456', '-lr-scale-auto-type', 'custom']
Running torch.distributed.init_process_group
Returned from torch.distributed.init_process_group, my rank = 0, world_size=8
Using GPU device: Tesla V100-SXM2-32GB
Seeding torch with 56989061263673706
No preexisting checkpoint found at: /root/katago_project/data/train/b32c512h16tfrs/checkpoint.ckpt
Initializing new model!
{'version': 15, 'norm_kind': 'bnorm', 'bnorm_epsilon': 0.0001, 'bnorm_running_avg_momentum': 0.001, 'initial_conv_1x1': False, 'trunk_num_channels': 512, 'mid_num_channels': 512, 'gpool_num_channels': 64, 'transformer_ffn_channels': 1536, 'transformer_heads': 16, 'transformer_kv_heads': 16, 'use_attention_pool': False, 'num_attention_pool_heads': 4, 'block_kind': [['rconv1', 'transformerropesg'], ['rconv2', 'transformerropesg'], ['rconv3', 'transformerropesg'], ['rconv4', 'transformerropesg'], ['rconv5', 'transformerropesg'], ['rconv6', 'transformerropesg'], ['rconv7', 'transformerropesg'], ['rconv8', 'transformerropesg'], ['rconv9', 'transformerropesg'], ['rconv10', 'transformerropesg'], ['rconv11', 'transformerropesg'], ['rconv12', 'transformerropesg'], ['rconv13', 'transformerropesg'], ['rconv14', 'transformerropesg'], ['rconv15', 'transformerropesg'], ['rconv16', 'transformerropesg'], ['rconv17', 'transformerropesg'], ['rconv18', 'transformerropesg'], ['rconv19', 'transformerropesg'], ['rconv20', 'transformerropesg'], ['rconv21', 'transformerropesg'], ['rconv22', 'transformerropesg'], ['rconv23', 'transformerropesg'], ['rconv24', 'transformerropesg'], ['rconv25', 'transformerropesg'], ['rconv26', 'transformerropesg'], ['rconv27', 'transformerropesg'], ['rconv28', 'transformerropesg'], ['rconv29', 'transformerropesg'], ['rconv30', 'transformerropesg'], ['rconv31', 'transformerropesg'], ['rconv32', 'transformerropesg']], 'p1_num_channels': 64, 'g1_num_channels': 64, 'v1_num_channels': 128, 'sbv2_num_channels': 128, 'num_scorebeliefs': 8, 'v2_size': 144, 'bnorm_use_gamma': True, 'activation': 'silu'}
Model norm normal baseline computed: 19621.96484375
swa_period_samples 500000.0
swa_scales [32.0]
lookahead_alpha None
lookahead_k None
soft_policy_weight_scale 8.0
disable_optimistic_policy False
meta_kata_only_soft_policy False
value_loss_scale 0.6
td_value_loss_scales [0.6, 0.6, 0.6]
seki_loss_scale 1.0
variance_time_loss_scale 1.0
main_loss_scale None
intermediate_loss_scale None
Parameters in model:
Total num params: 109468997
Total trainable params: 109468997
Training in FP16! Creating scaler
Updated training data: /root/katago_project/data/shuffleddata/current
Train steps since last reload: 0 -> 0
Skipping 0/1362 files in: /root/katago_project/data/shuffleddata/current/train as already used first pass
GC collect
=========================================================================
BEGINNING NEXT EPOCH 0
=========================================================================
Current time: 2026-02-28 06:50:34.411008
Global step: 0 samples
Currently up to data row 95431473
Training dir: /root/katago_project/data/train/b32c512h16tfrs
Export dir: /root/katago_project/data/torchmodels_toexport_extra
Current grad scale: 65536.0
Beginning training subepoch!
Currently up to data row 95431473
['./train_muon_ki.py', '-traindir', '/root/katago_project/data/train/b32c512h16tfrs', '-datadir', '/root/katago_project/data/shuffleddata/current', '-exportdir', '/root/katago_project/data/torchmodels_toexport_extra', '-exportprefix', 'b32c512h16tfrs', '-max-epochs-this-instance', '100', '-pos-len', '19', '-samples-per-epoch', '5000000', '-lr-scale', '1.0', '-swa-scales', '32.0', '-use-fp16', '-symmetry-type', 'xyt', '-batch-size', '64', '-model-kind', 'b32c512h16tfrs-bng-silu', '-swa-period-samples', '500000', '-max-val-samples', '20000', '-enable-history-matrices', '-multi-gpus', '0,1,2,3,4,5,6,7', '-gnorm-clip-scale', '1.0', '-wd-scale', '1.0', '-export-prob', '0.003', '-master-port', '23456', '-lr-scale-auto-type', 'custom']
Running torch.distributed.init_process_group
Returned from torch.distributed.init_process_group, my rank = 0, world_size=8
Using GPU device: Tesla V100-SXM2-32GB
Seeding torch with 34786518695568134
No preexisting checkpoint found at: /root/katago_project/data/train/b32c512h16tfrs/checkpoint.ckpt
Initializing new model!
{'version': 15, 'norm_kind': 'bnorm', 'bnorm_epsilon': 0.0001, 'bnorm_running_avg_momentum': 0.001, 'initial_conv_1x1': False, 'trunk_num_channels': 512, 'mid_num_channels': 512, 'gpool_num_channels': 64, 'transformer_ffn_channels': 1536, 'transformer_heads': 16, 'transformer_kv_heads': 16, 'use_attention_pool': False, 'num_attention_pool_heads': 4, 'block_kind': [['rconv1', 'transformerropesg'], ['rconv2', 'transformerropesg'], ['rconv3', 'transformerropesg'], ['rconv4', 'transformerropesg'], ['rconv5', 'transformerropesg'], ['rconv6', 'transformerropesg'], ['rconv7', 'transformerropesg'], ['rconv8', 'transformerropesg'], ['rconv9', 'transformerropesg'], ['rconv10', 'transformerropesg'], ['rconv11', 'transformerropesg'], ['rconv12', 'transformerropesg'], ['rconv13', 'transformerropesg'], ['rconv14', 'transformerropesg'], ['rconv15', 'transformerropesg'], ['rconv16', 'transformerropesg'], ['rconv17', 'transformerropesg'], ['rconv18', 'transformerropesg'], ['rconv19', 'transformerropesg'], ['rconv20', 'transformerropesg'], ['rconv21', 'transformerropesg'], ['rconv22', 'transformerropesg'], ['rconv23', 'transformerropesg'], ['rconv24', 'transformerropesg'], ['rconv25', 'transformerropesg'], ['rconv26', 'transformerropesg'], ['rconv27', 'transformerropesg'], ['rconv28', 'transformerropesg'], ['rconv29', 'transformerropesg'], ['rconv30', 'transformerropesg'], ['rconv31', 'transformerropesg'], ['rconv32', 'transformerropesg']], 'p1_num_channels': 64, 'g1_num_channels': 64, 'v1_num_channels': 128, 'sbv2_num_channels': 128, 'num_scorebeliefs': 8, 'v2_size': 144, 'bnorm_use_gamma': True, 'activation': 'silu'}
Model norm normal baseline computed: 19628.322265625
swa_period_samples 500000.0
swa_scales [32.0]
lookahead_alpha None
lookahead_k None
soft_policy_weight_scale 8.0
disable_optimistic_policy False
meta_kata_only_soft_policy False
value_loss_scale 0.6
td_value_loss_scales [0.6, 0.6, 0.6]
seki_loss_scale 1.0
variance_time_loss_scale 1.0
main_loss_scale None
intermediate_loss_scale None
Parameters in model:
Total num params: 109468997
Total trainable params: 109468997
Training in FP16! Creating scaler
Updated training data: /root/katago_project/data/shuffleddata/current
Train steps since last reload: 0 -> 0
Skipping 0/1362 files in: /root/katago_project/data/shuffleddata/current/train as already used first pass
GC collect
=========================================================================
BEGINNING NEXT EPOCH 0
=========================================================================
Current time: 2026-02-28 06:53:52.889169
Global step: 0 samples
Currently up to data row 95431473
Training dir: /root/katago_project/data/train/b32c512h16tfrs
Export dir: /root/katago_project/data/torchmodels_toexport_extra
Current grad scale: 65536.0
Beginning training subepoch!
Currently up to data row 95431473
b32c512h16tfrs: nsamp=51200, time=113.37, p0loss=5.5369, vloss=1.0932, pslr=2.263e-05,wdtc=5.000e+07, norm=1.96178e+04, attn_norm=1.09163e+04
b32c512h16tfrs: nsamp=102400, time=91.64, p0loss=4.4847, vloss=1.0355, pslr=2.263e-05,wdtc=5.000e+07, norm=1.96048e+04, attn_norm=1.08863e+04
b32c512h16tfrs: nsamp=153600, time=91.61, p0loss=3.9729, vloss=0.9973, pslr=2.263e-05,wdtc=5.000e+07, norm=1.96030e+04, attn_norm=1.08574e+04
b32c512h16tfrs: nsamp=204800, time=90.70, p0loss=3.6722, vloss=0.9789, pslr=2.263e-05,wdtc=5.000e+07, norm=1.96097e+04, attn_norm=1.08386e+04
b32c512h16tfrs: nsamp=256000, time=90.65, p0loss=3.4594, vloss=0.9675, pslr=2.263e-05,wdtc=5.000e+07, norm=1.96204e+04, attn_norm=1.08226e+04
b32c512h16tfrs: nsamp=307200, time=91.31, p0loss=3.3222, vloss=0.9581, pslr=3.232e-05,wdtc=3.501e+07, norm=1.96488e+04, attn_norm=1.08071e+04
b32c512h16tfrs: nsamp=358400, time=91.80, p0loss=3.2063, vloss=0.9466, pslr=3.232e-05,wdtc=3.500e+07, norm=1.97039e+04, attn_norm=1.07936e+04
b32c512h16tfrs: nsamp=409600, time=91.76, p0loss=3.1219, vloss=0.9361, pslr=3.232e-05,wdtc=3.500e+07, norm=1.97603e+04, attn_norm=1.07826e+04
b32c512h16tfrs: nsamp=460800, time=91.59, p0loss=3.0317, vloss=0.9246, pslr=3.232e-05,wdtc=3.500e+07, norm=1.98192e+04, attn_norm=1.07736e+04
b32c512h16tfrs: nsamp=512000, time=91.08, p0loss=2.9660, vloss=0.9136, pslr=3.232e-05,wdtc=3.500e+07, norm=1.98809e+04, attn_norm=1.07668e+04
b32c512h16tfrs: nsamp=563200, time=90.89, p0loss=2.9458, vloss=0.9048, pslr=4.524e-05,wdtc=2.501e+07, norm=1.99730e+04, attn_norm=1.07633e+04
b32c512h16tfrs: nsamp=614400, time=91.06, p0loss=2.8948, vloss=0.8941, pslr=4.525e-05,wdtc=2.500e+07, norm=2.01101e+04, attn_norm=1.07658e+04
b32c512h16tfrs: nsamp=665600, time=90.95, p0loss=2.8557, vloss=0.8830, pslr=4.525e-05,wdtc=2.500e+07, norm=2.02471e+04, attn_norm=1.07706e+04
b32c512h16tfrs: nsamp=716800, time=90.82, p0loss=2.8101, vloss=0.8721, pslr=4.525e-05,wdtc=2.500e+07, norm=2.03834e+04, attn_norm=1.07777e+04
b32c512h16tfrs: nsamp=768000, time=91.13, p0loss=2.7732, vloss=0.8593, pslr=4.525e-05,wdtc=2.500e+07, norm=2.05183e+04, attn_norm=1.07850e+04
b32c512h16tfrs: nsamp=819200, time=91.49, p0loss=2.7559, vloss=0.8517, pslr=6.463e-05,wdtc=1.751e+07, norm=2.07121e+04, attn_norm=1.08010e+04
b32c512h16tfrs: nsamp=870400, time=90.95, p0loss=2.7356, vloss=0.8425, pslr=6.465e-05,wdtc=1.750e+07, norm=2.09960e+04, attn_norm=1.08320e+04
b32c512h16tfrs: nsamp=921600, time=91.40, p0loss=2.7243, vloss=0.8350, pslr=6.465e-05,wdtc=1.750e+07, norm=2.12783e+04, attn_norm=1.08649e+04
b32c512h16tfrs: nsamp=972800, time=91.85, p0loss=2.7038, vloss=0.8252, pslr=6.465e-05,wdtc=1.750e+07, norm=2.15610e+04, attn_norm=1.08999e+04
b32c512h16tfrs: nsamp=1024000, time=92.15, p0loss=2.7010, vloss=0.8183, pslr=6.465e-05,wdtc=1.750e+07, norm=2.18363e+04, attn_norm=1.09352e+04
b32c512h16tfrs: nsamp=1075200, time=91.02, p0loss=2.6985, vloss=0.8181, pslr=9.048e-05,wdtc=1.250e+07, norm=2.22096e+04, attn_norm=1.09886e+04
b32c512h16tfrs: nsamp=1126400, time=90.96, p0loss=2.6686, vloss=0.8091, pslr=9.051e-05,wdtc=1.250e+07, norm=2.27716e+04, attn_norm=1.10784e+04
b32c512h16tfrs: nsamp=1177600, time=91.88, p0loss=2.6281, vloss=0.8083, pslr=9.051e-05,wdtc=1.250e+07, norm=2.33297e+04, attn_norm=1.11689e+04
b32c512h16tfrs: nsamp=1228800, time=92.01, p0loss=2.6317, vloss=0.8048, pslr=9.051e-05,wdtc=1.250e+07, norm=2.38664e+04, attn_norm=1.12593e+04
b32c512h16tfrs: nsamp=1280000, time=92.11, p0loss=2.6507, vloss=0.8035, pslr=1.206e-04,wdtc=9.377e+06, norm=2.44809e+04, attn_norm=1.13651e+04
b32c512h16tfrs: nsamp=1331200, time=91.82, p0loss=2.6685, vloss=0.8061, pslr=1.508e-04,wdtc=7.502e+06, norm=2.59186e+04, attn_norm=1.16415e+04
b32c512h16tfrs: nsamp=1382400, time=91.30, p0loss=2.6571, vloss=0.8005, pslr=1.508e-04,wdtc=7.500e+06, norm=2.75201e+04, attn_norm=1.19590e+04
b32c512h16tfrs: nsamp=1433600, time=90.80, p0loss=2.6211, vloss=0.7938, pslr=1.508e-04,wdtc=7.500e+06, norm=2.90381e+04, attn_norm=1.22702e+04
b32c512h16tfrs: nsamp=1484800, time=91.32, p0loss=2.5808, vloss=0.7930, pslr=1.508e-04,wdtc=7.500e+06, norm=3.04796e+04, attn_norm=1.25738e+04
b32c512h16tfrs: nsamp=1536000, time=91.88, p0loss=2.5780, vloss=0.7939, pslr=1.885e-04,wdtc=6.001e+06, norm=3.20358e+04, attn_norm=1.29059e+04
b32c512h16tfrs: nsamp=1587200, time=91.77, p0loss=2.6118, vloss=0.7950, pslr=2.262e-04,wdtc=5.001e+06, norm=3.50002e+04, attn_norm=1.35618e+04
b32c512h16tfrs: nsamp=1638400, time=91.89, p0loss=2.5871, vloss=0.7922, pslr=2.263e-04,wdtc=5.000e+06, norm=3.82041e+04, attn_norm=1.42966e+04
b32c512h16tfrs: nsamp=1689600, time=91.98, p0loss=2.5680, vloss=0.7886, pslr=2.263e-04,wdtc=5.000e+06, norm=4.12383e+04, attn_norm=1.50174e+04
b32c512h16tfrs: nsamp=1740800, time=91.97, p0loss=2.5715, vloss=0.7835, pslr=2.263e-04,wdtc=5.000e+06, norm=4.41623e+04, attn_norm=1.57315e+04
b32c512h16tfrs: nsamp=1792000, time=91.82, p0loss=2.5519, vloss=0.7824, pslr=2.747e-04,wdtc=4.118e+06, norm=4.72550e+04, attn_norm=1.65076e+04
b32c512h16tfrs: nsamp=1843200, time=91.01, p0loss=2.5608, vloss=0.7860, pslr=3.232e-04,wdtc=3.501e+06, norm=5.26741e+04, attn_norm=1.78897e+04
b32c512h16tfrs: nsamp=1894400, time=90.58, p0loss=2.5654, vloss=0.7867, pslr=3.232e-04,wdtc=3.500e+06, norm=5.86777e+04, attn_norm=1.94630e+04
b32c512h16tfrs: nsamp=1945600, time=90.48, p0loss=2.5697, vloss=0.7769, pslr=3.232e-04,wdtc=3.500e+06, norm=6.43944e+04, attn_norm=2.10118e+04
b32c512h16tfrs: nsamp=1996800, time=90.60, p0loss=2.5571, vloss=0.7726, pslr=3.232e-04,wdtc=3.500e+06, norm=6.98839e+04, attn_norm=2.25559e+04
b32c512h16tfrs: nsamp=2048000, time=90.70, p0loss=2.5677, vloss=0.7755, pslr=3.878e-04,wdtc=2.917e+06, norm=7.56328e+04, attn_norm=2.42060e+04
b32c512h16tfrs: nsamp=2099200, time=90.66, p0loss=2.5789, vloss=0.7868, pslr=4.525e-04,wdtc=2.500e+06, norm=8.56802e+04, attn_norm=2.70664e+04
b32c512h16tfrs: nsamp=2150400, time=90.53, p0loss=2.5785, vloss=0.7856, pslr=4.525e-04,wdtc=2.500e+06, norm=9.68290e+04, attn_norm=3.03042e+04
b32c512h16tfrs: nsamp=2201600, time=90.61, p0loss=2.5611, vloss=0.7804, pslr=4.525e-04,wdtc=2.500e+06, norm=1.07435e+05, attn_norm=3.35049e+04
b32c512h16tfrs: nsamp=2252800, time=90.61, p0loss=2.5623, vloss=0.7758, pslr=4.525e-04,wdtc=2.500e+06, norm=1.17591e+05, attn_norm=3.66669e+04
b32c512h16tfrs: nsamp=2304000, time=90.64, p0loss=2.5389, vloss=0.7771, pslr=4.525e-04,wdtc=2.500e+06, norm=1.27308e+05, attn_norm=3.97935e+04
b32c512h16tfrs: nsamp=2355200, time=91.18, p0loss=2.5152, vloss=0.7717, pslr=4.525e-04,wdtc=2.500e+06, norm=1.36502e+05, attn_norm=4.28668e+04
b32c512h16tfrs: nsamp=2406400, time=90.67, p0loss=2.5115, vloss=0.7710, pslr=4.525e-04,wdtc=2.500e+06, norm=1.45397e+05, attn_norm=4.58870e+04
b32c512h16tfrs: nsamp=2457600, time=90.68, p0loss=2.5087, vloss=0.7736, pslr=4.525e-04,wdtc=2.500e+06, norm=1.53916e+05, attn_norm=4.88643e+04
b32c512h16tfrs: nsamp=2508800, time=90.67, p0loss=2.5113, vloss=0.7688, pslr=4.525e-04,wdtc=2.500e+06, norm=1.62071e+05, attn_norm=5.18200e+04
b32c512h16tfrs: nsamp=2560000, time=90.44, p0loss=2.5100, vloss=0.7660, pslr=4.525e-04,wdtc=2.500e+06, norm=1.69953e+05, attn_norm=5.47326e+04
b32c512h16tfrs: nsamp=2611200, time=90.58, p0loss=2.5131, vloss=0.7641, pslr=4.525e-04,wdtc=2.500e+06, norm=1.77527e+05, attn_norm=5.75879e+04
b32c512h16tfrs: nsamp=2662400, time=90.68, p0loss=2.4857, vloss=0.7644, pslr=4.525e-04,wdtc=2.500e+06, norm=1.84837e+05, attn_norm=6.04402e+04
b32c512h16tfrs: nsamp=2713600, time=90.57, p0loss=2.4858, vloss=0.7623, pslr=4.525e-04,wdtc=2.500e+06, norm=1.91720e+05, attn_norm=6.32198e+04
b32c512h16tfrs: nsamp=2764800, time=90.74, p0loss=2.4834, vloss=0.7621, pslr=4.525e-04,wdtc=2.500e+06, norm=1.98444e+05, attn_norm=6.59548e+04
b32c512h16tfrs: nsamp=2816000, time=91.23, p0loss=2.4735, vloss=0.7630, pslr=4.525e-04,wdtc=2.500e+06, norm=2.04899e+05, attn_norm=6.86626e+04
b32c512h16tfrs: nsamp=2867200, time=91.57, p0loss=2.4582, vloss=0.7604, pslr=4.525e-04,wdtc=2.500e+06, norm=2.11190e+05, attn_norm=7.13416e+04
b32c512h16tfrs: nsamp=2918400, time=91.52, p0loss=2.4555, vloss=0.7573, pslr=4.525e-04,wdtc=2.500e+06, norm=2.17156e+05, attn_norm=7.39704e+04
b32c512h16tfrs: nsamp=2969600, time=91.17, p0loss=2.4360, vloss=0.7543, pslr=4.525e-04,wdtc=2.500e+06, norm=2.22821e+05, attn_norm=7.65426e+04
b32c512h16tfrs: nsamp=3020800, time=91.33, p0loss=2.4435, vloss=0.7547, pslr=4.525e-04,wdtc=2.500e+06, norm=2.28446e+05, attn_norm=7.91176e+04
b32c512h16tfrs: nsamp=3072000, time=92.03, p0loss=2.4424, vloss=0.7510, pslr=4.525e-04,wdtc=2.500e+06, norm=2.33740e+05, attn_norm=8.16411e+04
b32c512h16tfrs: nsamp=3123200, time=90.89, p0loss=2.4246, vloss=0.7497, pslr=4.525e-04,wdtc=2.500e+06, norm=2.38869e+05, attn_norm=8.41119e+04
b32c512h16tfrs: nsamp=3174400, time=91.05, p0loss=2.4253, vloss=0.7525, pslr=4.525e-04,wdtc=2.500e+06, norm=2.43860e+05, attn_norm=8.65445e+04
b32c512h16tfrs: nsamp=3225600, time=91.53, p0loss=2.4171, vloss=0.7585, pslr=4.525e-04,wdtc=2.500e+06, norm=2.48615e+05, attn_norm=8.89572e+04
b32c512h16tfrs: nsamp=3276800, time=91.28, p0loss=2.4098, vloss=0.7503, pslr=4.525e-04,wdtc=2.500e+06, norm=2.53154e+05, attn_norm=9.13020e+04
b32c512h16tfrs: nsamp=3328000, time=91.23, p0loss=2.4132, vloss=0.7537, pslr=4.525e-04,wdtc=2.500e+06, norm=2.57643e+05, attn_norm=9.36336e+04
b32c512h16tfrs: nsamp=3379200, time=91.43, p0loss=2.4231, vloss=0.7473, pslr=4.525e-04,wdtc=2.500e+06, norm=2.61884e+05, attn_norm=9.59263e+04
b32c512h16tfrs: nsamp=3430400, time=91.40, p0loss=2.4171, vloss=0.7460, pslr=4.525e-04,wdtc=2.500e+06, norm=2.65949e+05, attn_norm=9.81728e+04
b32c512h16tfrs: nsamp=3481600, time=91.31, p0loss=2.4052, vloss=0.7516, pslr=4.525e-04,wdtc=2.500e+06, norm=2.69861e+05, attn_norm=1.00365e+05
b32c512h16tfrs: nsamp=3532800, time=91.13, p0loss=2.4211, vloss=0.7492, pslr=4.525e-04,wdtc=2.500e+06, norm=2.73701e+05, attn_norm=1.02555e+05
b32c512h16tfrs: nsamp=3584000, time=91.45, p0loss=2.3902, vloss=0.7454, pslr=4.525e-04,wdtc=2.500e+06, norm=2.77443e+05, attn_norm=1.04722e+05
b32c512h16tfrs: nsamp=3635200, time=91.38, p0loss=2.4014, vloss=0.7461, pslr=4.525e-04,wdtc=2.500e+06, norm=2.80893e+05, attn_norm=1.06808e+05
b32c512h16tfrs: nsamp=3686400, time=92.04, p0loss=2.3911, vloss=0.7510, pslr=4.525e-04,wdtc=2.500e+06, norm=2.84293e+05, attn_norm=1.08888e+05
b32c512h16tfrs: nsamp=3737600, time=91.42, p0loss=2.3736, vloss=0.7497, pslr=4.525e-04,wdtc=2.500e+06, norm=2.87568e+05, attn_norm=1.10915e+05
b32c512h16tfrs: nsamp=3788800, time=91.40, p0loss=2.3812, vloss=0.7490, pslr=4.525e-04,wdtc=2.500e+06, norm=2.90728e+05, attn_norm=1.12919e+05
b32c512h16tfrs: nsamp=3840000, time=91.06, p0loss=2.3916, vloss=0.7484, pslr=4.525e-04,wdtc=2.500e+06, norm=2.93928e+05, attn_norm=1.14913e+05
b32c512h16tfrs: nsamp=3891200, time=91.24, p0loss=2.3842, vloss=0.7462, pslr=4.525e-04,wdtc=2.500e+06, norm=2.96831e+05, attn_norm=1.16862e+05
b32c512h16tfrs: nsamp=3942400, time=91.56, p0loss=2.3776, vloss=0.7424, pslr=4.525e-04,wdtc=2.500e+06, norm=2.99776e+05, attn_norm=1.18801e+05
b32c512h16tfrs: nsamp=3993600, time=91.26, p0loss=2.3803, vloss=0.7489, pslr=4.525e-04,wdtc=2.500e+06, norm=3.02517e+05, attn_norm=1.20666e+05
b32c512h16tfrs: nsamp=4044800, time=91.17, p0loss=2.3812, vloss=0.7456, pslr=4.525e-04,wdtc=2.500e+06, norm=3.05267e+05, attn_norm=1.22521e+05
b32c512h16tfrs: nsamp=4096000, time=91.72, p0loss=2.4076, vloss=0.7499, pslr=4.525e-04,wdtc=2.500e+06, norm=3.07919e+05, attn_norm=1.24362e+05
b32c512h16tfrs: nsamp=4147200, time=91.81, p0loss=2.3986, vloss=0.7437, pslr=4.525e-04,wdtc=2.500e+06, norm=3.10430e+05, attn_norm=1.26162e+05
b32c512h16tfrs: nsamp=4198400, time=91.43, p0loss=2.3956, vloss=0.7445, pslr=4.525e-04,wdtc=2.500e+06, norm=3.12807e+05, attn_norm=1.27905e+05
b32c512h16tfrs: nsamp=4249600, time=91.79, p0loss=2.3706, vloss=0.7469, pslr=4.525e-04,wdtc=2.500e+06, norm=3.15117e+05, attn_norm=1.29626e+05
b32c512h16tfrs: nsamp=4300800, time=91.82, p0loss=2.3815, vloss=0.7425, pslr=4.525e-04,wdtc=2.500e+06, norm=3.17341e+05, attn_norm=1.31304e+05
b32c512h16tfrs: nsamp=4352000, time=90.77, p0loss=2.3873, vloss=0.7418, pslr=4.525e-04,wdtc=2.500e+06, norm=3.19529e+05, attn_norm=1.32973e+05
b32c512h16tfrs: nsamp=4403200, time=91.14, p0loss=2.3740, vloss=0.7437, pslr=4.525e-04,wdtc=2.500e+06, norm=3.21522e+05, attn_norm=1.34631e+05
b32c512h16tfrs: nsamp=4454400, time=90.56, p0loss=2.3728, vloss=0.7392, pslr=4.525e-04,wdtc=2.500e+06, norm=3.23391e+05, attn_norm=1.36235e+05
b32c512h16tfrs: nsamp=4505600, time=90.80, p0loss=2.3772, vloss=0.7386, pslr=4.525e-04,wdtc=2.500e+06, norm=3.25299e+05, attn_norm=1.37820e+05
b32c512h16tfrs: nsamp=4556800, time=90.76, p0loss=2.3567, vloss=0.7499, pslr=4.525e-04,wdtc=2.500e+06, norm=3.27148e+05, attn_norm=1.39374e+05
b32c512h16tfrs: nsamp=4608000, time=90.74, p0loss=2.3554, vloss=0.7442, pslr=4.525e-04,wdtc=2.500e+06, norm=3.28846e+05, attn_norm=1.40911e+05
b32c512h16tfrs: nsamp=4659200, time=90.59, p0loss=2.3574, vloss=0.7407, pslr=4.525e-04,wdtc=2.500e+06, norm=3.30581e+05, attn_norm=1.42428e+05
b32c512h16tfrs: nsamp=4710400, time=91.16, p0loss=2.3617, vloss=0.7428, pslr=4.525e-04,wdtc=2.500e+06, norm=3.32198e+05, attn_norm=1.43910e+05
b32c512h16tfrs: nsamp=4761600, time=91.65, p0loss=2.3705, vloss=0.7415, pslr=4.525e-04,wdtc=2.500e+06, norm=3.33756e+05, attn_norm=1.45374e+05
b32c512h16tfrs: nsamp=4812800, time=91.47, p0loss=2.3445, vloss=0.7452, pslr=4.525e-04,wdtc=2.500e+06, norm=3.35297e+05, attn_norm=1.46812e+05
b32c512h16tfrs: nsamp=4864000, time=91.43, p0loss=2.3564, vloss=0.7436, pslr=4.525e-04,wdtc=2.500e+06, norm=3.36905e+05, attn_norm=1.48223e+05
b32c512h16tfrs: nsamp=4915200, time=91.51, p0loss=2.3670, vloss=0.7439, pslr=4.525e-04,wdtc=2.500e+06, norm=3.38469e+05, attn_norm=1.49616e+05
Finished training subepoch!
Saving checkpoint: /root/katago_project/data/train/b32c512h16tfrs/checkpoint.ckpt
Beginning validation after epoch!
p0loss = 2.371327, p1loss = 0.403663, p0softloss = 5.244575, p1softloss = 0.704181, p0lopt = 0.000000, p0loptw = 0.000000, p0sopt = 0.613158, p0soptw = 0.267776, vloss = 0.811173, tdvloss1 = 0.365517, tdvloss2 = 0.303757, tdvloss3 = 0.274555, tdsloss = 0.020529, oloss = 0.477573, sloss = 0.386974, fploss = 0.183456, skloss = 0.100276, smloss = 0.057572, sbcdfloss = 0.147423, sbpdfloss = 0.075032, sdregloss = 0.001177, leadloss = 0.034795, vtimeloss = 0.057126, evstloss = 0.105734, esstloss = 0.038341, loss = 52.867196, pacc1 = 0.381552, vsquare = 0.234407, wsum = 160227.439941, nsamp = 160256.000000, ptentr = 0.924267, ptsoftentr = 4.970883, sekiweightscale = 6.998752, norm_normal_batch = 339940.406250, norm_normal_gamma_batch = 0.000000, norm_normal_attn_batch = 151028.843750, norm_output_batch = 616.270752, norm_noreg_batch = 16343.283203, norm_output_noreg_batch = 0.470863, nsamp_train = 4941824.000000, wsum_train = 4940961.678711
Validation took 206.4651083550416 seconds
Validating swa_scale=32.0
p0loss = 5.730498, p1loss = 0.750217, p0softloss = 5.755858, p1softloss = 0.752721, p0lopt = 0.000000, p0loptw = 0.000000, p0sopt = 1.660202, p0soptw = 0.288589, vloss = 0.927027, tdvloss1 = 0.541502, tdvloss2 = 0.542350, tdvloss3 = 0.532433, tdsloss = 0.035438, oloss = 0.902265, sloss = 0.806033, fploss = 2.026373, skloss = 2.162421, smloss = 0.057721, sbcdfloss = 1.087254, sbpdfloss = 0.114362, sdregloss = 1.857879, leadloss = 0.050666, vtimeloss = 0.072517, evstloss = 0.155741, esstloss = 0.095019, loss = 69.026912, pacc1 = 0.116464, vsquare = 0.002068, wsum = 160227.439941, nsamp = 160256.000000, ptentr = 0.924267, ptsoftentr = 4.970883, sekiweightscale = 6.998752, norm_normal_batch = 21441.884766, norm_normal_gamma_batch = 0.000000, norm_normal_attn_batch = 11459.857422, norm_output_batch = 608.438660, norm_noreg_batch = 16380.805664, norm_output_noreg_batch = 0.370966, nsamp_train = 4941824.000000, wsum_train = 4940961.678711
Validation swa took 202.26599425077438 seconds
Export cycle counter = 1
Skipping export model this time
GC collect
=========================================================================
BEGINNING NEXT EPOCH 1
=========================================================================
Current time: 2026-02-28 09:27:52.167718
Global step: 4941824 samples
Currently up to data row 95431473
Training dir: /root/katago_project/data/train/b32c512h16tfrs
Export dir: /root/katago_project/data/torchmodels_toexport_extra
Current grad scale: 128.0
Beginning training subepoch!
Currently up to data row 95431473
b32c512h16tfrs: nsamp=4993024, time=103.58, p0loss=2.3521, vloss=0.7435, pslr=4.525e-04,wdtc=2.500e+06, norm=3.40253e+05, attn_norm=1.51341e+05
b32c512h16tfrs: nsamp=5044224, time=91.48, p0loss=2.3640, vloss=0.7370, pslr=4.525e-04,wdtc=2.500e+06, norm=3.41984e+05, attn_norm=1.53026e+05
b32c512h16tfrs: nsamp=5095424, time=91.44, p0loss=2.3512, vloss=0.7390, pslr=4.525e-04,wdtc=2.500e+06, norm=3.43257e+05, attn_norm=1.54360e+05
b32c512h16tfrs: nsamp=5146624, time=91.24, p0loss=2.3540, vloss=0.7353, pslr=4.525e-04,wdtc=2.500e+06, norm=3.44400e+05, attn_norm=1.55660e+05
b32c512h16tfrs: nsamp=5197824, time=91.66, p0loss=2.3472, vloss=0.7432, pslr=4.525e-04,wdtc=2.500e+06, norm=3.45617e+05, attn_norm=1.56947e+05
b32c512h16tfrs: nsamp=5249024, time=91.72, p0loss=2.3612, vloss=0.7408, pslr=4.525e-04,wdtc=2.500e+06, norm=3.46792e+05, attn_norm=1.58206e+05
b32c512h16tfrs: nsamp=5300224, time=91.75, p0loss=2.3537, vloss=0.7432, pslr=4.525e-04,wdtc=2.500e+06, norm=3.47859e+05, attn_norm=1.59430e+05
b32c512h16tfrs: nsamp=5351424, time=91.58, p0loss=2.3499, vloss=0.7420, pslr=4.525e-04,wdtc=2.500e+06, norm=3.49074e+05, attn_norm=1.60676e+05
b32c512h16tfrs: nsamp=5402624, time=91.72, p0loss=2.3563, vloss=0.7357, pslr=4.525e-04,wdtc=2.500e+06, norm=3.50257e+05, attn_norm=1.61865e+05
b32c512h16tfrs: nsamp=5453824, time=91.09, p0loss=2.3423, vloss=0.7371, pslr=4.525e-04,wdtc=2.500e+06, norm=3.51375e+05, attn_norm=1.63063e+05
b32c512h16tfrs: nsamp=5505024, time=91.75, p0loss=2.3328, vloss=0.7354, pslr=4.525e-04,wdtc=2.500e+06, norm=3.52423e+05, attn_norm=1.64218e+05
b32c512h16tfrs: nsamp=5556224, time=91.57, p0loss=2.3496, vloss=0.7343, pslr=4.525e-04,wdtc=2.500e+06, norm=3.53498e+05, attn_norm=1.65382e+05
b32c512h16tfrs: nsamp=5607424, time=90.92, p0loss=2.3427, vloss=0.7330, pslr=4.525e-04,wdtc=2.500e+06, norm=3.54540e+05, attn_norm=1.66523e+05
b32c512h16tfrs: nsamp=5658624, time=91.10, p0loss=2.3543, vloss=0.7390, pslr=4.525e-04,wdtc=2.500e+06, norm=3.55501e+05, attn_norm=1.67655e+05
b32c512h16tfrs: nsamp=5709824, time=91.65, p0loss=2.3508, vloss=0.7413, pslr=4.525e-04,wdtc=2.500e+06, norm=3.56354e+05, attn_norm=1.68762e+05
b32c512h16tfrs: nsamp=5761024, time=91.70, p0loss=2.3488, vloss=0.7359, pslr=4.525e-04,wdtc=2.500e+06, norm=3.57222e+05, attn_norm=1.69840e+05
b32c512h16tfrs: nsamp=5812224, time=91.06, p0loss=2.3498, vloss=0.7383, pslr=4.525e-04,wdtc=2.500e+06, norm=3.58105e+05, attn_norm=1.70925e+05
b32c512h16tfrs: nsamp=5863424, time=90.67, p0loss=2.3511, vloss=0.7355, pslr=4.525e-04,wdtc=2.500e+06, norm=3.58908e+05, attn_norm=1.71966e+05
b32c512h16tfrs: nsamp=5914624, time=90.49, p0loss=2.3419, vloss=0.7363, pslr=4.525e-04,wdtc=2.500e+06, norm=3.59657e+05, attn_norm=1.72997e+05
b32c512h16tfrs: nsamp=5965824, time=91.14, p0loss=2.3247, vloss=0.7370, pslr=4.525e-04,wdtc=2.500e+06, norm=3.60416e+05, attn_norm=1.73980e+05
b32c512h16tfrs: nsamp=6017024, time=91.51, p0loss=2.3367, vloss=0.7363, pslr=4.525e-04,wdtc=2.500e+06, norm=3.61222e+05, attn_norm=1.74979e+05
b32c512h16tfrs: nsamp=6068224, time=91.44, p0loss=2.3352, vloss=0.7323, pslr=4.525e-04,wdtc=2.500e+06, norm=3.62030e+05, attn_norm=1.75990e+05
b32c512h16tfrs: nsamp=6119424, time=90.51, p0loss=2.3254, vloss=0.7275, pslr=4.525e-04,wdtc=2.500e+06, norm=3.62693e+05, attn_norm=1.76970e+05
b32c512h16tfrs: nsamp=6170624, time=91.35, p0loss=2.3338, vloss=0.7355, pslr=4.525e-04,wdtc=2.500e+06, norm=3.63397e+05, attn_norm=1.77934e+05
b32c512h16tfrs: nsamp=6221824, time=91.83, p0loss=2.3547, vloss=0.7406, pslr=4.525e-04,wdtc=2.500e+06, norm=3.64101e+05, attn_norm=1.78895e+05
b32c512h16tfrs: nsamp=6273024, time=91.73, p0loss=2.3373, vloss=0.7399, pslr=4.525e-04,wdtc=2.500e+06, norm=3.64861e+05, attn_norm=1.79847e+05
b32c512h16tfrs: nsamp=6324224, time=91.54, p0loss=2.3476, vloss=0.7335, pslr=4.525e-04,wdtc=2.500e+06, norm=3.65583e+05, attn_norm=1.80750e+05
b32c512h16tfrs: nsamp=6375424, time=91.22, p0loss=2.3536, vloss=0.7335, pslr=4.525e-04,wdtc=2.500e+06, norm=3.66364e+05, attn_norm=1.81628e+05
b32c512h16tfrs: nsamp=6426624, time=91.07, p0loss=2.3519, vloss=0.7291, pslr=4.525e-04,wdtc=2.500e+06, norm=3.67164e+05, attn_norm=1.82505e+05
b32c512h16tfrs: nsamp=6477824, time=90.88, p0loss=2.3437, vloss=0.7398, pslr=4.525e-04,wdtc=2.500e+06, norm=3.67921e+05, attn_norm=1.83404e+05
b32c512h16tfrs: nsamp=6529024, time=90.91, p0loss=2.3305, vloss=0.7327, pslr=4.525e-04,wdtc=2.500e+06, norm=3.68612e+05, attn_norm=1.84280e+05
b32c512h16tfrs: nsamp=6580224, time=91.84, p0loss=2.3431, vloss=0.7300, pslr=4.525e-04,wdtc=2.500e+06, norm=3.69236e+05, attn_norm=1.85123e+05
b32c512h16tfrs: nsamp=6631424, time=91.65, p0loss=2.3354, vloss=0.7307, pslr=4.525e-04,wdtc=2.500e+06, norm=3.69701e+05, attn_norm=1.85976e+05
b32c512h16tfrs: nsamp=6682624, time=91.73, p0loss=2.3138, vloss=0.7353, pslr=4.525e-04,wdtc=2.500e+06, norm=3.70236e+05, attn_norm=1.86783e+05
b32c512h16tfrs: nsamp=6733824, time=91.64, p0loss=2.3253, vloss=0.7294, pslr=4.525e-04,wdtc=2.500e+06, norm=3.70798e+05, attn_norm=1.87602e+05
b32c512h16tfrs: nsamp=6785024, time=91.11, p0loss=2.3214, vloss=0.7316, pslr=4.525e-04,wdtc=2.500e+06, norm=3.71325e+05, attn_norm=1.88397e+05
b32c512h16tfrs: nsamp=6836224, time=90.81, p0loss=2.3189, vloss=0.7354, pslr=4.525e-04,wdtc=2.500e+06, norm=3.71732e+05, attn_norm=1.89183e+05
b32c512h16tfrs: nsamp=6887424, time=91.48, p0loss=2.3331, vloss=0.7322, pslr=4.525e-04,wdtc=2.500e+06, norm=3.72269e+05, attn_norm=1.89958e+05
b32c512h16tfrs: nsamp=6938624, time=91.70, p0loss=2.3307, vloss=0.7305, pslr=4.525e-04,wdtc=2.500e+06, norm=3.72760e+05, attn_norm=1.90748e+05
b32c512h16tfrs: nsamp=6989824, time=91.85, p0loss=2.3322, vloss=0.7290, pslr=4.525e-04,wdtc=2.500e+06, norm=3.73063e+05, attn_norm=1.91508e+05
b32c512h16tfrs: nsamp=7041024, time=91.48, p0loss=2.3349, vloss=0.7346, pslr=4.525e-04,wdtc=2.500e+06, norm=3.73512e+05, attn_norm=1.92236e+05
b32c512h16tfrs: nsamp=7092224, time=90.61, p0loss=2.3405, vloss=0.7334, pslr=4.525e-04,wdtc=2.500e+06, norm=3.73959e+05, attn_norm=1.92968e+05
b32c512h16tfrs: nsamp=7143424, time=90.72, p0loss=2.3320, vloss=0.7382, pslr=4.525e-04,wdtc=2.500e+06, norm=3.74319e+05, attn_norm=1.93692e+05
b32c512h16tfrs: nsamp=7194624, time=90.76, p0loss=2.3322, vloss=0.7354, pslr=4.525e-04,wdtc=2.500e+06, norm=3.74645e+05, attn_norm=1.94404e+05
b32c512h16tfrs: nsamp=7245824, time=90.89, p0loss=2.3232, vloss=0.7362, pslr=4.525e-04,wdtc=2.500e+06, norm=3.74941e+05, attn_norm=1.95070e+05
b32c512h16tfrs: nsamp=7297024, time=91.53, p0loss=2.3302, vloss=0.7369, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75137e+05, attn_norm=1.95760e+05
b32c512h16tfrs: nsamp=7348224, time=90.78, p0loss=2.3262, vloss=0.7386, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75462e+05, attn_norm=1.96438e+05
b32c512h16tfrs: nsamp=7399424, time=91.42, p0loss=2.3230, vloss=0.7334, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75883e+05, attn_norm=1.97085e+05
b32c512h16tfrs: nsamp=7450624, time=91.44, p0loss=2.3321, vloss=0.7272, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76176e+05, attn_norm=1.97728e+05
b32c512h16tfrs: nsamp=7501824, time=90.56, p0loss=2.3256, vloss=0.7302, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76518e+05, attn_norm=1.98398e+05
b32c512h16tfrs: nsamp=7553024, time=90.88, p0loss=2.3355, vloss=0.7362, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76740e+05, attn_norm=1.99025e+05
b32c512h16tfrs: nsamp=7604224, time=90.71, p0loss=2.3295, vloss=0.7330, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77134e+05, attn_norm=1.99657e+05
b32c512h16tfrs: nsamp=7655424, time=90.70, p0loss=2.3391, vloss=0.7295, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77586e+05, attn_norm=2.00292e+05
b32c512h16tfrs: nsamp=7706624, time=90.55, p0loss=2.3324, vloss=0.7272, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77925e+05, attn_norm=2.00918e+05
b32c512h16tfrs: nsamp=7757824, time=90.71, p0loss=2.3370, vloss=0.7262, pslr=4.525e-04,wdtc=2.500e+06, norm=3.78145e+05, attn_norm=2.01514e+05
b32c512h16tfrs: nsamp=7809024, time=91.34, p0loss=2.3346, vloss=0.7333, pslr=4.525e-04,wdtc=2.500e+06, norm=3.78176e+05, attn_norm=2.02100e+05
b32c512h16tfrs: nsamp=7860224, time=91.91, p0loss=2.3308, vloss=0.7334, pslr=4.525e-04,wdtc=2.500e+06, norm=3.78346e+05, attn_norm=2.02688e+05
b32c512h16tfrs: nsamp=7911424, time=90.57, p0loss=2.3336, vloss=0.7381, pslr=4.525e-04,wdtc=2.500e+06, norm=3.78687e+05, attn_norm=2.03293e+05
b32c512h16tfrs: nsamp=7962624, time=91.40, p0loss=2.3321, vloss=0.7308, pslr=4.525e-04,wdtc=2.500e+06, norm=3.78987e+05, attn_norm=2.03875e+05
b32c512h16tfrs: nsamp=8013824, time=92.00, p0loss=2.3478, vloss=0.7353, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79022e+05, attn_norm=2.04441e+05
b32c512h16tfrs: nsamp=8065024, time=91.84, p0loss=2.3165, vloss=0.7308, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79218e+05, attn_norm=2.04961e+05
b32c512h16tfrs: nsamp=8116224, time=90.59, p0loss=2.3330, vloss=0.7295, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79599e+05, attn_norm=2.05488e+05
b32c512h16tfrs: nsamp=8167424, time=90.70, p0loss=2.3156, vloss=0.7336, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79658e+05, attn_norm=2.06019e+05
b32c512h16tfrs: nsamp=8218624, time=90.70, p0loss=2.3093, vloss=0.7358, pslr=4.525e-04,wdtc=2.500e+06, norm=3.80027e+05, attn_norm=2.06565e+05
b32c512h16tfrs: nsamp=8269824, time=90.95, p0loss=2.3212, vloss=0.7275, pslr=4.525e-04,wdtc=2.500e+06, norm=3.80276e+05, attn_norm=2.07083e+05
b32c512h16tfrs: nsamp=8321024, time=90.56, p0loss=2.3141, vloss=0.7238, pslr=4.525e-04,wdtc=2.500e+06, norm=3.80417e+05, attn_norm=2.07578e+05
b32c512h16tfrs: nsamp=8372224, time=90.70, p0loss=2.3352, vloss=0.7305, pslr=4.525e-04,wdtc=2.500e+06, norm=3.80563e+05, attn_norm=2.08068e+05
b32c512h16tfrs: nsamp=8423424, time=90.66, p0loss=2.3305, vloss=0.7305, pslr=4.525e-04,wdtc=2.500e+06, norm=3.80513e+05, attn_norm=2.08612e+05
b32c512h16tfrs: nsamp=8474624, time=90.50, p0loss=2.3355, vloss=0.7246, pslr=4.525e-04,wdtc=2.500e+06, norm=3.80694e+05, attn_norm=2.09135e+05
b32c512h16tfrs: nsamp=8525824, time=90.68, p0loss=2.3349, vloss=0.7307, pslr=4.525e-04,wdtc=2.500e+06, norm=3.80868e+05, attn_norm=2.09616e+05
b32c512h16tfrs: nsamp=8577024, time=91.49, p0loss=2.3194, vloss=0.7323, pslr=4.525e-04,wdtc=2.500e+06, norm=3.81147e+05, attn_norm=2.10096e+05
b32c512h16tfrs: nsamp=8628224, time=91.98, p0loss=2.3267, vloss=0.7335, pslr=4.525e-04,wdtc=2.500e+06, norm=3.81514e+05, attn_norm=2.10562e+05
b32c512h16tfrs: nsamp=8679424, time=91.91, p0loss=2.3377, vloss=0.7372, pslr=4.525e-04,wdtc=2.500e+06, norm=3.81834e+05, attn_norm=2.11028e+05
b32c512h16tfrs: nsamp=8730624, time=92.00, p0loss=2.3381, vloss=0.7292, pslr=4.525e-04,wdtc=2.500e+06, norm=3.82186e+05, attn_norm=2.11495e+05
b32c512h16tfrs: nsamp=8781824, time=91.29, p0loss=2.3311, vloss=0.7280, pslr=4.525e-04,wdtc=2.500e+06, norm=3.82362e+05, attn_norm=2.11968e+05
b32c512h16tfrs: nsamp=8833024, time=90.61, p0loss=2.3312, vloss=0.7261, pslr=4.525e-04,wdtc=2.500e+06, norm=3.82546e+05, attn_norm=2.12441e+05
b32c512h16tfrs: nsamp=8884224, time=90.52, p0loss=2.3077, vloss=0.7297, pslr=4.525e-04,wdtc=2.500e+06, norm=3.82691e+05, attn_norm=2.12866e+05
b32c512h16tfrs: nsamp=8935424, time=90.76, p0loss=2.3159, vloss=0.7276, pslr=4.525e-04,wdtc=2.500e+06, norm=3.82943e+05, attn_norm=2.13292e+05
b32c512h16tfrs: nsamp=8986624, time=90.73, p0loss=2.3106, vloss=0.7262, pslr=4.525e-04,wdtc=2.500e+06, norm=3.82961e+05, attn_norm=2.13701e+05
b32c512h16tfrs: nsamp=9037824, time=90.73, p0loss=2.3322, vloss=0.7271, pslr=4.525e-04,wdtc=2.500e+06, norm=3.83095e+05, attn_norm=2.14149e+05
b32c512h16tfrs: nsamp=9089024, time=90.54, p0loss=2.3289, vloss=0.7270, pslr=4.525e-04,wdtc=2.500e+06, norm=3.83139e+05, attn_norm=2.14585e+05
b32c512h16tfrs: nsamp=9140224, time=91.68, p0loss=2.3301, vloss=0.7325, pslr=4.525e-04,wdtc=2.500e+06, norm=3.82806e+05, attn_norm=2.14959e+05
b32c512h16tfrs: nsamp=9191424, time=91.75, p0loss=2.3095, vloss=0.7331, pslr=4.525e-04,wdtc=2.500e+06, norm=3.82491e+05, attn_norm=2.15394e+05
b32c512h16tfrs: nsamp=9242624, time=92.00, p0loss=2.3133, vloss=0.7298, pslr=4.525e-04,wdtc=2.500e+06, norm=3.82356e+05, attn_norm=2.15806e+05
b32c512h16tfrs: nsamp=9293824, time=91.24, p0loss=2.3143, vloss=0.7324, pslr=4.525e-04,wdtc=2.500e+06, norm=3.82601e+05, attn_norm=2.16239e+05
b32c512h16tfrs: nsamp=9345024, time=91.96, p0loss=2.3228, vloss=0.7307, pslr=4.525e-04,wdtc=2.500e+06, norm=3.82558e+05, attn_norm=2.16641e+05
b32c512h16tfrs: nsamp=9396224, time=91.84, p0loss=2.3305, vloss=0.7326, pslr=4.525e-04,wdtc=2.500e+06, norm=3.82444e+05, attn_norm=2.17026e+05
b32c512h16tfrs: nsamp=9447424, time=91.92, p0loss=2.3132, vloss=0.7280, pslr=4.525e-04,wdtc=2.500e+06, norm=3.82730e+05, attn_norm=2.17386e+05
b32c512h16tfrs: nsamp=9498624, time=90.74, p0loss=2.3212, vloss=0.7328, pslr=4.525e-04,wdtc=2.500e+06, norm=3.83089e+05, attn_norm=2.17770e+05
b32c512h16tfrs: nsamp=9549824, time=91.22, p0loss=2.3270, vloss=0.7287, pslr=4.525e-04,wdtc=2.500e+06, norm=3.83332e+05, attn_norm=2.18197e+05
b32c512h16tfrs: nsamp=9601024, time=91.49, p0loss=2.3329, vloss=0.7302, pslr=4.525e-04,wdtc=2.500e+06, norm=3.83278e+05, attn_norm=2.18563e+05
b32c512h16tfrs: nsamp=9652224, time=90.56, p0loss=2.3168, vloss=0.7334, pslr=4.525e-04,wdtc=2.500e+06, norm=3.83313e+05, attn_norm=2.18932e+05
b32c512h16tfrs: nsamp=9703424, time=91.67, p0loss=2.3102, vloss=0.7300, pslr=4.525e-04,wdtc=2.500e+06, norm=3.83597e+05, attn_norm=2.19292e+05
b32c512h16tfrs: nsamp=9754624, time=91.68, p0loss=2.3047, vloss=0.7275, pslr=4.525e-04,wdtc=2.500e+06, norm=3.83869e+05, attn_norm=2.19676e+05
b32c512h16tfrs: nsamp=9805824, time=91.79, p0loss=2.3134, vloss=0.7307, pslr=4.525e-04,wdtc=2.500e+06, norm=3.83798e+05, attn_norm=2.20018e+05
b32c512h16tfrs: nsamp=9857024, time=91.76, p0loss=2.3420, vloss=0.7301, pslr=4.525e-04,wdtc=2.500e+06, norm=3.83856e+05, attn_norm=2.20347e+05
Finished training subepoch!
Saving checkpoint: /root/katago_project/data/train/b32c512h16tfrs/checkpoint.ckpt
Beginning validation after epoch!
p0loss = 2.332301, p1loss = 0.399646, p0softloss = 5.236738, p1softloss = 0.703438, p0lopt = 0.000000, p0loptw = 0.000000, p0sopt = 0.478309, p0soptw = 0.214303, vloss = 0.743516, tdvloss1 = 0.309186, tdvloss2 = 0.245162, tdvloss3 = 0.221296, tdsloss = 0.017075, oloss = 0.476121, sloss = 0.384582, fploss = 0.181053, skloss = 0.100680, smloss = 0.050213, sbcdfloss = 0.089741, sbpdfloss = 0.062151, sdregloss = 0.000099, leadloss = 0.025250, vtimeloss = 0.055087, evstloss = 0.069294, esstloss = 0.033772, loss = 52.450081, pacc1 = 0.387430, vsquare = 0.149626, wsum = 160227.439941, nsamp = 160256.000000, ptentr = 0.924267, ptsoftentr = 4.970883, sekiweightscale = 6.998752, norm_normal_batch = 384012.687500, norm_normal_gamma_batch = 0.000000, norm_normal_attn_batch = 220548.906250, norm_output_batch = 602.971008, norm_noreg_batch = 16306.035156, norm_output_noreg_batch = 0.505753, nsamp_train = 9867264.000000, wsum_train = 9865696.957520
Validation took 202.6623823158443 seconds
Validating swa_scale=32.0
p0loss = 5.687970, p1loss = 0.744349, p0softloss = 5.746513, p1softloss = 0.751798, p0lopt = 0.000000, p0loptw = 0.000000, p0sopt = 1.670241, p0soptw = 0.291740, vloss = 0.864351, tdvloss1 = 0.456863, tdvloss2 = 0.382019, tdvloss3 = 0.414191, tdsloss = 0.041455, oloss = 0.889823, sloss = 0.771803, fploss = 2.000546, skloss = 0.326982, smloss = 0.074104, sbcdfloss = 0.397655, sbpdfloss = 0.093611, sdregloss = 0.709961, leadloss = 0.053432, vtimeloss = 0.068573, evstloss = 0.170142, esstloss = 0.112099, loss = 64.949435, pacc1 = 0.113668, vsquare = 0.002587, wsum = 160227.439941, nsamp = 160256.000000, ptentr = 0.924267, ptsoftentr = 4.970883, sekiweightscale = 6.998752, norm_normal_batch = 36461.734375, norm_normal_gamma_batch = 0.000000, norm_normal_attn_batch = 21986.136719, norm_output_batch = 602.493591, norm_noreg_batch = 16365.204102, norm_output_noreg_batch = 0.386196, nsamp_train = 9867264.000000, wsum_train = 9865696.957520
Validation swa took 202.0242652320303 seconds
Export cycle counter = 1
Skipping export model this time
GC collect
=========================================================================
BEGINNING NEXT EPOCH 2
=========================================================================
Current time: 2026-02-28 12:01:09.575621
Global step: 9867264 samples
Currently up to data row 95431473
Training dir: /root/katago_project/data/train/b32c512h16tfrs
Export dir: /root/katago_project/data/torchmodels_toexport_extra
Current grad scale: 256.0
Beginning training subepoch!
Currently up to data row 95431473
b32c512h16tfrs: nsamp=9918464, time=102.95, p0loss=2.3293, vloss=0.7263, pslr=4.525e-04,wdtc=2.500e+06, norm=3.83989e+05, attn_norm=2.20695e+05
b32c512h16tfrs: nsamp=9969664, time=91.35, p0loss=2.3443, vloss=0.7289, pslr=4.525e-04,wdtc=2.500e+06, norm=3.83790e+05, attn_norm=2.21065e+05
b32c512h16tfrs: nsamp=10020864, time=91.14, p0loss=2.3262, vloss=0.7283, pslr=4.525e-04,wdtc=2.500e+06, norm=3.83790e+05, attn_norm=2.21391e+05
b32c512h16tfrs: nsamp=10072064, time=91.16, p0loss=2.3368, vloss=0.7319, pslr=4.525e-04,wdtc=2.500e+06, norm=3.83797e+05, attn_norm=2.21732e+05
b32c512h16tfrs: nsamp=10123264, time=91.29, p0loss=2.3353, vloss=0.7299, pslr=4.525e-04,wdtc=2.500e+06, norm=3.83805e+05, attn_norm=2.22096e+05
b32c512h16tfrs: nsamp=10174464, time=90.92, p0loss=2.3259, vloss=0.7242, pslr=4.525e-04,wdtc=2.500e+06, norm=3.83879e+05, attn_norm=2.22463e+05
b32c512h16tfrs: nsamp=10225664, time=90.69, p0loss=2.3175, vloss=0.7221, pslr=4.525e-04,wdtc=2.500e+06, norm=3.84071e+05, attn_norm=2.22818e+05
b32c512h16tfrs: nsamp=10276864, time=90.56, p0loss=2.3223, vloss=0.7220, pslr=4.525e-04,wdtc=2.500e+06, norm=3.84288e+05, attn_norm=2.23145e+05
b32c512h16tfrs: nsamp=10328064, time=90.71, p0loss=2.3264, vloss=0.7358, pslr=4.525e-04,wdtc=2.500e+06, norm=3.84257e+05, attn_norm=2.23474e+05
b32c512h16tfrs: nsamp=10379264, time=90.68, p0loss=2.3292, vloss=0.7270, pslr=4.525e-04,wdtc=2.500e+06, norm=3.84365e+05, attn_norm=2.23811e+05
b32c512h16tfrs: nsamp=10430464, time=90.76, p0loss=2.3366, vloss=0.7277, pslr=4.525e-04,wdtc=2.500e+06, norm=3.84437e+05, attn_norm=2.24128e+05
b32c512h16tfrs: nsamp=10481664, time=91.57, p0loss=2.3306, vloss=0.7261, pslr=4.525e-04,wdtc=2.500e+06, norm=3.84419e+05, attn_norm=2.24409e+05
b32c512h16tfrs: nsamp=10532864, time=91.15, p0loss=2.3221, vloss=0.7299, pslr=4.525e-04,wdtc=2.500e+06, norm=3.84206e+05, attn_norm=2.24731e+05
b32c512h16tfrs: nsamp=10584064, time=91.44, p0loss=2.3227, vloss=0.7275, pslr=4.525e-04,wdtc=2.500e+06, norm=3.83975e+05, attn_norm=2.25021e+05
b32c512h16tfrs: nsamp=10635264, time=90.88, p0loss=2.3173, vloss=0.7368, pslr=4.525e-04,wdtc=2.500e+06, norm=3.83664e+05, attn_norm=2.25292e+05
b32c512h16tfrs: nsamp=10686464, time=90.51, p0loss=2.3161, vloss=0.7261, pslr=4.525e-04,wdtc=2.500e+06, norm=3.83464e+05, attn_norm=2.25611e+05
b32c512h16tfrs: nsamp=10737664, time=90.67, p0loss=2.3279, vloss=0.7255, pslr=4.525e-04,wdtc=2.500e+06, norm=3.83242e+05, attn_norm=2.25903e+05
b32c512h16tfrs: nsamp=10788864, time=91.31, p0loss=2.3309, vloss=0.7280, pslr=4.525e-04,wdtc=2.500e+06, norm=3.82948e+05, attn_norm=2.26189e+05
b32c512h16tfrs: nsamp=10840064, time=90.67, p0loss=2.3414, vloss=0.7256, pslr=4.525e-04,wdtc=2.500e+06, norm=3.82940e+05, attn_norm=2.26497e+05
b32c512h16tfrs: nsamp=10891264, time=90.54, p0loss=2.3375, vloss=0.7285, pslr=4.525e-04,wdtc=2.500e+06, norm=3.82972e+05, attn_norm=2.26788e+05
b32c512h16tfrs: nsamp=10942464, time=90.66, p0loss=2.3339, vloss=0.7304, pslr=4.525e-04,wdtc=2.500e+06, norm=3.82689e+05, attn_norm=2.27039e+05
b32c512h16tfrs: nsamp=10993664, time=91.04, p0loss=2.3185, vloss=0.7296, pslr=4.525e-04,wdtc=2.500e+06, norm=3.82692e+05, attn_norm=2.27318e+05
b32c512h16tfrs: nsamp=11044864, time=91.92, p0loss=2.3216, vloss=0.7274, pslr=4.525e-04,wdtc=2.500e+06, norm=3.82606e+05, attn_norm=2.27562e+05
b32c512h16tfrs: nsamp=11096064, time=90.66, p0loss=2.3268, vloss=0.7312, pslr=4.525e-04,wdtc=2.500e+06, norm=3.82625e+05, attn_norm=2.27836e+05
b32c512h16tfrs: nsamp=11147264, time=90.62, p0loss=2.3433, vloss=0.7349, pslr=4.525e-04,wdtc=2.500e+06, norm=3.82583e+05, attn_norm=2.28081e+05
b32c512h16tfrs: nsamp=11198464, time=90.65, p0loss=2.3292, vloss=0.7381, pslr=4.525e-04,wdtc=2.500e+06, norm=3.82323e+05, attn_norm=2.28332e+05
b32c512h16tfrs: nsamp=11249664, time=90.50, p0loss=2.3345, vloss=0.7338, pslr=4.525e-04,wdtc=2.500e+06, norm=3.82148e+05, attn_norm=2.28565e+05
b32c512h16tfrs: nsamp=11300864, time=90.65, p0loss=2.3342, vloss=0.7302, pslr=4.525e-04,wdtc=2.500e+06, norm=3.81680e+05, attn_norm=2.28825e+05
b32c512h16tfrs: nsamp=11352064, time=90.71, p0loss=2.3169, vloss=0.7252, pslr=4.525e-04,wdtc=2.500e+06, norm=3.81218e+05, attn_norm=2.29108e+05
b32c512h16tfrs: nsamp=11403264, time=90.68, p0loss=2.3376, vloss=0.7309, pslr=4.525e-04,wdtc=2.500e+06, norm=3.80851e+05, attn_norm=2.29374e+05
b32c512h16tfrs: nsamp=11454464, time=90.50, p0loss=2.3298, vloss=0.7276, pslr=4.525e-04,wdtc=2.500e+06, norm=3.80964e+05, attn_norm=2.29626e+05
b32c512h16tfrs: nsamp=11505664, time=90.66, p0loss=2.3437, vloss=0.7259, pslr=4.525e-04,wdtc=2.500e+06, norm=3.81090e+05, attn_norm=2.29858e+05
b32c512h16tfrs: nsamp=11556864, time=90.63, p0loss=2.3282, vloss=0.7297, pslr=4.525e-04,wdtc=2.500e+06, norm=3.81055e+05, attn_norm=2.30104e+05
b32c512h16tfrs: nsamp=11608064, time=90.67, p0loss=2.3204, vloss=0.7315, pslr=4.525e-04,wdtc=2.500e+06, norm=3.81035e+05, attn_norm=2.30323e+05
b32c512h16tfrs: nsamp=11659264, time=90.57, p0loss=2.3232, vloss=0.7265, pslr=4.525e-04,wdtc=2.500e+06, norm=3.81224e+05, attn_norm=2.30579e+05
b32c512h16tfrs: nsamp=11710464, time=90.76, p0loss=2.3180, vloss=0.7363, pslr=4.525e-04,wdtc=2.500e+06, norm=3.81200e+05, attn_norm=2.30827e+05
b32c512h16tfrs: nsamp=11761664, time=90.67, p0loss=2.3269, vloss=0.7348, pslr=4.525e-04,wdtc=2.500e+06, norm=3.81097e+05, attn_norm=2.31070e+05
b32c512h16tfrs: nsamp=11812864, time=90.65, p0loss=2.3221, vloss=0.7265, pslr=4.525e-04,wdtc=2.500e+06, norm=3.80662e+05, attn_norm=2.31271e+05
b32c512h16tfrs: nsamp=11864064, time=90.50, p0loss=2.3418, vloss=0.7257, pslr=4.525e-04,wdtc=2.500e+06, norm=3.80639e+05, attn_norm=2.31519e+05
b32c512h16tfrs: nsamp=11915264, time=90.72, p0loss=2.3389, vloss=0.7290, pslr=4.525e-04,wdtc=2.500e+06, norm=3.80452e+05, attn_norm=2.31769e+05
b32c512h16tfrs: nsamp=11966464, time=90.75, p0loss=2.3423, vloss=0.7260, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79953e+05, attn_norm=2.32000e+05
b32c512h16tfrs: nsamp=12017664, time=90.70, p0loss=2.3309, vloss=0.7276, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79752e+05, attn_norm=2.32234e+05
b32c512h16tfrs: nsamp=12068864, time=91.96, p0loss=2.3262, vloss=0.7263, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79885e+05, attn_norm=2.32440e+05
b32c512h16tfrs: nsamp=12120064, time=92.02, p0loss=2.3241, vloss=0.7329, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79952e+05, attn_norm=2.32656e+05
b32c512h16tfrs: nsamp=12171264, time=91.35, p0loss=2.3273, vloss=0.7307, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79876e+05, attn_norm=2.32894e+05
b32c512h16tfrs: nsamp=12222464, time=90.55, p0loss=2.3356, vloss=0.7317, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79724e+05, attn_norm=2.33104e+05
b32c512h16tfrs: nsamp=12273664, time=90.73, p0loss=2.3298, vloss=0.7321, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79657e+05, attn_norm=2.33332e+05
b32c512h16tfrs: nsamp=12324864, time=90.80, p0loss=2.3326, vloss=0.7265, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79850e+05, attn_norm=2.33548e+05
b32c512h16tfrs: nsamp=12376064, time=91.71, p0loss=2.3234, vloss=0.7231, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79701e+05, attn_norm=2.33754e+05
b32c512h16tfrs: nsamp=12427264, time=91.55, p0loss=2.3366, vloss=0.7240, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79757e+05, attn_norm=2.33930e+05
b32c512h16tfrs: nsamp=12478464, time=91.12, p0loss=2.3438, vloss=0.7284, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79790e+05, attn_norm=2.34112e+05
b32c512h16tfrs: nsamp=12529664, time=91.87, p0loss=2.3262, vloss=0.7243, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79812e+05, attn_norm=2.34294e+05
b32c512h16tfrs: nsamp=12580864, time=91.63, p0loss=2.3179, vloss=0.7328, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79911e+05, attn_norm=2.34455e+05
b32c512h16tfrs: nsamp=12632064, time=91.89, p0loss=2.2988, vloss=0.7385, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79835e+05, attn_norm=2.34602e+05
b32c512h16tfrs: nsamp=12683264, time=91.98, p0loss=2.3243, vloss=0.7301, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79518e+05, attn_norm=2.34780e+05
b32c512h16tfrs: nsamp=12734464, time=91.53, p0loss=2.3220, vloss=0.7316, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79539e+05, attn_norm=2.34963e+05
b32c512h16tfrs: nsamp=12785664, time=91.41, p0loss=2.3256, vloss=0.7285, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79282e+05, attn_norm=2.35153e+05
b32c512h16tfrs: nsamp=12836864, time=91.87, p0loss=2.3233, vloss=0.7263, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79523e+05, attn_norm=2.35337e+05
b32c512h16tfrs: nsamp=12888064, time=90.74, p0loss=2.3148, vloss=0.7272, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79892e+05, attn_norm=2.35541e+05
b32c512h16tfrs: nsamp=12939264, time=91.20, p0loss=2.3395, vloss=0.7223, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79902e+05, attn_norm=2.35710e+05
b32c512h16tfrs: nsamp=12990464, time=92.08, p0loss=2.3331, vloss=0.7219, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79930e+05, attn_norm=2.35900e+05
b32c512h16tfrs: nsamp=13041664, time=90.80, p0loss=2.3357, vloss=0.7217, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79803e+05, attn_norm=2.36104e+05
b32c512h16tfrs: nsamp=13092864, time=91.70, p0loss=2.3234, vloss=0.7317, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79618e+05, attn_norm=2.36304e+05
b32c512h16tfrs: nsamp=13144064, time=92.07, p0loss=2.3101, vloss=0.7260, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79349e+05, attn_norm=2.36462e+05
b32c512h16tfrs: nsamp=13195264, time=91.57, p0loss=2.3150, vloss=0.7353, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79157e+05, attn_norm=2.36608e+05
b32c512h16tfrs: nsamp=13246464, time=91.59, p0loss=2.3318, vloss=0.7304, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79151e+05, attn_norm=2.36812e+05
b32c512h16tfrs: nsamp=13297664, time=91.62, p0loss=2.3049, vloss=0.7260, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79259e+05, attn_norm=2.37027e+05
b32c512h16tfrs: nsamp=13348864, time=91.46, p0loss=2.3182, vloss=0.7314, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79418e+05, attn_norm=2.37208e+05
b32c512h16tfrs: nsamp=13400064, time=90.53, p0loss=2.3350, vloss=0.7256, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79495e+05, attn_norm=2.37362e+05
b32c512h16tfrs: nsamp=13451264, time=90.84, p0loss=2.3444, vloss=0.7212, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79479e+05, attn_norm=2.37499e+05
b32c512h16tfrs: nsamp=13502464, time=90.72, p0loss=2.3448, vloss=0.7280, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79786e+05, attn_norm=2.37661e+05
b32c512h16tfrs: nsamp=13553664, time=90.73, p0loss=2.3218, vloss=0.7240, pslr=4.525e-04,wdtc=2.500e+06, norm=3.80081e+05, attn_norm=2.37832e+05
b32c512h16tfrs: nsamp=13604864, time=90.56, p0loss=2.3229, vloss=0.7244, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79479e+05, attn_norm=2.37966e+05
b32c512h16tfrs: nsamp=13656064, time=90.68, p0loss=2.3276, vloss=0.7200, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79099e+05, attn_norm=2.38127e+05
b32c512h16tfrs: nsamp=13707264, time=90.69, p0loss=2.3484, vloss=0.7274, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79264e+05, attn_norm=2.38287e+05
b32c512h16tfrs: nsamp=13758464, time=90.74, p0loss=2.3338, vloss=0.7237, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79392e+05, attn_norm=2.38478e+05
b32c512h16tfrs: nsamp=13809664, time=90.60, p0loss=2.3220, vloss=0.7260, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79292e+05, attn_norm=2.38625e+05
b32c512h16tfrs: nsamp=13860864, time=90.73, p0loss=2.3340, vloss=0.7222, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79200e+05, attn_norm=2.38772e+05
b32c512h16tfrs: nsamp=13912064, time=90.72, p0loss=2.3286, vloss=0.7331, pslr=4.525e-04,wdtc=2.500e+06, norm=3.79028e+05, attn_norm=2.38928e+05
b32c512h16tfrs: nsamp=13963264, time=90.58, p0loss=2.3416, vloss=0.7317, pslr=4.525e-04,wdtc=2.500e+06, norm=3.78498e+05, attn_norm=2.39061e+05
b32c512h16tfrs: nsamp=14014464, time=90.71, p0loss=2.3109, vloss=0.7283, pslr=4.525e-04,wdtc=2.500e+06, norm=3.78054e+05, attn_norm=2.39215e+05
b32c512h16tfrs: nsamp=14065664, time=90.70, p0loss=2.3048, vloss=0.7263, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77934e+05, attn_norm=2.39358e+05
b32c512h16tfrs: nsamp=14116864, time=90.68, p0loss=2.3412, vloss=0.7255, pslr=4.525e-04,wdtc=2.500e+06, norm=3.78015e+05, attn_norm=2.39495e+05
b32c512h16tfrs: nsamp=14168064, time=90.90, p0loss=2.3329, vloss=0.7289, pslr=4.525e-04,wdtc=2.500e+06, norm=3.78148e+05, attn_norm=2.39620e+05
b32c512h16tfrs: nsamp=14219264, time=90.86, p0loss=2.3262, vloss=0.7271, pslr=4.525e-04,wdtc=2.500e+06, norm=3.78439e+05, attn_norm=2.39723e+05
b32c512h16tfrs: nsamp=14270464, time=91.71, p0loss=2.3327, vloss=0.7253, pslr=4.525e-04,wdtc=2.500e+06, norm=3.78205e+05, attn_norm=2.39859e+05
b32c512h16tfrs: nsamp=14321664, time=91.40, p0loss=2.3450, vloss=0.7301, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77956e+05, attn_norm=2.40014e+05
b32c512h16tfrs: nsamp=14372864, time=91.44, p0loss=2.3377, vloss=0.7296, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77907e+05, attn_norm=2.40164e+05
b32c512h16tfrs: nsamp=14424064, time=91.48, p0loss=2.3194, vloss=0.7232, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77845e+05, attn_norm=2.40267e+05
b32c512h16tfrs: nsamp=14475264, time=91.31, p0loss=2.3266, vloss=0.7276, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77865e+05, attn_norm=2.40399e+05
b32c512h16tfrs: nsamp=14526464, time=91.28, p0loss=2.3517, vloss=0.7315, pslr=4.525e-04,wdtc=2.500e+06, norm=3.78103e+05, attn_norm=2.40538e+05
b32c512h16tfrs: nsamp=14577664, time=90.52, p0loss=2.3355, vloss=0.7296, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77932e+05, attn_norm=2.40694e+05
b32c512h16tfrs: nsamp=14628864, time=90.76, p0loss=2.3245, vloss=0.7293, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77983e+05, attn_norm=2.40837e+05
b32c512h16tfrs: nsamp=14680064, time=90.72, p0loss=2.3354, vloss=0.7277, pslr=4.525e-04,wdtc=2.500e+06, norm=3.78284e+05, attn_norm=2.40953e+05
b32c512h16tfrs: nsamp=14731264, time=91.33, p0loss=2.3253, vloss=0.7307, pslr=4.525e-04,wdtc=2.500e+06, norm=3.78509e+05, attn_norm=2.41071e+05
b32c512h16tfrs: nsamp=14782464, time=91.18, p0loss=2.3181, vloss=0.7331, pslr=4.525e-04,wdtc=2.500e+06, norm=3.78412e+05, attn_norm=2.41189e+05
b32c512h16tfrs: nsamp=14833664, time=91.46, p0loss=2.3215, vloss=0.7359, pslr=4.525e-04,wdtc=2.500e+06, norm=3.78101e+05, attn_norm=2.41319e+05
Finished training subepoch!
Saving checkpoint: /root/katago_project/data/train/b32c512h16tfrs/checkpoint.ckpt
Beginning validation after epoch!
p0loss = 2.341567, p1loss = 0.400214, p0softloss = 5.237783, p1softloss = 0.703708, p0lopt = 0.000000, p0loptw = 0.000000, p0sopt = 0.296306, p0soptw = 0.136169, vloss = 0.752306, tdvloss1 = 0.311417, tdvloss2 = 0.253676, tdvloss3 = 0.225140, tdsloss = 0.015680, oloss = 0.476048, sloss = 0.384227, fploss = 0.181116, skloss = 0.093115, smloss = 0.047596, sbcdfloss = 0.080799, sbpdfloss = 0.057904, sdregloss = 0.000127, leadloss = 0.022235, vtimeloss = 0.053886, evstloss = 0.083718, esstloss = 0.029818, loss = 52.428744, pacc1 = 0.386235, vsquare = 0.180943, wsum = 160227.439941, nsamp = 160256.000000, ptentr = 0.924267, ptsoftentr = 4.970883, sekiweightscale = 6.998752, norm_normal_batch = 377719.906250, norm_normal_gamma_batch = 0.000000, norm_normal_attn_batch = 241462.453125, norm_output_batch = 592.766174, norm_noreg_batch = 16275.084961, norm_output_noreg_batch = 0.534045, nsamp_train = 14866432.000000, wsum_train = 14864035.836426
Validation took 202.28734406409785 seconds
Validating swa_scale=32.0
p0loss = 5.237532, p1loss = 0.688374, p0softloss = 5.662972, p1softloss = 0.744298, p0lopt = 0.000000, p0loptw = 0.000000, p0sopt = 1.601140, p0soptw = 0.300618, vloss = 0.849054, tdvloss1 = 0.402024, tdvloss2 = 0.388145, tdvloss3 = 0.369590, tdsloss = 0.041179, oloss = 0.790097, sloss = 0.667042, fploss = 1.404246, skloss = 0.203684, smloss = 0.074374, sbcdfloss = 0.172956, sbpdfloss = 0.077570, sdregloss = 0.094488, leadloss = 0.047335, vtimeloss = 0.066642, evstloss = 0.167381, esstloss = 0.097634, loss = 61.914063, pacc1 = 0.266224, vsquare = 0.023509, wsum = 160227.439941, nsamp = 160256.000000, ptentr = 0.924267, ptsoftentr = 4.970883, sekiweightscale = 6.998752, norm_normal_batch = 51785.746094, norm_normal_gamma_batch = 0.000000, norm_normal_attn_batch = 37621.867188, norm_output_batch = 594.209534, norm_noreg_batch = 16344.697266, norm_output_noreg_batch = 0.409633, nsamp_train = 14866432.000000, wsum_train = 14864035.836426
Validation swa took 202.23874938068911 seconds
Export cycle counter = 1
Skipping export model this time
GC collect
=========================================================================
BEGINNING NEXT EPOCH 3
=========================================================================
Current time: 2026-02-28 14:36:17.414207
Global step: 14866432 samples
Currently up to data row 95431473
Training dir: /root/katago_project/data/train/b32c512h16tfrs
Export dir: /root/katago_project/data/torchmodels_toexport_extra
Current grad scale: 128.0
Beginning training subepoch!
Currently up to data row 95431473
b32c512h16tfrs: nsamp=14917632, time=103.03, p0loss=2.3328, vloss=0.7286, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77670e+05, attn_norm=2.41480e+05
b32c512h16tfrs: nsamp=14968832, time=91.43, p0loss=2.3358, vloss=0.7305, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77719e+05, attn_norm=2.41649e+05
b32c512h16tfrs: nsamp=15020032, time=91.32, p0loss=2.3336, vloss=0.7247, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77905e+05, attn_norm=2.41725e+05
b32c512h16tfrs: nsamp=15071232, time=91.80, p0loss=2.3312, vloss=0.7277, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77909e+05, attn_norm=2.41867e+05
b32c512h16tfrs: nsamp=15122432, time=91.45, p0loss=2.3321, vloss=0.7253, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77892e+05, attn_norm=2.41997e+05
b32c512h16tfrs: nsamp=15173632, time=91.40, p0loss=2.3304, vloss=0.7282, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77996e+05, attn_norm=2.42133e+05
b32c512h16tfrs: nsamp=15224832, time=91.53, p0loss=2.3223, vloss=0.7230, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77751e+05, attn_norm=2.42272e+05
b32c512h16tfrs: nsamp=15276032, time=90.83, p0loss=2.3256, vloss=0.7298, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77665e+05, attn_norm=2.42377e+05
b32c512h16tfrs: nsamp=15327232, time=91.84, p0loss=2.3085, vloss=0.7278, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77966e+05, attn_norm=2.42506e+05
b32c512h16tfrs: nsamp=15378432, time=91.69, p0loss=2.3173, vloss=0.7316, pslr=4.525e-04,wdtc=2.500e+06, norm=3.78056e+05, attn_norm=2.42615e+05
b32c512h16tfrs: nsamp=15429632, time=91.76, p0loss=2.3092, vloss=0.7237, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77827e+05, attn_norm=2.42739e+05
b32c512h16tfrs: nsamp=15480832, time=91.49, p0loss=2.3258, vloss=0.7325, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77354e+05, attn_norm=2.42843e+05
b32c512h16tfrs: nsamp=15532032, time=91.34, p0loss=2.3174, vloss=0.7297, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77088e+05, attn_norm=2.42913e+05
b32c512h16tfrs: nsamp=15583232, time=91.73, p0loss=2.3462, vloss=0.7320, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76892e+05, attn_norm=2.42986e+05
b32c512h16tfrs: nsamp=15634432, time=91.79, p0loss=2.3335, vloss=0.7322, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76963e+05, attn_norm=2.43086e+05
b32c512h16tfrs: nsamp=15685632, time=91.65, p0loss=2.3167, vloss=0.7329, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77290e+05, attn_norm=2.43155e+05
b32c512h16tfrs: nsamp=15736832, time=91.53, p0loss=2.3162, vloss=0.7326, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77084e+05, attn_norm=2.43245e+05
b32c512h16tfrs: nsamp=15788032, time=91.75, p0loss=2.3183, vloss=0.7310, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76834e+05, attn_norm=2.43339e+05
b32c512h16tfrs: nsamp=15839232, time=91.34, p0loss=2.3332, vloss=0.7244, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77027e+05, attn_norm=2.43408e+05
b32c512h16tfrs: nsamp=15890432, time=91.68, p0loss=2.3351, vloss=0.7283, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77023e+05, attn_norm=2.43501e+05
b32c512h16tfrs: nsamp=15941632, time=91.60, p0loss=2.3314, vloss=0.7280, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77206e+05, attn_norm=2.43619e+05
b32c512h16tfrs: nsamp=15992832, time=91.57, p0loss=2.3240, vloss=0.7285, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77470e+05, attn_norm=2.43702e+05
b32c512h16tfrs: nsamp=16044032, time=91.84, p0loss=2.3280, vloss=0.7354, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77320e+05, attn_norm=2.43819e+05
b32c512h16tfrs: nsamp=16095232, time=91.49, p0loss=2.3290, vloss=0.7294, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77106e+05, attn_norm=2.43897e+05
b32c512h16tfrs: nsamp=16146432, time=91.85, p0loss=2.3186, vloss=0.7280, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76974e+05, attn_norm=2.43984e+05
b32c512h16tfrs: nsamp=16197632, time=91.70, p0loss=2.3295, vloss=0.7215, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77106e+05, attn_norm=2.44103e+05
b32c512h16tfrs: nsamp=16248832, time=91.28, p0loss=2.3385, vloss=0.7247, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77117e+05, attn_norm=2.44203e+05
b32c512h16tfrs: nsamp=16300032, time=91.44, p0loss=2.3369, vloss=0.7257, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77076e+05, attn_norm=2.44315e+05
b32c512h16tfrs: nsamp=16351232, time=91.53, p0loss=2.3214, vloss=0.7289, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77095e+05, attn_norm=2.44441e+05
b32c512h16tfrs: nsamp=16402432, time=91.30, p0loss=2.3445, vloss=0.7271, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76773e+05, attn_norm=2.44533e+05
b32c512h16tfrs: nsamp=16453632, time=90.77, p0loss=2.3285, vloss=0.7365, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76572e+05, attn_norm=2.44645e+05
b32c512h16tfrs: nsamp=16504832, time=90.85, p0loss=2.3391, vloss=0.7382, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76337e+05, attn_norm=2.44706e+05
b32c512h16tfrs: nsamp=16556032, time=90.83, p0loss=2.3595, vloss=0.7312, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76081e+05, attn_norm=2.44837e+05
b32c512h16tfrs: nsamp=16607232, time=90.65, p0loss=2.3358, vloss=0.7323, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75877e+05, attn_norm=2.44962e+05
b32c512h16tfrs: nsamp=16658432, time=91.89, p0loss=2.3325, vloss=0.7360, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75858e+05, attn_norm=2.45041e+05
b32c512h16tfrs: nsamp=16709632, time=91.88, p0loss=2.3243, vloss=0.7299, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75640e+05, attn_norm=2.45146e+05
b32c512h16tfrs: nsamp=16760832, time=91.56, p0loss=2.3288, vloss=0.7290, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75414e+05, attn_norm=2.45243e+05
b32c512h16tfrs: nsamp=16812032, time=91.33, p0loss=2.3296, vloss=0.7235, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75611e+05, attn_norm=2.45336e+05
b32c512h16tfrs: nsamp=16863232, time=91.50, p0loss=2.3247, vloss=0.7307, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75768e+05, attn_norm=2.45428e+05
b32c512h16tfrs: nsamp=16914432, time=91.24, p0loss=2.3292, vloss=0.7370, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75469e+05, attn_norm=2.45498e+05
b32c512h16tfrs: nsamp=16965632, time=90.74, p0loss=2.3382, vloss=0.7356, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75127e+05, attn_norm=2.45571e+05
b32c512h16tfrs: nsamp=17016832, time=90.69, p0loss=2.3375, vloss=0.7341, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75257e+05, attn_norm=2.45629e+05
b32c512h16tfrs: nsamp=17068032, time=90.75, p0loss=2.3210, vloss=0.7290, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75705e+05, attn_norm=2.45693e+05
b32c512h16tfrs: nsamp=17119232, time=90.80, p0loss=2.3419, vloss=0.7340, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76124e+05, attn_norm=2.45832e+05
b32c512h16tfrs: nsamp=17170432, time=90.89, p0loss=2.3522, vloss=0.7374, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76643e+05, attn_norm=2.45897e+05
b32c512h16tfrs: nsamp=17221632, time=91.23, p0loss=2.3352, vloss=0.7292, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76799e+05, attn_norm=2.45987e+05
b32c512h16tfrs: nsamp=17272832, time=90.69, p0loss=2.3320, vloss=0.7262, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76540e+05, attn_norm=2.46031e+05
b32c512h16tfrs: nsamp=17324032, time=90.66, p0loss=2.3271, vloss=0.7254, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76425e+05, attn_norm=2.46112e+05
b32c512h16tfrs: nsamp=17375232, time=90.61, p0loss=2.3286, vloss=0.7262, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76674e+05, attn_norm=2.46181e+05
b32c512h16tfrs: nsamp=17426432, time=90.51, p0loss=2.3238, vloss=0.7340, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76965e+05, attn_norm=2.46226e+05
b32c512h16tfrs: nsamp=17477632, time=90.97, p0loss=2.3157, vloss=0.7367, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77139e+05, attn_norm=2.46285e+05
b32c512h16tfrs: nsamp=17528832, time=91.07, p0loss=2.3250, vloss=0.7268, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77307e+05, attn_norm=2.46341e+05
b32c512h16tfrs: nsamp=17580032, time=90.57, p0loss=2.3214, vloss=0.7232, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77481e+05, attn_norm=2.46386e+05
b32c512h16tfrs: nsamp=17631232, time=90.73, p0loss=2.3281, vloss=0.7312, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77127e+05, attn_norm=2.46456e+05
b32c512h16tfrs: nsamp=17682432, time=90.65, p0loss=2.3102, vloss=0.7300, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76851e+05, attn_norm=2.46508e+05
b32c512h16tfrs: nsamp=17733632, time=90.61, p0loss=2.3161, vloss=0.7277, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76734e+05, attn_norm=2.46586e+05
b32c512h16tfrs: nsamp=17784832, time=90.56, p0loss=2.3086, vloss=0.7284, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76547e+05, attn_norm=2.46667e+05
b32c512h16tfrs: nsamp=17836032, time=90.69, p0loss=2.3132, vloss=0.7374, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76679e+05, attn_norm=2.46770e+05
b32c512h16tfrs: nsamp=17887232, time=90.66, p0loss=2.3240, vloss=0.7324, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76882e+05, attn_norm=2.46869e+05
b32c512h16tfrs: nsamp=17938432, time=90.92, p0loss=2.3243, vloss=0.7251, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76918e+05, attn_norm=2.46927e+05
b32c512h16tfrs: nsamp=17989632, time=90.97, p0loss=2.3236, vloss=0.7280, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76981e+05, attn_norm=2.46971e+05
b32c512h16tfrs: nsamp=18040832, time=90.83, p0loss=2.3330, vloss=0.7236, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76938e+05, attn_norm=2.47025e+05
b32c512h16tfrs: nsamp=18092032, time=90.71, p0loss=2.3244, vloss=0.7158, pslr=4.525e-04,wdtc=2.500e+06, norm=3.77106e+05, attn_norm=2.47103e+05
b32c512h16tfrs: nsamp=18143232, time=90.65, p0loss=2.3323, vloss=0.7216, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76808e+05, attn_norm=2.47168e+05
b32c512h16tfrs: nsamp=18194432, time=90.51, p0loss=2.3337, vloss=0.7320, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76586e+05, attn_norm=2.47238e+05
b32c512h16tfrs: nsamp=18245632, time=90.63, p0loss=2.3211, vloss=0.7314, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76798e+05, attn_norm=2.47362e+05
b32c512h16tfrs: nsamp=18296832, time=90.63, p0loss=2.3148, vloss=0.7298, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76867e+05, attn_norm=2.47472e+05
b32c512h16tfrs: nsamp=18348032, time=90.51, p0loss=2.3420, vloss=0.7287, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76961e+05, attn_norm=2.47540e+05
b32c512h16tfrs: nsamp=18399232, time=90.89, p0loss=2.3301, vloss=0.7238, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76872e+05, attn_norm=2.47611e+05
b32c512h16tfrs: nsamp=18450432, time=90.65, p0loss=2.3384, vloss=0.7336, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76606e+05, attn_norm=2.47704e+05
b32c512h16tfrs: nsamp=18501632, time=90.63, p0loss=2.3283, vloss=0.7308, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76094e+05, attn_norm=2.47780e+05
b32c512h16tfrs: nsamp=18552832, time=90.55, p0loss=2.3428, vloss=0.7265, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75578e+05, attn_norm=2.47862e+05
b32c512h16tfrs: nsamp=18604032, time=90.61, p0loss=2.3486, vloss=0.7247, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75376e+05, attn_norm=2.47898e+05
b32c512h16tfrs: nsamp=18655232, time=90.62, p0loss=2.3446, vloss=0.7240, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75322e+05, attn_norm=2.47968e+05
b32c512h16tfrs: nsamp=18706432, time=90.63, p0loss=2.3358, vloss=0.7290, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75241e+05, attn_norm=2.48033e+05
b32c512h16tfrs: nsamp=18757632, time=90.51, p0loss=2.3233, vloss=0.7221, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75373e+05, attn_norm=2.48088e+05
b32c512h16tfrs: nsamp=18808832, time=90.63, p0loss=2.3276, vloss=0.7208, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75948e+05, attn_norm=2.48190e+05
b32c512h16tfrs: nsamp=18860032, time=90.63, p0loss=2.3538, vloss=0.7338, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76102e+05, attn_norm=2.48289e+05
b32c512h16tfrs: nsamp=18911232, time=90.62, p0loss=2.3460, vloss=0.7267, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76271e+05, attn_norm=2.48382e+05
b32c512h16tfrs: nsamp=18962432, time=90.48, p0loss=2.3460, vloss=0.7277, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76190e+05, attn_norm=2.48413e+05
b32c512h16tfrs: nsamp=19013632, time=91.50, p0loss=2.3180, vloss=0.7330, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76196e+05, attn_norm=2.48514e+05
b32c512h16tfrs: nsamp=19064832, time=91.60, p0loss=2.3295, vloss=0.7249, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75818e+05, attn_norm=2.48584e+05
b32c512h16tfrs: nsamp=19116032, time=90.65, p0loss=2.3435, vloss=0.7220, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75785e+05, attn_norm=2.48634e+05
b32c512h16tfrs: nsamp=19167232, time=91.14, p0loss=2.3093, vloss=0.7278, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75262e+05, attn_norm=2.48674e+05
b32c512h16tfrs: nsamp=19218432, time=90.96, p0loss=2.3199, vloss=0.7212, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75224e+05, attn_norm=2.48726e+05
b32c512h16tfrs: nsamp=19269632, time=91.24, p0loss=2.3328, vloss=0.7291, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75435e+05, attn_norm=2.48769e+05
b32c512h16tfrs: nsamp=19320832, time=90.56, p0loss=2.3491, vloss=0.7371, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75497e+05, attn_norm=2.48806e+05
b32c512h16tfrs: nsamp=19372032, time=90.80, p0loss=2.3386, vloss=0.7245, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75688e+05, attn_norm=2.48867e+05
b32c512h16tfrs: nsamp=19423232, time=91.09, p0loss=2.3271, vloss=0.7285, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75797e+05, attn_norm=2.48925e+05
b32c512h16tfrs: nsamp=19474432, time=91.35, p0loss=2.3143, vloss=0.7225, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76000e+05, attn_norm=2.48993e+05
b32c512h16tfrs: nsamp=19525632, time=91.77, p0loss=2.3390, vloss=0.7268, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75808e+05, attn_norm=2.49008e+05
b32c512h16tfrs: nsamp=19576832, time=91.06, p0loss=2.3311, vloss=0.7281, pslr=4.525e-04,wdtc=2.500e+06, norm=3.75843e+05, attn_norm=2.49037e+05
b32c512h16tfrs: nsamp=19628032, time=90.71, p0loss=2.3301, vloss=0.7271, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76170e+05, attn_norm=2.49115e+05
b32c512h16tfrs: nsamp=19679232, time=90.68, p0loss=2.3223, vloss=0.7280, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76501e+05, attn_norm=2.49192e+05
b32c512h16tfrs: nsamp=19730432, time=90.55, p0loss=2.3413, vloss=0.7270, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76330e+05, attn_norm=2.49266e+05
b32c512h16tfrs: nsamp=19781632, time=90.65, p0loss=2.3263, vloss=0.7236, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76369e+05, attn_norm=2.49299e+05
b32c512h16tfrs: nsamp=19832832, time=90.55, p0loss=2.3308, vloss=0.7209, pslr=4.525e-04,wdtc=2.500e+06, norm=3.76573e+05, attn_norm=2.49363e+05
Finished training subepoch!
Saving checkpoint: /root/katago_project/data/train/b32c512h16tfrs/checkpoint.ckpt
Beginning validation after epoch!
p0loss = 2.339970, p1loss = 0.399827, p0softloss = 5.237064, p1softloss = 0.703599, p0lopt = 0.000000, p0loptw = 0.000000, p0sopt = 0.192606, p0soptw = 0.091360, vloss = 0.759391, tdvloss1 = 0.316218, tdvloss2 = 0.259083, tdvloss3 = 0.231260, tdsloss = 0.017175, oloss = 0.476113, sloss = 0.384415, fploss = 0.180967, skloss = 0.096518, smloss = 0.049023, sbcdfloss = 0.081022, sbpdfloss = 0.056938, sdregloss = 0.000095, leadloss = 0.025402, vtimeloss = 0.055028, evstloss = 0.085036, esstloss = 0.032954, loss = 52.427875, pacc1 = 0.387680, vsquare = 0.216632, wsum = 160227.439941, nsamp = 160256.000000, ptentr = 0.924267, ptsoftentr = 4.970883, sekiweightscale = 6.998752, norm_normal_batch = 376531.656250, norm_normal_gamma_batch = 0.000000, norm_normal_attn_batch = 249429.750000, norm_output_batch = 584.347473, norm_noreg_batch = 16248.388672, norm_output_noreg_batch = 0.561899, nsamp_train = 19873792.000000, wsum_train = 19870695.835449
Validation took 202.66237933607772 seconds
Validating swa_scale=32.0
['./train_muon_ki.py', '-traindir', '/root/katago_project/data/train/b32c512h16tfrs', '-datadir', '/root/katago_project/data/shuffleddata/current', '-exportdir', '/root/katago_project/data/torchmodels_toexport_extra', '-exportprefix', 'b32c512h16tfrs', '-max-epochs-this-instance', '100', '-pos-len', '19', '-samples-per-epoch', '5000000', '-lr-scale', '1.0', '-swa-scales', '32.0', '-use-fp16', '-symmetry-type', 'xyt', '-batch-size', '64', '-model-kind', 'b32c512h16tfrs-bng-silu', '-swa-period-samples', '500000', '-max-val-samples', '20000', '-enable-history-matrices', '-multi-gpus', '0,1,2,3,4,5,6,7', '-gnorm-clip-scale', '1.0', '-wd-scale', '1.0', '-export-prob', '0.003', '-master-port', '23456']
Running torch.distributed.init_process_group
['./train_muon_ki.py', '-traindir', '/root/katago_project/data/train/b32c512h16tfrs', '-datadir', '/root/katago_project/data/shuffleddata/current', '-exportdir', '/root/katago_project/data/torchmodels_toexport_extra', '-exportprefix', 'b32c512h16tfrs', '-max-epochs-this-instance', '100', '-pos-len', '19', '-samples-per-epoch', '5000000', '-lr-scale', '1.0', '-swa-scales', '32.0', '-use-fp16', '-symmetry-type', 'xyt', '-batch-size', '64', '-model-kind', 'b32c512h16tfrs-bng-silu', '-swa-period-samples', '500000', '-max-val-samples', '20000', '-enable-history-matrices', '-multi-gpus', '0,1,2,3,4,5,6,7', '-gnorm-clip-scale', '1.0', '-wd-scale', '1.0', '-export-prob', '0.003', '-master-port', '23456']
Running torch.distributed.init_process_group
['./train_muon_ki.py', '-traindir', '/root/katago_project/data/train/b32c512h16tfrs', '-datadir', '/root/katago_project/data/shuffleddata/current', '-exportdir', '/root/katago_project/data/torchmodels_toexport_extra', '-exportprefix', 'b32c512h16tfrs', '-max-epochs-this-instance', '100', '-pos-len', '19', '-samples-per-epoch', '5000000', '-lr-scale', '1.0', '-swa-scales', '32.0', '-use-fp16', '-symmetry-type', 'xyt', '-batch-size', '64', '-model-kind', 'b32c512h16tfrs-bng-silu', '-swa-period-samples', '500000', '-max-val-samples', '20000', '-enable-history-matrices', '-multi-gpus', '0,1,2,3,4,5,6,7', '-gnorm-clip-scale', '1.0', '-wd-scale', '1.0', '-export-prob', '0.003', '-master-port', '23456']
Running torch.distributed.init_process_group
['./train_muon_ki.py', '-traindir', '/root/katago_project/data/train/b32c512h16tfrs', '-datadir', '/root/katago_project/data/shuffleddata/current', '-exportdir', '/root/katago_project/data/torchmodels_toexport_extra', '-exportprefix', 'b32c512h16tfrs', '-max-epochs-this-instance', '100', '-pos-len', '19', '-samples-per-epoch', '5000000', '-lr-scale', '1.0', '-swa-scales', '32.0', '-use-fp16', '-symmetry-type', 'xyt', '-batch-size', '64', '-model-kind', 'b32c512h16tfrs-bng-silu', '-swa-period-samples', '500000', '-max-val-samples', '20000', '-enable-history-matrices', '-multi-gpus', '0,1,2,3,4,5,6,7', '-gnorm-clip-scale', '1.0', '-wd-scale', '1.0', '-export-prob', '0.003', '-master-port', '23456']
Running torch.distributed.init_process_group
['./train_muon_ki.py', '-traindir', '/root/katago_project/data/train/b32c512h16tfrs', '-datadir', '/root/katago_project/data/shuffleddata/current', '-exportdir', '/root/katago_project/data/torchmodels_toexport_extra', '-exportprefix', 'b32c512h16tfrs', '-max-epochs-this-instance', '100', '-pos-len', '19', '-samples-per-epoch', '5000000', '-lr-scale', '1.0', '-swa-scales', '32.0', '-use-fp16', '-symmetry-type', 'xyt', '-batch-size', '64', '-model-kind', 'b32c512h16tfrs-bng-silu', '-swa-period-samples', '500000', '-max-val-samples', '20000', '-enable-history-matrices', '-multi-gpus', '0,1,2,3,4,5,6,7', '-gnorm-clip-scale', '1.0', '-wd-scale', '1.0', '-export-prob', '0.003', '-master-port', '23456']
Running torch.distributed.init_process_group
['./train_muon_ki.py', '-traindir', '/root/katago_project/data/train/b32c512h16tfrs', '-datadir', '/root/katago_project/data/shuffleddata/current', '-exportdir', '/root/katago_project/data/torchmodels_toexport_extra', '-exportprefix', 'b32c512h16tfrs', '-max-epochs-this-instance', '100', '-pos-len', '19', '-samples-per-epoch', '5000000', '-lr-scale', '1.0', '-swa-scales', '32.0', '-use-fp16', '-symmetry-type', 'xyt', '-batch-size', '64', '-model-kind', 'b32c512h16tfrs-bng-silu', '-swa-period-samples', '500000', '-max-val-samples', '20000', '-enable-history-matrices', '-multi-gpus', '0,1,2,3,4,5,6,7', '-gnorm-clip-scale', '1.0', '-wd-scale', '1.0', '-export-prob', '0.003', '-master-port', '23456']
Running torch.distributed.init_process_group
['./train_muon_ki.py', '-traindir', '/root/katago_project/data/train/b32c512h16tfrs', '-datadir', '/root/katago_project/data/shuffleddata/current', '-exportdir', '/root/katago_project/data/torchmodels_toexport_extra', '-exportprefix', 'b32c512h16tfrs', '-max-epochs-this-instance', '100', '-pos-len', '19', '-samples-per-epoch', '5000000', '-lr-scale', '1.0', '-swa-scales', '32.0', '-use-fp16', '-symmetry-type', 'xyt', '-batch-size', '64', '-model-kind', 'b32c512h16tfrs-bng-silu', '-swa-period-samples', '500000', '-max-val-samples', '20000', '-enable-history-matrices', '-multi-gpus', '0,1,2,3,4,5,6,7', '-gnorm-clip-scale', '1.0', '-wd-scale', '1.0', '-export-prob', '0.003', '-master-port', '23456']
Running torch.distributed.init_process_group
['./train_muon_ki.py', '-traindir', '/root/katago_project/data/train/b32c512h16tfrs', '-datadir', '/root/katago_project/data/shuffleddata/current', '-exportdir', '/root/katago_project/data/torchmodels_toexport_extra', '-exportprefix', 'b32c512h16tfrs', '-max-epochs-this-instance', '100', '-pos-len', '19', '-samples-per-epoch', '5000000', '-lr-scale', '1.0', '-swa-scales', '32.0', '-use-fp16', '-symmetry-type', 'xyt', '-batch-size', '64', '-model-kind', 'b32c512h16tfrs-bng-silu', '-swa-period-samples', '500000', '-max-val-samples', '20000', '-enable-history-matrices', '-multi-gpus', '0,1,2,3,4,5,6,7', '-gnorm-clip-scale', '1.0', '-wd-scale', '1.0', '-export-prob', '0.003', '-master-port', '23456']
Running torch.distributed.init_process_group
Returned from torch.distributed.init_process_group, my rank = 0, world_size=8
Using GPU device: Tesla V100-SXM2-32GB
Seeding torch with 5656788319411565
{'version': 15, 'norm_kind': 'bnorm', 'bnorm_epsilon': 0.0001, 'bnorm_running_avg_momentum': 0.001, 'initial_conv_1x1': False, 'trunk_num_channels': 512, 'mid_num_channels': 512, 'gpool_num_channels': 64, 'transformer_ffn_channels': 1536, 'transformer_heads': 16, 'transformer_kv_heads': 16, 'use_attention_pool': False, 'num_attention_pool_heads': 4, 'block_kind': [['rconv1', 'transformerropesg'], ['rconv2', 'transformerropesg'], ['rconv3', 'transformerropesg'], ['rconv4', 'transformerropesg'], ['rconv5', 'transformerropesg'], ['rconv6', 'transformerropesg'], ['rconv7', 'transformerropesg'], ['rconv8', 'transformerropesg'], ['rconv9', 'transformerropesg'], ['rconv10', 'transformerropesg'], ['rconv11', 'transformerropesg'], ['rconv12', 'transformerropesg'], ['rconv13', 'transformerropesg'], ['rconv14', 'transformerropesg'], ['rconv15', 'transformerropesg'], ['rconv16', 'transformerropesg'], ['rconv17', 'transformerropesg'], ['rconv18', 'transformerropesg'], ['rconv19', 'transformerropesg'], ['rconv20', 'transformerropesg'], ['rconv21', 'transformerropesg'], ['rconv22', 'transformerropesg'], ['rconv23', 'transformerropesg'], ['rconv24', 'transformerropesg'], ['rconv25', 'transformerropesg'], ['rconv26', 'transformerropesg'], ['rconv27', 'transformerropesg'], ['rconv28', 'transformerropesg'], ['rconv29', 'transformerropesg'], ['rconv30', 'transformerropesg'], ['rconv31', 'transformerropesg'], ['rconv32', 'transformerropesg']], 'p1_num_channels': 64, 'g1_num_channels': 64, 'v1_num_channels': 128, 'sbv2_num_channels': 128, 'num_scorebeliefs': 8, 'v2_size': 144, 'bnorm_use_gamma': True, 'activation': 'silu'}
No NaN/Inf in BatchNorm layers
Load swa model 0
swa_period_samples 500000.0
swa_scales [32.0]
lookahead_alpha None
lookahead_k None
soft_policy_weight_scale 8.0
disable_optimistic_policy False
meta_kata_only_soft_policy False
value_loss_scale 0.6
td_value_loss_scales [0.6, 0.6, 0.6]
seki_loss_scale 1.0
variance_time_loss_scale 1.0
main_loss_scale None
intermediate_loss_scale None
Parameters in model:
Total num params: 109468997
Total trainable params: 109468997
Training in FP16! Creating scaler
Updated training data: /root/katago_project/data/shuffleddata/current
Train steps since last reload: 19873792 -> 0
Skipping 0/1362 files in: /root/katago_project/data/shuffleddata/current/train as already used first pass
GC collect
=========================================================================
BEGINNING NEXT EPOCH 0
=========================================================================
Current time: 2026-02-28 17:17:02.989805
Global step: 19873792 samples
Currently up to data row 95431473
Training dir: /root/katago_project/data/train/b32c512h16tfrs
Export dir: /root/katago_project/data/torchmodels_toexport_extra
Current grad scale: 65536.0
Beginning training subepoch!
Currently up to data row 95431473
['./train_muon_ki.py', '-traindir', '/root/katago_project/data/train/b32c512h16tfrs', '-datadir', '/root/katago_project/data/shuffleddata/current', '-exportdir', '/root/katago_project/data/torchmodels_toexport_extra', '-exportprefix', 'b32c512h16tfrs', '-max-epochs-this-instance', '100', '-pos-len', '19', '-samples-per-epoch', '5000000', '-lr-scale', '1.0', '-swa-scales', '32.0', '-use-fp16', '-symmetry-type', 'xyt', '-batch-size', '64', '-model-kind', 'b32c512h16tfrs-bng-silu', '-swa-period-samples', '500000', '-max-val-samples', '20000', '-enable-history-matrices', '-multi-gpus', '0,1,2,3,4,5,6,7', '-gnorm-clip-scale', '1.0', '-wd-scale', '1.0', '-export-prob', '0.003', '-master-port', '23456']
Running torch.distributed.init_process_group
Returned from torch.distributed.init_process_group, my rank = 0, world_size=8
Using GPU device: Tesla V100-SXM2-32GB
Seeding torch with 68863258454367175
{'version': 15, 'norm_kind': 'bnorm', 'bnorm_epsilon': 0.0001, 'bnorm_running_avg_momentum': 0.001, 'initial_conv_1x1': False, 'trunk_num_channels': 512, 'mid_num_channels': 512, 'gpool_num_channels': 64, 'transformer_ffn_channels': 1536, 'transformer_heads': 16, 'transformer_kv_heads': 16, 'use_attention_pool': False, 'num_attention_pool_heads': 4, 'block_kind': [['rconv1', 'transformerropesg'], ['rconv2', 'transformerropesg'], ['rconv3', 'transformerropesg'], ['rconv4', 'transformerropesg'], ['rconv5', 'transformerropesg'], ['rconv6', 'transformerropesg'], ['rconv7', 'transformerropesg'], ['rconv8', 'transformerropesg'], ['rconv9', 'transformerropesg'], ['rconv10', 'transformerropesg'], ['rconv11', 'transformerropesg'], ['rconv12', 'transformerropesg'], ['rconv13', 'transformerropesg'], ['rconv14', 'transformerropesg'], ['rconv15', 'transformerropesg'], ['rconv16', 'transformerropesg'], ['rconv17', 'transformerropesg'], ['rconv18', 'transformerropesg'], ['rconv19', 'transformerropesg'], ['rconv20', 'transformerropesg'], ['rconv21', 'transformerropesg'], ['rconv22', 'transformerropesg'], ['rconv23', 'transformerropesg'], ['rconv24', 'transformerropesg'], ['rconv25', 'transformerropesg'], ['rconv26', 'transformerropesg'], ['rconv27', 'transformerropesg'], ['rconv28', 'transformerropesg'], ['rconv29', 'transformerropesg'], ['rconv30', 'transformerropesg'], ['rconv31', 'transformerropesg'], ['rconv32', 'transformerropesg']], 'p1_num_channels': 64, 'g1_num_channels': 64, 'v1_num_channels': 128, 'sbv2_num_channels': 128, 'num_scorebeliefs': 8, 'v2_size': 144, 'bnorm_use_gamma': True, 'activation': 'silu'}
['./train_muon_ki.py', '-traindir', '/root/katago_project/data/train/b32c512h16tfrs', '-datadir', '/root/katago_project/data/shuffleddata/current', '-exportdir', '/root/katago_project/data/torchmodels_toexport_extra', '-exportprefix', 'b32c512h16tfrs', '-max-epochs-this-instance', '100', '-pos-len', '19', '-samples-per-epoch', '5000000', '-lr-scale', '1.0', '-swa-scales', '32.0', '-use-fp16', '-symmetry-type', 'xyt', '-batch-size', '64', '-model-kind', 'b32c512h16tfrs-bng-silu', '-swa-period-samples', '500000', '-max-val-samples', '20000', '-enable-history-matrices', '-multi-gpus', '0,1,2,3,4,5,6,7', '-gnorm-clip-scale', '1.0', '-wd-scale', '1.0', '-export-prob', '0.003', '-master-port', '23456']
Running torch.distributed.init_process_group
Returned from torch.distributed.init_process_group, my rank = 0, world_size=8
Using GPU device: Tesla V100-SXM2-32GB
Seeding torch with 9495636863324197
{'version': 15, 'norm_kind': 'bnorm', 'bnorm_epsilon': 0.0001, 'bnorm_running_avg_momentum': 0.001, 'initial_conv_1x1': False, 'trunk_num_channels': 512, 'mid_num_channels': 512, 'gpool_num_channels': 64, 'transformer_ffn_channels': 1536, 'transformer_heads': 16, 'transformer_kv_heads': 16, 'use_attention_pool': False, 'num_attention_pool_heads': 4, 'block_kind': [['rconv1', 'transformerropesg'], ['rconv2', 'transformerropesg'], ['rconv3', 'transformerropesg'], ['rconv4', 'transformerropesg'], ['rconv5', 'transformerropesg'], ['rconv6', 'transformerropesg'], ['rconv7', 'transformerropesg'], ['rconv8', 'transformerropesg'], ['rconv9', 'transformerropesg'], ['rconv10', 'transformerropesg'], ['rconv11', 'transformerropesg'], ['rconv12', 'transformerropesg'], ['rconv13', 'transformerropesg'], ['rconv14', 'transformerropesg'], ['rconv15', 'transformerropesg'], ['rconv16', 'transformerropesg'], ['rconv17', 'transformerropesg'], ['rconv18', 'transformerropesg'], ['rconv19', 'transformerropesg'], ['rconv20', 'transformerropesg'], ['rconv21', 'transformerropesg'], ['rconv22', 'transformerropesg'], ['rconv23', 'transformerropesg'], ['rconv24', 'transformerropesg'], ['rconv25', 'transformerropesg'], ['rconv26', 'transformerropesg'], ['rconv27', 'transformerropesg'], ['rconv28', 'transformerropesg'], ['rconv29', 'transformerropesg'], ['rconv30', 'transformerropesg'], ['rconv31', 'transformerropesg'], ['rconv32', 'transformerropesg']], 'p1_num_channels': 64, 'g1_num_channels': 64, 'v1_num_channels': 128, 'sbv2_num_channels': 128, 'num_scorebeliefs': 8, 'v2_size': 144, 'bnorm_use_gamma': True, 'activation': 'silu'}
['./train_muon_ki.py', '-traindir', '/root/katago_project/data/train/b32c512h16tfrs', '-datadir', '/root/katago_project/data/shuffleddata/current', '-exportdir', '/root/katago_project/data/torchmodels_toexport_extra', '-exportprefix', 'b32c512h16tfrs', '-max-epochs-this-instance', '100', '-pos-len', '19', '-samples-per-epoch', '5000000', '-lr-scale', '1.0', '-swa-scales', '32.0', '-use-fp16', '-symmetry-type', 'xyt', '-batch-size', '64', '-model-kind', 'b32c512h16tfrs-bng-silu', '-swa-period-samples', '500000', '-max-val-samples', '20000', '-enable-history-matrices', '-multi-gpus', '0,1,2,3,4,5,6,7', '-gnorm-clip-scale', '1.0', '-wd-scale', '1.0', '-export-prob', '0.003', '-master-port', '23456', '-lr-scale', '0.1']
Running torch.distributed.init_process_group
Returned from torch.distributed.init_process_group, my rank = 0, world_size=8
Using GPU device: Tesla V100-SXM2-32GB
Seeding torch with 34484648426720988
{'version': 15, 'norm_kind': 'bnorm', 'bnorm_epsilon': 0.0001, 'bnorm_running_avg_momentum': 0.001, 'initial_conv_1x1': False, 'trunk_num_channels': 512, 'mid_num_channels': 512, 'gpool_num_channels': 64, 'transformer_ffn_channels': 1536, 'transformer_heads': 16, 'transformer_kv_heads': 16, 'use_attention_pool': False, 'num_attention_pool_heads': 4, 'block_kind': [['rconv1', 'transformerropesg'], ['rconv2', 'transformerropesg'], ['rconv3', 'transformerropesg'], ['rconv4', 'transformerropesg'], ['rconv5', 'transformerropesg'], ['rconv6', 'transformerropesg'], ['rconv7', 'transformerropesg'], ['rconv8', 'transformerropesg'], ['rconv9', 'transformerropesg'], ['rconv10', 'transformerropesg'], ['rconv11', 'transformerropesg'], ['rconv12', 'transformerropesg'], ['rconv13', 'transformerropesg'], ['rconv14', 'transformerropesg'], ['rconv15', 'transformerropesg'], ['rconv16', 'transformerropesg'], ['rconv17', 'transformerropesg'], ['rconv18', 'transformerropesg'], ['rconv19', 'transformerropesg'], ['rconv20', 'transformerropesg'], ['rconv21', 'transformerropesg'], ['rconv22', 'transformerropesg'], ['rconv23', 'transformerropesg'], ['rconv24', 'transformerropesg'], ['rconv25', 'transformerropesg'], ['rconv26', 'transformerropesg'], ['rconv27', 'transformerropesg'], ['rconv28', 'transformerropesg'], ['rconv29', 'transformerropesg'], ['rconv30', 'transformerropesg'], ['rconv31', 'transformerropesg'], ['rconv32', 'transformerropesg']], 'p1_num_channels': 64, 'g1_num_channels': 64, 'v1_num_channels': 128, 'sbv2_num_channels': 128, 'num_scorebeliefs': 8, 'v2_size': 144, 'bnorm_use_gamma': True, 'activation': 'silu'}
['./train_muon_ki.py', '-traindir', '/root/katago_project/data/train/b32c512h16tfrs', '-datadir', '/root/katago_project/data/shuffleddata/current', '-exportdir', '/root/katago_project/data/torchmodels_toexport_extra', '-exportprefix', 'b32c512h16tfrs', '-max-epochs-this-instance', '100', '-pos-len', '19', '-samples-per-epoch', '5000000', '-lr-scale', '1.0', '-swa-scales', '32.0', '-use-fp16', '-symmetry-type', 'xyt', '-batch-size', '64', '-model-kind', 'b32c512h16tfrs-bng-silu', '-swa-period-samples', '500000', '-max-val-samples', '20000', '-enable-history-matrices', '-multi-gpus', '0,1,2,3,4,5,6,7', '-gnorm-clip-scale', '1.0', '-wd-scale', '1.0', '-export-prob', '0.003', '-master-port', '23456', '-lr-scale', '1.0']
Running torch.distributed.init_process_group
Returned from torch.distributed.init_process_group, my rank = 0, world_size=8
Using GPU device: Tesla V100-SXM2-32GB
Seeding torch with 59302549903813282
{'version': 15, 'norm_kind': 'bnorm', 'bnorm_epsilon': 0.0001, 'bnorm_running_avg_momentum': 0.001, 'initial_conv_1x1': False, 'trunk_num_channels': 512, 'mid_num_channels': 512, 'gpool_num_channels': 64, 'transformer_ffn_channels': 1536, 'transformer_heads': 16, 'transformer_kv_heads': 16, 'use_attention_pool': False, 'num_attention_pool_heads': 4, 'block_kind': [['rconv1', 'transformerropesg'], ['rconv2', 'transformerropesg'], ['rconv3', 'transformerropesg'], ['rconv4', 'transformerropesg'], ['rconv5', 'transformerropesg'], ['rconv6', 'transformerropesg'], ['rconv7', 'transformerropesg'], ['rconv8', 'transformerropesg'], ['rconv9', 'transformerropesg'], ['rconv10', 'transformerropesg'], ['rconv11', 'transformerropesg'], ['rconv12', 'transformerropesg'], ['rconv13', 'transformerropesg'], ['rconv14', 'transformerropesg'], ['rconv15', 'transformerropesg'], ['rconv16', 'transformerropesg'], ['rconv17', 'transformerropesg'], ['rconv18', 'transformerropesg'], ['rconv19', 'transformerropesg'], ['rconv20', 'transformerropesg'], ['rconv21', 'transformerropesg'], ['rconv22', 'transformerropesg'], ['rconv23', 'transformerropesg'], ['rconv24', 'transformerropesg'], ['rconv25', 'transformerropesg'], ['rconv26', 'transformerropesg'], ['rconv27', 'transformerropesg'], ['rconv28', 'transformerropesg'], ['rconv29', 'transformerropesg'], ['rconv30', 'transformerropesg'], ['rconv31', 'transformerropesg'], ['rconv32', 'transformerropesg']], 'p1_num_channels': 64, 'g1_num_channels': 64, 'v1_num_channels': 128, 'sbv2_num_channels': 128, 'num_scorebeliefs': 8, 'v2_size': 144, 'bnorm_use_gamma': True, 'activation': 'silu'}
No NaN/Inf in BatchNorm layers
Load swa model 0
swa_period_samples 500000.0
swa_scales [32.0]
lookahead_alpha None
lookahead_k None
soft_policy_weight_scale 8.0
disable_optimistic_policy False
meta_kata_only_soft_policy False
value_loss_scale 0.6
td_value_loss_scales [0.6, 0.6, 0.6]
seki_loss_scale 1.0
variance_time_loss_scale 1.0
main_loss_scale None
intermediate_loss_scale None
Parameters in model:
Total num params: 109468997
Total trainable params: 109468997
Training in FP16! Creating scaler
Updated training data: /root/katago_project/data/shuffleddata/current
Train steps since last reload: 19873792 -> 0
Skipping 0/1362 files in: /root/katago_project/data/shuffleddata/current/train as already used first pass
GC collect
=========================================================================
BEGINNING NEXT EPOCH 0
=========================================================================
Current time: 2026-02-28 17:25:46.832458
Global step: 19873792 samples
Currently up to data row 95431473
Training dir: /root/katago_project/data/train/b32c512h16tfrs
Export dir: /root/katago_project/data/torchmodels_toexport_extra
Current grad scale: 65536.0
Beginning training subepoch!
Currently up to data row 95431473
['./train_muon_ki.py', '-traindir', '/root/katago_project/data/train/b32c512h16tfrs', '-datadir', '/root/katago_project/data/shuffleddata/current', '-exportdir', '/root/katago_project/data/torchmodels_toexport_extra', '-exportprefix', 'b32c512h16tfrs', '-max-epochs-this-instance', '100', '-pos-len', '19', '-samples-per-epoch', '5000000', '-lr-scale', '1.0', '-swa-scales', '32.0', '-use-fp16', '-symmetry-type', 'xyt', '-batch-size', '64', '-model-kind', 'b32c512h16tfrs-bng-silu', '-swa-period-samples', '500000', '-max-val-samples', '20000', '-enable-history-matrices', '-multi-gpus', '0,1,2,3,4,5,6,7', '-gnorm-clip-scale', '1.0', '-wd-scale', '1.0', '-export-prob', '0.003', '-master-port', '23456', '-lr-scale', '1.0']
Running torch.distributed.init_process_group
Returned from torch.distributed.init_process_group, my rank = 0, world_size=8
Using GPU device: Tesla V100-SXM2-32GB
Seeding torch with 44500781948998741
{'version': 15, 'norm_kind': 'bnorm', 'bnorm_epsilon': 0.0001, 'bnorm_running_avg_momentum': 0.001, 'initial_conv_1x1': False, 'trunk_num_channels': 512, 'mid_num_channels': 512, 'gpool_num_channels': 64, 'transformer_ffn_channels': 1536, 'transformer_heads': 16, 'transformer_kv_heads': 16, 'use_attention_pool': False, 'num_attention_pool_heads': 4, 'block_kind': [['rconv1', 'transformerropesg'], ['rconv2', 'transformerropesg'], ['rconv3', 'transformerropesg'], ['rconv4', 'transformerropesg'], ['rconv5', 'transformerropesg'], ['rconv6', 'transformerropesg'], ['rconv7', 'transformerropesg'], ['rconv8', 'transformerropesg'], ['rconv9', 'transformerropesg'], ['rconv10', 'transformerropesg'], ['rconv11', 'transformerropesg'], ['rconv12', 'transformerropesg'], ['rconv13', 'transformerropesg'], ['rconv14', 'transformerropesg'], ['rconv15', 'transformerropesg'], ['rconv16', 'transformerropesg'], ['rconv17', 'transformerropesg'], ['rconv18', 'transformerropesg'], ['rconv19', 'transformerropesg'], ['rconv20', 'transformerropesg'], ['rconv21', 'transformerropesg'], ['rconv22', 'transformerropesg'], ['rconv23', 'transformerropesg'], ['rconv24', 'transformerropesg'], ['rconv25', 'transformerropesg'], ['rconv26', 'transformerropesg'], ['rconv27', 'transformerropesg'], ['rconv28', 'transformerropesg'], ['rconv29', 'transformerropesg'], ['rconv30', 'transformerropesg'], ['rconv31', 'transformerropesg'], ['rconv32', 'transformerropesg']], 'p1_num_channels': 64, 'g1_num_channels': 64, 'v1_num_channels': 128, 'sbv2_num_channels': 128, 'num_scorebeliefs': 8, 'v2_size': 144, 'bnorm_use_gamma': True, 'activation': 'silu'}
No NaN/Inf in BatchNorm layers
Load swa model 0
swa_period_samples 500000.0
swa_scales [32.0]
lookahead_alpha None
lookahead_k None
soft_policy_weight_scale 8.0
disable_optimistic_policy False
meta_kata_only_soft_policy False
value_loss_scale 0.6
td_value_loss_scales [0.6, 0.6, 0.6]
seki_loss_scale 1.0
variance_time_loss_scale 1.0
main_loss_scale None
intermediate_loss_scale None
Parameters in model:
Total num params: 109468997
Total trainable params: 109468997
Training in FP16! Creating scaler
Updated training data: /root/katago_project/data/shuffleddata/current
Train steps since last reload: 19873792 -> 0
Skipping 0/1362 files in: /root/katago_project/data/shuffleddata/current/train as already used first pass
GC collect
=========================================================================
BEGINNING NEXT EPOCH 0
=========================================================================
Current time: 2026-02-28 17:31:05.528655
Global step: 19873792 samples
Currently up to data row 95431473
Training dir: /root/katago_project/data/train/b32c512h16tfrs
Export dir: /root/katago_project/data/torchmodels_toexport_extra
Current grad scale: 65536.0
Beginning training subepoch!
Currently up to data row 95431473
['./train_muon_ki.py', '-traindir', '/root/katago_project/data/train/b32c512h16tfrs', '-datadir', '/root/katago_project/data/shuffleddata/current', '-exportdir', '/root/katago_project/data/torchmodels_toexport_extra', '-exportprefix', 'b32c512h16tfrs', '-max-epochs-this-instance', '100', '-pos-len', '19', '-samples-per-epoch', '5000000', '-lr-scale', '1.0', '-swa-scales', '32.0', '-use-fp16', '-symmetry-type', 'xyt', '-batch-size', '64', '-model-kind', 'b32c512h16tfrs-bng-silu', '-swa-period-samples', '500000', '-max-val-samples', '20000', '-enable-history-matrices', '-multi-gpus', '0,1,2,3,4,5,6,7', '-gnorm-clip-scale', '1.0', '-wd-scale', '1.0', '-export-prob', '0.003', '-master-port', '23456', '-lr-scale', '1.0']
Running torch.distributed.init_process_group
Returned from torch.distributed.init_process_group, my rank = 0, world_size=8
Using GPU device: Tesla V100-SXM2-32GB
Seeding torch with 41190165315774035
{'version': 15, 'norm_kind': 'bnorm', 'bnorm_epsilon': 0.0001, 'bnorm_running_avg_momentum': 0.001, 'initial_conv_1x1': False, 'trunk_num_channels': 512, 'mid_num_channels': 512, 'gpool_num_channels': 64, 'transformer_ffn_channels': 1536, 'transformer_heads': 16, 'transformer_kv_heads': 16, 'use_attention_pool': False, 'num_attention_pool_heads': 4, 'block_kind': [['rconv1', 'transformerropesg'], ['rconv2', 'transformerropesg'], ['rconv3', 'transformerropesg'], ['rconv4', 'transformerropesg'], ['rconv5', 'transformerropesg'], ['rconv6', 'transformerropesg'], ['rconv7', 'transformerropesg'], ['rconv8', 'transformerropesg'], ['rconv9', 'transformerropesg'], ['rconv10', 'transformerropesg'], ['rconv11', 'transformerropesg'], ['rconv12', 'transformerropesg'], ['rconv13', 'transformerropesg'], ['rconv14', 'transformerropesg'], ['rconv15', 'transformerropesg'], ['rconv16', 'transformerropesg'], ['rconv17', 'transformerropesg'], ['rconv18', 'transformerropesg'], ['rconv19', 'transformerropesg'], ['rconv20', 'transformerropesg'], ['rconv21', 'transformerropesg'], ['rconv22', 'transformerropesg'], ['rconv23', 'transformerropesg'], ['rconv24', 'transformerropesg'], ['rconv25', 'transformerropesg'], ['rconv26', 'transformerropesg'], ['rconv27', 'transformerropesg'], ['rconv28', 'transformerropesg'], ['rconv29', 'transformerropesg'], ['rconv30', 'transformerropesg'], ['rconv31', 'transformerropesg'], ['rconv32', 'transformerropesg']], 'p1_num_channels': 64, 'g1_num_channels': 64, 'v1_num_channels': 128, 'sbv2_num_channels': 128, 'num_scorebeliefs': 8, 'v2_size': 144, 'bnorm_use_gamma': True, 'activation': 'silu'}
['./train_muon_ki.py', '-traindir', '/root/katago_project/data/train/b32c512h16tfrs', '-datadir', '/root/katago_project/data/shuffleddata/current', '-exportdir', '/root/katago_project/data/torchmodels_toexport_extra', '-exportprefix', 'b32c512h16tfrs', '-max-epochs-this-instance', '100', '-pos-len', '19', '-samples-per-epoch', '5000000', '-lr-scale', '1.0', '-swa-scales', '32.0', '-use-fp16', '-symmetry-type', 'xyt', '-batch-size', '64', '-model-kind', 'b32c512h16tfrs-bng-silu', '-swa-period-samples', '500000', '-max-val-samples', '20000', '-enable-history-matrices', '-multi-gpus', '0,1,2,3,4,5,6,7', '-gnorm-clip-scale', '1.0', '-wd-scale', '1.0', '-export-prob', '0.003', '-master-port', '23456', '-lr-scale', '1.0']
Running torch.distributed.init_process_group
Returned from torch.distributed.init_process_group, my rank = 0, world_size=8
Using GPU device: Tesla V100-SXM2-32GB
Seeding torch with 7477717832099051
{'version': 15, 'norm_kind': 'bnorm', 'bnorm_epsilon': 0.0001, 'bnorm_running_avg_momentum': 0.001, 'initial_conv_1x1': False, 'trunk_num_channels': 512, 'mid_num_channels': 512, 'gpool_num_channels': 64, 'transformer_ffn_channels': 1536, 'transformer_heads': 16, 'transformer_kv_heads': 16, 'use_attention_pool': False, 'num_attention_pool_heads': 4, 'block_kind': [['rconv1', 'transformerropesg'], ['rconv2', 'transformerropesg'], ['rconv3', 'transformerropesg'], ['rconv4', 'transformerropesg'], ['rconv5', 'transformerropesg'], ['rconv6', 'transformerropesg'], ['rconv7', 'transformerropesg'], ['rconv8', 'transformerropesg'], ['rconv9', 'transformerropesg'], ['rconv10', 'transformerropesg'], ['rconv11', 'transformerropesg'], ['rconv12', 'transformerropesg'], ['rconv13', 'transformerropesg'], ['rconv14', 'transformerropesg'], ['rconv15', 'transformerropesg'], ['rconv16', 'transformerropesg'], ['rconv17', 'transformerropesg'], ['rconv18', 'transformerropesg'], ['rconv19', 'transformerropesg'], ['rconv20', 'transformerropesg'], ['rconv21', 'transformerropesg'], ['rconv22', 'transformerropesg'], ['rconv23', 'transformerropesg'], ['rconv24', 'transformerropesg'], ['rconv25', 'transformerropesg'], ['rconv26', 'transformerropesg'], ['rconv27', 'transformerropesg'], ['rconv28', 'transformerropesg'], ['rconv29', 'transformerropesg'], ['rconv30', 'transformerropesg'], ['rconv31', 'transformerropesg'], ['rconv32', 'transformerropesg']], 'p1_num_channels': 64, 'g1_num_channels': 64, 'v1_num_channels': 128, 'sbv2_num_channels': 128, 'num_scorebeliefs': 8, 'v2_size': 144, 'bnorm_use_gamma': True, 'activation': 'silu'}
No NaN/Inf in BatchNorm layers
Load swa model 0
swa_period_samples 500000.0
swa_scales [32.0]
lookahead_alpha None
lookahead_k None
soft_policy_weight_scale 8.0
disable_optimistic_policy False
meta_kata_only_soft_policy False
value_loss_scale 0.6
td_value_loss_scales [0.6, 0.6, 0.6]
seki_loss_scale 1.0
variance_time_loss_scale 1.0
main_loss_scale None
intermediate_loss_scale None
Parameters in model:
Total num params: 109468997
Total trainable params: 109468997
Training in FP16! Creating scaler
Updated training data: /root/katago_project/data/shuffleddata/current
Train steps since last reload: 19873792 -> 0
Skipping 0/1362 files in: /root/katago_project/data/shuffleddata/current/train as already used first pass
GC collect
=========================================================================
BEGINNING NEXT EPOCH 0
=========================================================================
Current time: 2026-02-28 17:45:35.357092
Global step: 19873792 samples
Currently up to data row 95431473
Training dir: /root/katago_project/data/train/b32c512h16tfrs
Export dir: /root/katago_project/data/torchmodels_toexport_extra
Current grad scale: 65536.0
Beginning training subepoch!
Currently up to data row 95431473
b32c512h16tfrs: nsamp=19924992, time=110.63, p0loss=2.3021, vloss=0.7136, pslr=3.269e-04,wdtc=4.791e+06, norm=3.76275e+05, attn_norm=2.49336e+05
b32c512h16tfrs: nsamp=19976192, time=90.69, p0loss=2.2627, vloss=0.7150, pslr=2.265e-04,wdtc=9.984e+06, norm=3.75996e+05, attn_norm=2.49214e+05
b32c512h16tfrs: nsamp=20027392, time=90.79, p0loss=2.2576, vloss=0.7078, pslr=2.263e-04,wdtc=1.000e+07, norm=3.76159e+05, attn_norm=2.49213e+05
b32c512h16tfrs: nsamp=20078592, time=91.40, p0loss=2.2601, vloss=0.7063, pslr=2.263e-04,wdtc=1.000e+07, norm=3.76301e+05, attn_norm=2.49204e+05
b32c512h16tfrs: nsamp=20129792, time=91.49, p0loss=2.2445, vloss=0.7097, pslr=2.263e-04,wdtc=1.000e+07, norm=3.76429e+05, attn_norm=2.49177e+05
b32c512h16tfrs: nsamp=20180992, time=91.24, p0loss=2.2520, vloss=0.7090, pslr=2.263e-04,wdtc=1.000e+07, norm=3.76538e+05, attn_norm=2.49164e+05
b32c512h16tfrs: nsamp=20232192, time=90.99, p0loss=2.2526, vloss=0.7075, pslr=2.263e-04,wdtc=1.000e+07, norm=3.76610e+05, attn_norm=2.49142e+05
b32c512h16tfrs: nsamp=20283392, time=90.80, p0loss=2.2501, vloss=0.6932, pslr=2.263e-04,wdtc=1.000e+07, norm=3.76667e+05, attn_norm=2.49114e+05
b32c512h16tfrs: nsamp=20334592, time=91.27, p0loss=2.2274, vloss=0.7009, pslr=2.263e-04,wdtc=1.000e+07, norm=3.76800e+05, attn_norm=2.49092e+05
b32c512h16tfrs: nsamp=20385792, time=91.60, p0loss=2.2203, vloss=0.6998, pslr=2.263e-04,wdtc=1.000e+07, norm=3.76852e+05, attn_norm=2.49068e+05
b32c512h16tfrs: nsamp=20436992, time=91.48, p0loss=2.2280, vloss=0.7003, pslr=2.263e-04,wdtc=1.000e+07, norm=3.76904e+05, attn_norm=2.49056e+05
b32c512h16tfrs: nsamp=20488192, time=91.49, p0loss=2.2160, vloss=0.7076, pslr=2.263e-04,wdtc=1.000e+07, norm=3.77011e+05, attn_norm=2.49033e+05
b32c512h16tfrs: nsamp=20539392, time=91.65, p0loss=2.2220, vloss=0.7022, pslr=2.263e-04,wdtc=1.000e+07, norm=3.77066e+05, attn_norm=2.49004e+05
b32c512h16tfrs: nsamp=20590592, time=91.57, p0loss=2.2213, vloss=0.6993, pslr=2.263e-04,wdtc=1.000e+07, norm=3.77149e+05, attn_norm=2.48975e+05
b32c512h16tfrs: nsamp=20641792, time=91.53, p0loss=2.2101, vloss=0.7014, pslr=2.263e-04,wdtc=1.000e+07, norm=3.77169e+05, attn_norm=2.48950e+05
b32c512h16tfrs: nsamp=20692992, time=91.12, p0loss=2.2312, vloss=0.6974, pslr=2.263e-04,wdtc=1.000e+07, norm=3.77189e+05, attn_norm=2.48940e+05
b32c512h16tfrs: nsamp=20744192, time=91.66, p0loss=2.2170, vloss=0.7028, pslr=2.263e-04,wdtc=1.000e+07, norm=3.77249e+05, attn_norm=2.48912e+05
b32c512h16tfrs: nsamp=20795392, time=91.81, p0loss=2.2297, vloss=0.7013, pslr=2.263e-04,wdtc=1.000e+07, norm=3.77292e+05, attn_norm=2.48888e+05
b32c512h16tfrs: nsamp=20846592, time=91.48, p0loss=2.2199, vloss=0.6975, pslr=2.263e-04,wdtc=1.000e+07, norm=3.77421e+05, attn_norm=2.48866e+05
b32c512h16tfrs: nsamp=20897792, time=91.05, p0loss=2.2002, vloss=0.7088, pslr=2.263e-04,wdtc=1.000e+07, norm=3.77582e+05, attn_norm=2.48836e+05
b32c512h16tfrs: nsamp=20948992, time=91.45, p0loss=2.2217, vloss=0.7026, pslr=2.263e-04,wdtc=1.000e+07, norm=3.77705e+05, attn_norm=2.48810e+05
b32c512h16tfrs: nsamp=21000192, time=91.56, p0loss=2.2230, vloss=0.6961, pslr=2.263e-04,wdtc=1.000e+07, norm=3.77786e+05, attn_norm=2.48784e+05
b32c512h16tfrs: nsamp=21051392, time=91.42, p0loss=2.2381, vloss=0.6877, pslr=2.263e-04,wdtc=1.000e+07, norm=3.77900e+05, attn_norm=2.48762e+05
b32c512h16tfrs: nsamp=21102592, time=91.49, p0loss=2.2010, vloss=0.6979, pslr=2.263e-04,wdtc=1.000e+07, norm=3.77911e+05, attn_norm=2.48729e+05
b32c512h16tfrs: nsamp=21153792, time=91.57, p0loss=2.1973, vloss=0.6997, pslr=2.263e-04,wdtc=1.000e+07, norm=3.77959e+05, attn_norm=2.48690e+05
b32c512h16tfrs: nsamp=21204992, time=91.24, p0loss=2.2131, vloss=0.6996, pslr=2.263e-04,wdtc=1.000e+07, norm=3.78024e+05, attn_norm=2.48671e+05
b32c512h16tfrs: nsamp=21256192, time=91.13, p0loss=2.1867, vloss=0.7017, pslr=2.263e-04,wdtc=1.000e+07, norm=3.78062e+05, attn_norm=2.48646e+05
b32c512h16tfrs: nsamp=21307392, time=90.84, p0loss=2.1805, vloss=0.7022, pslr=2.263e-04,wdtc=1.000e+07, norm=3.78112e+05, attn_norm=2.48623e+05
b32c512h16tfrs: nsamp=21358592, time=91.37, p0loss=2.2080, vloss=0.6916, pslr=2.263e-04,wdtc=1.000e+07, norm=3.78157e+05, attn_norm=2.48603e+05
b32c512h16tfrs: nsamp=21409792, time=91.81, p0loss=2.1993, vloss=0.7019, pslr=2.263e-04,wdtc=1.000e+07, norm=3.78262e+05, attn_norm=2.48581e+05
b32c512h16tfrs: nsamp=21460992, time=91.66, p0loss=2.1997, vloss=0.6997, pslr=2.263e-04,wdtc=1.000e+07, norm=3.78406e+05, attn_norm=2.48540e+05
b32c512h16tfrs: nsamp=21512192, time=91.12, p0loss=2.2159, vloss=0.6992, pslr=2.263e-04,wdtc=1.000e+07, norm=3.78532e+05, attn_norm=2.48526e+05
b32c512h16tfrs: nsamp=21563392, time=91.03, p0loss=2.2113, vloss=0.6954, pslr=2.263e-04,wdtc=1.000e+07, norm=3.78627e+05, attn_norm=2.48519e+05
b32c512h16tfrs: nsamp=21614592, time=91.48, p0loss=2.1980, vloss=0.6946, pslr=2.263e-04,wdtc=1.000e+07, norm=3.78719e+05, attn_norm=2.48508e+05
b32c512h16tfrs: nsamp=21665792, time=90.64, p0loss=2.1994, vloss=0.7011, pslr=2.263e-04,wdtc=1.000e+07, norm=3.78799e+05, attn_norm=2.48488e+05
b32c512h16tfrs: nsamp=21716992, time=91.35, p0loss=2.2008, vloss=0.6986, pslr=2.263e-04,wdtc=1.000e+07, norm=3.78841e+05, attn_norm=2.48458e+05
b32c512h16tfrs: nsamp=21768192, time=91.12, p0loss=2.2064, vloss=0.6999, pslr=2.263e-04,wdtc=1.000e+07, norm=3.78846e+05, attn_norm=2.48439e+05
b32c512h16tfrs: nsamp=21819392, time=90.76, p0loss=2.1961, vloss=0.6956, pslr=2.263e-04,wdtc=1.000e+07, norm=3.78894e+05, attn_norm=2.48422e+05
b32c512h16tfrs: nsamp=21870592, time=91.07, p0loss=2.1901, vloss=0.6924, pslr=2.263e-04,wdtc=1.000e+07, norm=3.78925e+05, attn_norm=2.48394e+05
b32c512h16tfrs: nsamp=21921792, time=91.58, p0loss=2.1774, vloss=0.6964, pslr=2.263e-04,wdtc=1.000e+07, norm=3.78959e+05, attn_norm=2.48364e+05
b32c512h16tfrs: nsamp=21972992, time=91.64, p0loss=2.1972, vloss=0.6917, pslr=2.263e-04,wdtc=1.000e+07, norm=3.78938e+05, attn_norm=2.48340e+05
b32c512h16tfrs: nsamp=22024192, time=91.59, p0loss=2.1769, vloss=0.6873, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79014e+05, attn_norm=2.48316e+05
b32c512h16tfrs: nsamp=22075392, time=90.99, p0loss=2.1700, vloss=0.6952, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79038e+05, attn_norm=2.48295e+05
b32c512h16tfrs: nsamp=22126592, time=91.71, p0loss=2.1588, vloss=0.6943, pslr=2.263e-04,wdtc=1.000e+07, norm=3.78981e+05, attn_norm=2.48279e+05
b32c512h16tfrs: nsamp=22177792, time=92.21, p0loss=2.1687, vloss=0.6925, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79006e+05, attn_norm=2.48243e+05
b32c512h16tfrs: nsamp=22228992, time=92.05, p0loss=2.1739, vloss=0.6916, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79096e+05, attn_norm=2.48220e+05
b32c512h16tfrs: nsamp=22280192, time=91.68, p0loss=2.1791, vloss=0.6946, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79152e+05, attn_norm=2.48185e+05
b32c512h16tfrs: nsamp=22331392, time=92.14, p0loss=2.1711, vloss=0.6952, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79241e+05, attn_norm=2.48157e+05
b32c512h16tfrs: nsamp=22382592, time=92.08, p0loss=2.1544, vloss=0.6864, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79275e+05, attn_norm=2.48142e+05
b32c512h16tfrs: nsamp=22433792, time=91.37, p0loss=2.1684, vloss=0.6828, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79239e+05, attn_norm=2.48109e+05
b32c512h16tfrs: nsamp=22484992, time=91.95, p0loss=2.1817, vloss=0.6920, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79263e+05, attn_norm=2.48076e+05
b32c512h16tfrs: nsamp=22536192, time=91.85, p0loss=2.1761, vloss=0.6991, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79269e+05, attn_norm=2.48053e+05
b32c512h16tfrs: nsamp=22587392, time=91.45, p0loss=2.1680, vloss=0.6938, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79331e+05, attn_norm=2.48038e+05
b32c512h16tfrs: nsamp=22638592, time=91.32, p0loss=2.1555, vloss=0.6896, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79392e+05, attn_norm=2.48006e+05
b32c512h16tfrs: nsamp=22689792, time=91.53, p0loss=2.1658, vloss=0.6899, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79457e+05, attn_norm=2.47981e+05
b32c512h16tfrs: nsamp=22740992, time=91.51, p0loss=2.1750, vloss=0.6918, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79453e+05, attn_norm=2.47953e+05
b32c512h16tfrs: nsamp=22792192, time=91.26, p0loss=2.1602, vloss=0.6907, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79426e+05, attn_norm=2.47920e+05
b32c512h16tfrs: nsamp=22843392, time=91.38, p0loss=2.1795, vloss=0.6890, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79321e+05, attn_norm=2.47895e+05
b32c512h16tfrs: nsamp=22894592, time=90.89, p0loss=2.1756, vloss=0.6953, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79277e+05, attn_norm=2.47878e+05
b32c512h16tfrs: nsamp=22945792, time=90.70, p0loss=2.1590, vloss=0.6963, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79190e+05, attn_norm=2.47859e+05
b32c512h16tfrs: nsamp=22996992, time=90.69, p0loss=2.1646, vloss=0.6951, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79125e+05, attn_norm=2.47819e+05
b32c512h16tfrs: nsamp=23048192, time=90.62, p0loss=2.1742, vloss=0.6908, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79077e+05, attn_norm=2.47792e+05
b32c512h16tfrs: nsamp=23099392, time=90.69, p0loss=2.1826, vloss=0.6878, pslr=2.263e-04,wdtc=1.000e+07, norm=3.78981e+05, attn_norm=2.47791e+05
b32c512h16tfrs: nsamp=23150592, time=90.67, p0loss=2.1578, vloss=0.6922, pslr=2.263e-04,wdtc=1.000e+07, norm=3.78922e+05, attn_norm=2.47767e+05
b32c512h16tfrs: nsamp=23201792, time=90.57, p0loss=2.1710, vloss=0.6962, pslr=2.263e-04,wdtc=1.000e+07, norm=3.78912e+05, attn_norm=2.47757e+05
b32c512h16tfrs: nsamp=23252992, time=90.73, p0loss=2.1506, vloss=0.6951, pslr=2.263e-04,wdtc=1.000e+07, norm=3.78838e+05, attn_norm=2.47739e+05
b32c512h16tfrs: nsamp=23304192, time=90.70, p0loss=2.1588, vloss=0.6938, pslr=2.263e-04,wdtc=1.000e+07, norm=3.78879e+05, attn_norm=2.47715e+05
b32c512h16tfrs: nsamp=23355392, time=90.72, p0loss=2.1733, vloss=0.6964, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79011e+05, attn_norm=2.47684e+05
b32c512h16tfrs: nsamp=23406592, time=90.58, p0loss=2.1840, vloss=0.6920, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79047e+05, attn_norm=2.47668e+05
b32c512h16tfrs: nsamp=23457792, time=91.24, p0loss=2.1821, vloss=0.6900, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79147e+05, attn_norm=2.47658e+05
b32c512h16tfrs: nsamp=23508992, time=91.01, p0loss=2.1580, vloss=0.6892, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79202e+05, attn_norm=2.47645e+05
b32c512h16tfrs: nsamp=23560192, time=90.72, p0loss=2.1534, vloss=0.6998, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79183e+05, attn_norm=2.47630e+05
b32c512h16tfrs: nsamp=23611392, time=90.60, p0loss=2.1511, vloss=0.6947, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79221e+05, attn_norm=2.47608e+05
b32c512h16tfrs: nsamp=23662592, time=90.70, p0loss=2.1455, vloss=0.6904, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79246e+05, attn_norm=2.47586e+05
b32c512h16tfrs: nsamp=23713792, time=90.93, p0loss=2.1598, vloss=0.6879, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79262e+05, attn_norm=2.47569e+05
b32c512h16tfrs: nsamp=23764992, time=90.68, p0loss=2.1459, vloss=0.6862, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79368e+05, attn_norm=2.47558e+05
b32c512h16tfrs: nsamp=23816192, time=90.70, p0loss=2.1535, vloss=0.6863, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79415e+05, attn_norm=2.47533e+05
b32c512h16tfrs: nsamp=23867392, time=90.70, p0loss=2.1366, vloss=0.6955, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79463e+05, attn_norm=2.47508e+05
b32c512h16tfrs: nsamp=23918592, time=91.05, p0loss=2.1613, vloss=0.6872, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79487e+05, attn_norm=2.47486e+05
b32c512h16tfrs: nsamp=23969792, time=90.84, p0loss=2.1686, vloss=0.6889, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79550e+05, attn_norm=2.47459e+05
b32c512h16tfrs: nsamp=24020992, time=90.70, p0loss=2.1697, vloss=0.6877, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79598e+05, attn_norm=2.47416e+05
b32c512h16tfrs: nsamp=24072192, time=91.00, p0loss=2.1359, vloss=0.6890, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79608e+05, attn_norm=2.47391e+05
b32c512h16tfrs: nsamp=24123392, time=91.51, p0loss=2.1501, vloss=0.6886, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79609e+05, attn_norm=2.47359e+05
b32c512h16tfrs: nsamp=24174592, time=91.39, p0loss=2.1582, vloss=0.6829, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79645e+05, attn_norm=2.47336e+05
b32c512h16tfrs: nsamp=24225792, time=91.51, p0loss=2.1571, vloss=0.6932, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79707e+05, attn_norm=2.47305e+05
b32c512h16tfrs: nsamp=24276992, time=91.12, p0loss=2.1565, vloss=0.6919, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79811e+05, attn_norm=2.47286e+05
b32c512h16tfrs: nsamp=24328192, time=90.74, p0loss=2.1542, vloss=0.6908, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79975e+05, attn_norm=2.47270e+05
b32c512h16tfrs: nsamp=24379392, time=90.54, p0loss=2.1589, vloss=0.6876, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80016e+05, attn_norm=2.47246e+05
b32c512h16tfrs: nsamp=24430592, time=91.26, p0loss=2.1500, vloss=0.6873, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80051e+05, attn_norm=2.47239e+05
b32c512h16tfrs: nsamp=24481792, time=90.68, p0loss=2.1545, vloss=0.6893, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80082e+05, attn_norm=2.47222e+05
b32c512h16tfrs: nsamp=24532992, time=90.79, p0loss=2.1425, vloss=0.6831, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80065e+05, attn_norm=2.47208e+05
b32c512h16tfrs: nsamp=24584192, time=91.27, p0loss=2.1492, vloss=0.6897, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80136e+05, attn_norm=2.47176e+05
b32c512h16tfrs: nsamp=24635392, time=90.83, p0loss=2.1523, vloss=0.6867, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80209e+05, attn_norm=2.47152e+05
b32c512h16tfrs: nsamp=24686592, time=90.75, p0loss=2.1517, vloss=0.6896, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80278e+05, attn_norm=2.47124e+05
b32c512h16tfrs: nsamp=24737792, time=91.29, p0loss=2.1614, vloss=0.6926, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80298e+05, attn_norm=2.47115e+05
b32c512h16tfrs: nsamp=24788992, time=91.17, p0loss=2.1569, vloss=0.6922, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80316e+05, attn_norm=2.47099e+05
Finished training subepoch!
Saving checkpoint: /root/katago_project/data/train/b32c512h16tfrs/checkpoint.ckpt
Beginning validation after epoch!
p0loss = 2.182116, p1loss = 0.375272, p0softloss = 5.213546, p1softloss = 0.699407, p0lopt = 0.000000, p0loptw = 0.000000, p0sopt = 0.066437, p0soptw = 0.037485, vloss = 0.946695, tdvloss1 = 0.490402, tdvloss2 = 0.434805, tdvloss3 = 0.391525, tdsloss = 0.023363, oloss = 0.471655, sloss = 0.380984, fploss = 0.177095, skloss = 0.089773, smloss = 0.055733, sbcdfloss = 0.088738, sbpdfloss = 0.058499, sdregloss = 0.000092, leadloss = 0.037717, vtimeloss = 0.060653, evstloss = 0.195252, esstloss = 0.031959, loss = 52.560627, pacc1 = 0.417033, vsquare = 0.438018, wsum = 160227.439941, nsamp = 160256.000000, ptentr = 0.924267, ptsoftentr = 4.970883, sekiweightscale = 6.998752, norm_normal_batch = 380318.218750, norm_normal_gamma_batch = 0.000000, norm_normal_attn_batch = 247085.406250, norm_output_batch = 589.951660, norm_noreg_batch = 16242.935547, norm_output_noreg_batch = 0.575687, nsamp_train = 24803328.000000, wsum_train = 24799429.514160
Validation took 207.02861108304933 seconds
Validating swa_scale=32.0
p0loss = 3.828700, p1loss = 0.546428, p0softloss = 5.468020, p1softloss = 0.725838, p0lopt = 0.000000, p0loptw = 0.000000, p0sopt = 1.670777, p0soptw = 0.432044, vloss = 0.949784, tdvloss1 = 0.451989, tdvloss2 = 0.447146, tdvloss3 = 0.450449, tdsloss = 0.036325, oloss = 0.558551, sloss = 0.466453, fploss = 0.462130, skloss = 0.143967, smloss = 0.068254, sbcdfloss = 0.113306, sbpdfloss = 0.065682, sdregloss = 0.041332, leadloss = 0.059380, vtimeloss = 0.063818, evstloss = 0.243531, esstloss = 0.069420, loss = 57.530679, pacc1 = 0.302666, vsquare = 0.227307, wsum = 160227.439941, nsamp = 160256.000000, ptentr = 0.924267, ptsoftentr = 4.970883, sekiweightscale = 6.998752, norm_normal_batch = 83207.070312, norm_normal_gamma_batch = 0.000000, norm_normal_attn_batch = 70438.117188, norm_output_batch = 579.992004, norm_noreg_batch = 16301.172852, norm_output_noreg_batch = 0.466393, nsamp_train = 24803328.000000, wsum_train = 24799429.514160
Validation swa took 203.3367468980141 seconds
Export cycle counter = 2
Skipping export model this time
GC collect
=========================================================================
BEGINNING NEXT EPOCH 1
=========================================================================
Current time: 2026-02-28 20:19:08.251157
Global step: 24803328 samples
Currently up to data row 95431473
Training dir: /root/katago_project/data/train/b32c512h16tfrs
Export dir: /root/katago_project/data/torchmodels_toexport_extra
Current grad scale: 64.0
Beginning training subepoch!
Currently up to data row 95431473
b32c512h16tfrs: nsamp=24854528, time=103.54, p0loss=2.1572, vloss=0.6901, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80359e+05, attn_norm=2.47077e+05
b32c512h16tfrs: nsamp=24905728, time=91.53, p0loss=2.1624, vloss=0.6828, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80377e+05, attn_norm=2.47063e+05
b32c512h16tfrs: nsamp=24956928, time=91.52, p0loss=2.1479, vloss=0.6902, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80352e+05, attn_norm=2.47034e+05
b32c512h16tfrs: nsamp=25008128, time=91.40, p0loss=2.1388, vloss=0.6789, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80335e+05, attn_norm=2.47017e+05
b32c512h16tfrs: nsamp=25059328, time=90.72, p0loss=2.1521, vloss=0.6773, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80297e+05, attn_norm=2.47004e+05
b32c512h16tfrs: nsamp=25110528, time=90.75, p0loss=2.1426, vloss=0.6863, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80269e+05, attn_norm=2.46993e+05
b32c512h16tfrs: nsamp=25161728, time=91.26, p0loss=2.1520, vloss=0.6828, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80292e+05, attn_norm=2.46964e+05
b32c512h16tfrs: nsamp=25212928, time=90.72, p0loss=2.1370, vloss=0.6818, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80345e+05, attn_norm=2.46944e+05
b32c512h16tfrs: nsamp=25264128, time=91.30, p0loss=2.1395, vloss=0.6822, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80305e+05, attn_norm=2.46935e+05
b32c512h16tfrs: nsamp=25315328, time=91.01, p0loss=2.1534, vloss=0.6879, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80241e+05, attn_norm=2.46908e+05
b32c512h16tfrs: nsamp=25366528, time=90.63, p0loss=2.1518, vloss=0.6878, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80229e+05, attn_norm=2.46868e+05
b32c512h16tfrs: nsamp=25417728, time=90.51, p0loss=2.1593, vloss=0.6819, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80257e+05, attn_norm=2.46839e+05
b32c512h16tfrs: nsamp=25468928, time=91.30, p0loss=2.1408, vloss=0.6822, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80348e+05, attn_norm=2.46819e+05
b32c512h16tfrs: nsamp=25520128, time=91.19, p0loss=2.1342, vloss=0.6868, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80353e+05, attn_norm=2.46804e+05
b32c512h16tfrs: nsamp=25571328, time=90.71, p0loss=2.1511, vloss=0.6868, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80315e+05, attn_norm=2.46789e+05
b32c512h16tfrs: nsamp=25622528, time=91.52, p0loss=2.1602, vloss=0.6871, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80240e+05, attn_norm=2.46792e+05
b32c512h16tfrs: nsamp=25673728, time=90.97, p0loss=2.1557, vloss=0.6800, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80233e+05, attn_norm=2.46782e+05
b32c512h16tfrs: nsamp=25724928, time=91.38, p0loss=2.1385, vloss=0.6876, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80293e+05, attn_norm=2.46758e+05
b32c512h16tfrs: nsamp=25776128, time=91.58, p0loss=2.1528, vloss=0.6880, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80319e+05, attn_norm=2.46724e+05
b32c512h16tfrs: nsamp=25827328, time=90.84, p0loss=2.1408, vloss=0.6919, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80330e+05, attn_norm=2.46694e+05
b32c512h16tfrs: nsamp=25878528, time=91.81, p0loss=2.1398, vloss=0.6860, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80304e+05, attn_norm=2.46695e+05
b32c512h16tfrs: nsamp=25929728, time=91.61, p0loss=2.1447, vloss=0.6864, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80287e+05, attn_norm=2.46690e+05
b32c512h16tfrs: nsamp=25980928, time=91.48, p0loss=2.1387, vloss=0.6903, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80351e+05, attn_norm=2.46675e+05
b32c512h16tfrs: nsamp=26032128, time=91.50, p0loss=2.1338, vloss=0.6891, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80435e+05, attn_norm=2.46665e+05
b32c512h16tfrs: nsamp=26083328, time=91.61, p0loss=2.1347, vloss=0.6855, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80464e+05, attn_norm=2.46650e+05
b32c512h16tfrs: nsamp=26134528, time=91.44, p0loss=2.1385, vloss=0.6891, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80526e+05, attn_norm=2.46632e+05
b32c512h16tfrs: nsamp=26185728, time=90.54, p0loss=2.1483, vloss=0.6839, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80580e+05, attn_norm=2.46621e+05
b32c512h16tfrs: nsamp=26236928, time=90.70, p0loss=2.1432, vloss=0.6787, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80619e+05, attn_norm=2.46617e+05
b32c512h16tfrs: nsamp=26288128, time=90.70, p0loss=2.1378, vloss=0.6817, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80564e+05, attn_norm=2.46611e+05
b32c512h16tfrs: nsamp=26339328, time=90.75, p0loss=2.1341, vloss=0.6862, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80543e+05, attn_norm=2.46594e+05
b32c512h16tfrs: nsamp=26390528, time=91.58, p0loss=2.1184, vloss=0.6843, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80565e+05, attn_norm=2.46576e+05
b32c512h16tfrs: nsamp=26441728, time=91.33, p0loss=2.1302, vloss=0.6814, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80599e+05, attn_norm=2.46555e+05
b32c512h16tfrs: nsamp=26492928, time=91.39, p0loss=2.1471, vloss=0.6823, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80563e+05, attn_norm=2.46572e+05
b32c512h16tfrs: nsamp=26544128, time=91.47, p0loss=2.1574, vloss=0.6829, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80449e+05, attn_norm=2.46574e+05
b32c512h16tfrs: nsamp=26595328, time=90.63, p0loss=2.1335, vloss=0.6808, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80406e+05, attn_norm=2.46550e+05
b32c512h16tfrs: nsamp=26646528, time=90.75, p0loss=2.1169, vloss=0.6787, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80459e+05, attn_norm=2.46537e+05
b32c512h16tfrs: nsamp=26697728, time=91.00, p0loss=2.1330, vloss=0.6785, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80467e+05, attn_norm=2.46516e+05
b32c512h16tfrs: nsamp=26748928, time=91.69, p0loss=2.1235, vloss=0.6808, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80540e+05, attn_norm=2.46500e+05
b32c512h16tfrs: nsamp=26800128, time=91.45, p0loss=2.1284, vloss=0.6820, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80568e+05, attn_norm=2.46490e+05
b32c512h16tfrs: nsamp=26851328, time=91.30, p0loss=2.1174, vloss=0.6713, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80510e+05, attn_norm=2.46480e+05
b32c512h16tfrs: nsamp=26902528, time=91.34, p0loss=2.1244, vloss=0.6848, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80473e+05, attn_norm=2.46483e+05
b32c512h16tfrs: nsamp=26953728, time=91.26, p0loss=2.1244, vloss=0.6770, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80429e+05, attn_norm=2.46472e+05
b32c512h16tfrs: nsamp=27004928, time=91.51, p0loss=2.1078, vloss=0.6876, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80334e+05, attn_norm=2.46462e+05
b32c512h16tfrs: nsamp=27056128, time=90.99, p0loss=2.1233, vloss=0.6878, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80352e+05, attn_norm=2.46440e+05
b32c512h16tfrs: nsamp=27107328, time=91.49, p0loss=2.1271, vloss=0.6786, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80386e+05, attn_norm=2.46424e+05
b32c512h16tfrs: nsamp=27158528, time=91.40, p0loss=2.1245, vloss=0.6811, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80327e+05, attn_norm=2.46419e+05
b32c512h16tfrs: nsamp=27209728, time=90.75, p0loss=2.1325, vloss=0.6892, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80304e+05, attn_norm=2.46409e+05
b32c512h16tfrs: nsamp=27260928, time=91.03, p0loss=2.1149, vloss=0.6827, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80276e+05, attn_norm=2.46399e+05
b32c512h16tfrs: nsamp=27312128, time=91.35, p0loss=2.1215, vloss=0.6811, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80320e+05, attn_norm=2.46380e+05
b32c512h16tfrs: nsamp=27363328, time=91.26, p0loss=2.1275, vloss=0.6811, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80355e+05, attn_norm=2.46372e+05
b32c512h16tfrs: nsamp=27414528, time=91.40, p0loss=2.1261, vloss=0.6809, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80359e+05, attn_norm=2.46364e+05
b32c512h16tfrs: nsamp=27465728, time=91.35, p0loss=2.1289, vloss=0.6840, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80416e+05, attn_norm=2.46351e+05
b32c512h16tfrs: nsamp=27516928, time=91.42, p0loss=2.1267, vloss=0.6839, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80330e+05, attn_norm=2.46334e+05
b32c512h16tfrs: nsamp=27568128, time=91.30, p0loss=2.1151, vloss=0.6821, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80227e+05, attn_norm=2.46317e+05
b32c512h16tfrs: nsamp=27619328, time=91.38, p0loss=2.1115, vloss=0.6785, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80250e+05, attn_norm=2.46299e+05
b32c512h16tfrs: nsamp=27670528, time=91.09, p0loss=2.1070, vloss=0.6844, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80206e+05, attn_norm=2.46299e+05
b32c512h16tfrs: nsamp=27721728, time=91.34, p0loss=2.1057, vloss=0.6853, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80193e+05, attn_norm=2.46298e+05
b32c512h16tfrs: nsamp=27772928, time=91.29, p0loss=2.1209, vloss=0.6830, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80194e+05, attn_norm=2.46283e+05
b32c512h16tfrs: nsamp=27824128, time=90.93, p0loss=2.1169, vloss=0.6910, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80271e+05, attn_norm=2.46269e+05
b32c512h16tfrs: nsamp=27875328, time=90.67, p0loss=2.1147, vloss=0.6844, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80270e+05, attn_norm=2.46258e+05
b32c512h16tfrs: nsamp=27926528, time=90.57, p0loss=2.1289, vloss=0.6821, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80289e+05, attn_norm=2.46256e+05
b32c512h16tfrs: nsamp=27977728, time=91.32, p0loss=2.1171, vloss=0.6864, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80319e+05, attn_norm=2.46251e+05
b32c512h16tfrs: nsamp=28028928, time=90.78, p0loss=2.1254, vloss=0.6888, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80276e+05, attn_norm=2.46249e+05
b32c512h16tfrs: nsamp=28080128, time=91.03, p0loss=2.1181, vloss=0.6822, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80211e+05, attn_norm=2.46239e+05
b32c512h16tfrs: nsamp=28131328, time=91.42, p0loss=2.1167, vloss=0.6919, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80216e+05, attn_norm=2.46240e+05
b32c512h16tfrs: nsamp=28182528, time=91.59, p0loss=2.1293, vloss=0.6815, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80294e+05, attn_norm=2.46230e+05
b32c512h16tfrs: nsamp=28233728, time=91.16, p0loss=2.1381, vloss=0.6811, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80219e+05, attn_norm=2.46223e+05
b32c512h16tfrs: nsamp=28284928, time=90.68, p0loss=2.1253, vloss=0.6865, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80129e+05, attn_norm=2.46221e+05
b32c512h16tfrs: nsamp=28336128, time=90.66, p0loss=2.1118, vloss=0.6759, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80141e+05, attn_norm=2.46201e+05
b32c512h16tfrs: nsamp=28387328, time=91.14, p0loss=2.1211, vloss=0.6780, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80197e+05, attn_norm=2.46195e+05
b32c512h16tfrs: nsamp=28438528, time=91.63, p0loss=2.0988, vloss=0.6809, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80169e+05, attn_norm=2.46177e+05
b32c512h16tfrs: nsamp=28489728, time=91.39, p0loss=2.1257, vloss=0.6788, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80050e+05, attn_norm=2.46163e+05
b32c512h16tfrs: nsamp=28540928, time=90.81, p0loss=2.1187, vloss=0.6844, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79970e+05, attn_norm=2.46188e+05
b32c512h16tfrs: nsamp=28592128, time=91.30, p0loss=2.1195, vloss=0.6794, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79898e+05, attn_norm=2.46186e+05
b32c512h16tfrs: nsamp=28643328, time=91.56, p0loss=2.1247, vloss=0.6799, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79900e+05, attn_norm=2.46165e+05
b32c512h16tfrs: nsamp=28694528, time=91.30, p0loss=2.1206, vloss=0.6799, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79905e+05, attn_norm=2.46160e+05
b32c512h16tfrs: nsamp=28745728, time=90.62, p0loss=2.1114, vloss=0.6762, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79918e+05, attn_norm=2.46164e+05
b32c512h16tfrs: nsamp=28796928, time=91.15, p0loss=2.1187, vloss=0.6774, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79991e+05, attn_norm=2.46160e+05
b32c512h16tfrs: nsamp=28848128, time=90.99, p0loss=2.1290, vloss=0.6831, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80005e+05, attn_norm=2.46160e+05
b32c512h16tfrs: nsamp=28899328, time=90.58, p0loss=2.1358, vloss=0.6813, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79981e+05, attn_norm=2.46141e+05
b32c512h16tfrs: nsamp=28950528, time=90.69, p0loss=2.1128, vloss=0.6754, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79982e+05, attn_norm=2.46118e+05
b32c512h16tfrs: nsamp=29001728, time=91.37, p0loss=2.1107, vloss=0.6757, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79923e+05, attn_norm=2.46110e+05
b32c512h16tfrs: nsamp=29052928, time=91.26, p0loss=2.1249, vloss=0.6729, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79940e+05, attn_norm=2.46102e+05
b32c512h16tfrs: nsamp=29104128, time=91.13, p0loss=2.1253, vloss=0.6807, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79939e+05, attn_norm=2.46100e+05
b32c512h16tfrs: nsamp=29155328, time=91.35, p0loss=2.1242, vloss=0.6822, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79985e+05, attn_norm=2.46077e+05
b32c512h16tfrs: nsamp=29206528, time=91.11, p0loss=2.1223, vloss=0.6935, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79964e+05, attn_norm=2.46063e+05
b32c512h16tfrs: nsamp=29257728, time=91.37, p0loss=2.1232, vloss=0.6789, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79878e+05, attn_norm=2.46049e+05
b32c512h16tfrs: nsamp=29308928, time=91.59, p0loss=2.1107, vloss=0.6835, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79782e+05, attn_norm=2.46053e+05
b32c512h16tfrs: nsamp=29360128, time=91.46, p0loss=2.1124, vloss=0.6800, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79758e+05, attn_norm=2.46046e+05
b32c512h16tfrs: nsamp=29411328, time=91.03, p0loss=2.1196, vloss=0.6818, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79824e+05, attn_norm=2.46037e+05
b32c512h16tfrs: nsamp=29462528, time=90.76, p0loss=2.1054, vloss=0.6850, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79793e+05, attn_norm=2.46032e+05
b32c512h16tfrs: nsamp=29513728, time=91.33, p0loss=2.1137, vloss=0.6890, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79774e+05, attn_norm=2.46027e+05
b32c512h16tfrs: nsamp=29564928, time=91.54, p0loss=2.1263, vloss=0.6833, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79819e+05, attn_norm=2.46032e+05
b32c512h16tfrs: nsamp=29616128, time=91.41, p0loss=2.1216, vloss=0.6801, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79793e+05, attn_norm=2.46029e+05
b32c512h16tfrs: nsamp=29667328, time=91.47, p0loss=2.1120, vloss=0.6790, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79690e+05, attn_norm=2.46022e+05
b32c512h16tfrs: nsamp=29718528, time=91.25, p0loss=2.0782, vloss=0.6752, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79620e+05, attn_norm=2.46011e+05
b32c512h16tfrs: nsamp=29769728, time=90.83, p0loss=2.1128, vloss=0.6780, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79640e+05, attn_norm=2.45997e+05
Finished training subepoch!
Saving checkpoint: /root/katago_project/data/train/b32c512h16tfrs/checkpoint.ckpt
Beginning validation after epoch!
p0loss = 2.121933, p1loss = 0.366947, p0softloss = 5.195734, p1softloss = 0.697887, p0lopt = 0.000000, p0loptw = 0.000000, p0sopt = 0.119656, p0soptw = 0.067069, vloss = 0.723038, tdvloss1 = 0.277112, tdvloss2 = 0.223036, tdvloss3 = 0.196283, tdsloss = 0.013658, oloss = 0.470467, sloss = 0.379919, fploss = 0.175661, skloss = 0.090431, smloss = 0.044504, sbcdfloss = 0.075086, sbpdfloss = 0.054513, sdregloss = 0.000077, leadloss = 0.016716, vtimeloss = 0.053494, evstloss = 0.078997, esstloss = 0.022524, loss = 51.651011, pacc1 = 0.425722, vsquare = 0.285062, wsum = 160227.439941, nsamp = 160256.000000, ptentr = 0.924267, ptsoftentr = 4.970883, sekiweightscale = 6.998752, norm_normal_batch = 379633.093750, norm_normal_gamma_batch = 0.000000, norm_normal_attn_batch = 245995.312500, norm_output_batch = 595.274902, norm_noreg_batch = 16236.431641, norm_output_noreg_batch = 0.586044, nsamp_train = 29798400.000000, wsum_train = 29793666.312988
Validation took 208.05123970890418 seconds
Validating swa_scale=32.0
p0loss = 3.247708, p1loss = 0.487305, p0softloss = 5.387498, p1softloss = 0.718207, p0lopt = 0.000000, p0loptw = 0.000000, p0sopt = 1.281004, p0soptw = 0.395525, vloss = 0.889597, tdvloss1 = 0.438214, tdvloss2 = 0.399107, tdvloss3 = 0.394830, tdsloss = 0.033175, oloss = 0.511317, sloss = 0.423332, fploss = 0.265316, skloss = 0.136675, smloss = 0.062011, sbcdfloss = 0.103853, sbpdfloss = 0.063130, sdregloss = 0.018203, leadloss = 0.049001, vtimeloss = 0.061496, evstloss = 0.200699, esstloss = 0.061746, loss = 55.660852, pacc1 = 0.330122, vsquare = 0.221911, wsum = 160227.439941, nsamp = 160256.000000, ptentr = 0.924267, ptsoftentr = 4.970883, sekiweightscale = 6.998752, norm_normal_batch = 112510.929688, norm_normal_gamma_batch = 0.000000, norm_normal_attn_batch = 92332.867188, norm_output_batch = 579.017334, norm_noreg_batch = 16284.375977, norm_output_noreg_batch = 0.493160, nsamp_train = 29798400.000000, wsum_train = 29793666.312988
Validation swa took 203.56892519490793 seconds
Export cycle counter = 1
Skipping export model this time
GC collect
=========================================================================
BEGINNING NEXT EPOCH 2
=========================================================================
Current time: 2026-02-28 22:54:29.579045
Global step: 29798400 samples
Currently up to data row 95431473
Training dir: /root/katago_project/data/train/b32c512h16tfrs
Export dir: /root/katago_project/data/torchmodels_toexport_extra
Current grad scale: 32.0
Beginning training subepoch!
Currently up to data row 95431473
b32c512h16tfrs: nsamp=29849600, time=104.08, p0loss=2.1293, vloss=0.6804, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79626e+05, attn_norm=2.45990e+05
b32c512h16tfrs: nsamp=29900800, time=91.04, p0loss=2.1038, vloss=0.6801, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79623e+05, attn_norm=2.45982e+05
b32c512h16tfrs: nsamp=29952000, time=91.57, p0loss=2.1091, vloss=0.6742, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79638e+05, attn_norm=2.45969e+05
b32c512h16tfrs: nsamp=30003200, time=91.50, p0loss=2.1195, vloss=0.6712, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79620e+05, attn_norm=2.45973e+05
b32c512h16tfrs: nsamp=30054400, time=91.54, p0loss=2.0988, vloss=0.6733, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79604e+05, attn_norm=2.45970e+05
b32c512h16tfrs: nsamp=30105600, time=91.56, p0loss=2.1185, vloss=0.6729, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79523e+05, attn_norm=2.45979e+05
b32c512h16tfrs: nsamp=30156800, time=91.48, p0loss=2.1190, vloss=0.6771, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79479e+05, attn_norm=2.45979e+05
b32c512h16tfrs: nsamp=30208000, time=90.87, p0loss=2.1252, vloss=0.6739, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79479e+05, attn_norm=2.45959e+05
b32c512h16tfrs: nsamp=30259200, time=91.50, p0loss=2.1135, vloss=0.6796, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79489e+05, attn_norm=2.45965e+05
b32c512h16tfrs: nsamp=30310400, time=91.11, p0loss=2.1115, vloss=0.6806, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79541e+05, attn_norm=2.45973e+05
b32c512h16tfrs: nsamp=30361600, time=91.47, p0loss=2.1041, vloss=0.6739, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79509e+05, attn_norm=2.45980e+05
b32c512h16tfrs: nsamp=30412800, time=91.40, p0loss=2.0982, vloss=0.6802, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79503e+05, attn_norm=2.45978e+05
b32c512h16tfrs: nsamp=30464000, time=90.90, p0loss=2.1025, vloss=0.6779, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79488e+05, attn_norm=2.45954e+05
b32c512h16tfrs: nsamp=30515200, time=90.94, p0loss=2.0930, vloss=0.6787, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79360e+05, attn_norm=2.45938e+05
b32c512h16tfrs: nsamp=30566400, time=91.37, p0loss=2.0969, vloss=0.6790, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79262e+05, attn_norm=2.45931e+05
b32c512h16tfrs: nsamp=30617600, time=91.22, p0loss=2.0837, vloss=0.6856, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79241e+05, attn_norm=2.45930e+05
b32c512h16tfrs: nsamp=30668800, time=91.25, p0loss=2.1026, vloss=0.6792, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79243e+05, attn_norm=2.45944e+05
b32c512h16tfrs: nsamp=30720000, time=91.43, p0loss=2.1038, vloss=0.6765, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79162e+05, attn_norm=2.45936e+05
b32c512h16tfrs: nsamp=30771200, time=90.95, p0loss=2.1184, vloss=0.6775, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79130e+05, attn_norm=2.45939e+05
b32c512h16tfrs: nsamp=30822400, time=90.67, p0loss=2.0998, vloss=0.6756, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79129e+05, attn_norm=2.45941e+05
b32c512h16tfrs: nsamp=30873600, time=90.65, p0loss=2.1152, vloss=0.6729, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79094e+05, attn_norm=2.45928e+05
b32c512h16tfrs: nsamp=30924800, time=91.39, p0loss=2.1143, vloss=0.6816, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79097e+05, attn_norm=2.45909e+05
b32c512h16tfrs: nsamp=30976000, time=90.88, p0loss=2.1143, vloss=0.6798, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79132e+05, attn_norm=2.45901e+05
b32c512h16tfrs: nsamp=31027200, time=90.83, p0loss=2.0966, vloss=0.6885, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79128e+05, attn_norm=2.45887e+05
b32c512h16tfrs: nsamp=31078400, time=90.94, p0loss=2.1211, vloss=0.6800, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79114e+05, attn_norm=2.45890e+05
b32c512h16tfrs: nsamp=31129600, time=90.65, p0loss=2.0957, vloss=0.6791, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79184e+05, attn_norm=2.45884e+05
b32c512h16tfrs: nsamp=31180800, time=90.52, p0loss=2.0983, vloss=0.6759, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79227e+05, attn_norm=2.45875e+05
b32c512h16tfrs: nsamp=31232000, time=90.73, p0loss=2.1190, vloss=0.6745, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79229e+05, attn_norm=2.45869e+05
b32c512h16tfrs: nsamp=31283200, time=90.66, p0loss=2.1153, vloss=0.6743, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79228e+05, attn_norm=2.45855e+05
b32c512h16tfrs: nsamp=31334400, time=90.68, p0loss=2.1069, vloss=0.6686, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79305e+05, attn_norm=2.45862e+05
b32c512h16tfrs: nsamp=31385600, time=90.52, p0loss=2.1051, vloss=0.6774, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79346e+05, attn_norm=2.45842e+05
b32c512h16tfrs: nsamp=31436800, time=90.90, p0loss=2.0990, vloss=0.6802, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79384e+05, attn_norm=2.45819e+05
b32c512h16tfrs: nsamp=31488000, time=91.50, p0loss=2.1304, vloss=0.6721, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79403e+05, attn_norm=2.45808e+05
b32c512h16tfrs: nsamp=31539200, time=91.43, p0loss=2.1090, vloss=0.6725, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79422e+05, attn_norm=2.45787e+05
b32c512h16tfrs: nsamp=31590400, time=90.50, p0loss=2.1084, vloss=0.6714, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79517e+05, attn_norm=2.45772e+05
b32c512h16tfrs: nsamp=31641600, time=90.63, p0loss=2.1007, vloss=0.6766, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79601e+05, attn_norm=2.45777e+05
b32c512h16tfrs: nsamp=31692800, time=90.69, p0loss=2.1130, vloss=0.6732, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79627e+05, attn_norm=2.45772e+05
b32c512h16tfrs: nsamp=31744000, time=90.53, p0loss=2.1157, vloss=0.6814, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79597e+05, attn_norm=2.45765e+05
b32c512h16tfrs: nsamp=31795200, time=91.17, p0loss=2.0942, vloss=0.6743, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79600e+05, attn_norm=2.45752e+05
b32c512h16tfrs: nsamp=31846400, time=91.51, p0loss=2.0947, vloss=0.6748, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79622e+05, attn_norm=2.45730e+05
b32c512h16tfrs: nsamp=31897600, time=91.57, p0loss=2.0911, vloss=0.6696, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79534e+05, attn_norm=2.45735e+05
b32c512h16tfrs: nsamp=31948800, time=91.42, p0loss=2.1080, vloss=0.6809, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79496e+05, attn_norm=2.45744e+05
b32c512h16tfrs: nsamp=32000000, time=91.04, p0loss=2.1057, vloss=0.6747, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79463e+05, attn_norm=2.45738e+05
b32c512h16tfrs: nsamp=32051200, time=91.21, p0loss=2.0925, vloss=0.6730, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79495e+05, attn_norm=2.45738e+05
b32c512h16tfrs: nsamp=32102400, time=91.40, p0loss=2.1075, vloss=0.6737, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79433e+05, attn_norm=2.45745e+05
b32c512h16tfrs: nsamp=32153600, time=91.15, p0loss=2.1151, vloss=0.6765, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79393e+05, attn_norm=2.45747e+05
b32c512h16tfrs: nsamp=32204800, time=90.85, p0loss=2.1161, vloss=0.6733, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79413e+05, attn_norm=2.45762e+05
b32c512h16tfrs: nsamp=32256000, time=91.24, p0loss=2.1151, vloss=0.6733, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79459e+05, attn_norm=2.45778e+05
b32c512h16tfrs: nsamp=32307200, time=91.44, p0loss=2.0883, vloss=0.6756, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79455e+05, attn_norm=2.45771e+05
b32c512h16tfrs: nsamp=32358400, time=90.70, p0loss=2.1006, vloss=0.6777, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79447e+05, attn_norm=2.45783e+05
b32c512h16tfrs: nsamp=32409600, time=90.95, p0loss=2.0819, vloss=0.6757, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79484e+05, attn_norm=2.45786e+05
b32c512h16tfrs: nsamp=32460800, time=91.42, p0loss=2.0823, vloss=0.6791, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79527e+05, attn_norm=2.45800e+05
b32c512h16tfrs: nsamp=32512000, time=90.79, p0loss=2.0881, vloss=0.6702, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79562e+05, attn_norm=2.45819e+05
b32c512h16tfrs: nsamp=32563200, time=91.50, p0loss=2.0960, vloss=0.6752, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79543e+05, attn_norm=2.45827e+05
b32c512h16tfrs: nsamp=32614400, time=91.50, p0loss=2.0895, vloss=0.6752, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79554e+05, attn_norm=2.45825e+05
b32c512h16tfrs: nsamp=32665600, time=90.83, p0loss=2.0776, vloss=0.6788, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79569e+05, attn_norm=2.45833e+05
b32c512h16tfrs: nsamp=32716800, time=90.56, p0loss=2.0896, vloss=0.6672, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79540e+05, attn_norm=2.45839e+05
b32c512h16tfrs: nsamp=32768000, time=91.37, p0loss=2.1103, vloss=0.6640, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79503e+05, attn_norm=2.45841e+05
b32c512h16tfrs: nsamp=32819200, time=90.95, p0loss=2.0974, vloss=0.6731, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79537e+05, attn_norm=2.45831e+05
b32c512h16tfrs: nsamp=32870400, time=91.03, p0loss=2.0914, vloss=0.6740, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79567e+05, attn_norm=2.45815e+05
b32c512h16tfrs: nsamp=32921600, time=91.28, p0loss=2.0990, vloss=0.6810, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79612e+05, attn_norm=2.45806e+05
b32c512h16tfrs: nsamp=32972800, time=90.77, p0loss=2.0991, vloss=0.6776, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79645e+05, attn_norm=2.45791e+05
b32c512h16tfrs: nsamp=33024000, time=90.73, p0loss=2.1122, vloss=0.6780, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79688e+05, attn_norm=2.45791e+05
b32c512h16tfrs: nsamp=33075200, time=90.64, p0loss=2.0989, vloss=0.6740, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79702e+05, attn_norm=2.45798e+05
b32c512h16tfrs: nsamp=33126400, time=90.73, p0loss=2.0924, vloss=0.6708, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79622e+05, attn_norm=2.45805e+05
b32c512h16tfrs: nsamp=33177600, time=91.15, p0loss=2.0933, vloss=0.6755, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79615e+05, attn_norm=2.45809e+05
b32c512h16tfrs: nsamp=33228800, time=91.56, p0loss=2.0977, vloss=0.6800, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79677e+05, attn_norm=2.45824e+05
b32c512h16tfrs: nsamp=33280000, time=91.58, p0loss=2.0927, vloss=0.6764, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79762e+05, attn_norm=2.45838e+05
b32c512h16tfrs: nsamp=33331200, time=91.42, p0loss=2.1000, vloss=0.6801, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79805e+05, attn_norm=2.45835e+05
b32c512h16tfrs: nsamp=33382400, time=91.56, p0loss=2.0841, vloss=0.6764, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79774e+05, attn_norm=2.45827e+05
b32c512h16tfrs: nsamp=33433600, time=91.55, p0loss=2.0915, vloss=0.6768, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79701e+05, attn_norm=2.45819e+05
b32c512h16tfrs: nsamp=33484800, time=91.65, p0loss=2.0794, vloss=0.6709, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79665e+05, attn_norm=2.45827e+05
b32c512h16tfrs: nsamp=33536000, time=91.43, p0loss=2.0949, vloss=0.6736, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79685e+05, attn_norm=2.45835e+05
b32c512h16tfrs: nsamp=33587200, time=91.07, p0loss=2.0857, vloss=0.6738, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79669e+05, attn_norm=2.45834e+05
b32c512h16tfrs: nsamp=33638400, time=90.78, p0loss=2.0849, vloss=0.6738, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79687e+05, attn_norm=2.45819e+05
b32c512h16tfrs: nsamp=33689600, time=90.55, p0loss=2.0899, vloss=0.6786, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79709e+05, attn_norm=2.45833e+05
b32c512h16tfrs: nsamp=33740800, time=90.84, p0loss=2.1155, vloss=0.6720, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79733e+05, attn_norm=2.45842e+05
b32c512h16tfrs: nsamp=33792000, time=91.41, p0loss=2.1082, vloss=0.6709, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79774e+05, attn_norm=2.45837e+05
b32c512h16tfrs: nsamp=33843200, time=91.27, p0loss=2.0991, vloss=0.6692, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79726e+05, attn_norm=2.45833e+05
b32c512h16tfrs: nsamp=33894400, time=91.09, p0loss=2.0892, vloss=0.6774, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79735e+05, attn_norm=2.45839e+05
b32c512h16tfrs: nsamp=33945600, time=90.89, p0loss=2.0982, vloss=0.6711, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79696e+05, attn_norm=2.45834e+05
b32c512h16tfrs: nsamp=33996800, time=90.77, p0loss=2.0900, vloss=0.6678, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79678e+05, attn_norm=2.45814e+05
b32c512h16tfrs: nsamp=34048000, time=91.40, p0loss=2.0970, vloss=0.6670, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79660e+05, attn_norm=2.45802e+05
b32c512h16tfrs: nsamp=34099200, time=91.15, p0loss=2.0920, vloss=0.6665, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79645e+05, attn_norm=2.45814e+05
b32c512h16tfrs: nsamp=34150400, time=91.65, p0loss=2.0897, vloss=0.6755, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79665e+05, attn_norm=2.45830e+05
b32c512h16tfrs: nsamp=34201600, time=91.61, p0loss=2.0906, vloss=0.6734, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79667e+05, attn_norm=2.45830e+05
b32c512h16tfrs: nsamp=34252800, time=91.53, p0loss=2.0898, vloss=0.6720, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79713e+05, attn_norm=2.45817e+05
b32c512h16tfrs: nsamp=34304000, time=91.46, p0loss=2.0667, vloss=0.6751, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79767e+05, attn_norm=2.45816e+05
b32c512h16tfrs: nsamp=34355200, time=91.25, p0loss=2.0761, vloss=0.6754, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79832e+05, attn_norm=2.45810e+05
b32c512h16tfrs: nsamp=34406400, time=91.25, p0loss=2.0742, vloss=0.6715, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79878e+05, attn_norm=2.45800e+05
b32c512h16tfrs: nsamp=34457600, time=91.22, p0loss=2.0889, vloss=0.6736, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79840e+05, attn_norm=2.45800e+05
b32c512h16tfrs: nsamp=34508800, time=91.32, p0loss=2.0795, vloss=0.6748, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79749e+05, attn_norm=2.45813e+05
b32c512h16tfrs: nsamp=34560000, time=90.92, p0loss=2.1076, vloss=0.6719, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79655e+05, attn_norm=2.45844e+05
b32c512h16tfrs: nsamp=34611200, time=91.04, p0loss=2.0879, vloss=0.6804, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79661e+05, attn_norm=2.45844e+05
b32c512h16tfrs: nsamp=34662400, time=90.54, p0loss=2.1125, vloss=0.6795, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79769e+05, attn_norm=2.45851e+05
b32c512h16tfrs: nsamp=34713600, time=90.59, p0loss=2.0979, vloss=0.6738, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79774e+05, attn_norm=2.45851e+05
Finished training subepoch!
Saving checkpoint: /root/katago_project/data/train/b32c512h16tfrs/checkpoint.ckpt
Beginning validation after epoch!
p0loss = 2.105660, p1loss = 0.363062, p0softloss = 5.193436, p1softloss = 0.697160, p0lopt = 0.000000, p0loptw = 0.000000, p0sopt = 0.800425, p0soptw = 0.398692, vloss = 0.760267, tdvloss1 = 0.320096, tdvloss2 = 0.253913, tdvloss3 = 0.238918, tdsloss = 0.015569, oloss = 0.470504, sloss = 0.379821, fploss = 0.175198, skloss = 0.088796, smloss = 0.047002, sbcdfloss = 0.076935, sbpdfloss = 0.054611, sdregloss = 0.000068, leadloss = 0.019562, vtimeloss = 0.055771, evstloss = 0.099258, esstloss = 0.024874, loss = 51.868149, pacc1 = 0.430414, vsquare = 0.301951, wsum = 160227.439941, nsamp = 160256.000000, ptentr = 0.924267, ptsoftentr = 4.970883, sekiweightscale = 6.998752, norm_normal_batch = 379796.781250, norm_normal_gamma_batch = 0.000000, norm_normal_attn_batch = 245844.921875, norm_output_batch = 600.382812, norm_noreg_batch = 16230.269531, norm_output_noreg_batch = 0.596076, nsamp_train = 34732032.000000, wsum_train = 34726402.871613
Validation took 304.6947746342048 seconds
Validating swa_scale=32.0
p0loss = 2.815618, p1loss = 0.438400, p0softloss = 5.318329, p1softloss = 0.711124, p0lopt = 0.000000, p0loptw = 0.000000, p0sopt = 0.822876, p0soptw = 0.301178, vloss = 0.838838, tdvloss1 = 0.396059, tdvloss2 = 0.354510, tdvloss3 = 0.335094, tdsloss = 0.025839, oloss = 0.487379, sloss = 0.400297, fploss = 0.205023, skloss = 0.121233, smloss = 0.057261, sbcdfloss = 0.091278, sbpdfloss = 0.059921, sdregloss = 0.010173, leadloss = 0.038669, vtimeloss = 0.059388, evstloss = 0.154031, esstloss = 0.045680, loss = 54.167851, pacc1 = 0.361175, vsquare = 0.227126, wsum = 160227.439941, nsamp = 160256.000000, ptentr = 0.924267, ptsoftentr = 4.970883, sekiweightscale = 6.998752, norm_normal_batch = 138089.203125, norm_normal_gamma_batch = 0.000000, norm_normal_attn_batch = 111126.960938, norm_output_batch = 580.774841, norm_noreg_batch = 16270.388672, norm_output_noreg_batch = 0.516967, nsamp_train = 34732032.000000, wsum_train = 34726402.871613
Validation swa took 306.14080227259547 seconds
Export cycle counter = 1
Skipping export model this time
GC collect
=========================================================================
BEGINNING NEXT EPOCH 3
=========================================================================
Current time: 2026-03-01 01:31:15.609222
Global step: 34732032 samples
Currently up to data row 95431473
Training dir: /root/katago_project/data/train/b32c512h16tfrs
Export dir: /root/katago_project/data/torchmodels_toexport_extra
Current grad scale: 128.0
Beginning training subepoch!
Currently up to data row 95431473
b32c512h16tfrs: nsamp=34783232, time=136.26, p0loss=2.1059, vloss=0.6743, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79764e+05, attn_norm=2.45846e+05
b32c512h16tfrs: nsamp=34834432, time=91.63, p0loss=2.1056, vloss=0.6792, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79745e+05, attn_norm=2.45850e+05
b32c512h16tfrs: nsamp=34885632, time=91.41, p0loss=2.0618, vloss=0.6681, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79779e+05, attn_norm=2.45849e+05
b32c512h16tfrs: nsamp=34936832, time=91.24, p0loss=2.0623, vloss=0.6750, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79802e+05, attn_norm=2.45849e+05
b32c512h16tfrs: nsamp=34988032, time=90.77, p0loss=2.0750, vloss=0.6744, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79805e+05, attn_norm=2.45856e+05
b32c512h16tfrs: nsamp=35039232, time=90.86, p0loss=2.0925, vloss=0.6770, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79866e+05, attn_norm=2.45871e+05
b32c512h16tfrs: nsamp=35090432, time=91.44, p0loss=2.0984, vloss=0.6682, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79867e+05, attn_norm=2.45877e+05
b32c512h16tfrs: nsamp=35141632, time=91.22, p0loss=2.0734, vloss=0.6762, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79868e+05, attn_norm=2.45859e+05
b32c512h16tfrs: nsamp=35192832, time=90.93, p0loss=2.0881, vloss=0.6775, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79904e+05, attn_norm=2.45887e+05
b32c512h16tfrs: nsamp=35244032, time=90.71, p0loss=2.0853, vloss=0.6804, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79939e+05, attn_norm=2.45903e+05
b32c512h16tfrs: nsamp=35295232, time=90.65, p0loss=2.0811, vloss=0.6781, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79986e+05, attn_norm=2.45895e+05
b32c512h16tfrs: nsamp=35346432, time=90.62, p0loss=2.1017, vloss=0.6725, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80037e+05, attn_norm=2.45898e+05
b32c512h16tfrs: nsamp=35397632, time=91.38, p0loss=2.1110, vloss=0.6667, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80027e+05, attn_norm=2.45897e+05
b32c512h16tfrs: nsamp=35448832, time=91.17, p0loss=2.0933, vloss=0.6690, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79982e+05, attn_norm=2.45905e+05
b32c512h16tfrs: nsamp=35500032, time=91.21, p0loss=2.0818, vloss=0.6725, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79959e+05, attn_norm=2.45905e+05
b32c512h16tfrs: nsamp=35551232, time=90.60, p0loss=2.0884, vloss=0.6697, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79927e+05, attn_norm=2.45906e+05
b32c512h16tfrs: nsamp=35602432, time=91.38, p0loss=2.0750, vloss=0.6724, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79832e+05, attn_norm=2.45922e+05
b32c512h16tfrs: nsamp=35653632, time=91.32, p0loss=2.0684, vloss=0.6704, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79734e+05, attn_norm=2.45937e+05
b32c512h16tfrs: nsamp=35704832, time=90.53, p0loss=2.0890, vloss=0.6763, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79704e+05, attn_norm=2.45937e+05
b32c512h16tfrs: nsamp=35756032, time=90.74, p0loss=2.0791, vloss=0.6779, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79723e+05, attn_norm=2.45938e+05
b32c512h16tfrs: nsamp=35807232, time=90.98, p0loss=2.1004, vloss=0.6762, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79781e+05, attn_norm=2.45944e+05
b32c512h16tfrs: nsamp=35858432, time=91.79, p0loss=2.0936, vloss=0.6655, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79805e+05, attn_norm=2.45947e+05
b32c512h16tfrs: nsamp=35909632, time=91.58, p0loss=2.0890, vloss=0.6687, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79826e+05, attn_norm=2.45953e+05
b32c512h16tfrs: nsamp=35960832, time=91.61, p0loss=2.0740, vloss=0.6754, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79757e+05, attn_norm=2.45947e+05
b32c512h16tfrs: nsamp=36012032, time=91.66, p0loss=2.0848, vloss=0.6736, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79762e+05, attn_norm=2.45957e+05
b32c512h16tfrs: nsamp=36063232, time=91.67, p0loss=2.0735, vloss=0.6642, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79773e+05, attn_norm=2.45968e+05
b32c512h16tfrs: nsamp=36114432, time=91.36, p0loss=2.0743, vloss=0.6671, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79830e+05, attn_norm=2.45969e+05
b32c512h16tfrs: nsamp=36165632, time=91.50, p0loss=2.0766, vloss=0.6754, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79898e+05, attn_norm=2.45976e+05
b32c512h16tfrs: nsamp=36216832, time=91.16, p0loss=2.0765, vloss=0.6777, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79937e+05, attn_norm=2.45970e+05
b32c512h16tfrs: nsamp=36268032, time=91.10, p0loss=2.0878, vloss=0.6680, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79970e+05, attn_norm=2.45975e+05
b32c512h16tfrs: nsamp=36319232, time=91.51, p0loss=2.0644, vloss=0.6734, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79992e+05, attn_norm=2.45975e+05
b32c512h16tfrs: nsamp=36370432, time=91.47, p0loss=2.0748, vloss=0.6784, pslr=2.263e-04,wdtc=1.000e+07, norm=3.79965e+05, attn_norm=2.45975e+05
b32c512h16tfrs: nsamp=36421632, time=91.18, p0loss=2.0748, vloss=0.6784, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80012e+05, attn_norm=2.45956e+05
b32c512h16tfrs: nsamp=36472832, time=90.75, p0loss=2.0810, vloss=0.6757, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80103e+05, attn_norm=2.45975e+05
b32c512h16tfrs: nsamp=36524032, time=91.95, p0loss=2.0820, vloss=0.6729, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80189e+05, attn_norm=2.45977e+05
b32c512h16tfrs: nsamp=36575232, time=91.78, p0loss=2.0806, vloss=0.6742, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80233e+05, attn_norm=2.45987e+05
b32c512h16tfrs: nsamp=36626432, time=91.65, p0loss=2.0727, vloss=0.6760, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80236e+05, attn_norm=2.46015e+05
b32c512h16tfrs: nsamp=36677632, time=91.06, p0loss=2.0844, vloss=0.6805, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80226e+05, attn_norm=2.46043e+05
b32c512h16tfrs: nsamp=36728832, time=91.25, p0loss=2.0991, vloss=0.6699, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80218e+05, attn_norm=2.46051e+05
b32c512h16tfrs: nsamp=36780032, time=91.12, p0loss=2.0707, vloss=0.6791, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80243e+05, attn_norm=2.46069e+05
b32c512h16tfrs: nsamp=36831232, time=90.93, p0loss=2.0775, vloss=0.6745, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80294e+05, attn_norm=2.46090e+05
b32c512h16tfrs: nsamp=36882432, time=91.82, p0loss=2.0664, vloss=0.6721, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80275e+05, attn_norm=2.46101e+05
b32c512h16tfrs: nsamp=36933632, time=91.75, p0loss=2.0778, vloss=0.6729, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80331e+05, attn_norm=2.46110e+05
b32c512h16tfrs: nsamp=36984832, time=90.66, p0loss=2.0880, vloss=0.6741, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80364e+05, attn_norm=2.46122e+05
b32c512h16tfrs: nsamp=37036032, time=90.66, p0loss=2.0777, vloss=0.6669, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80388e+05, attn_norm=2.46130e+05
b32c512h16tfrs: nsamp=37087232, time=90.50, p0loss=2.0819, vloss=0.6740, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80485e+05, attn_norm=2.46147e+05
b32c512h16tfrs: nsamp=37138432, time=90.65, p0loss=2.0682, vloss=0.6720, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80547e+05, attn_norm=2.46157e+05
b32c512h16tfrs: nsamp=37189632, time=92.01, p0loss=2.0853, vloss=0.6678, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80605e+05, attn_norm=2.46159e+05
b32c512h16tfrs: nsamp=37240832, time=91.95, p0loss=2.0796, vloss=0.6636, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80570e+05, attn_norm=2.46165e+05
b32c512h16tfrs: nsamp=37292032, time=90.56, p0loss=2.0713, vloss=0.6739, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80583e+05, attn_norm=2.46172e+05
b32c512h16tfrs: nsamp=37343232, time=90.68, p0loss=2.0829, vloss=0.6694, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80612e+05, attn_norm=2.46183e+05
b32c512h16tfrs: nsamp=37394432, time=90.66, p0loss=2.0827, vloss=0.6748, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80596e+05, attn_norm=2.46197e+05
b32c512h16tfrs: nsamp=37445632, time=90.82, p0loss=2.0810, vloss=0.6731, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80603e+05, attn_norm=2.46219e+05
b32c512h16tfrs: nsamp=37496832, time=91.27, p0loss=2.0805, vloss=0.6723, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80577e+05, attn_norm=2.46219e+05
b32c512h16tfrs: nsamp=37548032, time=91.09, p0loss=2.1003, vloss=0.6753, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80583e+05, attn_norm=2.46216e+05
b32c512h16tfrs: nsamp=37599232, time=91.37, p0loss=2.0926, vloss=0.6785, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80615e+05, attn_norm=2.46240e+05
b32c512h16tfrs: nsamp=37650432, time=91.70, p0loss=2.0859, vloss=0.6715, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80603e+05, attn_norm=2.46242e+05
b32c512h16tfrs: nsamp=37701632, time=90.95, p0loss=2.0765, vloss=0.6759, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80648e+05, attn_norm=2.46239e+05
b32c512h16tfrs: nsamp=37752832, time=91.30, p0loss=2.0612, vloss=0.6830, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80697e+05, attn_norm=2.46251e+05
b32c512h16tfrs: nsamp=37804032, time=91.66, p0loss=2.0696, vloss=0.6723, pslr=2.263e-04,wdtc=1.000e+07, norm=3.80668e+05, attn_norm=2.46257e+05