latin-pos-tagger / training.log
2026-02-09 20:42:00,363 ----------------------------------------------------------------------------------------------------
2026-02-09 20:42:00,364 Model: "SequenceTagger(
(embeddings): StackedEmbeddings(
(list_embedding_0): FlairEmbeddings(
(lm): LanguageModel(
(drop): Dropout(p=0.1, inplace=False)
(encoder): Embedding(333, 200)
(rnn): LSTM(200, 2048, num_layers=2, dropout=0.1)
)
)
(list_embedding_1): FlairEmbeddings(
(lm): LanguageModel(
(drop): Dropout(p=0.1, inplace=False)
(encoder): Embedding(333, 200)
(rnn): LSTM(200, 2048, num_layers=2, dropout=0.1)
)
)
)
(dropout): Dropout(p=0.2, inplace=False)
(word_dropout): WordDropout(p=0.1)
(locked_dropout): LockedDropout(p=0.5)
(embedding2nn): Linear(in_features=4096, out_features=4096, bias=True)
(rnn): LSTM(4096, 1024, num_layers=2, batch_first=True, dropout=0.5, bidirectional=True)
(linear): Linear(in_features=2048, out_features=17, bias=True)
(loss_function): ViterbiLoss()
(crf): CRF()
)"
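The layer sizes in the model dump above are internally consistent; a quick stdlib check of the dimension flow (all numbers are read off the dump itself, nothing is measured):

```python
# Each character LM has LSTM hidden size 2048, so each FlairEmbeddings
# contributes a 2048-dim vector per token; stacking the forward and
# backward LMs yields the 4096 input features of embedding2nn.
lm_hidden = 2048
stacked_dim = 2 * lm_hidden        # StackedEmbeddings output per token
assert stacked_dim == 4096         # matches embedding2nn in_features

# The tagger LSTM has hidden size 1024 and is bidirectional, so its
# per-token output is 2048, which the final linear maps to 17 tag
# scores (15 POS tags plus the CRF's <START>/<STOP> markers).
rnn_hidden = 1024
rnn_out = 2 * rnn_hidden
num_tags = 15 + 2
assert rnn_out == 2048 and num_tags == 17
print(stacked_dim, rnn_out, num_tags)
```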
2026-02-09 20:42:00,364 ----------------------------------------------------------------------------------------------------
2026-02-09 20:42:00,364 Corpus: 1592291 train + 199036 dev + 199037 test sentences
2026-02-09 20:42:00,364 ----------------------------------------------------------------------------------------------------
2026-02-09 20:42:00,364 Train: 1592291 sentences
2026-02-09 20:42:00,364 (train_with_dev=False, train_with_test=False)
2026-02-09 20:42:00,364 ----------------------------------------------------------------------------------------------------
2026-02-09 20:42:00,364 Training Params:
2026-02-09 20:42:00,364 - learning_rate: "0.1"
2026-02-09 20:42:00,364 - mini_batch_size: "512"
2026-02-09 20:42:00,364 - max_epochs: "15"
2026-02-09 20:42:00,364 - shuffle: "True"
2026-02-09 20:42:00,364 ----------------------------------------------------------------------------------------------------
2026-02-09 20:42:00,364 Plugins:
2026-02-09 20:42:00,364 - AnnealOnPlateau | patience: '3', anneal_factor: '0.5', min_learning_rate: '0.0001'
2026-02-09 20:42:00,364 ----------------------------------------------------------------------------------------------------
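The scheduler's effect can be read off the per-epoch DEV lines in this log: the dev loss improves every epoch, so the plateau criterion never fires and the learning rate stays at 0.1 throughout. A minimal sketch of the plateau logic (a simplification of Flair's actual AnnealOnPlateau plugin; the loss values are copied from the DEV lines, rounded to four places):

```python
# Halve the lr only after the dev loss fails to improve for more than
# `patience` consecutive epochs, never going below min_lr.
dev_losses = [0.1730, 0.1450, 0.1340, 0.1267, 0.1215, 0.1178, 0.1147,
              0.1119, 0.1105, 0.1084, 0.1069, 0.1056, 0.1040, 0.1038,
              0.1023]

lr, patience, anneal_factor, min_lr = 0.1, 3, 0.5, 1e-4
best, bad_epochs = float("inf"), 0
for loss in dev_losses:
    if loss < best:
        best, bad_epochs = loss, 0   # "- 0 epochs without improvement"
    else:
        bad_epochs += 1
        if bad_epochs > patience:
            lr = max(lr * anneal_factor, min_lr)
            bad_epochs = 0

# Every epoch improved on the previous best, so the lr was never
# annealed, matching the constant "lr: 0.100000" in the iteration lines.
print(lr)   # → 0.1
```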
2026-02-09 20:42:00,364 Final evaluation on model from best epoch (best-model.pt)
2026-02-09 20:42:00,364 - metric: "('micro avg', 'f1-score')"
2026-02-09 20:42:00,364 ----------------------------------------------------------------------------------------------------
2026-02-09 20:42:00,364 Computation:
2026-02-09 20:42:00,364 - compute on device: cuda:0
2026-02-09 20:42:00,364 - embedding storage: none
2026-02-09 20:42:00,364 ----------------------------------------------------------------------------------------------------
2026-02-09 20:42:00,364 Model training base path: "latin-pos-blackwell-512-new"
2026-02-09 20:42:00,364 ----------------------------------------------------------------------------------------------------
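The per-epoch iteration count and the throughput figures below follow directly from the corpus header and the training params; a quick arithmetic check (the tokens-per-sentence ratio is inferred from the test split's support total, so treat it as an estimate):

```python
import math

# 1,592,291 train sentences at mini_batch_size=512 gives the 3110
# iterations per epoch seen in every "iter N/3110" line.
train_sentences = 1_592_291
iters_per_epoch = math.ceil(train_sentences / 512)
assert iters_per_epoch == 3110

# The logged throughput (~11,800 samples/sec over ~3,120 s epochs)
# implies ~36.9M samples per epoch -- about 23 per sentence -- so
# "samples" here are tokens, not sentences. The test split likewise
# has 4,608,800 tokens over 199,037 sentences, ~23.2 tokens each.
tokens_per_sentence = 4_608_800 / 199_037
approx_train_tokens = train_sentences * tokens_per_sentence
approx_throughput = approx_train_tokens / 3118   # epoch-1 wall time (s)
assert 11_000 < approx_throughput < 12_500
print(iters_per_epoch, round(approx_throughput))
```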
2026-02-09 20:42:00,364 ----------------------------------------------------------------------------------------------------
2026-02-09 20:47:11,561 epoch 1 - iter 311/3110 - loss 0.97672627 - time (sec): 311.20 - samples/sec: 11881.68 - lr: 0.100000 - momentum: 0.000000
2026-02-09 20:52:22,878 epoch 1 - iter 622/3110 - loss 0.70472424 - time (sec): 622.51 - samples/sec: 11871.10 - lr: 0.100000 - momentum: 0.000000
2026-02-09 20:57:35,659 epoch 1 - iter 933/3110 - loss 0.59478437 - time (sec): 935.29 - samples/sec: 11855.27 - lr: 0.100000 - momentum: 0.000000
2026-02-09 21:02:46,489 epoch 1 - iter 1244/3110 - loss 0.53187816 - time (sec): 1246.12 - samples/sec: 11869.56 - lr: 0.100000 - momentum: 0.000000
2026-02-09 21:08:00,713 epoch 1 - iter 1555/3110 - loss 0.48977648 - time (sec): 1560.35 - samples/sec: 11837.67 - lr: 0.100000 - momentum: 0.000000
2026-02-09 21:13:11,499 epoch 1 - iter 1866/3110 - loss 0.45886390 - time (sec): 1871.13 - samples/sec: 11846.60 - lr: 0.100000 - momentum: 0.000000
2026-02-09 21:18:21,879 epoch 1 - iter 2177/3110 - loss 0.43504595 - time (sec): 2181.52 - samples/sec: 11843.92 - lr: 0.100000 - momentum: 0.000000
2026-02-09 21:23:33,567 epoch 1 - iter 2488/3110 - loss 0.41570566 - time (sec): 2493.20 - samples/sec: 11841.09 - lr: 0.100000 - momentum: 0.000000
2026-02-09 21:28:50,054 epoch 1 - iter 2799/3110 - loss 0.39964918 - time (sec): 2809.69 - samples/sec: 11827.10 - lr: 0.100000 - momentum: 0.000000
2026-02-09 21:33:58,104 epoch 1 - iter 3110/3110 - loss 0.38613944 - time (sec): 3117.74 - samples/sec: 11838.67 - lr: 0.100000 - momentum: 0.000000
2026-02-09 21:33:58,104 ----------------------------------------------------------------------------------------------------
2026-02-09 21:33:58,104 EPOCH 1 done: loss 0.3861 - lr: 0.100000
2026-02-09 21:33:58,104 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-09 21:47:35,428 DEV : loss 0.17296071350574493 - f1-score (micro avg) 0.9366
2026-02-09 21:48:20,010 - 0 epochs without improvement
2026-02-09 21:48:20,010 saving best model
2026-02-09 21:48:20,286 ----------------------------------------------------------------------------------------------------
2026-02-09 21:53:46,298 epoch 2 - iter 311/3110 - loss 0.25769796 - time (sec): 326.01 - samples/sec: 11333.07 - lr: 0.100000 - momentum: 0.000000
2026-02-09 21:58:56,575 epoch 2 - iter 622/3110 - loss 0.25489217 - time (sec): 636.29 - samples/sec: 11616.92 - lr: 0.100000 - momentum: 0.000000
2026-02-09 22:04:09,970 epoch 2 - iter 933/3110 - loss 0.25238585 - time (sec): 949.68 - samples/sec: 11669.93 - lr: 0.100000 - momentum: 0.000000
2026-02-09 22:09:22,122 epoch 2 - iter 1244/3110 - loss 0.25013598 - time (sec): 1261.84 - samples/sec: 11705.91 - lr: 0.100000 - momentum: 0.000000
2026-02-09 22:14:33,607 epoch 2 - iter 1555/3110 - loss 0.24776145 - time (sec): 1573.32 - samples/sec: 11729.66 - lr: 0.100000 - momentum: 0.000000
2026-02-09 22:19:41,650 epoch 2 - iter 1866/3110 - loss 0.24570752 - time (sec): 1881.36 - samples/sec: 11770.93 - lr: 0.100000 - momentum: 0.000000
2026-02-09 22:24:52,242 epoch 2 - iter 2177/3110 - loss 0.24372514 - time (sec): 2191.96 - samples/sec: 11789.96 - lr: 0.100000 - momentum: 0.000000
2026-02-09 22:30:04,220 epoch 2 - iter 2488/3110 - loss 0.24196210 - time (sec): 2503.93 - samples/sec: 11794.32 - lr: 0.100000 - momentum: 0.000000
2026-02-09 22:35:16,579 epoch 2 - iter 2799/3110 - loss 0.24030423 - time (sec): 2816.29 - samples/sec: 11797.03 - lr: 0.100000 - momentum: 0.000000
2026-02-09 22:40:28,518 epoch 2 - iter 3110/3110 - loss 0.23866268 - time (sec): 3128.23 - samples/sec: 11798.97 - lr: 0.100000 - momentum: 0.000000
2026-02-09 22:40:28,518 ----------------------------------------------------------------------------------------------------
2026-02-09 22:40:28,518 EPOCH 2 done: loss 0.2387 - lr: 0.100000
2026-02-09 22:40:28,518 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-09 22:54:09,533 DEV : loss 0.1449769288301468 - f1-score (micro avg) 0.9447
2026-02-09 22:54:54,577 - 0 epochs without improvement
2026-02-09 22:54:54,577 saving best model
2026-02-09 22:54:54,852 ----------------------------------------------------------------------------------------------------
2026-02-09 23:00:20,296 epoch 3 - iter 311/3110 - loss 0.22093191 - time (sec): 325.44 - samples/sec: 11345.93 - lr: 0.100000 - momentum: 0.000000
2026-02-09 23:05:31,225 epoch 3 - iter 622/3110 - loss 0.22001811 - time (sec): 636.37 - samples/sec: 11611.76 - lr: 0.100000 - momentum: 0.000000
2026-02-09 23:10:46,575 epoch 3 - iter 933/3110 - loss 0.21896386 - time (sec): 951.72 - samples/sec: 11648.39 - lr: 0.100000 - momentum: 0.000000
2026-02-09 23:16:02,268 epoch 3 - iter 1244/3110 - loss 0.21796798 - time (sec): 1267.42 - samples/sec: 11667.14 - lr: 0.100000 - momentum: 0.000000
2026-02-09 23:21:11,844 epoch 3 - iter 1555/3110 - loss 0.21701712 - time (sec): 1576.99 - samples/sec: 11713.71 - lr: 0.100000 - momentum: 0.000000
2026-02-09 23:26:20,591 epoch 3 - iter 1866/3110 - loss 0.21621540 - time (sec): 1885.74 - samples/sec: 11747.47 - lr: 0.100000 - momentum: 0.000000
2026-02-09 23:31:30,262 epoch 3 - iter 2177/3110 - loss 0.21539053 - time (sec): 2195.41 - samples/sec: 11765.20 - lr: 0.100000 - momentum: 0.000000
2026-02-09 23:36:46,017 epoch 3 - iter 2488/3110 - loss 0.21459263 - time (sec): 2511.16 - samples/sec: 11760.33 - lr: 0.100000 - momentum: 0.000000
2026-02-09 23:41:54,799 epoch 3 - iter 2799/3110 - loss 0.21386919 - time (sec): 2819.95 - samples/sec: 11781.51 - lr: 0.100000 - momentum: 0.000000
2026-02-09 23:47:05,566 epoch 3 - iter 3110/3110 - loss 0.21311649 - time (sec): 3130.71 - samples/sec: 11789.61 - lr: 0.100000 - momentum: 0.000000
2026-02-09 23:47:05,566 ----------------------------------------------------------------------------------------------------
2026-02-09 23:47:05,566 EPOCH 3 done: loss 0.2131 - lr: 0.100000
2026-02-09 23:47:05,566 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-10 00:00:44,701 DEV : loss 0.13400429487228394 - f1-score (micro avg) 0.9482
2026-02-10 00:01:40,578 - 0 epochs without improvement
2026-02-10 00:01:40,578 saving best model
2026-02-10 00:01:40,862 ----------------------------------------------------------------------------------------------------
2026-02-10 00:07:00,292 epoch 4 - iter 311/3110 - loss 0.20359245 - time (sec): 319.43 - samples/sec: 11518.83 - lr: 0.100000 - momentum: 0.000000
2026-02-10 00:12:10,494 epoch 4 - iter 622/3110 - loss 0.20327470 - time (sec): 629.63 - samples/sec: 11712.32 - lr: 0.100000 - momentum: 0.000000
2026-02-10 00:17:21,319 epoch 4 - iter 933/3110 - loss 0.20288992 - time (sec): 940.46 - samples/sec: 11766.83 - lr: 0.100000 - momentum: 0.000000
2026-02-10 00:22:36,737 epoch 4 - iter 1244/3110 - loss 0.20251141 - time (sec): 1255.87 - samples/sec: 11757.95 - lr: 0.100000 - momentum: 0.000000
2026-02-10 00:27:49,894 epoch 4 - iter 1555/3110 - loss 0.20204748 - time (sec): 1569.03 - samples/sec: 11762.39 - lr: 0.100000 - momentum: 0.000000
2026-02-10 00:33:04,063 epoch 4 - iter 1866/3110 - loss 0.20165710 - time (sec): 1883.20 - samples/sec: 11763.49 - lr: 0.100000 - momentum: 0.000000
2026-02-10 00:38:13,798 epoch 4 - iter 2177/3110 - loss 0.20111870 - time (sec): 2192.94 - samples/sec: 11785.72 - lr: 0.100000 - momentum: 0.000000
2026-02-10 00:43:25,465 epoch 4 - iter 2488/3110 - loss 0.20057669 - time (sec): 2504.60 - samples/sec: 11792.44 - lr: 0.100000 - momentum: 0.000000
2026-02-10 00:48:37,475 epoch 4 - iter 2799/3110 - loss 0.20007800 - time (sec): 2816.61 - samples/sec: 11795.80 - lr: 0.100000 - momentum: 0.000000
2026-02-10 00:53:46,575 epoch 4 - iter 3110/3110 - loss 0.19957406 - time (sec): 3125.71 - samples/sec: 11808.47 - lr: 0.100000 - momentum: 0.000000
2026-02-10 00:53:46,575 ----------------------------------------------------------------------------------------------------
2026-02-10 00:53:46,575 EPOCH 4 done: loss 0.1996 - lr: 0.100000
2026-02-10 00:53:46,575 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-10 01:07:26,758 DEV : loss 0.12668992578983307 - f1-score (micro avg) 0.9506
2026-02-10 01:08:13,431 - 0 epochs without improvement
2026-02-10 01:08:13,431 saving best model
2026-02-10 01:08:13,705 ----------------------------------------------------------------------------------------------------
2026-02-10 01:13:36,317 epoch 5 - iter 311/3110 - loss 0.19385727 - time (sec): 322.61 - samples/sec: 11418.49 - lr: 0.100000 - momentum: 0.000000
2026-02-10 01:18:47,383 epoch 5 - iter 622/3110 - loss 0.19323855 - time (sec): 633.68 - samples/sec: 11642.29 - lr: 0.100000 - momentum: 0.000000
2026-02-10 01:23:57,285 epoch 5 - iter 933/3110 - loss 0.19286212 - time (sec): 943.58 - samples/sec: 11732.12 - lr: 0.100000 - momentum: 0.000000
2026-02-10 01:29:15,071 epoch 5 - iter 1244/3110 - loss 0.19249539 - time (sec): 1261.37 - samples/sec: 11714.06 - lr: 0.100000 - momentum: 0.000000
2026-02-10 01:34:25,766 epoch 5 - iter 1555/3110 - loss 0.19211545 - time (sec): 1572.06 - samples/sec: 11738.15 - lr: 0.100000 - momentum: 0.000000
2026-02-10 01:39:40,304 epoch 5 - iter 1866/3110 - loss 0.19183846 - time (sec): 1886.60 - samples/sec: 11741.42 - lr: 0.100000 - momentum: 0.000000
2026-02-10 01:44:47,347 epoch 5 - iter 2177/3110 - loss 0.19148548 - time (sec): 2193.64 - samples/sec: 11781.54 - lr: 0.100000 - momentum: 0.000000
2026-02-10 01:49:58,878 epoch 5 - iter 2488/3110 - loss 0.19117364 - time (sec): 2505.17 - samples/sec: 11790.14 - lr: 0.100000 - momentum: 0.000000
2026-02-10 01:55:10,571 epoch 5 - iter 2799/3110 - loss 0.19085881 - time (sec): 2816.87 - samples/sec: 11793.74 - lr: 0.100000 - momentum: 0.000000
2026-02-10 02:00:23,934 epoch 5 - iter 3110/3110 - loss 0.19054709 - time (sec): 3130.23 - samples/sec: 11791.44 - lr: 0.100000 - momentum: 0.000000
2026-02-10 02:00:23,934 ----------------------------------------------------------------------------------------------------
2026-02-10 02:00:23,934 EPOCH 5 done: loss 0.1905 - lr: 0.100000
2026-02-10 02:00:23,934 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-10 02:14:04,793 DEV : loss 0.12145848572254181 - f1-score (micro avg) 0.9522
2026-02-10 02:14:55,349 - 0 epochs without improvement
2026-02-10 02:14:55,350 saving best model
2026-02-10 02:14:55,628 ----------------------------------------------------------------------------------------------------
2026-02-10 02:20:15,323 epoch 6 - iter 311/3110 - loss 0.18561402 - time (sec): 319.69 - samples/sec: 11542.19 - lr: 0.100000 - momentum: 0.000000
2026-02-10 02:25:24,003 epoch 6 - iter 622/3110 - loss 0.18550625 - time (sec): 628.37 - samples/sec: 11745.30 - lr: 0.100000 - momentum: 0.000000
2026-02-10 02:30:35,998 epoch 6 - iter 933/3110 - loss 0.18540523 - time (sec): 940.37 - samples/sec: 11773.06 - lr: 0.100000 - momentum: 0.000000
2026-02-10 02:35:48,422 epoch 6 - iter 1244/3110 - loss 0.18517698 - time (sec): 1252.79 - samples/sec: 11782.44 - lr: 0.100000 - momentum: 0.000000
2026-02-10 02:40:56,350 epoch 6 - iter 1555/3110 - loss 0.18483989 - time (sec): 1560.72 - samples/sec: 11813.85 - lr: 0.100000 - momentum: 0.000000
2026-02-10 02:46:07,831 epoch 6 - iter 1866/3110 - loss 0.18454884 - time (sec): 1872.20 - samples/sec: 11824.00 - lr: 0.100000 - momentum: 0.000000
2026-02-10 02:51:21,389 epoch 6 - iter 2177/3110 - loss 0.18438092 - time (sec): 2185.76 - samples/sec: 11821.81 - lr: 0.100000 - momentum: 0.000000
2026-02-10 02:56:34,446 epoch 6 - iter 2488/3110 - loss 0.18417804 - time (sec): 2498.82 - samples/sec: 11818.73 - lr: 0.100000 - momentum: 0.000000
2026-02-10 03:01:46,304 epoch 6 - iter 2799/3110 - loss 0.18399157 - time (sec): 2810.68 - samples/sec: 11818.95 - lr: 0.100000 - momentum: 0.000000
2026-02-10 03:06:59,499 epoch 6 - iter 3110/3110 - loss 0.18381043 - time (sec): 3123.87 - samples/sec: 11815.44 - lr: 0.100000 - momentum: 0.000000
2026-02-10 03:06:59,500 ----------------------------------------------------------------------------------------------------
2026-02-10 03:06:59,500 EPOCH 6 done: loss 0.1838 - lr: 0.100000
2026-02-10 03:06:59,500 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-10 03:20:42,924 DEV : loss 0.11777843534946442 - f1-score (micro avg) 0.9534
2026-02-10 03:21:28,115 - 0 epochs without improvement
2026-02-10 03:21:28,116 saving best model
2026-02-10 03:21:28,379 ----------------------------------------------------------------------------------------------------
2026-02-10 03:26:48,806 epoch 7 - iter 311/3110 - loss 0.18054505 - time (sec): 320.43 - samples/sec: 11456.26 - lr: 0.100000 - momentum: 0.000000
2026-02-10 03:32:01,235 epoch 7 - iter 622/3110 - loss 0.18027499 - time (sec): 632.86 - samples/sec: 11629.08 - lr: 0.100000 - momentum: 0.000000
2026-02-10 03:37:17,687 epoch 7 - iter 933/3110 - loss 0.17997007 - time (sec): 949.31 - samples/sec: 11649.51 - lr: 0.100000 - momentum: 0.000000
2026-02-10 03:42:34,701 epoch 7 - iter 1244/3110 - loss 0.17983657 - time (sec): 1266.32 - samples/sec: 11671.37 - lr: 0.100000 - momentum: 0.000000
2026-02-10 03:47:43,463 epoch 7 - iter 1555/3110 - loss 0.17955909 - time (sec): 1575.08 - samples/sec: 11721.52 - lr: 0.100000 - momentum: 0.000000
2026-02-10 03:52:55,091 epoch 7 - iter 1866/3110 - loss 0.17931778 - time (sec): 1886.71 - samples/sec: 11742.29 - lr: 0.100000 - momentum: 0.000000
2026-02-10 03:58:06,861 epoch 7 - iter 2177/3110 - loss 0.17911377 - time (sec): 2198.48 - samples/sec: 11752.62 - lr: 0.100000 - momentum: 0.000000
2026-02-10 04:03:16,095 epoch 7 - iter 2488/3110 - loss 0.17888264 - time (sec): 2507.72 - samples/sec: 11772.49 - lr: 0.100000 - momentum: 0.000000
2026-02-10 04:08:29,847 epoch 7 - iter 2799/3110 - loss 0.17878271 - time (sec): 2821.47 - samples/sec: 11773.82 - lr: 0.100000 - momentum: 0.000000
2026-02-10 04:13:41,547 epoch 7 - iter 3110/3110 - loss 0.17861219 - time (sec): 3133.17 - samples/sec: 11780.38 - lr: 0.100000 - momentum: 0.000000
2026-02-10 04:13:41,547 ----------------------------------------------------------------------------------------------------
2026-02-10 04:13:41,547 EPOCH 7 done: loss 0.1786 - lr: 0.100000
2026-02-10 04:13:41,547 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-10 04:27:21,650 DEV : loss 0.11466693878173828 - f1-score (micro avg) 0.9546
2026-02-10 04:28:06,372 - 0 epochs without improvement
2026-02-10 04:28:06,373 saving best model
2026-02-10 04:28:06,639 ----------------------------------------------------------------------------------------------------
2026-02-10 04:33:30,503 epoch 8 - iter 311/3110 - loss 0.17535825 - time (sec): 323.86 - samples/sec: 11371.73 - lr: 0.100000 - momentum: 0.000000
2026-02-10 04:38:39,921 epoch 8 - iter 622/3110 - loss 0.17537665 - time (sec): 633.28 - samples/sec: 11652.93 - lr: 0.100000 - momentum: 0.000000
2026-02-10 04:43:53,959 epoch 8 - iter 933/3110 - loss 0.17530236 - time (sec): 947.32 - samples/sec: 11705.31 - lr: 0.100000 - momentum: 0.000000
2026-02-10 04:49:03,766 epoch 8 - iter 1244/3110 - loss 0.17525774 - time (sec): 1257.13 - samples/sec: 11751.52 - lr: 0.100000 - momentum: 0.000000
2026-02-10 04:54:15,092 epoch 8 - iter 1555/3110 - loss 0.17506201 - time (sec): 1568.45 - samples/sec: 11770.88 - lr: 0.100000 - momentum: 0.000000
2026-02-10 04:59:26,328 epoch 8 - iter 1866/3110 - loss 0.17492563 - time (sec): 1879.69 - samples/sec: 11781.87 - lr: 0.100000 - momentum: 0.000000
2026-02-10 05:04:40,427 epoch 8 - iter 2177/3110 - loss 0.17488939 - time (sec): 2193.79 - samples/sec: 11770.90 - lr: 0.100000 - momentum: 0.000000
2026-02-10 05:09:52,553 epoch 8 - iter 2488/3110 - loss 0.17470121 - time (sec): 2505.91 - samples/sec: 11775.16 - lr: 0.100000 - momentum: 0.000000
2026-02-10 05:15:09,166 epoch 8 - iter 2799/3110 - loss 0.17455881 - time (sec): 2822.53 - samples/sec: 11765.78 - lr: 0.100000 - momentum: 0.000000
2026-02-10 05:20:20,533 epoch 8 - iter 3110/3110 - loss 0.17443527 - time (sec): 3133.89 - samples/sec: 11777.65 - lr: 0.100000 - momentum: 0.000000
2026-02-10 05:20:20,533 ----------------------------------------------------------------------------------------------------
2026-02-10 05:20:20,533 EPOCH 8 done: loss 0.1744 - lr: 0.100000
2026-02-10 05:20:20,533 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-10 05:34:01,253 DEV : loss 0.11188799142837524 - f1-score (micro avg) 0.9555
2026-02-10 05:34:51,656 - 0 epochs without improvement
2026-02-10 05:34:51,657 saving best model
2026-02-10 05:34:51,914 ----------------------------------------------------------------------------------------------------
2026-02-10 05:40:11,246 epoch 9 - iter 311/3110 - loss 0.17197210 - time (sec): 319.33 - samples/sec: 11517.43 - lr: 0.100000 - momentum: 0.000000
2026-02-10 05:45:27,710 epoch 9 - iter 622/3110 - loss 0.17201647 - time (sec): 635.79 - samples/sec: 11615.38 - lr: 0.100000 - momentum: 0.000000
2026-02-10 05:50:39,870 epoch 9 - iter 933/3110 - loss 0.17191134 - time (sec): 947.96 - samples/sec: 11681.29 - lr: 0.100000 - momentum: 0.000000
2026-02-10 05:55:54,872 epoch 9 - iter 1244/3110 - loss 0.17178835 - time (sec): 1262.96 - samples/sec: 11689.27 - lr: 0.100000 - momentum: 0.000000
2026-02-10 06:01:06,220 epoch 9 - iter 1555/3110 - loss 0.17177079 - time (sec): 1574.30 - samples/sec: 11721.40 - lr: 0.100000 - momentum: 0.000000
2026-02-10 06:06:18,947 epoch 9 - iter 1866/3110 - loss 0.17155636 - time (sec): 1887.03 - samples/sec: 11739.13 - lr: 0.100000 - momentum: 0.000000
2026-02-10 06:11:30,307 epoch 9 - iter 2177/3110 - loss 0.17136221 - time (sec): 2198.39 - samples/sec: 11749.12 - lr: 0.100000 - momentum: 0.000000
2026-02-10 06:16:43,276 epoch 9 - iter 2488/3110 - loss 0.17123760 - time (sec): 2511.36 - samples/sec: 11754.92 - lr: 0.100000 - momentum: 0.000000
2026-02-10 06:21:57,116 epoch 9 - iter 2799/3110 - loss 0.17111893 - time (sec): 2825.20 - samples/sec: 11760.28 - lr: 0.100000 - momentum: 0.000000
2026-02-10 06:27:07,301 epoch 9 - iter 3110/3110 - loss 0.17094364 - time (sec): 3135.39 - samples/sec: 11772.04 - lr: 0.100000 - momentum: 0.000000
2026-02-10 06:27:07,301 ----------------------------------------------------------------------------------------------------
2026-02-10 06:27:07,301 EPOCH 9 done: loss 0.1709 - lr: 0.100000
2026-02-10 06:27:07,301 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-10 06:40:48,730 DEV : loss 0.11047399044036865 - f1-score (micro avg) 0.9559
2026-02-10 06:41:34,093 - 0 epochs without improvement
2026-02-10 06:41:34,093 saving best model
2026-02-10 06:41:34,366 ----------------------------------------------------------------------------------------------------
2026-02-10 06:46:55,815 epoch 10 - iter 311/3110 - loss 0.16849032 - time (sec): 321.45 - samples/sec: 11479.08 - lr: 0.100000 - momentum: 0.000000
2026-02-10 06:52:07,423 epoch 10 - iter 622/3110 - loss 0.16859678 - time (sec): 633.06 - samples/sec: 11651.64 - lr: 0.100000 - momentum: 0.000000
2026-02-10 06:57:19,799 epoch 10 - iter 933/3110 - loss 0.16850413 - time (sec): 945.43 - samples/sec: 11707.75 - lr: 0.100000 - momentum: 0.000000
2026-02-10 07:02:33,512 epoch 10 - iter 1244/3110 - loss 0.16847262 - time (sec): 1259.15 - samples/sec: 11725.73 - lr: 0.100000 - momentum: 0.000000
2026-02-10 07:07:46,786 epoch 10 - iter 1555/3110 - loss 0.16851147 - time (sec): 1572.42 - samples/sec: 11738.13 - lr: 0.100000 - momentum: 0.000000
2026-02-10 07:12:59,019 epoch 10 - iter 1866/3110 - loss 0.16834712 - time (sec): 1884.65 - samples/sec: 11751.02 - lr: 0.100000 - momentum: 0.000000
2026-02-10 07:18:12,070 epoch 10 - iter 2177/3110 - loss 0.16821811 - time (sec): 2197.70 - samples/sec: 11753.35 - lr: 0.100000 - momentum: 0.000000
2026-02-10 07:23:26,468 epoch 10 - iter 2488/3110 - loss 0.16811111 - time (sec): 2512.10 - samples/sec: 11748.17 - lr: 0.100000 - momentum: 0.000000
2026-02-10 07:28:39,734 epoch 10 - iter 2799/3110 - loss 0.16800832 - time (sec): 2825.37 - samples/sec: 11753.29 - lr: 0.100000 - momentum: 0.000000
2026-02-10 07:33:56,343 epoch 10 - iter 3110/3110 - loss 0.16793082 - time (sec): 3141.98 - samples/sec: 11747.35 - lr: 0.100000 - momentum: 0.000000
2026-02-10 07:33:56,344 ----------------------------------------------------------------------------------------------------
2026-02-10 07:33:56,344 EPOCH 10 done: loss 0.1679 - lr: 0.100000
2026-02-10 07:33:56,344 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-10 07:47:36,243 DEV : loss 0.10839281976222992 - f1-score (micro avg) 0.9568
2026-02-10 07:48:26,863 - 0 epochs without improvement
2026-02-10 07:48:26,864 saving best model
2026-02-10 07:48:27,138 ----------------------------------------------------------------------------------------------------
2026-02-10 07:53:49,401 epoch 11 - iter 311/3110 - loss 0.16592743 - time (sec): 322.26 - samples/sec: 11474.26 - lr: 0.100000 - momentum: 0.000000
2026-02-10 07:58:58,872 epoch 11 - iter 622/3110 - loss 0.16582946 - time (sec): 631.73 - samples/sec: 11698.36 - lr: 0.100000 - momentum: 0.000000
2026-02-10 08:04:11,364 epoch 11 - iter 933/3110 - loss 0.16579006 - time (sec): 944.23 - samples/sec: 11723.58 - lr: 0.100000 - momentum: 0.000000
2026-02-10 08:09:23,614 epoch 11 - iter 1244/3110 - loss 0.16562564 - time (sec): 1256.48 - samples/sec: 11749.98 - lr: 0.100000 - momentum: 0.000000
2026-02-10 08:14:33,963 epoch 11 - iter 1555/3110 - loss 0.16549630 - time (sec): 1566.83 - samples/sec: 11765.86 - lr: 0.100000 - momentum: 0.000000
2026-02-10 08:19:46,289 epoch 11 - iter 1866/3110 - loss 0.16541166 - time (sec): 1879.15 - samples/sec: 11773.80 - lr: 0.100000 - momentum: 0.000000
2026-02-10 08:24:55,577 epoch 11 - iter 2177/3110 - loss 0.16540584 - time (sec): 2188.44 - samples/sec: 11796.77 - lr: 0.100000 - momentum: 0.000000
2026-02-10 08:30:05,804 epoch 11 - iter 2488/3110 - loss 0.16537389 - time (sec): 2498.67 - samples/sec: 11811.65 - lr: 0.100000 - momentum: 0.000000
2026-02-10 08:35:16,994 epoch 11 - iter 2799/3110 - loss 0.16532171 - time (sec): 2809.86 - samples/sec: 11820.56 - lr: 0.100000 - momentum: 0.000000
2026-02-10 08:40:32,986 epoch 11 - iter 3110/3110 - loss 0.16527514 - time (sec): 3125.85 - samples/sec: 11807.96 - lr: 0.100000 - momentum: 0.000000
2026-02-10 08:40:32,986 ----------------------------------------------------------------------------------------------------
2026-02-10 08:40:32,986 EPOCH 11 done: loss 0.1653 - lr: 0.100000
2026-02-10 08:40:32,986 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-10 08:54:16,469 DEV : loss 0.10685806721448898 - f1-score (micro avg) 0.9573
2026-02-10 08:55:07,681 - 0 epochs without improvement
2026-02-10 08:55:07,681 saving best model
2026-02-10 08:55:07,953 ----------------------------------------------------------------------------------------------------
2026-02-10 09:00:27,565 epoch 12 - iter 311/3110 - loss 0.16328614 - time (sec): 319.61 - samples/sec: 11533.84 - lr: 0.100000 - momentum: 0.000000
2026-02-10 09:05:40,794 epoch 12 - iter 622/3110 - loss 0.16342406 - time (sec): 632.84 - samples/sec: 11642.47 - lr: 0.100000 - momentum: 0.000000
2026-02-10 09:10:55,803 epoch 12 - iter 933/3110 - loss 0.16349449 - time (sec): 947.85 - samples/sec: 11671.60 - lr: 0.100000 - momentum: 0.000000
2026-02-10 09:16:12,258 epoch 12 - iter 1244/3110 - loss 0.16346084 - time (sec): 1264.31 - samples/sec: 11673.56 - lr: 0.100000 - momentum: 0.000000
2026-02-10 09:21:22,127 epoch 12 - iter 1555/3110 - loss 0.16335801 - time (sec): 1574.17 - samples/sec: 11713.04 - lr: 0.100000 - momentum: 0.000000
2026-02-10 09:26:32,144 epoch 12 - iter 1866/3110 - loss 0.16326496 - time (sec): 1884.19 - samples/sec: 11742.48 - lr: 0.100000 - momentum: 0.000000
2026-02-10 09:31:46,688 epoch 12 - iter 2177/3110 - loss 0.16312106 - time (sec): 2198.74 - samples/sec: 11745.23 - lr: 0.100000 - momentum: 0.000000
2026-02-10 09:37:00,662 epoch 12 - iter 2488/3110 - loss 0.16306929 - time (sec): 2512.71 - samples/sec: 11748.20 - lr: 0.100000 - momentum: 0.000000
2026-02-10 09:42:15,995 epoch 12 - iter 2799/3110 - loss 0.16300123 - time (sec): 2828.04 - samples/sec: 11747.51 - lr: 0.100000 - momentum: 0.000000
2026-02-10 09:47:29,369 epoch 12 - iter 3110/3110 - loss 0.16296335 - time (sec): 3141.42 - samples/sec: 11749.45 - lr: 0.100000 - momentum: 0.000000
2026-02-10 09:47:29,369 ----------------------------------------------------------------------------------------------------
2026-02-10 09:47:29,369 EPOCH 12 done: loss 0.1630 - lr: 0.100000
2026-02-10 09:47:29,369 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-10 10:01:10,196 DEV : loss 0.1056423932313919 - f1-score (micro avg) 0.9577
2026-02-10 10:02:06,666 - 0 epochs without improvement
2026-02-10 10:02:06,667 saving best model
2026-02-10 10:02:06,942 ----------------------------------------------------------------------------------------------------
2026-02-10 10:07:24,444 epoch 13 - iter 311/3110 - loss 0.16116292 - time (sec): 317.50 - samples/sec: 11620.61 - lr: 0.100000 - momentum: 0.000000
2026-02-10 10:12:36,255 epoch 13 - iter 622/3110 - loss 0.16130133 - time (sec): 629.31 - samples/sec: 11719.52 - lr: 0.100000 - momentum: 0.000000
2026-02-10 10:17:50,537 epoch 13 - iter 933/3110 - loss 0.16123996 - time (sec): 943.59 - samples/sec: 11721.13 - lr: 0.100000 - momentum: 0.000000
2026-02-10 10:23:00,893 epoch 13 - iter 1244/3110 - loss 0.16119485 - time (sec): 1253.95 - samples/sec: 11763.10 - lr: 0.100000 - momentum: 0.000000
2026-02-10 10:28:14,450 epoch 13 - iter 1555/3110 - loss 0.16113836 - time (sec): 1567.51 - samples/sec: 11778.32 - lr: 0.100000 - momentum: 0.000000
2026-02-10 10:33:26,047 epoch 13 - iter 1866/3110 - loss 0.16105785 - time (sec): 1879.10 - samples/sec: 11788.86 - lr: 0.100000 - momentum: 0.000000
2026-02-10 10:38:43,781 epoch 13 - iter 2177/3110 - loss 0.16096505 - time (sec): 2196.84 - samples/sec: 11768.85 - lr: 0.100000 - momentum: 0.000000
2026-02-10 10:43:52,826 epoch 13 - iter 2488/3110 - loss 0.16091201 - time (sec): 2505.88 - samples/sec: 11790.07 - lr: 0.100000 - momentum: 0.000000
2026-02-10 10:49:01,924 epoch 13 - iter 2799/3110 - loss 0.16080766 - time (sec): 2814.98 - samples/sec: 11800.47 - lr: 0.100000 - momentum: 0.000000
2026-02-10 10:54:13,847 epoch 13 - iter 3110/3110 - loss 0.16085373 - time (sec): 3126.90 - samples/sec: 11803.98 - lr: 0.100000 - momentum: 0.000000
2026-02-10 10:54:13,847 ----------------------------------------------------------------------------------------------------
2026-02-10 10:54:13,847 EPOCH 13 done: loss 0.1609 - lr: 0.100000
2026-02-10 10:54:13,847 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-10 11:08:00,321 DEV : loss 0.10403402149677277 - f1-score (micro avg) 0.9582
2026-02-10 11:08:46,444 - 0 epochs without improvement
2026-02-10 11:08:46,445 saving best model
2026-02-10 11:08:46,719 ----------------------------------------------------------------------------------------------------
2026-02-10 11:14:11,567 epoch 14 - iter 311/3110 - loss 0.15899880 - time (sec): 324.85 - samples/sec: 11346.85 - lr: 0.100000 - momentum: 0.000000
2026-02-10 11:19:26,723 epoch 14 - iter 622/3110 - loss 0.15903770 - time (sec): 640.00 - samples/sec: 11525.21 - lr: 0.100000 - momentum: 0.000000
2026-02-10 11:24:40,356 epoch 14 - iter 933/3110 - loss 0.15912954 - time (sec): 953.64 - samples/sec: 11621.99 - lr: 0.100000 - momentum: 0.000000
2026-02-10 11:29:54,366 epoch 14 - iter 1244/3110 - loss 0.15920298 - time (sec): 1267.65 - samples/sec: 11646.64 - lr: 0.100000 - momentum: 0.000000
2026-02-10 11:35:05,167 epoch 14 - iter 1555/3110 - loss 0.15930740 - time (sec): 1578.45 - samples/sec: 11690.72 - lr: 0.100000 - momentum: 0.000000
2026-02-10 11:40:18,442 epoch 14 - iter 1866/3110 - loss 0.15922501 - time (sec): 1891.72 - samples/sec: 11708.16 - lr: 0.100000 - momentum: 0.000000
2026-02-10 11:45:27,927 epoch 14 - iter 2177/3110 - loss 0.15914178 - time (sec): 2201.21 - samples/sec: 11729.69 - lr: 0.100000 - momentum: 0.000000
2026-02-10 11:50:40,596 epoch 14 - iter 2488/3110 - loss 0.15904949 - time (sec): 2513.88 - samples/sec: 11742.39 - lr: 0.100000 - momentum: 0.000000
2026-02-10 11:55:51,043 epoch 14 - iter 2799/3110 - loss 0.15898782 - time (sec): 2824.32 - samples/sec: 11756.05 - lr: 0.100000 - momentum: 0.000000
2026-02-10 12:01:06,539 epoch 14 - iter 3110/3110 - loss 0.15897226 - time (sec): 3139.82 - samples/sec: 11755.42 - lr: 0.100000 - momentum: 0.000000
2026-02-10 12:01:06,539 ----------------------------------------------------------------------------------------------------
2026-02-10 12:01:06,539 EPOCH 14 done: loss 0.1590 - lr: 0.100000
2026-02-10 12:01:06,539 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-10 12:14:49,377 DEV : loss 0.10377830266952515 - f1-score (micro avg) 0.9583
2026-02-10 12:15:48,667 - 0 epochs without improvement
2026-02-10 12:15:48,667 saving best model
2026-02-10 12:15:48,924 ----------------------------------------------------------------------------------------------------
2026-02-10 12:21:10,377 epoch 15 - iter 311/3110 - loss 0.15731530 - time (sec): 321.45 - samples/sec: 11514.67 - lr: 0.100000 - momentum: 0.000000
2026-02-10 12:26:20,264 epoch 15 - iter 622/3110 - loss 0.15724833 - time (sec): 631.34 - samples/sec: 11717.14 - lr: 0.100000 - momentum: 0.000000
2026-02-10 12:31:32,015 epoch 15 - iter 933/3110 - loss 0.15715091 - time (sec): 943.09 - samples/sec: 11753.81 - lr: 0.100000 - momentum: 0.000000
2026-02-10 12:36:45,077 epoch 15 - iter 1244/3110 - loss 0.15731420 - time (sec): 1256.15 - samples/sec: 11771.52 - lr: 0.100000 - momentum: 0.000000
2026-02-10 12:41:58,527 epoch 15 - iter 1555/3110 - loss 0.15729791 - time (sec): 1569.60 - samples/sec: 11772.94 - lr: 0.100000 - momentum: 0.000000
2026-02-10 12:47:12,347 epoch 15 - iter 1866/3110 - loss 0.15727648 - time (sec): 1883.42 - samples/sec: 11772.49 - lr: 0.100000 - momentum: 0.000000
2026-02-10 12:52:23,659 epoch 15 - iter 2177/3110 - loss 0.15730041 - time (sec): 2194.73 - samples/sec: 11785.18 - lr: 0.100000 - momentum: 0.000000
2026-02-10 12:57:35,305 epoch 15 - iter 2488/3110 - loss 0.15723497 - time (sec): 2506.38 - samples/sec: 11785.23 - lr: 0.100000 - momentum: 0.000000
2026-02-10 13:02:47,917 epoch 15 - iter 2799/3110 - loss 0.15717980 - time (sec): 2818.99 - samples/sec: 11785.61 - lr: 0.100000 - momentum: 0.000000
2026-02-10 13:08:00,591 epoch 15 - iter 3110/3110 - loss 0.15714437 - time (sec): 3131.67 - samples/sec: 11786.02 - lr: 0.100000 - momentum: 0.000000
2026-02-10 13:08:00,592 ----------------------------------------------------------------------------------------------------
2026-02-10 13:08:00,592 EPOCH 15 done: loss 0.1571 - lr: 0.100000
2026-02-10 13:08:00,592 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-10 13:21:39,506 DEV : loss 0.10233993083238602 - f1-score (micro avg) 0.9589
2026-02-10 13:22:36,304 - 0 epochs without improvement
2026-02-10 13:22:36,304 saving best model
2026-02-10 13:22:36,812 ----------------------------------------------------------------------------------------------------
2026-02-10 13:22:36,812 Loading model from best epoch ...
2026-02-10 13:22:37,687 SequenceTagger predicts: Dictionary with 17 tags: ADV, CCONJ, ADJ, NOUN, VERB, ADP, PUNCT, NUM, PRON, PROPN, FM, PART, ORD, ITJ, X, <START>, <STOP>
2026-02-10 13:34:49,261
Results:
- F-score (micro) 0.9588
- F-score (macro) 0.9397
- Accuracy 0.9588
By class:
              precision    recall  f1-score   support

        NOUN     0.9444    0.9480    0.9462   1036164
       PUNCT     0.9999    1.0000    1.0000    831460
        VERB     0.9657    0.9465    0.9560    810899
       CCONJ     0.9833    0.9920    0.9877    463354
        PRON     0.9657    0.9631    0.9644    405738
         ADP     0.9786    0.9886    0.9835    296947
         ADV     0.9300    0.9264    0.9282    285781
         ADJ     0.8347    0.8443    0.8395    273219
       PROPN     0.9428    0.9623    0.9525    128068
         NUM     0.9771    0.9913    0.9842     58389
         ORD     0.8362    0.9223    0.8771      8534
         ITJ     0.9088    0.8821    0.8953      4554
        PART     0.9509    0.9307    0.9407      3202
          FM     0.9226    0.8804    0.9010      2491

    accuracy                         0.9588   4608800
   macro avg     0.9386    0.9413    0.9397   4608800
weighted avg     0.9589    0.9588    0.9588   4608800
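The summary rows of the final report are consistent with the per-class figures; a quick recomputation (f1 and support values copied from the table above):

```python
# Per-class f1 scores and supports, in the order listed in the report.
f1 = [0.9462, 1.0000, 0.9560, 0.9877, 0.9644, 0.9835, 0.9282,
      0.8395, 0.9525, 0.9842, 0.8771, 0.8953, 0.9407, 0.9010]
support = [1036164, 831460, 810899, 463354, 405738, 296947, 285781,
           273219, 128068, 58389, 8534, 4554, 3202, 2491]

# Macro f1 is the unweighted mean over the 14 classes; total support
# is the test-set token count.
macro_f1 = sum(f1) / len(f1)
assert round(macro_f1, 4) == 0.9397     # matches the "macro avg" row
assert sum(support) == 4_608_800        # matches the support column total
print(round(macro_f1, 4), sum(support))
```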
2026-02-10 13:34:49,261 ----------------------------------------------------------------------------------------------------