2026-02-09 20:42:00,363 ----------------------------------------------------------------------------------------------------
2026-02-09 20:42:00,364 Model: "SequenceTagger(
  (embeddings): StackedEmbeddings(
    (list_embedding_0): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.1, inplace=False)
        (encoder): Embedding(333, 200)
        (rnn): LSTM(200, 2048, num_layers=2, dropout=0.1)
      )
    )
    (list_embedding_1): FlairEmbeddings(
      (lm): LanguageModel(
        (drop): Dropout(p=0.1, inplace=False)
        (encoder): Embedding(333, 200)
        (rnn): LSTM(200, 2048, num_layers=2, dropout=0.1)
      )
    )
  )
  (dropout): Dropout(p=0.2, inplace=False)
  (word_dropout): WordDropout(p=0.1)
  (locked_dropout): LockedDropout(p=0.5)
  (embedding2nn): Linear(in_features=4096, out_features=4096, bias=True)
  (rnn): LSTM(4096, 1024, num_layers=2, batch_first=True, dropout=0.5, bidirectional=True)
  (linear): Linear(in_features=2048, out_features=17, bias=True)
  (loss_function): ViterbiLoss()
  (crf): CRF()
)"
2026-02-09 20:42:00,364 ----------------------------------------------------------------------------------------------------
2026-02-09 20:42:00,364 Corpus: 1592291 train + 199036 dev + 199037 test sentences
2026-02-09 20:42:00,364 ----------------------------------------------------------------------------------------------------
2026-02-09 20:42:00,364 Train: 1592291 sentences
2026-02-09 20:42:00,364         (train_with_dev=False, train_with_test=False)
2026-02-09 20:42:00,364 ----------------------------------------------------------------------------------------------------
2026-02-09 20:42:00,364 Training Params:
2026-02-09 20:42:00,364  - learning_rate: "0.1"
2026-02-09 20:42:00,364  - mini_batch_size: "512"
2026-02-09 20:42:00,364  - max_epochs: "15"
2026-02-09 20:42:00,364  - shuffle: "True"
2026-02-09 20:42:00,364 ----------------------------------------------------------------------------------------------------
2026-02-09 20:42:00,364 Plugins:
2026-02-09 20:42:00,364  - AnnealOnPlateau | patience: '3', anneal_factor: '0.5', min_learning_rate: '0.0001'
2026-02-09 20:42:00,364 ----------------------------------------------------------------------------------------------------
2026-02-09 20:42:00,364 Final evaluation on model from best epoch (best-model.pt)
2026-02-09 20:42:00,364  - metric: "('micro avg', 'f1-score')"
2026-02-09 20:42:00,364 ----------------------------------------------------------------------------------------------------
2026-02-09 20:42:00,364 Computation:
2026-02-09 20:42:00,364  - compute on device: cuda:0
2026-02-09 20:42:00,364  - embedding storage: none
2026-02-09 20:42:00,364 ----------------------------------------------------------------------------------------------------
2026-02-09 20:42:00,364 Model training base path: "latin-pos-blackwell-512-new"
2026-02-09 20:42:00,364 ----------------------------------------------------------------------------------------------------
2026-02-09 20:42:00,364 ----------------------------------------------------------------------------------------------------
2026-02-09 20:47:11,561 epoch 1 - iter 311/3110 - loss 0.97672627 - time (sec): 311.20 - samples/sec: 11881.68 - lr: 0.100000 - momentum: 0.000000
2026-02-09 20:52:22,878 epoch 1 - iter 622/3110 - loss 0.70472424 - time (sec): 622.51 - samples/sec: 11871.10 - lr: 0.100000 - momentum: 0.000000
2026-02-09 20:57:35,659 epoch 1 - iter 933/3110 - loss 0.59478437 - time (sec): 935.29 - samples/sec: 11855.27 - lr: 0.100000 - momentum: 0.000000
2026-02-09 21:02:46,489 epoch 1 - iter 1244/3110 - loss 0.53187816 - time (sec): 1246.12 - samples/sec: 11869.56 - lr: 0.100000 - momentum: 0.000000
2026-02-09 21:08:00,713 epoch 1 - iter 1555/3110 - loss 0.48977648 - time (sec): 1560.35 - samples/sec: 11837.67 - lr: 0.100000 - momentum: 0.000000
2026-02-09 21:13:11,499 epoch 1 - iter 1866/3110 - loss 0.45886390 - time (sec): 1871.13 - samples/sec: 11846.60 - lr: 0.100000 - momentum: 0.000000
2026-02-09 21:18:21,879 epoch 1 - iter 2177/3110 - loss 0.43504595 - time (sec): 2181.52 - samples/sec: 11843.92 - lr: 0.100000 - momentum: 0.000000
2026-02-09 21:23:33,567 epoch 1 - iter 2488/3110 - loss 0.41570566 - time (sec): 2493.20 - samples/sec: 11841.09 - lr: 0.100000 - momentum: 0.000000
2026-02-09 21:28:50,054 epoch 1 - iter 2799/3110 - loss 0.39964918 - time (sec): 2809.69 - samples/sec: 11827.10 - lr: 0.100000 - momentum: 0.000000
2026-02-09 21:33:58,104 epoch 1 - iter 3110/3110 - loss 0.38613944 - time (sec): 3117.74 - samples/sec: 11838.67 - lr: 0.100000 - momentum: 0.000000
2026-02-09 21:33:58,104 ----------------------------------------------------------------------------------------------------
2026-02-09 21:33:58,104 EPOCH 1 done: loss 0.3861 - lr: 0.100000
2026-02-09 21:33:58,104 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-09 21:47:35,428 DEV : loss 0.17296071350574493 - f1-score (micro avg)  0.9366
2026-02-09 21:48:20,010  - 0 epochs without improvement
2026-02-09 21:48:20,010 saving best model
2026-02-09 21:48:20,286 ----------------------------------------------------------------------------------------------------
2026-02-09 21:53:46,298 epoch 2 - iter 311/3110 - loss 0.25769796 - time (sec): 326.01 - samples/sec: 11333.07 - lr: 0.100000 - momentum: 0.000000
2026-02-09 21:58:56,575 epoch 2 - iter 622/3110 - loss 0.25489217 - time (sec): 636.29 - samples/sec: 11616.92 - lr: 0.100000 - momentum: 0.000000
2026-02-09 22:04:09,970 epoch 2 - iter 933/3110 - loss 0.25238585 - time (sec): 949.68 - samples/sec: 11669.93 - lr: 0.100000 - momentum: 0.000000
2026-02-09 22:09:22,122 epoch 2 - iter 1244/3110 - loss 0.25013598 - time (sec): 1261.84 - samples/sec: 11705.91 - lr: 0.100000 - momentum: 0.000000
2026-02-09 22:14:33,607 epoch 2 - iter 1555/3110 - loss 0.24776145 - time (sec): 1573.32 - samples/sec: 11729.66 - lr: 0.100000 - momentum: 0.000000
2026-02-09 22:19:41,650 epoch 2 - iter 1866/3110 - loss 0.24570752 - time (sec): 1881.36 - samples/sec: 11770.93 - lr: 0.100000 - momentum: 0.000000
2026-02-09 22:24:52,242 epoch 2 - iter 2177/3110 - loss 0.24372514 - time (sec): 2191.96 - samples/sec: 11789.96 - lr: 0.100000 - momentum: 0.000000
2026-02-09 22:30:04,220 epoch 2 - iter 2488/3110 - loss 0.24196210 - time (sec): 2503.93 - samples/sec: 11794.32 - lr: 0.100000 - momentum: 0.000000
2026-02-09 22:35:16,579 epoch 2 - iter 2799/3110 - loss 0.24030423 - time (sec): 2816.29 - samples/sec: 11797.03 - lr: 0.100000 - momentum: 0.000000
2026-02-09 22:40:28,518 epoch 2 - iter 3110/3110 - loss 0.23866268 - time (sec): 3128.23 - samples/sec: 11798.97 - lr: 0.100000 - momentum: 0.000000
2026-02-09 22:40:28,518 ----------------------------------------------------------------------------------------------------
2026-02-09 22:40:28,518 EPOCH 2 done: loss 0.2387 - lr: 0.100000
2026-02-09 22:40:28,518 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-09 22:54:09,533 DEV : loss 0.1449769288301468 - f1-score (micro avg)  0.9447
2026-02-09 22:54:54,577  - 0 epochs without improvement
2026-02-09 22:54:54,577 saving best model
2026-02-09 22:54:54,852 ----------------------------------------------------------------------------------------------------
2026-02-09 23:00:20,296 epoch 3 - iter 311/3110 - loss 0.22093191 - time (sec): 325.44 - samples/sec: 11345.93 - lr: 0.100000 - momentum: 0.000000
2026-02-09 23:05:31,225 epoch 3 - iter 622/3110 - loss 0.22001811 - time (sec): 636.37 - samples/sec: 11611.76 - lr: 0.100000 - momentum: 0.000000
2026-02-09 23:10:46,575 epoch 3 - iter 933/3110 - loss 0.21896386 - time (sec): 951.72 - samples/sec: 11648.39 - lr: 0.100000 - momentum: 0.000000
2026-02-09 23:16:02,268 epoch 3 - iter 1244/3110 - loss 0.21796798 - time (sec): 1267.42 - samples/sec: 11667.14 - lr: 0.100000 - momentum: 0.000000
2026-02-09 23:21:11,844 epoch 3 - iter 1555/3110 - loss 0.21701712 - time (sec): 1576.99 - samples/sec: 11713.71 - lr: 0.100000 - momentum: 0.000000
2026-02-09 23:26:20,591 epoch 3 - iter 1866/3110 - loss 0.21621540 - time (sec): 1885.74 - samples/sec: 11747.47 - lr: 0.100000 - momentum: 0.000000
2026-02-09 23:31:30,262 epoch 3 - iter 2177/3110 - loss 0.21539053 - time (sec): 2195.41 - samples/sec: 11765.20 - lr: 0.100000 - momentum: 0.000000
2026-02-09 23:36:46,017 epoch 3 - iter 2488/3110 - loss 0.21459263 - time (sec): 2511.16 - samples/sec: 11760.33 - lr: 0.100000 - momentum: 0.000000
2026-02-09 23:41:54,799 epoch 3 - iter 2799/3110 - loss 0.21386919 - time (sec): 2819.95 - samples/sec: 11781.51 - lr: 0.100000 - momentum: 0.000000
2026-02-09 23:47:05,566 epoch 3 - iter 3110/3110 - loss 0.21311649 - time (sec): 3130.71 - samples/sec: 11789.61 - lr: 0.100000 - momentum: 0.000000
2026-02-09 23:47:05,566 ----------------------------------------------------------------------------------------------------
2026-02-09 23:47:05,566 EPOCH 3 done: loss 0.2131 - lr: 0.100000
2026-02-09 23:47:05,566 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-10 00:00:44,701 DEV : loss 0.13400429487228394 - f1-score (micro avg)  0.9482
2026-02-10 00:01:40,578  - 0 epochs without improvement
2026-02-10 00:01:40,578 saving best model
2026-02-10 00:01:40,862 ----------------------------------------------------------------------------------------------------
2026-02-10 00:07:00,292 epoch 4 - iter 311/3110 - loss 0.20359245 - time (sec): 319.43 - samples/sec: 11518.83 - lr: 0.100000 - momentum: 0.000000
2026-02-10 00:12:10,494 epoch 4 - iter 622/3110 - loss 0.20327470 - time (sec): 629.63 - samples/sec: 11712.32 - lr: 0.100000 - momentum: 0.000000
2026-02-10 00:17:21,319 epoch 4 - iter 933/3110 - loss 0.20288992 - time (sec): 940.46 - samples/sec: 11766.83 - lr: 0.100000 - momentum: 0.000000
2026-02-10 00:22:36,737 epoch 4 - iter 1244/3110 - loss 0.20251141 - time (sec): 1255.87 - samples/sec: 11757.95 - lr: 0.100000 - momentum: 0.000000
2026-02-10 00:27:49,894 epoch 4 - iter 1555/3110 - loss 0.20204748 - time (sec): 1569.03 - samples/sec: 11762.39 - lr: 0.100000 - momentum: 0.000000
2026-02-10 00:33:04,063 epoch 4 - iter 1866/3110 - loss 0.20165710 - time (sec): 1883.20 - samples/sec: 11763.49 - lr: 0.100000 - momentum: 0.000000
2026-02-10 00:38:13,798 epoch 4 - iter 2177/3110 - loss 0.20111870 - time (sec): 2192.94 - samples/sec: 11785.72 - lr: 0.100000 - momentum: 0.000000
2026-02-10 00:43:25,465 epoch 4 - iter 2488/3110 - loss 0.20057669 - time (sec): 2504.60 - samples/sec: 11792.44 - lr: 0.100000 - momentum: 0.000000
2026-02-10 00:48:37,475 epoch 4 - iter 2799/3110 - loss 0.20007800 - time (sec): 2816.61 - samples/sec: 11795.80 - lr: 0.100000 - momentum: 0.000000
2026-02-10 00:53:46,575 epoch 4 - iter 3110/3110 - loss 0.19957406 - time (sec): 3125.71 - samples/sec: 11808.47 - lr: 0.100000 - momentum: 0.000000
2026-02-10 00:53:46,575 ----------------------------------------------------------------------------------------------------
2026-02-10 00:53:46,575 EPOCH 4 done: loss 0.1996 - lr: 0.100000
2026-02-10 00:53:46,575 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-10 01:07:26,758 DEV : loss 0.12668992578983307 - f1-score (micro avg)  0.9506
2026-02-10 01:08:13,431  - 0 epochs without improvement
2026-02-10 01:08:13,431 saving best model
2026-02-10 01:08:13,705 ----------------------------------------------------------------------------------------------------
2026-02-10 01:13:36,317 epoch 5 - iter 311/3110 - loss 0.19385727 - time (sec): 322.61 - samples/sec: 11418.49 - lr: 0.100000 - momentum: 0.000000
2026-02-10 01:18:47,383 epoch 5 - iter 622/3110 - loss 0.19323855 - time (sec): 633.68 - samples/sec: 11642.29 - lr: 0.100000 - momentum: 0.000000
2026-02-10 01:23:57,285 epoch 5 - iter 933/3110 - loss 0.19286212 - time (sec): 943.58 - samples/sec: 11732.12 - lr: 0.100000 - momentum: 0.000000
2026-02-10 01:29:15,071 epoch 5 - iter 1244/3110 - loss 0.19249539 - time (sec): 1261.37 - samples/sec: 11714.06 - lr: 0.100000 - momentum: 0.000000
2026-02-10 01:34:25,766 epoch 5 - iter 1555/3110 - loss 0.19211545 - time (sec): 1572.06 - samples/sec: 11738.15 - lr: 0.100000 - momentum: 0.000000
2026-02-10 01:39:40,304 epoch 5 - iter 1866/3110 - loss 0.19183846 - time (sec): 1886.60 - samples/sec: 11741.42 - lr: 0.100000 - momentum: 0.000000
2026-02-10 01:44:47,347 epoch 5 - iter 2177/3110 - loss 0.19148548 - time (sec): 2193.64 - samples/sec: 11781.54 - lr: 0.100000 - momentum: 0.000000
2026-02-10 01:49:58,878 epoch 5 - iter 2488/3110 - loss 0.19117364 - time (sec): 2505.17 - samples/sec: 11790.14 - lr: 0.100000 - momentum: 0.000000
2026-02-10 01:55:10,571 epoch 5 - iter 2799/3110 - loss 0.19085881 - time (sec): 2816.87 - samples/sec: 11793.74 - lr: 0.100000 - momentum: 0.000000
2026-02-10 02:00:23,934 epoch 5 - iter 3110/3110 - loss 0.19054709 - time (sec): 3130.23 - samples/sec: 11791.44 - lr: 0.100000 - momentum: 0.000000
2026-02-10 02:00:23,934 ----------------------------------------------------------------------------------------------------
2026-02-10 02:00:23,934 EPOCH 5 done: loss 0.1905 - lr: 0.100000
2026-02-10 02:00:23,934 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-10 02:14:04,793 DEV : loss 0.12145848572254181 - f1-score (micro avg)  0.9522
2026-02-10 02:14:55,349  - 0 epochs without improvement
2026-02-10 02:14:55,350 saving best model
2026-02-10 02:14:55,628 ----------------------------------------------------------------------------------------------------
2026-02-10 02:20:15,323 epoch 6 - iter 311/3110 - loss 0.18561402 - time (sec): 319.69 - samples/sec: 11542.19 - lr: 0.100000 - momentum: 0.000000
2026-02-10 02:25:24,003 epoch 6 - iter 622/3110 - loss 0.18550625 - time (sec): 628.37 - samples/sec: 11745.30 - lr: 0.100000 - momentum: 0.000000
2026-02-10 02:30:35,998 epoch 6 - iter 933/3110 - loss 0.18540523 - time (sec): 940.37 - samples/sec: 11773.06 - lr: 0.100000 - momentum: 0.000000
2026-02-10 02:35:48,422 epoch 6 - iter 1244/3110 - loss 0.18517698 - time (sec): 1252.79 - samples/sec: 11782.44 - lr: 0.100000 - momentum: 0.000000
2026-02-10 02:40:56,350 epoch 6 - iter 1555/3110 - loss 0.18483989 - time (sec): 1560.72 - samples/sec: 11813.85 - lr: 0.100000 - momentum: 0.000000
2026-02-10 02:46:07,831 epoch 6 - iter 1866/3110 - loss 0.18454884 - time (sec): 1872.20 - samples/sec: 11824.00 - lr: 0.100000 - momentum: 0.000000
2026-02-10 02:51:21,389 epoch 6 - iter 2177/3110 - loss 0.18438092 - time (sec): 2185.76 - samples/sec: 11821.81 - lr: 0.100000 - momentum: 0.000000
2026-02-10 02:56:34,446 epoch 6 - iter 2488/3110 - loss 0.18417804 - time (sec): 2498.82 - samples/sec: 11818.73 - lr: 0.100000 - momentum: 0.000000
2026-02-10 03:01:46,304 epoch 6 - iter 2799/3110 - loss 0.18399157 - time (sec): 2810.68 - samples/sec: 11818.95 - lr: 0.100000 - momentum: 0.000000
2026-02-10 03:06:59,499 epoch 6 - iter 3110/3110 - loss 0.18381043 - time (sec): 3123.87 - samples/sec: 11815.44 - lr: 0.100000 - momentum: 0.000000
2026-02-10 03:06:59,500 ----------------------------------------------------------------------------------------------------
2026-02-10 03:06:59,500 EPOCH 6 done: loss 0.1838 - lr: 0.100000
2026-02-10 03:06:59,500 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-10 03:20:42,924 DEV : loss 0.11777843534946442 - f1-score (micro avg)  0.9534
2026-02-10 03:21:28,115  - 0 epochs without improvement
2026-02-10 03:21:28,116 saving best model
2026-02-10 03:21:28,379 ----------------------------------------------------------------------------------------------------
2026-02-10 03:26:48,806 epoch 7 - iter 311/3110 - loss 0.18054505 - time (sec): 320.43 - samples/sec: 11456.26 - lr: 0.100000 - momentum: 0.000000
2026-02-10 03:32:01,235 epoch 7 - iter 622/3110 - loss 0.18027499 - time (sec): 632.86 - samples/sec: 11629.08 - lr: 0.100000 - momentum: 0.000000
2026-02-10 03:37:17,687 epoch 7 - iter 933/3110 - loss 0.17997007 - time (sec): 949.31 - samples/sec: 11649.51 - lr: 0.100000 - momentum: 0.000000
2026-02-10 03:42:34,701 epoch 7 - iter 1244/3110 - loss 0.17983657 - time (sec): 1266.32 - samples/sec: 11671.37 - lr: 0.100000 - momentum: 0.000000
2026-02-10 03:47:43,463 epoch 7 - iter 1555/3110 - loss 0.17955909 - time (sec): 1575.08 - samples/sec: 11721.52 - lr: 0.100000 - momentum: 0.000000
2026-02-10 03:52:55,091 epoch 7 - iter 1866/3110 - loss 0.17931778 - time (sec): 1886.71 - samples/sec: 11742.29 - lr: 0.100000 - momentum: 0.000000
2026-02-10 03:58:06,861 epoch 7 - iter 2177/3110 - loss 0.17911377 - time (sec): 2198.48 - samples/sec: 11752.62 - lr: 0.100000 - momentum: 0.000000
2026-02-10 04:03:16,095 epoch 7 - iter 2488/3110 - loss 0.17888264 - time (sec): 2507.72 - samples/sec: 11772.49 - lr: 0.100000 - momentum: 0.000000
2026-02-10 04:08:29,847 epoch 7 - iter 2799/3110 - loss 0.17878271 - time (sec): 2821.47 - samples/sec: 11773.82 - lr: 0.100000 - momentum: 0.000000
2026-02-10 04:13:41,547 epoch 7 - iter 3110/3110 - loss 0.17861219 - time (sec): 3133.17 - samples/sec: 11780.38 - lr: 0.100000 - momentum: 0.000000
2026-02-10 04:13:41,547 ----------------------------------------------------------------------------------------------------
2026-02-10 04:13:41,547 EPOCH 7 done: loss 0.1786 - lr: 0.100000
2026-02-10 04:13:41,547 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-10 04:27:21,650 DEV : loss 0.11466693878173828 - f1-score (micro avg)  0.9546
2026-02-10 04:28:06,372  - 0 epochs without improvement
2026-02-10 04:28:06,373 saving best model
2026-02-10 04:28:06,639 ----------------------------------------------------------------------------------------------------
2026-02-10 04:33:30,503 epoch 8 - iter 311/3110 - loss 0.17535825 - time (sec): 323.86 - samples/sec: 11371.73 - lr: 0.100000 - momentum: 0.000000
2026-02-10 04:38:39,921 epoch 8 - iter 622/3110 - loss 0.17537665 - time (sec): 633.28 - samples/sec: 11652.93 - lr: 0.100000 - momentum: 0.000000
2026-02-10 04:43:53,959 epoch 8 - iter 933/3110 - loss 0.17530236 - time (sec): 947.32 - samples/sec: 11705.31 - lr: 0.100000 - momentum: 0.000000
2026-02-10 04:49:03,766 epoch 8 - iter 1244/3110 - loss 0.17525774 - time (sec): 1257.13 - samples/sec: 11751.52 - lr: 0.100000 - momentum: 0.000000
2026-02-10 04:54:15,092 epoch 8 - iter 1555/3110 - loss 0.17506201 - time (sec): 1568.45 - samples/sec: 11770.88 - lr: 0.100000 - momentum: 0.000000
2026-02-10 04:59:26,328 epoch 8 - iter 1866/3110 - loss 0.17492563 - time (sec): 1879.69 - samples/sec: 11781.87 - lr: 0.100000 - momentum: 0.000000
2026-02-10 05:04:40,427 epoch 8 - iter 2177/3110 - loss 0.17488939 - time (sec): 2193.79 - samples/sec: 11770.90 - lr: 0.100000 - momentum: 0.000000
2026-02-10 05:09:52,553 epoch 8 - iter 2488/3110 - loss 0.17470121 - time (sec): 2505.91 - samples/sec: 11775.16 - lr: 0.100000 - momentum: 0.000000
2026-02-10 05:15:09,166 epoch 8 - iter 2799/3110 - loss 0.17455881 - time (sec): 2822.53 - samples/sec: 11765.78 - lr: 0.100000 - momentum: 0.000000
2026-02-10 05:20:20,533 epoch 8 - iter 3110/3110 - loss 0.17443527 - time (sec): 3133.89 - samples/sec: 11777.65 - lr: 0.100000 - momentum: 0.000000
2026-02-10 05:20:20,533 ----------------------------------------------------------------------------------------------------
2026-02-10 05:20:20,533 EPOCH 8 done: loss 0.1744 - lr: 0.100000
2026-02-10 05:20:20,533 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-10 05:34:01,253 DEV : loss 0.11188799142837524 - f1-score (micro avg)  0.9555
2026-02-10 05:34:51,656  - 0 epochs without improvement
2026-02-10 05:34:51,657 saving best model
2026-02-10 05:34:51,914 ----------------------------------------------------------------------------------------------------
2026-02-10 05:40:11,246 epoch 9 - iter 311/3110 - loss 0.17197210 - time (sec): 319.33 - samples/sec: 11517.43 - lr: 0.100000 - momentum: 0.000000
2026-02-10 05:45:27,710 epoch 9 - iter 622/3110 - loss 0.17201647 - time (sec): 635.79 - samples/sec: 11615.38 - lr: 0.100000 - momentum: 0.000000
2026-02-10 05:50:39,870 epoch 9 - iter 933/3110 - loss 0.17191134 - time (sec): 947.96 - samples/sec: 11681.29 - lr: 0.100000 - momentum: 0.000000
2026-02-10 05:55:54,872 epoch 9 - iter 1244/3110 - loss 0.17178835 - time (sec): 1262.96 - samples/sec: 11689.27 - lr: 0.100000 - momentum: 0.000000
2026-02-10 06:01:06,220 epoch 9 - iter 1555/3110 - loss 0.17177079 - time (sec): 1574.30 - samples/sec: 11721.40 - lr: 0.100000 - momentum: 0.000000
2026-02-10 06:06:18,947 epoch 9 - iter 1866/3110 - loss 0.17155636 - time (sec): 1887.03 - samples/sec: 11739.13 - lr: 0.100000 - momentum: 0.000000
2026-02-10 06:11:30,307 epoch 9 - iter 2177/3110 - loss 0.17136221 - time (sec): 2198.39 - samples/sec: 11749.12 - lr: 0.100000 - momentum: 0.000000
2026-02-10 06:16:43,276 epoch 9 - iter 2488/3110 - loss 0.17123760 - time (sec): 2511.36 - samples/sec: 11754.92 - lr: 0.100000 - momentum: 0.000000
2026-02-10 06:21:57,116 epoch 9 - iter 2799/3110 - loss 0.17111893 - time (sec): 2825.20 - samples/sec: 11760.28 - lr: 0.100000 - momentum: 0.000000
2026-02-10 06:27:07,301 epoch 9 - iter 3110/3110 - loss 0.17094364 - time (sec): 3135.39 - samples/sec: 11772.04 - lr: 0.100000 - momentum: 0.000000
2026-02-10 06:27:07,301 ----------------------------------------------------------------------------------------------------
2026-02-10 06:27:07,301 EPOCH 9 done: loss 0.1709 - lr: 0.100000
2026-02-10 06:27:07,301 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-10 06:40:48,730 DEV : loss 0.11047399044036865 - f1-score (micro avg)  0.9559
2026-02-10 06:41:34,093  - 0 epochs without improvement
2026-02-10 06:41:34,093 saving best model
2026-02-10 06:41:34,366 ----------------------------------------------------------------------------------------------------
2026-02-10 06:46:55,815 epoch 10 - iter 311/3110 - loss 0.16849032 - time (sec): 321.45 - samples/sec: 11479.08 - lr: 0.100000 - momentum: 0.000000
2026-02-10 06:52:07,423 epoch 10 - iter 622/3110 - loss 0.16859678 - time (sec): 633.06 - samples/sec: 11651.64 - lr: 0.100000 - momentum: 0.000000
2026-02-10 06:57:19,799 epoch 10 - iter 933/3110 - loss 0.16850413 - time (sec): 945.43 - samples/sec: 11707.75 - lr: 0.100000 - momentum: 0.000000
2026-02-10 07:02:33,512 epoch 10 - iter 1244/3110 - loss 0.16847262 - time (sec): 1259.15 - samples/sec: 11725.73 - lr: 0.100000 - momentum: 0.000000
2026-02-10 07:07:46,786 epoch 10 - iter 1555/3110 - loss 0.16851147 - time (sec): 1572.42 - samples/sec: 11738.13 - lr: 0.100000 - momentum: 0.000000
2026-02-10 07:12:59,019 epoch 10 - iter 1866/3110 - loss 0.16834712 - time (sec): 1884.65 - samples/sec: 11751.02 - lr: 0.100000 - momentum: 0.000000
2026-02-10 07:18:12,070 epoch 10 - iter 2177/3110 - loss 0.16821811 - time (sec): 2197.70 - samples/sec: 11753.35 - lr: 0.100000 - momentum: 0.000000
2026-02-10 07:23:26,468 epoch 10 - iter 2488/3110 - loss 0.16811111 - time (sec): 2512.10 - samples/sec: 11748.17 - lr: 0.100000 - momentum: 0.000000
2026-02-10 07:28:39,734 epoch 10 - iter 2799/3110 - loss 0.16800832 - time (sec): 2825.37 - samples/sec: 11753.29 - lr: 0.100000 - momentum: 0.000000
2026-02-10 07:33:56,343 epoch 10 - iter 3110/3110 - loss 0.16793082 - time (sec): 3141.98 - samples/sec: 11747.35 - lr: 0.100000 - momentum: 0.000000
2026-02-10 07:33:56,344 ----------------------------------------------------------------------------------------------------
2026-02-10 07:33:56,344 EPOCH 10 done: loss 0.1679 - lr: 0.100000
2026-02-10 07:33:56,344 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-10 07:47:36,243 DEV : loss 0.10839281976222992 - f1-score (micro avg)  0.9568
2026-02-10 07:48:26,863  - 0 epochs without improvement
2026-02-10 07:48:26,864 saving best model
2026-02-10 07:48:27,138 ----------------------------------------------------------------------------------------------------
2026-02-10 07:53:49,401 epoch 11 - iter 311/3110 - loss 0.16592743 - time (sec): 322.26 - samples/sec: 11474.26 - lr: 0.100000 - momentum: 0.000000
2026-02-10 07:58:58,872 epoch 11 - iter 622/3110 - loss 0.16582946 - time (sec): 631.73 - samples/sec: 11698.36 - lr: 0.100000 - momentum: 0.000000
2026-02-10 08:04:11,364 epoch 11 - iter 933/3110 - loss 0.16579006 - time (sec): 944.23 - samples/sec: 11723.58 - lr: 0.100000 - momentum: 0.000000
2026-02-10 08:09:23,614 epoch 11 - iter 1244/3110 - loss 0.16562564 - time (sec): 1256.48 - samples/sec: 11749.98 - lr: 0.100000 - momentum: 0.000000
2026-02-10 08:14:33,963 epoch 11 - iter 1555/3110 - loss 0.16549630 - time (sec): 1566.83 - samples/sec: 11765.86 - lr: 0.100000 - momentum: 0.000000
2026-02-10 08:19:46,289 epoch 11 - iter 1866/3110 - loss 0.16541166 - time (sec): 1879.15 - samples/sec: 11773.80 - lr: 0.100000 - momentum: 0.000000
2026-02-10 08:24:55,577 epoch 11 - iter 2177/3110 - loss 0.16540584 - time (sec): 2188.44 - samples/sec: 11796.77 - lr: 0.100000 - momentum: 0.000000
2026-02-10 08:30:05,804 epoch 11 - iter 2488/3110 - loss 0.16537389 - time (sec): 2498.67 - samples/sec: 11811.65 - lr: 0.100000 - momentum: 0.000000
2026-02-10 08:35:16,994 epoch 11 - iter 2799/3110 - loss 0.16532171 - time (sec): 2809.86 - samples/sec: 11820.56 - lr: 0.100000 - momentum: 0.000000
2026-02-10 08:40:32,986 epoch 11 - iter 3110/3110 - loss 0.16527514 - time (sec): 3125.85 - samples/sec: 11807.96 - lr: 0.100000 - momentum: 0.000000
2026-02-10 08:40:32,986 ----------------------------------------------------------------------------------------------------
2026-02-10 08:40:32,986 EPOCH 11 done: loss 0.1653 - lr: 0.100000
2026-02-10 08:40:32,986 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-10 08:54:16,469 DEV : loss 0.10685806721448898 - f1-score (micro avg)  0.9573
2026-02-10 08:55:07,681  - 0 epochs without improvement
2026-02-10 08:55:07,681 saving best model
2026-02-10 08:55:07,953 ----------------------------------------------------------------------------------------------------
2026-02-10 09:00:27,565 epoch 12 - iter 311/3110 - loss 0.16328614 - time (sec): 319.61 - samples/sec: 11533.84 - lr: 0.100000 - momentum: 0.000000
2026-02-10 09:05:40,794 epoch 12 - iter 622/3110 - loss 0.16342406 - time (sec): 632.84 - samples/sec: 11642.47 - lr: 0.100000 - momentum: 0.000000
2026-02-10 09:10:55,803 epoch 12 - iter 933/3110 - loss 0.16349449 - time (sec): 947.85 - samples/sec: 11671.60 - lr: 0.100000 - momentum: 0.000000
2026-02-10 09:16:12,258 epoch 12 - iter 1244/3110 - loss 0.16346084 - time (sec): 1264.31 - samples/sec: 11673.56 - lr: 0.100000 - momentum: 0.000000
2026-02-10 09:21:22,127 epoch 12 - iter 1555/3110 - loss 0.16335801 - time (sec): 1574.17 - samples/sec: 11713.04 - lr: 0.100000 - momentum: 0.000000
2026-02-10 09:26:32,144 epoch 12 - iter 1866/3110 - loss 0.16326496 - time (sec): 1884.19 - samples/sec: 11742.48 - lr: 0.100000 - momentum: 0.000000
2026-02-10 09:31:46,688 epoch 12 - iter 2177/3110 - loss 0.16312106 - time (sec): 2198.74 - samples/sec: 11745.23 - lr: 0.100000 - momentum: 0.000000
2026-02-10 09:37:00,662 epoch 12 - iter 2488/3110 - loss 0.16306929 - time (sec): 2512.71 - samples/sec: 11748.20 - lr: 0.100000 - momentum: 0.000000
2026-02-10 09:42:15,995 epoch 12 - iter 2799/3110 - loss 0.16300123 - time (sec): 2828.04 - samples/sec: 11747.51 - lr: 0.100000 - momentum: 0.000000
2026-02-10 09:47:29,369 epoch 12 - iter 3110/3110 - loss 0.16296335 - time (sec): 3141.42 - samples/sec: 11749.45 - lr: 0.100000 - momentum: 0.000000
2026-02-10 09:47:29,369 ----------------------------------------------------------------------------------------------------
2026-02-10 09:47:29,369 EPOCH 12 done: loss 0.1630 - lr: 0.100000
2026-02-10 09:47:29,369 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-10 10:01:10,196 DEV : loss 0.1056423932313919 - f1-score (micro avg)  0.9577
2026-02-10 10:02:06,666  - 0 epochs without improvement
2026-02-10 10:02:06,667 saving best model
2026-02-10 10:02:06,942 ----------------------------------------------------------------------------------------------------
2026-02-10 10:07:24,444 epoch 13 - iter 311/3110 - loss 0.16116292 - time (sec): 317.50 - samples/sec: 11620.61 - lr: 0.100000 - momentum: 0.000000
2026-02-10 10:12:36,255 epoch 13 - iter 622/3110 - loss 0.16130133 - time (sec): 629.31 - samples/sec: 11719.52 - lr: 0.100000 - momentum: 0.000000
2026-02-10 10:17:50,537 epoch 13 - iter 933/3110 - loss 0.16123996 - time (sec): 943.59 - samples/sec: 11721.13 - lr: 0.100000 - momentum: 0.000000
2026-02-10 10:23:00,893 epoch 13 - iter 1244/3110 - loss 0.16119485 - time (sec): 1253.95 - samples/sec: 11763.10 - lr: 0.100000 - momentum: 0.000000
2026-02-10 10:28:14,450 epoch 13 - iter 1555/3110 - loss 0.16113836 - time (sec): 1567.51 - samples/sec: 11778.32 - lr: 0.100000 - momentum: 0.000000
2026-02-10 10:33:26,047 epoch 13 - iter 1866/3110 - loss 0.16105785 - time (sec): 1879.10 - samples/sec: 11788.86 - lr: 0.100000 - momentum: 0.000000
2026-02-10 10:38:43,781 epoch 13 - iter 2177/3110 - loss 0.16096505 - time (sec): 2196.84 - samples/sec: 11768.85 - lr: 0.100000 - momentum: 0.000000
2026-02-10 10:43:52,826 epoch 13 - iter 2488/3110 - loss 0.16091201 - time (sec): 2505.88 - samples/sec: 11790.07 - lr: 0.100000 - momentum: 0.000000
2026-02-10 10:49:01,924 epoch 13 - iter 2799/3110 - loss 0.16080766 - time (sec): 2814.98 - samples/sec: 11800.47 - lr: 0.100000 - momentum: 0.000000
2026-02-10 10:54:13,847 epoch 13 - iter 3110/3110 - loss 0.16085373 - time (sec): 3126.90 - samples/sec: 11803.98 - lr: 0.100000 - momentum: 0.000000
2026-02-10 10:54:13,847 ----------------------------------------------------------------------------------------------------
2026-02-10 10:54:13,847 EPOCH 13 done: loss 0.1609 - lr: 0.100000
2026-02-10 10:54:13,847 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-10 11:08:00,321 DEV : loss 0.10403402149677277 - f1-score (micro avg)  0.9582
2026-02-10 11:08:46,444  - 0 epochs without improvement
2026-02-10 11:08:46,445 saving best model
2026-02-10 11:08:46,719 ----------------------------------------------------------------------------------------------------
2026-02-10 11:14:11,567 epoch 14 - iter 311/3110 - loss 0.15899880 - time (sec): 324.85 - samples/sec: 11346.85 - lr: 0.100000 - momentum: 0.000000
2026-02-10 11:19:26,723 epoch 14 - iter 622/3110 - loss 0.15903770 - time (sec): 640.00 - samples/sec: 11525.21 - lr: 0.100000 - momentum: 0.000000
2026-02-10 11:24:40,356 epoch 14 - iter 933/3110 - loss 0.15912954 - time (sec): 953.64 - samples/sec: 11621.99 - lr: 0.100000 - momentum: 0.000000
2026-02-10 11:29:54,366 epoch 14 - iter 1244/3110 - loss 0.15920298 - time (sec): 1267.65 - samples/sec: 11646.64 - lr: 0.100000 - momentum: 0.000000
2026-02-10 11:35:05,167 epoch 14 - iter 1555/3110 - loss 0.15930740 - time (sec): 1578.45 - samples/sec: 11690.72 - lr: 0.100000 - momentum: 0.000000
2026-02-10 11:40:18,442 epoch 14 - iter 1866/3110 - loss 0.15922501 - time (sec): 1891.72 - samples/sec: 11708.16 - lr: 0.100000 - momentum: 0.000000
2026-02-10 11:45:27,927 epoch 14 - iter 2177/3110 - loss 0.15914178 - time (sec): 2201.21 - samples/sec: 11729.69 - lr: 0.100000 - momentum: 0.000000
2026-02-10 11:50:40,596 epoch 14 - iter 2488/3110 - loss 0.15904949 - time (sec): 2513.88 - samples/sec: 11742.39 - lr: 0.100000 - momentum: 0.000000
2026-02-10 11:55:51,043 epoch 14 - iter 2799/3110 - loss 0.15898782 - time (sec): 2824.32 - samples/sec: 11756.05 - lr: 0.100000 - momentum: 0.000000
2026-02-10 12:01:06,539 epoch 14 - iter 3110/3110 - loss 0.15897226 - time (sec): 3139.82 - samples/sec: 11755.42 - lr: 0.100000 - momentum: 0.000000
2026-02-10 12:01:06,539 ----------------------------------------------------------------------------------------------------
2026-02-10 12:01:06,539 EPOCH 14 done: loss 0.1590 - lr: 0.100000
2026-02-10 12:01:06,539 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-10 12:14:49,377 DEV : loss 0.10377830266952515 - f1-score (micro avg)  0.9583
2026-02-10 12:15:48,667  - 0 epochs without improvement
2026-02-10 12:15:48,667 saving best model
2026-02-10 12:15:48,924 ----------------------------------------------------------------------------------------------------
2026-02-10 12:21:10,377 epoch 15 - iter 311/3110 - loss 0.15731530 - time (sec): 321.45 - samples/sec: 11514.67 - lr: 0.100000 - momentum: 0.000000
2026-02-10 12:26:20,264 epoch 15 - iter 622/3110 - loss 0.15724833 - time (sec): 631.34 - samples/sec: 11717.14 - lr: 0.100000 - momentum: 0.000000
2026-02-10 12:31:32,015 epoch 15 - iter 933/3110 - loss 0.15715091 - time (sec): 943.09 - samples/sec: 11753.81 - lr: 0.100000 - momentum: 0.000000
2026-02-10 12:36:45,077 epoch 15 - iter 1244/3110 - loss 0.15731420 - time (sec): 1256.15 - samples/sec: 11771.52 - lr: 0.100000 - momentum: 0.000000
2026-02-10 12:41:58,527 epoch 15 - iter 1555/3110 - loss 0.15729791 - time (sec): 1569.60 - samples/sec: 11772.94 - lr: 0.100000 - momentum: 0.000000
2026-02-10 12:47:12,347 epoch 15 - iter 1866/3110 - loss 0.15727648 - time (sec): 1883.42 - samples/sec: 11772.49 - lr: 0.100000 - momentum: 0.000000
2026-02-10 12:52:23,659 epoch 15 - iter 2177/3110 - loss 0.15730041 - time (sec): 2194.73 - samples/sec: 11785.18 - lr: 0.100000 - momentum: 0.000000
2026-02-10 12:57:35,305 epoch 15 - iter 2488/3110 - loss 0.15723497 - time (sec): 2506.38 - samples/sec: 11785.23 - lr: 0.100000 - momentum: 0.000000
2026-02-10 13:02:47,917 epoch 15 - iter 2799/3110 - loss 0.15717980 - time (sec): 2818.99 - samples/sec: 11785.61 - lr: 0.100000 - momentum: 0.000000
2026-02-10 13:08:00,591 epoch 15 - iter 3110/3110 - loss 0.15714437 - time (sec): 3131.67 - samples/sec: 11786.02 - lr: 0.100000 - momentum: 0.000000
2026-02-10 13:08:00,592 ----------------------------------------------------------------------------------------------------
2026-02-10 13:08:00,592 EPOCH 15 done: loss 0.1571 - lr: 0.100000
2026-02-10 13:08:00,592 Saving model at current epoch since 'save_model_each_k_epochs=1' was set
2026-02-10 13:21:39,506 DEV : loss 0.10233993083238602 - f1-score (micro avg)  0.9589
2026-02-10 13:22:36,304  - 0 epochs without improvement
2026-02-10 13:22:36,304 saving best model
2026-02-10 13:22:36,812 ----------------------------------------------------------------------------------------------------
2026-02-10 13:22:36,812 Loading model from best epoch ...
2026-02-10 13:22:37,687 SequenceTagger predicts: Dictionary with 17 tags: ADV, CCONJ, ADJ, NOUN, VERB, ADP, PUNCT, NUM, PRON, PROPN, FM, PART, ORD, ITJ, X, ,
2026-02-10 13:34:49,261 Results:
- F-score (micro) 0.9588
- F-score (macro) 0.9397
- Accuracy 0.9588

By class:
              precision    recall  f1-score   support

        NOUN     0.9444    0.9480    0.9462   1036164
       PUNCT     0.9999    1.0000    1.0000    831460
        VERB     0.9657    0.9465    0.9560    810899
       CCONJ     0.9833    0.9920    0.9877    463354
        PRON     0.9657    0.9631    0.9644    405738
         ADP     0.9786    0.9886    0.9835    296947
         ADV     0.9300    0.9264    0.9282    285781
         ADJ     0.8347    0.8443    0.8395    273219
       PROPN     0.9428    0.9623    0.9525    128068
         NUM     0.9771    0.9913    0.9842     58389
         ORD     0.8362    0.9223    0.8771      8534
         ITJ     0.9088    0.8821    0.8953      4554
        PART     0.9509    0.9307    0.9407      3202
          FM     0.9226    0.8804    0.9010      2491

    accuracy                         0.9588   4608800
   macro avg     0.9386    0.9413    0.9397   4608800
weighted avg     0.9589    0.9588    0.9588   4608800

2026-02-10 13:34:49,261 ----------------------------------------------------------------------------------------------------
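The AnnealOnPlateau plugin logged above halves the learning rate (anneal_factor 0.5) once the dev metric fails to improve for 3 consecutive epochs (patience), and training stops when the rate falls below min_learning_rate 0.0001; because the dev F1 improved every epoch in this run, the rate stayed at 0.1 throughout. The following is a simplified pure-Python sketch of that scheduling logic, not Flair's actual implementation (the function name and return value are illustrative):

```python
def anneal_on_plateau(dev_scores, lr=0.1, patience=3, anneal_factor=0.5, min_lr=0.0001):
    """Sketch of plateau annealing: halve `lr` after `patience` consecutive
    epochs without a new best dev score; return the lr used in each epoch."""
    best = float("-inf")
    bad_epochs = 0
    lrs = []
    for score in dev_scores:
        if lr < min_lr:
            break  # training would terminate here
        lrs.append(lr)
        if score > best:
            best = score
            bad_epochs = 0  # "0 epochs without improvement"
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                lr *= anneal_factor  # anneal: halve the learning rate
                bad_epochs = 0
    return lrs

# Dev F1 rises every epoch (as in this log), so the lr never anneals:
print(anneal_on_plateau([0.9366, 0.9447, 0.9482]))  # [0.1, 0.1, 0.1]
# A three-epoch plateau triggers one halving:
print(anneal_on_plateau([0.95, 0.95, 0.95, 0.95, 0.96]))  # [0.1, 0.1, 0.1, 0.1, 0.05]
```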
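A note on the summary metrics above: with micro averaging over single-label token tags, F1 equals accuracy (each token has exactly one gold and one predicted tag), which is why both read 0.9588, while the macro F-score is the unweighted mean of the 14 per-class F1 values. A quick sketch recomputing the macro average from the "By class" table (values copied from the log):

```python
# Per-class f1-scores from the "By class" table in the log above.
f1_by_class = {
    "NOUN": 0.9462, "PUNCT": 1.0000, "VERB": 0.9560, "CCONJ": 0.9877,
    "PRON": 0.9644, "ADP": 0.9835, "ADV": 0.9282, "ADJ": 0.8395,
    "PROPN": 0.9525, "NUM": 0.9842, "ORD": 0.8771, "ITJ": 0.8953,
    "PART": 0.9407, "FM": 0.9010,
}

# Macro F1 = unweighted mean over classes, regardless of class support.
macro_f1 = sum(f1_by_class.values()) / len(f1_by_class)
print(round(macro_f1, 4))  # 0.9397, matching the reported F-score (macro)
```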