Upload 3 files

- loss.tsv +41 -0
- pytorch_model.bin +3 -0
- training.log +1095 -0
loss.tsv
ADDED
@@ -0,0 +1,41 @@
+EPOCH TIMESTAMP BAD_EPOCHS LEARNING_RATE TRAIN_LOSS TRAIN_PRECISION TRAIN_RECALL TRAIN_F1 TRAIN_ACCURACY DEV_LOSS DEV_PRECISION DEV_RECALL DEV_F1 DEV_ACCURACY
+1 12:42:38 0 0.1000 1.2624991280691964 0.861785352230072 0.7432 0.7097 0.7261 0.5916 0.9671951532363892 0.687 0.6593 0.6729 0.5341
+2 12:43:50 0 0.1000 0.6825343231175375 0.5168215036392212 0.7308 0.8282 0.7765 0.661 0.6507071852684021 0.6635 0.7656 0.7109 0.5838
+3 12:45:03 0 0.1000 0.5183594098256386 0.3893236517906189 0.8172 0.836 0.8265 0.7337 0.4997367858886719 0.7649 0.7985 0.7814 0.6899
+4 12:46:15 0 0.1000 0.4139997055040813 0.3005659878253937 0.8938 0.8594 0.8763 0.8003 0.3956667482852936 0.8377 0.8132 0.8253 0.74
+5 12:47:27 1 0.1000 0.36818719446306397 0.24057592451572418 0.8911 0.9028 0.8969 0.8363 0.3635919690132141 0.7993 0.8168 0.808 0.7102
+6 12:48:38 2 0.1000 0.31204920145584586 0.2524365782737732 0.87 0.8696 0.8698 0.7968 0.3847086429595947 0.7949 0.7949 0.7949 0.7023
+7 12:49:50 3 0.1000 0.28333256802917633 0.21568651497364044 0.8761 0.893 0.8845 0.8179 0.3467901051044464 0.8123 0.8242 0.8182 0.7258
+8 12:51:02 4 0.1000 0.2589272867494585 0.19819645583629608 0.8969 0.8918 0.8943 0.8346 0.33811214566230774 0.8178 0.8059 0.8118 0.719
+9 12:52:14 0 0.0500 0.2208224505362426 0.14378328621387482 0.9301 0.927 0.9285 0.8794 0.2700260877609253 0.8401 0.8278 0.8339 0.7386
+10 12:53:28 0 0.0500 0.2100470625941066 0.1295960396528244 0.94 0.9319 0.936 0.8938 0.2524380087852478 0.8625 0.8498 0.8561 0.7708
+11 12:54:42 1 0.0500 0.19424312072392752 0.1299719512462616 0.9298 0.9393 0.9345 0.8907 0.2761968970298767 0.8406 0.8498 0.8452 0.7557
+12 12:55:55 0 0.0500 0.18922761222213783 0.11139164865016937 0.9456 0.9479 0.9468 0.9092 0.24872702360153198 0.8704 0.8608 0.8656 0.786
+13 12:57:06 1 0.0500 0.178593337767626 0.11495152860879898 0.94 0.9446 0.9423 0.9011 0.2527526021003723 0.8571 0.8571 0.8571 0.7697
+14 12:58:17 0 0.0500 0.17087342237277447 0.09600471705198288 0.9559 0.959 0.9574 0.9252 0.2339187115430832 0.8881 0.8718 0.8799 0.8013
+15 12:59:28 0 0.0500 0.1794916741442902 0.10160095989704132 0.9584 0.9455 0.9519 0.9158 0.22547538578510284 0.8981 0.8718 0.8848 0.8041
+16 13:00:40 1 0.0500 0.16603657197670119 0.09528940916061401 0.9628 0.9442 0.9534 0.9208 0.2436159998178482 0.8897 0.8571 0.8731 0.8014
+17 13:01:51 2 0.0500 0.16133079754530325 0.09658616781234741 0.9542 0.9475 0.9508 0.9127 0.25625115633010864 0.8731 0.8571 0.8651 0.7748
+18 13:03:04 3 0.0500 0.15205490873309382 0.07957068085670471 0.9641 0.9569 0.9605 0.9332 0.24449127912521362 0.8826 0.8535 0.8678 0.7898
+19 13:04:15 4 0.0500 0.14762112769854846 0.08018826693296432 0.9589 0.9664 0.9626 0.9342 0.2329457700252533 0.8635 0.8571 0.8603 0.7697
+20 13:05:25 0 0.0250 0.13416617515222157 0.06673520058393478 0.9676 0.9672 0.9674 0.9417 0.22465617954730988 0.8981 0.8718 0.8848 0.8068
+21 13:06:34 0 0.0250 0.13184886426111447 0.06548392027616501 0.9727 0.9647 0.9687 0.9446 0.22177472710609436 0.8985 0.8755 0.8868 0.8157
+22 13:07:46 0 0.0250 0.12659577267309013 0.06098590046167374 0.9752 0.9668 0.971 0.9477 0.21829380095005035 0.9049 0.8718 0.8881 0.8095
+23 13:08:59 0 0.0250 0.1288031041974852 0.05855342745780945 0.9769 0.9717 0.9743 0.9549 0.21743902564048767 0.9053 0.8755 0.8901 0.8129
+24 13:10:12 1 0.0250 0.1292022240736277 0.05762539058923721 0.9705 0.9705 0.9705 0.9487 0.21672436594963074 0.8947 0.8718 0.8831 0.8068
+25 13:11:23 0 0.0250 0.12257867395525147 0.05489451438188553 0.9762 0.9754 0.9758 0.957 0.20767748355865479 0.9094 0.8828 0.8959 0.8197
+26 13:12:36 1 0.0250 0.12978441588272005 0.053497862070798874 0.9691 0.977 0.9731 0.9517 0.21015514433383942 0.8848 0.8718 0.8782 0.7987
+27 13:13:49 2 0.0250 0.12584238060429653 0.05426767095923424 0.974 0.9688 0.9714 0.9494 0.21591384708881378 0.9057 0.8791 0.8922 0.8191
+28 13:15:05 0 0.0250 0.11048245014825919 0.05138213932514191 0.9753 0.9721 0.9737 0.9534 0.20661891996860504 0.9101 0.8901 0.9 0.8265
+29 13:16:20 1 0.0250 0.10748581229280035 0.04754020646214485 0.9831 0.9783 0.9807 0.9656 0.20554056763648987 0.8947 0.8718 0.8831 0.8013
+30 13:17:37 2 0.0250 0.1138213260034737 0.049051374197006226 0.9814 0.9717 0.9765 0.9591 0.21063490211963654 0.9195 0.8791 0.8989 0.8276
+31 13:18:54 3 0.0250 0.11994299481505574 0.04879188537597656 0.9822 0.9713 0.9767 0.9599 0.21518820524215698 0.9077 0.8645 0.8856 0.811
+32 13:20:08 4 0.0250 0.113609206908251 0.04602975398302078 0.9762 0.9766 0.9764 0.9593 0.2196006327867508 0.8947 0.8718 0.8831 0.8068
+33 13:21:25 1 0.0125 0.11010150409611422 0.04103841260075569 0.982 0.9815 0.9818 0.9684 0.19789864122867584 0.8981 0.8718 0.8848 0.8068
+34 13:22:43 2 0.0125 0.10113046086650729 0.04190446436405182 0.9815 0.9783 0.9799 0.9656 0.20444391667842865 0.9157 0.8755 0.8951 0.8241
+35 13:24:00 3 0.0125 0.10165476730488562 0.03953753411769867 0.9836 0.9815 0.9826 0.9704 0.20463646948337555 0.9091 0.8791 0.8939 0.8191
+36 13:25:15 4 0.0125 0.10106634041664465 0.03854582458734512 0.9832 0.9815 0.9824 0.9692 0.20157837867736816 0.9019 0.8755 0.8885 0.8074
+37 13:26:32 1 0.0063 0.09473008117933748 0.03739665448665619 0.9856 0.9828 0.9842 0.9736 0.20396985113620758 0.916 0.8791 0.8972 0.8276
+38 13:27:45 0 0.0063 0.0904515941165205 0.03672816604375839 0.9856 0.9824 0.984 0.9724 0.1981133222579956 0.9234 0.8828 0.9026 0.831
+39 13:28:59 1 0.0063 0.09554681721523668 0.03638274222612381 0.9856 0.9824 0.984 0.9724 0.20241594314575195 0.9122 0.8755 0.8935 0.8185
+40 13:30:13 2 0.0063 0.09763757313900043 0.03600259870290756 0.9852 0.9824 0.9838 0.972 0.20378927886486053 0.9091 0.8791 0.8939 0.8163
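For a quick read of the table, a stdlib-only sketch (the helper name `best_dev_epoch` is ours, and the three sample rows are copied verbatim from the table above) picks the epoch with the best dev F1. Note that each data row carries one more field than the header names (the training-loop loss plus a separate TRAIN evaluation loss), so the sketch indexes fields from the end of each row rather than trusting the header:

```python
# Minimal sketch: find the epoch with the highest DEV_F1 in a Flair-style loss.tsv.
# The sample rows below are copied from the table above; in practice you would
# read the real file instead of this inline string.
SAMPLE_ROWS = """\
1 12:42:38 0 0.1000 1.2624991280691964 0.861785352230072 0.7432 0.7097 0.7261 0.5916 0.9671951532363892 0.687 0.6593 0.6729 0.5341
25 13:11:23 0 0.0250 0.12257867395525147 0.05489451438188553 0.9762 0.9754 0.9758 0.957 0.20767748355865479 0.9094 0.8828 0.8959 0.8197
38 13:27:45 0 0.0063 0.0904515941165205 0.03672816604375839 0.9856 0.9824 0.984 0.9724 0.1981133222579956 0.9234 0.8828 0.9026 0.831
"""


def best_dev_epoch(rows_text: str):
    """Return (epoch, dev_f1) for the row with the highest DEV_F1.

    DEV_F1 is the second-to-last field of each row, so we index from the
    end of the split row instead of relying on the header names.
    """
    best = max(
        (line.split() for line in rows_text.splitlines() if line.strip()),
        key=lambda fields: float(fields[-2]),
    )
    return int(best[0]), float(best[-2])


epoch, dev_f1 = best_dev_epoch(SAMPLE_ROWS)
print(epoch, dev_f1)  # among these sample rows, epoch 38 has the best dev F1
```

Against the full table this also selects epoch 38 (dev F1 0.9026), which matches the run's schedule: the learning rate has been annealed down to 0.0063 and the dev score plateaus shortly after.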
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:44f76d94464e679cd6a35070b94ee5efc9fa7d8a75c445cbd751ab10e2a45bc5
+size 682870485
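The checkpoint itself is not stored in the repository; the three lines above are a Git LFS pointer recording the spec version, the SHA-256 of the actual payload, and its size in bytes (682,870,485, roughly 683 MB). A stdlib sketch of how such a pointer can be parsed and checked against a downloaded blob (the pointer text is from this upload; the empty payload below is a stand-in, since the real checkpoint is obviously not reproduced here):

```python
import hashlib

# The Git LFS pointer for pytorch_model.bin, copied from this upload.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:44f76d94464e679cd6a35070b94ee5efc9fa7d8a75c445cbd751ab10e2a45bc5
size 682870485
"""


def parse_pointer(text: str) -> dict:
    """Split each 'key value' line of an LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}


def matches(pointer: dict, payload: bytes) -> bool:
    """Check a downloaded blob against the pointer's size and SHA-256 digest."""
    return (len(payload) == pointer["size"]
            and hashlib.sha256(payload).hexdigest() == pointer["digest"])


ptr = parse_pointer(POINTER)
print(ptr["size"])        # 682870485
print(matches(ptr, b""))  # a stand-in payload does not match the real oid
```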
training.log
ADDED
@@ -0,0 +1,1095 @@
+2023-04-05 12:41:26,937 ----------------------------------------------------------------------------------------------------
+2023-04-05 12:41:26,943 Model: "SequenceTagger(
+  (embeddings): TransformerWordEmbeddings(
+    (model): BertModel(
+      (embeddings): BertEmbeddings(
+        (word_embeddings): Embedding(105880, 768)
+        (position_embeddings): Embedding(512, 768)
+        (token_type_embeddings): Embedding(2, 768)
+        (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+        (dropout): Dropout(p=0.1, inplace=False)
+      )
+      (encoder): BertEncoder(
+        (layer): ModuleList(
+          (0-11): 12 x BertLayer(
+            (attention): BertAttention(
+              (self): BertSelfAttention(
+                (query): Linear(in_features=768, out_features=768, bias=True)
+                (key): Linear(in_features=768, out_features=768, bias=True)
+                (value): Linear(in_features=768, out_features=768, bias=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+              (output): BertSelfOutput(
+                (dense): Linear(in_features=768, out_features=768, bias=True)
+                (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+                (dropout): Dropout(p=0.1, inplace=False)
+              )
+            )
+            (intermediate): BertIntermediate(
+              (dense): Linear(in_features=768, out_features=3072, bias=True)
+              (intermediate_act_fn): GELUActivation()
+            )
+            (output): BertOutput(
+              (dense): Linear(in_features=3072, out_features=768, bias=True)
+              (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
+              (dropout): Dropout(p=0.1, inplace=False)
+            )
+          )
+        )
+      )
+      (pooler): BertPooler(
+        (dense): Linear(in_features=768, out_features=768, bias=True)
+        (activation): Tanh()
+      )
+    )
+  )
+  (word_dropout): WordDropout(p=0.05)
+  (locked_dropout): LockedDropout(p=0.5)
+  (embedding2nn): Linear(in_features=768, out_features=768, bias=True)
+  (rnn): LSTM(768, 256, batch_first=True, bidirectional=True)
+  (linear): Linear(in_features=512, out_features=11, bias=True)
+  (loss_function): ViterbiLoss()
+  (crf): CRF()
+)"
| 318 |
+
2023-04-05 12:41:26,945 ----------------------------------------------------------------------------------------------------
|
| 319 |
+
2023-04-05 12:41:26,947 Corpus: "Corpus: 2253 train + 250 dev + 441 test sentences"
|
| 320 |
+
2023-04-05 12:41:26,948 ----------------------------------------------------------------------------------------------------
|
| 321 |
+
2023-04-05 12:41:26,948 Parameters:
|
| 322 |
+
2023-04-05 12:41:26,949 - learning_rate: "0.100000"
|
| 323 |
+
2023-04-05 12:41:26,950 - mini_batch_size: "32"
|
| 324 |
+
2023-04-05 12:41:26,951 - patience: "3"
|
| 325 |
+
2023-04-05 12:41:26,951 - anneal_factor: "0.5"
|
| 326 |
+
2023-04-05 12:41:26,953 - max_epochs: "40"
|
| 327 |
+
2023-04-05 12:41:26,954 - shuffle: "True"
|
| 328 |
+
2023-04-05 12:41:26,955 - train_with_dev: "False"
|
| 329 |
+
2023-04-05 12:41:26,955 - batch_growth_annealing: "False"
|
| 330 |
+
2023-04-05 12:41:26,958 ----------------------------------------------------------------------------------------------------
|
| 331 |
+
2023-04-05 12:41:26,958 Model training base path: "resources\taggers\addressNER"
|
| 332 |
+
2023-04-05 12:41:26,959 ----------------------------------------------------------------------------------------------------
|
| 333 |
+
2023-04-05 12:41:26,960 Device: cuda:0
|
| 334 |
+
2023-04-05 12:41:26,961 ----------------------------------------------------------------------------------------------------
|
| 335 |
+
2023-04-05 12:41:26,962 Embeddings storage mode: none
|
| 336 |
+
2023-04-05 12:41:26,962 ----------------------------------------------------------------------------------------------------
|
| 337 |
+
2023-04-05 12:41:30,223 epoch 1 - iter 7/71 - loss 2.28721713 - time (sec): 3.26 - samples/sec: 178.21 - lr: 0.100000
|
| 338 |
+
2023-04-05 12:41:33,767 epoch 1 - iter 14/71 - loss 1.92622932 - time (sec): 6.80 - samples/sec: 171.67 - lr: 0.100000
|
| 339 |
+
2023-04-05 12:41:37,177 epoch 1 - iter 21/71 - loss 1.73813053 - time (sec): 10.21 - samples/sec: 168.49 - lr: 0.100000
|
| 340 |
+
2023-04-05 12:41:40,953 epoch 1 - iter 28/71 - loss 1.61830648 - time (sec): 13.99 - samples/sec: 166.33 - lr: 0.100000
|
| 341 |
+
2023-04-05 12:41:44,268 epoch 1 - iter 35/71 - loss 1.53214432 - time (sec): 17.30 - samples/sec: 168.39 - lr: 0.100000
|
| 342 |
+
2023-04-05 12:41:48,032 epoch 1 - iter 42/71 - loss 1.45733425 - time (sec): 21.07 - samples/sec: 167.12 - lr: 0.100000
|
| 343 |
+
2023-04-05 12:41:51,701 epoch 1 - iter 49/71 - loss 1.39384449 - time (sec): 24.74 - samples/sec: 166.14 - lr: 0.100000
|
| 344 |
+
2023-04-05 12:41:55,370 epoch 1 - iter 56/71 - loss 1.34539887 - time (sec): 28.41 - samples/sec: 164.71 - lr: 0.100000
|
| 345 |
+
2023-04-05 12:41:59,094 epoch 1 - iter 63/71 - loss 1.30608958 - time (sec): 32.13 - samples/sec: 163.55 - lr: 0.100000
|
| 346 |
+
2023-04-05 12:42:02,838 epoch 1 - iter 70/71 - loss 1.26400603 - time (sec): 35.87 - samples/sec: 163.93 - lr: 0.100000
|
| 347 |
+
2023-04-05 12:42:03,627 ----------------------------------------------------------------------------------------------------
|
| 348 |
+
2023-04-05 12:42:03,628 EPOCH 1 done: loss 1.2625 - lr 0.100000
|
| 349 |
+
2023-04-05 12:42:34,974 Evaluating as a multi-label problem: False
|
| 350 |
+
2023-04-05 12:42:34,994 TRAIN : loss 0.861785352230072 - f1-score (micro avg) 0.7261
|
| 351 |
+
2023-04-05 12:42:38,596 Evaluating as a multi-label problem: False
|
| 352 |
+
2023-04-05 12:42:38,610 DEV : loss 0.9671951532363892 - f1-score (micro avg) 0.6729
|
| 353 |
+
2023-04-05 12:42:38,617 BAD EPOCHS (no improvement): 0
|
| 354 |
+
2023-04-05 12:42:38,618 saving best model
|
| 355 |
+
2023-04-05 12:42:39,745 ----------------------------------------------------------------------------------------------------
|
2023-04-05 12:42:42,695 epoch 2 - iter 7/71 - loss 0.92334886 - time (sec): 2.95 - samples/sec: 202.11 - lr: 0.100000
2023-04-05 12:42:46,576 epoch 2 - iter 14/71 - loss 0.87175200 - time (sec): 6.83 - samples/sec: 177.16 - lr: 0.100000
2023-04-05 12:42:50,257 epoch 2 - iter 21/71 - loss 0.81372023 - time (sec): 10.51 - samples/sec: 171.63 - lr: 0.100000
2023-04-05 12:42:53,892 epoch 2 - iter 28/71 - loss 0.78629231 - time (sec): 14.15 - samples/sec: 168.38 - lr: 0.100000
2023-04-05 12:42:57,549 epoch 2 - iter 35/71 - loss 0.76274145 - time (sec): 17.80 - samples/sec: 166.10 - lr: 0.100000
2023-04-05 12:43:01,159 epoch 2 - iter 42/71 - loss 0.73990095 - time (sec): 21.41 - samples/sec: 163.64 - lr: 0.100000
2023-04-05 12:43:04,701 epoch 2 - iter 49/71 - loss 0.71935682 - time (sec): 24.96 - samples/sec: 164.09 - lr: 0.100000
2023-04-05 12:43:08,315 epoch 2 - iter 56/71 - loss 0.69784735 - time (sec): 28.57 - samples/sec: 163.32 - lr: 0.100000
2023-04-05 12:43:11,830 epoch 2 - iter 63/71 - loss 0.68374551 - time (sec): 32.08 - samples/sec: 164.13 - lr: 0.100000
2023-04-05 12:43:15,403 epoch 2 - iter 70/71 - loss 0.68318206 - time (sec): 35.66 - samples/sec: 165.13 - lr: 0.100000
2023-04-05 12:43:16,197 ----------------------------------------------------------------------------------------------------
2023-04-05 12:43:16,199 EPOCH 2 done: loss 0.6825 - lr 0.100000
2023-04-05 12:43:47,058 Evaluating as a multi-label problem: False
2023-04-05 12:43:47,079 TRAIN : loss 0.5168215036392212 - f1-score (micro avg) 0.7765
2023-04-05 12:43:50,605 Evaluating as a multi-label problem: False
2023-04-05 12:43:50,618 DEV : loss 0.6507071852684021 - f1-score (micro avg) 0.7109
2023-04-05 12:43:50,623 BAD EPOCHS (no improvement): 0
2023-04-05 12:43:50,624 saving best model
2023-04-05 12:43:51,833 ----------------------------------------------------------------------------------------------------
2023-04-05 12:43:54,803 epoch 3 - iter 7/71 - loss 0.54805596 - time (sec): 2.97 - samples/sec: 196.06 - lr: 0.100000
2023-04-05 12:43:58,496 epoch 3 - iter 14/71 - loss 0.52917100 - time (sec): 6.66 - samples/sec: 178.19 - lr: 0.100000
2023-04-05 12:44:02,157 epoch 3 - iter 21/71 - loss 0.53172923 - time (sec): 10.32 - samples/sec: 173.12 - lr: 0.100000
2023-04-05 12:44:05,748 epoch 3 - iter 28/71 - loss 0.53976229 - time (sec): 13.91 - samples/sec: 168.90 - lr: 0.100000
2023-04-05 12:44:09,391 epoch 3 - iter 35/71 - loss 0.54268651 - time (sec): 17.56 - samples/sec: 167.34 - lr: 0.100000
2023-04-05 12:44:13,039 epoch 3 - iter 42/71 - loss 0.53676146 - time (sec): 21.20 - samples/sec: 165.96 - lr: 0.100000
2023-04-05 12:44:16,753 epoch 3 - iter 49/71 - loss 0.52184996 - time (sec): 24.92 - samples/sec: 163.25 - lr: 0.100000
2023-04-05 12:44:20,426 epoch 3 - iter 56/71 - loss 0.51321120 - time (sec): 28.59 - samples/sec: 162.14 - lr: 0.100000
2023-04-05 12:44:24,224 epoch 3 - iter 63/71 - loss 0.50438788 - time (sec): 32.39 - samples/sec: 161.35 - lr: 0.100000
2023-04-05 12:44:27,884 epoch 3 - iter 70/71 - loss 0.51802873 - time (sec): 36.05 - samples/sec: 163.16 - lr: 0.100000
2023-04-05 12:44:28,729 ----------------------------------------------------------------------------------------------------
2023-04-05 12:44:28,730 EPOCH 3 done: loss 0.5184 - lr 0.100000
2023-04-05 12:44:59,957 Evaluating as a multi-label problem: False
2023-04-05 12:44:59,978 TRAIN : loss 0.3893236517906189 - f1-score (micro avg) 0.8265
2023-04-05 12:45:03,682 Evaluating as a multi-label problem: False
2023-04-05 12:45:03,692 DEV : loss 0.4997367858886719 - f1-score (micro avg) 0.7814
2023-04-05 12:45:03,700 BAD EPOCHS (no improvement): 0
2023-04-05 12:45:03,702 saving best model
2023-04-05 12:45:04,923 ----------------------------------------------------------------------------------------------------
2023-04-05 12:45:07,851 epoch 4 - iter 7/71 - loss 0.40043816 - time (sec): 2.92 - samples/sec: 198.64 - lr: 0.100000
2023-04-05 12:45:11,521 epoch 4 - iter 14/71 - loss 0.43328716 - time (sec): 6.60 - samples/sec: 179.83 - lr: 0.100000
2023-04-05 12:45:15,244 epoch 4 - iter 21/71 - loss 0.44871122 - time (sec): 10.32 - samples/sec: 173.97 - lr: 0.100000
2023-04-05 12:45:18,967 epoch 4 - iter 28/71 - loss 0.44610987 - time (sec): 14.04 - samples/sec: 167.08 - lr: 0.100000
2023-04-05 12:45:22,672 epoch 4 - iter 35/71 - loss 0.46221491 - time (sec): 17.75 - samples/sec: 167.81 - lr: 0.100000
2023-04-05 12:45:26,360 epoch 4 - iter 42/71 - loss 0.43613944 - time (sec): 21.43 - samples/sec: 166.04 - lr: 0.100000
2023-04-05 12:45:30,026 epoch 4 - iter 49/71 - loss 0.43457310 - time (sec): 25.10 - samples/sec: 163.95 - lr: 0.100000
2023-04-05 12:45:33,722 epoch 4 - iter 56/71 - loss 0.42344792 - time (sec): 28.80 - samples/sec: 163.63 - lr: 0.100000
2023-04-05 12:45:37,362 epoch 4 - iter 63/71 - loss 0.41981578 - time (sec): 32.44 - samples/sec: 163.15 - lr: 0.100000
2023-04-05 12:45:41,136 epoch 4 - iter 70/71 - loss 0.41432375 - time (sec): 36.21 - samples/sec: 162.52 - lr: 0.100000
2023-04-05 12:45:41,950 ----------------------------------------------------------------------------------------------------
2023-04-05 12:45:41,951 EPOCH 4 done: loss 0.4140 - lr 0.100000
2023-04-05 12:46:11,773 Evaluating as a multi-label problem: False
2023-04-05 12:46:11,790 TRAIN : loss 0.3005659878253937 - f1-score (micro avg) 0.8763
2023-04-05 12:46:15,334 Evaluating as a multi-label problem: False
2023-04-05 12:46:15,345 DEV : loss 0.3956667482852936 - f1-score (micro avg) 0.8253
2023-04-05 12:46:15,352 BAD EPOCHS (no improvement): 0
2023-04-05 12:46:15,353 saving best model
2023-04-05 12:46:16,744 ----------------------------------------------------------------------------------------------------
2023-04-05 12:46:19,771 epoch 5 - iter 7/71 - loss 0.33104508 - time (sec): 3.03 - samples/sec: 191.03 - lr: 0.100000
2023-04-05 12:46:23,484 epoch 5 - iter 14/71 - loss 0.37656699 - time (sec): 6.74 - samples/sec: 178.21 - lr: 0.100000
2023-04-05 12:46:27,097 epoch 5 - iter 21/71 - loss 0.37195919 - time (sec): 10.35 - samples/sec: 172.91 - lr: 0.100000
2023-04-05 12:46:30,706 epoch 5 - iter 28/71 - loss 0.37480153 - time (sec): 13.96 - samples/sec: 168.97 - lr: 0.100000
2023-04-05 12:46:34,278 epoch 5 - iter 35/71 - loss 0.38207478 - time (sec): 17.53 - samples/sec: 169.28 - lr: 0.100000
2023-04-05 12:46:38,037 epoch 5 - iter 42/71 - loss 0.37466200 - time (sec): 21.29 - samples/sec: 166.22 - lr: 0.100000
2023-04-05 12:46:41,693 epoch 5 - iter 49/71 - loss 0.37114011 - time (sec): 24.95 - samples/sec: 165.58 - lr: 0.100000
2023-04-05 12:46:45,303 epoch 5 - iter 56/71 - loss 0.37649024 - time (sec): 28.56 - samples/sec: 165.56 - lr: 0.100000
2023-04-05 12:46:49,121 epoch 5 - iter 63/71 - loss 0.37565114 - time (sec): 32.38 - samples/sec: 164.82 - lr: 0.100000
2023-04-05 12:46:52,825 epoch 5 - iter 70/71 - loss 0.36875926 - time (sec): 36.08 - samples/sec: 163.00 - lr: 0.100000
2023-04-05 12:46:53,641 ----------------------------------------------------------------------------------------------------
2023-04-05 12:46:53,642 EPOCH 5 done: loss 0.3682 - lr 0.100000
2023-04-05 12:47:23,884 Evaluating as a multi-label problem: False
2023-04-05 12:47:23,900 TRAIN : loss 0.24057592451572418 - f1-score (micro avg) 0.8969
2023-04-05 12:47:27,486 Evaluating as a multi-label problem: False
2023-04-05 12:47:27,497 DEV : loss 0.3635919690132141 - f1-score (micro avg) 0.808
2023-04-05 12:47:27,503 BAD EPOCHS (no improvement): 1
2023-04-05 12:47:27,509 ----------------------------------------------------------------------------------------------------
2023-04-05 12:47:30,597 epoch 6 - iter 7/71 - loss 0.29060467 - time (sec): 3.09 - samples/sec: 191.74 - lr: 0.100000
2023-04-05 12:47:34,411 epoch 6 - iter 14/71 - loss 0.33442431 - time (sec): 6.90 - samples/sec: 175.32 - lr: 0.100000
2023-04-05 12:47:38,195 epoch 6 - iter 21/71 - loss 0.33081293 - time (sec): 10.69 - samples/sec: 169.02 - lr: 0.100000
2023-04-05 12:47:41,883 epoch 6 - iter 28/71 - loss 0.31355855 - time (sec): 14.37 - samples/sec: 164.05 - lr: 0.100000
2023-04-05 12:47:45,626 epoch 6 - iter 35/71 - loss 0.30769625 - time (sec): 18.12 - samples/sec: 161.79 - lr: 0.100000
2023-04-05 12:47:49,335 epoch 6 - iter 42/71 - loss 0.30399027 - time (sec): 21.83 - samples/sec: 160.77 - lr: 0.100000
2023-04-05 12:47:53,093 epoch 6 - iter 49/71 - loss 0.30522447 - time (sec): 25.58 - samples/sec: 161.32 - lr: 0.100000
2023-04-05 12:47:56,821 epoch 6 - iter 56/71 - loss 0.30943381 - time (sec): 29.31 - samples/sec: 160.49 - lr: 0.100000
2023-04-05 12:48:00,614 epoch 6 - iter 63/71 - loss 0.30689469 - time (sec): 33.10 - samples/sec: 159.74 - lr: 0.100000
2023-04-05 12:48:04,217 epoch 6 - iter 70/71 - loss 0.31271804 - time (sec): 36.71 - samples/sec: 160.30 - lr: 0.100000
2023-04-05 12:48:05,056 ----------------------------------------------------------------------------------------------------
2023-04-05 12:48:05,057 EPOCH 6 done: loss 0.3120 - lr 0.100000
2023-04-05 12:48:35,310 Evaluating as a multi-label problem: False
2023-04-05 12:48:35,330 TRAIN : loss 0.2524365782737732 - f1-score (micro avg) 0.8698
2023-04-05 12:48:38,882 Evaluating as a multi-label problem: False
2023-04-05 12:48:38,893 DEV : loss 0.3847086429595947 - f1-score (micro avg) 0.7949
2023-04-05 12:48:38,897 BAD EPOCHS (no improvement): 2
2023-04-05 12:48:38,900 ----------------------------------------------------------------------------------------------------
2023-04-05 12:48:41,937 epoch 7 - iter 7/71 - loss 0.37316342 - time (sec): 3.04 - samples/sec: 194.56 - lr: 0.100000
2023-04-05 12:48:45,658 epoch 7 - iter 14/71 - loss 0.30328593 - time (sec): 6.76 - samples/sec: 173.42 - lr: 0.100000
2023-04-05 12:48:49,422 epoch 7 - iter 21/71 - loss 0.29911219 - time (sec): 10.52 - samples/sec: 168.42 - lr: 0.100000
2023-04-05 12:48:53,079 epoch 7 - iter 28/71 - loss 0.29218439 - time (sec): 14.18 - samples/sec: 167.57 - lr: 0.100000
2023-04-05 12:48:56,921 epoch 7 - iter 35/71 - loss 0.29301233 - time (sec): 18.02 - samples/sec: 164.19 - lr: 0.100000
2023-04-05 12:49:00,741 epoch 7 - iter 42/71 - loss 0.29024371 - time (sec): 21.84 - samples/sec: 162.13 - lr: 0.100000
2023-04-05 12:49:04,470 epoch 7 - iter 49/71 - loss 0.29491898 - time (sec): 25.57 - samples/sec: 161.04 - lr: 0.100000
2023-04-05 12:49:08,148 epoch 7 - iter 56/71 - loss 0.29485370 - time (sec): 29.25 - samples/sec: 161.27 - lr: 0.100000
2023-04-05 12:49:11,798 epoch 7 - iter 63/71 - loss 0.29048646 - time (sec): 32.90 - samples/sec: 161.74 - lr: 0.100000
2023-04-05 12:49:15,495 epoch 7 - iter 70/71 - loss 0.28277861 - time (sec): 36.60 - samples/sec: 160.65 - lr: 0.100000
2023-04-05 12:49:16,301 ----------------------------------------------------------------------------------------------------
2023-04-05 12:49:16,302 EPOCH 7 done: loss 0.2833 - lr 0.100000
2023-04-05 12:49:46,527 Evaluating as a multi-label problem: False
2023-04-05 12:49:46,546 TRAIN : loss 0.21568651497364044 - f1-score (micro avg) 0.8845
2023-04-05 12:49:50,126 Evaluating as a multi-label problem: False
2023-04-05 12:49:50,138 DEV : loss 0.3467901051044464 - f1-score (micro avg) 0.8182
2023-04-05 12:49:50,144 BAD EPOCHS (no improvement): 3
2023-04-05 12:49:50,145 ----------------------------------------------------------------------------------------------------
2023-04-05 12:49:53,171 epoch 8 - iter 7/71 - loss 0.25938026 - time (sec): 3.03 - samples/sec: 191.70 - lr: 0.100000
2023-04-05 12:49:56,942 epoch 8 - iter 14/71 - loss 0.25775377 - time (sec): 6.80 - samples/sec: 173.18 - lr: 0.100000
2023-04-05 12:50:00,558 epoch 8 - iter 21/71 - loss 0.25357329 - time (sec): 10.41 - samples/sec: 168.64 - lr: 0.100000
2023-04-05 12:50:04,296 epoch 8 - iter 28/71 - loss 0.24824351 - time (sec): 14.15 - samples/sec: 164.02 - lr: 0.100000
2023-04-05 12:50:08,001 epoch 8 - iter 35/71 - loss 0.25876964 - time (sec): 17.86 - samples/sec: 162.19 - lr: 0.100000
2023-04-05 12:50:11,740 epoch 8 - iter 42/71 - loss 0.25788824 - time (sec): 21.60 - samples/sec: 162.58 - lr: 0.100000
2023-04-05 12:50:15,342 epoch 8 - iter 49/71 - loss 0.25424281 - time (sec): 25.20 - samples/sec: 162.72 - lr: 0.100000
2023-04-05 12:50:19,188 epoch 8 - iter 56/71 - loss 0.25450229 - time (sec): 29.04 - samples/sec: 161.73 - lr: 0.100000
2023-04-05 12:50:23,093 epoch 8 - iter 63/71 - loss 0.25746436 - time (sec): 32.95 - samples/sec: 160.01 - lr: 0.100000
2023-04-05 12:50:27,329 epoch 8 - iter 70/71 - loss 0.25933626 - time (sec): 37.18 - samples/sec: 158.05 - lr: 0.100000
2023-04-05 12:50:28,163 ----------------------------------------------------------------------------------------------------
2023-04-05 12:50:28,165 EPOCH 8 done: loss 0.2589 - lr 0.100000
2023-04-05 12:50:59,296 Evaluating as a multi-label problem: False
2023-04-05 12:50:59,312 TRAIN : loss 0.19819645583629608 - f1-score (micro avg) 0.8943
2023-04-05 12:51:02,965 Evaluating as a multi-label problem: False
2023-04-05 12:51:02,980 DEV : loss 0.33811214566230774 - f1-score (micro avg) 0.8118
2023-04-05 12:51:02,988 Epoch 8: reducing learning rate of group 0 to 5.0000e-02.
2023-04-05 12:51:02,989 BAD EPOCHS (no improvement): 4
2023-04-05 12:51:02,990 ----------------------------------------------------------------------------------------------------
2023-04-05 12:51:06,070 epoch 9 - iter 7/71 - loss 0.23812631 - time (sec): 3.08 - samples/sec: 194.17 - lr: 0.050000
2023-04-05 12:51:09,867 epoch 9 - iter 14/71 - loss 0.19886070 - time (sec): 6.88 - samples/sec: 169.72 - lr: 0.050000
2023-04-05 12:51:13,813 epoch 9 - iter 21/71 - loss 0.22626942 - time (sec): 10.82 - samples/sec: 161.52 - lr: 0.050000
2023-04-05 12:51:17,892 epoch 9 - iter 28/71 - loss 0.22291349 - time (sec): 14.90 - samples/sec: 158.04 - lr: 0.050000
2023-04-05 12:51:21,762 epoch 9 - iter 35/71 - loss 0.22330927 - time (sec): 18.77 - samples/sec: 155.93 - lr: 0.050000
2023-04-05 12:51:25,462 epoch 9 - iter 42/71 - loss 0.22661256 - time (sec): 22.47 - samples/sec: 158.69 - lr: 0.050000
2023-04-05 12:51:29,109 epoch 9 - iter 49/71 - loss 0.22209246 - time (sec): 26.12 - samples/sec: 158.89 - lr: 0.050000
2023-04-05 12:51:32,660 epoch 9 - iter 56/71 - loss 0.21543228 - time (sec): 29.67 - samples/sec: 158.89 - lr: 0.050000
2023-04-05 12:51:36,421 epoch 9 - iter 63/71 - loss 0.22191567 - time (sec): 33.43 - samples/sec: 158.57 - lr: 0.050000
2023-04-05 12:51:40,056 epoch 9 - iter 70/71 - loss 0.22014300 - time (sec): 37.06 - samples/sec: 158.83 - lr: 0.050000
2023-04-05 12:51:40,907 ----------------------------------------------------------------------------------------------------
2023-04-05 12:51:40,908 EPOCH 9 done: loss 0.2208 - lr 0.050000
2023-04-05 12:52:11,299 Evaluating as a multi-label problem: False
2023-04-05 12:52:11,317 TRAIN : loss 0.14378328621387482 - f1-score (micro avg) 0.9285
2023-04-05 12:52:14,963 Evaluating as a multi-label problem: False
2023-04-05 12:52:14,976 DEV : loss 0.2700260877609253 - f1-score (micro avg) 0.8339
2023-04-05 12:52:14,981 BAD EPOCHS (no improvement): 0
2023-04-05 12:52:14,983 saving best model
2023-04-05 12:52:16,467 ----------------------------------------------------------------------------------------------------
2023-04-05 12:52:19,567 epoch 10 - iter 7/71 - loss 0.17365492 - time (sec): 3.10 - samples/sec: 178.18 - lr: 0.050000
2023-04-05 12:52:23,341 epoch 10 - iter 14/71 - loss 0.18889650 - time (sec): 6.87 - samples/sec: 165.16 - lr: 0.050000
2023-04-05 12:52:27,018 epoch 10 - iter 21/71 - loss 0.20267585 - time (sec): 10.55 - samples/sec: 165.89 - lr: 0.050000
2023-04-05 12:52:30,734 epoch 10 - iter 28/71 - loss 0.19907475 - time (sec): 14.26 - samples/sec: 163.63 - lr: 0.050000
2023-04-05 12:52:34,373 epoch 10 - iter 35/71 - loss 0.19695937 - time (sec): 17.90 - samples/sec: 162.87 - lr: 0.050000
2023-04-05 12:52:38,176 epoch 10 - iter 42/71 - loss 0.20787867 - time (sec): 21.71 - samples/sec: 163.27 - lr: 0.050000
2023-04-05 12:52:41,917 epoch 10 - iter 49/71 - loss 0.21020299 - time (sec): 25.45 - samples/sec: 162.92 - lr: 0.050000
2023-04-05 12:52:45,576 epoch 10 - iter 56/71 - loss 0.21217935 - time (sec): 29.11 - samples/sec: 162.53 - lr: 0.050000
2023-04-05 12:52:49,330 epoch 10 - iter 63/71 - loss 0.20993717 - time (sec): 32.86 - samples/sec: 161.37 - lr: 0.050000
2023-04-05 12:52:53,093 epoch 10 - iter 70/71 - loss 0.20975981 - time (sec): 36.62 - samples/sec: 160.74 - lr: 0.050000
2023-04-05 12:52:53,956 ----------------------------------------------------------------------------------------------------
2023-04-05 12:52:53,957 EPOCH 10 done: loss 0.2100 - lr 0.050000
2023-04-05 12:53:24,907 Evaluating as a multi-label problem: False
2023-04-05 12:53:24,923 TRAIN : loss 0.1295960396528244 - f1-score (micro avg) 0.936
2023-04-05 12:53:28,577 Evaluating as a multi-label problem: False
2023-04-05 12:53:28,590 DEV : loss 0.2524380087852478 - f1-score (micro avg) 0.8561
2023-04-05 12:53:28,596 BAD EPOCHS (no improvement): 0
2023-04-05 12:53:28,601 saving best model
2023-04-05 12:53:29,758 ----------------------------------------------------------------------------------------------------
2023-04-05 12:53:32,976 epoch 11 - iter 7/71 - loss 0.23687125 - time (sec): 3.22 - samples/sec: 196.25 - lr: 0.050000
2023-04-05 12:53:36,734 epoch 11 - iter 14/71 - loss 0.23011958 - time (sec): 6.97 - samples/sec: 175.94 - lr: 0.050000
2023-04-05 12:53:40,400 epoch 11 - iter 21/71 - loss 0.20729973 - time (sec): 10.64 - samples/sec: 170.31 - lr: 0.050000
2023-04-05 12:53:44,055 epoch 11 - iter 28/71 - loss 0.20154173 - time (sec): 14.29 - samples/sec: 166.99 - lr: 0.050000
2023-04-05 12:53:47,926 epoch 11 - iter 35/71 - loss 0.20303865 - time (sec): 18.17 - samples/sec: 164.55 - lr: 0.050000
2023-04-05 12:53:51,746 epoch 11 - iter 42/71 - loss 0.19656997 - time (sec): 21.99 - samples/sec: 162.47 - lr: 0.050000
2023-04-05 12:53:55,488 epoch 11 - iter 49/71 - loss 0.20078721 - time (sec): 25.73 - samples/sec: 160.57 - lr: 0.050000
2023-04-05 12:53:59,182 epoch 11 - iter 56/71 - loss 0.19665885 - time (sec): 29.42 - samples/sec: 160.26 - lr: 0.050000
2023-04-05 12:54:03,005 epoch 11 - iter 63/71 - loss 0.19670097 - time (sec): 33.24 - samples/sec: 158.79 - lr: 0.050000
2023-04-05 12:54:06,832 epoch 11 - iter 70/71 - loss 0.19415408 - time (sec): 37.07 - samples/sec: 158.61 - lr: 0.050000
2023-04-05 12:54:07,718 ----------------------------------------------------------------------------------------------------
2023-04-05 12:54:07,719 EPOCH 11 done: loss 0.1942 - lr 0.050000
2023-04-05 12:54:38,401 Evaluating as a multi-label problem: False
2023-04-05 12:54:38,416 TRAIN : loss 0.1299719512462616 - f1-score (micro avg) 0.9345
2023-04-05 12:54:42,043 Evaluating as a multi-label problem: False
2023-04-05 12:54:42,055 DEV : loss 0.2761968970298767 - f1-score (micro avg) 0.8452
2023-04-05 12:54:42,061 BAD EPOCHS (no improvement): 1
2023-04-05 12:54:42,062 ----------------------------------------------------------------------------------------------------
2023-04-05 12:54:45,126 epoch 12 - iter 7/71 - loss 0.18339264 - time (sec): 3.06 - samples/sec: 187.09 - lr: 0.050000
2023-04-05 12:54:48,926 epoch 12 - iter 14/71 - loss 0.19237624 - time (sec): 6.86 - samples/sec: 173.27 - lr: 0.050000
2023-04-05 12:54:52,633 epoch 12 - iter 21/71 - loss 0.19432209 - time (sec): 10.57 - samples/sec: 166.42 - lr: 0.050000
2023-04-05 12:54:56,545 epoch 12 - iter 28/71 - loss 0.20200765 - time (sec): 14.48 - samples/sec: 162.90 - lr: 0.050000
2023-04-05 12:55:00,495 epoch 12 - iter 35/71 - loss 0.19446487 - time (sec): 18.43 - samples/sec: 160.60 - lr: 0.050000
2023-04-05 12:55:04,256 epoch 12 - iter 42/71 - loss 0.19910943 - time (sec): 22.19 - samples/sec: 160.01 - lr: 0.050000
2023-04-05 12:55:08,062 epoch 12 - iter 49/71 - loss 0.19637866 - time (sec): 26.00 - samples/sec: 158.66 - lr: 0.050000
2023-04-05 12:55:11,765 epoch 12 - iter 56/71 - loss 0.19106381 - time (sec): 29.70 - samples/sec: 158.17 - lr: 0.050000
2023-04-05 12:55:15,677 epoch 12 - iter 63/71 - loss 0.19328764 - time (sec): 33.61 - samples/sec: 157.71 - lr: 0.050000
2023-04-05 12:55:19,610 epoch 12 - iter 70/71 - loss 0.18986505 - time (sec): 37.55 - samples/sec: 156.77 - lr: 0.050000
2023-04-05 12:55:20,532 ----------------------------------------------------------------------------------------------------
2023-04-05 12:55:20,534 EPOCH 12 done: loss 0.1892 - lr 0.050000
2023-04-05 12:55:51,520 Evaluating as a multi-label problem: False
2023-04-05 12:55:51,541 TRAIN : loss 0.11139164865016937 - f1-score (micro avg) 0.9468
2023-04-05 12:55:55,099 Evaluating as a multi-label problem: False
2023-04-05 12:55:55,107 DEV : loss 0.24872702360153198 - f1-score (micro avg) 0.8656
2023-04-05 12:55:55,114 BAD EPOCHS (no improvement): 0
2023-04-05 12:55:55,116 saving best model
2023-04-05 12:55:56,261 ----------------------------------------------------------------------------------------------------
2023-04-05 12:55:59,215 epoch 13 - iter 7/71 - loss 0.13416757 - time (sec): 2.95 - samples/sec: 191.70 - lr: 0.050000
2023-04-05 12:56:02,877 epoch 13 - iter 14/71 - loss 0.14492713 - time (sec): 6.61 - samples/sec: 173.40 - lr: 0.050000
2023-04-05 12:56:06,537 epoch 13 - iter 21/71 - loss 0.15026220 - time (sec): 10.27 - samples/sec: 165.55 - lr: 0.050000
2023-04-05 12:56:10,099 epoch 13 - iter 28/71 - loss 0.14928537 - time (sec): 13.84 - samples/sec: 166.95 - lr: 0.050000
2023-04-05 12:56:13,686 epoch 13 - iter 35/71 - loss 0.14769742 - time (sec): 17.42 - samples/sec: 165.57 - lr: 0.050000
2023-04-05 12:56:17,287 epoch 13 - iter 42/71 - loss 0.15688659 - time (sec): 21.02 - samples/sec: 164.09 - lr: 0.050000
2023-04-05 12:56:20,996 epoch 13 - iter 49/71 - loss 0.16690754 - time (sec): 24.73 - samples/sec: 164.31 - lr: 0.050000
2023-04-05 12:56:24,678 epoch 13 - iter 56/71 - loss 0.17086305 - time (sec): 28.42 - samples/sec: 164.45 - lr: 0.050000
2023-04-05 12:56:28,394 epoch 13 - iter 63/71 - loss 0.17395421 - time (sec): 32.13 - samples/sec: 164.13 - lr: 0.050000
2023-04-05 12:56:32,037 epoch 13 - iter 70/71 - loss 0.17863766 - time (sec): 35.77 - samples/sec: 164.47 - lr: 0.050000
2023-04-05 12:56:32,895 ----------------------------------------------------------------------------------------------------
2023-04-05 12:56:32,895 EPOCH 13 done: loss 0.1786 - lr 0.050000
2023-04-05 12:57:02,811 Evaluating as a multi-label problem: False
2023-04-05 12:57:02,831 TRAIN : loss 0.11495152860879898 - f1-score (micro avg) 0.9423
2023-04-05 12:57:06,419 Evaluating as a multi-label problem: False
2023-04-05 12:57:06,429 DEV : loss 0.2527526021003723 - f1-score (micro avg) 0.8571
2023-04-05 12:57:06,433 BAD EPOCHS (no improvement): 1
2023-04-05 12:57:06,435 ----------------------------------------------------------------------------------------------------
2023-04-05 12:57:09,526 epoch 14 - iter 7/71 - loss 0.19849680 - time (sec): 3.09 - samples/sec: 198.95 - lr: 0.050000
2023-04-05 12:57:13,202 epoch 14 - iter 14/71 - loss 0.18692846 - time (sec): 6.77 - samples/sec: 173.93 - lr: 0.050000
2023-04-05 12:57:16,931 epoch 14 - iter 21/71 - loss 0.16410048 - time (sec): 10.50 - samples/sec: 165.11 - lr: 0.050000
2023-04-05 12:57:20,568 epoch 14 - iter 28/71 - loss 0.17096232 - time (sec): 14.13 - samples/sec: 164.73 - lr: 0.050000
2023-04-05 12:57:24,320 epoch 14 - iter 35/71 - loss 0.17130471 - time (sec): 17.88 - samples/sec: 164.61 - lr: 0.050000
2023-04-05 12:57:28,263 epoch 14 - iter 42/71 - loss 0.17276982 - time (sec): 21.83 - samples/sec: 162.55 - lr: 0.050000
2023-04-05 12:57:31,855 epoch 14 - iter 49/71 - loss 0.17053371 - time (sec): 25.42 - samples/sec: 161.88 - lr: 0.050000
2023-04-05 12:57:35,454 epoch 14 - iter 56/71 - loss 0.17009003 - time (sec): 29.02 - samples/sec: 161.72 - lr: 0.050000
2023-04-05 12:57:39,094 epoch 14 - iter 63/71 - loss 0.16765044 - time (sec): 32.66 - samples/sec: 161.28 - lr: 0.050000
2023-04-05 12:57:42,812 epoch 14 - iter 70/71 - loss 0.17106035 - time (sec): 36.38 - samples/sec: 161.78 - lr: 0.050000
2023-04-05 12:57:43,636 ----------------------------------------------------------------------------------------------------
2023-04-05 12:57:43,637 EPOCH 14 done: loss 0.1709 - lr 0.050000
2023-04-05 12:58:13,469 Evaluating as a multi-label problem: False
2023-04-05 12:58:13,486 TRAIN : loss 0.09600471705198288 - f1-score (micro avg) 0.9574
2023-04-05 12:58:17,049 Evaluating as a multi-label problem: False
2023-04-05 12:58:17,060 DEV : loss 0.2339187115430832 - f1-score (micro avg) 0.8799
2023-04-05 12:58:17,066 BAD EPOCHS (no improvement): 0
2023-04-05 12:58:17,068 saving best model
2023-04-05 12:58:18,074 ----------------------------------------------------------------------------------------------------
2023-04-05 12:58:21,099 epoch 15 - iter 7/71 - loss 0.18573106 - time (sec): 3.02 - samples/sec: 195.41 - lr: 0.050000
2023-04-05 12:58:24,823 epoch 15 - iter 14/71 - loss 0.16845787 - time (sec): 6.75 - samples/sec: 173.09 - lr: 0.050000
2023-04-05 12:58:28,520 epoch 15 - iter 21/71 - loss 0.17292932 - time (sec): 10.45 - samples/sec: 168.59 - lr: 0.050000
2023-04-05 12:58:32,285 epoch 15 - iter 28/71 - loss 0.16194007 - time (sec): 14.21 - samples/sec: 164.53 - lr: 0.050000
2023-04-05 12:58:35,962 epoch 15 - iter 35/71 - loss 0.16402796 - time (sec): 17.89 - samples/sec: 163.30 - lr: 0.050000
2023-04-05 12:58:39,707 epoch 15 - iter 42/71 - loss 0.16481449 - time (sec): 21.63 - samples/sec: 163.18 - lr: 0.050000
2023-04-05 12:58:43,272 epoch 15 - iter 49/71 - loss 0.16905415 - time (sec): 25.20 - samples/sec: 163.51 - lr: 0.050000
2023-04-05 12:58:46,892 epoch 15 - iter 56/71 - loss 0.17389761 - time (sec): 28.82 - samples/sec: 162.99 - lr: 0.050000
2023-04-05 12:58:50,539 epoch 15 - iter 63/71 - loss 0.18006799 - time (sec): 32.46 - samples/sec: 163.29 - lr: 0.050000
2023-04-05 12:58:54,178 epoch 15 - iter 70/71 - loss 0.17852127 - time (sec): 36.10 - samples/sec: 162.76 - lr: 0.050000
2023-04-05 12:58:54,989 ----------------------------------------------------------------------------------------------------
2023-04-05 12:58:54,990 EPOCH 15 done: loss 0.1795 - lr 0.050000
2023-04-05 12:59:25,120 Evaluating as a multi-label problem: False
2023-04-05 12:59:25,142 TRAIN : loss 0.10160095989704132 - f1-score (micro avg) 0.9519
2023-04-05 12:59:28,704 Evaluating as a multi-label problem: False
2023-04-05 12:59:28,715 DEV : loss 0.22547538578510284 - f1-score (micro avg) 0.8848
2023-04-05 12:59:28,722 BAD EPOCHS (no improvement): 0
2023-04-05 12:59:28,724 saving best model
2023-04-05 12:59:29,886 ----------------------------------------------------------------------------------------------------
2023-04-05 12:59:32,863 epoch 16 - iter 7/71 - loss 0.18803312 - time (sec): 2.98 - samples/sec: 201.30 - lr: 0.050000
2023-04-05 12:59:36,639 epoch 16 - iter 14/71 - loss 0.17472744 - time (sec): 6.75 - samples/sec: 176.98 - lr: 0.050000
2023-04-05 12:59:40,239 epoch 16 - iter 21/71 - loss 0.17674174 - time (sec): 10.35 - samples/sec: 173.11 - lr: 0.050000
2023-04-05 12:59:43,953 epoch 16 - iter 28/71 - loss 0.18068979 - time (sec): 14.07 - samples/sec: 170.13 - lr: 0.050000
2023-04-05 12:59:47,620 epoch 16 - iter 35/71 - loss 0.18222164 - time (sec): 17.73 - samples/sec: 168.22 - lr: 0.050000
2023-04-05 12:59:51,255 epoch 16 - iter 42/71 - loss 0.18013268 - time (sec): 21.37 - samples/sec: 167.36 - lr: 0.050000
2023-04-05 12:59:54,891 epoch 16 - iter 49/71 - loss 0.17236708 - time (sec): 25.00 - samples/sec: 166.02 - lr: 0.050000
2023-04-05 12:59:58,571 epoch 16 - iter 56/71 - loss 0.17486601 - time (sec): 28.68 - samples/sec: 164.55 - lr: 0.050000
2023-04-05 13:00:02,189 epoch 16 - iter 63/71 - loss 0.17192697 - time (sec): 32.30 - samples/sec: 164.58 - lr: 0.050000
2023-04-05 13:00:05,957 epoch 16 - iter 70/71 - loss 0.16671679 - time (sec): 36.07 - samples/sec: 163.16 - lr: 0.050000
2023-04-05 13:00:06,789 ----------------------------------------------------------------------------------------------------
2023-04-05 13:00:06,790 EPOCH 16 done: loss 0.1660 - lr 0.050000
2023-04-05 13:00:36,903 Evaluating as a multi-label problem: False
2023-04-05 13:00:36,919 TRAIN : loss 0.09528940916061401 - f1-score (micro avg) 0.9534
2023-04-05 13:00:40,480 Evaluating as a multi-label problem: False
2023-04-05 13:00:40,495 DEV : loss 0.2436159998178482 - f1-score (micro avg) 0.8731
2023-04-05 13:00:40,500 BAD EPOCHS (no improvement): 1
2023-04-05 13:00:40,503 ----------------------------------------------------------------------------------------------------
2023-04-05 13:00:43,573 epoch 17 - iter 7/71 - loss 0.16280295 - time (sec): 3.07 - samples/sec: 187.36 - lr: 0.050000
2023-04-05 13:00:47,316 epoch 17 - iter 14/71 - loss 0.16491960 - time (sec): 6.81 - samples/sec: 171.74 - lr: 0.050000
2023-04-05 13:00:51,066 epoch 17 - iter 21/71 - loss 0.16969325 - time (sec): 10.56 - samples/sec: 171.45 - lr: 0.050000
2023-04-05 13:00:54,815 epoch 17 - iter 28/71 - loss 0.16213035 - time (sec): 14.31 - samples/sec: 166.86 - lr: 0.050000
2023-04-05 13:00:58,536 epoch 17 - iter 35/71 - loss 0.16020484 - time (sec): 18.03 - samples/sec: 165.03 - lr: 0.050000
2023-04-05 13:01:02,186 epoch 17 - iter 42/71 - loss 0.16493772 - time (sec): 21.68 - samples/sec: 162.94 - lr: 0.050000
2023-04-05 13:01:05,927 epoch 17 - iter 49/71 - loss 0.16161421 - time (sec): 25.42 - samples/sec: 162.80 - lr: 0.050000
2023-04-05 13:01:09,546 epoch 17 - iter 56/71 - loss 0.16247358 - time (sec): 29.04 - samples/sec: 163.49 - lr: 0.050000
2023-04-05 13:01:13,292 epoch 17 - iter 63/71 - loss 0.16001124 - time (sec): 32.79 - samples/sec: 161.34 - lr: 0.050000
2023-04-05 13:01:17,024 epoch 17 - iter 70/71 - loss 0.16184123 - time (sec): 36.52 - samples/sec: 161.03 - lr: 0.050000
2023-04-05 13:01:17,872 ----------------------------------------------------------------------------------------------------
2023-04-05 13:01:17,873 EPOCH 17 done: loss 0.1613 - lr 0.050000
2023-04-05 13:01:48,038 Evaluating as a multi-label problem: False
2023-04-05 13:01:48,055 TRAIN : loss 0.09658616781234741 - f1-score (micro avg) 0.9508
2023-04-05 13:01:51,710 Evaluating as a multi-label problem: False
2023-04-05 13:01:51,723 DEV : loss 0.25625115633010864 - f1-score (micro avg) 0.8651
2023-04-05 13:01:51,730 BAD EPOCHS (no improvement): 2
2023-04-05 13:01:51,731 ----------------------------------------------------------------------------------------------------
| 653 |
+
2023-04-05 13:01:54,950 epoch 18 - iter 7/71 - loss 0.14699763 - time (sec): 3.22 - samples/sec: 178.08 - lr: 0.050000
|
| 654 |
+
2023-04-05 13:01:58,906 epoch 18 - iter 14/71 - loss 0.11720383 - time (sec): 7.17 - samples/sec: 156.81 - lr: 0.050000
|
| 655 |
+
2023-04-05 13:02:02,544 epoch 18 - iter 21/71 - loss 0.12927531 - time (sec): 10.81 - samples/sec: 160.28 - lr: 0.050000
|
| 656 |
+
2023-04-05 13:02:06,291 epoch 18 - iter 28/71 - loss 0.13428711 - time (sec): 14.56 - samples/sec: 159.42 - lr: 0.050000
|
| 657 |
+
2023-04-05 13:02:10,110 epoch 18 - iter 35/71 - loss 0.13574585 - time (sec): 18.38 - samples/sec: 157.52 - lr: 0.050000
|
| 658 |
+
2023-04-05 13:02:13,768 epoch 18 - iter 42/71 - loss 0.13144409 - time (sec): 22.04 - samples/sec: 158.47 - lr: 0.050000
|
| 659 |
+
2023-04-05 13:02:17,521 epoch 18 - iter 49/71 - loss 0.13997386 - time (sec): 25.79 - samples/sec: 158.95 - lr: 0.050000
|
| 660 |
+
2023-04-05 13:02:21,181 epoch 18 - iter 56/71 - loss 0.15194990 - time (sec): 29.45 - samples/sec: 159.70 - lr: 0.050000
|
| 661 |
+
2023-04-05 13:02:24,954 epoch 18 - iter 63/71 - loss 0.15543297 - time (sec): 33.22 - samples/sec: 159.95 - lr: 0.050000
|
| 662 |
+
2023-04-05 13:02:28,617 epoch 18 - iter 70/71 - loss 0.15181266 - time (sec): 36.89 - samples/sec: 159.47 - lr: 0.050000
|
| 663 |
+
2023-04-05 13:02:29,438 ----------------------------------------------------------------------------------------------------
|
| 664 |
+
2023-04-05 13:02:29,438 EPOCH 18 done: loss 0.1521 - lr 0.050000
|
| 665 |
+
2023-04-05 13:03:00,408 Evaluating as a multi-label problem: False
|
| 666 |
+
2023-04-05 13:03:00,426 TRAIN : loss 0.07957068085670471 - f1-score (micro avg) 0.9605
|
| 667 |
+
2023-04-05 13:03:04,203 Evaluating as a multi-label problem: False
|
| 668 |
+
2023-04-05 13:03:04,214 DEV : loss 0.24449127912521362 - f1-score (micro avg) 0.8678
|
| 669 |
+
2023-04-05 13:03:04,220 BAD EPOCHS (no improvement): 3
|
| 670 |
+
2023-04-05 13:03:04,221 ----------------------------------------------------------------------------------------------------
|
| 671 |
+
2023-04-05 13:03:07,272 epoch 19 - iter 7/71 - loss 0.14268361 - time (sec): 3.05 - samples/sec: 192.50 - lr: 0.050000
2023-04-05 13:03:11,108 epoch 19 - iter 14/71 - loss 0.11705529 - time (sec): 6.89 - samples/sec: 171.38 - lr: 0.050000
2023-04-05 13:03:14,913 epoch 19 - iter 21/71 - loss 0.13677063 - time (sec): 10.69 - samples/sec: 166.87 - lr: 0.050000
2023-04-05 13:03:18,612 epoch 19 - iter 28/71 - loss 0.14036431 - time (sec): 14.39 - samples/sec: 164.42 - lr: 0.050000
2023-04-05 13:03:22,346 epoch 19 - iter 35/71 - loss 0.14069950 - time (sec): 18.12 - samples/sec: 162.99 - lr: 0.050000
2023-04-05 13:03:26,137 epoch 19 - iter 42/71 - loss 0.14416925 - time (sec): 21.91 - samples/sec: 161.81 - lr: 0.050000
2023-04-05 13:03:29,727 epoch 19 - iter 49/71 - loss 0.14774011 - time (sec): 25.50 - samples/sec: 162.05 - lr: 0.050000
2023-04-05 13:03:33,433 epoch 19 - iter 56/71 - loss 0.14539107 - time (sec): 29.21 - samples/sec: 161.11 - lr: 0.050000
2023-04-05 13:03:37,072 epoch 19 - iter 63/71 - loss 0.14446964 - time (sec): 32.85 - samples/sec: 160.61 - lr: 0.050000
2023-04-05 13:03:40,670 epoch 19 - iter 70/71 - loss 0.14622801 - time (sec): 36.45 - samples/sec: 161.22 - lr: 0.050000
2023-04-05 13:03:41,501 ----------------------------------------------------------------------------------------------------
2023-04-05 13:03:41,502 EPOCH 19 done: loss 0.1476 - lr 0.050000
2023-04-05 13:04:11,752 Evaluating as a multi-label problem: False
2023-04-05 13:04:11,772 TRAIN : loss 0.08018826693296432 - f1-score (micro avg) 0.9626
2023-04-05 13:04:15,357 Evaluating as a multi-label problem: False
2023-04-05 13:04:15,368 DEV : loss 0.2329457700252533 - f1-score (micro avg) 0.8603
2023-04-05 13:04:15,373 Epoch 19: reducing learning rate of group 0 to 2.5000e-02.
2023-04-05 13:04:15,374 BAD EPOCHS (no improvement): 4
2023-04-05 13:04:15,376 ----------------------------------------------------------------------------------------------------
2023-04-05 13:04:18,529 epoch 20 - iter 7/71 - loss 0.15115543 - time (sec): 3.15 - samples/sec: 187.15 - lr: 0.025000
2023-04-05 13:04:22,232 epoch 20 - iter 14/71 - loss 0.14745089 - time (sec): 6.86 - samples/sec: 170.06 - lr: 0.025000
2023-04-05 13:04:25,979 epoch 20 - iter 21/71 - loss 0.13852706 - time (sec): 10.60 - samples/sec: 162.88 - lr: 0.025000
2023-04-05 13:04:29,734 epoch 20 - iter 28/71 - loss 0.13696755 - time (sec): 14.36 - samples/sec: 161.09 - lr: 0.025000
2023-04-05 13:04:33,339 epoch 20 - iter 35/71 - loss 0.13835721 - time (sec): 17.96 - samples/sec: 162.89 - lr: 0.025000
2023-04-05 13:04:36,924 epoch 20 - iter 42/71 - loss 0.13924535 - time (sec): 21.55 - samples/sec: 163.31 - lr: 0.025000
2023-04-05 13:04:40,601 epoch 20 - iter 49/71 - loss 0.13776399 - time (sec): 25.22 - samples/sec: 163.13 - lr: 0.025000
2023-04-05 13:04:44,100 epoch 20 - iter 56/71 - loss 0.13600369 - time (sec): 28.72 - samples/sec: 163.35 - lr: 0.025000
2023-04-05 13:04:47,772 epoch 20 - iter 63/71 - loss 0.13546503 - time (sec): 32.40 - samples/sec: 163.72 - lr: 0.025000
2023-04-05 13:04:51,285 epoch 20 - iter 70/71 - loss 0.13381809 - time (sec): 35.91 - samples/sec: 163.86 - lr: 0.025000
2023-04-05 13:04:52,110 ----------------------------------------------------------------------------------------------------
2023-04-05 13:04:52,111 EPOCH 20 done: loss 0.1342 - lr 0.025000
2023-04-05 13:05:21,762 Evaluating as a multi-label problem: False
2023-04-05 13:05:21,781 TRAIN : loss 0.06673520058393478 - f1-score (micro avg) 0.9674
2023-04-05 13:05:25,271 Evaluating as a multi-label problem: False
2023-04-05 13:05:25,281 DEV : loss 0.22465617954730988 - f1-score (micro avg) 0.8848
2023-04-05 13:05:25,286 BAD EPOCHS (no improvement): 0
2023-04-05 13:05:25,287 ----------------------------------------------------------------------------------------------------
2023-04-05 13:05:28,197 epoch 21 - iter 7/71 - loss 0.14072648 - time (sec): 2.91 - samples/sec: 198.61 - lr: 0.025000
2023-04-05 13:05:31,972 epoch 21 - iter 14/71 - loss 0.12496798 - time (sec): 6.68 - samples/sec: 176.52 - lr: 0.025000
2023-04-05 13:05:35,617 epoch 21 - iter 21/71 - loss 0.12397679 - time (sec): 10.33 - samples/sec: 172.03 - lr: 0.025000
2023-04-05 13:05:39,209 epoch 21 - iter 28/71 - loss 0.12982706 - time (sec): 13.92 - samples/sec: 169.16 - lr: 0.025000
2023-04-05 13:05:42,824 epoch 21 - iter 35/71 - loss 0.12812647 - time (sec): 17.54 - samples/sec: 164.51 - lr: 0.025000
2023-04-05 13:05:46,347 epoch 21 - iter 42/71 - loss 0.12693924 - time (sec): 21.06 - samples/sec: 163.67 - lr: 0.025000
2023-04-05 13:05:49,905 epoch 21 - iter 49/71 - loss 0.12797163 - time (sec): 24.62 - samples/sec: 165.08 - lr: 0.025000
2023-04-05 13:05:53,577 epoch 21 - iter 56/71 - loss 0.13077894 - time (sec): 28.29 - samples/sec: 164.40 - lr: 0.025000
2023-04-05 13:05:57,191 epoch 21 - iter 63/71 - loss 0.13144443 - time (sec): 31.90 - samples/sec: 163.71 - lr: 0.025000
2023-04-05 13:06:00,722 epoch 21 - iter 70/71 - loss 0.13238450 - time (sec): 35.43 - samples/sec: 166.14 - lr: 0.025000
2023-04-05 13:06:01,424 ----------------------------------------------------------------------------------------------------
2023-04-05 13:06:01,425 EPOCH 21 done: loss 0.1318 - lr 0.025000
2023-04-05 13:06:30,981 Evaluating as a multi-label problem: False
2023-04-05 13:06:30,997 TRAIN : loss 0.06548392027616501 - f1-score (micro avg) 0.9687
2023-04-05 13:06:34,514 Evaluating as a multi-label problem: False
2023-04-05 13:06:34,524 DEV : loss 0.22177472710609436 - f1-score (micro avg) 0.8868
2023-04-05 13:06:34,529 BAD EPOCHS (no improvement): 0
2023-04-05 13:06:34,531 saving best model
2023-04-05 13:06:36,118 ----------------------------------------------------------------------------------------------------
2023-04-05 13:06:39,070 epoch 22 - iter 7/71 - loss 0.10843301 - time (sec): 2.95 - samples/sec: 193.84 - lr: 0.025000
2023-04-05 13:06:42,690 epoch 22 - iter 14/71 - loss 0.12542959 - time (sec): 6.57 - samples/sec: 174.26 - lr: 0.025000
2023-04-05 13:06:46,379 epoch 22 - iter 21/71 - loss 0.11852796 - time (sec): 10.26 - samples/sec: 165.88 - lr: 0.025000
2023-04-05 13:06:50,145 epoch 22 - iter 28/71 - loss 0.11984452 - time (sec): 14.03 - samples/sec: 165.33 - lr: 0.025000
2023-04-05 13:06:53,619 epoch 22 - iter 35/71 - loss 0.12448799 - time (sec): 17.50 - samples/sec: 166.46 - lr: 0.025000
2023-04-05 13:06:57,251 epoch 22 - iter 42/71 - loss 0.12873390 - time (sec): 21.13 - samples/sec: 165.81 - lr: 0.025000
2023-04-05 13:07:00,920 epoch 22 - iter 49/71 - loss 0.12402659 - time (sec): 24.80 - samples/sec: 164.83 - lr: 0.025000
2023-04-05 13:07:04,547 epoch 22 - iter 56/71 - loss 0.12855204 - time (sec): 28.43 - samples/sec: 165.01 - lr: 0.025000
2023-04-05 13:07:08,144 epoch 22 - iter 63/71 - loss 0.12747032 - time (sec): 32.02 - samples/sec: 165.53 - lr: 0.025000
2023-04-05 13:07:11,868 epoch 22 - iter 70/71 - loss 0.12699574 - time (sec): 35.75 - samples/sec: 164.37 - lr: 0.025000
2023-04-05 13:07:12,670 ----------------------------------------------------------------------------------------------------
2023-04-05 13:07:12,671 EPOCH 22 done: loss 0.1266 - lr 0.025000
2023-04-05 13:07:42,659 Evaluating as a multi-label problem: False
2023-04-05 13:07:42,680 TRAIN : loss 0.06098590046167374 - f1-score (micro avg) 0.971
2023-04-05 13:07:46,246 Evaluating as a multi-label problem: False
2023-04-05 13:07:46,256 DEV : loss 0.21829380095005035 - f1-score (micro avg) 0.8881
2023-04-05 13:07:46,263 BAD EPOCHS (no improvement): 0
2023-04-05 13:07:46,264 saving best model
2023-04-05 13:07:47,600 ----------------------------------------------------------------------------------------------------
2023-04-05 13:07:50,798 epoch 23 - iter 7/71 - loss 0.15008917 - time (sec): 3.20 - samples/sec: 187.07 - lr: 0.025000
2023-04-05 13:07:54,819 epoch 23 - iter 14/71 - loss 0.14280265 - time (sec): 7.22 - samples/sec: 162.93 - lr: 0.025000
2023-04-05 13:07:58,438 epoch 23 - iter 21/71 - loss 0.13304625 - time (sec): 10.84 - samples/sec: 160.38 - lr: 0.025000
2023-04-05 13:08:02,077 epoch 23 - iter 28/71 - loss 0.12460526 - time (sec): 14.48 - samples/sec: 161.23 - lr: 0.025000
2023-04-05 13:08:05,787 epoch 23 - iter 35/71 - loss 0.11730006 - time (sec): 18.19 - samples/sec: 161.67 - lr: 0.025000
2023-04-05 13:08:09,415 epoch 23 - iter 42/71 - loss 0.11912754 - time (sec): 21.81 - samples/sec: 161.64 - lr: 0.025000
2023-04-05 13:08:13,061 epoch 23 - iter 49/71 - loss 0.12580741 - time (sec): 25.46 - samples/sec: 161.24 - lr: 0.025000
2023-04-05 13:08:16,757 epoch 23 - iter 56/71 - loss 0.12476487 - time (sec): 29.16 - samples/sec: 161.24 - lr: 0.025000
2023-04-05 13:08:20,720 epoch 23 - iter 63/71 - loss 0.12545874 - time (sec): 33.12 - samples/sec: 160.03 - lr: 0.025000
2023-04-05 13:08:24,409 epoch 23 - iter 70/71 - loss 0.12915321 - time (sec): 36.81 - samples/sec: 159.80 - lr: 0.025000
2023-04-05 13:08:25,252 ----------------------------------------------------------------------------------------------------
2023-04-05 13:08:25,253 EPOCH 23 done: loss 0.1288 - lr 0.025000
2023-04-05 13:08:55,761 Evaluating as a multi-label problem: False
2023-04-05 13:08:55,783 TRAIN : loss 0.05855342745780945 - f1-score (micro avg) 0.9743
2023-04-05 13:08:59,415 Evaluating as a multi-label problem: False
2023-04-05 13:08:59,424 DEV : loss 0.21743902564048767 - f1-score (micro avg) 0.8901
2023-04-05 13:08:59,429 BAD EPOCHS (no improvement): 0
2023-04-05 13:08:59,431 saving best model
2023-04-05 13:09:00,565 ----------------------------------------------------------------------------------------------------
2023-04-05 13:09:03,643 epoch 24 - iter 7/71 - loss 0.09932632 - time (sec): 3.08 - samples/sec: 193.76 - lr: 0.025000
2023-04-05 13:09:07,439 epoch 24 - iter 14/71 - loss 0.11461498 - time (sec): 6.87 - samples/sec: 173.88 - lr: 0.025000
2023-04-05 13:09:11,176 epoch 24 - iter 21/71 - loss 0.12509619 - time (sec): 10.61 - samples/sec: 169.85 - lr: 0.025000
2023-04-05 13:09:15,006 epoch 24 - iter 28/71 - loss 0.12628469 - time (sec): 14.44 - samples/sec: 164.13 - lr: 0.025000
2023-04-05 13:09:18,758 epoch 24 - iter 35/71 - loss 0.12660201 - time (sec): 18.19 - samples/sec: 162.82 - lr: 0.025000
2023-04-05 13:09:22,402 epoch 24 - iter 42/71 - loss 0.13016555 - time (sec): 21.84 - samples/sec: 163.50 - lr: 0.025000
2023-04-05 13:09:26,014 epoch 24 - iter 49/71 - loss 0.12703872 - time (sec): 25.45 - samples/sec: 163.55 - lr: 0.025000
2023-04-05 13:09:29,654 epoch 24 - iter 56/71 - loss 0.12356562 - time (sec): 29.09 - samples/sec: 162.13 - lr: 0.025000
2023-04-05 13:09:33,309 epoch 24 - iter 63/71 - loss 0.12854118 - time (sec): 32.74 - samples/sec: 162.21 - lr: 0.025000
2023-04-05 13:09:37,052 epoch 24 - iter 70/71 - loss 0.12844792 - time (sec): 36.49 - samples/sec: 161.19 - lr: 0.025000
2023-04-05 13:09:37,848 ----------------------------------------------------------------------------------------------------
2023-04-05 13:09:37,849 EPOCH 24 done: loss 0.1292 - lr 0.025000
2023-04-05 13:10:08,561 Evaluating as a multi-label problem: False
2023-04-05 13:10:08,579 TRAIN : loss 0.05762539058923721 - f1-score (micro avg) 0.9705
2023-04-05 13:10:12,126 Evaluating as a multi-label problem: False
2023-04-05 13:10:12,136 DEV : loss 0.21672436594963074 - f1-score (micro avg) 0.8831
2023-04-05 13:10:12,142 BAD EPOCHS (no improvement): 1
2023-04-05 13:10:12,143 ----------------------------------------------------------------------------------------------------
2023-04-05 13:10:15,262 epoch 25 - iter 7/71 - loss 0.10395902 - time (sec): 3.12 - samples/sec: 192.28 - lr: 0.025000
2023-04-05 13:10:19,012 epoch 25 - iter 14/71 - loss 0.11430174 - time (sec): 6.87 - samples/sec: 175.34 - lr: 0.025000
2023-04-05 13:10:22,832 epoch 25 - iter 21/71 - loss 0.11574001 - time (sec): 10.69 - samples/sec: 168.99 - lr: 0.025000
2023-04-05 13:10:26,537 epoch 25 - iter 28/71 - loss 0.12338041 - time (sec): 14.39 - samples/sec: 165.38 - lr: 0.025000
2023-04-05 13:10:30,355 epoch 25 - iter 35/71 - loss 0.12176332 - time (sec): 18.21 - samples/sec: 161.95 - lr: 0.025000
2023-04-05 13:10:34,020 epoch 25 - iter 42/71 - loss 0.11983290 - time (sec): 21.87 - samples/sec: 161.01 - lr: 0.025000
2023-04-05 13:10:37,749 epoch 25 - iter 49/71 - loss 0.11602293 - time (sec): 25.60 - samples/sec: 159.00 - lr: 0.025000
2023-04-05 13:10:41,445 epoch 25 - iter 56/71 - loss 0.11461970 - time (sec): 29.30 - samples/sec: 158.94 - lr: 0.025000
2023-04-05 13:10:45,185 epoch 25 - iter 63/71 - loss 0.11841378 - time (sec): 33.04 - samples/sec: 159.87 - lr: 0.025000
2023-04-05 13:10:48,840 epoch 25 - iter 70/71 - loss 0.12186183 - time (sec): 36.70 - samples/sec: 160.21 - lr: 0.025000
2023-04-05 13:10:49,657 ----------------------------------------------------------------------------------------------------
2023-04-05 13:10:49,659 EPOCH 25 done: loss 0.1226 - lr 0.025000
2023-04-05 13:11:20,148 Evaluating as a multi-label problem: False
2023-04-05 13:11:20,167 TRAIN : loss 0.05489451438188553 - f1-score (micro avg) 0.9758
2023-04-05 13:11:23,819 Evaluating as a multi-label problem: False
2023-04-05 13:11:23,833 DEV : loss 0.20767748355865479 - f1-score (micro avg) 0.8959
2023-04-05 13:11:23,837 BAD EPOCHS (no improvement): 0
2023-04-05 13:11:23,840 saving best model
2023-04-05 13:11:25,356 ----------------------------------------------------------------------------------------------------
2023-04-05 13:11:28,428 epoch 26 - iter 7/71 - loss 0.14960103 - time (sec): 3.07 - samples/sec: 195.35 - lr: 0.025000
2023-04-05 13:11:32,086 epoch 26 - iter 14/71 - loss 0.13459322 - time (sec): 6.73 - samples/sec: 176.72 - lr: 0.025000
2023-04-05 13:11:35,752 epoch 26 - iter 21/71 - loss 0.14211700 - time (sec): 10.39 - samples/sec: 168.54 - lr: 0.025000
2023-04-05 13:11:39,370 epoch 26 - iter 28/71 - loss 0.13695156 - time (sec): 14.01 - samples/sec: 166.77 - lr: 0.025000
2023-04-05 13:11:43,047 epoch 26 - iter 35/71 - loss 0.13044682 - time (sec): 17.69 - samples/sec: 164.66 - lr: 0.025000
2023-04-05 13:11:46,632 epoch 26 - iter 42/71 - loss 0.12209705 - time (sec): 21.27 - samples/sec: 162.92 - lr: 0.025000
2023-04-05 13:11:50,270 epoch 26 - iter 49/71 - loss 0.12045467 - time (sec): 24.91 - samples/sec: 163.36 - lr: 0.025000
2023-04-05 13:11:53,914 epoch 26 - iter 56/71 - loss 0.12313593 - time (sec): 28.56 - samples/sec: 163.85 - lr: 0.025000
2023-04-05 13:11:57,552 epoch 26 - iter 63/71 - loss 0.12808456 - time (sec): 32.20 - samples/sec: 164.06 - lr: 0.025000
2023-04-05 13:12:01,370 epoch 26 - iter 70/71 - loss 0.13030809 - time (sec): 36.01 - samples/sec: 163.25 - lr: 0.025000
2023-04-05 13:12:02,206 ----------------------------------------------------------------------------------------------------
2023-04-05 13:12:02,207 EPOCH 26 done: loss 0.1298 - lr 0.025000
2023-04-05 13:12:32,718 Evaluating as a multi-label problem: False
2023-04-05 13:12:32,738 TRAIN : loss 0.053497862070798874 - f1-score (micro avg) 0.9731
2023-04-05 13:12:36,375 Evaluating as a multi-label problem: False
2023-04-05 13:12:36,388 DEV : loss 0.21015514433383942 - f1-score (micro avg) 0.8782
2023-04-05 13:12:36,393 BAD EPOCHS (no improvement): 1
2023-04-05 13:12:36,395 ----------------------------------------------------------------------------------------------------
2023-04-05 13:12:39,594 epoch 27 - iter 7/71 - loss 0.14380847 - time (sec): 3.20 - samples/sec: 190.03 - lr: 0.025000
2023-04-05 13:12:43,352 epoch 27 - iter 14/71 - loss 0.14111434 - time (sec): 6.96 - samples/sec: 173.92 - lr: 0.025000
2023-04-05 13:12:47,121 epoch 27 - iter 21/71 - loss 0.12589741 - time (sec): 10.73 - samples/sec: 168.18 - lr: 0.025000
2023-04-05 13:12:51,064 epoch 27 - iter 28/71 - loss 0.12663826 - time (sec): 14.67 - samples/sec: 163.67 - lr: 0.025000
2023-04-05 13:12:54,988 epoch 27 - iter 35/71 - loss 0.12494789 - time (sec): 18.59 - samples/sec: 160.22 - lr: 0.025000
2023-04-05 13:12:58,720 epoch 27 - iter 42/71 - loss 0.12722136 - time (sec): 22.33 - samples/sec: 158.56 - lr: 0.025000
2023-04-05 13:13:02,624 epoch 27 - iter 49/71 - loss 0.12614770 - time (sec): 26.23 - samples/sec: 157.34 - lr: 0.025000
2023-04-05 13:13:06,387 epoch 27 - iter 56/71 - loss 0.12524578 - time (sec): 29.99 - samples/sec: 156.77 - lr: 0.025000
2023-04-05 13:13:10,205 epoch 27 - iter 63/71 - loss 0.12175985 - time (sec): 33.81 - samples/sec: 156.61 - lr: 0.025000
2023-04-05 13:13:14,135 epoch 27 - iter 70/71 - loss 0.12627498 - time (sec): 37.74 - samples/sec: 155.75 - lr: 0.025000
2023-04-05 13:13:15,072 ----------------------------------------------------------------------------------------------------
2023-04-05 13:13:15,073 EPOCH 27 done: loss 0.1258 - lr 0.025000
2023-04-05 13:13:46,132 Evaluating as a multi-label problem: False
2023-04-05 13:13:46,148 TRAIN : loss 0.05426767095923424 - f1-score (micro avg) 0.9714
2023-04-05 13:13:49,886 Evaluating as a multi-label problem: False
2023-04-05 13:13:49,900 DEV : loss 0.21591384708881378 - f1-score (micro avg) 0.8922
2023-04-05 13:13:49,905 BAD EPOCHS (no improvement): 2
2023-04-05 13:13:49,907 ----------------------------------------------------------------------------------------------------
2023-04-05 13:13:53,123 epoch 28 - iter 7/71 - loss 0.14167046 - time (sec): 3.21 - samples/sec: 183.52 - lr: 0.025000
2023-04-05 13:13:56,811 epoch 28 - iter 14/71 - loss 0.12129582 - time (sec): 6.90 - samples/sec: 174.86 - lr: 0.025000
2023-04-05 13:14:00,650 epoch 28 - iter 21/71 - loss 0.11742410 - time (sec): 10.74 - samples/sec: 170.64 - lr: 0.025000
2023-04-05 13:14:04,744 epoch 28 - iter 28/71 - loss 0.11524606 - time (sec): 14.84 - samples/sec: 163.59 - lr: 0.025000
2023-04-05 13:14:08,572 epoch 28 - iter 35/71 - loss 0.11837341 - time (sec): 18.66 - samples/sec: 161.11 - lr: 0.025000
2023-04-05 13:14:12,510 epoch 28 - iter 42/71 - loss 0.11728808 - time (sec): 22.60 - samples/sec: 159.06 - lr: 0.025000
2023-04-05 13:14:16,455 epoch 28 - iter 49/71 - loss 0.11521758 - time (sec): 26.55 - samples/sec: 154.90 - lr: 0.025000
2023-04-05 13:14:20,473 epoch 28 - iter 56/71 - loss 0.11345810 - time (sec): 30.57 - samples/sec: 152.56 - lr: 0.025000
2023-04-05 13:14:24,545 epoch 28 - iter 63/71 - loss 0.11271746 - time (sec): 34.64 - samples/sec: 152.47 - lr: 0.025000
2023-04-05 13:14:28,525 epoch 28 - iter 70/71 - loss 0.10973027 - time (sec): 38.62 - samples/sec: 152.11 - lr: 0.025000
2023-04-05 13:14:29,417 ----------------------------------------------------------------------------------------------------
2023-04-05 13:14:29,418 EPOCH 28 done: loss 0.1105 - lr 0.025000
2023-04-05 13:15:01,734 Evaluating as a multi-label problem: False
2023-04-05 13:15:01,755 TRAIN : loss 0.05138213932514191 - f1-score (micro avg) 0.9737
2023-04-05 13:15:05,435 Evaluating as a multi-label problem: False
2023-04-05 13:15:05,449 DEV : loss 0.20661891996860504 - f1-score (micro avg) 0.9
2023-04-05 13:15:05,454 BAD EPOCHS (no improvement): 0
2023-04-05 13:15:05,456 saving best model
2023-04-05 13:15:06,556 ----------------------------------------------------------------------------------------------------
2023-04-05 13:15:09,647 epoch 29 - iter 7/71 - loss 0.09828948 - time (sec): 3.09 - samples/sec: 191.35 - lr: 0.025000
2023-04-05 13:15:13,489 epoch 29 - iter 14/71 - loss 0.09758705 - time (sec): 6.93 - samples/sec: 169.24 - lr: 0.025000
2023-04-05 13:15:17,592 epoch 29 - iter 21/71 - loss 0.10076913 - time (sec): 11.03 - samples/sec: 159.06 - lr: 0.025000
2023-04-05 13:15:21,325 epoch 29 - iter 28/71 - loss 0.10712337 - time (sec): 14.77 - samples/sec: 159.01 - lr: 0.025000
2023-04-05 13:15:25,125 epoch 29 - iter 35/71 - loss 0.10798193 - time (sec): 18.57 - samples/sec: 159.10 - lr: 0.025000
2023-04-05 13:15:28,807 epoch 29 - iter 42/71 - loss 0.10437713 - time (sec): 22.25 - samples/sec: 159.56 - lr: 0.025000
2023-04-05 13:15:32,624 epoch 29 - iter 49/71 - loss 0.10833207 - time (sec): 26.07 - samples/sec: 158.25 - lr: 0.025000
2023-04-05 13:15:36,363 epoch 29 - iter 56/71 - loss 0.10749686 - time (sec): 29.80 - samples/sec: 158.10 - lr: 0.025000
2023-04-05 13:15:40,194 epoch 29 - iter 63/71 - loss 0.10740193 - time (sec): 33.64 - samples/sec: 157.39 - lr: 0.025000
2023-04-05 13:15:43,973 epoch 29 - iter 70/71 - loss 0.10763063 - time (sec): 37.41 - samples/sec: 157.24 - lr: 0.025000
2023-04-05 13:15:44,840 ----------------------------------------------------------------------------------------------------
2023-04-05 13:15:44,841 EPOCH 29 done: loss 0.1075 - lr 0.025000
2023-04-05 13:16:16,518 Evaluating as a multi-label problem: False
2023-04-05 13:16:16,538 TRAIN : loss 0.04754020646214485 - f1-score (micro avg) 0.9807
2023-04-05 13:16:20,428 Evaluating as a multi-label problem: False
2023-04-05 13:16:20,441 DEV : loss 0.20554056763648987 - f1-score (micro avg) 0.8831
2023-04-05 13:16:20,447 BAD EPOCHS (no improvement): 1
2023-04-05 13:16:20,449 ----------------------------------------------------------------------------------------------------
2023-04-05 13:16:23,644 epoch 30 - iter 7/71 - loss 0.12214543 - time (sec): 3.19 - samples/sec: 190.64 - lr: 0.025000
2023-04-05 13:16:27,504 epoch 30 - iter 14/71 - loss 0.10450958 - time (sec): 7.05 - samples/sec: 169.69 - lr: 0.025000
2023-04-05 13:16:31,376 epoch 30 - iter 21/71 - loss 0.10887640 - time (sec): 10.93 - samples/sec: 163.00 - lr: 0.025000
2023-04-05 13:16:35,226 epoch 30 - iter 28/71 - loss 0.10731823 - time (sec): 14.78 - samples/sec: 160.73 - lr: 0.025000
2023-04-05 13:16:39,102 epoch 30 - iter 35/71 - loss 0.10467509 - time (sec): 18.65 - samples/sec: 158.64 - lr: 0.025000
2023-04-05 13:16:43,120 epoch 30 - iter 42/71 - loss 0.11094270 - time (sec): 22.67 - samples/sec: 156.95 - lr: 0.025000
2023-04-05 13:16:46,982 epoch 30 - iter 49/71 - loss 0.10634089 - time (sec): 26.53 - samples/sec: 157.10 - lr: 0.025000
2023-04-05 13:16:51,073 epoch 30 - iter 56/71 - loss 0.10685095 - time (sec): 30.62 - samples/sec: 154.23 - lr: 0.025000
2023-04-05 13:16:55,184 epoch 30 - iter 63/71 - loss 0.11220058 - time (sec): 34.73 - samples/sec: 152.73 - lr: 0.025000
2023-04-05 13:16:59,355 epoch 30 - iter 70/71 - loss 0.11420165 - time (sec): 38.91 - samples/sec: 151.32 - lr: 0.025000
2023-04-05 13:17:00,290 ----------------------------------------------------------------------------------------------------
2023-04-05 13:17:00,291 EPOCH 30 done: loss 0.1138 - lr 0.025000
2023-04-05 13:17:33,935 Evaluating as a multi-label problem: False
2023-04-05 13:17:33,955 TRAIN : loss 0.049051374197006226 - f1-score (micro avg) 0.9765
2023-04-05 13:17:37,925 Evaluating as a multi-label problem: False
2023-04-05 13:17:37,937 DEV : loss 0.21063490211963654 - f1-score (micro avg) 0.8989
2023-04-05 13:17:37,943 BAD EPOCHS (no improvement): 2
2023-04-05 13:17:37,944 ----------------------------------------------------------------------------------------------------
2023-04-05 13:17:41,197 epoch 31 - iter 7/71 - loss 0.12085872 - time (sec): 3.25 - samples/sec: 182.37 - lr: 0.025000
|
| 894 |
+
2023-04-05 13:17:44,963 epoch 31 - iter 14/71 - loss 0.12129786 - time (sec): 7.02 - samples/sec: 164.16 - lr: 0.025000
|
| 895 |
+
2023-04-05 13:17:48,794 epoch 31 - iter 21/71 - loss 0.11010256 - time (sec): 10.85 - samples/sec: 160.39 - lr: 0.025000
|
| 896 |
+
2023-04-05 13:17:52,660 epoch 31 - iter 28/71 - loss 0.11869391 - time (sec): 14.71 - samples/sec: 157.67 - lr: 0.025000
|
| 897 |
+
2023-04-05 13:17:56,463 epoch 31 - iter 35/71 - loss 0.11978621 - time (sec): 18.52 - samples/sec: 157.63 - lr: 0.025000
|
| 898 |
+
2023-04-05 13:18:00,669 epoch 31 - iter 42/71 - loss 0.12158608 - time (sec): 22.72 - samples/sec: 153.76 - lr: 0.025000
|
| 899 |
+
2023-04-05 13:18:04,928 epoch 31 - iter 49/71 - loss 0.12205383 - time (sec): 26.98 - samples/sec: 151.10 - lr: 0.025000
|
| 900 |
+
2023-04-05 13:18:08,749 epoch 31 - iter 56/71 - loss 0.12407999 - time (sec): 30.80 - samples/sec: 152.35 - lr: 0.025000
|
| 901 |
+
2023-04-05 13:18:12,993 epoch 31 - iter 63/71 - loss 0.12182435 - time (sec): 35.05 - samples/sec: 151.14 - lr: 0.025000
|
| 902 |
+
2023-04-05 13:18:17,045 epoch 31 - iter 70/71 - loss 0.11982468 - time (sec): 39.10 - samples/sec: 150.39 - lr: 0.025000
|
| 903 |
+
2023-04-05 13:18:17,906 ----------------------------------------------------------------------------------------------------
|
| 904 |
+
2023-04-05 13:18:17,907 EPOCH 31 done: loss 0.1199 - lr 0.025000
|
| 905 |
+
2023-04-05 13:18:50,399 Evaluating as a multi-label problem: False
|
| 906 |
+
2023-04-05 13:18:50,415 TRAIN : loss 0.04879188537597656 - f1-score (micro avg) 0.9767
|
| 907 |
+
2023-04-05 13:18:54,298 Evaluating as a multi-label problem: False
|
| 908 |
+
2023-04-05 13:18:54,311 DEV : loss 0.21518820524215698 - f1-score (micro avg) 0.8856
|
| 909 |
+
2023-04-05 13:18:54,319 BAD EPOCHS (no improvement): 3
|
| 910 |
+
2023-04-05 13:18:54,320 ----------------------------------------------------------------------------------------------------
|
| 911 |
+
2023-04-05 13:18:57,672 epoch 32 - iter 7/71 - loss 0.09686100 - time (sec): 3.35 - samples/sec: 182.98 - lr: 0.025000
|
| 912 |
+
2023-04-05 13:19:01,687 epoch 32 - iter 14/71 - loss 0.11264203 - time (sec): 7.36 - samples/sec: 163.07 - lr: 0.025000
|
| 913 |
+
2023-04-05 13:19:05,453 epoch 32 - iter 21/71 - loss 0.11587441 - time (sec): 11.13 - samples/sec: 160.90 - lr: 0.025000
|
| 914 |
+
2023-04-05 13:19:09,495 epoch 32 - iter 28/71 - loss 0.11953699 - time (sec): 15.17 - samples/sec: 154.68 - lr: 0.025000
|
| 915 |
+
2023-04-05 13:19:13,613 epoch 32 - iter 35/71 - loss 0.11792902 - time (sec): 19.29 - samples/sec: 151.52 - lr: 0.025000
|
| 916 |
+
2023-04-05 13:19:17,553 epoch 32 - iter 42/71 - loss 0.11791943 - time (sec): 23.23 - samples/sec: 151.60 - lr: 0.025000
2023-04-05 13:19:21,512 epoch 32 - iter 49/71 - loss 0.11984092 - time (sec): 27.19 - samples/sec: 151.85 - lr: 0.025000
2023-04-05 13:19:25,273 epoch 32 - iter 56/71 - loss 0.12028672 - time (sec): 30.95 - samples/sec: 152.37 - lr: 0.025000
2023-04-05 13:19:29,003 epoch 32 - iter 63/71 - loss 0.11505568 - time (sec): 34.68 - samples/sec: 153.28 - lr: 0.025000
2023-04-05 13:19:32,783 epoch 32 - iter 70/71 - loss 0.11360413 - time (sec): 38.46 - samples/sec: 152.91 - lr: 0.025000
2023-04-05 13:19:33,667 ----------------------------------------------------------------------------------------------------
2023-04-05 13:19:33,668 EPOCH 32 done: loss 0.1136 - lr 0.025000
2023-04-05 13:20:05,299 Evaluating as a multi-label problem: False
2023-04-05 13:20:05,317 TRAIN : loss 0.04602975398302078 - f1-score (micro avg) 0.9764
2023-04-05 13:20:08,959 Evaluating as a multi-label problem: False
2023-04-05 13:20:08,971 DEV : loss 0.2196006327867508 - f1-score (micro avg) 0.8831
2023-04-05 13:20:08,977 Epoch 32: reducing learning rate of group 0 to 1.2500e-02.
2023-04-05 13:20:08,978 BAD EPOCHS (no improvement): 4
2023-04-05 13:20:08,980 ----------------------------------------------------------------------------------------------------
2023-04-05 13:20:12,222 epoch 33 - iter 7/71 - loss 0.11581283 - time (sec): 3.24 - samples/sec: 177.97 - lr: 0.012500
2023-04-05 13:20:16,008 epoch 33 - iter 14/71 - loss 0.11978144 - time (sec): 7.03 - samples/sec: 169.61 - lr: 0.012500
2023-04-05 13:20:19,936 epoch 33 - iter 21/71 - loss 0.11537349 - time (sec): 10.96 - samples/sec: 159.65 - lr: 0.012500
2023-04-05 13:20:23,770 epoch 33 - iter 28/71 - loss 0.10736190 - time (sec): 14.79 - samples/sec: 160.05 - lr: 0.012500
2023-04-05 13:20:27,878 epoch 33 - iter 35/71 - loss 0.11021785 - time (sec): 18.90 - samples/sec: 156.10 - lr: 0.012500
2023-04-05 13:20:31,966 epoch 33 - iter 42/71 - loss 0.10713205 - time (sec): 22.99 - samples/sec: 152.88 - lr: 0.012500
2023-04-05 13:20:35,718 epoch 33 - iter 49/71 - loss 0.10934260 - time (sec): 26.74 - samples/sec: 153.53 - lr: 0.012500
2023-04-05 13:20:39,467 epoch 33 - iter 56/71 - loss 0.10654440 - time (sec): 30.49 - samples/sec: 153.51 - lr: 0.012500
2023-04-05 13:20:43,388 epoch 33 - iter 63/71 - loss 0.10765648 - time (sec): 34.41 - samples/sec: 153.34 - lr: 0.012500
2023-04-05 13:20:47,348 epoch 33 - iter 70/71 - loss 0.11051452 - time (sec): 38.37 - samples/sec: 153.15 - lr: 0.012500
2023-04-05 13:20:48,204 ----------------------------------------------------------------------------------------------------
2023-04-05 13:20:48,205 EPOCH 33 done: loss 0.1101 - lr 0.012500
2023-04-05 13:21:21,733 Evaluating as a multi-label problem: False
2023-04-05 13:21:21,755 TRAIN : loss 0.04103841260075569 - f1-score (micro avg) 0.9818
2023-04-05 13:21:25,443 Evaluating as a multi-label problem: False
2023-04-05 13:21:25,453 DEV : loss 0.19789864122867584 - f1-score (micro avg) 0.8848
2023-04-05 13:21:25,460 BAD EPOCHS (no improvement): 1
2023-04-05 13:21:25,462 ----------------------------------------------------------------------------------------------------
2023-04-05 13:21:28,952 epoch 34 - iter 7/71 - loss 0.11239772 - time (sec): 3.49 - samples/sec: 171.41 - lr: 0.012500
2023-04-05 13:21:33,108 epoch 34 - iter 14/71 - loss 0.09980787 - time (sec): 7.64 - samples/sec: 155.67 - lr: 0.012500
2023-04-05 13:21:37,298 epoch 34 - iter 21/71 - loss 0.09221232 - time (sec): 11.84 - samples/sec: 150.74 - lr: 0.012500
2023-04-05 13:21:41,356 epoch 34 - iter 28/71 - loss 0.09881858 - time (sec): 15.89 - samples/sec: 148.74 - lr: 0.012500
2023-04-05 13:21:45,307 epoch 34 - iter 35/71 - loss 0.10014088 - time (sec): 19.84 - samples/sec: 147.75 - lr: 0.012500
2023-04-05 13:21:49,356 epoch 34 - iter 42/71 - loss 0.09659220 - time (sec): 23.89 - samples/sec: 147.12 - lr: 0.012500
2023-04-05 13:21:53,323 epoch 34 - iter 49/71 - loss 0.09546800 - time (sec): 27.86 - samples/sec: 147.34 - lr: 0.012500
2023-04-05 13:21:57,491 epoch 34 - iter 56/71 - loss 0.09803050 - time (sec): 32.03 - samples/sec: 146.53 - lr: 0.012500
2023-04-05 13:22:01,642 epoch 34 - iter 63/71 - loss 0.10373384 - time (sec): 36.18 - samples/sec: 147.24 - lr: 0.012500
2023-04-05 13:22:05,766 epoch 34 - iter 70/71 - loss 0.10094599 - time (sec): 40.30 - samples/sec: 145.94 - lr: 0.012500
2023-04-05 13:22:06,689 ----------------------------------------------------------------------------------------------------
2023-04-05 13:22:06,690 EPOCH 34 done: loss 0.1011 - lr 0.012500
2023-04-05 13:22:39,510 Evaluating as a multi-label problem: False
2023-04-05 13:22:39,527 TRAIN : loss 0.04190446436405182 - f1-score (micro avg) 0.9799
2023-04-05 13:22:43,323 Evaluating as a multi-label problem: False
2023-04-05 13:22:43,335 DEV : loss 0.20444391667842865 - f1-score (micro avg) 0.8951
2023-04-05 13:22:43,341 BAD EPOCHS (no improvement): 2
2023-04-05 13:22:43,342 ----------------------------------------------------------------------------------------------------
2023-04-05 13:22:46,687 epoch 35 - iter 7/71 - loss 0.12740153 - time (sec): 3.34 - samples/sec: 179.39 - lr: 0.012500
2023-04-05 13:22:50,605 epoch 35 - iter 14/71 - loss 0.09989261 - time (sec): 7.26 - samples/sec: 162.61 - lr: 0.012500
2023-04-05 13:22:54,708 epoch 35 - iter 21/71 - loss 0.09380928 - time (sec): 11.37 - samples/sec: 158.03 - lr: 0.012500
2023-04-05 13:22:58,737 epoch 35 - iter 28/71 - loss 0.10283196 - time (sec): 15.39 - samples/sec: 154.02 - lr: 0.012500
2023-04-05 13:23:02,706 epoch 35 - iter 35/71 - loss 0.10197240 - time (sec): 19.36 - samples/sec: 153.54 - lr: 0.012500
2023-04-05 13:23:06,476 epoch 35 - iter 42/71 - loss 0.09712460 - time (sec): 23.13 - samples/sec: 152.98 - lr: 0.012500
2023-04-05 13:23:10,261 epoch 35 - iter 49/71 - loss 0.09975067 - time (sec): 26.92 - samples/sec: 152.68 - lr: 0.012500
2023-04-05 13:23:14,485 epoch 35 - iter 56/71 - loss 0.09870986 - time (sec): 31.14 - samples/sec: 150.82 - lr: 0.012500
2023-04-05 13:23:18,733 epoch 35 - iter 63/71 - loss 0.10022879 - time (sec): 35.39 - samples/sec: 149.50 - lr: 0.012500
2023-04-05 13:23:22,647 epoch 35 - iter 70/71 - loss 0.10151940 - time (sec): 39.30 - samples/sec: 149.76 - lr: 0.012500
2023-04-05 13:23:23,519 ----------------------------------------------------------------------------------------------------
2023-04-05 13:23:23,520 EPOCH 35 done: loss 0.1017 - lr 0.012500
2023-04-05 13:23:56,251 Evaluating as a multi-label problem: False
2023-04-05 13:23:56,267 TRAIN : loss 0.03953753411769867 - f1-score (micro avg) 0.9826
2023-04-05 13:24:00,070 Evaluating as a multi-label problem: False
2023-04-05 13:24:00,083 DEV : loss 0.20463646948337555 - f1-score (micro avg) 0.8939
2023-04-05 13:24:00,089 BAD EPOCHS (no improvement): 3
2023-04-05 13:24:00,091 ----------------------------------------------------------------------------------------------------
2023-04-05 13:24:03,512 epoch 36 - iter 7/71 - loss 0.09733479 - time (sec): 3.42 - samples/sec: 169.82 - lr: 0.012500
2023-04-05 13:24:07,519 epoch 36 - iter 14/71 - loss 0.09929472 - time (sec): 7.43 - samples/sec: 155.63 - lr: 0.012500
2023-04-05 13:24:11,244 epoch 36 - iter 21/71 - loss 0.09236051 - time (sec): 11.15 - samples/sec: 156.73 - lr: 0.012500
2023-04-05 13:24:14,927 epoch 36 - iter 28/71 - loss 0.09717889 - time (sec): 14.84 - samples/sec: 158.54 - lr: 0.012500
2023-04-05 13:24:18,720 epoch 36 - iter 35/71 - loss 0.09814890 - time (sec): 18.63 - samples/sec: 157.23 - lr: 0.012500
2023-04-05 13:24:22,810 epoch 36 - iter 42/71 - loss 0.09793933 - time (sec): 22.72 - samples/sec: 154.50 - lr: 0.012500
2023-04-05 13:24:26,643 epoch 36 - iter 49/71 - loss 0.09698078 - time (sec): 26.55 - samples/sec: 154.15 - lr: 0.012500
2023-04-05 13:24:30,395 epoch 36 - iter 56/71 - loss 0.09628057 - time (sec): 30.30 - samples/sec: 155.19 - lr: 0.012500
2023-04-05 13:24:34,471 epoch 36 - iter 63/71 - loss 0.09878813 - time (sec): 34.38 - samples/sec: 154.22 - lr: 0.012500
2023-04-05 13:24:38,425 epoch 36 - iter 70/71 - loss 0.10012306 - time (sec): 38.33 - samples/sec: 153.20 - lr: 0.012500
2023-04-05 13:24:39,322 ----------------------------------------------------------------------------------------------------
2023-04-05 13:24:39,323 EPOCH 36 done: loss 0.1011 - lr 0.012500
2023-04-05 13:25:11,618 Evaluating as a multi-label problem: False
2023-04-05 13:25:11,634 TRAIN : loss 0.03854582458734512 - f1-score (micro avg) 0.9824
2023-04-05 13:25:15,735 Evaluating as a multi-label problem: False
2023-04-05 13:25:15,747 DEV : loss 0.20157837867736816 - f1-score (micro avg) 0.8885
2023-04-05 13:25:15,751 Epoch 36: reducing learning rate of group 0 to 6.2500e-03.
2023-04-05 13:25:15,752 BAD EPOCHS (no improvement): 4
2023-04-05 13:25:15,754 ----------------------------------------------------------------------------------------------------
2023-04-05 13:25:19,364 epoch 37 - iter 7/71 - loss 0.10420912 - time (sec): 3.61 - samples/sec: 165.95 - lr: 0.006250
2023-04-05 13:25:23,503 epoch 37 - iter 14/71 - loss 0.09970839 - time (sec): 7.75 - samples/sec: 153.57 - lr: 0.006250
2023-04-05 13:25:27,746 epoch 37 - iter 21/71 - loss 0.09589796 - time (sec): 11.99 - samples/sec: 149.02 - lr: 0.006250
2023-04-05 13:25:31,896 epoch 37 - iter 28/71 - loss 0.09015041 - time (sec): 16.14 - samples/sec: 145.40 - lr: 0.006250
2023-04-05 13:25:35,703 epoch 37 - iter 35/71 - loss 0.09060108 - time (sec): 19.95 - samples/sec: 145.42 - lr: 0.006250
2023-04-05 13:25:39,605 epoch 37 - iter 42/71 - loss 0.08912907 - time (sec): 23.85 - samples/sec: 145.99 - lr: 0.006250
2023-04-05 13:25:43,386 epoch 37 - iter 49/71 - loss 0.09042087 - time (sec): 27.63 - samples/sec: 148.27 - lr: 0.006250
2023-04-05 13:25:47,332 epoch 37 - iter 56/71 - loss 0.08986784 - time (sec): 31.58 - samples/sec: 148.59 - lr: 0.006250
2023-04-05 13:25:51,296 epoch 37 - iter 63/71 - loss 0.09147423 - time (sec): 35.54 - samples/sec: 148.87 - lr: 0.006250
2023-04-05 13:25:55,383 epoch 37 - iter 70/71 - loss 0.09523919 - time (sec): 39.63 - samples/sec: 148.30 - lr: 0.006250
2023-04-05 13:25:56,305 ----------------------------------------------------------------------------------------------------
2023-04-05 13:25:56,307 EPOCH 37 done: loss 0.0947 - lr 0.006250
2023-04-05 13:26:28,852 Evaluating as a multi-label problem: False
2023-04-05 13:26:28,871 TRAIN : loss 0.03739665448665619 - f1-score (micro avg) 0.9842
2023-04-05 13:26:32,646 Evaluating as a multi-label problem: False
2023-04-05 13:26:32,656 DEV : loss 0.20396985113620758 - f1-score (micro avg) 0.8972
2023-04-05 13:26:32,663 BAD EPOCHS (no improvement): 1
2023-04-05 13:26:32,664 ----------------------------------------------------------------------------------------------------
2023-04-05 13:26:35,733 epoch 38 - iter 7/71 - loss 0.05286205 - time (sec): 3.07 - samples/sec: 184.49 - lr: 0.006250
2023-04-05 13:26:39,585 epoch 38 - iter 14/71 - loss 0.07231140 - time (sec): 6.92 - samples/sec: 168.21 - lr: 0.006250
2023-04-05 13:26:43,417 epoch 38 - iter 21/71 - loss 0.07400087 - time (sec): 10.75 - samples/sec: 163.88 - lr: 0.006250
2023-04-05 13:26:47,164 epoch 38 - iter 28/71 - loss 0.08128836 - time (sec): 14.50 - samples/sec: 161.87 - lr: 0.006250
2023-04-05 13:26:50,941 epoch 38 - iter 35/71 - loss 0.08052234 - time (sec): 18.28 - samples/sec: 161.08 - lr: 0.006250
2023-04-05 13:26:54,710 epoch 38 - iter 42/71 - loss 0.08768679 - time (sec): 22.04 - samples/sec: 158.99 - lr: 0.006250
2023-04-05 13:26:58,468 epoch 38 - iter 49/71 - loss 0.08866679 - time (sec): 25.80 - samples/sec: 159.44 - lr: 0.006250
2023-04-05 13:27:02,174 epoch 38 - iter 56/71 - loss 0.09098401 - time (sec): 29.51 - samples/sec: 159.68 - lr: 0.006250
2023-04-05 13:27:06,037 epoch 38 - iter 63/71 - loss 0.09004818 - time (sec): 33.37 - samples/sec: 158.34 - lr: 0.006250
2023-04-05 13:27:09,739 epoch 38 - iter 70/71 - loss 0.09029994 - time (sec): 37.07 - samples/sec: 158.63 - lr: 0.006250
2023-04-05 13:27:10,609 ----------------------------------------------------------------------------------------------------
2023-04-05 13:27:10,610 EPOCH 38 done: loss 0.0905 - lr 0.006250
2023-04-05 13:27:41,464 Evaluating as a multi-label problem: False
2023-04-05 13:27:41,484 TRAIN : loss 0.03672816604375839 - f1-score (micro avg) 0.984
2023-04-05 13:27:45,153 Evaluating as a multi-label problem: False
2023-04-05 13:27:45,162 DEV : loss 0.1981133222579956 - f1-score (micro avg) 0.9026
2023-04-05 13:27:45,168 BAD EPOCHS (no improvement): 0
2023-04-05 13:27:45,170 saving best model
2023-04-05 13:27:46,871 ----------------------------------------------------------------------------------------------------
2023-04-05 13:27:50,071 epoch 39 - iter 7/71 - loss 0.09828755 - time (sec): 3.20 - samples/sec: 178.58 - lr: 0.006250
2023-04-05 13:27:53,862 epoch 39 - iter 14/71 - loss 0.08947759 - time (sec): 6.99 - samples/sec: 166.98 - lr: 0.006250
2023-04-05 13:27:57,588 epoch 39 - iter 21/71 - loss 0.09770654 - time (sec): 10.71 - samples/sec: 164.63 - lr: 0.006250
2023-04-05 13:28:01,143 epoch 39 - iter 28/71 - loss 0.09769007 - time (sec): 14.27 - samples/sec: 162.79 - lr: 0.006250
2023-04-05 13:28:04,815 epoch 39 - iter 35/71 - loss 0.09508572 - time (sec): 17.94 - samples/sec: 161.85 - lr: 0.006250
2023-04-05 13:28:08,491 epoch 39 - iter 42/71 - loss 0.09602471 - time (sec): 21.62 - samples/sec: 160.52 - lr: 0.006250
2023-04-05 13:28:12,205 epoch 39 - iter 49/71 - loss 0.09330580 - time (sec): 25.33 - samples/sec: 158.85 - lr: 0.006250
2023-04-05 13:28:15,838 epoch 39 - iter 56/71 - loss 0.09344274 - time (sec): 28.96 - samples/sec: 160.30 - lr: 0.006250
2023-04-05 13:28:19,582 epoch 39 - iter 63/71 - loss 0.09424575 - time (sec): 32.71 - samples/sec: 161.12 - lr: 0.006250
2023-04-05 13:28:23,266 epoch 39 - iter 70/71 - loss 0.09570928 - time (sec): 36.39 - samples/sec: 161.54 - lr: 0.006250
2023-04-05 13:28:24,136 ----------------------------------------------------------------------------------------------------
2023-04-05 13:28:24,137 EPOCH 39 done: loss 0.0955 - lr 0.006250
2023-04-05 13:28:55,230 Evaluating as a multi-label problem: False
2023-04-05 13:28:55,254 TRAIN : loss 0.03638274222612381 - f1-score (micro avg) 0.984
2023-04-05 13:28:59,307 Evaluating as a multi-label problem: False
2023-04-05 13:28:59,322 DEV : loss 0.20241594314575195 - f1-score (micro avg) 0.8935
2023-04-05 13:28:59,327 BAD EPOCHS (no improvement): 1
2023-04-05 13:28:59,329 ----------------------------------------------------------------------------------------------------
2023-04-05 13:29:02,529 epoch 40 - iter 7/71 - loss 0.10842314 - time (sec): 3.20 - samples/sec: 172.19 - lr: 0.006250
2023-04-05 13:29:06,624 epoch 40 - iter 14/71 - loss 0.10954390 - time (sec): 7.29 - samples/sec: 154.36 - lr: 0.006250
2023-04-05 13:29:10,512 epoch 40 - iter 21/71 - loss 0.11209631 - time (sec): 11.18 - samples/sec: 155.68 - lr: 0.006250
2023-04-05 13:29:14,381 epoch 40 - iter 28/71 - loss 0.10212289 - time (sec): 15.05 - samples/sec: 153.20 - lr: 0.006250
2023-04-05 13:29:18,226 epoch 40 - iter 35/71 - loss 0.09686748 - time (sec): 18.90 - samples/sec: 154.84 - lr: 0.006250
2023-04-05 13:29:22,011 epoch 40 - iter 42/71 - loss 0.09428059 - time (sec): 22.68 - samples/sec: 155.41 - lr: 0.006250
2023-04-05 13:29:25,738 epoch 40 - iter 49/71 - loss 0.09207549 - time (sec): 26.41 - samples/sec: 156.04 - lr: 0.006250
2023-04-05 13:29:29,548 epoch 40 - iter 56/71 - loss 0.09320325 - time (sec): 30.22 - samples/sec: 156.72 - lr: 0.006250
2023-04-05 13:29:33,362 epoch 40 - iter 63/71 - loss 0.09517989 - time (sec): 34.03 - samples/sec: 155.61 - lr: 0.006250
2023-04-05 13:29:37,104 epoch 40 - iter 70/71 - loss 0.09762175 - time (sec): 37.78 - samples/sec: 155.84 - lr: 0.006250
2023-04-05 13:29:37,944 ----------------------------------------------------------------------------------------------------
2023-04-05 13:29:37,944 EPOCH 40 done: loss 0.0976 - lr 0.006250
2023-04-05 13:30:09,218 Evaluating as a multi-label problem: False
2023-04-05 13:30:09,236 TRAIN : loss 0.03600259870290756 - f1-score (micro avg) 0.9838
2023-04-05 13:30:13,253 Evaluating as a multi-label problem: False
2023-04-05 13:30:13,263 DEV : loss 0.20378927886486053 - f1-score (micro avg) 0.8939
2023-04-05 13:30:13,269 BAD EPOCHS (no improvement): 2
2023-04-05 13:30:14,367 ----------------------------------------------------------------------------------------------------
2023-04-05 13:30:18,413 SequenceTagger predicts: Dictionary with 11 tags: O, S-PERSON, B-PERSON, E-PERSON, I-PERSON, S-ORG, B-ORG, E-ORG, I-ORG, <START>, <STOP>
2023-04-05 13:30:24,734 Evaluating as a multi-label problem: False
2023-04-05 13:30:24,746 0.9125 0.905 0.9087 0.8456
2023-04-05 13:30:24,747
Results:
- F-score (micro) 0.9087
- F-score (macro) 0.8766
- Accuracy 0.8456
By class:
precision recall f1-score support
PERSON 0.9391 0.9443 0.9417 359
ORG 0.8319 0.7920 0.8115 125
micro avg 0.9125 0.9050 0.9087 484
macro avg 0.8855 0.8681 0.8766 484
weighted avg 0.9114 0.9050 0.9080 484
2023-04-05 13:30:24,748 ----------------------------------------------------------------------------------------------------