Create nli_head_alignbanked_conv5d_76pct_output.txt
training_metrics/nli_head_alignbanked_conv5d_76pct_output.txt
ADDED
@@ -0,0 +1,279 @@
+=================================================================
+NLI HEAD: Compositional Convolution (conv5d)
+=================================================================
+Device: cuda
+
+=================================================================
+LOADING MODEL
+=================================================================
+Loading weights: 100%
+ 112/112 [00:00<00:00, 3881.72it/s, Materializing param=token_emb.weight]
+Model: 32,424,960 params (frozen)
+Bank: present
+
+=================================================================
+LOADING SNLI
+=================================================================
+Train: 549,367 Val: 9,842
+
+=================================================================
+PRE-ENCODING
+=================================================================
+Encoding: 100%|██████████| 537/537 [02:21<00:00, 3.80it/s]
+Encoding: 100%|██████████| 10/10 [00:02<00:00, 4.16it/s]
+Enriched: 896 (raw=768 + bank=128)
+Train: 549,000 Val: 9,800
+entailment: 183,293 (33.4%)
+neutral: 182,642 (33.3%)
+contradiction: 183,065 (33.3%)
+
+=================================================================
+COMPOSITIONAL CONV NLI HEAD
+=================================================================
+Compositions of 5: 16 paths
+(1, 1, 1, 1, 1)
+(1, 1, 1, 2)
+(1, 1, 2, 1)
+(1, 1, 3)
+(1, 2, 1, 1)
+(1, 2, 2)
+(1, 3, 1)
+(1, 4)
+(2, 1, 1, 1)
+(2, 1, 2)
+(2, 2, 1)
+(2, 3)
+(3, 1, 1)
+(3, 2)
+(4, 1)
+(5,)
+Head params: 1,808,657
+
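The 16 paths listed under "Compositions of 5" are the ordered compositions of the integer 5, of which there are 2^(5-1) = 16. The sketch below is not taken from the training script; `compositions` is a hypothetical helper name, and the recursion is just one straightforward way to enumerate the tuples in the same order the log prints them.

```python
from typing import Iterator, Tuple


def compositions(n: int) -> Iterator[Tuple[int, ...]]:
    """Yield every ordered tuple of positive integers that sums to n.

    Hypothetical helper for illustration; the training code may build
    its path list differently.
    """
    if n == 0:
        # The empty tuple is the only composition of zero.
        yield ()
        return
    # Choose the first part, then recurse on what is left.
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest


paths = list(compositions(5))
for p in paths:
    print(p)
print(f"{len(paths)} paths")
```

Iterating the first part from 1 upward gives lexicographic order, which is why the list starts at `(1, 1, 1, 1, 1)` and ends at `(5,)`.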
+=================================================================
+TRAINING (20 epochs, 4289 batches/epoch)
+=================================================================
+
+E 1: 29s
+Task: loss=0.7502 t_acc=0.6717 v_acc=0.7094 v_loss=0.6855
+Per-class: ent=0.833 neu=0.687 con=0.605
+Paths: (5,)=0.066 (4, 1)=0.065 (2, 1, 2)=0.064 spread=0.0066
+Protos: sim=-0.0341 temp=9.96
+New best: 0.7094
+
+E 2: 29s
+Task: loss=0.6658 t_acc=0.7171 v_acc=0.7282 v_loss=0.6359
+Per-class: ent=0.773 neu=0.663 con=0.747
+Paths: (5,)=0.074 (2, 1, 2)=0.068 (3, 2)=0.067 spread=0.0176
+Protos: sim=-0.0761 temp=9.95
+New best: 0.7282
+
+E 3: 29s
+Task: loss=0.6256 t_acc=0.7370 v_acc=0.7299 v_loss=0.6288
+Per-class: ent=0.740 neu=0.738 con=0.711
+Paths: (5,)=0.083 (2, 1, 2)=0.071 (3, 2)=0.070 spread=0.0314
+Protos: sim=-0.1379 temp=9.98
+New best: 0.7299
+
+E 4: 29s
+Task: loss=0.5921 t_acc=0.7536 v_acc=0.7401 v_loss=0.6132
+Per-class: ent=0.758 neu=0.676 con=0.785
+Paths: (5,)=0.092 (2, 1, 2)=0.073 (1, 4)=0.073 spread=0.0441
+Protos: sim=-0.2124 temp=10.04
+New best: 0.7401
+
+E 5: 29s
+Task: loss=0.5618 t_acc=0.7688 v_acc=0.7457 v_loss=0.6052
+Per-class: ent=0.797 neu=0.692 con=0.747
+Paths: (5,)=0.100 (1, 4)=0.083 (2, 1, 2)=0.074 spread=0.0552
+Protos: sim=-0.2886 temp=10.14
+New best: 0.7457
+
+E 6: 29s
+Task: loss=0.5313 t_acc=0.7834 v_acc=0.7540 v_loss=0.5993
+Per-class: ent=0.786 neu=0.721 con=0.754
+Paths: (5,)=0.106 (1, 4)=0.095 (2, 3)=0.075 spread=0.0640
+Protos: sim=-0.3571 temp=10.27
+New best: 0.7540
+
+E 7: 29s
+Task: loss=0.5010 t_acc=0.7977 v_acc=0.7603 v_loss=0.5865
+Per-class: ent=0.803 neu=0.721 con=0.755
+Paths: (5,)=0.109 (1, 4)=0.109 (2, 3)=0.079 spread=0.0708
+Protos: sim=-0.4113 temp=10.44
+New best: 0.7603
+
+E 8: 29s
+Task: loss=0.4705 t_acc=0.8131 v_acc=0.7563 v_loss=0.5946
+Per-class: ent=0.792 neu=0.695 con=0.781
+Paths: (1, 4)=0.122 (5,)=0.112 (2, 3)=0.082 spread=0.0864
+Protos: sim=-0.4490 temp=10.62
+
+E 9: 29s
+Task: loss=0.4413 t_acc=0.8273 v_acc=0.7600 v_loss=0.5955
+Per-class: ent=0.795 neu=0.719 con=0.765
+Paths: (1, 4)=0.135 (5,)=0.113 (2, 3)=0.085 spread=0.1014
+Protos: sim=-0.4716 temp=10.80
+
+E10: 29s
+Task: loss=0.4135 t_acc=0.8419 v_acc=0.7609 v_loss=0.5967
+Per-class: ent=0.780 neu=0.718 con=0.784
+Paths: (1, 4)=0.146 (5,)=0.113 (2, 3)=0.087 spread=0.1141
+Protos: sim=-0.4840 temp=10.98
+New best: 0.7609
+
+E11: 29s
+Task: loss=0.3878 t_acc=0.8552 v_acc=0.7602 v_loss=0.5929
+Per-class: ent=0.791 neu=0.730 con=0.759
+Paths: (1, 4)=0.155 (5,)=0.113 (2, 3)=0.089 spread=0.1241
+Protos: sim=-0.4904 temp=11.14
+
+E12: 29s
+Task: loss=0.3643 t_acc=0.8680 v_acc=0.7634 v_loss=0.5950
+Per-class: ent=0.795 neu=0.736 con=0.758
+Paths: (1, 4)=0.161 (5,)=0.113 (2, 3)=0.091 spread=0.1317
+Protos: sim=-0.4938 temp=11.28
+New best: 0.7634
+
+E13: 29s
+Task: loss=0.3438 t_acc=0.8795 v_acc=0.7597 v_loss=0.6002
+Per-class: ent=0.804 neu=0.722 con=0.752
+Paths: (1, 4)=0.166 (5,)=0.113 (2, 3)=0.092 spread=0.1373
+Protos: sim=-0.4958 temp=11.39
+
+E14: 29s
+Task: loss=0.3263 t_acc=0.8899 v_acc=0.7604 v_loss=0.6013
+Per-class: ent=0.797 neu=0.718 con=0.765
+Paths: (1, 4)=0.169 (5,)=0.113 (2, 3)=0.093 spread=0.1412
+Protos: sim=-0.4969 temp=11.48
+
+E15: 29s
+Task: loss=0.3110 t_acc=0.8987 v_acc=0.7587 v_loss=0.6004
+Per-class: ent=0.797 neu=0.720 con=0.758
+Paths: (1, 4)=0.171 (5,)=0.113 (2, 3)=0.093 spread=0.1439
+Protos: sim=-0.4976 temp=11.55
+
+E16: 29s
+Task: loss=0.2988 t_acc=0.9056 v_acc=0.7621 v_loss=0.6004
+Per-class: ent=0.794 neu=0.732 con=0.760
+Paths: (1, 4)=0.173 (5,)=0.113 (2, 3)=0.094 spread=0.1458
+Protos: sim=-0.4980 temp=11.60
+
+E17: 29s
+Task: loss=0.2892 t_acc=0.9115 v_acc=0.7615 v_loss=0.6010
+Per-class: ent=0.792 neu=0.732 con=0.759
+Paths: (1, 4)=0.174 (5,)=0.113 (2, 3)=0.094 spread=0.1469
+Protos: sim=-0.4982 temp=11.63
+
+E18: 29s
+Task: loss=0.2824 t_acc=0.9159 v_acc=0.7601 v_loss=0.6022
+Per-class: ent=0.792 neu=0.724 con=0.764
+Paths: (1, 4)=0.174 (5,)=0.113 (2, 3)=0.094 spread=0.1475
+Protos: sim=-0.4984 temp=11.64
+
+E19: 29s
+Task: loss=0.2778 t_acc=0.9186 v_acc=0.7611 v_loss=0.6022
+Per-class: ent=0.789 neu=0.732 con=0.762
+Paths: (1, 4)=0.174 (5,)=0.113 (2, 3)=0.094 spread=0.1478
+Protos: sim=-0.4984 temp=11.65
+
+E20: 29s
+Task: loss=0.2754 t_acc=0.9199 v_acc=0.7606 v_loss=0.6020
+Per-class: ent=0.790 neu=0.728 con=0.763
+Paths: (1, 4)=0.174 (5,)=0.113 (2, 3)=0.094 spread=0.1480
+Protos: sim=-0.4985 temp=11.66
+
+=================================================================
+PATH WEIGHT ANALYSIS
+=================================================================
+
+Path              Weight   Type
+--------------------------------------------------
+(1, 4)            0.1609   geo→rest        ████████████████
+(5,)              0.1133   holistic        ███████████
+(2, 3)            0.0905   geo+struct→...  █████████
+(3, 2)            0.0717   geo-first       ███████
+(2, 1, 2)         0.0677   geo+struct→...  ██████
+(1, 1, 3)         0.0669   geo→rest        ██████
+(1, 1, 1, 2)      0.0552   geo→rest        █████
+(4, 1)            0.0549   geo-first       █████
+(2, 2, 1)         0.0471   geo+struct→...  ████
+(3, 1, 1)         0.0468   geo-first       ████
+(1, 2, 2)         0.0434   geo→rest        ████
+(2, 1, 1, 1)      0.0422   geo+struct→...  ████
+(1, 3, 1)         0.0380   geo→rest        ███
+(1, 1, 2, 1)      0.0370   geo→rest        ███
+(1, 1, 1, 1, 1)   0.0351   independent     ███
+(1, 2, 1, 1)      0.0292   geo→rest        ██
+
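The log does not define `spread`, but the path-weight table gives enough to check one reading of it. The sketch below assumes the weights are a normalized distribution over the 16 paths and that `spread` means the largest weight minus the smallest; both are assumptions, not something the log states. The values are copied from the table.

```python
# Path weights copied from the PATH WEIGHT ANALYSIS table.
weights = {
    (1, 4): 0.1609, (5,): 0.1133, (2, 3): 0.0905, (3, 2): 0.0717,
    (2, 1, 2): 0.0677, (1, 1, 3): 0.0669, (1, 1, 1, 2): 0.0552,
    (4, 1): 0.0549, (2, 2, 1): 0.0471, (3, 1, 1): 0.0468,
    (1, 2, 2): 0.0434, (2, 1, 1, 1): 0.0422, (1, 3, 1): 0.0380,
    (1, 1, 2, 1): 0.0370, (1, 1, 1, 1, 1): 0.0351, (1, 2, 1, 1): 0.0292,
}

total = sum(weights.values())

# Assumed definition: spread = max weight - min weight.
spread = max(weights.values()) - min(weights.values())

# A uniform distribution over 16 paths would put 1/16 on each path.
uniform = 1.0 / len(weights)

print(f"sum={total:.4f} spread={spread:.4f} uniform={uniform:.4f}")
```

Under these assumptions the weights sum to 0.9999 (1.0 up to rounding in the printed values) and the spread is 0.1317. That matches `spread=0.1317` logged at E12, the epoch with the best validation accuracy (0.7634), which suggests the table was computed from the best checkpoint and not from the final epoch.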
+=================================================================
+COMPOSITIONAL ORDER TEST
+=================================================================
+A new version of the following files was downloaded from https://huggingface.co/AbstractPhil/geolip-captionbert-8192:
+- modeling_caption_bert.py
+. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
+Loading weights: 100%
+ 112/112 [00:00<00:00, 3890.69it/s, Materializing param=token_emb.weight]
+
+P: a potato on top of a table
+H: a table on top of a potato
+Pooled cos: 0.987
+NLI: entailment [E=0.507 N=0.056 C=0.437]
+
+P: a potato on top of a table
+H: there is a potato
+Pooled cos: 0.502
+NLI: entailment [E=0.842 N=0.093 C=0.065]
+
+P: a cat is sitting on a mat
+H: a mat is sitting on a cat
+Pooled cos: 0.993
+NLI: entailment [E=0.917 N=0.014 C=0.070]
+
+P: a dog chased the cat
+H: the cat chased the dog
+Pooled cos: 0.977
+NLI: entailment [E=0.427 N=0.201 C=0.372]
+
+P: a woman is holding a baby
+H: a baby is holding a woman
+Pooled cos: 0.996
+NLI: entailment [E=0.991 N=0.005 C=0.004]
+
+P: the boy kicked the ball
+H: the ball kicked the boy
+Pooled cos: 0.986
+NLI: contradiction [E=0.462 N=0.014 C=0.524]
+
+P: a man is riding a horse
+H: a horse is riding a man
+Pooled cos: 0.995
+NLI: entailment [E=0.862 N=0.004 C=0.134]
+
+P: a girl is painting a picture
+H: a girl is creating art
+Pooled cos: 0.796
+NLI: neutral [E=0.313 N=0.596 C=0.092]
+
+P: two dogs are playing in a park
+H: animals are outdoors
+Pooled cos: 0.676
+NLI: entailment [E=0.985 N=0.013 C=0.001]
+
+P: a person is swimming in the ocean
+H: nobody is in the water
+Pooled cos: 0.778
+NLI: contradiction [E=0.059 N=0.004 C=0.937]
+
+=================================================================
+SUMMARY
+=================================================================
+Best val accuracy: 0.7634
+Head params: 1,808,657
+Paths: 16
+Components: 5, d_path=256
+Bank present: True
+Saved: nli_conv5d_best.pt
+
+=================================================================
+DONE
+=================================================================