AbstractPhil committed on
Commit 71075fa · verified · 1 Parent(s): 802f97a

Create nli_head_alignbanked_conv5d_76pct_output.txt

training_metrics/nli_head_alignbanked_conv5d_76pct_output.txt ADDED
@@ -0,0 +1,279 @@
+ =================================================================
+ NLI HEAD: Compositional Convolution (conv5d)
+ =================================================================
+ Device: cuda
+
+ =================================================================
+ LOADING MODEL
+ =================================================================
+ Loading weights: 100%
+  112/112 [00:00<00:00, 3881.72it/s, Materializing param=token_emb.weight]
+ Model: 32,424,960 params (frozen)
+ Bank: present
+
+ =================================================================
+ LOADING SNLI
+ =================================================================
+ Train: 549,367 Val: 9,842
+
+ =================================================================
+ PRE-ENCODING
+ =================================================================
+ Encoding: 100%|██████████| 537/537 [02:21<00:00, 3.80it/s]
+ Encoding: 100%|██████████| 10/10 [00:02<00:00, 4.16it/s]
+ Enriched: 896 (raw=768 + bank=128)
+ Train: 549,000 Val: 9,800
+ entailment: 183,293 (33.4%)
+ neutral: 182,642 (33.3%)
+ contradiction: 183,065 (33.3%)
+
+ =================================================================
+ COMPOSITIONAL CONV NLI HEAD
+ =================================================================
+ Compositions of 5: 16 paths
+ (1, 1, 1, 1, 1)
+ (1, 1, 1, 2)
+ (1, 1, 2, 1)
+ (1, 1, 3)
+ (1, 2, 1, 1)
+ (1, 2, 2)
+ (1, 3, 1)
+ (1, 4)
+ (2, 1, 1, 1)
+ (2, 1, 2)
+ (2, 2, 1)
+ (2, 3)
+ (3, 1, 1)
+ (3, 2)
+ (4, 1)
+ (5,)
+ Head params: 1,808,657
+
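The 16 paths listed above are the compositions of 5, that is, every ordered tuple of positive integers summing to 5; an integer n has 2^(n-1) of them. As a minimal sketch (the function name `compositions` is hypothetical and is not taken from the training script), a short recursive generator reproduces the same tuples in the same order as the log:

```python
def compositions(n):
    """Yield every ordered tuple of positive integers that sums to n.

    Tuples come out in lexicographic order because the first part is
    tried from smallest to largest before recursing on the remainder.
    """
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest


paths = list(compositions(5))
print(f"Compositions of 5: {len(paths)} paths")
for path in paths:
    print(path)
```

How the head turns each tuple into a convolution path is not shown in this log, so the sketch covers only the enumeration.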
+ =================================================================
+ TRAINING (20 epochs, 4289 batches/epoch)
+ =================================================================
+
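The step counts in the log can be cross-checked with a little arithmetic. The batch sizes below are assumptions inferred from the reported numbers (1024 for pre-encoding, 128 for training with the last partial batch dropped); the log itself never states them. Under those assumptions the sketch reproduces the 537 and 10 encoding steps, the 4289 batches per epoch, and the class percentages.

```python
import math

# Sizes reported by the log.
train_raw, val_raw = 549_367, 9_842    # under LOADING SNLI
train_kept = 549_000                    # under PRE-ENCODING
class_counts = {
    "entailment": 183_293,
    "neutral": 182_642,
    "contradiction": 183_065,
}

# Assumed batch sizes (inferred, not stated anywhere in the log).
ENCODE_BATCH = 1024
TRAIN_BATCH = 128

# Pre-encoding keeps the final partial batch, hence ceil.
encode_steps_train = math.ceil(train_raw / ENCODE_BATCH)
encode_steps_val = math.ceil(val_raw / ENCODE_BATCH)

# Training appears to drop the final partial batch, hence floor division.
batches_per_epoch = train_kept // TRAIN_BATCH

# Class shares as percentages with one decimal, as printed in the log.
class_pct = {k: round(100 * v / train_kept, 1) for k, v in class_counts.items()}

print(encode_steps_train, encode_steps_val, batches_per_epoch)
print(sum(class_counts.values()), class_pct)
```

The three class counts sum to exactly 549,000, so the per-class figures describe the retained training set rather than the raw 549,367 examples.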
+ E 1: 29s
+ Task: loss=0.7502 t_acc=0.6717 v_acc=0.7094 v_loss=0.6855
+ Per-class: ent=0.833 neu=0.687 con=0.605
+ Paths: (5,)=0.066 (4, 1)=0.065 (2, 1, 2)=0.064 spread=0.0066
+ Protos: sim=-0.0341 temp=9.96
+ ★ New best: 0.7094
+
+ E 2: 29s
+ Task: loss=0.6658 t_acc=0.7171 v_acc=0.7282 v_loss=0.6359
+ Per-class: ent=0.773 neu=0.663 con=0.747
+ Paths: (5,)=0.074 (2, 1, 2)=0.068 (3, 2)=0.067 spread=0.0176
+ Protos: sim=-0.0761 temp=9.95
+ ★ New best: 0.7282
+
+ E 3: 29s
+ Task: loss=0.6256 t_acc=0.7370 v_acc=0.7299 v_loss=0.6288
+ Per-class: ent=0.740 neu=0.738 con=0.711
+ Paths: (5,)=0.083 (2, 1, 2)=0.071 (3, 2)=0.070 spread=0.0314
+ Protos: sim=-0.1379 temp=9.98
+ ★ New best: 0.7299
+
+ E 4: 29s
+ Task: loss=0.5921 t_acc=0.7536 v_acc=0.7401 v_loss=0.6132
+ Per-class: ent=0.758 neu=0.676 con=0.785
+ Paths: (5,)=0.092 (2, 1, 2)=0.073 (1, 4)=0.073 spread=0.0441
+ Protos: sim=-0.2124 temp=10.04
+ ★ New best: 0.7401
+
+ E 5: 29s
+ Task: loss=0.5618 t_acc=0.7688 v_acc=0.7457 v_loss=0.6052
+ Per-class: ent=0.797 neu=0.692 con=0.747
+ Paths: (5,)=0.100 (1, 4)=0.083 (2, 1, 2)=0.074 spread=0.0552
+ Protos: sim=-0.2886 temp=10.14
+ ★ New best: 0.7457
+
+ E 6: 29s
+ Task: loss=0.5313 t_acc=0.7834 v_acc=0.7540 v_loss=0.5993
+ Per-class: ent=0.786 neu=0.721 con=0.754
+ Paths: (5,)=0.106 (1, 4)=0.095 (2, 3)=0.075 spread=0.0640
+ Protos: sim=-0.3571 temp=10.27
+ ★ New best: 0.7540
+
+ E 7: 29s
+ Task: loss=0.5010 t_acc=0.7977 v_acc=0.7603 v_loss=0.5865
+ Per-class: ent=0.803 neu=0.721 con=0.755
+ Paths: (5,)=0.109 (1, 4)=0.109 (2, 3)=0.079 spread=0.0708
+ Protos: sim=-0.4113 temp=10.44
+ ★ New best: 0.7603
+
+ E 8: 29s
+ Task: loss=0.4705 t_acc=0.8131 v_acc=0.7563 v_loss=0.5946
+ Per-class: ent=0.792 neu=0.695 con=0.781
+ Paths: (1, 4)=0.122 (5,)=0.112 (2, 3)=0.082 spread=0.0864
+ Protos: sim=-0.4490 temp=10.62
+
+ E 9: 29s
+ Task: loss=0.4413 t_acc=0.8273 v_acc=0.7600 v_loss=0.5955
+ Per-class: ent=0.795 neu=0.719 con=0.765
+ Paths: (1, 4)=0.135 (5,)=0.113 (2, 3)=0.085 spread=0.1014
+ Protos: sim=-0.4716 temp=10.80
+
+ E10: 29s
+ Task: loss=0.4135 t_acc=0.8419 v_acc=0.7609 v_loss=0.5967
+ Per-class: ent=0.780 neu=0.718 con=0.784
+ Paths: (1, 4)=0.146 (5,)=0.113 (2, 3)=0.087 spread=0.1141
+ Protos: sim=-0.4840 temp=10.98
+ ★ New best: 0.7609
+
+ E11: 29s
+ Task: loss=0.3878 t_acc=0.8552 v_acc=0.7602 v_loss=0.5929
+ Per-class: ent=0.791 neu=0.730 con=0.759
+ Paths: (1, 4)=0.155 (5,)=0.113 (2, 3)=0.089 spread=0.1241
+ Protos: sim=-0.4904 temp=11.14
+
+ E12: 29s
+ Task: loss=0.3643 t_acc=0.8680 v_acc=0.7634 v_loss=0.5950
+ Per-class: ent=0.795 neu=0.736 con=0.758
+ Paths: (1, 4)=0.161 (5,)=0.113 (2, 3)=0.091 spread=0.1317
+ Protos: sim=-0.4938 temp=11.28
+ ★ New best: 0.7634
+
+ E13: 29s
+ Task: loss=0.3438 t_acc=0.8795 v_acc=0.7597 v_loss=0.6002
+ Per-class: ent=0.804 neu=0.722 con=0.752
+ Paths: (1, 4)=0.166 (5,)=0.113 (2, 3)=0.092 spread=0.1373
+ Protos: sim=-0.4958 temp=11.39
+
+ E14: 29s
+ Task: loss=0.3263 t_acc=0.8899 v_acc=0.7604 v_loss=0.6013
+ Per-class: ent=0.797 neu=0.718 con=0.765
+ Paths: (1, 4)=0.169 (5,)=0.113 (2, 3)=0.093 spread=0.1412
+ Protos: sim=-0.4969 temp=11.48
+
+ E15: 29s
+ Task: loss=0.3110 t_acc=0.8987 v_acc=0.7587 v_loss=0.6004
+ Per-class: ent=0.797 neu=0.720 con=0.758
+ Paths: (1, 4)=0.171 (5,)=0.113 (2, 3)=0.093 spread=0.1439
+ Protos: sim=-0.4976 temp=11.55
+
+ E16: 29s
+ Task: loss=0.2988 t_acc=0.9056 v_acc=0.7621 v_loss=0.6004
+ Per-class: ent=0.794 neu=0.732 con=0.760
+ Paths: (1, 4)=0.173 (5,)=0.113 (2, 3)=0.094 spread=0.1458
+ Protos: sim=-0.4980 temp=11.60
+
+ E17: 29s
+ Task: loss=0.2892 t_acc=0.9115 v_acc=0.7615 v_loss=0.6010
+ Per-class: ent=0.792 neu=0.732 con=0.759
+ Paths: (1, 4)=0.174 (5,)=0.113 (2, 3)=0.094 spread=0.1469
+ Protos: sim=-0.4982 temp=11.63
+
+ E18: 29s
+ Task: loss=0.2824 t_acc=0.9159 v_acc=0.7601 v_loss=0.6022
+ Per-class: ent=0.792 neu=0.724 con=0.764
+ Paths: (1, 4)=0.174 (5,)=0.113 (2, 3)=0.094 spread=0.1475
+ Protos: sim=-0.4984 temp=11.64
+
+ E19: 29s
+ Task: loss=0.2778 t_acc=0.9186 v_acc=0.7611 v_loss=0.6022
+ Per-class: ent=0.789 neu=0.732 con=0.762
+ Paths: (1, 4)=0.174 (5,)=0.113 (2, 3)=0.094 spread=0.1478
+ Protos: sim=-0.4984 temp=11.65
+
+ E20: 29s
+ Task: loss=0.2754 t_acc=0.9199 v_acc=0.7606 v_loss=0.6020
+ Per-class: ent=0.790 neu=0.728 con=0.763
+ Paths: (1, 4)=0.174 (5,)=0.113 (2, 3)=0.094 spread=0.1480
+ Protos: sim=-0.4985 temp=11.66
+
+ =================================================================
+ PATH WEIGHT ANALYSIS
+ =================================================================
+
+ Path               Weight   Type              Bar
+ --------------------------------------------------
+ (1, 4)             0.1609   geo→rest          ████████████████
+ (5,)               0.1133   holistic          ███████████
+ (2, 3)             0.0905   geo+struct→...    █████████
+ (3, 2)             0.0717   geo-first         ███████
+ (2, 1, 2)          0.0677   geo+struct→...    ██████
+ (1, 1, 3)          0.0669   geo→rest          ██████
+ (1, 1, 1, 2)       0.0552   geo→rest          █████
+ (4, 1)             0.0549   geo-first         █████
+ (2, 2, 1)          0.0471   geo+struct→...    ████
+ (3, 1, 1)          0.0468   geo-first         ████
+ (1, 2, 2)          0.0434   geo→rest          ████
+ (2, 1, 1, 1)       0.0422   geo+struct→...    ████
+ (1, 3, 1)          0.0380   geo→rest          ███
+ (1, 1, 2, 1)       0.0370   geo→rest          ███
+ (1, 1, 1, 1, 1)    0.0351   independent       ███
+ (1, 2, 1, 1)       0.0292   geo→rest          ██
+
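A few properties of the path-weight table can be checked directly from the printed values. The sketch below is an after-the-fact reading of the log, not the script's own code: it assumes that `spread` means the largest weight minus the smallest, and that each bar has `int(weight * 100)` blocks. Both assumptions are consistent with the numbers shown, and the weights sum to roughly 1, as a softmax over paths would.

```python
# Path weights copied from the PATH WEIGHT ANALYSIS table.
weights = {
    (1, 4): 0.1609, (5,): 0.1133, (2, 3): 0.0905, (3, 2): 0.0717,
    (2, 1, 2): 0.0677, (1, 1, 3): 0.0669, (1, 1, 1, 2): 0.0552,
    (4, 1): 0.0549, (2, 2, 1): 0.0471, (3, 1, 1): 0.0468,
    (1, 2, 2): 0.0434, (2, 1, 1, 1): 0.0422, (1, 3, 1): 0.0380,
    (1, 1, 2, 1): 0.0370, (1, 1, 1, 1, 1): 0.0351, (1, 2, 1, 1): 0.0292,
}

total = sum(weights.values())

# Assumed definition of "spread": max weight minus min weight.
spread = max(weights.values()) - min(weights.values())

# Assumed bar rule: one block per whole percentage point of weight.
bar_lengths = {path: int(w * 100) for path, w in weights.items()}

uniform = 1 / len(weights)  # weight each path would get with no preference

print(f"sum={total:.4f} spread={spread:.4f} uniform={uniform:.4f}")
for path, w in weights.items():
    print(f"{str(path):<18}{w:.4f}  {'█' * bar_lengths[path]}")
```

The computed spread of 0.1317 equals the `spread` value logged at E12, the epoch with the best validation accuracy, which suggests the table was produced from the best checkpoint rather than from the final epoch (where spread is 0.1480).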
+ =================================================================
+ COMPOSITIONAL ORDER TEST
+ =================================================================
+ A new version of the following files was downloaded from https://huggingface.co/AbstractPhil/geolip-captionbert-8192:
+ - modeling_caption_bert.py
+ . Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
+ Loading weights: 100%
+  112/112 [00:00<00:00, 3890.69it/s, Materializing param=token_emb.weight]
+
+ P: a potato on top of a table
+ H: a table on top of a potato
+ Pooled cos: 0.987
+ NLI: entailment [E=0.507 N=0.056 C=0.437]
+
+ P: a potato on top of a table
+ H: there is a potato
+ Pooled cos: 0.502
+ NLI: entailment [E=0.842 N=0.093 C=0.065]
+
+ P: a cat is sitting on a mat
+ H: a mat is sitting on a cat
+ Pooled cos: 0.993
+ NLI: entailment [E=0.917 N=0.014 C=0.070]
+
+ P: a dog chased the cat
+ H: the cat chased the dog
+ Pooled cos: 0.977
+ NLI: entailment [E=0.427 N=0.201 C=0.372]
+
+ P: a woman is holding a baby
+ H: a baby is holding a woman
+ Pooled cos: 0.996
+ NLI: entailment [E=0.991 N=0.005 C=0.004]
+
+ P: the boy kicked the ball
+ H: the ball kicked the boy
+ Pooled cos: 0.986
+ NLI: contradiction [E=0.462 N=0.014 C=0.524]
+
+ P: a man is riding a horse
+ H: a horse is riding a man
+ Pooled cos: 0.995
+ NLI: entailment [E=0.862 N=0.004 C=0.134]
+
+ P: a girl is painting a picture
+ H: a girl is creating art
+ Pooled cos: 0.796
+ NLI: neutral [E=0.313 N=0.596 C=0.092]
+
+ P: two dogs are playing in a park
+ H: animals are outdoors
+ Pooled cos: 0.676
+ NLI: entailment [E=0.985 N=0.013 C=0.001]
+
+ P: a person is swimming in the ocean
+ H: nobody is in the water
+ Pooled cos: 0.778
+ NLI: contradiction [E=0.059 N=0.004 C=0.937]
+
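The order-test results above can be tallied with a short script. This is only a reading aid over the printed numbers: the `swapped` flag marking which pairs exchange subject and object is my own labeling, not something the log reports. Under that labeling, five of the six role-swapped hypotheses are predicted as entailment, and every swapped pair has a pooled cosine of at least 0.977.

```python
LABELS = ("entailment", "neutral", "contradiction")

# (hypothesis, pooled_cos, (E, N, C), swapped) copied from the log;
# `swapped` is a hand-assigned flag, not part of the original output.
results = [
    ("a table on top of a potato", 0.987, (0.507, 0.056, 0.437), True),
    ("there is a potato",          0.502, (0.842, 0.093, 0.065), False),
    ("a mat is sitting on a cat",  0.993, (0.917, 0.014, 0.070), True),
    ("the cat chased the dog",     0.977, (0.427, 0.201, 0.372), True),
    ("a baby is holding a woman",  0.996, (0.991, 0.005, 0.004), True),
    ("the ball kicked the boy",    0.986, (0.462, 0.014, 0.524), True),
    ("a horse is riding a man",    0.995, (0.862, 0.004, 0.134), True),
    ("a girl is creating art",     0.796, (0.313, 0.596, 0.092), False),
    ("animals are outdoors",       0.676, (0.985, 0.013, 0.001), False),
    ("nobody is in the water",     0.778, (0.059, 0.004, 0.937), False),
]


def predicted(probs):
    """Return the label with the highest probability."""
    return LABELS[max(range(len(LABELS)), key=lambda i: probs[i])]


swapped = [r for r in results if r[3]]
swapped_entailed = sum(predicted(r[2]) == "entailment" for r in swapped)
min_swapped_cos = min(r[1] for r in swapped)

print(f"swapped pairs predicted as entailment: {swapped_entailed}/{len(swapped)}")
print(f"lowest pooled cosine among swapped pairs: {min_swapped_cos}")
```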
+ =================================================================
+ SUMMARY
+ =================================================================
+ Best val accuracy: 0.7634
+ Head params: 1,808,657
+ Paths: 16
+ Components: 5 → d_path=256
+ Bank present: True
+ Saved: nli_conv5d_best.pt
+
+ =================================================================
+ DONE
+ =================================================================