AbstractPhil committed on
Commit ce06bae · verified · 1 Parent(s): f8f95e1

Update trainers/nil_head_trainer.py

Files changed (1)
  1. trainers/nil_head_trainer.py +122 -0
trainers/nil_head_trainer.py CHANGED
@@ -30,6 +30,128 @@
  # premise_mask, hyp_mask)
  # ============================================================================
 
+ """
+
+ Conclusion: this trainer is invalid. The system cannot be conformed with cross-entropy alone; it requires Procrustes whitening on every internalized assessment,
+ as each assessment causes misalignment from the spectral scope without the 5-point expert paradigm.
+ =================================================================
+ NLI HEAD TRAINING
+ =================================================================
+
+ Loading backbone...
+ config.json: 100% 938/938 [00:00<00:00, 298kB/s]
+ modeling_caption_bert.py: 6.62k/? [00:00<00:00, 2.23MB/s]
+ A new version of the following files was downloaded from https://huggingface.co/AbstractPhil/geolip-captionbert-8192:
+ - modeling_caption_bert.py
+ . Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
+ model.safetensors: 100% 104M/104M [00:05<00:00, 34.3MB/s]
+ Loading weights: 100% 82/82 [00:00<00:00, 3678.27it/s, Materializing param=token_emb.weight]
+ tokenizer_config.json: 100% 322/322 [00:00<00:00, 108kB/s]
+ tokenizer.json: 711k/? [00:00<00:00, 9.48MB/s]
+ Backbone: 25,958,016 params (frozen)
+
+ Loading SNLI...
+ README.md: 16.0k/? [00:00<00:00, 4.95MB/s]
+ plain_text/test-00000-of-00001.parquet: 100% 412k/412k [00:00<00:00, 2.06MB/s]
+ plain_text/validation-00000-of-00001.par(…): 100% 413k/413k [00:00<00:00, 2.07MB/s]
+ plain_text/train-00000-of-00001.parquet: 100% 19.6M/19.6M [00:00<00:00, 98.2MB/s]
+ Generating test split: 100% 10000/10000 [00:00<00:00, 291773.61 examples/s]
+ Generating validation split: 100% 10000/10000 [00:00<00:00, 1825276.99 examples/s]
+ Generating train split: 100% 550152/550152 [00:00<00:00, 6170326.70 examples/s]
+ Filter: 100% 550152/550152 [00:00<00:00, 692748.73 examples/s]
+ Filter: 100% 10000/10000 [00:00<00:00, 514001.54 examples/s]
+ Train: 549,367 Val: 9,842
+
+ Pre-encoding with frozen backbone...
+ Encoding: 100%|██████████| 391/391 [00:33<00:00, 11.84it/s]
+ Encoding: 100%|██████████| 39/39 [00:03<00:00, 12.52it/s]
+
+ =================================================================
+ NLI HEAD
+ =================================================================
+ Parameters: 7,427,715
+ Epochs: 10
+ Batch size: 128
+ Batches/epoch: 781
+
+ =================================================================
+ TRAINING (10 epochs)
+ =================================================================
+ E 1: 16s loss=0.8299 t_acc=0.6237 v_loss=0.7563 v_acc=0.6675
+ E 2: 16s loss=0.6971 t_acc=0.7043 v_loss=0.6849 v_acc=0.7179
+ E 3: 16s loss=0.6380 t_acc=0.7357 v_loss=0.6430 v_acc=0.7349
+ E 4: 16s loss=0.5846 t_acc=0.7619 v_loss=0.6198 v_acc=0.7479
+ E 5: 16s loss=0.5287 t_acc=0.7876 v_loss=0.6282 v_acc=0.7460
+ E 6: 16s loss=0.4652 t_acc=0.8169 v_loss=0.6321 v_acc=0.7542
+ E 7: 16s loss=0.3938 t_acc=0.8488 v_loss=0.6682 v_acc=0.7533
+ E 8: 16s loss=0.3255 t_acc=0.8778 v_loss=0.7224 v_acc=0.7525
+ E 9: 16s loss=0.2754 t_acc=0.9001 v_loss=0.7758 v_acc=0.7489
+ E10: 16s loss=0.2503 t_acc=0.9110 v_loss=0.8039 v_acc=0.7491
+
+ =================================================================
+ COMPOSITIONAL ORDER TEST
+ =================================================================
+ Loading weights: 100% 82/82 [00:00<00:00, 3646.91it/s, Materializing param=token_emb.weight]
+
+ P: a potato on top of a table
+ H: a table on top of a potato
+ Pooled cos: 0.987 (order-blind)
+ NLI: entailment [E=0.838 N=0.052 C=0.110]
+
+ P: a potato on top of a table
+ H: there is a potato
+ Pooled cos: 0.502 (order-blind)
+ NLI: entailment [E=0.900 N=0.082 C=0.018]
+
+ P: a cat is sitting on a mat
+ H: a mat is sitting on a cat
+ Pooled cos: 0.993 (order-blind)
+ NLI: entailment [E=0.792 N=0.148 C=0.060]
+
+ P: a dog chased the cat
+ H: the cat chased the dog
+ Pooled cos: 0.977 (order-blind)
+ NLI: entailment [E=0.588 N=0.204 C=0.208]
+
+ P: a woman is holding a baby
+ H: a baby is holding a woman
+ Pooled cos: 0.996 (order-blind)
+ NLI: entailment [E=0.913 N=0.045 C=0.041]
+
+ P: the boy kicked the ball
+ H: the ball kicked the boy
+ Pooled cos: 0.986 (order-blind)
+ NLI: entailment [E=0.684 N=0.133 C=0.183]
+
+ P: a man is riding a horse
+ H: a horse is riding a man
+ Pooled cos: 0.995 (order-blind)
+ NLI: entailment [E=0.859 N=0.075 C=0.066]
+
+ Best val accuracy: 0.7542
+
+ =================================================================
+ DONE
+ =================================================================
+ """
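The conclusion above claims that cross-entropy alone cannot align the system and that Procrustes whitening is needed on every internalized assessment. That code is not part of this commit, so the following is only a minimal sketch of the standard building blocks (ZCA whitening followed by an orthogonal Procrustes rotation); all function names and shapes here are hypothetical, not the repo's API.

```python
import torch

torch.manual_seed(0)

def zca_whiten(x, eps=1e-5):
    # Center the features, then decorrelate so the sample covariance is ~identity.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / (x.shape[0] - 1)
    eigvals, eigvecs = torch.linalg.eigh(cov)
    inv_sqrt = eigvecs @ torch.diag((eigvals + eps).rsqrt()) @ eigvecs.T
    return x @ inv_sqrt

def procrustes_rotation(source, target):
    # Orthogonal Procrustes: the rotation R minimizing ||source @ R - target||_F,
    # obtained from the SVD of the cross-covariance matrix.
    u, _, vt = torch.linalg.svd(source.T @ target)
    return u @ vt

# Toy usage: recover the rotation between two whitened "assessments".
a = torch.randn(256, 64)
b = a @ torch.linalg.qr(torch.randn(64, 64)).Q   # b is a rotated copy of a
R = procrustes_rotation(zca_whiten(a), zca_whiten(b))
aligned = zca_whiten(a) @ R                      # aligned ~ zca_whiten(b)
```

Whitening first makes the cross-covariance well-conditioned, so the SVD-based rotation is the exact polar factor rather than being dominated by a few high-variance directions.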
+
+
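The log above describes the setup this file implements: embeddings are pre-encoded once with the frozen backbone, and only a small classification head is trained with cross-entropy on 3-way SNLI labels. A minimal sketch of such a head follows; the dimensions and class layout are assumptions for illustration (the logged head has 7,427,715 parameters, so the real sizes differ).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class NLIHead(nn.Module):
    """Tiny NLI classifier over cached premise/hypothesis embeddings (hypothetical sizes)."""
    def __init__(self, dim=512, hidden=1024, classes=3):
        super().__init__()
        # Standard NLI interaction features: [p, h, |p - h|, p * h].
        self.mlp = nn.Sequential(
            nn.Linear(4 * dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, classes),
        )

    def forward(self, p, h):
        feats = torch.cat([p, h, (p - h).abs(), p * h], dim=-1)
        return self.mlp(feats)

head = NLIHead()
opt = torch.optim.AdamW(head.parameters(), lr=1e-3)

# Stand-ins for one batch of cached backbone encodings (batch size 128, as logged).
p, h = torch.randn(128, 512), torch.randn(128, 512)
labels = torch.randint(0, 3, (128,))  # entailment / neutral / contradiction

loss = nn.functional.cross_entropy(head(p, h), labels)
loss.backward()
opt.step()
```

Because the backbone is frozen, each example is encoded exactly once up front (the "Pre-encoding" step in the log), which is why each epoch over 549k pairs takes only ~16 s.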
  import torch
  import torch.nn as nn
  import torch.nn.functional as F
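The "Pooled cos ≈ 0.99 (order-blind)" lines in the log illustrate why the probe fails on swapped sentences: mean pooling discards token order, so two sentences with the same bag of words pool to (nearly) the same vector. A self-contained toy with random token embeddings (not the repo's backbone) makes this exact:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def mean_pool(token_embs):
    # Order-blind pooling: averaging over the token axis ignores position entirely.
    return token_embs.mean(dim=0)

# Hypothetical 64-d embedding table for the words in the test pair.
vocab = {w: torch.randn(64) for w in "a potato on top of table".split()}

def embed(sentence):
    return torch.stack([vocab[w] for w in sentence.split()])

p = mean_pool(embed("a potato on top of a table"))
h = mean_pool(embed("a table on top of a potato"))
cos = F.cosine_similarity(p, h, dim=0)
# Same multiset of words -> identical mean vector -> cos = 1.0 (up to float error).
```

The real backbone's pooled cosines (0.977–0.996) are slightly below 1.0 only because its token embeddings are mildly context-dependent; the pooling step itself is the order-blind bottleneck.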