Commit d6db93e (verified), committed by AbstractPhil · 1 parent: 6724fce

Update trainers/trainer_alignment_base.py

  1. trainers/trainer_alignment_base.py +123 -0
trainers/trainer_alignment_base.py CHANGED
@@ -8,6 +8,129 @@
  # 4. Train small standalone transformer from scratch
  # 5. No expert models needed at inference
  # ============================================================================
+ """
+
+ Conclusion: this trainer is invalid. It cannot conform the system with cross-entropy alone; it requires Procrustes whitening on every internalized assessment,
+ as each assessment causes misalignment from the spectral scope without the 5-point expert paradigm.
+ =================================================================
+ NLI HEAD TRAINING
+ =================================================================
+
+ Loading backbone...
+ config.json: 100%
+  938/938 [00:00<00:00, 298kB/s]
+ modeling_caption_bert.py: 
+  6.62k/? [00:00<00:00, 2.23MB/s]
+ A new version of the following files was downloaded from https://huggingface.co/AbstractPhil/geolip-captionbert-8192:
+ - modeling_caption_bert.py
+ . Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
+ model.safetensors: 100%
+  104M/104M [00:05<00:00, 34.3MB/s]
+ Loading weights: 100%
+  82/82 [00:00<00:00, 3678.27it/s, Materializing param=token_emb.weight]
+ tokenizer_config.json: 100%
+  322/322 [00:00<00:00, 108kB/s]
+ tokenizer.json: 
+  711k/? [00:00<00:00, 9.48MB/s]
+ Backbone: 25,958,016 params (frozen)
+
+ Loading SNLI...
+ README.md: 
+  16.0k/? [00:00<00:00, 4.95MB/s]
+ plain_text/test-00000-of-00001.parquet: 100%
+  412k/412k [00:00<00:00, 2.06MB/s]
+ plain_text/validation-00000-of-00001.par(…): 100%
+  413k/413k [00:00<00:00, 2.07MB/s]
+ plain_text/train-00000-of-00001.parquet: 100%
+  19.6M/19.6M [00:00<00:00, 98.2MB/s]
+ Generating test split: 100%
+  10000/10000 [00:00<00:00, 291773.61 examples/s]
+ Generating validation split: 100%
+  10000/10000 [00:00<00:00, 1825276.99 examples/s]
+ Generating train split: 100%
+  550152/550152 [00:00<00:00, 6170326.70 examples/s]
+ Filter: 100%
+  550152/550152 [00:00<00:00, 692748.73 examples/s]
+ Filter: 100%
+  10000/10000 [00:00<00:00, 514001.54 examples/s]
+ Train: 549,367 Val: 9,842
+
+ Pre-encoding with frozen backbone...
+ Encoding: 100%|██████████| 391/391 [00:33<00:00, 11.84it/s]
+ Encoding: 100%|██████████| 39/39 [00:03<00:00, 12.52it/s]
+
+ =================================================================
+ NLI HEAD
+ =================================================================
+ Parameters: 7,427,715
+ Epochs: 10
+ Batch size: 128
+ Batches/epoch: 781
+
+ =================================================================
+ TRAINING (10 epochs)
+ =================================================================
+ E 1: 16s loss=0.8299 t_acc=0.6237 v_loss=0.7563 v_acc=0.6675
+ E 2: 16s loss=0.6971 t_acc=0.7043 v_loss=0.6849 v_acc=0.7179
+ E 3: 16s loss=0.6380 t_acc=0.7357 v_loss=0.6430 v_acc=0.7349
+ E 4: 16s loss=0.5846 t_acc=0.7619 v_loss=0.6198 v_acc=0.7479
+ E 5: 16s loss=0.5287 t_acc=0.7876 v_loss=0.6282 v_acc=0.7460
+ E 6: 16s loss=0.4652 t_acc=0.8169 v_loss=0.6321 v_acc=0.7542
+ E 7: 16s loss=0.3938 t_acc=0.8488 v_loss=0.6682 v_acc=0.7533
+ E 8: 16s loss=0.3255 t_acc=0.8778 v_loss=0.7224 v_acc=0.7525
+ E 9: 16s loss=0.2754 t_acc=0.9001 v_loss=0.7758 v_acc=0.7489
+ E10: 16s loss=0.2503 t_acc=0.9110 v_loss=0.8039 v_acc=0.7491
+
+ =================================================================
+ COMPOSITIONAL ORDER TEST
+ =================================================================
+ Loading weights: 100%
+  82/82 [00:00<00:00, 3646.91it/s, Materializing param=token_emb.weight]
+
+ P: a potato on top of a table
+ H: a table on top of a potato
+ Pooled cos: 0.987 (order-blind)
+ NLI: entailment [E=0.838 N=0.052 C=0.110]
+
+ P: a potato on top of a table
+ H: there is a potato
+ Pooled cos: 0.502 (order-blind)
+ NLI: entailment [E=0.900 N=0.082 C=0.018]
+
+ P: a cat is sitting on a mat
+ H: a mat is sitting on a cat
+ Pooled cos: 0.993 (order-blind)
+ NLI: entailment [E=0.792 N=0.148 C=0.060]
+
+ P: a dog chased the cat
+ H: the cat chased the dog
+ Pooled cos: 0.977 (order-blind)
+ NLI: entailment [E=0.588 N=0.204 C=0.208]
+
+ P: a woman is holding a baby
+ H: a baby is holding a woman
+ Pooled cos: 0.996 (order-blind)
+ NLI: entailment [E=0.913 N=0.045 C=0.041]
+
+ P: the boy kicked the ball
+ H: the ball kicked the boy
+ Pooled cos: 0.986 (order-blind)
+ NLI: entailment [E=0.684 N=0.133 C=0.183]
+
+ P: a man is riding a horse
+ H: a horse is riding a man
+ Pooled cos: 0.995 (order-blind)
+ NLI: entailment [E=0.859 N=0.075 C=0.066]
+
+ Best val accuracy: 0.7542
+
+ =================================================================
+ DONE
+ =================================================================
+ """
+
+
+
 
  import math
  import os
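
Note on the training log in the docstring above: validation loss starts rising while training accuracy keeps climbing, which is the pattern the conclusion points at. A minimal sketch of how one might read the best epochs out of that log follows. The regular expression and the `parse_epochs` helper are hypothetical and are not part of `trainer_alignment_base.py`; the epoch lines are copied verbatim from the log.

```python
import re

# Epoch lines copied verbatim from the training log in the docstring.
LOG = """\
E 1: 16s loss=0.8299 t_acc=0.6237 v_loss=0.7563 v_acc=0.6675
E 2: 16s loss=0.6971 t_acc=0.7043 v_loss=0.6849 v_acc=0.7179
E 3: 16s loss=0.6380 t_acc=0.7357 v_loss=0.6430 v_acc=0.7349
E 4: 16s loss=0.5846 t_acc=0.7619 v_loss=0.6198 v_acc=0.7479
E 5: 16s loss=0.5287 t_acc=0.7876 v_loss=0.6282 v_acc=0.7460
E 6: 16s loss=0.4652 t_acc=0.8169 v_loss=0.6321 v_acc=0.7542
E 7: 16s loss=0.3938 t_acc=0.8488 v_loss=0.6682 v_acc=0.7533
E 8: 16s loss=0.3255 t_acc=0.8778 v_loss=0.7224 v_acc=0.7525
E 9: 16s loss=0.2754 t_acc=0.9001 v_loss=0.7758 v_acc=0.7489
E10: 16s loss=0.2503 t_acc=0.9110 v_loss=0.8039 v_acc=0.7491
"""

# Hypothetical pattern for this specific log format (an assumption, not trainer code).
EPOCH_RE = re.compile(
    r"E\s*(\d+):\s+\d+s\s+loss=([\d.]+)\s+t_acc=([\d.]+)"
    r"\s+v_loss=([\d.]+)\s+v_acc=([\d.]+)"
)


def parse_epochs(log):
    """Return one dict per epoch line found in the log text."""
    rows = []
    for m in EPOCH_RE.finditer(log):
        loss, t_acc, v_loss, v_acc = (float(x) for x in m.groups()[1:])
        rows.append({
            "epoch": int(m.group(1)),
            "loss": loss,
            "t_acc": t_acc,
            "v_loss": v_loss,
            "v_acc": v_acc,
        })
    return rows


rows = parse_epochs(LOG)
best_acc = max(rows, key=lambda r: r["v_acc"])    # epoch with highest val accuracy
best_loss = min(rows, key=lambda r: r["v_loss"])  # epoch with lowest val loss
final = rows[-1]
gap = final["t_acc"] - final["v_acc"]             # train/val accuracy gap at the end

print(f"best v_acc  : epoch {best_acc['epoch']} ({best_acc['v_acc']:.4f})")
print(f"lowest v_loss: epoch {best_loss['epoch']} ({best_loss['v_loss']:.4f})")
print(f"final train/val accuracy gap: {gap:.4f}")
```

On this log the lowest validation loss falls at epoch 4 and the best validation accuracy (0.7542, matching the reported figure) at epoch 6, after which the head only fits the training set more tightly.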