AbstractPhil committed
Commit f8f95e1 · verified · 1 parent: d6db93e

Update trainers/trainer_alignment_base.py

Files changed (1):
  1. trainers/trainer_alignment_base.py +0 -124
trainers/trainer_alignment_base.py CHANGED
@@ -8,130 +8,6 @@
  # 4. Train small standalone transformer from scratch
  # 5. No expert models needed at inference
  # ============================================================================
- """
-
- Conclusion: this trainer is invalid. It cannot conform the system with cross-entropy alone; it requires Procrustes whitening on every internalized assessment,
- as each assessment causes misalignment from the spectral scope without the 5-point expert paradigm.
- =================================================================
- NLI HEAD TRAINING
- =================================================================
-
- Loading backbone...
- config.json: 100%
-  938/938 [00:00<00:00, 298kB/s]
- modeling_caption_bert.py:
-  6.62k/? [00:00<00:00, 2.23MB/s]
- A new version of the following files was downloaded from https://huggingface.co/AbstractPhil/geolip-captionbert-8192:
- - modeling_caption_bert.py
- . Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
- model.safetensors: 100%
-  104M/104M [00:05<00:00, 34.3MB/s]
- Loading weights: 100%
-  82/82 [00:00<00:00, 3678.27it/s, Materializing param=token_emb.weight]
- tokenizer_config.json: 100%
-  322/322 [00:00<00:00, 108kB/s]
- tokenizer.json:
-  711k/? [00:00<00:00, 9.48MB/s]
- Backbone: 25,958,016 params (frozen)
-
- Loading SNLI...
- README.md:
-  16.0k/? [00:00<00:00, 4.95MB/s]
- plain_text/test-00000-of-00001.parquet: 100%
-  412k/412k [00:00<00:00, 2.06MB/s]
- plain_text/validation-00000-of-00001.par(…): 100%
-  413k/413k [00:00<00:00, 2.07MB/s]
- plain_text/train-00000-of-00001.parquet: 100%
-  19.6M/19.6M [00:00<00:00, 98.2MB/s]
- Generating test split: 100%
-  10000/10000 [00:00<00:00, 291773.61 examples/s]
- Generating validation split: 100%
-  10000/10000 [00:00<00:00, 1825276.99 examples/s]
- Generating train split: 100%
-  550152/550152 [00:00<00:00, 6170326.70 examples/s]
- Filter: 100%
-  550152/550152 [00:00<00:00, 692748.73 examples/s]
- Filter: 100%
-  10000/10000 [00:00<00:00, 514001.54 examples/s]
- Train: 549,367 Val: 9,842
-
- Pre-encoding with frozen backbone...
- Encoding: 100%|██████████| 391/391 [00:33<00:00, 11.84it/s]
- Encoding: 100%|██████████| 39/39 [00:03<00:00, 12.52it/s]
-
- =================================================================
- NLI HEAD
- =================================================================
- Parameters: 7,427,715
- Epochs: 10
- Batch size: 128
- Batches/epoch: 781
-
- =================================================================
- TRAINING (10 epochs)
- =================================================================
- E 1: 16s loss=0.8299 t_acc=0.6237 v_loss=0.7563 v_acc=0.6675
- E 2: 16s loss=0.6971 t_acc=0.7043 v_loss=0.6849 v_acc=0.7179
- E 3: 16s loss=0.6380 t_acc=0.7357 v_loss=0.6430 v_acc=0.7349
- E 4: 16s loss=0.5846 t_acc=0.7619 v_loss=0.6198 v_acc=0.7479
- E 5: 16s loss=0.5287 t_acc=0.7876 v_loss=0.6282 v_acc=0.7460
- E 6: 16s loss=0.4652 t_acc=0.8169 v_loss=0.6321 v_acc=0.7542
- E 7: 16s loss=0.3938 t_acc=0.8488 v_loss=0.6682 v_acc=0.7533
- E 8: 16s loss=0.3255 t_acc=0.8778 v_loss=0.7224 v_acc=0.7525
- E 9: 16s loss=0.2754 t_acc=0.9001 v_loss=0.7758 v_acc=0.7489
- E10: 16s loss=0.2503 t_acc=0.9110 v_loss=0.8039 v_acc=0.7491
-
- =================================================================
- COMPOSITIONAL ORDER TEST
- =================================================================
- Loading weights: 100%
-  82/82 [00:00<00:00, 3646.91it/s, Materializing param=token_emb.weight]
-
- P: a potato on top of a table
- H: a table on top of a potato
- Pooled cos: 0.987 (order-blind)
- NLI: entailment [E=0.838 N=0.052 C=0.110]
-
- P: a potato on top of a table
- H: there is a potato
- Pooled cos: 0.502 (order-blind)
- NLI: entailment [E=0.900 N=0.082 C=0.018]
-
- P: a cat is sitting on a mat
- H: a mat is sitting on a cat
- Pooled cos: 0.993 (order-blind)
- NLI: entailment [E=0.792 N=0.148 C=0.060]
-
- P: a dog chased the cat
- H: the cat chased the dog
- Pooled cos: 0.977 (order-blind)
- NLI: entailment [E=0.588 N=0.204 C=0.208]
-
- P: a woman is holding a baby
- H: a baby is holding a woman
- Pooled cos: 0.996 (order-blind)
- NLI: entailment [E=0.913 N=0.045 C=0.041]
-
- P: the boy kicked the ball
- H: the ball kicked the boy
- Pooled cos: 0.986 (order-blind)
- NLI: entailment [E=0.684 N=0.133 C=0.183]
-
- P: a man is riding a horse
- H: a horse is riding a man
- Pooled cos: 0.995 (order-blind)
- NLI: entailment [E=0.859 N=0.075 C=0.066]
-
- Best val accuracy: 0.7542
-
- =================================================================
- DONE
- =================================================================
- """
-
-
-
-
  import math
  import os
  import time
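The removed conclusion asserts that cross-entropy alone cannot align the system and that Procrustes whitening is required. As a point of reference for that claim, here is a minimal sketch of one common construction (an assumption on my part, not the repo's actual pipeline): whiten both embedding sets, then solve the orthogonal Procrustes problem to recover the rotation between them. All names and shapes are illustrative.

```python
import numpy as np

def whiten(X, eps=1e-6):
    """ZCA whitening: center X and map its covariance to (near) identity."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(Xc) - 1)
    vals, vecs = np.linalg.eigh(cov)
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return Xc @ W

def procrustes_rotation(A, B):
    """Orthogonal Procrustes: the rotation R minimizing ||A @ R - B||_F."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

rng = np.random.default_rng(0)
B = rng.normal(size=(200, 16))            # stand-in "expert" embeddings
R_true, _ = np.linalg.qr(rng.normal(size=(16, 16)))
A = B @ R_true.T                          # student space: a rotated copy of B
R = procrustes_rotation(whiten(A), whiten(B))
residual = np.linalg.norm(whiten(A) @ R - whiten(B))
print(f"alignment residual: {residual:.6f}")
```

Because the toy student space is an exact rotation of the expert space, the recovered rotation aligns the whitened sets to within floating-point error; with real embeddings the residual measures how much structure cross-entropy alone failed to conform.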
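The training section above fits the NLI head with plain cross-entropy over features pre-encoded by the frozen backbone. As a self-contained stand-in (synthetic features and a bare softmax-regression head rather than the actual 7.4M-parameter module; all names here are hypothetical), the loss and update look like:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, y):
    """Mean negative log-likelihood of the true class."""
    return float(-np.log(probs[np.arange(len(y)), y] + 1e-12).mean())

rng = np.random.default_rng(0)
X = rng.normal(size=(512, 32))            # stand-in for frozen-backbone features
W_true = rng.normal(size=(32, 3))
y = (X @ W_true).argmax(axis=1)           # 3 classes: E / N / C

W = np.zeros((32, 3))
for step in range(300):                   # full-batch gradient descent
    p = softmax(X @ W)
    grad = X.T @ (p - np.eye(3)[y]) / len(X)
    W -= 0.5 * grad

p = softmax(X @ W)
acc = float((p.argmax(axis=1) == y).mean())
print(f"train acc: {acc:.3f}  loss: {cross_entropy(p, y):.3f}")
```

The log's divergence between train accuracy (0.91) and validation accuracy (plateauing near 0.75 while validation loss rises after epoch 4) is classic overfitting of exactly this kind of head, which is why the best-epoch checkpoint (E6, v_acc=0.7542) is the one reported.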
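The "Pooled cos ≈ 0.99 (order-blind)" readings in the compositional test follow directly from mean pooling being permutation-invariant: two sentences built from the same token multiset pool to (nearly) the same vector regardless of word order. A minimal sketch with hypothetical non-contextual bag-of-words embeddings, where the effect is exact rather than approximate as with the real contextual backbone:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy embeddings (not the actual backbone's):
# one fixed vector per vocabulary word, as in a bag-of-words encoder.
vocab = {w: rng.normal(size=8) for w in
         ["a", "potato", "on", "top", "of", "table"]}

def mean_pool(tokens):
    """Mean-pool token embeddings -- the order-blind step."""
    return np.mean([vocab[t] for t in tokens], axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

premise    = "a potato on top of a table".split()
hypothesis = "a table on top of a potato".split()

# Same multiset of tokens => identical pooled vectors => cosine 1.0
# (up to float rounding), mirroring the 0.987-0.996 scores in the log.
print(round(cosine(mean_pool(premise), mean_pool(hypothesis)), 6))
```

This is why the pooled cosine cannot distinguish "a potato on top of a table" from "a table on top of a potato", and why the NLI head, trained only on such features, labels every reversed pair "entailment".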