Update trainers/trainer_alignment_base.py

trainers/trainer_alignment_base.py (CHANGED)
@@ -8,130 +8,6 @@
 # 4. Train small standalone transformer from scratch
 # 5. No expert models needed at inference
 # ============================================================================
-"""
-
-Conclusion: this trainer is invalid. It cannot align the system with
-cross-entropy alone; it requires Procrustes whitening on every internalized
-assessment, as each assessment causes misalignment from the spectral scope
-without the 5-point expert paradigm.
-=================================================================
-NLI HEAD TRAINING
-=================================================================
-
-Loading backbone...
-config.json: 100% 938/938 [00:00<00:00, 298kB/s]
-modeling_caption_bert.py: 6.62k/? [00:00<00:00, 2.23MB/s]
-A new version of the following files was downloaded from
-https://huggingface.co/AbstractPhil/geolip-captionbert-8192:
-- modeling_caption_bert.py
-Make sure to double-check they do not contain any added malicious code. To
-avoid downloading new versions of the code file, you can pin a revision.
-model.safetensors: 100% 104M/104M [00:05<00:00, 34.3MB/s]
-Loading weights: 100% 82/82 [00:00<00:00, 3678.27it/s, Materializing param=token_emb.weight]
-tokenizer_config.json: 100% 322/322 [00:00<00:00, 108kB/s]
-tokenizer.json: 711k/? [00:00<00:00, 9.48MB/s]
-Backbone: 25,958,016 params (frozen)
-
-Loading SNLI...
-README.md: 16.0k/? [00:00<00:00, 4.95MB/s]
-plain_text/test-00000-of-00001.parquet: 100% 412k/412k [00:00<00:00, 2.06MB/s]
-plain_text/validation-00000-of-00001.par(…): 100% 413k/413k [00:00<00:00, 2.07MB/s]
-plain_text/train-00000-of-00001.parquet: 100% 19.6M/19.6M [00:00<00:00, 98.2MB/s]
-Generating test split: 100% 10000/10000 [00:00<00:00, 291773.61 examples/s]
-Generating validation split: 100% 10000/10000 [00:00<00:00, 1825276.99 examples/s]
-Generating train split: 100% 550152/550152 [00:00<00:00, 6170326.70 examples/s]
-Filter: 100% 550152/550152 [00:00<00:00, 692748.73 examples/s]
-Filter: 100% 10000/10000 [00:00<00:00, 514001.54 examples/s]
-Train: 549,367  Val: 9,842
-
-Pre-encoding with frozen backbone...
-Encoding: 100%|██████████| 391/391 [00:33<00:00, 11.84it/s]
-Encoding: 100%|██████████| 39/39 [00:03<00:00, 12.52it/s]
-
-=================================================================
-NLI HEAD
-=================================================================
-Parameters: 7,427,715
-Epochs: 10
-Batch size: 128
-Batches/epoch: 781
-
-=================================================================
-TRAINING (10 epochs)
-=================================================================
-E 1: 16s loss=0.8299 t_acc=0.6237 v_loss=0.7563 v_acc=0.6675
-E 2: 16s loss=0.6971 t_acc=0.7043 v_loss=0.6849 v_acc=0.7179
-E 3: 16s loss=0.6380 t_acc=0.7357 v_loss=0.6430 v_acc=0.7349
-E 4: 16s loss=0.5846 t_acc=0.7619 v_loss=0.6198 v_acc=0.7479
-E 5: 16s loss=0.5287 t_acc=0.7876 v_loss=0.6282 v_acc=0.7460
-E 6: 16s loss=0.4652 t_acc=0.8169 v_loss=0.6321 v_acc=0.7542
-E 7: 16s loss=0.3938 t_acc=0.8488 v_loss=0.6682 v_acc=0.7533
-E 8: 16s loss=0.3255 t_acc=0.8778 v_loss=0.7224 v_acc=0.7525
-E 9: 16s loss=0.2754 t_acc=0.9001 v_loss=0.7758 v_acc=0.7489
-E10: 16s loss=0.2503 t_acc=0.9110 v_loss=0.8039 v_acc=0.7491
-
-=================================================================
-COMPOSITIONAL ORDER TEST
-=================================================================
-Loading weights: 100% 82/82 [00:00<00:00, 3646.91it/s, Materializing param=token_emb.weight]
-
-P: a potato on top of a table
-H: a table on top of a potato
-Pooled cos: 0.987 (order-blind)
-NLI: entailment [E=0.838 N=0.052 C=0.110]
-
-P: a potato on top of a table
-H: there is a potato
-Pooled cos: 0.502 (order-blind)
-NLI: entailment [E=0.900 N=0.082 C=0.018]
-
-P: a cat is sitting on a mat
-H: a mat is sitting on a cat
-Pooled cos: 0.993 (order-blind)
-NLI: entailment [E=0.792 N=0.148 C=0.060]
-
-P: a dog chased the cat
-H: the cat chased the dog
-Pooled cos: 0.977 (order-blind)
-NLI: entailment [E=0.588 N=0.204 C=0.208]
-
-P: a woman is holding a baby
-H: a baby is holding a woman
-Pooled cos: 0.996 (order-blind)
-NLI: entailment [E=0.913 N=0.045 C=0.041]
-
-P: the boy kicked the ball
-H: the ball kicked the boy
-Pooled cos: 0.986 (order-blind)
-NLI: entailment [E=0.684 N=0.133 C=0.183]
-
-P: a man is riding a horse
-H: a horse is riding a man
-Pooled cos: 0.995 (order-blind)
-NLI: entailment [E=0.859 N=0.075 C=0.066]
-
-Best val accuracy: 0.7542
-
-=================================================================
-DONE
-=================================================================
-"""
-
-
-
-
 import math
 import os
 import time
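The removed conclusion leans on "Procrustes whitening" as the missing alignment step. As background, here is a minimal sketch of the orthogonal Procrustes solve; `procrustes_align` and its centering-only preprocessing are illustrative, not this repo's implementation (which per the note would additionally whiten each assessment):

```python
import numpy as np

def procrustes_align(X, Y):
    """Orthogonal Procrustes: find the orthogonal R minimizing ||X @ R - Y||_F.

    X, Y: (n, d) paired embedding matrices (e.g. student vs. expert spaces).
    Full "Procrustes whitening" would also decorrelate each set; for brevity
    this sketch only centers them.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Classic SVD solution to the orthogonal Procrustes problem.
    U, _, Vt = np.linalg.svd(Xc.T @ Yc)
    return U @ Vt  # orthogonal rotation carrying X-space toward Y-space
```

Applied to pooled embeddings, `X @ R` rotates one space onto the other without distorting its internal pairwise geometry, which is why it is a natural complement to a plain cross-entropy objective.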
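The log's pipeline (encode SNLI once with the frozen backbone, then train a small head on the cached embeddings) can be sketched as below. This is a hedged reconstruction: the real head has 7,427,715 parameters and an architecture not shown here, and `NLIHead`, the hidden size, and the InferSent-style `[u, v, |u-v|, u*v]` feature layout are assumptions.

```python
import torch
import torch.nn as nn

class NLIHead(nn.Module):
    """3-way NLI classifier over frozen premise/hypothesis embeddings."""

    def __init__(self, dim: int, hidden: int = 512, n_classes: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4 * dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        # InferSent-style interaction features over the two pooled vectors.
        feats = torch.cat([u, v, (u - v).abs(), u * v], dim=-1)
        return self.net(feats)

@torch.no_grad()
def pre_encode(backbone, pairs):
    """Run the frozen backbone once; training then touches only cached tensors."""
    backbone.eval()
    return [(backbone(p), backbone(h), y) for p, h, y in pairs]
```

Pre-encoding is what makes the 16-second epochs in the log possible: the 26M-parameter backbone runs exactly once per example instead of once per epoch.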
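The compositional order test reports pooled cosines of 0.98-1.00 for argument-swapped pairs. That is inherent to any order-invariant pooling: two sentences that are permutations of the same tokens pool to the same vector. A toy demonstration with a hypothetical random-vector vocabulary:

```python
import numpy as np

def mean_pool(tokens, table):
    """Mean of per-token vectors; invariant to token order by construction."""
    return np.mean([table[t] for t in tokens], axis=0)

rng = np.random.default_rng(0)
vocab = {w: rng.normal(size=16) for w in "a potato on top of table".split()}

p = mean_pool("a potato on top of a table".split(), vocab)
h = mean_pool("a table on top of a potato".split(), vocab)
cos = float(p @ h / (np.linalg.norm(p) * np.linalg.norm(h)))
# The two sentences are permutations of each other, so cos is 1.0
# up to floating-point rounding.
```

An NLI head that sits on top of such pooled vectors inherits the same blindness, which is consistent with the log's uniform "entailment" verdicts on reversed pairs.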
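In the epoch table, train accuracy climbs to 0.9110 while validation loss bottoms out around epoch 4 and validation accuracy peaks at epoch 6; the "Best val accuracy: 0.7542" line corresponds to that epoch-6 peak. A minimal sketch of that bookkeeping (`best_epoch` is a hypothetical helper, not from this file):

```python
def best_epoch(val_accs):
    """Return (1-based epoch, accuracy) of the best validation accuracy."""
    i = max(range(len(val_accs)), key=val_accs.__getitem__)
    return i + 1, val_accs[i]

# Validation accuracies from the removed log, epochs 1-10:
VAL_ACCS = [0.6675, 0.7179, 0.7349, 0.7479, 0.7460,
            0.7542, 0.7533, 0.7525, 0.7489, 0.7491]
```

`best_epoch(VAL_ACCS)` returns `(6, 0.7542)`, matching the log; checkpointing at that point rather than at epoch 10 is the usual guard against the overfitting the table shows.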