training_metrics/bank_training_try1_500k_output.txt

=================================================================
ALIGNMENT BANK: 5-Expert CaptionBERT-8192
=================================================================
Device: cuda
Experts: 5
Anchors: 512
Bank dim: 128

=================================================================
PHASE 0: EXPERT EMBEDDINGS
=================================================================
Loading 500,000 captions...
Got 500,000 captions

Extracting: bert (google-bert/bert-base-uncased, max_len=512)...
Loading weights: 100% 199/199 [00:00<00:00, 4325.21it/s, Materializing param=pooler.dense.weight]
BertModel LOAD REPORT from: google-bert/bert-base-uncased
Key                                        | Status
-------------------------------------------+-----------
cls.predictions.transform.dense.weight     | UNEXPECTED
cls.predictions.transform.LayerNorm.weight | UNEXPECTED
cls.seq_relationship.weight                | UNEXPECTED
cls.seq_relationship.bias                  | UNEXPECTED
cls.predictions.transform.dense.bias       | UNEXPECTED
cls.predictions.transform.LayerNorm.bias   | UNEXPECTED
cls.predictions.bias                       | UNEXPECTED

Notes:
- UNEXPECTED: can be ignored when loading from different task/architecture; not ok if you expect identical arch.
109,482,240 params
bert: 100%|██████████| 3907/3907 [09:46<00:00, 6.66it/s]
Saved: torch.Size([500000, 768])

Extracting: modern (answerdotai/ModernBERT-base, max_len=8192)...
Loading weights: 100% 134/134 [00:00<00:00, 3173.52it/s, Materializing param=layers.21.mlp_norm.weight]
ModernBertModel LOAD REPORT from: answerdotai/ModernBERT-base
Key               | Status
------------------+-----------
decoder.bias      | UNEXPECTED
head.dense.weight | UNEXPECTED
head.norm.weight  | UNEXPECTED

Notes:
- UNEXPECTED: can be ignored when loading from different task/architecture; not ok if you expect identical arch.
149,014,272 params
modern: 100%|██████████| 3907/3907 [15:54<00:00, 4.09it/s]
Saved: torch.Size([500000, 768])

Extracting: roberta (FacebookAI/roberta-base, max_len=512)...
config.json: 100% 481/481 [00:00<00:00, 146kB/s]
model.safetensors: 100% 499M/499M [00:01<00:00, 557MB/s]
Loading weights: 100% 197/197 [00:00<00:00, 4340.38it/s, Materializing param=encoder.layer.11.output.dense.weight]
RobertaModel LOAD REPORT from: FacebookAI/roberta-base
Key                             | Status
--------------------------------+-----------
lm_head.layer_norm.bias         | UNEXPECTED
roberta.embeddings.position_ids | UNEXPECTED
lm_head.dense.bias              | UNEXPECTED
lm_head.bias                    | UNEXPECTED
lm_head.dense.weight            | UNEXPECTED
lm_head.layer_norm.weight       | UNEXPECTED
pooler.dense.weight             | MISSING
pooler.dense.bias               | MISSING

Notes:
- UNEXPECTED: can be ignored when loading from different task/architecture; not ok if you expect identical arch.
- MISSING: those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.
tokenizer_config.json: 100% 25.0/25.0 [00:00<00:00, 8.38kB/s]
vocab.json: 899k/? [00:00<00:00, 35.8MB/s]
merges.txt: 456k/? [00:00<00:00, 43.1MB/s]
tokenizer.json: 1.36M/? [00:00<00:00, 149MB/s]
124,645,632 params
roberta: 100%|██████████| 3907/3907 [09:41<00:00, 6.71it/s]
Saved: torch.Size([500000, 768])

Extracting: albert (albert/albert-base-v2, max_len=512)...
config.json: 100% 684/684 [00:00<00:00, 205kB/s]
model.safetensors: 100% 47.4M/47.4M [00:00<00:00, 236MB/s]
Loading weights: 100% 25/25 [00:00<00:00, 2864.57it/s, Materializing param=pooler.weight]
AlbertModel LOAD REPORT from: albert/albert-base-v2
Key                          | Status
-----------------------------+-----------
predictions.dense.weight     | UNEXPECTED
predictions.bias             | UNEXPECTED
predictions.decoder.bias     | UNEXPECTED
predictions.LayerNorm.weight | UNEXPECTED
predictions.LayerNorm.bias   | UNEXPECTED
predictions.dense.bias       | UNEXPECTED

Notes:
- UNEXPECTED: can be ignored when loading from different task/architecture; not ok if you expect identical arch.
tokenizer_config.json: 100% 25.0/25.0 [00:00<00:00, 9.29kB/s]
spiece.model: 100% 760k/760k [00:00<00:00, 17.3MB/s]
tokenizer.json: 1.31M/? [00:00<00:00, 18.5MB/s]
11,683,584 params
albert: 100%|██████████| 3907/3907 [13:32<00:00, 4.81it/s]
Saved: torch.Size([500000, 768])

Extracting: distil (distilbert/distilbert-base-uncased, max_len=512)...
config.json: 100% 483/483 [00:00<00:00, 112kB/s]
model.safetensors: 100% 268M/268M [00:01<00:00, 292MB/s]
Loading weights: 100% 100/100 [00:00<00:00, 3628.42it/s, Materializing param=transformer.layer.5.sa_layer_norm.weight]
DistilBertModel LOAD REPORT from: distilbert/distilbert-base-uncased
Key                     | Status
------------------------+-----------
vocab_transform.bias    | UNEXPECTED
vocab_layer_norm.weight | UNEXPECTED
vocab_transform.weight  | UNEXPECTED
vocab_projector.bias    | UNEXPECTED
vocab_layer_norm.bias   | UNEXPECTED

Notes:
- UNEXPECTED: can be ignored when loading from different task/architecture; not ok if you expect identical arch.
tokenizer_config.json: 100% 48.0/48.0 [00:00<00:00, 15.0kB/s]
vocab.txt: 232k/? [00:00<00:00, 12.3MB/s]
tokenizer.json: 466k/? [00:00<00:00, 59.4MB/s]
66,362,880 params
distil: 100%|██████████| 3907/3907 [05:23<00:00, 12.09it/s]
Saved: torch.Size([500000, 768])
Using 500,000 samples

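[Note] The extraction code itself is not shown in this log; each expert produces one 768-dim vector per caption. A minimal sketch of the pooling step, assuming masked mean pooling over the last hidden states (a common choice; the run may use CLS pooling instead, and all names below are illustrative, not taken from the run):

```python
import numpy as np

def masked_mean_pool(hidden, mask):
    """Mean-pool token vectors, counting only non-padding positions.

    hidden: (batch, seq_len, dim) last-layer token states
    mask:   (batch, seq_len) attention mask, 1 = real token, 0 = padding
    """
    mask = mask[:, :, None].astype(hidden.dtype)      # (B, T, 1)
    summed = (hidden * mask).sum(axis=1)              # (B, D)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)    # avoid /0 on empty rows
    return summed / counts

# Toy check: second "token" is padding, so the pooled vector equals token 0.
h = np.zeros((1, 2, 4)); h[0, 0] = 1.0; h[0, 1] = 9.0
m = np.array([[1, 0]])
print(masked_mean_pool(h, m)[0])   # -> [1. 1. 1. 1.]
```

In the actual run, `hidden` would come from each frozen expert's forward pass over the 500k captions, batched as in the progress bars above.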
=================================================================
PHASE 1: GENERALIZED PROCRUSTES ALIGNMENT
=================================================================
GPA iter  1: delta=1.99174462
GPA iter  3: delta=0.00037972
GPA iter  6: delta=0.00006116
GPA iter  9: delta=0.00002506
GPA iter 12: delta=0.00001342
GPA iter 15: delta=0.00000849
bert   : cos_after=0.7664 cos_to_mean=0.9879
modern : cos_after=0.7166 cos_to_mean=0.9829
roberta: cos_after=0.7435 cos_to_mean=0.9884
albert : cos_after=0.7150 cos_to_mean=0.9863
distil : cos_after=0.7864 cos_to_mean=0.9909
cos(consensus, bert):    0.9880
cos(consensus, modern):  0.9831
cos(consensus, roberta): 0.9885
cos(consensus, albert):  0.9864
cos(consensus, distil):  0.9909
Equidistance range: 0.0079

Measuring consensus statistics...
CV: 0.2543
Mean cos: 0.0035
Eff dim: 50.2

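[Note] The `delta` values above track how far the consensus moves per iteration. A minimal sketch of generalized Procrustes alignment under the usual formulation (each expert matrix is rotated toward the running mean via an orthogonal Procrustes solve, then the mean is recomputed); this is a generic sketch, not the run's code:

```python
import numpy as np

def gpa(embeddings, iters=15):
    """Generalized Procrustes alignment over a list of (n, d) matrices."""
    mats = [x / np.linalg.norm(x, axis=1, keepdims=True) for x in embeddings]
    mean = np.mean(mats, axis=0)
    for _ in range(iters):
        aligned = []
        for x in mats:
            u, _, vt = np.linalg.svd(x.T @ mean)   # Procrustes rotation R = U V^T
            aligned.append(x @ (u @ vt))
        mats = aligned
        mean = np.mean(mats, axis=0)               # delta = ||new mean - old mean||
    return mats, mean

# Two "experts": the same point cloud under different random rotations.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 8))
q1, _ = np.linalg.qr(rng.normal(size=(8, 8)))
q2, _ = np.linalg.qr(rng.normal(size=(8, 8)))
x1, x2 = base @ q1, base @ q2
aligned, consensus = gpa([x1, x2])
before = np.linalg.norm(x1 / np.linalg.norm(x1, axis=1, keepdims=True)
                        - x2 / np.linalg.norm(x2, axis=1, keepdims=True))
after = np.linalg.norm(aligned[0] - aligned[1])
```

Since the two inputs differ only by rotation, GPA should bring them nearly on top of each other (`after` far below `before`), mirroring the shrinking deltas in the log.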
=================================================================
PHASE 2: ENCODE FROZEN STUDENT
=================================================================
A new version of the following files was downloaded from https://huggingface.co/AbstractPhil/geolip-captionbert-8192:
- modeling_caption_bert.py
Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Loading weights: 100% 82/82 [00:00<00:00, 3652.45it/s, Materializing param=token_emb.weight]
Student: 25,958,016 params (frozen)
Encoding 500,000 captions...
Encoding: 100%|██████████| 1954/1954 [04:43<00:00, 6.88it/s]
Student embeddings: torch.Size([500000, 768])
Train: 495,000 Val: 5,000

=================================================================
PHASE 3: TRAIN ALIGNMENT BANK
=================================================================
Expert 0 (bert): loaded, cos_after=0.7664
Expert 1 (modern): loaded, cos_after=0.7166
Expert 2 (roberta): loaded, cos_after=0.7435
Expert 3 (albert): loaded, cos_after=0.7150
Expert 4 (distil): loaded, cos_after=0.7864
Anchors: 512 from consensus
Targets: CV=0.2543
Calibrated (n=5000):
  cross_cos: 0.0449 ± 0.0316
  disagree_ratio: median=0.000000
  expert_cos: 1.0000 ± 0.0000
  cross pairs: 10
Bank: 6,466,944 params

E 1: 46s loss=0.5256 v_loss=0.4934
  Geometry: b_cv=0.2628 e_cv=0.0827 spread=0.01876 a_max=0.553
  Experts:  cos=0.600±0.003 agr=0.000014 ortho=0.000138
  Disagree: x_cos=0.0385±0.0444 ratio=0.004107 preserve=0.000487
E 2: 45s loss=0.5010 v_loss=0.5092 exp=0.544 b_cv=0.2581 x_cos=0.0398
E 3: 45s loss=0.4991 v_loss=0.4963 exp=0.478 b_cv=0.2576 x_cos=0.0466
E 4: 45s loss=0.4979 v_loss=0.5021 exp=0.439 b_cv=0.2557 x_cos=0.0425

E 5: 45s loss=0.4973 v_loss=0.4921
  Geometry: b_cv=0.2560 e_cv=0.0827 spread=0.01172 a_max=0.559
  Experts:  cos=0.457±0.002 agr=0.000009 ortho=0.000052
  Disagree: x_cos=0.0407±0.0298 ratio=0.003423 preserve=0.000122
E 6: 45s loss=0.4970 v_loss=0.4914 exp=0.436 b_cv=0.2553 x_cos=0.0394
E 7: 45s loss=0.4961 v_loss=0.4875 exp=0.430 b_cv=0.2550 x_cos=0.0456
E 8: 45s loss=0.4960 v_loss=0.4845 exp=0.425 b_cv=0.2542 x_cos=0.0419
E 9: 45s loss=0.4959 v_loss=0.4917 exp=0.426 b_cv=0.2543 x_cos=0.0479

E10: 45s loss=0.4954 v_loss=0.4903
  Geometry: b_cv=0.2542 e_cv=0.0822 spread=0.01146 a_max=0.561
  Experts:  cos=0.429±0.002 agr=0.000004 ortho=0.000030
  Disagree: x_cos=0.0481±0.0329 ratio=0.003676 preserve=0.000075
E11: 45s loss=0.4952 v_loss=0.4855 exp=0.421 b_cv=0.2539 x_cos=0.0430
E12: 45s loss=0.4952 v_loss=0.4864 exp=0.455 b_cv=0.2536 x_cos=0.0457
E13: 45s loss=0.4947 v_loss=0.5101 exp=0.448 b_cv=0.2530 x_cos=0.0421
E14: 45s loss=0.4948 v_loss=0.5136 exp=0.433 b_cv=0.2523 x_cos=0.0437

E15: 45s loss=0.4941 v_loss=0.4953
  Geometry: b_cv=0.2538 e_cv=0.0831 spread=0.01126 a_max=0.562
  Experts:  cos=0.450±0.002 agr=0.000001 ortho=0.000017
  Disagree: x_cos=0.0402±0.0268 ratio=0.002627 preserve=0.000027
E16: 45s loss=0.4941 v_loss=0.4954 exp=0.433 b_cv=0.2535 x_cos=0.0407
E17: 45s loss=0.4945 v_loss=0.5051 exp=0.439 b_cv=0.2527 x_cos=0.0409
E18: 45s loss=0.4943 v_loss=0.4953 exp=0.427 b_cv=0.2529 x_cos=0.0450
E19: 45s loss=0.4944 v_loss=0.4889 exp=0.438 b_cv=0.2531 x_cos=0.0424

E20: 45s loss=0.4937 v_loss=0.4941
  Geometry: b_cv=0.2519 e_cv=0.0827 spread=0.01126 a_max=0.562
  Experts:  cos=0.422±0.002 agr=0.000000 ortho=0.000012
  Disagree: x_cos=0.0462±0.0330 ratio=0.002113 preserve=0.000014
E21: 45s loss=0.4936 v_loss=0.4904 exp=0.436 b_cv=0.2512 x_cos=0.0459
E22: 45s loss=0.4934 v_loss=0.4927 exp=0.438 b_cv=0.2527 x_cos=0.0448
E23: 45s loss=0.4936 v_loss=0.4840 exp=0.450 b_cv=0.2517 x_cos=0.0451
E24: 45s loss=0.4932 v_loss=0.4951 exp=0.457 b_cv=0.2508 x_cos=0.0475

E25: 45s loss=0.4929 v_loss=0.4924
  Geometry: b_cv=0.2504 e_cv=0.0828 spread=0.01121 a_max=0.563
  Experts:  cos=0.441±0.001 agr=0.000000 ortho=0.000008
  Disagree: x_cos=0.0431±0.0307 ratio=0.000861 preserve=0.000004
E26: 45s loss=0.4927 v_loss=0.4838 exp=0.444 b_cv=0.2518 x_cos=0.0446
E27: 45s loss=0.4927 v_loss=0.4901 exp=0.441 b_cv=0.2513 x_cos=0.0440
E28: 45s loss=0.4929 v_loss=0.5015 exp=0.452 b_cv=0.2512 x_cos=0.0447
E29: 45s loss=0.4925 v_loss=0.5042 exp=0.449 b_cv=0.2511 x_cos=0.0451

E30: 45s loss=0.4929 v_loss=0.4914
  Geometry: b_cv=0.2518 e_cv=0.0823 spread=0.01116 a_max=0.563
  Experts:  cos=0.451±0.001 agr=0.000000 ortho=0.000007
  Disagree: x_cos=0.0447±0.0312 ratio=0.000587 preserve=0.000001

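[Note] The training objective is not shown in this log; the "Targets: CV=0.2543" line and the per-epoch b_cv readouts suggest the bank's norm spread is pulled toward the consensus CV. One hypothetical way such a target can enter a loss is a squared penalty on the batch's coefficient of variation (this is an assumption, not the run's loss term):

```python
import numpy as np

def cv_penalty(batch, target_cv=0.2543):
    """Squared penalty pulling the batch's norm-CV toward a target value.

    Hypothetical regularizer: CV here is std/mean of per-vector L2 norms,
    matching the "Targets: CV=..." line above only by name.
    """
    norms = np.linalg.norm(batch, axis=1)
    cv = norms.std() / norms.mean()
    return (cv - target_cv) ** 2

# Isotropic Gaussian batches have a much smaller norm-CV than 0.2543,
# so the penalty is nonzero and would push norms to spread out.
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 128))
print(cv_penalty(x))
```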
=================================================================
PHASE 4: GEOMETRIC VERIFICATION
=================================================================
Passthrough: 1.000000
Emb CV: 0.0787 (consensus: 0.2543)
Geo context CV: 0.2584
Geo eff_dim: 18.9 / 128
Expert cos: 0.444 ± 0.001
Anchor max cos: 0.563
Cross-expert: 0.0446 ± 0.0314
Disagree ratio: 0.000668

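[Note] The log reports CV and eff_dim without defining the estimators. A common pairing, sketched here under that assumption: CV as std/mean of per-vector L2 norms, and effective dimension as the participation ratio of the covariance spectrum (which yields d for isotropic data and ~1 when variance collapses onto one direction, consistent with "18.9 / 128" meaning 18.9 effective out of 128 ambient dimensions):

```python
import numpy as np

def cv_of_norms(x):
    """Coefficient of variation (std/mean) of per-vector L2 norms."""
    n = np.linalg.norm(x, axis=1)
    return n.std() / n.mean()

def effective_dim(x):
    """Participation ratio of the covariance spectrum:
    (sum lambda)^2 / sum(lambda^2)."""
    xc = x - x.mean(axis=0)
    lam = np.clip(np.linalg.eigvalsh(np.cov(xc.T)), 0, None)
    return lam.sum() ** 2 / (lam ** 2).sum()

rng = np.random.default_rng(0)
iso = rng.normal(size=(5000, 32))                 # isotropic: eff_dim near 32
flat = rng.normal(size=(5000, 1)) * np.ones(32)   # rank-1: eff_dim near 1
print(effective_dim(iso), effective_dim(flat))
```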
=================================================================
PHASE 5: CLASSIFIER STABILITY TEST
=================================================================

Mode           Dim   Train  Val    Gap
--------------------------------------------------
raw_768        1536  0.532  0.328  0.204
raw+diff       3072  0.474  0.354  0.120
bank_enriched  1792  0.638  0.453  0.185
bank+diff      3584  0.589  0.512  0.077
geo_explicit      6  0.345  0.339  0.005

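[Note] The probe behind this table is not shown; Gap = Train - Val accuracy, so smaller gaps (bank+diff, geo_explicit) indicate less overfit feature sets. A minimal sketch of that measurement with a one-vs-rest ridge least-squares probe on synthetic data (the probe type and all names are illustrative, not from the run):

```python
import numpy as np

def linear_probe_gap(x_tr, y_tr, x_va, y_va, ridge=1e-2):
    """Fit a ridge least-squares probe; return train acc, val acc, and gap."""
    n_cls = int(y_tr.max()) + 1
    onehot = np.eye(n_cls)[y_tr]
    d = x_tr.shape[1]
    w = np.linalg.solve(x_tr.T @ x_tr + ridge * np.eye(d), x_tr.T @ onehot)
    acc = lambda x, y: float((np.argmax(x @ w, axis=1) == y).mean())
    tr, va = acc(x_tr, y_tr), acc(x_va, y_va)
    return tr, va, tr - va

# Synthetic 2-class data: the class mean shifts along the first feature axis.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1200)
x = rng.normal(size=(1200, 16)); x[:, 0] += 2.0 * y
tr, va, gap = linear_probe_gap(x[:1000], y[:1000], x[1000:], y[1000:])
print(tr, va, gap)
```

With linearly separable-ish synthetic data the gap stays small; in the table above, high-dimensional raw features show the largest gaps.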
=================================================================
SUMMARY
=================================================================
Consensus CV: 0.2543
Consensus eff_dim: 50.2
Equidistance: 0.0079
Bank params: 6,466,944
Bank geo eff_dim: 18.9
Bank geo CV: 0.2584
Best val loss: 0.4838

Files: alignment_bank_best.pt, alignment_bank_final.pt

=================================================================
DONE
=================================================================