Create rapid_prototype_output_bigger_bank.txt
rapid_prototype_output_bigger_bank.txt
ADDED
@@ -0,0 +1,118 @@
+=================================================================
+RAPID PROTOTYPE: 2-Expert Consensus + Alignment Bank
+=================================================================
+Device: cuda
+
+=================================================================
+PHASE 0: EXTRACTION
+=================================================================
+Captions: 20,000
+
+Extracting: bert...
+Loading weights: 100%
+199/199 [00:00<00:00, 4157.84it/s, Materializing param=pooler.dense.weight]
+BertModel LOAD REPORT from: google-bert/bert-base-uncased
+Key                                        | Status     |  |
+-------------------------------------------+------------+--+-
+cls.predictions.transform.LayerNorm.bias   | UNEXPECTED |  |
+cls.seq_relationship.weight                | UNEXPECTED |  |
+cls.predictions.transform.dense.weight     | UNEXPECTED |  |
+cls.predictions.transform.LayerNorm.weight | UNEXPECTED |  |
+cls.seq_relationship.bias                  | UNEXPECTED |  |
+cls.predictions.transform.dense.bias       | UNEXPECTED |  |
+cls.predictions.bias                       | UNEXPECTED |  |
+
+Notes:
+- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
+bert: 100%|██████████| 157/157 [00:24<00:00, 6.54it/s]
+Shape: torch.Size([20000, 768])
+
+Extracting: modern...
+Loading weights: 100%
+134/134 [00:00<00:00, 3957.73it/s, Materializing param=layers.21.mlp_norm.weight]
+ModernBertModel LOAD REPORT from: answerdotai/ModernBERT-base
+Key               | Status     |  |
+------------------+------------+--+-
+head.dense.weight | UNEXPECTED |  |
+head.norm.weight  | UNEXPECTED |  |
+decoder.bias      | UNEXPECTED |  |
+
+Notes:
+- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
+modern: 100%|██████████| 157/157 [00:35<00:00, 4.38it/s]
+Shape: torch.Size([20000, 768])
+
+=================================================================
+PHASE 0b: PROCRUSTES ALIGNMENT
+=================================================================
+bert : cos 1.0000 → 1.0000
+modern : cos -0.0025 → 0.4849
+Consensus: torch.Size([20000, 768])
+cos(consensus, bert): 0.9574
+cos(consensus, modern): 0.9584
+Consensus CV: 0.1316
+
+=================================================================
+PHASE 1: TRAIN STUDENT (2 experts, 20K captions)
+=================================================================
+Student: 11,269,632 params
+E1: 2s loss=3.1731 t_acc=0.322 t_cos=0.287 v_acc=0.448 v_cos=0.437 v_cv=0.227
+E2: 2s loss=1.6553 t_acc=0.713 t_cos=0.470 v_acc=0.649 v_cos=0.518 v_cv=0.197
+E3: 2s loss=1.1581 t_acc=0.858 t_cos=0.531 v_acc=0.814 v_cos=0.566 v_cv=0.169
+E4: 2s loss=0.8765 t_acc=0.922 t_cos=0.567 v_acc=0.867 v_cos=0.598 v_cv=0.167
+E5: 2s loss=0.7007 t_acc=0.954 t_cos=0.593 v_acc=0.892 v_cos=0.612 v_cv=0.169
+
+Student saved. v_cos=0.612, v_cv=0.169
+
+=================================================================
+PHASE 2: TRAIN ALIGNMENT BANK (student frozen)
+=================================================================
+Pre-encoding through frozen student...
+Student embeddings: torch.Size([18000, 768])
+Expert 0 (bert): rotation loaded, cos_after=1.0000
+Expert 1 (modern): rotation loaded, cos_after=0.4849
+Anchors: 128 initialized from consensus embeddings
+Bank: 1,305,152 params
+E 1: 1s loss=0.3517 v_loss=0.3123 expert_agr=0.00029 ortho=0.00018 spread=0.02575 cv=0.2469 anchor_max=0.560 expert_cos=0.657±0.061
+E 2: 1s loss=0.2758 v_loss=0.2718 expert_agr=0.00017 ortho=0.00027 spread=0.02053 cv=0.2049 anchor_max=0.587 expert_cos=0.634±0.064
+E 3: 1s loss=0.2505 v_loss=0.2533 expert_agr=0.00016 ortho=0.00023 spread=0.01885 cv=0.1988 anchor_max=0.590 expert_cos=0.619±0.067
+E 4: 1s loss=0.2391 v_loss=0.2338 expert_agr=0.00017 ortho=0.00022 spread=0.01824 cv=0.1965 anchor_max=0.593 expert_cos=0.627±0.069
+E 5: 1s loss=0.2327 v_loss=0.2376 expert_agr=0.00018 ortho=0.00023 spread=0.01752 cv=0.1933 anchor_max=0.595 expert_cos=0.613±0.070
+E 6: 1s loss=0.2274 v_loss=0.2334 expert_agr=0.00017 ortho=0.00022 spread=0.01680 cv=0.1882 anchor_max=0.597 expert_cos=0.610±0.070
+E 7: 1s loss=0.2249 v_loss=0.2297 expert_agr=0.00018 ortho=0.00021 spread=0.01603 cv=0.1880 anchor_max=0.597 expert_cos=0.614±0.074
+E 8: 1s loss=0.2221 v_loss=0.2236 expert_agr=0.00017 ortho=0.00021 spread=0.01550 cv=0.1857 anchor_max=0.598 expert_cos=0.601±0.074
+E 9: 1s loss=0.2206 v_loss=0.2351 expert_agr=0.00019 ortho=0.00021 spread=0.01500 cv=0.1850 anchor_max=0.599 expert_cos=0.588±0.077
+E10: 1s loss=0.2200 v_loss=0.2142 expert_agr=0.00024 ortho=0.00022 spread=0.01470 cv=0.1831 anchor_max=0.599 expert_cos=0.605±0.075
+E11: 1s loss=0.2181 v_loss=0.2254 expert_agr=0.00028 ortho=0.00021 spread=0.01445 cv=0.1782 anchor_max=0.599 expert_cos=0.629±0.072
+E12: 1s loss=0.2179 v_loss=0.2212 expert_agr=0.00021 ortho=0.00023 spread=0.01419 cv=0.1812 anchor_max=0.599 expert_cos=0.634±0.070
+E13: 1s loss=0.2170 v_loss=0.2259 expert_agr=0.00022 ortho=0.00023 spread=0.01387 cv=0.1792 anchor_max=0.599 expert_cos=0.614±0.072
+E14: 1s loss=0.2152 v_loss=0.2106 expert_agr=0.00016 ortho=0.00020 spread=0.01400 cv=0.1750 anchor_max=0.599 expert_cos=0.647±0.069
+E15: 1s loss=0.2151 v_loss=0.2301 expert_agr=0.00021 ortho=0.00021 spread=0.01368 cv=0.1766 anchor_max=0.599 expert_cos=0.639±0.073
+E16: 1s loss=0.2149 v_loss=0.2134 expert_agr=0.00022 ortho=0.00023 spread=0.01338 cv=0.1754 anchor_max=0.599 expert_cos=0.631±0.070
+E17: 1s loss=0.2151 v_loss=0.2110 expert_agr=0.00019 ortho=0.00022 spread=0.01341 cv=0.1778 anchor_max=0.599 expert_cos=0.642±0.073
+E18: 1s loss=0.2146 v_loss=0.2114 expert_agr=0.00023 ortho=0.00022 spread=0.01306 cv=0.1734 anchor_max=0.599 expert_cos=0.630±0.069
+E19: 1s loss=0.2147 v_loss=0.2127 expert_agr=0.00020 ortho=0.00023 spread=0.01300 cv=0.1768 anchor_max=0.599 expert_cos=0.610±0.072
+E20: 1s loss=0.2151 v_loss=0.2211 expert_agr=0.00019 ortho=0.00020 spread=0.01300 cv=0.1779 anchor_max=0.599 expert_cos=0.626±0.072
+
+=================================================================
+PHASE 3: GEOMETRIC VERIFICATION
+=================================================================
+Passthrough integrity: 1.000000 (should be ~1.000)
+Geo context CV: 0.1691
+Geo context eff_dim: 21.9
+Geo context shape: torch.Size([2000, 64])
+
+=================================================================
+PHASE 4: CLASSIFIER STABILITY TEST
+=================================================================
+with_bank : train_acc=0.499 val_acc=0.390 gap=0.109
+without_bank : train_acc=0.442 val_acc=0.372 gap=0.070
+
+=================================================================
+DONE
+=================================================================
+
+Student: mini_student.pt
+Bank: alignment_bank.pt
+Consensus CV: 0.1316
+Student v_cos: 0.612