File size: 7,198 Bytes
7a468da
=================================================================
RAPID PROTOTYPE: 2-Expert Consensus + Alignment Bank
=================================================================
  Device: cuda

=================================================================
PHASE 0: EXTRACTION
=================================================================
  Captions: 20,000

  Extracting: bert...
Loading weights: 100% 199/199 [00:00<00:00, 4115.47it/s, Materializing param=pooler.dense.weight]
BertModel LOAD REPORT from: google-bert/bert-base-uncased
Key                                        | Status     |  | 
-------------------------------------------+------------+--+-
cls.predictions.transform.LayerNorm.bias   | UNEXPECTED |  | 
cls.seq_relationship.weight                | UNEXPECTED |  | 
cls.predictions.transform.dense.weight     | UNEXPECTED |  | 
cls.predictions.transform.LayerNorm.weight | UNEXPECTED |  | 
cls.seq_relationship.bias                  | UNEXPECTED |  | 
cls.predictions.transform.dense.bias       | UNEXPECTED |  | 
cls.predictions.bias                       | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.
    bert: 100%|██████████| 157/157 [00:23<00:00,  6.55it/s]
    Shape: torch.Size([20000, 768])

  Extracting: modern...
Loading weights: 100% 134/134 [00:00<00:00, 3938.48it/s, Materializing param=layers.21.mlp_norm.weight]
ModernBertModel LOAD REPORT from: answerdotai/ModernBERT-base
Key               | Status     |  | 
------------------+------------+--+-
head.dense.weight | UNEXPECTED |  | 
head.norm.weight  | UNEXPECTED |  | 
decoder.bias      | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.
    modern: 100%|██████████| 157/157 [00:35<00:00,  4.38it/s]
    Shape: torch.Size([20000, 768])
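
The log does not say how each expert's token states become one 768-dim vector per caption. Below is a minimal sketch that assumes masked mean pooling over the last hidden layer. The function name `masked_mean_pool` is hypothetical and is not taken from the run. The sketch also shows the batch arithmetic the progress bars imply: 157 iterations over 20,000 captions works out to exactly 128 captions per batch under simple sequential batching.

```python
import math

import numpy as np


def masked_mean_pool(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average token vectors per caption, ignoring padded positions.

    hidden: (batch, seq_len, dim) last-layer hidden states (assumed pooling input)
    mask:   (batch, seq_len), 1 for real tokens and 0 for padding
    """
    m = mask[..., None].astype(hidden.dtype)       # (batch, seq_len, 1)
    summed = (hidden * m).sum(axis=1)               # (batch, dim)
    counts = np.clip(m.sum(axis=1), 1e-9, None)     # guard against all-padding rows
    return summed / counts


# Toy check: two captions of length 3, the second padded after 2 tokens.
hidden = np.arange(2 * 3 * 4, dtype=np.float64).reshape(2, 3, 4)
mask = np.array([[1, 1, 1], [1, 1, 0]])
pooled = masked_mean_pool(hidden, mask)
print(pooled.shape)  # (2, 4)

# 157 iterations per expert over 20,000 captions implies 128 captions per batch.
n_captions, batch_size = 20_000, 128
n_batches = math.ceil(n_captions / batch_size)
print(n_batches)  # 157
```

Mean pooling is only one common choice. CLS-token pooling would produce the same [20000, 768] shape, so the shape alone does not settle which method the run used.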

=================================================================
PHASE 0b: PROCRUSTES ALIGNMENT
=================================================================
  bert      : cos 1.0000 → 1.0000
  modern    : cos -0.0025 → 0.4849
  Consensus: torch.Size([20000, 768])
  cos(consensus, bert): 0.9574
  cos(consensus, modern): 0.9584
  Consensus CV: 0.1316
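
Phase 0b presumably rotates ModernBERT's space onto BERT's. BERT, as the reference, maps onto itself, which is why its line reads cos 1.0000 → 1.0000. The sketch below applies orthogonal Procrustes to toy data. It assumes BERT is the target and that "cos" means the mean row-wise cosine between paired embeddings. The consensus at the end is a plain normalized mean, included only as an illustration; the run may build its consensus differently. All function names here are hypothetical.

```python
import numpy as np


def l2n(x: np.ndarray) -> np.ndarray:
    """Row-wise L2 normalization."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def procrustes_rotation(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Orthogonal R minimizing ||source @ R - target||_F, via the SVD of source^T target."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def mean_row_cos(a: np.ndarray, b: np.ndarray) -> float:
    """Mean cosine between paired rows of a and b."""
    return float(np.mean(np.sum(l2n(a) * l2n(b), axis=1)))


rng = np.random.default_rng(0)
n, d = 500, 16
reference = l2n(rng.normal(size=(n, d)))                   # plays the role of "bert"
q, _ = np.linalg.qr(rng.normal(size=(d, d)))               # unknown rotation between spaces
other = reference @ q.T + 0.05 * rng.normal(size=(n, d))   # plays the role of "modern"

r = procrustes_rotation(other, reference)
aligned = other @ r
print(f"cos {mean_row_cos(other, reference):.4f} -> {mean_row_cos(aligned, reference):.4f}")

# One simple consensus: normalized mean of the unit-norm aligned embeddings.
consensus = l2n(l2n(reference) + l2n(aligned))
```

R is orthogonal, so the rotation preserves norms and angles inside ModernBERT's space. It changes only how that space is oriented relative to BERT's. This is consistent with the post-alignment cosine for "modern" (0.4849) staying well below 1: a rotation cannot remove differences that are not rotational.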

=================================================================
PHASE 1: TRAIN STUDENT (2 experts, 20K captions)
=================================================================
  Student: 11,269,632 params
  E1: 2s  loss=3.1731  t_acc=0.322  t_cos=0.287  v_acc=0.448  v_cos=0.437  v_cv=0.227
  E2: 2s  loss=1.6553  t_acc=0.713  t_cos=0.470  v_acc=0.649  v_cos=0.518  v_cv=0.197
  E3: 2s  loss=1.1581  t_acc=0.858  t_cos=0.531  v_acc=0.814  v_cos=0.566  v_cv=0.169
  E4: 2s  loss=0.8765  t_acc=0.922  t_cos=0.567  v_acc=0.867  v_cos=0.598  v_cv=0.167
  E5: 2s  loss=0.7007  t_acc=0.954  t_cos=0.593  v_acc=0.892  v_cos=0.612  v_cv=0.169

  Student saved. v_cos=0.612, v_cv=0.169
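
The log does not define the Phase 1 columns. The sketch below rests on three assumptions:

* The student is distilled onto the consensus with an InfoNCE term plus a (1 - cosine) term.
* `t_acc`/`v_acc` are in-batch retrieval accuracies, meaning each student vector's nearest target is its own caption.
* `t_cos`/`v_cos` are mean cosines to the target.

The temperature 0.07 and the name `distill_metrics` are assumptions, not values from the run.

```python
import numpy as np


def l2n(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def distill_metrics(student: np.ndarray, target: np.ndarray, tau: float = 0.07):
    """InfoNCE + (1 - cosine) distillation loss, in-batch retrieval accuracy, mean cosine."""
    s, t = l2n(student), l2n(target)
    logits = s @ t.T / tau                                 # (batch, batch); positives on the diagonal
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerically stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nce = -np.mean(np.diag(log_probs))
    cos = np.sum(s * t, axis=1)
    loss = nce + np.mean(1.0 - cos)
    acc = np.mean(np.argmax(logits, axis=1) == np.arange(len(s)))
    return float(loss), float(acc), float(cos.mean())


rng = np.random.default_rng(0)
target = rng.normal(size=(32, 64))                   # toy consensus targets for one batch
student = target + 0.5 * rng.normal(size=(32, 64))  # toy student outputs
loss, acc, cos = distill_metrics(student, target)
print(f"loss={loss:.4f}  acc={acc:.3f}  cos={cos:.3f}")
```

Under this reading, retrieval accuracy can approach 1 while mean cosine stays moderate. Ranking only needs each student vector to be closer to its own target than to the others in the batch. That pattern matches t_acc=0.954 alongside t_cos=0.593 at E5.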

=================================================================
PHASE 2: TRAIN ALIGNMENT BANK (student frozen)
=================================================================
  Pre-encoding through frozen student...
  Student embeddings: torch.Size([18000, 768])
    Expert 0 (bert): rotation loaded, cos_after=1.0000
    Expert 1 (modern): rotation loaded, cos_after=0.4849
    Anchors: 128 initialized from consensus embeddings
  Bank: 1,305,152 params
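
Phase 2 trains only the bank, so the frozen student can run once with its outputs cached. That would fit the roughly 1 s epochs. The row counts are consistent with a 90/10 train/validation split of the 20,000 captions: 18,000 student embeddings here and a 2,000-row geo context in Phase 3.

The sketch below uses toy sizes. The helpers `pre_encode` and `init_anchors`, the linear stand-in for the student, and the random-row anchor choice are all hypothetical. The log states only that the 128 anchors come from consensus embeddings.

```python
import numpy as np


def l2n(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def pre_encode(frozen_fn, data: np.ndarray, batch_size: int = 128) -> np.ndarray:
    """Run a frozen encoder over the data once, batch by batch, and cache the outputs."""
    chunks = [frozen_fn(data[i:i + batch_size]) for i in range(0, len(data), batch_size)]
    return np.concatenate(chunks, axis=0)


def init_anchors(consensus: np.ndarray, n_anchors: int = 128, seed: int = 0) -> np.ndarray:
    """Use n_anchors distinct consensus rows, unit-normalized, as starting anchors."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(consensus), size=n_anchors, replace=False)
    return l2n(consensus[idx])


rng = np.random.default_rng(0)
n_total, d = 2_000, 32                           # toy sizes, not the run's 20,000 x 768
captions = rng.normal(size=(n_total, d))         # stand-in for caption inputs
consensus = l2n(rng.normal(size=(n_total, d)))   # stand-in for consensus embeddings
w = rng.normal(size=(d, d))                      # stand-in for the frozen student's weights
n_train = n_total * 9 // 10                      # 90/10 split, as 18,000 / 2,000 suggests

student_emb = pre_encode(lambda x: l2n(x @ w), captions[:n_train])
anchors = init_anchors(consensus[:n_train])
```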
  E 1: 1s  loss=0.3517  v_loss=0.3123  expert_agr=0.00029  ortho=0.00018  spread=0.02575  cv=0.2469  anchor_max=0.560  expert_cos=0.657±0.061
  E 2: 1s  loss=0.2758  v_loss=0.2718  expert_agr=0.00017  ortho=0.00027  spread=0.02053  cv=0.2049  anchor_max=0.587  expert_cos=0.634±0.064
  E 3: 1s  loss=0.2505  v_loss=0.2533  expert_agr=0.00016  ortho=0.00023  spread=0.01885  cv=0.1988  anchor_max=0.590  expert_cos=0.619±0.067
  E 4: 1s  loss=0.2391  v_loss=0.2338  expert_agr=0.00017  ortho=0.00022  spread=0.01824  cv=0.1965  anchor_max=0.593  expert_cos=0.627±0.069
  E 5: 1s  loss=0.2327  v_loss=0.2376  expert_agr=0.00018  ortho=0.00023  spread=0.01752  cv=0.1933  anchor_max=0.595  expert_cos=0.613±0.070
  E 6: 1s  loss=0.2274  v_loss=0.2334  expert_agr=0.00017  ortho=0.00022  spread=0.01680  cv=0.1882  anchor_max=0.597  expert_cos=0.610±0.070
  E 7: 1s  loss=0.2249  v_loss=0.2297  expert_agr=0.00018  ortho=0.00021  spread=0.01603  cv=0.1880  anchor_max=0.597  expert_cos=0.614±0.074
  E 8: 1s  loss=0.2221  v_loss=0.2236  expert_agr=0.00017  ortho=0.00021  spread=0.01550  cv=0.1857  anchor_max=0.598  expert_cos=0.601±0.074
  E 9: 1s  loss=0.2206  v_loss=0.2351  expert_agr=0.00019  ortho=0.00021  spread=0.01500  cv=0.1850  anchor_max=0.599  expert_cos=0.588±0.077
  E10: 1s  loss=0.2200  v_loss=0.2142  expert_agr=0.00024  ortho=0.00022  spread=0.01470  cv=0.1831  anchor_max=0.599  expert_cos=0.605±0.075
  E11: 1s  loss=0.2181  v_loss=0.2254  expert_agr=0.00028  ortho=0.00021  spread=0.01445  cv=0.1782  anchor_max=0.599  expert_cos=0.629±0.072
  E12: 1s  loss=0.2179  v_loss=0.2212  expert_agr=0.00021  ortho=0.00023  spread=0.01419  cv=0.1812  anchor_max=0.599  expert_cos=0.634±0.070
  E13: 1s  loss=0.2170  v_loss=0.2259  expert_agr=0.00022  ortho=0.00023  spread=0.01387  cv=0.1792  anchor_max=0.599  expert_cos=0.614±0.072
  E14: 1s  loss=0.2152  v_loss=0.2106  expert_agr=0.00016  ortho=0.00020  spread=0.01400  cv=0.1750  anchor_max=0.599  expert_cos=0.647±0.069
  E15: 1s  loss=0.2151  v_loss=0.2301  expert_agr=0.00021  ortho=0.00021  spread=0.01368  cv=0.1766  anchor_max=0.599  expert_cos=0.639±0.073
  E16: 1s  loss=0.2149  v_loss=0.2134  expert_agr=0.00022  ortho=0.00023  spread=0.01338  cv=0.1754  anchor_max=0.599  expert_cos=0.631±0.070
  E17: 1s  loss=0.2151  v_loss=0.2110  expert_agr=0.00019  ortho=0.00022  spread=0.01341  cv=0.1778  anchor_max=0.599  expert_cos=0.642±0.073
  E18: 1s  loss=0.2146  v_loss=0.2114  expert_agr=0.00023  ortho=0.00022  spread=0.01306  cv=0.1734  anchor_max=0.599  expert_cos=0.630±0.069
  E19: 1s  loss=0.2147  v_loss=0.2127  expert_agr=0.00020  ortho=0.00023  spread=0.01300  cv=0.1768  anchor_max=0.599  expert_cos=0.610±0.072
  E20: 1s  loss=0.2151  v_loss=0.2211  expert_agr=0.00019  ortho=0.00020  spread=0.01300  cv=0.1779  anchor_max=0.599  expert_cos=0.626±0.072
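
The log does not define the bank's regularizer columns either. One plausible reading, which is an assumption:

* `ortho` penalizes how far each expert rotation R is from orthogonal.
* `anchor_max` is the largest cosine between any two of the 128 anchors.
* `spread` is a repulsion term, such as the mean squared off-diagonal anchor cosine.

Under that reading, the three quantities can be sketched as follows. The function names are hypothetical.

```python
import numpy as np


def l2n(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def ortho_penalty(r: np.ndarray) -> float:
    """Mean squared deviation of R^T R from the identity (0 for an orthogonal R)."""
    d = r.shape[1]
    return float(np.mean((r.T @ r - np.eye(d)) ** 2))


def anchor_stats(anchors: np.ndarray):
    """Return (max, mean-square) off-diagonal cosine between unit-normalized anchors."""
    a = l2n(anchors)
    sim = a @ a.T
    off = sim[~np.eye(len(a), dtype=bool)]   # drop self-similarities on the diagonal
    return float(off.max()), float(np.mean(off ** 2))


rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(64, 64)))
print(ortho_penalty(q))  # close to 0 for an exactly orthogonal matrix
anchor_max, spread = anchor_stats(rng.normal(size=(128, 64)))
```

If these readings are right, the log would show falling spread together with a flat anchor_max near 0.599 from E10 onward. That pattern would mean the anchors as a whole are moving apart while the single closest pair stays fixed.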

=================================================================
PHASE 3: GEOMETRIC VERIFICATION
=================================================================
  Passthrough integrity: 1.000000 (should be ~1.000)
  Geo context CV: 0.1691
  Geo context eff_dim: 21.9
  Geo context shape: torch.Size([2000, 64])
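
The log does not define Phase 3's metrics. The sketch below makes three assumptions:

* Passthrough integrity is the mean cosine between the bank's input and its passthrough output.
* CV is the coefficient of variation (std / mean) of volumes of random 4-simplices built from embedding rows.
* eff_dim is the participation ratio of the covariance spectrum.

None of the function names come from the run.

```python
import math

import numpy as np


def passthrough_integrity(inp: np.ndarray, out: np.ndarray) -> float:
    """Mean row-wise cosine between the bank input and its passthrough output."""
    inp_n = inp / np.linalg.norm(inp, axis=1, keepdims=True)
    out_n = out / np.linalg.norm(out, axis=1, keepdims=True)
    return float(np.mean(np.sum(inp_n * out_n, axis=1)))


def simplex_volume_cv(x: np.ndarray, k: int = 4, n_samples: int = 500, seed: int = 0) -> float:
    """Coefficient of variation (std / mean) of volumes of random k-simplices.

    Each simplex uses k+1 distinct rows of x; volume = sqrt(det(E E^T)) / k!,
    where E holds the k edge vectors from the first vertex.
    """
    rng = np.random.default_rng(seed)
    vols = []
    for _ in range(n_samples):
        pts = x[rng.choice(len(x), size=k + 1, replace=False)]
        edges = pts[1:] - pts[0]                  # (k, dim)
        gram = edges @ edges.T                    # (k, k) Gram matrix of the edges
        vols.append(math.sqrt(max(np.linalg.det(gram), 0.0)) / math.factorial(k))
    vols = np.array(vols)
    return float(vols.std() / vols.mean())


def effective_dim(x: np.ndarray) -> float:
    """Participation ratio (sum of eigenvalues)^2 / (sum of squared eigenvalues) of the covariance."""
    xc = x - x.mean(axis=0)
    eig = np.linalg.eigvalsh(xc.T @ xc / (len(x) - 1))
    eig = np.clip(eig, 0.0, None)                 # drop tiny negative round-off
    return float(eig.sum() ** 2 / (eig ** 2).sum())


rng = np.random.default_rng(1)
emb = rng.normal(size=(2000, 64))   # toy stand-in for the 2,000 x 64 geo context
print(effective_dim(emb), simplex_volume_cv(emb))
```

Both CV and participation ratio are scale-invariant. Rescaling the embeddings therefore leaves the numbers unchanged, which is a reasonable property for comparing spaces with different norms.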

=================================================================
PHASE 4: CLASSIFIER STABILITY TEST
=================================================================
  with_bank      : train_acc=0.499  val_acc=0.390  gap=0.109
  without_bank   : train_acc=0.442  val_acc=0.372  gap=0.070
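
Phase 4 reports train and validation accuracy for a classifier fitted on features with and without the bank, with gap = train_acc - val_acc. The reported values check out: 0.499 - 0.390 = 0.109 and 0.442 - 0.372 = 0.070. The log shows neither the labels nor the classifier. The sketch below assumes a full-batch linear softmax probe; the name `train_softmax_probe` and all hyperparameters are hypothetical. It runs on toy Gaussian clusters.

```python
import numpy as np


def train_softmax_probe(x_tr, y_tr, x_va, y_va, n_classes, epochs=300, lr=0.1, seed=0):
    """Full-batch softmax regression; returns (train_acc, val_acc, gap)."""
    rng = np.random.default_rng(seed)
    w = 0.01 * rng.normal(size=(x_tr.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y_tr]
    for _ in range(epochs):
        logits = x_tr @ w + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - onehot) / len(x_tr)               # d(mean cross-entropy) / d(logits)
        w -= lr * x_tr.T @ grad
        b -= lr * grad.sum(axis=0)

    def accuracy(x, y):
        return float(np.mean(np.argmax(x @ w + b, axis=1) == y))

    train_acc, val_acc = accuracy(x_tr, y_tr), accuracy(x_va, y_va)
    return train_acc, val_acc, train_acc - val_acc


rng = np.random.default_rng(0)
centers = np.zeros((3, 10))
centers[:, :3] = 5.0 * np.eye(3)                      # three well-separated toy classes
y = rng.integers(0, 3, size=400)
x = centers[y] + rng.normal(size=(400, 10))
train_acc, val_acc, gap = train_softmax_probe(x[:300], y[:300], x[300:], y[300:], n_classes=3)
```

The run compares the same probe on two feature sets. Under that reading, a larger gap with the bank (0.109 vs 0.070) alongside a higher val_acc (0.390 vs 0.372) would mean the bank features fit the training split better than they generalize. They still come out ahead on validation.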

=================================================================
DONE
=================================================================

  Student: mini_student.pt
  Bank: alignment_bank.pt
  Consensus CV: 0.1316
  Student v_cos: 0.612