=================================================================
RAPID PROTOTYPE v2: Differentiation-Centered Bank
=================================================================
  Device: cuda

=================================================================
PHASE 0: EXTRACTION
=================================================================
  Captions: 20,000

  Extracting: bert...
Loading weights: 100% 199/199 [00:00<00:00, 4216.36it/s, Materializing param=pooler.dense.weight]
BertModel LOAD REPORT from: google-bert/bert-base-uncased
Key                                        | Status     |  | 
-------------------------------------------+------------+--+-
cls.predictions.bias                       | UNEXPECTED |  | 
cls.predictions.transform.dense.bias       | UNEXPECTED |  | 
cls.predictions.transform.LayerNorm.weight | UNEXPECTED |  | 
cls.predictions.transform.LayerNorm.bias   | UNEXPECTED |  | 
cls.seq_relationship.weight                | UNEXPECTED |  | 
cls.predictions.transform.dense.weight     | UNEXPECTED |  | 
cls.seq_relationship.bias                  | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.
    bert: 100%|██████████| 157/157 [00:23<00:00,  6.56it/s]
    Shape: torch.Size([20000, 768])

  Extracting: modern...
Loading weights: 100% 134/134 [00:00<00:00, 4047.07it/s, Materializing param=layers.21.mlp_norm.weight]
ModernBertModel LOAD REPORT from: answerdotai/ModernBERT-base
Key               | Status     |  | 
------------------+------------+--+-
head.dense.weight | UNEXPECTED |  | 
head.norm.weight  | UNEXPECTED |  | 
decoder.bias      | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.
    modern: 100%|██████████| 157/157 [00:35<00:00,  4.39it/s]
    Shape: torch.Size([20000, 768])

=================================================================
PHASE 0b: GENERALIZED PROCRUSTES ALIGNMENT (no reference bias)
=================================================================
  GPA iter 1: delta=1.19668072
  GPA iter 3: delta=0.00029225
  GPA iter 6: delta=0.00006347
  GPA iter 9: delta=0.00002718
  bert      : cos_after=0.8541  cos_to_mean=0.9865
  modern    : cos_after=0.8577  cos_to_mean=0.9867
  cos(consensus, bert): 0.9867
  cos(consensus, modern): 0.9868
  Equidistance range: 0.0001 (should be near 0)

  Measuring consensus statistics...
    CV:       0.1771
    Mean cos: 0.0018
    Eff dim:  109.5
    Spectral: [0.0343, 0.0322, 0.0275, 0.0240, 0.0222...]

=================================================================
PHASE 1: TRAIN STUDENT
=================================================================
  Student: 11,269,632 params
  CV target: 0.1771
  E1: 2s  loss=2.9588  t_acc=0.362  t_cos=0.334  v_acc=0.494  v_cos=0.503  v_cv=0.223
  E2: 2s  loss=1.4268  t_acc=0.761  t_cos=0.543  v_acc=0.704  v_cos=0.588  v_cv=0.212
  E3: 2s  loss=0.9784  t_acc=0.887  t_cos=0.604  v_acc=0.822  v_cos=0.639  v_cv=0.182
  E4: 2s  loss=0.7289  t_acc=0.943  t_cos=0.641  v_acc=0.912  v_cos=0.676  v_cv=0.182
  E5: 2s  loss=0.5807  t_acc=0.968  t_cos=0.666  v_acc=0.920  v_cos=0.686  v_cv=0.182

  Student saved. v_cos=0.686, v_cv=0.182

=================================================================
PHASE 2: TRAIN ALIGNMENT BANK (student frozen)
=================================================================
  Pre-encoding through frozen student...
  Student embeddings: torch.Size([18000, 768])
    Expert 0 (bert): rotation + whitener + mean loaded, cos_after=0.8541
    Expert 1 (modern): rotation + whitener + mean loaded, cos_after=0.8577
    Anchors: 512 initialized from consensus embeddings
    Targets: CV=0.1771, mean_cos=0.0018
  Bank: 2,921,088 params
  Bank targets: CV=0.1771, mean_cos=0.0018
    Calibrated disagreement (n=2000):
      cross_cos: 0.0794 ± 0.0035
      disagree_ratio: median=0.000000  mean=0.000000  std=0.000000
      expert_cos: 1.0000 ± 0.0000

  E 1: 1s  loss=0.4789  v_loss=0.4172
    Geometry:  b_cv=0.2688  e_cv=0.1603  spread=0.03940  a_max=0.652
    Experts:   cos=0.794±0.006  agr=0.000092  ortho=0.000388
    Disagree:  x_cos=0.0740±0.0009  ratio=0.004326  preserve=0.013135  norms=0.1626

  E 2: 1s  loss=0.4002  v_loss=0.3818
    Geometry:  b_cv=0.2229  e_cv=0.1588  spread=0.02779  a_max=0.668
    Experts:   cos=0.807±0.006  agr=0.000007  ortho=0.000288
    Disagree:  x_cos=0.0805±0.0014  ratio=0.003575  preserve=0.000024  norms=0.1703

  E 3: 1s  loss=0.3743  v_loss=0.3625
    Geometry:  b_cv=0.2189  e_cv=0.1606  spread=0.02500  a_max=0.670
    Experts:   cos=0.835±0.005  agr=0.000005  ortho=0.000152
    Disagree:  x_cos=0.0774±0.0018  ratio=0.002279  preserve=0.000016  norms=0.1066

  E 4: 1s  loss=0.3591  v_loss=0.3615
    Geometry:  b_cv=0.2100  e_cv=0.1643  spread=0.02302  a_max=0.670
    Experts:   cos=0.822±0.005  agr=0.000003  ortho=0.000094
    Disagree:  x_cos=0.0781±0.0021  ratio=0.001569  preserve=0.000020  norms=0.1137

  E 5: 1s  loss=0.3537  v_loss=0.3665
    Geometry:  b_cv=0.2118  e_cv=0.1664  spread=0.02133  a_max=0.670
    Experts:   cos=0.815±0.006  agr=0.000002  ortho=0.000066
    Disagree:  x_cos=0.0765±0.0021  ratio=0.001389  preserve=0.000026  norms=0.1669

  E 6: 1s  loss=0.3506  v_loss=0.3527
    Geometry:  b_cv=0.2097  e_cv=0.1600  spread=0.02009  a_max=0.670
    Experts:   cos=0.829±0.005  agr=0.000003  ortho=0.000048
    Disagree:  x_cos=0.0846±0.0024  ratio=0.001772  preserve=0.000021  norms=0.1363

  E 7: 1s  loss=0.3459  v_loss=0.3502
    Geometry:  b_cv=0.2055  e_cv=0.1628  spread=0.01906  a_max=0.670
    Experts:   cos=0.759±0.007  agr=0.000004  ortho=0.000040
    Disagree:  x_cos=0.0774±0.0022  ratio=0.003070  preserve=0.000049  norms=0.1964

  E 8: 1s  loss=0.3442  v_loss=0.3479
    Geometry:  b_cv=0.2078  e_cv=0.1643  spread=0.01817  a_max=0.669
    Experts:   cos=0.745±0.007  agr=0.000003  ortho=0.000033
    Disagree:  x_cos=0.0782±0.0023  ratio=0.001258  preserve=0.000021  norms=0.1772

  E 9: 1s  loss=0.3419  v_loss=0.3451
    Geometry:  b_cv=0.2015  e_cv=0.1646  spread=0.01756  a_max=0.670
    Experts:   cos=0.767±0.006  agr=0.000007  ortho=0.000030
    Disagree:  x_cos=0.0823±0.0024  ratio=0.001625  preserve=0.000049  norms=0.2007

  E10: 1s  loss=0.3433  v_loss=0.3433
    Geometry:  b_cv=0.2074  e_cv=0.1594  spread=0.01746  a_max=0.669
    Experts:   cos=0.762±0.005  agr=0.000006  ortho=0.000026
    Disagree:  x_cos=0.0766±0.0018  ratio=0.001418  preserve=0.000073  norms=0.0529

  E11: 1s  loss=0.3392  v_loss=0.3501
    Geometry:  b_cv=0.2021  e_cv=0.1609  spread=0.01705  a_max=0.669
    Experts:   cos=0.721±0.007  agr=0.000004  ortho=0.000026
    Disagree:  x_cos=0.0698±0.0022  ratio=0.006405  preserve=0.000037  norms=0.1509

  E12: 1s  loss=0.3383  v_loss=0.3534
    Geometry:  b_cv=0.1983  e_cv=0.1639  spread=0.01693  a_max=0.668
    Experts:   cos=0.753±0.005  agr=0.000014  ortho=0.000026
    Disagree:  x_cos=0.0743±0.0021  ratio=0.000903  preserve=0.000076  norms=0.0763

  E13: 1s  loss=0.3374  v_loss=0.3398
    Geometry:  b_cv=0.1996  e_cv=0.1603  spread=0.01660  a_max=0.669
    Experts:   cos=0.714±0.006  agr=0.000004  ortho=0.000022
    Disagree:  x_cos=0.0791±0.0021  ratio=0.006335  preserve=0.000060  norms=0.1257

  E14: 1s  loss=0.3376  v_loss=0.3415
    Geometry:  b_cv=0.1992  e_cv=0.1657  spread=0.01647  a_max=0.669
    Experts:   cos=0.704±0.006  agr=0.000006  ortho=0.000022
    Disagree:  x_cos=0.0824±0.0021  ratio=0.006577  preserve=0.000061  norms=0.0873

  E15: 1s  loss=0.3372  v_loss=0.3409
    Geometry:  b_cv=0.2003  e_cv=0.1615  spread=0.01635  a_max=0.669
    Experts:   cos=0.745±0.005  agr=0.000003  ortho=0.000019
    Disagree:  x_cos=0.0760±0.0020  ratio=0.002660  preserve=0.000045  norms=0.0958

  E16: 1s  loss=0.3355  v_loss=0.3328
    Geometry:  b_cv=0.1990  e_cv=0.1601  spread=0.01600  a_max=0.669
    Experts:   cos=0.689±0.005  agr=0.000004  ortho=0.000018
    Disagree:  x_cos=0.0814±0.0024  ratio=0.002029  preserve=0.000042  norms=0.1414

  E17: 1s  loss=0.3350  v_loss=0.3432
    Geometry:  b_cv=0.1945  e_cv=0.1604  spread=0.01603  a_max=0.668
    Experts:   cos=0.751±0.003  agr=0.000028  ortho=0.000020
    Disagree:  x_cos=0.0825±0.0023  ratio=0.001129  preserve=0.000155  norms=0.0187

  E18: 1s  loss=0.3372  v_loss=0.3336
    Geometry:  b_cv=0.2044  e_cv=0.1605  spread=0.01590  a_max=0.668
    Experts:   cos=0.720±0.003  agr=0.000004  ortho=0.000022
    Disagree:  x_cos=0.0799±0.0020  ratio=0.002103  preserve=0.000055  norms=0.0331

  E19: 1s  loss=0.3326  v_loss=0.3456
    Geometry:  b_cv=0.1948  e_cv=0.1654  spread=0.01562  a_max=0.668
    Experts:   cos=0.741±0.003  agr=0.000004  ortho=0.000021
    Disagree:  x_cos=0.0797±0.0019  ratio=0.003153  preserve=0.000054  norms=0.0169

  E20: 1s  loss=0.3351  v_loss=0.3460
    Geometry:  b_cv=0.1992  e_cv=0.1596  spread=0.01567  a_max=0.668
    Experts:   cos=0.725±0.005  agr=0.000002  ortho=0.000018
    Disagree:  x_cos=0.0776±0.0023  ratio=0.008188  preserve=0.000053  norms=0.0326

=================================================================
PHASE 3: GEOMETRIC VERIFICATION
=================================================================
  Passthrough:     1.000000 (target: 1.000)
  Emb CV:          0.1635 (consensus: 0.1771)
  Geo context CV:  0.1892
  Geo eff_dim:     30.7 / 128
  Expert cos:      0.725 ± 0.005
  Anchor max cos:  0.668
  Disagreement:
    Cross-expert:  0.0776 ± 0.0023
    Ratio:         0.008188 (target: 0.000000)
    Norm spread:   0.0326

=================================================================
PHASE 4: CLASSIFIER STABILITY TEST
=================================================================

  Mode                    Dim   Train     Val     Gap
  --------------------------------------------------
  raw_768                1536   0.498   0.357   0.141
  raw+diff               3072   0.567   0.475   0.092
  bank_enriched          1792   0.766   0.532   0.235
  bank+diff              3584   0.722   0.670   0.052
  geo_explicit              6   0.326   0.363  -0.037

=================================================================
SUMMARY
=================================================================
  Consensus CV:     0.1771
  Consensus eff_dim:109.5
  Student v_cos:    0.686
  Student v_cv:     0.182
  Bank params:      2,921,088
  Bank geo_eff_dim: 30.7
  Bank geo_cv:      0.1892

=================================================================
DONE
=================================================================