=================================================================
RAPID PROTOTYPE v2: Differentiation-Centered Bank
=================================================================
  Device: cuda

=================================================================
PHASE 0: EXTRACTION
=================================================================
  Captions: 20,000

  Extracting: bert...
Loading weights: 100% 199/199 [00:00<00:00, 4038.86it/s, Materializing param=pooler.dense.weight]
BertModel LOAD REPORT from: google-bert/bert-base-uncased
Key                                        | Status     |  | 
-------------------------------------------+------------+--+-
cls.predictions.bias                       | UNEXPECTED |  | 
cls.predictions.transform.dense.bias       | UNEXPECTED |  | 
cls.predictions.transform.LayerNorm.weight | UNEXPECTED |  | 
cls.predictions.transform.LayerNorm.bias   | UNEXPECTED |  | 
cls.seq_relationship.weight                | UNEXPECTED |  | 
cls.predictions.transform.dense.weight     | UNEXPECTED |  | 
cls.seq_relationship.bias                  | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.
    bert: 100%|██████████| 157/157 [00:23<00:00,  6.55it/s]
    Shape: torch.Size([20000, 768])

  Extracting: modern...
Loading weights: 100% 134/134 [00:00<00:00, 4016.84it/s, Materializing param=layers.21.mlp_norm.weight]
ModernBertModel LOAD REPORT from: answerdotai/ModernBERT-base
Key               | Status     |  | 
------------------+------------+--+-
head.dense.weight | UNEXPECTED |  | 
head.norm.weight  | UNEXPECTED |  | 
decoder.bias      | UNEXPECTED |  | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.
    modern: 100%|██████████| 157/157 [00:35<00:00,  4.39it/s]
    Shape: torch.Size([20000, 768])

=================================================================
PHASE 0b: GENERALIZED PROCRUSTES ALIGNMENT (no reference bias)
=================================================================
  GPA iter 1: delta=1.19668072
  GPA iter 3: delta=0.00029225
  GPA iter 6: delta=0.00006347
  GPA iter 9: delta=0.00002718
  bert      : cos_after=0.8541  cos_to_mean=0.9865
  modern    : cos_after=0.8577  cos_to_mean=0.9867
  cos(consensus, bert): 0.9867
  cos(consensus, modern): 0.9868
  Equidistance range: 0.0001 (should be near 0)

  Measuring consensus statistics...
    CV:       0.1771
    Mean cos: 0.0018
    Eff dim:  109.5
    Spectral: [0.0343, 0.0322, 0.0275, 0.0240, 0.0222...]

=================================================================
PHASE 1: TRAIN STUDENT
=================================================================
  Student: 11,269,632 params
  CV target: 0.1771
  E1: 2s  loss=2.9588  t_acc=0.362  t_cos=0.334  v_acc=0.494  v_cos=0.503  v_cv=0.223
  E2: 2s  loss=1.4268  t_acc=0.761  t_cos=0.543  v_acc=0.704  v_cos=0.588  v_cv=0.212
  E3: 2s  loss=0.9784  t_acc=0.887  t_cos=0.604  v_acc=0.822  v_cos=0.639  v_cv=0.182
  E4: 2s  loss=0.7289  t_acc=0.943  t_cos=0.641  v_acc=0.912  v_cos=0.676  v_cv=0.182
  E5: 2s  loss=0.5807  t_acc=0.968  t_cos=0.666  v_acc=0.920  v_cos=0.686  v_cv=0.182

  Student saved. v_cos=0.686, v_cv=0.182

=================================================================
PHASE 2: TRAIN ALIGNMENT BANK (student frozen)
=================================================================
  Pre-encoding through frozen student...
  Student embeddings: torch.Size([18000, 768])
    Expert 0 (bert): rotation + whitener + mean loaded, cos_after=0.8541
    Expert 1 (modern): rotation + whitener + mean loaded, cos_after=0.8577
    Anchors: 512 initialized from consensus embeddings
    Targets: CV=0.1771, mean_cos=0.0018
  Bank: 2,921,088 params
  Bank targets: CV=0.1771, mean_cos=0.0018
    Calibrated disagreement:
      cross_cos: 0.0794 ± 0.0035
      disagree_ratio: 0.000000

  E 1: 1s  loss=0.4775  v_loss=0.4350
    Geometry:  b_cv=0.2673  e_cv=0.1680  spread=0.03938  a_max=0.653
    Experts:   cos=0.791±0.006  agr=0.000142  ortho=0.000440
    Disagree:  x_cos=0.0833±0.0019  ratio=0.004748  preserve=0.014587  norms=0.1814

  E 2: 1s  loss=0.3981  v_loss=0.3833
    Geometry:  b_cv=0.2230  e_cv=0.1651  spread=0.02783  a_max=0.669
    Experts:   cos=0.809±0.006  agr=0.000008  ortho=0.000344
    Disagree:  x_cos=0.0817±0.0018  ratio=0.003509  preserve=0.000024  norms=0.1719

  E 3: 1s  loss=0.3730  v_loss=0.3757
    Geometry:  b_cv=0.2162  e_cv=0.1648  spread=0.02493  a_max=0.670
    Experts:   cos=0.830±0.005  agr=0.000004  ortho=0.000186
    Disagree:  x_cos=0.0799±0.0019  ratio=0.002291  preserve=0.000013  norms=0.1513

  E 4: 1s  loss=0.3623  v_loss=0.3708
    Geometry:  b_cv=0.2187  e_cv=0.1615  spread=0.02314  a_max=0.670
    Experts:   cos=0.832±0.005  agr=0.000003  ortho=0.000115
    Disagree:  x_cos=0.0793±0.0020  ratio=0.003285  preserve=0.000011  norms=0.1422

  E 5: 1s  loss=0.3554  v_loss=0.3539
    Geometry:  b_cv=0.2141  e_cv=0.1621  spread=0.02139  a_max=0.669
    Experts:   cos=0.853±0.004  agr=0.000002  ortho=0.000079
    Disagree:  x_cos=0.0781±0.0021  ratio=0.001270  preserve=0.000011  norms=0.0980

  E 6: 1s  loss=0.3507  v_loss=0.3571
    Geometry:  b_cv=0.2124  e_cv=0.1633  spread=0.02019  a_max=0.669
    Experts:   cos=0.829±0.005  agr=0.000001  ortho=0.000058
    Disagree:  x_cos=0.0788±0.0022  ratio=0.001736  preserve=0.000010  norms=0.1789

  E 7: 1s  loss=0.3460  v_loss=0.3465
    Geometry:  b_cv=0.2059  e_cv=0.1607  spread=0.01903  a_max=0.669
    Experts:   cos=0.845±0.005  agr=0.000001  ortho=0.000045
    Disagree:  x_cos=0.0819±0.0023  ratio=0.001425  preserve=0.000008  norms=0.1536

  E 8: 1s  loss=0.3449  v_loss=0.3421
    Geometry:  b_cv=0.2060  e_cv=0.1592  spread=0.01841  a_max=0.670
    Experts:   cos=0.833±0.005  agr=0.000003  ortho=0.000035
    Disagree:  x_cos=0.0885±0.0021  ratio=0.001539  preserve=0.000017  norms=0.1313

  E 9: 1s  loss=0.3422  v_loss=0.3451
    Geometry:  b_cv=0.2040  e_cv=0.1626  spread=0.01793  a_max=0.669
    Experts:   cos=0.822±0.005  agr=0.000003  ortho=0.000031
    Disagree:  x_cos=0.0761±0.0024  ratio=0.001610  preserve=0.000037  norms=0.2032

  E10: 1s  loss=0.3416  v_loss=0.3497
    Geometry:  b_cv=0.2077  e_cv=0.1647  spread=0.01735  a_max=0.669
    Experts:   cos=0.782±0.007  agr=0.000003  ortho=0.000029
    Disagree:  x_cos=0.0825±0.0023  ratio=0.004691  preserve=0.000025  norms=0.2039

  E11: 1s  loss=0.3387  v_loss=0.3507
    Geometry:  b_cv=0.2019  e_cv=0.1640  spread=0.01701  a_max=0.668
    Experts:   cos=0.811±0.005  agr=0.000002  ortho=0.000024
    Disagree:  x_cos=0.0780±0.0023  ratio=0.000957  preserve=0.000015  norms=0.1889

  E12: 1s  loss=0.3391  v_loss=0.3381
    Geometry:  b_cv=0.2006  e_cv=0.1588  spread=0.01675  a_max=0.668
    Experts:   cos=0.778±0.006  agr=0.000003  ortho=0.000021
    Disagree:  x_cos=0.0729±0.0021  ratio=0.001148  preserve=0.000024  norms=0.1404

  E13: 1s  loss=0.3373  v_loss=0.3434
    Geometry:  b_cv=0.1987  e_cv=0.1635  spread=0.01671  a_max=0.668
    Experts:   cos=0.703±0.007  agr=0.000013  ortho=0.000021
    Disagree:  x_cos=0.0680±0.0026  ratio=0.003978  preserve=0.000085  norms=0.2265

  E14: 1s  loss=0.3383  v_loss=0.3351
    Geometry:  b_cv=0.2027  e_cv=0.1658  spread=0.01634  a_max=0.668
    Experts:   cos=0.779±0.005  agr=0.000007  ortho=0.000024
    Disagree:  x_cos=0.0849±0.0022  ratio=0.002337  preserve=0.000085  norms=0.1472

  E15: 1s  loss=0.3366  v_loss=0.3357
    Geometry:  b_cv=0.1999  e_cv=0.1612  spread=0.01584  a_max=0.668
    Experts:   cos=0.671±0.008  agr=0.000008  ortho=0.000023
    Disagree:  x_cos=0.0777±0.0024  ratio=0.011179  preserve=0.000061  norms=0.1758

  E16: 1s  loss=0.3363  v_loss=0.3467
    Geometry:  b_cv=0.1983  e_cv=0.1612  spread=0.01575  a_max=0.668
    Experts:   cos=0.737±0.005  agr=0.000010  ortho=0.000022
    Disagree:  x_cos=0.0839±0.0022  ratio=0.006047  preserve=0.000049  norms=0.1216

  E17: 1s  loss=0.3343  v_loss=0.3376
    Geometry:  b_cv=0.1974  e_cv=0.1655  spread=0.01591  a_max=0.668
    Experts:   cos=0.718±0.005  agr=0.000002  ortho=0.000020
    Disagree:  x_cos=0.0723±0.0023  ratio=0.002539  preserve=0.000042  norms=0.0947

  E18: 1s  loss=0.3354  v_loss=0.3457
    Geometry:  b_cv=0.1955  e_cv=0.1580  spread=0.01588  a_max=0.668
    Experts:   cos=0.763±0.005  agr=0.000007  ortho=0.000019
    Disagree:  x_cos=0.0796±0.0022  ratio=0.004057  preserve=0.000069  norms=0.1001

  E19: 1s  loss=0.3344  v_loss=0.3313
    Geometry:  b_cv=0.1962  e_cv=0.1602  spread=0.01560  a_max=0.668
    Experts:   cos=0.687±0.005  agr=0.000005  ortho=0.000018
    Disagree:  x_cos=0.0862±0.0024  ratio=0.005997  preserve=0.000030  norms=0.1218

  E20: 1s  loss=0.3331  v_loss=0.3651
    Geometry:  b_cv=0.1950  e_cv=0.1631  spread=0.01556  a_max=0.668
    Experts:   cos=0.729±0.005  agr=0.000007  ortho=0.000018
    Disagree:  x_cos=0.0826±0.0021  ratio=0.006963  preserve=0.000065  norms=0.0781

=================================================================
PHASE 3: GEOMETRIC VERIFICATION
=================================================================
  Passthrough:     1.000000 (target: 1.000)
  Emb CV:          0.1660 (consensus: 0.1771)
  Geo context CV:  0.2053
  Geo eff_dim:     30.5 / 128
  Expert cos:      0.729 ± 0.005
  Anchor max cos:  0.668
  Disagreement:
    Cross-expert:  0.0826 ± 0.0021
    Ratio:         0.006963 (target: 0.000000)
    Norm spread:   0.0781

=================================================================
PHASE 4: CLASSIFIER STABILITY TEST
=================================================================
  with_bank      : train=0.746  val=0.500  gap=0.246
  without_bank   : train=0.490  val=0.363  gap=0.126

=================================================================
SUMMARY
=================================================================
  Consensus CV:     0.1771
  Consensus eff_dim: 109.5
  Student v_cos:    0.686
  Student v_cv:     0.182
  Bank params:      2,921,088
  Bank geo_eff_dim: 30.5
  Bank geo_cv:      0.2053

=================================================================
DONE
=================================================================