=================================================================
RAPID PROTOTYPE v2: Differentiation-Centered Bank
=================================================================
Device: cuda
=================================================================
PHASE 0: EXTRACTION
=================================================================
Captions: 20,000
Extracting: bert...
Loading weights: 100% 199/199 [00:00<00:00, 4216.36it/s, Materializing param=pooler.dense.weight]
BertModel LOAD REPORT from: google-bert/bert-base-uncased
Key | Status | |
-------------------------------------------+------------+--+-
cls.predictions.bias | UNEXPECTED | |
cls.predictions.transform.dense.bias | UNEXPECTED | |
cls.predictions.transform.LayerNorm.weight | UNEXPECTED | |
cls.predictions.transform.LayerNorm.bias | UNEXPECTED | |
cls.seq_relationship.weight | UNEXPECTED | |
cls.predictions.transform.dense.weight | UNEXPECTED | |
cls.seq_relationship.bias | UNEXPECTED | |
Notes:
- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
bert: 100%|██████████| 157/157 [00:23<00:00, 6.56it/s]
Shape: torch.Size([20000, 768])
Extracting: modern...
Loading weights: 100% 134/134 [00:00<00:00, 4047.07it/s, Materializing param=layers.21.mlp_norm.weight]
ModernBertModel LOAD REPORT from: answerdotai/ModernBERT-base
Key | Status | |
------------------+------------+--+-
head.dense.weight | UNEXPECTED | |
head.norm.weight | UNEXPECTED | |
decoder.bias | UNEXPECTED | |
Notes:
- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
modern: 100%|██████████| 157/157 [00:35<00:00, 4.39it/s]
Shape: torch.Size([20000, 768])
=================================================================
PHASE 0b: GENERALIZED PROCRUSTES ALIGNMENT (no reference bias)
=================================================================
GPA iter 1: delta=1.19668072
GPA iter 3: delta=0.00029225
GPA iter 6: delta=0.00006347
GPA iter 9: delta=0.00002718
bert : cos_after=0.8541 cos_to_mean=0.9865
modern : cos_after=0.8577 cos_to_mean=0.9867
cos(consensus, bert): 0.9867
cos(consensus, modern): 0.9868
Equidistance range: 0.0001 (should be near 0)
Measuring consensus statistics...
CV: 0.1771
Mean cos: 0.0018
Eff dim: 109.5
Spectral: [0.0343, 0.0322, 0.0275, 0.0240, 0.0222...]
=================================================================
PHASE 1: TRAIN STUDENT
=================================================================
Student: 11,269,632 params
CV target: 0.1771
E1: 2s loss=2.9588 t_acc=0.362 t_cos=0.334 v_acc=0.494 v_cos=0.503 v_cv=0.223
E2: 2s loss=1.4268 t_acc=0.761 t_cos=0.543 v_acc=0.704 v_cos=0.588 v_cv=0.212
E3: 2s loss=0.9784 t_acc=0.887 t_cos=0.604 v_acc=0.822 v_cos=0.639 v_cv=0.182
E4: 2s loss=0.7289 t_acc=0.943 t_cos=0.641 v_acc=0.912 v_cos=0.676 v_cv=0.182
E5: 2s loss=0.5807 t_acc=0.968 t_cos=0.666 v_acc=0.920 v_cos=0.686 v_cv=0.182
Student saved. v_cos=0.686, v_cv=0.182
=================================================================
PHASE 2: TRAIN ALIGNMENT BANK (student frozen)
=================================================================
Pre-encoding through frozen student...
Student embeddings: torch.Size([18000, 768])
Expert 0 (bert): rotation + whitener + mean loaded, cos_after=0.8541
Expert 1 (modern): rotation + whitener + mean loaded, cos_after=0.8577
Anchors: 512 initialized from consensus embeddings
Targets: CV=0.1771, mean_cos=0.0018
Bank: 2,921,088 params
Bank targets: CV=0.1771, mean_cos=0.0018
Calibrated disagreement (n=2000):
cross_cos: 0.0794 ± 0.0035
disagree_ratio: median=0.000000 mean=0.000000 std=0.000000
expert_cos: 1.0000 ± 0.0000
E 1: 1s loss=0.4789 v_loss=0.4172
Geometry: b_cv=0.2688 e_cv=0.1603 spread=0.03940 a_max=0.652
Experts: cos=0.794±0.006 agr=0.000092 ortho=0.000388
Disagree: x_cos=0.0740±0.0009 ratio=0.004326 preserve=0.013135 norms=0.1626
E 2: 1s loss=0.4002 v_loss=0.3818
Geometry: b_cv=0.2229 e_cv=0.1588 spread=0.02779 a_max=0.668
Experts: cos=0.807±0.006 agr=0.000007 ortho=0.000288
Disagree: x_cos=0.0805±0.0014 ratio=0.003575 preserve=0.000024 norms=0.1703
E 3: 1s loss=0.3743 v_loss=0.3625
Geometry: b_cv=0.2189 e_cv=0.1606 spread=0.02500 a_max=0.670
Experts: cos=0.835±0.005 agr=0.000005 ortho=0.000152
Disagree: x_cos=0.0774±0.0018 ratio=0.002279 preserve=0.000016 norms=0.1066
E 4: 1s loss=0.3591 v_loss=0.3615
Geometry: b_cv=0.2100 e_cv=0.1643 spread=0.02302 a_max=0.670
Experts: cos=0.822±0.005 agr=0.000003 ortho=0.000094
Disagree: x_cos=0.0781±0.0021 ratio=0.001569 preserve=0.000020 norms=0.1137
E 5: 1s loss=0.3537 v_loss=0.3665
Geometry: b_cv=0.2118 e_cv=0.1664 spread=0.02133 a_max=0.670
Experts: cos=0.815±0.006 agr=0.000002 ortho=0.000066
Disagree: x_cos=0.0765±0.0021 ratio=0.001389 preserve=0.000026 norms=0.1669
E 6: 1s loss=0.3506 v_loss=0.3527
Geometry: b_cv=0.2097 e_cv=0.1600 spread=0.02009 a_max=0.670
Experts: cos=0.829±0.005 agr=0.000003 ortho=0.000048
Disagree: x_cos=0.0846±0.0024 ratio=0.001772 preserve=0.000021 norms=0.1363
E 7: 1s loss=0.3459 v_loss=0.3502
Geometry: b_cv=0.2055 e_cv=0.1628 spread=0.01906 a_max=0.670
Experts: cos=0.759±0.007 agr=0.000004 ortho=0.000040
Disagree: x_cos=0.0774±0.0022 ratio=0.003070 preserve=0.000049 norms=0.1964
E 8: 1s loss=0.3442 v_loss=0.3479
Geometry: b_cv=0.2078 e_cv=0.1643 spread=0.01817 a_max=0.669
Experts: cos=0.745±0.007 agr=0.000003 ortho=0.000033
Disagree: x_cos=0.0782±0.0023 ratio=0.001258 preserve=0.000021 norms=0.1772
E 9: 1s loss=0.3419 v_loss=0.3451
Geometry: b_cv=0.2015 e_cv=0.1646 spread=0.01756 a_max=0.670
Experts: cos=0.767±0.006 agr=0.000007 ortho=0.000030
Disagree: x_cos=0.0823±0.0024 ratio=0.001625 preserve=0.000049 norms=0.2007
E10: 1s loss=0.3433 v_loss=0.3433
Geometry: b_cv=0.2074 e_cv=0.1594 spread=0.01746 a_max=0.669
Experts: cos=0.762±0.005 agr=0.000006 ortho=0.000026
Disagree: x_cos=0.0766±0.0018 ratio=0.001418 preserve=0.000073 norms=0.0529
E11: 1s loss=0.3392 v_loss=0.3501
Geometry: b_cv=0.2021 e_cv=0.1609 spread=0.01705 a_max=0.669
Experts: cos=0.721±0.007 agr=0.000004 ortho=0.000026
Disagree: x_cos=0.0698±0.0022 ratio=0.006405 preserve=0.000037 norms=0.1509
E12: 1s loss=0.3383 v_loss=0.3534
Geometry: b_cv=0.1983 e_cv=0.1639 spread=0.01693 a_max=0.668
Experts: cos=0.753±0.005 agr=0.000014 ortho=0.000026
Disagree: x_cos=0.0743±0.0021 ratio=0.000903 preserve=0.000076 norms=0.0763
E13: 1s loss=0.3374 v_loss=0.3398
Geometry: b_cv=0.1996 e_cv=0.1603 spread=0.01660 a_max=0.669
Experts: cos=0.714±0.006 agr=0.000004 ortho=0.000022
Disagree: x_cos=0.0791±0.0021 ratio=0.006335 preserve=0.000060 norms=0.1257
E14: 1s loss=0.3376 v_loss=0.3415
Geometry: b_cv=0.1992 e_cv=0.1657 spread=0.01647 a_max=0.669
Experts: cos=0.704±0.006 agr=0.000006 ortho=0.000022
Disagree: x_cos=0.0824±0.0021 ratio=0.006577 preserve=0.000061 norms=0.0873
E15: 1s loss=0.3372 v_loss=0.3409
Geometry: b_cv=0.2003 e_cv=0.1615 spread=0.01635 a_max=0.669
Experts: cos=0.745±0.005 agr=0.000003 ortho=0.000019
Disagree: x_cos=0.0760±0.0020 ratio=0.002660 preserve=0.000045 norms=0.0958
E16: 1s loss=0.3355 v_loss=0.3328
Geometry: b_cv=0.1990 e_cv=0.1601 spread=0.01600 a_max=0.669
Experts: cos=0.689±0.005 agr=0.000004 ortho=0.000018
Disagree: x_cos=0.0814±0.0024 ratio=0.002029 preserve=0.000042 norms=0.1414
E17: 1s loss=0.3350 v_loss=0.3432
Geometry: b_cv=0.1945 e_cv=0.1604 spread=0.01603 a_max=0.668
Experts: cos=0.751±0.003 agr=0.000028 ortho=0.000020
Disagree: x_cos=0.0825±0.0023 ratio=0.001129 preserve=0.000155 norms=0.0187
E18: 1s loss=0.3372 v_loss=0.3336
Geometry: b_cv=0.2044 e_cv=0.1605 spread=0.01590 a_max=0.668
Experts: cos=0.720±0.003 agr=0.000004 ortho=0.000022
Disagree: x_cos=0.0799±0.0020 ratio=0.002103 preserve=0.000055 norms=0.0331
E19: 1s loss=0.3326 v_loss=0.3456
Geometry: b_cv=0.1948 e_cv=0.1654 spread=0.01562 a_max=0.668
Experts: cos=0.741±0.003 agr=0.000004 ortho=0.000021
Disagree: x_cos=0.0797±0.0019 ratio=0.003153 preserve=0.000054 norms=0.0169
E20: 1s loss=0.3351 v_loss=0.3460
Geometry: b_cv=0.1992 e_cv=0.1596 spread=0.01567 a_max=0.668
Experts: cos=0.725±0.005 agr=0.000002 ortho=0.000018
Disagree: x_cos=0.0776±0.0023 ratio=0.008188 preserve=0.000053 norms=0.0326
=================================================================
PHASE 3: GEOMETRIC VERIFICATION
=================================================================
Passthrough: 1.000000 (target: 1.000)
Emb CV: 0.1635 (consensus: 0.1771)
Geo context CV: 0.1892
Geo eff_dim: 30.7 / 128
Expert cos: 0.725 ± 0.005
Anchor max cos: 0.668
Disagreement:
Cross-expert: 0.0776 ± 0.0023
Ratio: 0.008188 (target: 0.000000)
Norm spread: 0.0326
=================================================================
PHASE 4: CLASSIFIER STABILITY TEST
=================================================================
Mode               Dim   Train    Val     Gap
--------------------------------------------------
raw_768           1536   0.498  0.357   0.141
raw+diff          3072   0.567  0.475   0.092
bank_enriched     1792   0.766  0.532   0.235
bank+diff         3584   0.722  0.670   0.052
geo_explicit         6   0.326  0.363  -0.037
=================================================================
SUMMARY
=================================================================
Consensus CV: 0.1771
Consensus eff_dim:109.5
Student v_cos: 0.686
Student v_cv: 0.182
Bank params: 2,921,088
Bank geo_eff_dim: 30.7
Bank geo_cv: 0.1892
=================================================================
DONE
=================================================================