=================================================================
RAPID PROTOTYPE v2: Differentiation-Centered Bank
=================================================================
Device: cuda
=================================================================
PHASE 0: EXTRACTION
=================================================================
Captions: 20,000
Extracting: bert...
Loading weights: 100% 199/199 [00:00<00:00, 4038.86it/s, Materializing param=pooler.dense.weight]
BertModel LOAD REPORT from: google-bert/bert-base-uncased
Key | Status | |
-------------------------------------------+------------+--+-
cls.predictions.bias | UNEXPECTED | |
cls.predictions.transform.dense.bias | UNEXPECTED | |
cls.predictions.transform.LayerNorm.weight | UNEXPECTED | |
cls.predictions.transform.LayerNorm.bias | UNEXPECTED | |
cls.seq_relationship.weight | UNEXPECTED | |
cls.predictions.transform.dense.weight | UNEXPECTED | |
cls.seq_relationship.bias | UNEXPECTED | |
Notes:
- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
bert: 100%|██████████| 157/157 [00:23<00:00, 6.55it/s]
Shape: torch.Size([20000, 768])
Extracting: modern...
Loading weights: 100% 134/134 [00:00<00:00, 4016.84it/s, Materializing param=layers.21.mlp_norm.weight]
ModernBertModel LOAD REPORT from: answerdotai/ModernBERT-base
Key | Status | |
------------------+------------+--+-
head.dense.weight | UNEXPECTED | |
head.norm.weight | UNEXPECTED | |
decoder.bias | UNEXPECTED | |
Notes:
- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
modern: 100%|██████████| 157/157 [00:35<00:00, 4.39it/s]
Shape: torch.Size([20000, 768])
=================================================================
PHASE 0b: GENERALIZED PROCRUSTES ALIGNMENT (no reference bias)
=================================================================
GPA iter 1: delta=1.19668072
GPA iter 3: delta=0.00029225
GPA iter 6: delta=0.00006347
GPA iter 9: delta=0.00002718
bert : cos_after=0.8541 cos_to_mean=0.9865
modern : cos_after=0.8577 cos_to_mean=0.9867
cos(consensus, bert): 0.9867
cos(consensus, modern): 0.9868
Equidistance range: 0.0001 (should be near 0)
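A minimal sketch of the alignment step logged above. It assumes the two expert matrices are row-aligned (row i of each encodes the same caption) and reads GPA as: center each matrix, rotate every matrix toward the current mean by orthogonal Procrustes, recompute the mean, and repeat until the mean stops moving. The names `gpa`, `orthogonal_procrustes` and `mean_row_cosine`, the tolerance, and the relative-delta stopping rule are hypothetical; the per-expert whitener the script also stores is omitted. Because every matrix is rotated toward the shared mean rather than toward one chosen expert, no expert acts as a fixed reference, which is presumably what "no reference bias" refers to.

```python
import numpy as np


def orthogonal_procrustes(source, target):
    """Orthogonal R minimizing ||source @ R - target||_F, via SVD of source^T target."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def mean_row_cosine(a, b):
    """Mean cosine similarity between matching rows of a and b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float(np.mean(np.sum(a_unit * b_unit, axis=1)))


def gpa(embeddings, max_iter=100, tol=1e-5):
    """Hypothetical generalized Procrustes alignment of row-aligned matrices.

    Every matrix is rotated toward the current consensus (the mean of the
    aligned matrices); no single matrix is used as a fixed reference.
    Returns the aligned matrices, their rotations, and the consensus.
    """
    centered = [x - x.mean(axis=0, keepdims=True) for x in embeddings]
    consensus = np.mean(centered, axis=0)
    for _ in range(max_iter):
        rotations = [orthogonal_procrustes(x, consensus) for x in centered]
        aligned = [x @ r for x, r in zip(centered, rotations)]
        new_consensus = np.mean(aligned, axis=0)
        # Relative movement of the consensus (an illustrative stopping rule).
        delta = np.linalg.norm(new_consensus - consensus) / np.linalg.norm(consensus)
        consensus = new_consensus
        if delta < tol:
            break
    return aligned, rotations, consensus


# Usage: a random matrix and a noisy, rotated copy of it.
rng = np.random.default_rng(0)
x = rng.standard_normal((500, 16))
q, _ = np.linalg.qr(rng.standard_normal((16, 16)))
if np.linalg.det(q) < 0:
    q[:, 0] *= -1  # make q a proper rotation (det = +1)
y = x @ q + 0.01 * rng.standard_normal((500, 16))

aligned, rotations, consensus = gpa([x, y])
print(mean_row_cosine(aligned[0], aligned[1]))  # close to 1.0
print(mean_row_cosine(consensus, aligned[0]), mean_row_cosine(consensus, aligned[1]))
```

With two copies related by a random rotation plus small noise, the aligned matrices should agree almost perfectly and sit at nearly equal cosine from the consensus, which is the kind of balance the "Equidistance range" line checks.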
Measuring consensus statistics...
CV: 0.1771
Mean cos: 0.0018
Eff dim: 109.5
Spectral: [0.0343, 0.0322, 0.0275, 0.0240, 0.0222...]
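The log does not define these consensus statistics. The sketch below shows one plausible reading, stated as an assumption: "Mean cos" as the mean cosine over all pairs of distinct embeddings, "Spectral" as the largest covariance eigenvalues normalized to sum to 1, and "Eff dim" as the participation ratio of that spectrum. CV is left out because the log gives no indication of which quantity's coefficient of variation is measured. `consensus_stats` is a hypothetical name; if "Geo eff_dim: 30.5 / 128" in Phase 3 is computed the same way, the same function would apply to the 128-dimensional geo context.

```python
import numpy as np


def consensus_stats(emb, top_k=5):
    """Hypothetical reconstruction of three logged statistics.

    Returns (mean_cos, eff_dim, top_k normalized eigenvalues). This is an
    assumed reading of "Mean cos", "Eff dim" and "Spectral"; CV is omitted.
    """
    n = emb.shape[0]
    # Mean cosine over all ordered pairs of distinct rows. The sum of all
    # pairwise cosines equals ||sum of unit rows||^2, which includes the n
    # self-pairs (cosine 1), so subtract n instead of forming an n x n matrix.
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    summed = unit.sum(axis=0)
    mean_cos = (summed @ summed - n) / (n * (n - 1))

    # Covariance eigenvalues, largest first, normalized to sum to 1.
    centered = emb - emb.mean(axis=0, keepdims=True)
    eig = np.linalg.eigvalsh(centered.T @ centered / (n - 1))[::-1]
    spectrum = np.clip(eig, 0.0, None)
    spectrum = spectrum / spectrum.sum()

    # Participation ratio: (sum of eigenvalues)^2 / sum of squared eigenvalues,
    # which equals 1 / sum(p^2) once the spectrum is normalized.
    eff_dim = 1.0 / np.sum(spectrum ** 2)
    return float(mean_cos), float(eff_dim), spectrum[:top_k]


# Usage: isotropic Gaussian data in 10 dimensions should give eff_dim near 10
# and a mean cosine near 0.
rng = np.random.default_rng(0)
mean_cos, eff_dim, top = consensus_stats(rng.standard_normal((5000, 10)))
print(f"Mean cos: {mean_cos:.4f}  Eff dim: {eff_dim:.1f}  Spectral: {np.round(top, 4)}")
```

The pairwise mean uses the summed-unit-vector identity so that 20,000 captions do not require a 20,000 x 20,000 similarity matrix.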
=================================================================
PHASE 1: TRAIN STUDENT
=================================================================
Student: 11,269,632 params
CV target: 0.1771
E1: 2s loss=2.9588 t_acc=0.362 t_cos=0.334 v_acc=0.494 v_cos=0.503 v_cv=0.223
E2: 2s loss=1.4268 t_acc=0.761 t_cos=0.543 v_acc=0.704 v_cos=0.588 v_cv=0.212
E3: 2s loss=0.9784 t_acc=0.887 t_cos=0.604 v_acc=0.822 v_cos=0.639 v_cv=0.182
E4: 2s loss=0.7289 t_acc=0.943 t_cos=0.641 v_acc=0.912 v_cos=0.676 v_cv=0.182
E5: 2s loss=0.5807 t_acc=0.968 t_cos=0.666 v_acc=0.920 v_cos=0.686 v_cv=0.182
Student saved. v_cos=0.686, v_cv=0.182
=================================================================
PHASE 2: TRAIN ALIGNMENT BANK (student frozen)
=================================================================
Pre-encoding through frozen student...
Student embeddings: torch.Size([18000, 768])
Expert 0 (bert): rotation + whitener + mean loaded, cos_after=0.8541
Expert 1 (modern): rotation + whitener + mean loaded, cos_after=0.8577
Anchors: 512 initialized from consensus embeddings
Targets: CV=0.1771, mean_cos=0.0018
Bank: 2,921,088 params
Bank targets: CV=0.1771, mean_cos=0.0018
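The log reports 512 anchors "initialized from consensus embeddings" and tracks a_max during training and "Anchor max cos" in Phase 3. A minimal sketch of one plausible reading follows, assuming anchors are drawn as distinct consensus rows and L2-normalized, and that a_max is the largest cosine between two different anchors. `init_anchors` and `anchor_max_cos` are hypothetical; the actual bank may initialize or measure its anchors differently.

```python
import numpy as np


def init_anchors(consensus, n_anchors=512, seed=0):
    """Hypothetical init: sample distinct consensus rows and L2-normalize them."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(consensus.shape[0], size=n_anchors, replace=False)
    anchors = consensus[idx]
    return anchors / np.linalg.norm(anchors, axis=1, keepdims=True)


def anchor_max_cos(anchors):
    """Largest cosine between two distinct anchors (self-similarity masked out)."""
    unit = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    sims = unit @ unit.T
    np.fill_diagonal(sims, -np.inf)  # ignore each anchor's cosine with itself
    return float(sims.max())


# Usage with stand-in consensus embeddings (random here, for illustration only).
rng = np.random.default_rng(0)
anchors = init_anchors(rng.standard_normal((2000, 32)), n_anchors=512)
print(anchors.shape, anchor_max_cos(anchors))
```

Read this way, a value near 0.668 would mean no two anchors are closer than about 48 degrees apart.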
Calibrated disagreement:
cross_cos: 0.0794 ± 0.0035
disagree_ratio: 0.000000
E 1: 1s loss=0.4775 v_loss=0.4350
Geometry: b_cv=0.2673 e_cv=0.1680 spread=0.03938 a_max=0.653
Experts: cos=0.791±0.006 agr=0.000142 ortho=0.000440
Disagree: x_cos=0.0833±0.0019 ratio=0.004748 preserve=0.014587 norms=0.1814
E 2: 1s loss=0.3981 v_loss=0.3833
Geometry: b_cv=0.2230 e_cv=0.1651 spread=0.02783 a_max=0.669
Experts: cos=0.809±0.006 agr=0.000008 ortho=0.000344
Disagree: x_cos=0.0817±0.0018 ratio=0.003509 preserve=0.000024 norms=0.1719
E 3: 1s loss=0.3730 v_loss=0.3757
Geometry: b_cv=0.2162 e_cv=0.1648 spread=0.02493 a_max=0.670
Experts: cos=0.830±0.005 agr=0.000004 ortho=0.000186
Disagree: x_cos=0.0799±0.0019 ratio=0.002291 preserve=0.000013 norms=0.1513
E 4: 1s loss=0.3623 v_loss=0.3708
Geometry: b_cv=0.2187 e_cv=0.1615 spread=0.02314 a_max=0.670
Experts: cos=0.832±0.005 agr=0.000003 ortho=0.000115
Disagree: x_cos=0.0793±0.0020 ratio=0.003285 preserve=0.000011 norms=0.1422
E 5: 1s loss=0.3554 v_loss=0.3539
Geometry: b_cv=0.2141 e_cv=0.1621 spread=0.02139 a_max=0.669
Experts: cos=0.853±0.004 agr=0.000002 ortho=0.000079
Disagree: x_cos=0.0781±0.0021 ratio=0.001270 preserve=0.000011 norms=0.0980
E 6: 1s loss=0.3507 v_loss=0.3571
Geometry: b_cv=0.2124 e_cv=0.1633 spread=0.02019 a_max=0.669
Experts: cos=0.829±0.005 agr=0.000001 ortho=0.000058
Disagree: x_cos=0.0788±0.0022 ratio=0.001736 preserve=0.000010 norms=0.1789
E 7: 1s loss=0.3460 v_loss=0.3465
Geometry: b_cv=0.2059 e_cv=0.1607 spread=0.01903 a_max=0.669
Experts: cos=0.845±0.005 agr=0.000001 ortho=0.000045
Disagree: x_cos=0.0819±0.0023 ratio=0.001425 preserve=0.000008 norms=0.1536
E 8: 1s loss=0.3449 v_loss=0.3421
Geometry: b_cv=0.2060 e_cv=0.1592 spread=0.01841 a_max=0.670
Experts: cos=0.833±0.005 agr=0.000003 ortho=0.000035
Disagree: x_cos=0.0885±0.0021 ratio=0.001539 preserve=0.000017 norms=0.1313
E 9: 1s loss=0.3422 v_loss=0.3451
Geometry: b_cv=0.2040 e_cv=0.1626 spread=0.01793 a_max=0.669
Experts: cos=0.822±0.005 agr=0.000003 ortho=0.000031
Disagree: x_cos=0.0761±0.0024 ratio=0.001610 preserve=0.000037 norms=0.2032
E10: 1s loss=0.3416 v_loss=0.3497
Geometry: b_cv=0.2077 e_cv=0.1647 spread=0.01735 a_max=0.669
Experts: cos=0.782±0.007 agr=0.000003 ortho=0.000029
Disagree: x_cos=0.0825±0.0023 ratio=0.004691 preserve=0.000025 norms=0.2039
E11: 1s loss=0.3387 v_loss=0.3507
Geometry: b_cv=0.2019 e_cv=0.1640 spread=0.01701 a_max=0.668
Experts: cos=0.811±0.005 agr=0.000002 ortho=0.000024
Disagree: x_cos=0.0780±0.0023 ratio=0.000957 preserve=0.000015 norms=0.1889
E12: 1s loss=0.3391 v_loss=0.3381
Geometry: b_cv=0.2006 e_cv=0.1588 spread=0.01675 a_max=0.668
Experts: cos=0.778±0.006 agr=0.000003 ortho=0.000021
Disagree: x_cos=0.0729±0.0021 ratio=0.001148 preserve=0.000024 norms=0.1404
E13: 1s loss=0.3373 v_loss=0.3434
Geometry: b_cv=0.1987 e_cv=0.1635 spread=0.01671 a_max=0.668
Experts: cos=0.703±0.007 agr=0.000013 ortho=0.000021
Disagree: x_cos=0.0680±0.0026 ratio=0.003978 preserve=0.000085 norms=0.2265
E14: 1s loss=0.3383 v_loss=0.3351
Geometry: b_cv=0.2027 e_cv=0.1658 spread=0.01634 a_max=0.668
Experts: cos=0.779±0.005 agr=0.000007 ortho=0.000024
Disagree: x_cos=0.0849±0.0022 ratio=0.002337 preserve=0.000085 norms=0.1472
E15: 1s loss=0.3366 v_loss=0.3357
Geometry: b_cv=0.1999 e_cv=0.1612 spread=0.01584 a_max=0.668
Experts: cos=0.671±0.008 agr=0.000008 ortho=0.000023
Disagree: x_cos=0.0777±0.0024 ratio=0.011179 preserve=0.000061 norms=0.1758
E16: 1s loss=0.3363 v_loss=0.3467
Geometry: b_cv=0.1983 e_cv=0.1612 spread=0.01575 a_max=0.668
Experts: cos=0.737±0.005 agr=0.000010 ortho=0.000022
Disagree: x_cos=0.0839±0.0022 ratio=0.006047 preserve=0.000049 norms=0.1216
E17: 1s loss=0.3343 v_loss=0.3376
Geometry: b_cv=0.1974 e_cv=0.1655 spread=0.01591 a_max=0.668
Experts: cos=0.718±0.005 agr=0.000002 ortho=0.000020
Disagree: x_cos=0.0723±0.0023 ratio=0.002539 preserve=0.000042 norms=0.0947
E18: 1s loss=0.3354 v_loss=0.3457
Geometry: b_cv=0.1955 e_cv=0.1580 spread=0.01588 a_max=0.668
Experts: cos=0.763±0.005 agr=0.000007 ortho=0.000019
Disagree: x_cos=0.0796±0.0022 ratio=0.004057 preserve=0.000069 norms=0.1001
E19: 1s loss=0.3344 v_loss=0.3313
Geometry: b_cv=0.1962 e_cv=0.1602 spread=0.01560 a_max=0.668
Experts: cos=0.687±0.005 agr=0.000005 ortho=0.000018
Disagree: x_cos=0.0862±0.0024 ratio=0.005997 preserve=0.000030 norms=0.1218
E20: 1s loss=0.3331 v_loss=0.3651
Geometry: b_cv=0.1950 e_cv=0.1631 spread=0.01556 a_max=0.668
Experts: cos=0.729±0.005 agr=0.000007 ortho=0.000018
Disagree: x_cos=0.0826±0.0021 ratio=0.006963 preserve=0.000065 norms=0.0781
=================================================================
PHASE 3: GEOMETRIC VERIFICATION
=================================================================
Passthrough: 1.000000 (target: 1.000)
Emb CV: 0.1660 (consensus: 0.1771)
Geo context CV: 0.2053
Geo eff_dim: 30.5 / 128
Expert cos: 0.729 ± 0.005
Anchor max cos: 0.668
Disagreement:
Cross-expert: 0.0826 ± 0.0021
Ratio: 0.006963 (target: 0.000000)
Norm spread: 0.0781
=================================================================
PHASE 4: CLASSIFIER STABILITY TEST
=================================================================
with_bank : train=0.746 val=0.500 gap=0.246
without_bank : train=0.490 val=0.363 gap=0.126
=================================================================
SUMMARY
=================================================================
Consensus CV: 0.1771
Consensus eff_dim: 109.5
Student v_cos: 0.686
Student v_cv: 0.182
Bank params: 2,921,088
Bank geo_eff_dim: 30.5
Bank geo_cv: 0.2053
=================================================================
DONE
=================================================================