| ================================================================= |
| RAPID PROTOTYPE v2: Differentiation-Centered Bank |
| ================================================================= |
| Device: cuda |
|
|
| ================================================================= |
| PHASE 0: EXTRACTION |
| ================================================================= |
| Captions: 20,000 |
|
|
| Extracting: bert... |
| Loading weights: 100% 199/199 [00:00<00:00, 4038.86it/s, Materializing param=pooler.dense.weight] |
| BertModel LOAD REPORT from: google-bert/bert-base-uncased |
| Key                                        | Status      |
| -------------------------------------------+------------ |
| cls.predictions.bias                       | UNEXPECTED  |
| cls.predictions.transform.dense.bias       | UNEXPECTED  |
| cls.predictions.transform.LayerNorm.weight | UNEXPECTED  |
| cls.predictions.transform.LayerNorm.bias   | UNEXPECTED  |
| cls.seq_relationship.weight                | UNEXPECTED  |
| cls.predictions.transform.dense.weight     | UNEXPECTED  |
| cls.seq_relationship.bias                  | UNEXPECTED  |
|
|
| Notes: |
| - UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch. |
| bert: 100%|██████████| 157/157 [00:23<00:00, 6.55it/s] |
| Shape: torch.Size([20000, 768]) |
|
|
| Extracting: modern... |
| Loading weights: 100% 134/134 [00:00<00:00, 4016.84it/s, Materializing param=layers.21.mlp_norm.weight] |
| ModernBertModel LOAD REPORT from: answerdotai/ModernBERT-base |
| Key               | Status      |
| ------------------+------------ |
| head.dense.weight | UNEXPECTED  |
| head.norm.weight  | UNEXPECTED  |
| decoder.bias      | UNEXPECTED  |
|
|
| Notes: |
| - UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch. |
| modern: 100%|██████████| 157/157 [00:35<00:00, 4.39it/s] |
| Shape: torch.Size([20000, 768]) |
|
|
| ================================================================= |
| PHASE 0b: GENERALIZED PROCRUSTES ALIGNMENT (no reference bias) |
| ================================================================= |
| GPA iter 1: delta=1.19668072 |
| GPA iter 3: delta=0.00029225 |
| GPA iter 6: delta=0.00006347 |
| GPA iter 9: delta=0.00002718 |
| bert : cos_after=0.8541 cos_to_mean=0.9865 |
| modern : cos_after=0.8577 cos_to_mean=0.9867 |
| cos(consensus, bert): 0.9867 |
| cos(consensus, modern): 0.9868 |
| Equidistance range: 0.0001 (should be near 0) |
|
|
| Measuring consensus statistics... |
| CV: 0.1771 |
| Mean cos: 0.0018 |
| Eff dim: 109.5 |
| Spectral: [0.0343, 0.0322, 0.0275, 0.0240, 0.0222...] |
|
|
| ================================================================= |
| PHASE 1: TRAIN STUDENT |
| ================================================================= |
| Student: 11,269,632 params |
| CV target: 0.1771 |
| E1: 2s loss=2.9588 t_acc=0.362 t_cos=0.334 v_acc=0.494 v_cos=0.503 v_cv=0.223 |
| E2: 2s loss=1.4268 t_acc=0.761 t_cos=0.543 v_acc=0.704 v_cos=0.588 v_cv=0.212 |
| E3: 2s loss=0.9784 t_acc=0.887 t_cos=0.604 v_acc=0.822 v_cos=0.639 v_cv=0.182 |
| E4: 2s loss=0.7289 t_acc=0.943 t_cos=0.641 v_acc=0.912 v_cos=0.676 v_cv=0.182 |
| E5: 2s loss=0.5807 t_acc=0.968 t_cos=0.666 v_acc=0.920 v_cos=0.686 v_cv=0.182 |
|
|
| Student saved. v_cos=0.686, v_cv=0.182 |
|
|
| ================================================================= |
| PHASE 2: TRAIN ALIGNMENT BANK (student frozen) |
| ================================================================= |
| Pre-encoding through frozen student... |
| Student embeddings: torch.Size([18000, 768]) |
| Expert 0 (bert): rotation + whitener + mean loaded, cos_after=0.8541 |
| Expert 1 (modern): rotation + whitener + mean loaded, cos_after=0.8577 |
| Anchors: 512 initialized from consensus embeddings |
| Targets: CV=0.1771, mean_cos=0.0018 |
| Bank: 2,921,088 params |
| Bank targets: CV=0.1771, mean_cos=0.0018 |
| Calibrated disagreement: |
| cross_cos: 0.0794 ± 0.0035 |
| disagree_ratio: 0.000000 |
|
|
| E 1: 1s loss=0.4775 v_loss=0.4350 |
| Geometry: b_cv=0.2673 e_cv=0.1680 spread=0.03938 a_max=0.653 |
| Experts: cos=0.791±0.006 agr=0.000142 ortho=0.000440 |
| Disagree: x_cos=0.0833±0.0019 ratio=0.004748 preserve=0.014587 norms=0.1814 |
|
|
| E 2: 1s loss=0.3981 v_loss=0.3833 |
| Geometry: b_cv=0.2230 e_cv=0.1651 spread=0.02783 a_max=0.669 |
| Experts: cos=0.809±0.006 agr=0.000008 ortho=0.000344 |
| Disagree: x_cos=0.0817±0.0018 ratio=0.003509 preserve=0.000024 norms=0.1719 |
|
|
| E 3: 1s loss=0.3730 v_loss=0.3757 |
| Geometry: b_cv=0.2162 e_cv=0.1648 spread=0.02493 a_max=0.670 |
| Experts: cos=0.830±0.005 agr=0.000004 ortho=0.000186 |
| Disagree: x_cos=0.0799±0.0019 ratio=0.002291 preserve=0.000013 norms=0.1513 |
|
|
| E 4: 1s loss=0.3623 v_loss=0.3708 |
| Geometry: b_cv=0.2187 e_cv=0.1615 spread=0.02314 a_max=0.670 |
| Experts: cos=0.832±0.005 agr=0.000003 ortho=0.000115 |
| Disagree: x_cos=0.0793±0.0020 ratio=0.003285 preserve=0.000011 norms=0.1422 |
|
|
| E 5: 1s loss=0.3554 v_loss=0.3539 |
| Geometry: b_cv=0.2141 e_cv=0.1621 spread=0.02139 a_max=0.669 |
| Experts: cos=0.853±0.004 agr=0.000002 ortho=0.000079 |
| Disagree: x_cos=0.0781±0.0021 ratio=0.001270 preserve=0.000011 norms=0.0980 |
|
|
| E 6: 1s loss=0.3507 v_loss=0.3571 |
| Geometry: b_cv=0.2124 e_cv=0.1633 spread=0.02019 a_max=0.669 |
| Experts: cos=0.829±0.005 agr=0.000001 ortho=0.000058 |
| Disagree: x_cos=0.0788±0.0022 ratio=0.001736 preserve=0.000010 norms=0.1789 |
|
|
| E 7: 1s loss=0.3460 v_loss=0.3465 |
| Geometry: b_cv=0.2059 e_cv=0.1607 spread=0.01903 a_max=0.669 |
| Experts: cos=0.845±0.005 agr=0.000001 ortho=0.000045 |
| Disagree: x_cos=0.0819±0.0023 ratio=0.001425 preserve=0.000008 norms=0.1536 |
|
|
| E 8: 1s loss=0.3449 v_loss=0.3421 |
| Geometry: b_cv=0.2060 e_cv=0.1592 spread=0.01841 a_max=0.670 |
| Experts: cos=0.833±0.005 agr=0.000003 ortho=0.000035 |
| Disagree: x_cos=0.0885±0.0021 ratio=0.001539 preserve=0.000017 norms=0.1313 |
|
|
| E 9: 1s loss=0.3422 v_loss=0.3451 |
| Geometry: b_cv=0.2040 e_cv=0.1626 spread=0.01793 a_max=0.669 |
| Experts: cos=0.822±0.005 agr=0.000003 ortho=0.000031 |
| Disagree: x_cos=0.0761±0.0024 ratio=0.001610 preserve=0.000037 norms=0.2032 |
|
|
| E10: 1s loss=0.3416 v_loss=0.3497 |
| Geometry: b_cv=0.2077 e_cv=0.1647 spread=0.01735 a_max=0.669 |
| Experts: cos=0.782±0.007 agr=0.000003 ortho=0.000029 |
| Disagree: x_cos=0.0825±0.0023 ratio=0.004691 preserve=0.000025 norms=0.2039 |
|
|
| E11: 1s loss=0.3387 v_loss=0.3507 |
| Geometry: b_cv=0.2019 e_cv=0.1640 spread=0.01701 a_max=0.668 |
| Experts: cos=0.811±0.005 agr=0.000002 ortho=0.000024 |
| Disagree: x_cos=0.0780±0.0023 ratio=0.000957 preserve=0.000015 norms=0.1889 |
|
|
| E12: 1s loss=0.3391 v_loss=0.3381 |
| Geometry: b_cv=0.2006 e_cv=0.1588 spread=0.01675 a_max=0.668 |
| Experts: cos=0.778±0.006 agr=0.000003 ortho=0.000021 |
| Disagree: x_cos=0.0729±0.0021 ratio=0.001148 preserve=0.000024 norms=0.1404 |
|
|
| E13: 1s loss=0.3373 v_loss=0.3434 |
| Geometry: b_cv=0.1987 e_cv=0.1635 spread=0.01671 a_max=0.668 |
| Experts: cos=0.703±0.007 agr=0.000013 ortho=0.000021 |
| Disagree: x_cos=0.0680±0.0026 ratio=0.003978 preserve=0.000085 norms=0.2265 |
|
|
| E14: 1s loss=0.3383 v_loss=0.3351 |
| Geometry: b_cv=0.2027 e_cv=0.1658 spread=0.01634 a_max=0.668 |
| Experts: cos=0.779±0.005 agr=0.000007 ortho=0.000024 |
| Disagree: x_cos=0.0849±0.0022 ratio=0.002337 preserve=0.000085 norms=0.1472 |
|
|
| E15: 1s loss=0.3366 v_loss=0.3357 |
| Geometry: b_cv=0.1999 e_cv=0.1612 spread=0.01584 a_max=0.668 |
| Experts: cos=0.671±0.008 agr=0.000008 ortho=0.000023 |
| Disagree: x_cos=0.0777±0.0024 ratio=0.011179 preserve=0.000061 norms=0.1758 |
|
|
| E16: 1s loss=0.3363 v_loss=0.3467 |
| Geometry: b_cv=0.1983 e_cv=0.1612 spread=0.01575 a_max=0.668 |
| Experts: cos=0.737±0.005 agr=0.000010 ortho=0.000022 |
| Disagree: x_cos=0.0839±0.0022 ratio=0.006047 preserve=0.000049 norms=0.1216 |
|
|
| E17: 1s loss=0.3343 v_loss=0.3376 |
| Geometry: b_cv=0.1974 e_cv=0.1655 spread=0.01591 a_max=0.668 |
| Experts: cos=0.718±0.005 agr=0.000002 ortho=0.000020 |
| Disagree: x_cos=0.0723±0.0023 ratio=0.002539 preserve=0.000042 norms=0.0947 |
|
|
| E18: 1s loss=0.3354 v_loss=0.3457 |
| Geometry: b_cv=0.1955 e_cv=0.1580 spread=0.01588 a_max=0.668 |
| Experts: cos=0.763±0.005 agr=0.000007 ortho=0.000019 |
| Disagree: x_cos=0.0796±0.0022 ratio=0.004057 preserve=0.000069 norms=0.1001 |
|
|
| E19: 1s loss=0.3344 v_loss=0.3313 |
| Geometry: b_cv=0.1962 e_cv=0.1602 spread=0.01560 a_max=0.668 |
| Experts: cos=0.687±0.005 agr=0.000005 ortho=0.000018 |
| Disagree: x_cos=0.0862±0.0024 ratio=0.005997 preserve=0.000030 norms=0.1218 |
|
|
| E20: 1s loss=0.3331 v_loss=0.3651 |
| Geometry: b_cv=0.1950 e_cv=0.1631 spread=0.01556 a_max=0.668 |
| Experts: cos=0.729±0.005 agr=0.000007 ortho=0.000018 |
| Disagree: x_cos=0.0826±0.0021 ratio=0.006963 preserve=0.000065 norms=0.0781 |
|
|
| ================================================================= |
| PHASE 3: GEOMETRIC VERIFICATION |
| ================================================================= |
| Passthrough: 1.000000 (target: 1.000) |
| Emb CV: 0.1660 (consensus: 0.1771) |
| Geo context CV: 0.2053 |
| Geo eff_dim: 30.5 / 128 |
| Expert cos: 0.729 ± 0.005 |
| Anchor max cos: 0.668 |
| Disagreement: |
| Cross-expert: 0.0826 ± 0.0021 |
| Ratio: 0.006963 (target: 0.000000) |
| Norm spread: 0.0781 |
|
|
| ================================================================= |
| PHASE 4: CLASSIFIER STABILITY TEST |
| ================================================================= |
| with_bank : train=0.746 val=0.500 gap=0.246 |
| without_bank : train=0.490 val=0.363 gap=0.126 |
|
|
| ================================================================= |
| SUMMARY |
| ================================================================= |
| Consensus CV: 0.1771 |
| Consensus eff_dim: 109.5 |
| Student v_cos: 0.686 |
| Student v_cv: 0.182 |
| Bank params: 2,921,088 |
| Bank geo_eff_dim: 30.5 |
| Bank geo_cv: 0.2053 |
|
|
| ================================================================= |
| DONE |
| ================================================================= |