Create rapid_prototype_2_output_a50_n37.txt
rapid_prototype_2_output_a50_n37.txt
ADDED
@@ -0,0 +1,134 @@
+=================================================================
+RAPID PROTOTYPE v2: Differentiation-Centered Bank
+=================================================================
+  Device: cuda
+
+=================================================================
+PHASE 0: EXTRACTION
+=================================================================
+  Captions: 20,000
+
+  Extracting: bert...
+Loading weights: 100%
+ 199/199 [00:00<00:00, 4355.73it/s, Materializing param=pooler.dense.weight]
+BertModel LOAD REPORT from: google-bert/bert-base-uncased
+Key                                        | Status     |  |
+-------------------------------------------+------------+--+-
+cls.predictions.transform.LayerNorm.weight | UNEXPECTED |  |
+cls.predictions.transform.LayerNorm.bias   | UNEXPECTED |  |
+cls.predictions.transform.dense.weight     | UNEXPECTED |  |
+cls.predictions.bias                       | UNEXPECTED |  |
+cls.seq_relationship.weight                | UNEXPECTED |  |
+cls.seq_relationship.bias                  | UNEXPECTED |  |
+cls.predictions.transform.dense.bias       | UNEXPECTED |  |
+
+Notes:
+- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
+  bert: 100%|██████████| 157/157 [00:24<00:00, 6.52it/s]
+    Shape: torch.Size([20000, 768])
+
+  Extracting: modern...
+Loading weights: 100%
+ 134/134 [00:00<00:00, 4050.54it/s, Materializing param=layers.21.mlp_norm.weight]
+ModernBertModel LOAD REPORT from: answerdotai/ModernBERT-base
+Key               | Status     |  |
+------------------+------------+--+-
+head.norm.weight  | UNEXPECTED |  |
+head.dense.weight | UNEXPECTED |  |
+decoder.bias      | UNEXPECTED |  |
+
+Notes:
+- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
+  modern: 100%|██████████| 157/157 [00:36<00:00, 4.26it/s]
+    Shape: torch.Size([20000, 768])
+
+=================================================================
+PHASE 0b: PROCRUSTES ALIGNMENT + CONSENSUS STATISTICS
+=================================================================
+  bert      : cos 1.0000 → 1.0000
+  modern    : cos -0.0025 → 0.4849
+  cos(consensus, bert): 0.9574
+  cos(consensus, modern): 0.9584
+
+  Measuring consensus statistics...
+  CV: 0.1316
+  Mean cos: 0.0009
+  Eff dim: 223.7
+  Spectral: [0.0203, 0.0193, 0.0167, 0.0147, 0.0144...]
+
+=================================================================
+PHASE 1: TRAIN STUDENT
+=================================================================
+  Student: 11,269,632 params
+  CV target: 0.1316
+  E1: 3s loss=3.1731 t_acc=0.322 t_cos=0.287 v_acc=0.448 v_cos=0.437 v_cv=0.227
+  E2: 2s loss=1.6553 t_acc=0.713 t_cos=0.470 v_acc=0.649 v_cos=0.518 v_cv=0.197
+  E3: 2s loss=1.1581 t_acc=0.858 t_cos=0.531 v_acc=0.814 v_cos=0.566 v_cv=0.169
+  E4: 2s loss=0.8765 t_acc=0.922 t_cos=0.567 v_acc=0.867 v_cos=0.598 v_cv=0.167
+  E5: 2s loss=0.7007 t_acc=0.954 t_cos=0.593 v_acc=0.892 v_cos=0.612 v_cv=0.169
+
+  Student saved. v_cos=0.612, v_cv=0.169
+
+=================================================================
+PHASE 2: TRAIN ALIGNMENT BANK (student frozen)
+=================================================================
+  Pre-encoding through frozen student...
+  Student embeddings: torch.Size([18000, 768])
+  Expert 0 (bert): rotation + whitener loaded, cos_after=1.0000
+  Expert 1 (modern): rotation + whitener loaded, cos_after=0.4849
+  Anchors: 512 initialized from consensus embeddings
+  Targets: CV=0.1316, mean_cos=0.0009
+  Bank: 2,918,784 params
+  Bank targets: CV=0.1316, mean_cos=0.0009
+  E 1: 1s loss=0.4916 v_loss=0.4484 agr=0.00001 ortho=0.00005 spread=0.02713 b_cv=0.2574 e_cv=0.1571 x_var=0.00238 a_max=0.632 exp=0.894±0.020
+  E 2: 1s loss=0.4055 v_loss=0.3992 agr=0.00001 ortho=0.00008 spread=0.02210 b_cv=0.2201 e_cv=0.1541 x_var=0.00050 a_max=0.653 exp=0.910±0.021
+  E 3: 1s loss=0.3751 v_loss=0.3778 agr=0.00001 ortho=0.00007 spread=0.02176 b_cv=0.2053 e_cv=0.1571 x_var=0.00058 a_max=0.654 exp=0.906±0.023
+  E 4: 1s loss=0.3637 v_loss=0.3686 agr=0.00001 ortho=0.00009 spread=0.02109 b_cv=0.1990 e_cv=0.1611 x_var=0.00060 a_max=0.655 exp=0.890±0.032
+  E 5: 1s loss=0.3575 v_loss=0.3670 agr=0.00001 ortho=0.00010 spread=0.02014 b_cv=0.1970 e_cv=0.1608 x_var=0.00050 a_max=0.655 exp=0.873±0.033
+  E 6: 1s loss=0.3530 v_loss=0.3679 agr=0.00001 ortho=0.00011 spread=0.01906 b_cv=0.1993 e_cv=0.1539 x_var=0.00045 a_max=0.656 exp=0.848±0.045
+  E 7: 1s loss=0.3504 v_loss=0.3512 agr=0.00001 ortho=0.00011 spread=0.01847 b_cv=0.1941 e_cv=0.1582 x_var=0.00043 a_max=0.656 exp=0.835±0.048
+  E 8: 1s loss=0.3472 v_loss=0.3467 agr=0.00002 ortho=0.00011 spread=0.01799 b_cv=0.1894 e_cv=0.1562 x_var=0.00041 a_max=0.656 exp=0.832±0.047
+  E 9: 1s loss=0.3454 v_loss=0.3396 agr=0.00002 ortho=0.00012 spread=0.01804 b_cv=0.1860 e_cv=0.1597 x_var=0.00056 a_max=0.656 exp=0.817±0.051
+  E10: 1s loss=0.3434 v_loss=0.3447 agr=0.00001 ortho=0.00012 spread=0.01758 b_cv=0.1873 e_cv=0.1536 x_var=0.00050 a_max=0.656 exp=0.795±0.055
+  E11: 1s loss=0.3418 v_loss=0.3351 agr=0.00001 ortho=0.00012 spread=0.01724 b_cv=0.1855 e_cv=0.1547 x_var=0.00053 a_max=0.656 exp=0.823±0.055
+  E12: 1s loss=0.3403 v_loss=0.3520 agr=0.00001 ortho=0.00012 spread=0.01733 b_cv=0.1794 e_cv=0.1578 x_var=0.00045 a_max=0.656 exp=0.824±0.054
+  E13: 1s loss=0.3408 v_loss=0.3619 agr=0.00001 ortho=0.00013 spread=0.01707 b_cv=0.1841 e_cv=0.1535 x_var=0.00049 a_max=0.656 exp=0.786±0.057
+  E14: 1s loss=0.3404 v_loss=0.3545 agr=0.00001 ortho=0.00015 spread=0.01681 b_cv=0.1808 e_cv=0.1594 x_var=0.00060 a_max=0.656 exp=0.787±0.055
+  E15: 1s loss=0.3379 v_loss=0.3419 agr=0.00001 ortho=0.00012 spread=0.01671 b_cv=0.1757 e_cv=0.1575 x_var=0.00058 a_max=0.656 exp=0.828±0.050
+  E16: 1s loss=0.3374 v_loss=0.3486 agr=0.00001 ortho=0.00013 spread=0.01662 b_cv=0.1798 e_cv=0.1546 x_var=0.00061 a_max=0.656 exp=0.812±0.050
+  E17: 1s loss=0.3367 v_loss=0.3522 agr=0.00001 ortho=0.00012 spread=0.01684 b_cv=0.1745 e_cv=0.1557 x_var=0.00048 a_max=0.656 exp=0.835±0.047
+  E18: 1s loss=0.3368 v_loss=0.3625 agr=0.00001 ortho=0.00010 spread=0.01685 b_cv=0.1778 e_cv=0.1545 x_var=0.00041 a_max=0.657 exp=0.815±0.051
+  E19: 1s loss=0.3373 v_loss=0.3441 agr=0.00001 ortho=0.00012 spread=0.01689 b_cv=0.1756 e_cv=0.1602 x_var=0.00047 a_max=0.656 exp=0.793±0.057
+  E20: 1s loss=0.3365 v_loss=0.3377 agr=0.00001 ortho=0.00012 spread=0.01701 b_cv=0.1780 e_cv=0.1556 x_var=0.00051 a_max=0.656 exp=0.794±0.052
+
+=================================================================
+PHASE 3: GEOMETRIC VERIFICATION
+=================================================================
+  Passthrough: 1.000000 (target: 1.000)
+  Emb CV: 0.1592 (consensus: 0.1316)
+  Geo context CV: 0.1790
+  Geo eff_dim: 33.7 / 128
+  Expert cos: 0.794 ± 0.052
+  Anchor max cos: 0.656
+  Cross-expert: 0.032
+
+=================================================================
+PHASE 4: CLASSIFIER STABILITY TEST
+=================================================================
+  with_bank      : train=0.782 val=0.505 gap=0.277
+  without_bank   : train=0.512 val=0.372 gap=0.140
+
+=================================================================
+SUMMARY
+=================================================================
+  Consensus CV: 0.1316
+  Consensus eff_dim:223.7
+  Student v_cos: 0.612
+  Student v_cv: 0.169
+  Bank params: 2,918,784
+  Bank geo_eff_dim: 33.7
+  Bank geo_cv: 0.1790
+
+=================================================================
+DONE
+=================================================================