AbstractPhil committed b1fb055 (verified) · 1 Parent(s): c3bfc24

Create rapid_prototype_2_output_a50_n37.txt

Files changed (1)
  1. rapid_prototype_2_output_a50_n37.txt +134 -0
rapid_prototype_2_output_a50_n37.txt ADDED
@@ -0,0 +1,134 @@
+ =================================================================
+ RAPID PROTOTYPE v2: Differentiation-Centered Bank
+ =================================================================
+ Device: cuda
+
+ =================================================================
+ PHASE 0: EXTRACTION
+ =================================================================
+ Captions: 20,000
+
+ Extracting: bert...
+ Loading weights: 100%
+  199/199 [00:00<00:00, 4355.73it/s, Materializing param=pooler.dense.weight]
+ BertModel LOAD REPORT from: google-bert/bert-base-uncased
+ Key | Status | |
+ -------------------------------------------+------------+--+-
+ cls.predictions.transform.LayerNorm.weight | UNEXPECTED | |
+ cls.predictions.transform.LayerNorm.bias | UNEXPECTED | |
+ cls.predictions.transform.dense.weight | UNEXPECTED | |
+ cls.predictions.bias | UNEXPECTED | |
+ cls.seq_relationship.weight | UNEXPECTED | |
+ cls.seq_relationship.bias | UNEXPECTED | |
+ cls.predictions.transform.dense.bias | UNEXPECTED | |
+
+ Notes:
+ - UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
+ bert: 100%|██████████| 157/157 [00:24<00:00, 6.52it/s]
+ Shape: torch.Size([20000, 768])
+
+ Extracting: modern...
+ Loading weights: 100%
+  134/134 [00:00<00:00, 4050.54it/s, Materializing param=layers.21.mlp_norm.weight]
+ ModernBertModel LOAD REPORT from: answerdotai/ModernBERT-base
+ Key | Status | |
+ ------------------+------------+--+-
+ head.norm.weight | UNEXPECTED | |
+ head.dense.weight | UNEXPECTED | |
+ decoder.bias | UNEXPECTED | |
+
+ Notes:
+ - UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
+ modern: 100%|██████████| 157/157 [00:36<00:00, 4.26it/s]
+ Shape: torch.Size([20000, 768])
+
+ =================================================================
+ PHASE 0b: PROCRUSTES ALIGNMENT + CONSENSUS STATISTICS
+ =================================================================
+ bert : cos 1.0000 → 1.0000
+ modern : cos -0.0025 → 0.4849
+ cos(consensus, bert): 0.9574
+ cos(consensus, modern): 0.9584
+
+ Measuring consensus statistics...
+ CV: 0.1316
+ Mean cos: 0.0009
+ Eff dim: 223.7
+ Spectral: [0.0203, 0.0193, 0.0167, 0.0147, 0.0144...]
+
+ =================================================================
+ PHASE 1: TRAIN STUDENT
+ =================================================================
+ Student: 11,269,632 params
+ CV target: 0.1316
+ E1: 3s loss=3.1731 t_acc=0.322 t_cos=0.287 v_acc=0.448 v_cos=0.437 v_cv=0.227
+ E2: 2s loss=1.6553 t_acc=0.713 t_cos=0.470 v_acc=0.649 v_cos=0.518 v_cv=0.197
+ E3: 2s loss=1.1581 t_acc=0.858 t_cos=0.531 v_acc=0.814 v_cos=0.566 v_cv=0.169
+ E4: 2s loss=0.8765 t_acc=0.922 t_cos=0.567 v_acc=0.867 v_cos=0.598 v_cv=0.167
+ E5: 2s loss=0.7007 t_acc=0.954 t_cos=0.593 v_acc=0.892 v_cos=0.612 v_cv=0.169
+
+ Student saved. v_cos=0.612, v_cv=0.169
+
+ =================================================================
+ PHASE 2: TRAIN ALIGNMENT BANK (student frozen)
+ =================================================================
+ Pre-encoding through frozen student...
+ Student embeddings: torch.Size([18000, 768])
+ Expert 0 (bert): rotation + whitener loaded, cos_after=1.0000
+ Expert 1 (modern): rotation + whitener loaded, cos_after=0.4849
+ Anchors: 512 initialized from consensus embeddings
+ Targets: CV=0.1316, mean_cos=0.0009
+ Bank: 2,918,784 params
+ Bank targets: CV=0.1316, mean_cos=0.0009
+ E 1: 1s loss=0.4916 v_loss=0.4484 agr=0.00001 ortho=0.00005 spread=0.02713 b_cv=0.2574 e_cv=0.1571 x_var=0.00238 a_max=0.632 exp=0.894±0.020
+ E 2: 1s loss=0.4055 v_loss=0.3992 agr=0.00001 ortho=0.00008 spread=0.02210 b_cv=0.2201 e_cv=0.1541 x_var=0.00050 a_max=0.653 exp=0.910±0.021
+ E 3: 1s loss=0.3751 v_loss=0.3778 agr=0.00001 ortho=0.00007 spread=0.02176 b_cv=0.2053 e_cv=0.1571 x_var=0.00058 a_max=0.654 exp=0.906±0.023
+ E 4: 1s loss=0.3637 v_loss=0.3686 agr=0.00001 ortho=0.00009 spread=0.02109 b_cv=0.1990 e_cv=0.1611 x_var=0.00060 a_max=0.655 exp=0.890±0.032
+ E 5: 1s loss=0.3575 v_loss=0.3670 agr=0.00001 ortho=0.00010 spread=0.02014 b_cv=0.1970 e_cv=0.1608 x_var=0.00050 a_max=0.655 exp=0.873±0.033
+ E 6: 1s loss=0.3530 v_loss=0.3679 agr=0.00001 ortho=0.00011 spread=0.01906 b_cv=0.1993 e_cv=0.1539 x_var=0.00045 a_max=0.656 exp=0.848±0.045
+ E 7: 1s loss=0.3504 v_loss=0.3512 agr=0.00001 ortho=0.00011 spread=0.01847 b_cv=0.1941 e_cv=0.1582 x_var=0.00043 a_max=0.656 exp=0.835±0.048
+ E 8: 1s loss=0.3472 v_loss=0.3467 agr=0.00002 ortho=0.00011 spread=0.01799 b_cv=0.1894 e_cv=0.1562 x_var=0.00041 a_max=0.656 exp=0.832±0.047
+ E 9: 1s loss=0.3454 v_loss=0.3396 agr=0.00002 ortho=0.00012 spread=0.01804 b_cv=0.1860 e_cv=0.1597 x_var=0.00056 a_max=0.656 exp=0.817±0.051
+ E10: 1s loss=0.3434 v_loss=0.3447 agr=0.00001 ortho=0.00012 spread=0.01758 b_cv=0.1873 e_cv=0.1536 x_var=0.00050 a_max=0.656 exp=0.795±0.055
+ E11: 1s loss=0.3418 v_loss=0.3351 agr=0.00001 ortho=0.00012 spread=0.01724 b_cv=0.1855 e_cv=0.1547 x_var=0.00053 a_max=0.656 exp=0.823±0.055
+ E12: 1s loss=0.3403 v_loss=0.3520 agr=0.00001 ortho=0.00012 spread=0.01733 b_cv=0.1794 e_cv=0.1578 x_var=0.00045 a_max=0.656 exp=0.824±0.054
+ E13: 1s loss=0.3408 v_loss=0.3619 agr=0.00001 ortho=0.00013 spread=0.01707 b_cv=0.1841 e_cv=0.1535 x_var=0.00049 a_max=0.656 exp=0.786±0.057
+ E14: 1s loss=0.3404 v_loss=0.3545 agr=0.00001 ortho=0.00015 spread=0.01681 b_cv=0.1808 e_cv=0.1594 x_var=0.00060 a_max=0.656 exp=0.787±0.055
+ E15: 1s loss=0.3379 v_loss=0.3419 agr=0.00001 ortho=0.00012 spread=0.01671 b_cv=0.1757 e_cv=0.1575 x_var=0.00058 a_max=0.656 exp=0.828±0.050
+ E16: 1s loss=0.3374 v_loss=0.3486 agr=0.00001 ortho=0.00013 spread=0.01662 b_cv=0.1798 e_cv=0.1546 x_var=0.00061 a_max=0.656 exp=0.812±0.050
+ E17: 1s loss=0.3367 v_loss=0.3522 agr=0.00001 ortho=0.00012 spread=0.01684 b_cv=0.1745 e_cv=0.1557 x_var=0.00048 a_max=0.656 exp=0.835±0.047
+ E18: 1s loss=0.3368 v_loss=0.3625 agr=0.00001 ortho=0.00010 spread=0.01685 b_cv=0.1778 e_cv=0.1545 x_var=0.00041 a_max=0.657 exp=0.815±0.051
+ E19: 1s loss=0.3373 v_loss=0.3441 agr=0.00001 ortho=0.00012 spread=0.01689 b_cv=0.1756 e_cv=0.1602 x_var=0.00047 a_max=0.656 exp=0.793±0.057
+ E20: 1s loss=0.3365 v_loss=0.3377 agr=0.00001 ortho=0.00012 spread=0.01701 b_cv=0.1780 e_cv=0.1556 x_var=0.00051 a_max=0.656 exp=0.794±0.052
+
+ =================================================================
+ PHASE 3: GEOMETRIC VERIFICATION
+ =================================================================
+ Passthrough: 1.000000 (target: 1.000)
+ Emb CV: 0.1592 (consensus: 0.1316)
+ Geo context CV: 0.1790
+ Geo eff_dim: 33.7 / 128
+ Expert cos: 0.794 ± 0.052
+ Anchor max cos: 0.656
+ Cross-expert: 0.032
+
+ =================================================================
+ PHASE 4: CLASSIFIER STABILITY TEST
+ =================================================================
+ with_bank : train=0.782 val=0.505 gap=0.277
+ without_bank : train=0.512 val=0.372 gap=0.140
+
+ =================================================================
+ SUMMARY
+ =================================================================
+ Consensus CV: 0.1316
+ Consensus eff_dim:223.7
+ Student v_cos: 0.612
+ Student v_cv: 0.169
+ Bank params: 2,918,784
+ Bank geo_eff_dim: 33.7
+ Bank geo_cv: 0.1790
+
+ =================================================================
+ DONE
+ =================================================================