Commit 8693cef (verified), committed by AbstractPhil
1 Parent(s): b180bad

Update rapid_prototype_output_bigger_bank.txt

rapid_prototype_output_bigger_bank.txt CHANGED
@@ -10,7 +10,7 @@ PHASE 0: EXTRACTION
 
  Extracting: bert...
  Loading weights: 100%
-  199/199 [00:00<00:00, 4157.84it/s, Materializing param=pooler.dense.weight]
+  199/199 [00:00<00:00, 4263.57it/s, Materializing param=pooler.dense.weight]
  BertModel LOAD REPORT from: google-bert/bert-base-uncased
  Key | Status | |
  -------------------------------------------+------------+--+-
@@ -29,7 +29,7 @@ Notes:
 
  Extracting: modern...
  Loading weights: 100%
-  134/134 [00:00<00:00, 3957.73it/s, Materializing param=layers.21.mlp_norm.weight]
+  134/134 [00:00<00:00, 4171.61it/s, Materializing param=layers.21.mlp_norm.weight]
  ModernBertModel LOAD REPORT from: answerdotai/ModernBERT-base
  Key | Status | |
  ------------------+------------+--+-
@@ -71,42 +71,42 @@ PHASE 2: TRAIN ALIGNMENT BANK (student frozen)
  Student embeddings: torch.Size([18000, 768])
  Expert 0 (bert): rotation loaded, cos_after=1.0000
  Expert 1 (modern): rotation loaded, cos_after=0.4849
- Anchors: 128 initialized from consensus embeddings
- Bank: 1,305,152 params
- E 1: 1s loss=0.3517 v_loss=0.3123 expert_agr=0.00029 ortho=0.00018 spread=0.02575 cv=0.2469 anchor_max=0.560 expert_cos=0.657±0.061
- E 2: 1s loss=0.2758 v_loss=0.2718 expert_agr=0.00017 ortho=0.00027 spread=0.02053 cv=0.2049 anchor_max=0.587 expert_cos=0.634±0.064
- E 3: 1s loss=0.2505 v_loss=0.2533 expert_agr=0.00016 ortho=0.00023 spread=0.01885 cv=0.1988 anchor_max=0.590 expert_cos=0.619±0.067
- E 4: 1s loss=0.2391 v_loss=0.2338 expert_agr=0.00017 ortho=0.00022 spread=0.01824 cv=0.1965 anchor_max=0.593 expert_cos=0.627±0.069
- E 5: 1s loss=0.2327 v_loss=0.2376 expert_agr=0.00018 ortho=0.00023 spread=0.01752 cv=0.1933 anchor_max=0.595 expert_cos=0.613±0.070
- E 6: 1s loss=0.2274 v_loss=0.2334 expert_agr=0.00017 ortho=0.00022 spread=0.01680 cv=0.1882 anchor_max=0.597 expert_cos=0.610±0.070
- E 7: 1s loss=0.2249 v_loss=0.2297 expert_agr=0.00018 ortho=0.00021 spread=0.01603 cv=0.1880 anchor_max=0.597 expert_cos=0.614±0.074
- E 8: 1s loss=0.2221 v_loss=0.2236 expert_agr=0.00017 ortho=0.00021 spread=0.01550 cv=0.1857 anchor_max=0.598 expert_cos=0.601±0.074
- E 9: 1s loss=0.2206 v_loss=0.2351 expert_agr=0.00019 ortho=0.00021 spread=0.01500 cv=0.1850 anchor_max=0.599 expert_cos=0.588±0.077
- E10: 1s loss=0.2200 v_loss=0.2142 expert_agr=0.00024 ortho=0.00022 spread=0.01470 cv=0.1831 anchor_max=0.599 expert_cos=0.605±0.075
- E11: 1s loss=0.2181 v_loss=0.2254 expert_agr=0.00028 ortho=0.00021 spread=0.01445 cv=0.1782 anchor_max=0.599 expert_cos=0.629±0.072
- E12: 1s loss=0.2179 v_loss=0.2212 expert_agr=0.00021 ortho=0.00023 spread=0.01419 cv=0.1812 anchor_max=0.599 expert_cos=0.634±0.070
- E13: 1s loss=0.2170 v_loss=0.2259 expert_agr=0.00022 ortho=0.00023 spread=0.01387 cv=0.1792 anchor_max=0.599 expert_cos=0.614±0.072
- E14: 1s loss=0.2152 v_loss=0.2106 expert_agr=0.00016 ortho=0.00020 spread=0.01400 cv=0.1750 anchor_max=0.599 expert_cos=0.647±0.069
- E15: 1s loss=0.2151 v_loss=0.2301 expert_agr=0.00021 ortho=0.00021 spread=0.01368 cv=0.1766 anchor_max=0.599 expert_cos=0.639±0.073
- E16: 1s loss=0.2149 v_loss=0.2134 expert_agr=0.00022 ortho=0.00023 spread=0.01338 cv=0.1754 anchor_max=0.599 expert_cos=0.631±0.070
- E17: 1s loss=0.2151 v_loss=0.2110 expert_agr=0.00019 ortho=0.00022 spread=0.01341 cv=0.1778 anchor_max=0.599 expert_cos=0.642±0.073
- E18: 1s loss=0.2146 v_loss=0.2114 expert_agr=0.00023 ortho=0.00022 spread=0.01306 cv=0.1734 anchor_max=0.599 expert_cos=0.630±0.069
- E19: 1s loss=0.2147 v_loss=0.2127 expert_agr=0.00020 ortho=0.00023 spread=0.01300 cv=0.1768 anchor_max=0.599 expert_cos=0.610±0.072
- E20: 1s loss=0.2151 v_loss=0.2211 expert_agr=0.00019 ortho=0.00020 spread=0.01300 cv=0.1779 anchor_max=0.599 expert_cos=0.626±0.072
+ Anchors: 512 initialized from consensus embeddings
+ Bank: 1,649,216 params
+ E 1: 1s loss=0.4828 v_loss=0.4384 expert_agr=0.00011 ortho=0.00007 spread=0.02783 cv=0.2529 anchor_max=0.625 expert_cos=0.827±0.045
+ E 2: 1s loss=0.3974 v_loss=0.3863 expert_agr=0.00007 ortho=0.00009 spread=0.02224 cv=0.2140 anchor_max=0.653 expert_cos=0.832±0.052
+ E 3: 1s loss=0.3647 v_loss=0.3696 expert_agr=0.00006 ortho=0.00008 spread=0.02128 cv=0.2076 anchor_max=0.656 expert_cos=0.788±0.056
+ E 4: 1s loss=0.3518 v_loss=0.3461 expert_agr=0.00005 ortho=0.00010 spread=0.02102 cv=0.2051 anchor_max=0.656 expert_cos=0.800±0.053
+ E 5: 1s loss=0.3428 v_loss=0.3584 expert_agr=0.00005 ortho=0.00011 spread=0.02022 cv=0.1954 anchor_max=0.657 expert_cos=0.787±0.055
+ E 6: 1s loss=0.3386 v_loss=0.3424 expert_agr=0.00005 ortho=0.00014 spread=0.01978 cv=0.1941 anchor_max=0.656 expert_cos=0.755±0.061
+ E 7: 1s loss=0.3365 v_loss=0.3332 expert_agr=0.00004 ortho=0.00013 spread=0.01886 cv=0.1947 anchor_max=0.656 expert_cos=0.763±0.060
+ E 8: 1s loss=0.3337 v_loss=0.3413 expert_agr=0.00004 ortho=0.00013 spread=0.01852 cv=0.1900 anchor_max=0.656 expert_cos=0.792±0.053
+ E 9: 1s loss=0.3332 v_loss=0.3489 expert_agr=0.00004 ortho=0.00014 spread=0.01789 cv=0.1942 anchor_max=0.655 expert_cos=0.735±0.060
+ E10: 1s loss=0.3314 v_loss=0.3395 expert_agr=0.00004 ortho=0.00013 spread=0.01814 cv=0.1903 anchor_max=0.655 expert_cos=0.721±0.065
+ E11: 1s loss=0.3286 v_loss=0.3470 expert_agr=0.00004 ortho=0.00013 spread=0.01801 cv=0.1832 anchor_max=0.655 expert_cos=0.775±0.060
+ E12: 1s loss=0.3285 v_loss=0.3399 expert_agr=0.00004 ortho=0.00016 spread=0.01787 cv=0.1861 anchor_max=0.654 expert_cos=0.761±0.058
+ E13: 1s loss=0.3275 v_loss=0.3392 expert_agr=0.00004 ortho=0.00015 spread=0.01772 cv=0.1839 anchor_max=0.655 expert_cos=0.729±0.065
+ E14: 1s loss=0.3267 v_loss=0.3351 expert_agr=0.00004 ortho=0.00013 spread=0.01753 cv=0.1809 anchor_max=0.654 expert_cos=0.735±0.065
+ E15: 1s loss=0.3260 v_loss=0.3497 expert_agr=0.00004 ortho=0.00014 spread=0.01737 cv=0.1809 anchor_max=0.654 expert_cos=0.772±0.061
+ E16: 1s loss=0.3266 v_loss=0.3403 expert_agr=0.00003 ortho=0.00015 spread=0.01730 cv=0.1845 anchor_max=0.654 expert_cos=0.715±0.070
+ E17: 1s loss=0.3245 v_loss=0.3252 expert_agr=0.00003 ortho=0.00014 spread=0.01732 cv=0.1788 anchor_max=0.654 expert_cos=0.744±0.067
+ E18: 1s loss=0.3237 v_loss=0.3254 expert_agr=0.00003 ortho=0.00014 spread=0.01707 cv=0.1782 anchor_max=0.654 expert_cos=0.721±0.064
+ E19: 1s loss=0.3231 v_loss=0.3327 expert_agr=0.00003 ortho=0.00015 spread=0.01721 cv=0.1784 anchor_max=0.653 expert_cos=0.690±0.072
+ E20: 1s loss=0.3237 v_loss=0.3267 expert_agr=0.00003 ortho=0.00013 spread=0.01693 cv=0.1800 anchor_max=0.653 expert_cos=0.723±0.071
 
  =================================================================
  PHASE 3: GEOMETRIC VERIFICATION
  =================================================================
  Passthrough integrity: 1.000000 (should be ~1.000)
- Geo context CV: 0.1691
- Geo context eff_dim: 21.9
+ Geo context CV: 0.1651
+ Geo context eff_dim: 21.5
  Geo context shape: torch.Size([2000, 64])
 
  =================================================================
  PHASE 4: CLASSIFIER STABILITY TEST
  =================================================================
- with_bank : train_acc=0.499 val_acc=0.390 gap=0.109
- without_bank : train_acc=0.442 val_acc=0.372 gap=0.070
+ with_bank : train_acc=0.481 val_acc=0.390 gap=0.091
+ without_bank : train_acc=0.443 val_acc=0.330 gap=0.113
 
  =================================================================
  DONE