Update analysis_bert_large_clip-vit-b+bigG+dino2-l-16.txt
analysis_bert_large_clip-vit-b+bigG+dino2-l-16.txt
CHANGED
@@ -0,0 +1,309 @@
+Loading BERT-large...
+config.json: 100%
+571/571 [00:00<00:00, 70.6kB/s]
+model.safetensors: 65%
+871M/1.34G [00:04<00:13, 34.8MB/s]
+Loading weights: 100%
+391/391 [00:00<00:00, 1112.55it/s, Materializing param=pooler.dense.weight]
+BertModel LOAD REPORT from: google-bert/bert-large-uncased
+Key                                        | Status     |  |
+-------------------------------------------+------------+--+-
+cls.predictions.transform.LayerNorm.bias   | UNEXPECTED |  |
+cls.predictions.transform.dense.weight     | UNEXPECTED |  |
+cls.predictions.transform.dense.bias       | UNEXPECTED |  |
+cls.seq_relationship.bias                  | UNEXPECTED |  |
+cls.predictions.transform.LayerNorm.weight | UNEXPECTED |  |
+cls.predictions.bias                       | UNEXPECTED |  |
+cls.seq_relationship.weight                | UNEXPECTED |  |
+
+Notes:
+- UNEXPECTED :can be ignored when loading from different task/architecture; not ok if you expect identical arch.
+
+======================================================================
+MODEL: BERT-large (1024d, 24L, 16H)
+======================================================================
+
+--- WEIGHT CATALOG ---
+embedding : 3 matrices, 31,780,864 params, shapes={'(2, 1024)', '(512, 1024)', '(30522, 1024)'}
+mlp_down : 24 matrices, 100,663,296 params, shapes={'(1024, 4096)'}
+mlp_up : 24 matrices, 100,663,296 params, shapes={'(4096, 1024)'}
+pooler : 1 matrices, 1,048,576 params, shapes={'(1024, 1024)'}
+self_attn_k : 24 matrices, 25,165,824 params, shapes={'(1024, 1024)'}
+self_attn_o : 24 matrices, 25,165,824 params, shapes={'(1024, 1024)'}
+self_attn_q : 24 matrices, 25,165,824 params, shapes={'(1024, 1024)'}
+self_attn_v : 24 matrices, 25,165,824 params, shapes={'(1024, 1024)'}
+TOTAL : 334,819,328 params (2D only)
+
+--- SVD EFFECTIVE RANK ---
+Type           StableRank       PR  Active%   Rank90     Condition
+mlp_down            52.17   882.95    1.000    838.0          23.0
+mlp_up              27.37   856.14    1.000    832.8          33.8
+self_attn_k         37.72   597.14    0.949    642.1       16649.8
+self_attn_o        125.04   662.72    0.976    660.3       20582.9
+self_attn_q         50.84   606.30    0.956    643.2       60065.9
+self_attn_v        113.04   653.41    0.974    658.8       59710.1
+
+--- SPARSITY TOPOLOGY ---
+Type             <0.0001   <0.001    <0.01     <0.1
+embedding         0.0018   0.0184   0.1815   0.9699
+mlp_down          0.0025   0.0251   0.2466   0.9954
+mlp_up            0.0023   0.0231   0.2283   0.9944
+pooler            0.0028   0.0280   0.2741   0.9981
+self_attn_k       0.0022   0.0221   0.2178   0.9913
+self_attn_o       0.0031   0.0308   0.2997   0.9990
+self_attn_q       0.0022   0.0218   0.2149   0.9907
+self_attn_v       0.0029   0.0294   0.2852   0.9989
+FULL MODEL        0.0024   0.0242   0.2373   0.9925
+
+--- Q/K/V SPARSITY COMPARISON (<0.1 threshold) ---
+self_attn_q : 99.1%
+self_attn_k : 99.1%
+self_attn_v : 99.9%
+
+--- QK SIMILARITY MANIFOLD ---
+Layer  StableRk       PR   Pos   Neg   SymDev   TopEig
+    0      6.42   266.22   457   567   1.0866    20.61
+    1      3.60   194.32   454   570   1.0641    29.77
+    2      6.14   215.22   474   550   1.1773    23.79
+    3      6.11   162.90   468   556   1.1421    34.35
+    4      5.74   237.65   455   569   1.1145    30.60
+    5      6.30   255.58   460   564   1.1704    26.16
+... (24 layers total)
+   23      4.07   206.01   525   499   0.8791    43.56
+
+Positive eig fraction: layer 0 = 0.446, last = 0.513
+
+--- MLP DEAD NEURONS ---
+Dead (<1% mean): 0/98304 (0.00%)
+Weak (<10% mean): 0/98304 (0.00%)
+
+--- CROSS-LAYER CORRELATION (adjacent pairs) ---
+self_attn_q : adj_mean=0.0002, adj_range=[-0.0035, 0.0036]
+self_attn_k : adj_mean=0.0003, adj_range=[-0.0036, 0.0033]
+mlp_up : adj_mean=0.0315, adj_range=[0.0239, 0.0494]
+
+
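The log does not define the columns of its SVD EFFECTIVE RANK tables. The following is a minimal sketch of how such numbers could be produced, assuming the conventional definitions: stable rank as the squared Frobenius norm over the squared largest singular value, PR as the participation ratio of the squared singular values, Rank90 as the number of singular values needed to reach 90% of the spectral energy, and Condition as the ratio of largest to smallest singular value. The function name svd_summary is hypothetical, and the Active% column is left out because its definition cannot be inferred from the log.

```python
import numpy as np


def svd_summary(weight: np.ndarray) -> dict:
    """Spectral summary of one 2-D weight matrix.

    All four definitions are assumptions; the log does not state them.
    """
    s = np.linalg.svd(weight, compute_uv=False)  # singular values, descending
    energy = s ** 2
    total = energy.sum()
    cumulative = np.cumsum(energy) / total
    return {
        # ||W||_F^2 / sigma_max^2
        "stable_rank": float(total / energy[0]),
        # (sum sigma^2)^2 / sum sigma^4
        "participation_ratio": float(total ** 2 / np.sum(energy ** 2)),
        # smallest k whose leading singular values hold >= 90% of the energy
        "rank90": int(np.searchsorted(cumulative, 0.90) + 1),
        # sigma_max / sigma_min (infinite for a rank-deficient matrix)
        "condition": float(s[0] / s[-1]),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-in for a real weight matrix; not data from the log.
    print(svd_summary(rng.standard_normal((64, 32))))
```

A per-type row like those in the log would then presumably be the mean of these values over all layers of that type, which is again an assumption.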
+Loading CLIP-ViT-B/16 (LAION)...
+open_clip_model.safetensors: 100%
+599M/599M [00:03<00:00, 218MB/s]
+
+======================================================================
+MODEL: CLIP-ViT-B/16 LAION (768d, 12L, 12H)
+======================================================================
+
+--- WEIGHT CATALOG ---
+embedding : 1 matrices, 151,296 params, shapes={'(197, 768)'}
+mlp_down : 12 matrices, 28,311,552 params, shapes={'(768, 3072)'}
+mlp_up : 12 matrices, 28,311,552 params, shapes={'(3072, 768)'}
+projection : 1 matrices, 393,216 params, shapes={'(768, 512)'}
+self_attn_o : 12 matrices, 7,077,888 params, shapes={'(768, 768)'}
+self_attn_qkv : 12 matrices, 21,233,664 params, shapes={'(2304, 768)'}
+TOTAL : 85,479,168 params (2D only)
+
+--- SVD EFFECTIVE RANK ---
+Type           StableRank       PR  Active%   Rank90     Condition
+mlp_down           125.16   644.07    1.000    601.9          43.5
+mlp_up              59.69   631.06    0.993    603.4         372.0
+self_attn_o         77.37   515.53    0.967    491.8       37372.2
+self_attn_qkv       94.39   552.43    0.929    546.8       18558.2
+
+--- SPARSITY TOPOLOGY ---
+Type             <0.0001   <0.001    <0.01     <0.1
+embedding         0.0202   0.2072   0.8578   0.9983
+mlp_down          0.0145   0.0992   0.5794   0.9999
+mlp_up            0.0101   0.0797   0.5233   0.9999
+projection        0.0058   0.0573   0.5237   1.0000
+self_attn_o       0.0066   0.0655   0.5525   0.9999
+self_attn_qkv     0.0535   0.1189   0.5087   0.9999
+FULL MODEL        0.0221   0.0949   0.5413   0.9999
+
+--- Q/K/V SPARSITY COMPARISON (<0.1 threshold) ---
+self_attn_qkv : 100.0%
+
+--- QK SIMILARITY MANIFOLD ---
+Layer  StableRk       PR   Pos   Neg   SymDev   TopEig
+    0      2.06    59.23   386   382   1.0944     5.51
+    1      3.51    82.73   447   321   0.8367     8.79
+    2      8.48   108.22   401   367   0.9786     4.70
+    3     22.88   193.84   406   362   1.0676     2.31
+    4     20.20   196.57   401   367   1.1014     2.38
+    5     26.05   249.44   384   384   1.1135     1.80
+... (12 layers total)
+   11     49.71   360.27   413   355   1.3842     0.53
+
+Positive eig fraction: layer 0 = 0.503, last = 0.538
+
+--- MLP DEAD NEURONS ---
+Dead (<1% mean): 1316/36864 (3.57%)
+Weak (<10% mean): 1356/36864 (3.68%)
+
+--- CROSS-LAYER CORRELATION (adjacent pairs) ---
+self_attn_qkv : adj_mean=-0.0004, adj_range=[-0.0024, 0.0013]
+mlp_up : adj_mean=0.0075, adj_range=[0.0000, 0.0304]
+
+
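The SPARSITY TOPOLOGY columns appear to be the fraction of weight entries whose absolute value falls below each threshold. This reading is consistent with the log itself, where the <0.1 column for self_attn_qkv (0.9999) matches the 100.0% reported in the Q/K/V comparison. A minimal sketch under that assumption follows; the function name sparsity_profile and the strict less-than comparison are illustrative choices, not taken from the original script.

```python
import numpy as np

# Thresholds taken from the table header in the log.
THRESHOLDS = (1e-4, 1e-3, 1e-2, 1e-1)


def sparsity_profile(weight, thresholds=THRESHOLDS) -> dict:
    """Fraction of entries with |w| strictly below each threshold."""
    magnitudes = np.abs(np.asarray(weight, dtype=np.float64))
    return {t: float(np.mean(magnitudes < t)) for t in thresholds}


if __name__ == "__main__":
    # Toy matrix with one entry per magnitude band; not data from the log.
    toy = np.array([[0.00005, 0.0005], [0.005, 0.5]])
    for threshold, fraction in sparsity_profile(toy).items():
        print(f"<{threshold:g}: {fraction:.4f}")
```

Because the thresholds are absolute, the resulting fractions depend on the overall weight scale of each model, so they are not directly comparable across models with different initialisation or normalisation schemes.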
+Loading DINOv2-large...
+config.json: 100%
+549/549 [00:00<00:00, 69.5kB/s]
+model.safetensors: 94%
+1.15G/1.22G [00:06<00:03, 20.6MB/s]
+Loading weights: 100%
+439/439 [00:00<00:00, 1139.02it/s, Materializing param=layernorm.weight]
+
+======================================================================
+MODEL: DINOv2-large (1024d, 24L, 16H)
+======================================================================
+
+--- WEIGHT CATALOG ---
+embedding : 1 matrices, 1,024 params, shapes={'(1, 1024)'}
+mlp_down : 24 matrices, 100,663,296 params, shapes={'(1024, 4096)'}
+mlp_up : 24 matrices, 100,663,296 params, shapes={'(4096, 1024)'}
+self_attn_k : 24 matrices, 25,165,824 params, shapes={'(1024, 1024)'}
+self_attn_o : 24 matrices, 25,165,824 params, shapes={'(1024, 1024)'}
+self_attn_q : 24 matrices, 25,165,824 params, shapes={'(1024, 1024)'}
+self_attn_v : 24 matrices, 25,165,824 params, shapes={'(1024, 1024)'}
+TOTAL : 301,990,912 params (2D only)
+
+--- SVD EFFECTIVE RANK ---
+Type           StableRank       PR  Active%   Rank90     Condition
+mlp_down            94.40   810.58    1.000    805.1          39.8
+mlp_up              58.43   764.26    0.979    769.8          50.2
+self_attn_k         55.47   485.95    0.827    533.2     1024763.2
+self_attn_o         85.58   642.50    0.955    636.4       83125.7
+self_attn_q         57.74   477.74    0.826    536.0      630324.9
+self_attn_v         94.84   590.99    0.932    610.2      490421.1
+
+--- SPARSITY TOPOLOGY ---
+Type             <0.0001   <0.001    <0.01     <0.1
+embedding         1.0000   1.0000   1.0000   1.0000
+mlp_down          0.0072   0.0714   0.6036   0.9999
+mlp_up            0.0078   0.0687   0.5577   0.9999
+self_attn_k       0.0081   0.0774   0.5406   0.9998
+self_attn_o       0.0069   0.0687   0.5753   1.0000
+self_attn_q       0.0088   0.0793   0.5452   0.9997
+self_attn_v       0.0088   0.0861   0.5810   1.0000
+FULL MODEL        0.0077   0.0727   0.5740   0.9999
+
+--- Q/K/V SPARSITY COMPARISON (<0.1 threshold) ---
+self_attn_q : 100.0%
+self_attn_k : 100.0%
+self_attn_v : 100.0%
+
+--- QK SIMILARITY MANIFOLD ---
+Layer  StableRk       PR   Pos   Neg   SymDev   TopEig
+    0      1.23     5.71   510   514   1.3859    12.89
+    1      5.40    35.56   515   509   1.0933     3.52
+    2      4.28    74.13   531   493   1.0389     4.57
+    3      4.49    80.31   559   465   1.0370     6.89
+    4      7.19   121.15   524   500   1.0951     4.28
+    5      7.72   117.31   551   473   0.9584     5.87
+... (24 layers total)
+   23      6.71   341.20   561   463   1.1911     2.44
+
+Positive eig fraction: layer 0 = 0.498, last = 0.548
+
+--- MLP DEAD NEURONS ---
+Dead (<1% mean): 0/98304 (0.00%)
+Weak (<10% mean): 0/98304 (0.00%)
+
+--- CROSS-LAYER CORRELATION (adjacent pairs) ---
+self_attn_q : adj_mean=-0.0003, adj_range=[-0.0027, 0.0035]
+self_attn_k : adj_mean=-0.0002, adj_range=[-0.0026, 0.0030]
+mlp_up : adj_mean=0.0058, adj_range=[0.0006, 0.0217]
+
+
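In the QK SIMILARITY MANIFOLD tables, Pos and Neg always sum to the model width (for example 510 + 514 = 1024), which suggests they count positive and negative eigenvalues of a d x d matrix built from the query and key projections. The sketch below is one plausible reconstruction, not the original script: it assumes PyTorch-style (out, in) weight layout, forms M = W_q^T W_k, and counts eigenvalue signs of the symmetric part of M. The function name qk_sign_counts and the choice to symmetrise are assumptions; the StableRk, PR, SymDev and TopEig columns are not reproduced here.

```python
import numpy as np


def qk_sign_counts(w_q: np.ndarray, w_k: np.ndarray, tol: float = 0.0):
    """Count positive and negative eigenvalues of the symmetrised QK form.

    Assumes w_q and w_k are (out, in) matrices, so that the attention logit
    for inputs x, y is x^T (w_q^T w_k) y.
    """
    m = w_q.T @ w_k
    sym = 0.5 * (m + m.T)            # symmetric part has real eigenvalues
    eig = np.linalg.eigvalsh(sym)
    pos = int(np.sum(eig > tol))
    neg = int(np.sum(eig < -tol))
    return pos, neg, pos / eig.size  # last item: positive eigenvalue fraction


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for one layer's projections; not data from the log.
    w_q = rng.standard_normal((64, 64))
    w_k = rng.standard_normal((64, 64))
    print(qk_sign_counts(w_q, w_k))
```

For models with a fused self_attn_qkv matrix (shape (3d, d) in the weight catalog), the query and key blocks would first have to be sliced out; the order of the blocks inside the fused matrix is not stated in the log.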
+Loading CLIP-ViT-bigG/14 (LAION)...
+open_clip_model.safetensors: 100%
+10.2G/10.2G [00:29<00:00, 377MB/s]
+
+======================================================================
+MODEL: CLIP-ViT-bigG/14 LAION (1664d, 48L, 16H)
+======================================================================
+
+--- WEIGHT CATALOG ---
+embedding : 1 matrices, 427,648 params, shapes={'(257, 1664)'}
+mlp_down : 48 matrices, 654,311,424 params, shapes={'(1664, 8192)'}
+mlp_up : 48 matrices, 654,311,424 params, shapes={'(8192, 1664)'}
+projection : 1 matrices, 2,129,920 params, shapes={'(1664, 1280)'}
+self_attn_o : 48 matrices, 132,907,008 params, shapes={'(1664, 1664)'}
+self_attn_qkv : 48 matrices, 398,721,024 params, shapes={'(4992, 1664)'}
+TOTAL : 1,842,808,448 params (2D only)
+
+--- SVD EFFECTIVE RANK ---
+Type           StableRank       PR  Active%   Rank90     Condition
+mlp_down            58.27   757.89    0.644    855.5  5983209984.0
+mlp_up              23.11   992.74    0.804   1045.1     6682717.5
+self_attn_o         48.31   547.82    0.531    593.5  5320487424.0
+self_attn_qkv      102.36   834.12    0.757    890.4     1150494.6
+
+--- SPARSITY TOPOLOGY ---
+Type             <0.0001   <0.001    <0.01     <0.1
+embedding         0.0255   0.2521   0.9654   0.9991
+mlp_down          0.3578   0.4691   0.6310   0.9473
+mlp_up            0.1763   0.3691   0.7113   1.0000
+projection        0.0047   0.0469   0.4397   1.0000
+self_attn_o       0.3510   0.4770   0.6900   0.9838
+self_attn_qkv     0.1685   0.2917   0.7124   0.9999
+FULL MODEL        0.2514   0.3952   0.6812   0.9801
+
+--- Q/K/V SPARSITY COMPARISON (<0.1 threshold) ---
+self_attn_qkv : 100.0%
+
+--- QK SIMILARITY MANIFOLD ---
+Layer  StableRk       PR   Pos   Neg   SymDev   TopEig
+    0      1.18     9.24   829   835   1.0608    13.79
+    1      2.50    32.71   834   830   1.0916     3.81
+    2      1.63    11.28   831   833   0.8739     2.24
+    3      2.06    13.32   832   832   1.2697     2.45
+    4      1.96    23.28   836   828   1.1835     6.06
+    5      3.96    41.52   839   825   1.0728     4.42
+... (48 layers total)
+   47     32.79   637.78   968   696   1.2396     1.92
+
+Positive eig fraction: layer 0 = 0.498, last = 0.582
+
+--- MLP DEAD NEURONS ---
+Dead (<1% mean): 0/393216 (0.00%)
+Weak (<10% mean): 24163/393216 (6.14%)
+
+--- CROSS-LAYER CORRELATION (adjacent pairs) ---
+self_attn_qkv : adj_mean=0.0000, adj_range=[-0.0029, 0.0017]
+mlp_up : adj_mean=0.0552, adj_range=[-0.0053, 0.2689]
+
+
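The denominators in the MLP DEAD NEURONS lines equal layers times hidden width (48 x 8192 = 393216 here, 24 x 4096 = 98304 for the 24-layer models), so the count appears to run over individual hidden units. How a unit is scored is not stated in the log. The sketch below assumes the score is the L2 norm of the unit's row in the mlp_up weight matrix, and that a unit is "dead" or "weak" when that norm is below 1% or 10% of the mean norm in the same layer. The function name and the per-layer (rather than global) mean are both assumptions.

```python
import numpy as np


def count_dead_and_weak(mlp_up: np.ndarray,
                        dead_frac: float = 0.01,
                        weak_frac: float = 0.10):
    """Count hidden units whose input-weight norm is far below the layer mean.

    mlp_up is assumed to have shape (hidden, d_model), one row per hidden unit.
    """
    norms = np.linalg.norm(mlp_up, axis=1)
    mean_norm = norms.mean()
    dead = int(np.sum(norms < dead_frac * mean_norm))
    weak = int(np.sum(norms < weak_frac * mean_norm))  # includes dead units
    return dead, weak, norms.size


if __name__ == "__main__":
    # Toy layer: three healthy units, one weak unit, one all-zero unit.
    toy = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 0.0],
                    [0.05, 0.0],
                    [0.0, 0.0]])
    print(count_dead_and_weak(toy))
```

Summing the first two return values over all layers would give totals in the same form as the log lines, with the weak count always at least as large as the dead count.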
+======================================================================
+CROSS-MODEL COMPARISON
+======================================================================
+
+--- Q SPARSITY (<0.1 threshold) ---
+Model                                            Q       K       V     QKV
+BERT-large (1024d, 24L, 16H)                 99.1%   99.1%   99.9%       -
+CLIP-ViT-B/16 LAION (768d, 12L, 12H)             -       -       -  100.0%
+DINOv2-large (1024d, 24L, 16H)              100.0%  100.0%  100.0%       -
+CLIP-ViT-bigG/14 LAION (1664d, 48L, 16H)         -       -       -  100.0%
+T5-Small (512d, 6L, 8H) [reference]          93.7%   19.2%   12.1%       -
+T5-Base (768d, 12L, 12H) [reference]         99.4%   30.0%   16.2%       -
+
+--- SVD STABLE RANK (mean across layers) ---
+Model                                            Q       K       V  MLP_up
+BERT-large (1024d, 24L, 16H)                  50.8    37.7   113.0    27.4
+CLIP-ViT-B/16 LAION (768d, 12L, 12H)             -       -       -    59.7
+DINOv2-large (1024d, 24L, 16H)                57.7    55.5    94.8    58.4
+CLIP-ViT-bigG/14 LAION (1664d, 48L, 16H)         -       -       -    23.1
+
+--- QK MANIFOLD: POSITIVE EIGENVALUE FRACTION ---
+Model                                        First    Last   Trend
+BERT-large (1024d, 24L, 16H)                 0.446   0.513  +0.066
+CLIP-ViT-B/16 LAION (768d, 12L, 12H)         0.503   0.538  +0.035
+DINOv2-large (1024d, 24L, 16H)               0.498   0.548  +0.050
+CLIP-ViT-bigG/14 LAION (1664d, 48L, 16H)     0.498   0.582  +0.084
+
+--- MLP DEAD NEURONS (<1% of mean) ---
+BERT-large (1024d, 24L, 16H) : 0/98304 (0.00%)
+CLIP-ViT-B/16 LAION (768d, 12L, 12H) : 1316/36864 (3.57%)
+DINOv2-large (1024d, 24L, 16H) : 0/98304 (0.00%)
+CLIP-ViT-bigG/14 LAION (1664d, 48L, 16H) : 0/393216 (0.00%)
+
+======================================================================
+BATTERY COMPLETE
+======================================================================
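The First, Last and Trend values in the POSITIVE EIGENVALUE FRACTION table can be re-derived from the Pos and Neg counts printed for the first and last layer of each model. The short check below does that arithmetic; the counts are copied from the per-model QK SIMILARITY MANIFOLD tables in this log, while the helper names are hypothetical. It also shows why BERT-large's trend reads +0.066 rather than the +0.067 one would get by subtracting the rounded fractions: the difference is evidently taken before rounding.

```python
# (Pos, Neg) for the first and last layer of each model, copied from the log.
QK_SIGN_COUNTS = {
    "BERT-large": ((457, 567), (525, 499)),
    "CLIP-ViT-B/16 LAION": ((386, 382), (413, 355)),
    "DINOv2-large": ((510, 514), (561, 463)),
    "CLIP-ViT-bigG/14 LAION": ((829, 835), (968, 696)),
}


def positive_fraction(pos: int, neg: int) -> float:
    """Share of eigenvalues that are positive."""
    return pos / (pos + neg)


def trend_rows(counts: dict) -> dict:
    """Return (first, last, trend) per model, rounded to three decimals."""
    rows = {}
    for model, (first, last) in counts.items():
        first_frac = positive_fraction(*first)
        last_frac = positive_fraction(*last)
        # Subtract the unrounded fractions, then round, as the log appears to do.
        rows[model] = (round(first_frac, 3),
                       round(last_frac, 3),
                       round(last_frac - first_frac, 3))
    return rows


if __name__ == "__main__":
    for model, (first, last, trend) in trend_rows(QK_SIGN_COUNTS).items():
        print(f"{model:<24} {first:.3f} {last:.3f} {trend:+.3f}")
```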