Rename advanced_geometric_analysis_output.txt to analysis/advanced_geometric_analysis_output.txt
01b93a0 verified

=================================================================
BASE TIER DEEP MODEL ANALYSIS
=================================================================
Device: cuda
=================================================================
LOADING MODELS
=================================================================
Loading CLIP ViT-L/14...
Loading weights: 100% 391/391 [00:00<00:00, 1169.09it/s, Materializing param=vision_model.pre_layrnorm.weight]
CLIPVisionModel LOAD REPORT from: openai/clip-vit-large-patch14
Key                                                          | Status
-------------------------------------------------------------+-----------
text_model.encoder.layers.{0...11}.self_attn.k_proj.bias     | UNEXPECTED
text_model.encoder.layers.{0...11}.mlp.fc1.bias              | UNEXPECTED
text_projection.weight                                       | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.out_proj.bias   | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.out_proj.weight | UNEXPECTED
text_model.encoder.layers.{0...11}.layer_norm1.bias          | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.k_proj.weight   | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.q_proj.bias     | UNEXPECTED
text_model.encoder.layers.{0...11}.mlp.fc1.weight            | UNEXPECTED
text_model.encoder.layers.{0...11}.layer_norm2.weight        | UNEXPECTED
text_model.encoder.layers.{0...11}.mlp.fc2.bias              | UNEXPECTED
text_model.encoder.layers.{0...11}.layer_norm1.weight        | UNEXPECTED
text_model.final_layer_norm.weight                           | UNEXPECTED
text_model.encoder.layers.{0...11}.layer_norm2.bias          | UNEXPECTED
text_model.encoder.layers.{0...11}.mlp.fc2.weight            | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.v_proj.bias     | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.v_proj.weight   | UNEXPECTED
logit_scale                                                  | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.q_proj.weight   | UNEXPECTED
text_model.embeddings.position_embedding.weight              | UNEXPECTED
text_model.final_layer_norm.bias                             | UNEXPECTED
vision_model.embeddings.position_ids                         | UNEXPECTED
text_model.embeddings.position_ids                           | UNEXPECTED
visual_projection.weight                                     | UNEXPECTED
text_model.embeddings.token_embedding.weight                 | UNEXPECTED
Notes:
- UNEXPECTED: can be ignored when loading from a different task/architecture; not OK if you expect an identical arch.
Loaded: 303,179,776 params
Loading DINOv2 ViT-B/14...
Loading weights: 100% 223/223 [00:00<00:00, 1085.46it/s, Materializing param=layernorm.weight]
Loaded: 86,580,480 params
Loading SigLIP ViT-B/16-384...
Loading weights: 100% 208/208 [00:00<00:00, 983.51it/s, Materializing param=vision_model.post_layernorm.weight]
SiglipVisionModel LOAD REPORT from: google/siglip-base-patch16-384
Key                                                          | Status
-------------------------------------------------------------+-----------
text_model.encoder.layers.{0...11}.self_attn.k_proj.bias     | UNEXPECTED
text_model.encoder.layers.{0...11}.mlp.fc1.bias              | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.out_proj.bias   | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.out_proj.weight | UNEXPECTED
text_model.encoder.layers.{0...11}.layer_norm1.bias          | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.k_proj.weight   | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.q_proj.bias     | UNEXPECTED
text_model.encoder.layers.{0...11}.mlp.fc1.weight            | UNEXPECTED
text_model.encoder.layers.{0...11}.layer_norm2.weight        | UNEXPECTED
text_model.encoder.layers.{0...11}.mlp.fc2.bias              | UNEXPECTED
text_model.encoder.layers.{0...11}.layer_norm1.weight        | UNEXPECTED
text_model.final_layer_norm.weight                           | UNEXPECTED
text_model.encoder.layers.{0...11}.layer_norm2.bias          | UNEXPECTED
text_model.encoder.layers.{0...11}.mlp.fc2.weight            | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.v_proj.bias     | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.v_proj.weight   | UNEXPECTED
logit_scale                                                  | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.q_proj.weight   | UNEXPECTED
text_model.embeddings.position_embedding.weight              | UNEXPECTED
text_model.final_layer_norm.bias                             | UNEXPECTED
text_model.head.bias                                         | UNEXPECTED
text_model.head.weight                                       | UNEXPECTED
text_model.embeddings.token_embedding.weight                 | UNEXPECTED
logit_bias                                                   | UNEXPECTED
Notes:
- UNEXPECTED: can be ignored when loading from a different task/architecture; not OK if you expect an identical arch.
Loaded: 93,176,064 params
=================================================================
SCAN 1: ARCHITECTURE COMPARISON
=================================================================
clip_l14:
  hidden_size       : 1,024
  intermediate_size : 4,096
  num_layers        : 24
  num_heads         : 16
  patch_size        : 14
  image_size        : 224
  total_params      : 303,179,776
  head_dim          : 64
dinov2_b14:
  hidden_size       : 768
  num_layers        : 12
  num_heads         : 12
  patch_size        : 14
  image_size        : 518
  total_params      : 86,580,480
  head_dim          : 64
siglip_b16:
  hidden_size       : 768
  intermediate_size : 3,072
  num_layers        : 12
  num_heads         : 12
  patch_size        : 16
  image_size        : 384
  total_params      : 93,176,064
  head_dim          : 64
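The head_dim and position-embedding sizes in this scan follow directly from the config values; a minimal sketch (plain Python, figures copied from the table above) makes the arithmetic explicit:

```python
def derived_stats(hidden_size, num_heads, patch_size, image_size, extra_tokens=1):
    """Head dimension and token count implied by a ViT config.

    extra_tokens accounts for a class token (1 for CLIP/DINOv2, 0 for SigLIP,
    which uses attention pooling instead of a [CLS] token).
    """
    head_dim = hidden_size // num_heads
    num_tokens = (image_size // patch_size) ** 2 + extra_tokens
    return head_dim, num_tokens

# clip_l14:   (64, 257)  -- matches position_embedding.weight: [257, 1024]
# dinov2_b14: (64, 1370) -- matches embeddings.position_embeddings: [1, 1370, 768]
# siglip_b16: (64, 576)  -- matches position_embedding.weight: [576, 768]
```

All three backbones land on the same 64-dim heads despite different widths, which is why their per-head geometry is directly comparable in the later scans.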
=================================================================
SCAN 2: PARAMETER INVENTORY
=================================================================
clip_l14:
  embeddings    :     866,304 (  3 tensors)
    embeddings.class_embedding: [1024]
    patch_embedding.weight: [1024, 3, 14, 14]
    position_embedding.weight: [257, 1024]
  encoder_other : 100,761,600 (192 tensors)
    k_proj.weight: [1024, 1024]
    k_proj.bias: [1024]
    v_proj.weight: [1024, 1024]
  final_norm    :       2,048 (  2 tensors)
    post_layernorm.weight: [1024]
    post_layernorm.bias: [1024]
  layernorm     :      98,304 ( 96 tensors)
    layer_norm1.weight: [1024]
    layer_norm1.bias: [1024]
    layer_norm2.weight: [1024]
  mlp           : 201,449,472 ( 96 tensors)
    fc1.weight: [4096, 1024]
    fc1.bias: [4096]
    fc2.weight: [1024, 4096]
  other         :       2,048 (  2 tensors)
    pre_layrnorm.weight: [1024]
    pre_layrnorm.bias: [1024]
dinov2_b14:
  attn_other    :  14,174,208 ( 48 tensors)
    key.weight: [768, 768]
    key.bias: [768]
    value.weight: [768, 768]
  attn_out      :   7,087,104 ( 24 tensors)
    dense.weight: [768, 768]
    dense.bias: [768]
    dense.weight: [768, 768]
  attn_qkv      :   7,087,104 ( 24 tensors)
    query.weight: [768, 768]
    query.bias: [768]
    query.weight: [768, 768]
  embeddings    :   1,506,048 (  5 tensors)
    embeddings.cls_token: [1, 1, 768]
    embeddings.mask_token: [1, 768]
    embeddings.position_embeddings: [1, 1370, 768]
  encoder_other :      18,432 ( 24 tensors)
    layer_scale1.lambda1: [768]
    layer_scale2.lambda1: [768]
    layer_scale1.lambda1: [768]
  final_norm    :       1,536 (  2 tensors)
    layernorm.weight: [768]
    layernorm.bias: [768]
  layernorm     :      36,864 ( 48 tensors)
    norm1.weight: [768]
    norm1.bias: [768]
    norm2.weight: [768]
  mlp           :  56,669,184 ( 48 tensors)
    fc1.weight: [3072, 768]
    fc1.bias: [3072]
    fc2.weight: [768, 3072]
siglip_b16:
  embeddings    :   1,032,960 (  3 tensors)
    patch_embedding.weight: [768, 3, 16, 16]
    patch_embedding.bias: [768]
    position_embedding.weight: [576, 768]
  encoder_other :  28,348,416 ( 96 tensors)
    k_proj.weight: [768, 768]
    k_proj.bias: [768]
    v_proj.weight: [768, 768]
  final_norm    :       3,072 (  4 tensors)
    post_layernorm.weight: [768]
    post_layernorm.bias: [768]
    layernorm.weight: [768]
  head          :   7,085,568 (  9 tensors)
    head.probe: [1, 1, 768]
    attention.in_proj_weight: [2304, 768]
    attention.in_proj_bias: [2304]
  layernorm     :      36,864 ( 48 tensors)
    layer_norm1.weight: [768]
    layer_norm1.bias: [768]
    layer_norm2.weight: [768]
  mlp           :  56,669,184 ( 48 tensors)
    fc1.weight: [3072, 768]
    fc1.bias: [3072]
    fc2.weight: [768, 3072]
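An inventory like the one above can be produced by bucketing state_dict entries on substrings of their parameter names. This is a sketch only: the bucket rules and the `inventory` helper are assumptions, not the actual script, but the tallying logic is the same.

```python
import math

def inventory(shapes):
    """Group a {param_name: shape} mapping into coarse buckets,
    returning {bucket: (tensor_count, total_params)}."""
    buckets = {}
    for name, shape in shapes.items():
        if "mlp" in name:
            bucket = "mlp"
        elif "embed" in name:
            bucket = "embeddings"
        elif "norm" in name:
            bucket = "layernorm"
        else:
            bucket = "other"
        count, total = buckets.get(bucket, (0, 0))
        buckets[bucket] = (count + 1, total + math.prod(shape))
    return buckets
```

Note that double counting is impossible here because each tensor falls into exactly one bucket; the order of the substring checks decides ties (e.g. a hypothetical `mlp_norm` would land in `mlp`).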
=================================================================
SCAN 3: WEIGHT STATISTICS
=================================================================
clip_l14 — key weight matrices:
param                                               shape              norm     std      sv_max   eff_rank
---------------------------------------------------------------------------------------------------------
vision_model.embeddings.patch_embedding.weight      [1024, 3, 14, 14]  13.0285  0.01679      N/A      N/A
vision_model.embeddings.position_embedding.weight   [257, 1024]        11.4993  0.01826  10.6920     22.3
ion_model.encoder.layers.0.self_attn.k_proj.weight  [1024, 1024]       10.3266  0.01008   4.1652    148.5
ion_model.encoder.layers.0.self_attn.v_proj.weight  [1024, 1024]       10.1910  0.00995   2.0620    428.9
ion_model.encoder.layers.0.self_attn.q_proj.weight  [1024, 1024]       10.4647  0.01022   4.4222    129.4
n_model.encoder.layers.0.self_attn.out_proj.weight  [1024, 1024]       12.6910  0.01239   1.4843    557.1
vision_model.encoder.layers.0.mlp.fc1.weight        [4096, 1024]       22.4378  0.01096  11.0751    281.7
vision_model.encoder.layers.0.mlp.fc2.weight        [1024, 4096]       17.9603  0.00877   5.2350    455.7
ion_model.encoder.layers.1.self_attn.k_proj.weight  [1024, 1024]       13.8288  0.01350   3.8866    179.2
ion_model.encoder.layers.1.self_attn.v_proj.weight  [1024, 1024]       12.8884  0.01259   1.5682    562.6
ion_model.encoder.layers.1.self_attn.q_proj.weight  [1024, 1024]       13.2903  0.01298   3.9366    173.5
n_model.encoder.layers.1.self_attn.out_proj.weight  [1024, 1024]       12.8612  0.01256   1.4611    597.0
vision_model.encoder.layers.1.mlp.fc1.weight        [4096, 1024]       27.3329  0.01335   9.2804    505.0
vision_model.encoder.layers.1.mlp.fc2.weight        [1024, 4096]       21.3601  0.01043   3.7564    687.7
ion_model.encoder.layers.2.self_attn.k_proj.weight  [1024, 1024]       15.0059  0.01465   3.4512    223.1
ion_model.encoder.layers.2.self_attn.v_proj.weight  [1024, 1024]       13.2355  0.01293   1.4198    639.6
ion_model.encoder.layers.2.self_attn.q_proj.weight  [1024, 1024]       14.7787  0.01443   3.2682    226.4
n_model.encoder.layers.2.self_attn.out_proj.weight  [1024, 1024]       12.4446  0.01215   1.2628    669.7
vision_model.encoder.layers.2.mlp.fc1.weight        [4096, 1024]       26.4932  0.01293   9.4832    565.7
vision_model.encoder.layers.2.mlp.fc2.weight        [1024, 4096]       21.3475  0.01043   3.8495    707.4
ion_model.encoder.layers.3.self_attn.k_proj.weight  [1024, 1024]       15.6926  0.01532   3.1154    270.4
ion_model.encoder.layers.3.self_attn.v_proj.weight  [1024, 1024]       14.4002  0.01406   1.4850    639.5
ion_model.encoder.layers.3.self_attn.q_proj.weight  [1024, 1024]       15.6837  0.01532   3.0972    281.4
n_model.encoder.layers.3.self_attn.out_proj.weight  [1024, 1024]       12.9667  0.01266   1.3714    665.4
vision_model.encoder.layers.3.mlp.fc1.weight        [4096, 1024]       28.0314  0.01369   7.8429    635.9
vision_model.encoder.layers.3.mlp.fc2.weight        [1024, 4096]       22.7624  0.01112   3.4051    768.6
ion_model.encoder.layers.4.self_attn.k_proj.weight  [1024, 1024]       15.8693  0.01550   2.8455    288.1
ion_model.encoder.layers.4.self_attn.v_proj.weight  [1024, 1024]       13.7542  0.01343   1.1932    674.9
ion_model.encoder.layers.4.self_attn.q_proj.weight  [1024, 1024]       15.8018  0.01543   2.8635    295.8
n_model.encoder.layers.4.self_attn.out_proj.weight  [1024, 1024]       12.9162  0.01261   1.2183    694.6
vision_model.encoder.layers.4.mlp.fc1.weight        [4096, 1024]       29.7806  0.01454   6.4180    705.5
vision_model.encoder.layers.4.mlp.fc2.weight        [1024, 4096]       24.5424  0.01198   3.2353    774.9
ion_model.encoder.layers.5.self_attn.k_proj.weight  [1024, 1024]       15.2627  0.01491   2.2813    369.0
ion_model.encoder.layers.5.self_attn.v_proj.weight  [1024, 1024]       14.1761  0.01384   1.6230    629.3
ion_model.encoder.layers.5.self_attn.q_proj.weight  [1024, 1024]       15.2910  0.01493   2.2908    383.0
n_model.encoder.layers.5.self_attn.out_proj.weight  [1024, 1024]       13.4318  0.01312   1.4986    673.0
vision_model.encoder.layers.5.mlp.fc1.weight        [4096, 1024]       30.9118  0.01510   4.5842    771.7
vision_model.encoder.layers.5.mlp.fc2.weight        [1024, 4096]       25.5090  0.01246   2.8728    803.2
ion_model.encoder.layers.6.self_attn.k_proj.weight  [1024, 1024]       16.2386  0.01586   2.3373    434.2
ion_model.encoder.layers.6.self_attn.v_proj.weight  [1024, 1024]       15.0351  0.01468   1.5573    616.6
ion_model.encoder.layers.6.self_attn.q_proj.weight  [1024, 1024]       16.7687  0.01638   2.4588    443.6
n_model.encoder.layers.6.self_attn.out_proj.weight  [1024, 1024]       14.1036  0.01377   1.3362    659.2
vision_model.encoder.layers.6.mlp.fc1.weight        [4096, 1024]       30.5849  0.01494   4.0565    804.8
vision_model.encoder.layers.6.mlp.fc2.weight        [1024, 4096]       25.7987  0.01260   2.7815    821.5
ion_model.encoder.layers.7.self_attn.k_proj.weight  [1024, 1024]       16.1936  0.01581   1.8904    512.6
ion_model.encoder.layers.7.self_attn.v_proj.weight  [1024, 1024]       14.7217  0.01438   1.3952    636.8
ion_model.encoder.layers.7.self_attn.q_proj.weight  [1024, 1024]       16.2836  0.01590   2.2271    507.9
n_model.encoder.layers.7.self_attn.out_proj.weight  [1024, 1024]       14.2476  0.01391   1.3998    648.3
vision_model.encoder.layers.7.mlp.fc1.weight        [4096, 1024]       30.9270  0.01510   4.1601    832.6
vision_model.encoder.layers.7.mlp.fc2.weight        [1024, 4096]       26.5679  0.01297   2.1988    854.7
ion_model.encoder.layers.8.self_attn.k_proj.weight  [1024, 1024]       16.4682  0.01608   1.8729    542.4
ion_model.encoder.layers.8.self_attn.v_proj.weight  [1024, 1024]       14.8673  0.01452   1.3464    646.9
ion_model.encoder.layers.8.self_attn.q_proj.weight  [1024, 1024]       16.7130  0.01632   1.9112    526.1
n_model.encoder.layers.8.self_attn.out_proj.weight  [1024, 1024]       14.3406  0.01400   1.1506    672.7
vision_model.encoder.layers.8.mlp.fc1.weight        [4096, 1024]       31.1442  0.01521   4.1953    857.3
vision_model.encoder.layers.8.mlp.fc2.weight        [1024, 4096]       27.3500  0.01336   2.1856    882.1
ion_model.encoder.layers.9.self_attn.k_proj.weight  [1024, 1024]       16.2309  0.01585   1.8300    564.9
ion_model.encoder.layers.9.self_attn.v_proj.weight  [1024, 1024]       15.1476  0.01479   1.4036    643.1
ion_model.encoder.layers.9.self_attn.q_proj.weight  [1024, 1024]       16.5288  0.01614   1.9110    561.6
n_model.encoder.layers.9.self_attn.out_proj.weight  [1024, 1024]       14.5335  0.01419   1.1607    688.9
vision_model.encoder.layers.9.mlp.fc1.weight        [4096, 1024]       31.3509  0.01531   4.6304    857.9
vision_model.encoder.layers.9.mlp.fc2.weight        [1024, 4096]       27.2579  0.01331   2.2316    888.1
on_model.encoder.layers.10.self_attn.k_proj.weight  [1024, 1024]       16.7808  0.01639   1.8287    597.6
on_model.encoder.layers.10.self_attn.v_proj.weight  [1024, 1024]       14.7162  0.01437   1.3223    666.2
on_model.encoder.layers.10.self_attn.q_proj.weight  [1024, 1024]       17.0940  0.01669   2.0012    585.2
_model.encoder.layers.10.self_attn.out_proj.weight  [1024, 1024]       14.2325  0.01390   1.1821    698.7
vision_model.encoder.layers.10.mlp.fc1.weight       [4096, 1024]       31.7419  0.01550   5.2608    865.0
vision_model.encoder.layers.10.mlp.fc2.weight       [1024, 4096]       28.0947  0.01372   2.1578    896.3
on_model.encoder.layers.11.self_attn.k_proj.weight  [1024, 1024]       16.9787  0.01658   1.8848    603.9
on_model.encoder.layers.11.self_attn.v_proj.weight  [1024, 1024]       14.6692  0.01433   1.2199    676.8
on_model.encoder.layers.11.self_attn.q_proj.weight  [1024, 1024]       17.2452  0.01684   2.0200    591.2
_model.encoder.layers.11.self_attn.out_proj.weight  [1024, 1024]       14.1674  0.01384   1.0924    702.1
vision_model.encoder.layers.11.mlp.fc1.weight       [4096, 1024]       31.8994  0.01558   5.6641    868.9
vision_model.encoder.layers.11.mlp.fc2.weight       [1024, 4096]       28.3883  0.01386   2.1991    900.8
on_model.encoder.layers.12.self_attn.k_proj.weight  [1024, 1024]       17.4435  0.01703   1.8913    627.2
on_model.encoder.layers.12.self_attn.v_proj.weight  [1024, 1024]       14.2397  0.01391   1.2073    697.7
on_model.encoder.layers.12.self_attn.q_proj.weight  [1024, 1024]       17.4122  0.01700   1.8893    615.5
_model.encoder.layers.12.self_attn.out_proj.weight  [1024, 1024]       13.7667  0.01344   1.1256    717.4
vision_model.encoder.layers.12.mlp.fc1.weight       [4096, 1024]       32.2181  0.01573   5.9889    861.6
vision_model.encoder.layers.12.mlp.fc2.weight       [1024, 4096]       28.2699  0.01381   2.1308    909.6
on_model.encoder.layers.13.self_attn.k_proj.weight  [1024, 1024]       17.0104  0.01661   2.0448    626.3
on_model.encoder.layers.13.self_attn.v_proj.weight  [1024, 1024]       14.7396  0.01439   1.2518    678.9
on_model.encoder.layers.13.self_attn.q_proj.weight  [1024, 1024]       17.0819  0.01668   1.9264    615.2
_model.encoder.layers.13.self_attn.out_proj.weight  [1024, 1024]       14.1694  0.01384   1.1199    714.2
vision_model.encoder.layers.13.mlp.fc1.weight       [4096, 1024]       31.6421  0.01545   6.0148    879.0
vision_model.encoder.layers.13.mlp.fc2.weight       [1024, 4096]       29.1049  0.01421   2.0250    922.2
on_model.encoder.layers.14.self_attn.k_proj.weight  [1024, 1024]       17.5037  0.01709   2.0815    633.5
on_model.encoder.layers.14.self_attn.v_proj.weight  [1024, 1024]       14.3347  0.01400   1.1217    688.9
on_model.encoder.layers.14.self_attn.q_proj.weight  [1024, 1024]       17.6811  0.01727   2.0083    609.9
_model.encoder.layers.14.self_attn.out_proj.weight  [1024, 1024]       13.9998  0.01367   1.2365    724.1
vision_model.encoder.layers.14.mlp.fc1.weight       [4096, 1024]       31.5815  0.01542   6.2445    881.5
vision_model.encoder.layers.14.mlp.fc2.weight       [1024, 4096]       30.2788  0.01479   2.1365    926.9
on_model.encoder.layers.15.self_attn.k_proj.weight  [1024, 1024]       17.6716  0.01726   1.9023    657.7
on_model.encoder.layers.15.self_attn.v_proj.weight  [1024, 1024]       14.2458  0.01391   1.0955    709.5
on_model.encoder.layers.15.self_attn.q_proj.weight  [1024, 1024]       17.6763  0.01726   1.8757    639.0
_model.encoder.layers.15.self_attn.out_proj.weight  [1024, 1024]       14.0235  0.01369   1.1093    722.6
vision_model.encoder.layers.15.mlp.fc1.weight       [4096, 1024]       31.4302  0.01535   6.1131    884.8
vision_model.encoder.layers.15.mlp.fc2.weight       [1024, 4096]       30.4680  0.01488   2.0513    931.6
on_model.encoder.layers.16.self_attn.k_proj.weight  [1024, 1024]       17.3406  0.01693   1.8225    670.2
on_model.encoder.layers.16.self_attn.v_proj.weight  [1024, 1024]       14.7329  0.01439   1.1737    708.3
on_model.encoder.layers.16.self_attn.q_proj.weight  [1024, 1024]       17.3623  0.01696   1.8747    649.2
_model.encoder.layers.16.self_attn.out_proj.weight  [1024, 1024]       14.5208  0.01418   1.1226    731.1
vision_model.encoder.layers.16.mlp.fc1.weight       [4096, 1024]       30.9220  0.01510   5.8892    892.7
vision_model.encoder.layers.16.mlp.fc2.weight       [1024, 4096]       32.0740  0.01566   1.9386    938.4
on_model.encoder.layers.17.self_attn.k_proj.weight  [1024, 1024]       17.5623  0.01715   1.7163    681.4
on_model.encoder.layers.17.self_attn.v_proj.weight  [1024, 1024]       14.7063  0.01436   1.1468    716.8
on_model.encoder.layers.17.self_attn.q_proj.weight  [1024, 1024]       17.5951  0.01718   1.7824    656.7
_model.encoder.layers.17.self_attn.out_proj.weight  [1024, 1024]       14.5419  0.01420   1.1008    723.9
vision_model.encoder.layers.17.mlp.fc1.weight       [4096, 1024]       30.7107  0.01500   5.6539    898.6
vision_model.encoder.layers.17.mlp.fc2.weight       [1024, 4096]       32.6687  0.01595   1.9743    936.2
on_model.encoder.layers.18.self_attn.k_proj.weight  [1024, 1024]       17.2542  0.01685   1.6047    694.5
on_model.encoder.layers.18.self_attn.v_proj.weight  [1024, 1024]       15.1093  0.01476   1.1787    722.1
on_model.encoder.layers.18.self_attn.q_proj.weight  [1024, 1024]       17.4201  0.01701   2.0498    665.3
_model.encoder.layers.18.self_attn.out_proj.weight  [1024, 1024]       14.9225  0.01457   1.1230    724.2
vision_model.encoder.layers.18.mlp.fc1.weight       [4096, 1024]       30.7589  0.01502   5.5648    903.0
vision_model.encoder.layers.18.mlp.fc2.weight       [1024, 4096]       32.9469  0.01609   2.0394    930.9
on_model.encoder.layers.19.self_attn.k_proj.weight  [1024, 1024]       17.3751  0.01697   1.6815    694.4
on_model.encoder.layers.19.self_attn.v_proj.weight  [1024, 1024]       15.4226  0.01506   1.2927    720.0
on_model.encoder.layers.19.self_attn.q_proj.weight  [1024, 1024]       17.4778  0.01707   1.8012    668.5
_model.encoder.layers.19.self_attn.out_proj.weight  [1024, 1024]       15.2020  0.01485   1.1774    716.2
vision_model.encoder.layers.19.mlp.fc1.weight       [4096, 1024]       31.0316  0.01515   5.3302    907.1
vision_model.encoder.layers.19.mlp.fc2.weight       [1024, 4096]       33.5712  0.01639   2.2840    924.0
on_model.encoder.layers.20.self_attn.k_proj.weight  [1024, 1024]       16.9316  0.01653   1.6881    696.4
on_model.encoder.layers.20.self_attn.v_proj.weight  [1024, 1024]       15.8929  0.01552   1.4046    717.9
on_model.encoder.layers.20.self_attn.q_proj.weight  [1024, 1024]       17.0880  0.01669   1.8350    668.7
_model.encoder.layers.20.self_attn.out_proj.weight  [1024, 1024]       15.7753  0.01541   1.3378    703.6
vision_model.encoder.layers.20.mlp.fc1.weight       [4096, 1024]       31.6061  0.01543   5.5200    909.0
vision_model.encoder.layers.20.mlp.fc2.weight       [1024, 4096]       36.0967  0.01763   2.9445    910.8
on_model.encoder.layers.21.self_attn.k_proj.weight  [1024, 1024]       16.5182  0.01613   1.7138    697.5
on_model.encoder.layers.21.self_attn.v_proj.weight  [1024, 1024]       16.3243  0.01594   1.6063    722.9
on_model.encoder.layers.21.self_attn.q_proj.weight  [1024, 1024]       17.6180  0.01721   4.2360    632.9
_model.encoder.layers.21.self_attn.out_proj.weight  [1024, 1024]       16.1730  0.01579   1.4016    700.0
vision_model.encoder.layers.21.mlp.fc1.weight       [4096, 1024]       31.9009  0.01558   5.4005    913.5
vision_model.encoder.layers.21.mlp.fc2.weight       [1024, 4096]       37.0141  0.01808   3.0204    903.5
on_model.encoder.layers.22.self_attn.k_proj.weight  [1024, 1024]       15.4876  0.01512   7.1209    551.2
on_model.encoder.layers.22.self_attn.v_proj.weight  [1024, 1024]       15.7824  0.01541   1.2318    717.9
on_model.encoder.layers.22.self_attn.q_proj.weight  [1024, 1024]       33.3101  0.03253  17.2954    256.4
_model.encoder.layers.22.self_attn.out_proj.weight  [1024, 1024]       15.4298  0.01507   1.4900    709.2
vision_model.encoder.layers.22.mlp.fc1.weight       [4096, 1024]       31.3137  0.01529   5.0053    916.6
vision_model.encoder.layers.22.mlp.fc2.weight       [1024, 4096]       35.9646  0.01756   2.6698    891.2
on_model.encoder.layers.23.self_attn.k_proj.weight  [1024, 1024]       14.7596  0.01441   1.4858    677.5
on_model.encoder.layers.23.self_attn.v_proj.weight  [1024, 1024]       18.9487  0.01850   1.2860    701.6
on_model.encoder.layers.23.self_attn.q_proj.weight  [1024, 1024]       14.9754  0.01462   1.5721    669.1
_model.encoder.layers.23.self_attn.out_proj.weight  [1024, 1024]       18.2947  0.01787   1.3815    680.1
vision_model.encoder.layers.23.mlp.fc1.weight       [4096, 1024]       31.6551  0.01546   4.1330    893.4
vision_model.encoder.layers.23.mlp.fc2.weight       [1024, 4096]       30.3341  0.01481   3.8653    633.1
dinov2_b14 — key weight matrices:
param                                               shape              norm     std      sv_max   eff_rank
---------------------------------------------------------------------------------------------------------
embeddings.position_embeddings                      [1, 1370, 768]      9.6808  0.00944      N/A      N/A
embeddings.patch_embeddings.projection.weight       [768, 3, 14, 14]    5.0943  0.00758      N/A      N/A
encoder.layer.0.attention.attention.query.weight    [768, 768]         15.0653  0.01962   8.2803    124.1
encoder.layer.0.attention.attention.key.weight      [768, 768]         14.5449  0.01894   7.0240    146.6
encoder.layer.0.attention.attention.value.weight    [768, 768]         10.8999  0.01419   2.6902    312.2
encoder.layer.0.attention.output.dense.weight       [768, 768]          9.6377  0.01255   1.4304    442.7
encoder.layer.0.mlp.fc1.weight                      [3072, 768]        26.1807  0.01704   6.9235    362.1
encoder.layer.0.mlp.fc2.weight                      [768, 3072]        22.1737  0.01444   2.6781    528.8
encoder.layer.1.attention.attention.query.weight    [768, 768]         15.5110  0.02020   3.0665    294.9
encoder.layer.1.attention.attention.key.weight      [768, 768]         15.7740  0.02054   3.6581    290.2
encoder.layer.1.attention.attention.value.weight    [768, 768]         12.0716  0.01572   1.4580    485.9
encoder.layer.1.attention.output.dense.weight       [768, 768]         11.2017  0.01459   1.3664    491.9
encoder.layer.1.mlp.fc1.weight                      [3072, 768]        23.5162  0.01531   3.9140    569.8
encoder.layer.1.mlp.fc2.weight                      [768, 3072]        20.7634  0.01352   2.1780    621.9
encoder.layer.2.attention.attention.query.weight    [768, 768]         13.6969  0.01783   2.0466    409.0
encoder.layer.2.attention.attention.key.weight      [768, 768]         13.8587  0.01805   2.1433    409.6
encoder.layer.2.attention.attention.value.weight    [768, 768]         11.3287  0.01475   1.1252    608.8
encoder.layer.2.attention.output.dense.weight       [768, 768]         10.8751  0.01416   1.1264    614.7
encoder.layer.2.mlp.fc1.weight                      [3072, 768]        22.1368  0.01441   2.6436    620.0
encoder.layer.2.mlp.fc2.weight                      [768, 3072]        19.6161  0.01277   1.8037    617.4
encoder.layer.3.attention.attention.query.weight    [768, 768]         17.3785  0.02263  10.1944    274.2
encoder.layer.3.attention.attention.key.weight      [768, 768]         14.0313  0.01827   2.6146    362.7
encoder.layer.3.attention.attention.value.weight    [768, 768]         11.7833  0.01534   1.5936    436.7
encoder.layer.3.attention.output.dense.weight       [768, 768]         10.9542  0.01426   1.1789    484.3
encoder.layer.3.mlp.fc1.weight                      [3072, 768]        22.3524  0.01455   2.4559    632.9
encoder.layer.3.mlp.fc2.weight                      [768, 3072]        18.9526  0.01234   1.7526    620.3
encoder.layer.4.attention.attention.query.weight    [768, 768]         13.6447  0.01777   1.4802    440.8
encoder.layer.4.attention.attention.key.weight      [768, 768]         13.5677  0.01767   1.8610    427.6
encoder.layer.4.attention.attention.value.weight    [768, 768]         11.5948  0.01510   1.3135    471.1
encoder.layer.4.attention.output.dense.weight       [768, 768]         10.8005  0.01406   0.9923    515.5
encoder.layer.4.mlp.fc1.weight                      [3072, 768]        22.5944  0.01471   2.1506    646.5
encoder.layer.4.mlp.fc2.weight                      [768, 3072]        18.8099  0.01225   1.7677    625.3
encoder.layer.5.attention.attention.query.weight    [768, 768]         13.7255  0.01787   1.6372    413.2
encoder.layer.5.attention.attention.key.weight      [768, 768]         13.3573  0.01739   1.6523    441.7
encoder.layer.5.attention.attention.value.weight    [768, 768]         11.7572  0.01531   1.2932    484.0
encoder.layer.5.attention.output.dense.weight       [768, 768]         10.9291  0.01423   1.0229    510.2
encoder.layer.5.mlp.fc1.weight                      [3072, 768]        23.4727  0.01528   2.3335    654.6
encoder.layer.5.mlp.fc2.weight                      [768, 3072]        18.7835  0.01223   1.9061    642.5
encoder.layer.6.attention.attention.query.weight    [768, 768]         14.0862  0.01834   1.5297    461.8
encoder.layer.6.attention.attention.key.weight      [768, 768]         13.8269  0.01800   1.9102    465.2
encoder.layer.6.attention.attention.value.weight    [768, 768]         11.5155  0.01499   1.3197    476.7
encoder.layer.6.attention.output.dense.weight       [768, 768]         10.8719  0.01416   1.1720    512.2
encoder.layer.6.mlp.fc1.weight                      [3072, 768]        23.9825  0.01561   2.5247    661.2
encoder.layer.6.mlp.fc2.weight                      [768, 3072]        18.9719  0.01235   1.7798    648.4
encoder.layer.7.attention.attention.query.weight    [768, 768]         14.4370  0.01880   1.6765    436.5
encoder.layer.7.attention.attention.key.weight      [768, 768]         13.9417  0.01815   2.3330    451.6
encoder.layer.7.attention.attention.value.weight    [768, 768]         11.7075  0.01524   1.3177    466.2
encoder.layer.7.attention.output.dense.weight       [768, 768]         11.0254  0.01436   1.2727    504.1
encoder.layer.7.mlp.fc1.weight                      [3072, 768]        23.5657  0.01534   2.2714    656.7
encoder.layer.7.mlp.fc2.weight                      [768, 3072]        18.6998  0.01217   1.7242    656.8
encoder.layer.8.attention.attention.query.weight    [768, 768]         14.4875  0.01886   1.6457    457.0
encoder.layer.8.attention.attention.key.weight      [768, 768]         14.0343  0.01827   1.9319    464.9
encoder.layer.8.attention.attention.value.weight    [768, 768]         11.7632  0.01532   1.2228    483.8
encoder.layer.8.attention.output.dense.weight       [768, 768]         11.1382  0.01450   1.7787    515.8
encoder.layer.8.mlp.fc1.weight                      [3072, 768]        24.9183  0.01622   6.6240    625.9
encoder.layer.8.mlp.fc2.weight                      [768, 3072]        19.3520  0.01260   1.6640    675.1
encoder.layer.9.attention.attention.query.weight    [768, 768]         14.1629  0.01844   1.7359    464.0
encoder.layer.9.attention.attention.key.weight      [768, 768]         13.9257  0.01813   1.9286    475.4
encoder.layer.9.attention.attention.value.weight    [768, 768]         12.0954  0.01575   1.2698    494.0
encoder.layer.9.attention.output.dense.weight       [768, 768]         11.6187  0.01513   1.4691    523.1
encoder.layer.9.mlp.fc1.weight                      [3072, 768]        24.2356  0.01578   2.8893    679.9
encoder.layer.9.mlp.fc2.weight                      [768, 3072]        20.2806  0.01320   1.7777    687.3
encoder.layer.10.attention.attention.query.weight   [768, 768]         14.1126  0.01838   1.7836    478.3
encoder.layer.10.attention.attention.key.weight     [768, 768]         13.7915  0.01796   1.8475    493.9
encoder.layer.10.attention.attention.value.weight   [768, 768]         12.5603  0.01635   1.2524    510.0
encoder.layer.10.attention.output.dense.weight      [768, 768]         12.1455  0.01581   1.5136    547.3
encoder.layer.10.mlp.fc1.weight                     [3072, 768]        24.6123  0.01602   4.2760    689.5
encoder.layer.10.mlp.fc2.weight                     [768, 3072]        21.8431  0.01422   2.4292    672.4
encoder.layer.11.attention.attention.query.weight   [768, 768]         13.8379  0.01802   1.8647    457.7
encoder.layer.11.attention.attention.key.weight     [768, 768]         13.7512  0.01791   2.4709    482.9
encoder.layer.11.attention.attention.value.weight   [768, 768]         13.2831  0.01730   1.5197    562.9
encoder.layer.11.attention.output.dense.weight      [768, 768]         13.0485  0.01699   2.6470    605.2
encoder.layer.11.mlp.fc1.weight                     [3072, 768]        24.5670  0.01599   3.6963    678.9
encoder.layer.11.mlp.fc2.weight                     [768, 3072]        22.6176  0.01473   2.3999    711.8
siglip_b16 — key weight matrices:
param                                               shape              norm     std      sv_max   eff_rank
---------------------------------------------------------------------------------------------------------
vision_model.embeddings.patch_embedding.weight      [768, 3, 16, 16]   13.1067  0.01707      N/A      N/A
vision_model.embeddings.position_embedding.weight   [576, 768]         93.0842  0.13996  67.7360     20.9
ion_model.encoder.layers.0.self_attn.k_proj.weight  [768, 768]         25.3506  0.03301   6.7521    216.8
ion_model.encoder.layers.0.self_attn.v_proj.weight  [768, 768]         11.1804  0.01456   1.4769    475.2
ion_model.encoder.layers.0.self_attn.q_proj.weight  [768, 768]         25.4386  0.03312   9.0594    216.9
n_model.encoder.layers.0.self_attn.out_proj.weight  [768, 768]         12.0512  0.01569   3.9438    442.9
vision_model.encoder.layers.0.mlp.fc1.weight        [3072, 768]        32.2698  0.02101   6.9756    495.1
vision_model.encoder.layers.0.mlp.fc2.weight        [768, 3072]        30.5055  0.01986   4.9364    513.4
ion_model.encoder.layers.1.self_attn.k_proj.weight  [768, 768]         21.2316  0.02765   4.2540    266.5
ion_model.encoder.layers.1.self_attn.v_proj.weight  [768, 768]         14.2354  0.01854   1.8996    469.1
ion_model.encoder.layers.1.self_attn.q_proj.weight  [768, 768]         21.5655  0.02808   3.8437    281.0
n_model.encoder.layers.1.self_attn.out_proj.weight  [768, 768]         13.4008  0.01745   1.8366    490.3
vision_model.encoder.layers.1.mlp.fc1.weight        [3072, 768]        33.1464  0.02158   3.4009    600.7
vision_model.encoder.layers.1.mlp.fc2.weight        [768, 3072]        27.6425  0.01800   2.5865    625.9
ion_model.encoder.layers.2.self_attn.k_proj.weight  [768, 768]         19.0127  0.02476   2.1765    406.8
ion_model.encoder.layers.2.self_attn.v_proj.weight  [768, 768]         14.1224  0.01839   1.5604    507.2
ion_model.encoder.layers.2.self_attn.q_proj.weight  [768, 768]         19.0933  0.02486   2.1423    408.5
n_model.encoder.layers.2.self_attn.out_proj.weight  [768, 768]         13.2740  0.01728   1.5355    501.0
vision_model.encoder.layers.2.mlp.fc1.weight        [3072, 768]        34.2425  0.02229   3.3269    659.9
vision_model.encoder.layers.2.mlp.fc2.weight        [768, 3072]        27.3498  0.01781   2.2268    657.0
ion_model.encoder.layers.3.self_attn.k_proj.weight  [768, 768]         16.7138  0.02176   2.1664    414.9
ion_model.encoder.layers.3.self_attn.v_proj.weight  [768, 768]         15.6367  0.02036   1.6104    490.8
ion_model.encoder.layers.3.self_attn.q_proj.weight  [768, 768]         17.9006  0.02331   2.0858    420.8
n_model.encoder.layers.3.self_attn.out_proj.weight  [768, 768]         14.5315  0.01892   1.3975    522.0
vision_model.encoder.layers.3.mlp.fc1.weight        [3072, 768]        34.7201  0.02260   4.1337    662.5
vision_model.encoder.layers.3.mlp.fc2.weight        [768, 3072]        27.9364  0.01819   2.6334    671.7
ion_model.encoder.layers.4.self_attn.k_proj.weight  [768, 768]         16.6270  0.02165   1.8333    469.6
ion_model.encoder.layers.4.self_attn.v_proj.weight  [768, 768]         15.5543  0.02025   1.6095    487.9
ion_model.encoder.layers.4.self_attn.q_proj.weight  [768, 768]         17.3636  0.02261   1.9774    446.8
n_model.encoder.layers.4.self_attn.out_proj.weight  [768, 768]         14.7915  0.01926   1.5039    516.9
vision_model.encoder.layers.4.mlp.fc1.weight        [3072, 768]        34.3811  0.02229   5.8771    661.6
vision_model.encoder.layers.4.mlp.fc2.weight        [768, 3072]        29.3207  0.01909   2.5450    680.0
ion_model.encoder.layers.5.self_attn.k_proj.weight  [768, 768]         16.8992  0.02200   2.0586    469.6
ion_model.encoder.layers.5.self_attn.v_proj.weight  [768, 768]         15.2796  0.01990   1.5355    505.4
ion_model.encoder.layers.5.self_attn.q_proj.weight  [768, 768]         17.6242  0.02295   1.9778    463.6
n_model.encoder.layers.5.self_attn.out_proj.weight  [768, 768]         14.5603  0.01896   1.3099    538.5
vision_model.encoder.layers.5.mlp.fc1.weight        [3072, 768]        33.7324  0.02196   4.8606    667.3
vision_model.encoder.layers.5.mlp.fc2.weight        [768, 3072]        30.0129  0.01954   3.0083    689.7
ion_model.encoder.layers.6.self_attn.k_proj.weight  [768, 768]         17.1745  0.02236   1.9155    473.6
ion_model.encoder.layers.6.self_attn.v_proj.weight  [768, 768]         15.3063  0.01993   1.4017    510.6
| ion_model.encoder.layers.6.self_attn.q_proj.weight [768, 768] 17.6908 0.02304 2.1205 457.5 | |
| n_model.encoder.layers.6.self_attn.out_proj.weight [768, 768] 14.6406 0.01906 1.2968 541.3 | |
| vision_model.encoder.layers.6.mlp.fc1.weight [3072, 768] 32.7128 0.02129 5.2047 674.9 | |
| vision_model.encoder.layers.6.mlp.fc2.weight [768, 3072] 32.1632 0.02094 2.6609 695.0 | |
| ion_model.encoder.layers.7.self_attn.k_proj.weight [768, 768] 17.4925 0.02278 1.6792 501.7 | |
| ion_model.encoder.layers.7.self_attn.v_proj.weight [768, 768] 15.3744 0.02002 1.2941 525.3 | |
| ion_model.encoder.layers.7.self_attn.q_proj.weight [768, 768] 17.5108 0.02280 1.8597 492.8 | |
| n_model.encoder.layers.7.self_attn.out_proj.weight [768, 768] 14.7239 0.01917 1.2469 553.5 | |
| vision_model.encoder.layers.7.mlp.fc1.weight [3072, 768] 32.4363 0.02112 5.1176 676.7 | |
| vision_model.encoder.layers.7.mlp.fc2.weight [768, 3072] 33.3117 0.02169 2.5620 703.2 | |
| ion_model.encoder.layers.8.self_attn.k_proj.weight [768, 768] 17.2571 0.02247 1.5921 514.9 | |
| ion_model.encoder.layers.8.self_attn.v_proj.weight [768, 768] 15.3517 0.01999 1.2797 528.9 | |
| ion_model.encoder.layers.8.self_attn.q_proj.weight [768, 768] 17.2066 0.02240 1.8249 500.0 | |
| n_model.encoder.layers.8.self_attn.out_proj.weight [768, 768] 14.9553 0.01947 1.1161 558.9 | |
| vision_model.encoder.layers.8.mlp.fc1.weight [3072, 768] 32.0244 0.02085 5.3609 677.1 | |
| vision_model.encoder.layers.8.mlp.fc2.weight [768, 3072] 35.6540 0.02321 2.7284 707.4 | |
| ion_model.encoder.layers.9.self_attn.k_proj.weight [768, 768] 16.6423 0.02167 1.4625 526.7 | |
| ion_model.encoder.layers.9.self_attn.v_proj.weight [768, 768] 15.9838 0.02081 1.3547 539.6 | |
| ion_model.encoder.layers.9.self_attn.q_proj.weight [768, 768] 16.5648 0.02157 1.6295 509.9 | |
| n_model.encoder.layers.9.self_attn.out_proj.weight [768, 768] 15.7026 0.02045 1.4963 555.7 | |
| vision_model.encoder.layers.9.mlp.fc1.weight [3072, 768] 33.1471 0.02157 7.5875 672.6 | |
| vision_model.encoder.layers.9.mlp.fc2.weight [768, 3072] 35.0755 0.02284 2.9263 703.1 | |
| on_model.encoder.layers.10.self_attn.k_proj.weight [768, 768] 15.8790 0.02068 1.3777 536.5 | |
| on_model.encoder.layers.10.self_attn.v_proj.weight [768, 768] 17.2492 0.02246 1.6574 542.5 | |
| on_model.encoder.layers.10.self_attn.q_proj.weight [768, 768] 15.7094 0.02046 1.5116 520.5 | |
| _model.encoder.layers.10.self_attn.out_proj.weight [768, 768] 16.9406 0.02206 1.7366 538.1 | |
| vision_model.encoder.layers.10.mlp.fc1.weight [3072, 768] 35.5715 0.02315 9.1395 677.0 | |
| vision_model.encoder.layers.10.mlp.fc2.weight [768, 3072] 37.5724 0.02446 3.6449 694.6 | |
| on_model.encoder.layers.11.self_attn.k_proj.weight [768, 768] 16.1542 0.02103 1.7355 529.3 | |
| on_model.encoder.layers.11.self_attn.v_proj.weight [768, 768] 18.5693 0.02418 2.1755 525.9 | |
| on_model.encoder.layers.11.self_attn.q_proj.weight [768, 768] 15.6457 0.02037 1.7939 517.4 | |
| _model.encoder.layers.11.self_attn.out_proj.weight [768, 768] 18.4060 0.02397 2.5140 515.2 | |
| vision_model.encoder.layers.11.mlp.fc1.weight [3072, 768] 35.7601 0.02328 8.1755 683.3 | |
| vision_model.encoder.layers.11.mlp.fc2.weight [768, 3072] 38.3973 0.02500 5.6203 628.3 | |
| vision_model.head.attention.in_proj_weight [2304, 768] 19.4668 0.01463 3.2690 654.7 | |
| vision_model.head.attention.out_proj.weight [768, 768] 16.4625 0.02144 1.2549 673.8 | |
| vision_model.head.mlp.fc1.weight [3072, 768] 50.3037 0.03239 12.1273 598.4 | |
| vision_model.head.mlp.fc2.weight [768, 3072] 32.4487 0.02113 4.7967 605.0 | |
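The sv_max and eff_rank columns above come from an SVD of each (flattened) weight matrix. The log does not print the effective-rank formula it uses; a minimal sketch consistent with these columns, assuming the entropy-based effective rank (exp of the Shannon entropy of the normalized singular-value distribution, as in Roy & Vetterli), would be:

```python
import numpy as np

def spectral_stats(W):
    """Per-matrix stats as tabulated above: Frobenius norm, elementwise std,
    largest singular value, and effective rank. The entropy-based eff_rank
    definition is an assumption; the script's exact choice is not shown."""
    W2 = W.reshape(W.shape[0], -1)           # conv filters flatten to 2-D
    s = np.linalg.svd(W2, compute_uv=False)
    p = s / s.sum()                          # singular values as a distribution
    eff_rank = np.exp(-(p * np.log(p + 1e-12)).sum())
    return {
        "norm": np.linalg.norm(W2),
        "std": W2.std(),
        "sv_max": s[0],
        "eff_rank": eff_rank,
    }
```

For a matrix with k equal nonzero singular values this eff_rank is exactly k, which matches the intuition that a 768×768 matrix with a flat spectrum reports eff_rank near 768.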
=================================================================
SCAN 4: PATCH EMBEDDING WEIGHTS
=================================================================
clip_l14: vision_model.embeddings.patch_embedding.weight
  Shape: [1024, 3, 14, 14] = 1024 filters × 3 channels × 14×14 kernel
  Spectral: sv_max=1.6538  sv_min=0.003914  eff_rank=240.4/588
  Norm: 13.0285  Mean: -0.000047  Std: 0.016790
  Filter norms: mean=0.3947  std=0.0998  min=0.0126  max=0.6173
dinov2_b14: embeddings.patch_embeddings.projection.weight
  Shape: [768, 3, 14, 14] = 768 filters × 3 channels × 14×14 kernel
  Spectral: sv_max=0.5876  sv_min=0.001076  eff_rank=238.0/588
  Norm: 5.0943  Mean: 0.000002  Std: 0.007581
  Filter norms: mean=0.1724  std=0.0639  min=0.0037  max=0.2703
siglip_b16: vision_model.embeddings.patch_embedding.weight
  Shape: [768, 3, 16, 16] = 768 filters × 3 channels × 16×16 kernel
  Spectral: sv_max=1.4635  sv_min=0.000005  eff_rank=306.1/768
  Norm: 13.1067  Mean: -0.000114  Std: 0.017066
  Filter norms: mean=0.4322  std=0.1923  min=0.0375  max=0.6987
Patch embedding Procrustes alignment:
  clip_l14 × dinov2_b14:   raw_cos=0.0005   (d_min=768, d_feat=588)
  clip_l14 × siglip_b16:   raw_cos=0.0002   (d_min=768, d_feat=588)
  dinov2_b14 × siglip_b16: raw_cos=-0.0013  (d_min=768, d_feat=588)
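The Procrustes-style comparisons in this log (raw_cos here, and the pre/POST columns in Scans 13-14) measure agreement between two sets of row vectors before and after the best orthogonal map between them. A minimal sketch of that metric, assuming row-normalized inputs and classic orthogonal Procrustes via SVD (`procrustes_cos` is an illustrative name, not the script's function):

```python
import numpy as np

def procrustes_cos(A, B):
    """Mean row-wise cosine between A and B before ('pre') and after ('POST')
    aligning B to A with the best orthogonal matrix R, found by solving
    min_R ||B R - A||_F over orthogonal R (classic orthogonal Procrustes).
    Row normalization is an assumed convention."""
    A = A / (np.linalg.norm(A, axis=1, keepdims=True) + 1e-8)
    B = B / (np.linalg.norm(B, axis=1, keepdims=True) + 1e-8)
    pre = (A * B).sum(axis=1).mean()
    U, _, Vt = np.linalg.svd(B.T @ A)   # R = U @ Vt maximizes trace((B R)^T A)
    post = (A * (B @ U @ Vt)).sum(axis=1).mean()
    return pre, post
```

If B is an orthogonal rotation of A, pre is near zero but POST recovers 1.0, which is exactly the pattern the cross-model activation scans report.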
=================================================================
SCAN 5: ATTENTION HEAD GEOMETRY
=================================================================
clip_l14 (24 layers):
  layer   Q_norm   K_norm   V_norm   QK_cos   QV_cos   KV_cos
      0   10.327   10.191   10.465  -0.0050   0.4956   0.0016
      1   13.829   12.888   13.290   0.0010   0.3957   0.0059
      2   15.006   13.236   14.779  -0.0013   0.4683   0.0019
    ...
     12   17.444   14.240   17.412   0.0023   0.2518   0.0017
     22   15.488   15.782   33.310  -0.0000   0.1002   0.0010
     23   14.760   18.949   14.975  -0.0015   0.2212  -0.0026
dinov2_b14 (12 layers):
  layer   Q_norm   K_norm   V_norm   QK_cos   QV_cos   KV_cos
      0   15.065   14.545   10.900   0.5069   0.0064  -0.0028
      1   15.511   15.774   12.072   0.4234   0.0012   0.0031
      2   13.697   13.859   11.329   0.5255  -0.0000   0.0004
    ...
      6   14.086   13.827   11.516   0.1433  -0.0004  -0.0013
     10   14.113   13.791   12.560   0.1317  -0.0011  -0.0007
     11   13.838   13.751   13.283   0.2065  -0.0000   0.0021
siglip_b16 (12 layers):
  layer   Q_norm   K_norm   V_norm   QK_cos   QV_cos   KV_cos
      0   25.351   11.180   25.439   0.0025   0.7388   0.0022
      1   21.232   14.235   21.566  -0.0022   0.4919  -0.0048
      2   19.013   14.122   19.093   0.0000   0.2704   0.0009
    ...
      6   17.175   15.306   17.691  -0.0027   0.2154   0.0000
     10   15.879   17.249   15.709   0.0006   0.1501  -0.0001
     11   16.154   18.569   15.646   0.0010   0.1470  -0.0006
=================================================================
SCAN 6: CROSS-MODEL WEIGHT ALIGNMENT
=================================================================
Cross-model Q weight cosine at equivalent depth fractions
(clip_l14: 24 layers, dinov2_b14: 12 layers, siglip_b16: 12 layers):
  depth   clip×dino   clip×siglip   dino×siglip
     0%      0.0010       -0.0014       -0.0005
    25%     -0.0006        0.0012       -0.0000
    50%     -0.0014       -0.0007       -0.0009
    75%      0.0004       -0.0010       -0.0004
   100%     -0.0004       -0.0006       -0.0006
=================================================================
SCAN 7: MLP WEIGHT SPECTRUM
=================================================================
clip_l14 MLPs (48 weight matrices):
  mlp.fc1.weight [4096, 1024]  eff_rank=281.7/1024  sv_max=11.075  sv_10=1.9901
  mlp.fc2.weight [1024, 4096]  eff_rank=455.7/1024  sv_max=5.235   sv_10=2.0529
  mlp.fc1.weight [4096, 1024]  eff_rank=505.0/1024  sv_max=9.280   sv_10=2.4024
  mlp.fc2.weight [1024, 4096]  eff_rank=687.7/1024  sv_max=3.756   sv_10=1.5893
  mlp.fc1.weight [4096, 1024]  eff_rank=565.7/1024  sv_max=9.483   sv_10=2.2923
  mlp.fc2.weight [1024, 4096]  eff_rank=707.4/1024  sv_max=3.850   sv_10=1.7550
  ... (42 more)
dinov2_b14 MLPs (24 weight matrices):
  mlp.fc1.weight [3072, 768]   eff_rank=362.1/768   sv_max=6.923   sv_10=2.3532
  mlp.fc2.weight [768, 3072]   eff_rank=528.8/768   sv_max=2.678   sv_10=1.7801
  mlp.fc1.weight [3072, 768]   eff_rank=569.8/768   sv_max=3.914   sv_10=1.8138
  mlp.fc2.weight [768, 3072]   eff_rank=621.9/768   sv_max=2.178   sv_10=1.4481
  mlp.fc1.weight [3072, 768]   eff_rank=620.0/768   sv_max=2.644   sv_10=1.6269
  mlp.fc2.weight [768, 3072]   eff_rank=617.4/768   sv_max=1.804   sv_10=1.5185
  ... (18 more)
siglip_b16 MLPs (26 weight matrices):
  mlp.fc1.weight [3072, 768]   eff_rank=495.1/768   sv_max=6.976   sv_10=2.8588
  mlp.fc2.weight [768, 3072]   eff_rank=513.4/768   sv_max=4.936   sv_10=2.9769
  mlp.fc1.weight [3072, 768]   eff_rank=600.7/768   sv_max=3.401   sv_10=2.4675
  mlp.fc2.weight [768, 3072]   eff_rank=625.9/768   sv_max=2.586   sv_10=1.9950
  mlp.fc1.weight [3072, 768]   eff_rank=659.9/768   sv_max=3.327   sv_10=2.3830
  mlp.fc2.weight [768, 3072]   eff_rank=657.0/768   sv_max=2.227   sv_10=1.9165
  ... (20 more)
=================================================================
SCAN 8: POSITION EMBEDDINGS
=================================================================
clip_l14: vision_model.embeddings.position_embedding.weight
  Shape: [257, 1024]
  Norm: 11.4993  Mean: -0.013006  Std: 0.018257
  Self-sim: diag_mean=1.0000  off_diag_mean=0.8616
  Adjacent pos cos: mean=0.9259
  Spectral: eff_rank=22.3/257  sv1%=86.5%
dinov2_b14: embeddings.position_embeddings
  Shape: [1, 1370, 768]
  Norm: 9.6808  Mean: -0.000044  Std: 0.009438
  Self-sim: diag_mean=1.0000  off_diag_mean=0.0899
  Adjacent pos cos: mean=0.9136
  Spectral: eff_rank=47.1/768  sv1%=12.2%
siglip_b16: vision_model.embeddings.position_embedding.weight
  Shape: [576, 768]
  Norm: 93.0842  Mean: -0.000226  Std: 0.139955
  Self-sim: diag_mean=1.0000  off_diag_mean=0.0637
  Adjacent pos cos: mean=0.8524
  Spectral: eff_rank=20.9/576  sv1%=53.0%
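The position-embedding stats combine two simple measures: the mean cosine between embeddings of adjacent positions, and the share of singular-value mass captured by the top singular value (sv1%). A sketch under assumptions (rows are positions, with DINOv2's [1, 1370, 768] tensor squeezed to 2-D first; whether sv1% uses σ or σ² is a guess):

```python
import numpy as np

def pos_embed_stats(P):
    """Adjacent-position cosine and top-singular-value share for a
    position-embedding table P of shape [positions, dim]. Both the row
    convention and the sv1% definition (sigma_1 / sum sigma_i) are
    assumptions about the script's choices."""
    Pn = P / np.linalg.norm(P, axis=1, keepdims=True)
    adj_cos = (Pn[:-1] * Pn[1:]).sum(axis=1).mean()
    s = np.linalg.svd(P, compute_uv=False)
    sv1_pct = 100.0 * s[0] / s.sum()
    return adj_cos, sv1_pct
```

A high sv1% with low eff_rank (as for clip_l14 above) indicates the table is dominated by one direction, often a shared mean offset across positions.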
=================================================================
SCAN 9: LAYERNORM WEIGHT/BIAS PATTERNS
=================================================================
clip_l14 (50 LayerNorms):
  vision_model.pre_layrnorm  w: mean=0.4305 std=0.4963  b: mean=-0.00095 std=0.0838
  0.layer_norm1              w: mean=0.3185 std=0.3345  b: mean=-0.00761 std=0.1239
  0.layer_norm2              w: mean=0.8141 std=0.5963  b: mean=0.00049 std=0.1555
  1.layer_norm1              w: mean=0.4587 std=0.2367  b: mean=-0.00064 std=0.1305
  FINAL: vision_model.post_layernorm.weight
    weight: mean=1.0044 std=0.0635 min=0.6363 max=1.8433
    bias:   mean=0.15815 std=0.1796
dinov2_b14 (25 LayerNorms):
  0.norm1  w: mean=0.6640 std=0.6603  b: mean=0.00617 std=0.2407
  0.norm2  w: mean=1.4833 std=0.8402  b: mean=0.03759 std=0.3599
  1.norm1  w: mean=1.2084 std=0.7074  b: mean=0.02352 std=0.3012
  1.norm2  w: mean=1.3366 std=0.3585  b: mean=0.01015 std=0.3414
  FINAL: layernorm.weight
    weight: mean=2.0223 std=1.2662 min=-0.0481 max=15.5085
    bias:   mean=0.00439 std=0.4535
siglip_b16 (26 LayerNorms):
  0.layer_norm1  w: mean=0.4845 std=0.2820  b: mean=0.00176 std=0.1890
  0.layer_norm2  w: mean=1.6044 std=1.1956  b: mean=-0.00058 std=0.3523
  1.layer_norm1  w: mean=0.7414 std=0.2573  b: mean=0.02384 std=0.2548
  1.layer_norm2  w: mean=1.0720 std=0.2015  b: mean=0.02144 std=0.2104
  FINAL: vision_model.head.layernorm.weight
    weight: mean=0.9334 std=0.1508 min=0.4374 max=2.4978
    bias:   mean=0.14603 std=0.3553
=================================================================
SCAN 10: PENTACHORON CV ON WEIGHT GEOMETRY
=================================================================
Patch embedding filter CV (rows = output filters):
  clip_l14    filters=1024  CV=0.0484
  dinov2_b14  filters=768   CV=0.0444
  siglip_b16  filters=768   CV=0.0398
QKV weight row CV per layer:
  model       layer     Q_cv     K_cv     V_cv  QK_diff
  clip_l14        0   0.2023   0.0364   0.2803   0.1659
  clip_l14        1   0.1546   0.0273   0.1394   0.1273
  clip_l14      ...
  clip_l14       12   0.0290   0.0236   0.0318   0.0054
  clip_l14       22   0.1283   0.0206   0.1969   0.1077
  clip_l14       23   0.0259   0.0203   0.0248   0.0056
  dinov2_b14      0   0.2148   0.1172   0.0515   0.0977
  dinov2_b14      1   0.0682   0.0656   0.0254   0.0026
  dinov2_b14    ...
  dinov2_b14      6   0.0371   0.0330   0.0329   0.0042
  dinov2_b14     10   0.0360   0.0314   0.0273   0.0045
  dinov2_b14     11   0.0357   0.0358   0.0222   0.0001
  siglip_b16      0   0.1282   0.0318   0.1980   0.0964
  siglip_b16      1   0.0631   0.0328   0.0528   0.0303
  siglip_b16    ...
  siglip_b16      6   0.0325   0.0298   0.0366   0.0027
  siglip_b16     10   0.0267   0.0235   0.0292   0.0032
  siglip_b16     11   0.0252   0.0284   0.0264   0.0032
MLP weight row CV (first and last layers):
  clip_l14    first_mlp CV=1.2233  last_mlp CV=0.0276
  dinov2_b14  first_mlp CV=0.0830  last_mlp CV=0.0126
  siglip_b16  first_mlp CV=0.0635  last_mlp CV=0.0248
Position embedding CV:
  clip_l14    positions=257   CV=0.3435
  dinov2_b14  positions=1370  CV=0.3179
  siglip_b16  positions=576   CV=0.3001
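The CV figures in Scans 10-11 are coefficients of variation over per-row norms: std of the row norms divided by their mean. A minimal sketch, assuming rows correspond to output filters/positions as the "rows = output filters" note above suggests:

```python
import numpy as np

def row_norm_cv(W):
    """Coefficient of variation (std / mean) of the row norms of a weight
    or embedding matrix. Conv filters flatten to one row per output filter.
    The row grouping is an assumption based on the log's own note."""
    norms = np.linalg.norm(W.reshape(W.shape[0], -1), axis=1)
    return norms.std() / norms.mean()
```

CV near 0 means all rows carry similar energy (a flat, homogeneous layer); CV well above 0.1, as in the first clip_l14 MLP, means a few rows dominate.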
=================================================================
SCAN 11: CROSS-MODEL CV BAND COMPARISON
=================================================================
Q/K/V weight CV distribution per model:
  clip_l14    Q: mean=0.0586 std=0.0535 range=[0.0233, 0.2490]
              K: mean=0.0247 std=0.0039 range=[0.0189, 0.0380]
              V: mean=0.0621 std=0.0616 range=[0.0238, 0.2787]
              In CV band [0.18-0.25]: Q=1/24 K=0/24 V=0/24
  dinov2_b14  Q: mean=0.0582 std=0.0464 range=[0.0324, 0.2018]
              K: mean=0.0507 std=0.0262 range=[0.0323, 0.1322]
              V: mean=0.0322 std=0.0121 range=[0.0209, 0.0703]
              In CV band [0.18-0.25]: Q=1/12 K=0/12 V=0/12
  siglip_b16  Q: mean=0.0437 std=0.0324 range=[0.0226, 0.1422]
              K: mean=0.0272 std=0.0026 range=[0.0235, 0.0325]
              V: mean=0.0507 std=0.0480 range=[0.0294, 0.2063]
              In CV band [0.18-0.25]: Q=0/12 K=0/12 V=1/12
Cross-model concatenated Q weight CV (same-depth rows mixed):
  clip_l14 × dinov2_b              mean=0.0541 std=0.0425 range=[0.0259, 0.1762]
  clip_l14 × siglip_b              mean=0.0483 std=0.0352 range=[0.0224, 0.1451]
  dinov2_b × siglip_b              mean=0.0392 std=0.0241 range=[0.0259, 0.1150]
  clip_l14 × dinov2_b × siglip_b   mean=0.0419 std=0.0236 range=[0.0220, 0.1077]
=================================================================
WEIGHT ANALYSIS COMPLETE — STARTING ACTIVATION ANALYSIS
=================================================================
=================================================================
SCAN 12: PER-LAYER ACTIVATION EXTRACTION
=================================================================
Streaming images from rafaelpadilla/coco2017...
Resolving data files: 100% 39/39 [00:00<00:00, 5093.50it/s]
Loading weights: 100% 391/391 [00:00<00:00, 1154.38it/s, Materializing param=vision_model.pre_layrnorm.weight]
CLIPVisionModel LOAD REPORT from: openai/clip-vit-large-patch14
Key                                                          | Status
-------------------------------------------------------------+-----------
text_model.encoder.layers.{0...11}.self_attn.k_proj.bias     | UNEXPECTED
text_model.encoder.layers.{0...11}.mlp.fc1.bias              | UNEXPECTED
text_projection.weight                                       | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.out_proj.bias   | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.out_proj.weight | UNEXPECTED
text_model.encoder.layers.{0...11}.layer_norm1.bias          | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.k_proj.weight   | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.q_proj.bias     | UNEXPECTED
text_model.encoder.layers.{0...11}.mlp.fc1.weight            | UNEXPECTED
text_model.encoder.layers.{0...11}.layer_norm2.weight        | UNEXPECTED
text_model.encoder.layers.{0...11}.mlp.fc2.bias              | UNEXPECTED
text_model.encoder.layers.{0...11}.layer_norm1.weight        | UNEXPECTED
text_model.final_layer_norm.weight                           | UNEXPECTED
text_model.encoder.layers.{0...11}.layer_norm2.bias          | UNEXPECTED
text_model.encoder.layers.{0...11}.mlp.fc2.weight            | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.v_proj.bias     | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.v_proj.weight   | UNEXPECTED
logit_scale                                                  | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.q_proj.weight   | UNEXPECTED
text_model.embeddings.position_embedding.weight              | UNEXPECTED
text_model.final_layer_norm.bias                             | UNEXPECTED
vision_model.embeddings.position_ids                         | UNEXPECTED
text_model.embeddings.position_ids                           | UNEXPECTED
visual_projection.weight                                     | UNEXPECTED
text_model.embeddings.token_embedding.weight                 | UNEXPECTED
Notes:
- UNEXPECTED: can be ignored when loading from a different task/architecture; not OK if you expect an identical arch.
Loading weights: 100% 223/223 [00:00<00:00, 1104.50it/s, Materializing param=layernorm.weight]
Loading weights: 100% 208/208 [00:00<00:00, 1008.25it/s, Materializing param=vision_model.post_layernorm.weight]
SiglipVisionModel LOAD REPORT from: google/siglip-base-patch16-384
Key                                                          | Status
-------------------------------------------------------------+-----------
text_model.encoder.layers.{0...11}.self_attn.k_proj.bias     | UNEXPECTED
text_model.encoder.layers.{0...11}.mlp.fc1.bias              | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.out_proj.bias   | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.out_proj.weight | UNEXPECTED
text_model.encoder.layers.{0...11}.layer_norm1.bias          | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.k_proj.weight   | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.q_proj.bias     | UNEXPECTED
text_model.encoder.layers.{0...11}.mlp.fc1.weight            | UNEXPECTED
text_model.encoder.layers.{0...11}.layer_norm2.weight        | UNEXPECTED
text_model.encoder.layers.{0...11}.mlp.fc2.bias              | UNEXPECTED
text_model.encoder.layers.{0...11}.layer_norm1.weight        | UNEXPECTED
text_model.final_layer_norm.weight                           | UNEXPECTED
text_model.encoder.layers.{0...11}.layer_norm2.bias          | UNEXPECTED
text_model.encoder.layers.{0...11}.mlp.fc2.weight            | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.v_proj.bias     | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.v_proj.weight   | UNEXPECTED
logit_scale                                                  | UNEXPECTED
text_model.encoder.layers.{0...11}.self_attn.q_proj.weight   | UNEXPECTED
text_model.embeddings.position_embedding.weight              | UNEXPECTED
text_model.final_layer_norm.bias                             | UNEXPECTED
text_model.head.bias                                         | UNEXPECTED
text_model.head.weight                                       | UNEXPECTED
text_model.embeddings.token_embedding.weight                 | UNEXPECTED
logit_bias                                                   | UNEXPECTED
Notes:
- UNEXPECTED: can be ignored when loading from a different task/architecture; not OK if you expect an identical arch.
Captured 256 images (streamed)
  clip_l14:   25 layers, d=1024, N=256
  dinov2_b14: 13 layers, d=768,  N=256
  siglip_b16: 13 layers, d=768,  N=256
=================================================================
SCAN 13: WITHIN-MODEL DEPTH PROGRESSION
=================================================================
Layer-to-layer Procrustes within each model (layer N vs layer N+1):
clip_l14 (25 layers):
  L→L+1   pre_cos  post_cos   sv_min   sv_max
   0→1     0.0032    0.0032   0.0000   0.0000
   1→2     0.7617    0.9290   0.0000   1.9914
   2→3     0.7570    0.9819   0.0000   1.4528
  ...
  12→13    0.7829    1.0000   0.0000   1.0040
  22→23    0.8049    1.0000   0.0000   1.0040
  23→24    0.8713    1.0000   0.0000   1.0040
dinov2_b14 (13 layers):
  L→L+1   pre_cos  post_cos   sv_min   sv_max
   0→1    -0.0045    0.0336   0.0000   8.5919
   1→2     0.5373    0.7513   0.0000   6.8159
   2→3     0.7910    0.9438   0.0000   4.8996
  ...
   6→7     0.4866    0.9833   0.0000   1.1664
  10→11    0.5688    1.0000   0.0000   1.0041
  11→12    0.2910    1.0000   0.0000   1.0041
siglip_b16 (13 layers):
  L→L+1   pre_cos  post_cos   sv_min   sv_max
   0→1     0.4324    0.9515   0.0000   3.0734
   1→2     0.6613    0.9999   0.0000   1.0977
   2→3     0.7007    1.0000   0.0000   1.0043
  ...
   6→7     0.7311    1.0000   0.0000   1.0041
  10→11    0.7719    1.0000   0.0000   1.0040
  11→12    0.6503    1.0000   0.0000   1.0040
=================================================================
SCAN 14: CROSS-MODEL PROCRUSTES (per depth fraction)
=================================================================
Layers: clip=25  dino=13  siglip=13
  frac    clip×dino          clip×sig           dino×sig
           pre      POST      pre      POST      pre      POST
  ------------------------------------------------------------
    0%    0.0003   1.0000   -0.0033   0.0565    0.0005   0.0565
   10%   -0.0121   0.4873   -0.0077   0.9434    0.0138   0.4668
   20%    0.0147   0.6777   -0.0027   0.9736   -0.0066   0.6628
   30%   -0.0185   0.6158   -0.0049   0.9891   -0.0032   0.6149
   40%   -0.0052   0.7267   -0.0078   0.9906   -0.0028   0.7246
   50%    0.0126   0.8709   -0.0007   0.9910   -0.0107   0.8671
   60%   -0.0027   0.9491   -0.0013   0.9926   -0.0062   0.9442
   70%    0.0008   0.9575    0.0098   0.9932   -0.0041   0.9522
   80%    0.0023   0.9746    0.0226   0.9963   -0.0069   0.9716
   90%    0.0080   0.9878    0.0069   0.9954    0.0077   0.9836
  100%   -0.0060   0.9996   -0.0069   0.9990   -0.0001   0.9986
Final output (pooled, L2-normed) Procrustes:
  clip_l14 × dinov2_b14:   pre=0.0011   POST=1.0000  sv_range=[0.0000, 1.0039]
  clip_l14 × siglip_b16:   pre=0.0043   POST=0.9997  sv_range=[0.0000, 1.0908]
  dinov2_b14 × siglip_b16: pre=-0.0051  POST=0.9997  sv_range=[0.0000, 1.0908]
=================================================================
SCAN 15: ACTIVATION CV PER LAYER
=================================================================
  model       layer       CV   norm_μ   norm_σ  eff_dim
  -------------------------------------------------------
  clip_l14        0   0.0000   14.500   0.0000      1.0
  clip_l14        1   0.0251   13.234   0.0430     27.2
  clip_l14      ...
  clip_l14        4   0.3692   12.597   0.0534     51.5
  clip_l14        8   0.3492   11.341   0.1047     88.2
  clip_l14       12   0.2947    7.901   0.2503    124.8
  clip_l14       16   0.3086    6.690   0.2409    131.7
  clip_l14       20   0.2089   12.478   0.4620    145.1
  clip_l14       23   0.1700   17.310   0.8507    154.2
  clip_l14       24   0.1157   16.085   1.0605    155.4
  dinov2_b14      0   0.0000    0.799   0.0000      0.8
  dinov2_b14      1   0.5707    2.022   0.0211     23.2
  dinov2_b14    ...
  dinov2_b14      4   0.3945    2.703   0.0420     69.8
  dinov2_b14      6   0.4672    3.437   0.1381     66.1
  dinov2_b14      8   0.3323    9.531   0.2926     91.4
  dinov2_b14     11   0.4401   24.276   2.4562    103.8
  dinov2_b14     12   0.1060   39.796   5.1513    158.0
  siglip_b16      0   0.9458    4.681   1.2630     55.1
  siglip_b16      1   0.5489   10.083   1.5282     88.1
  siglip_b16    ...
  siglip_b16      4   0.2990    9.593   0.5404    117.7
  siglip_b16      6   0.2661    8.269   0.6763    119.6
  siglip_b16      8   0.2456    9.616   1.1296    125.7
  siglip_b16     11   0.2992   21.631   3.8725    136.2
  siglip_b16     12   0.4064   37.213   6.2822    132.1
=================================================================
SCAN 16: PER-IMAGE AGREEMENT ANALYSIS
=================================================================
clip_l14 × dinov2_b14:
  Raw per-image cos: mean=-0.0030  std=0.0346  min=-0.0936  max=0.0807
  Distribution: [('0.0-0.1', 123)]
clip_l14 × siglip_b16:
  Raw per-image cos: mean=0.0329  std=0.0312  min=-0.0503  max=0.1347
  Distribution: [('0.0-0.1', 220), ('0.1-0.2', 3)]
dinov2_b14 × siglip_b16:
  Raw per-image cos: mean=0.0030  std=0.0355  min=-0.0853  max=0.0960
  Distribution: [('0.0-0.1', 145)]
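The "raw per-image cos" statistic is the cosine between two models' pooled features for the same image, with no Procrustes alignment applied. A minimal sketch, assuming the features were already brought to a common dimension upstream (that step is not shown in the log) and that `per_image_cos` is an illustrative name rather than the script's:

```python
import numpy as np

def per_image_cos(Fa, Fb):
    """Summary stats of the per-image cosine between two models' pooled
    features Fa, Fb of shape [n_images, dim]. No orthogonal alignment is
    applied, which is why these values sit near zero even when Scan 14's
    POST columns are near 1.0."""
    Fa = Fa / np.linalg.norm(Fa, axis=1, keepdims=True)
    Fb = Fb / np.linalg.norm(Fb, axis=1, keepdims=True)
    cos = (Fa * Fb).sum(axis=1)
    return cos.mean(), cos.std(), cos.min(), cos.max()
```

Near-zero means here are consistent with the near-zero "pre" Procrustes cosines above: the models agree up to a rotation, not in raw coordinates.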
=================================================================
FULL ANALYSIS COMPLETE
=================================================================