juddddd committed
Commit c545634 · verified · 1 Parent(s): d95a3f5

Upload folder using huggingface_hub

v2_corrected_evaluation/IDENTITY_V2_REPORT_20260122_144008.md ADDED
@@ -0,0 +1,89 @@
+ # Identity Reconstruction V2: Fixed Evaluation
+
+ **Date:** 2026-01-22T14:40:08.858456
+
+ ## Critical Fix
+
+ The previous version had inverted PASS/FAIL logic:
+ - ❌ Old: "PASS if curve is steep" (shape-based)
+ - ✓ New: "PASS if preserved_rate >= 50% at K >= L/2" (performance-based)
+
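The performance-based rule can be sketched in a few lines. This is a minimal illustration, not the code in `identity_reconstruction_experiment_v2.py`: the function name `passes` and the reading "every evaluated K at or beyond L/2 must reach the threshold" are assumptions, since the report does not say whether the rule applies to all such K or only one. The rates are copied from the Results tables.

```python
def passes(preserved_rate_by_k, seq_len, threshold=0.5):
    """Hypothetical PASS rule: every evaluated K >= seq_len / 2 must have a
    preserved rate of at least `threshold` (assumed interpretation)."""
    long_range = [rate for k, rate in preserved_rate_by_k.items() if k >= seq_len / 2]
    # With no long-range measurement there is nothing to pass on.
    return bool(long_range) and all(rate >= threshold for rate in long_range)


# Preserved rates taken from the two result tables in this report.
collapsed = {0: 1.0, 64: 0.2, 128: 0.0, 256: 0.2, 512: 0.2,
             1024: 0.0, 2048: 0.0, 4096: 0.2, 8192: 0.2}
regularized = {0: 1.0, 64: 1.0, 128: 1.0, 256: 1.0, 512: 0.4,
               1024: 0.0, 2048: 0.0, 4096: 0.0, 8192: 0.0}

for name, rates in [("collapsed", collapsed), ("regularized", regularized)]:
    print(name, "PASS" if passes(rates, seq_len=4096) else "FAIL")
```

Under this reading both checkpoints fail, which matches the verdicts reported in the Results section.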
+ ## Results
+
+ ### Collapsed (No Regularization)
+
+ | Checkpoint | c6a8102da1ad8c77 |
+ |------------|---|
+ | tau range | [2.4, 9.8] |
+ | tau mean | 6.6 |
+
+ | K | Preserved Rate | Mean Retention |
+ |---|----------------|----------------|
+ | 0 | 100% ✓ | 100.0% |
+ | 64 | 20% ✗ | 32.2% |
+ | 128 | 0% ✗ | 15.1% |
+ | 256 | 20% ✗ | 34.6% |
+ | 512 | 20% ✗ | 28.7% |
+ | 1,024 | 0% ✗ | 8.9% |
+ | 2,048 | 0% ✗ | 23.7% |
+ | 4,096 | 20% ✗ | 29.6% |
+ | 8,192 | 20% ✗ | 27.9% |
+
32
+ **Verdict:** FAIL
33
+ **Basin Width:** 0 (0.0% of L=4096)
34
+ **Explanation:** Identity collapses by K=0. No meaningful long-range coherence.
35
+
36
+ ### Regularized
37
+
38
+ | Checkpoint | c3ca1ce88b2083bf |
39
+ |------------|---|
40
+ | tau range | [0.5, 6931.1] |
41
+ | tau mean | 433.9 |
42
+
43
+ | K | Preserved Rate | Mean Retention |
44
+ |---|----------------|----------------|
45
+ | 0 | 100% βœ“ | 100.0% |
46
+ | 64 | 100% βœ“ | 88.2% |
47
+ | 128 | 100% βœ“ | 77.6% |
48
+ | 256 | 100% βœ“ | 67.4% |
49
+ | 512 | 40% βœ— | 49.0% |
50
+ | 1,024 | 0% βœ— | 29.8% |
51
+ | 2,048 | 0% βœ— | 19.5% |
52
+ | 4,096 | 0% βœ— | 15.5% |
53
+ | 8,192 | 0% βœ— | 17.6% |
54
+
55
+ **Verdict:** FAIL
56
+ **Basin Width:** 256 (6.2% of L=4096)
57
+ **Explanation:** Identity collapses by K=256. No meaningful long-range coherence.
58
+
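The reported basin widths are consistent with taking the largest K that is still reached by an unbroken run of passing rows starting at K=0. The sketch below assumes that definition; the report does not state it, and `basin_width` is a hypothetical helper rather than the experiment script's function.

```python
def basin_width(preserved_rate_by_k, threshold=0.5):
    """Largest K such that every evaluated K' <= K has preserved rate >= threshold
    (assumed definition, inferred from the tables)."""
    width = 0
    for k in sorted(preserved_rate_by_k):
        if preserved_rate_by_k[k] < threshold:
            break  # first failing row ends the basin
        width = k
    return width


# Preserved rates taken from the two result tables in this report.
collapsed = {0: 1.0, 64: 0.2, 128: 0.0, 256: 0.2, 512: 0.2,
             1024: 0.0, 2048: 0.0, 4096: 0.2, 8192: 0.2}
regularized = {0: 1.0, 64: 1.0, 128: 1.0, 256: 1.0, 512: 0.4,
               1024: 0.0, 2048: 0.0, 4096: 0.0, 8192: 0.0}

seq_len = 4096
for name, rates in [("collapsed", collapsed), ("regularized", regularized)]:
    width = basin_width(rates)
    print(f"{name}: basin width {width} ({width / seq_len:.2%} of L={seq_len})")
```

Note that 256 / 4096 is exactly 6.25%, which the report displays as 6.2% after formatting to one decimal place.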
59
+ ## Comparison
60
+
61
+ | Metric | Collapsed | Regularized |
62
+ |--------|-----------|-------------|
63
+ | Verdict | FAIL | FAIL |
64
+ | Basin Width | 0 | 256 |
65
+ | Basin Width Ratio | 0.0% | 6.2% |
66
+
67
+ **Improvement:** YES
68
+ **Improvement Factor:** infx
69
+
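A ratio against a zero baseline has no finite value, so the absolute gain in basin width is the more informative number here. A minimal sketch, assuming the factor is simply the new width divided by the baseline width; `improvement_factor` and its handling of the zero-baseline case are hypothetical.

```python
import math


def improvement_factor(baseline_width, new_width):
    """Ratio of basin widths; returns math.inf when a zero baseline improves
    and 1.0 when both widths are zero (assumed conventions)."""
    if baseline_width == 0:
        return math.inf if new_width > 0 else 1.0
    return new_width / baseline_width


baseline, regularized_width = 0, 256  # basin widths from the Comparison table
factor = improvement_factor(baseline, regularized_width)
gain = regularized_width - baseline

print(f"factor: {factor}, absolute gain: {gain} positions")
```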
70
+ ## Per-Oscillator Half-Lives
71
+
72
+ ### Collapsed
73
+ ```
74
+ [8.191648388447712, 5.5110275180164185, 8.868783359291054, 7.578944232474908, 2.7534187831011967, 9.80497881309404, 8.08911761592282, 8.288514442215634, 3.0249090614043674, 5.603087503164535, 4.966384193860649, 9.414119910788811, 7.150920960645315, 8.582092906166643, 5.54731359061865, 3.8179097742782147, 6.436678296126676, 2.510538048833402, 8.62104937594066, 7.053315192976514, 8.064701920682985, 4.836207745038946, 9.765584195159231, 9.144968970577576, 8.227067976590098, 3.557109662815741, 5.733768029816274, 2.35043012629783, 3.234315936540382, 7.464391625939635, 7.958097247262541, 9.740077859473676]
75
+ ```
76
+
77
+ ### Regularized
78
+ ```
79
+ [0.5986606020374855, 0.5028913517239931, 1.0160728924654787, 0.5986606020347446, 0.47764157767363097, 1.0160728924446252, 0.5986606020368003, 0.5986606020354299, 0.48694254072470705, 0.5028913517239931, 0.5028913517227875, 1.016072892460572, 0.47764157767363097, 0.7330948926419327, 0.5028913517179652, 0.5025812461450115, 0.7330948926535658, 6931.125226233421, 0.5028913517227875, 0.7330948926435946, 0.5986606020368003, 0.5028913517203764, 1.0160728924507585, 1.0160728924507585, 0.5986606020368003, 0.5025812461377809, 0.4869425407241158, 6931.125226233421, 0.4869425407264809, 4.8104604306517835, 0.5986606020368003, 1.0160728924507585]
80
+ ```
81
+
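The regularized tau mean of 433.9 is driven almost entirely by the two oscillators at about 6931.1, while most of the other values sit below 1.1. The sketch below recomputes the summary statistics and adds the median for comparison. The values are the regularized list above rounded to four decimals for readability, and the cutoff of 100 used to count outliers is an arbitrary illustrative choice.

```python
from statistics import mean, median

# Regularized per-oscillator values from the list above, rounded to 4 decimals.
regularized_tau = [
    0.5987, 0.5029, 1.0161, 0.5987, 0.4776, 1.0161, 0.5987, 0.5987,
    0.4869, 0.5029, 0.5029, 1.0161, 0.4776, 0.7331, 0.5029, 0.5026,
    0.7331, 6931.1252, 0.5029, 0.7331, 0.5987, 0.5029, 1.0161, 1.0161,
    0.5987, 0.5026, 0.4869, 6931.1252, 0.4869, 4.8105, 0.5987, 1.0161,
]

summary = {
    "count": len(regularized_tau),
    "min": min(regularized_tau),
    "max": max(regularized_tau),
    "mean": mean(regularized_tau),
    "median": median(regularized_tau),
    # Illustrative cutoff only; not a threshold used by the report.
    "above_100": sum(1 for tau in regularized_tau if tau > 100),
}

for key, value in summary.items():
    print(f"{key}: {value}")
```

The gap between mean and median suggests reading the "tau mean" row in the Regularized table alongside the full list rather than on its own.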
82
+ ## Honest Assessment
83
+
84
+ The regularizer improves basin width, but the improvement is **marginal**.
85
+ Basin width is still far below the sequence length.
86
+
87
+ ---
88
+
89
+ *Report generated by identity_reconstruction_experiment_v2.py*
v2_corrected_evaluation/identity_v2_20260122_144008.json ADDED
The diff for this file is too large to render. See raw diff