---
license: mit
---
# geolip-svae-implicit-solver-experiments

Empirical artifacts from the **projective-axis** discovery in trained
sphere-solver batteries (geolip-svae lineage, 2026-04-24 session).

---

## TL;DR

Every trained sphere-solver tested produces an M tensor whose rows,
when antipodal pairs are collapsed, form a uniformly distributed
codebook on **ℝP^(D-1)**. The "32 points on a sphere" reading is a
mislabel. The trained geometry is projective.

Verified across **19 trained models** spanning D=3, D=4, D=5.

This means the "polygonal omega" we were searching for already exists
as the projective reader applied to sphere-trained M. We don't need a
new normalizer or architecture. The trained sphere-solver IS the
polygonal codebook; we just read it through antipodal-collapse.

---

## The data

### Cross-D pattern at V=32

| D | Pairs collapsed | Axes | Deviation from uniform ℝP^(D-1) | Effective rank |
|---|-----------------|------|----------------------------------|----------------|
| 3 | 10 (62.5%)      | 22   | -0.004                           | 2.96 / 3 (99%) |
| 4 |  6 (37.5%)      | 26   | +0.002                           | 3.96 / 4 (99%) |
| 5 |  3 (18.7%)      | 29   | +0.016                           | 4.94 / 5 (99%) |

The pair fraction roughly halves with each step up in D, and the axis
count climbs toward V=32. Deviation stays within ±0.05 of the uniform
projective baseline at every D.
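
The probe scripts that computed the uniform-ℝP^(D-1) baseline are not in
this repo, but the baseline itself is cheap to reproduce. A minimal sketch
(not the original probe code): Monte Carlo estimate of the mean projective
angle between two uniformly random lines in ℝ^D.

```python
import numpy as np

def uniform_rp_baseline(D, n_pairs=200_000, seed=0):
    """Monte Carlo estimate of the mean projective angle between two
    uniformly random lines through the origin in R^D, i.e. random
    points on RP^(D-1). Folding via |cos| maps the angle into [0, pi/2]."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal((n_pairs, D))
    v = rng.standard_normal((n_pairs, D))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    abs_cos = np.clip(np.abs((u * v).sum(axis=1)), 0.0, 1.0)
    return float(np.arccos(abs_cos).mean())
```

For D=3 the folded angle has density sin θ on [0, π/2], so the baseline
mean is exactly 1 radian; it climbs toward π/2 as D grows, which is why
deviations are only comparable across D after subtracting the per-D
baseline.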

### Per-noise codebook differentiation (h2-64, V=32 D=4, 16 batteries)

All 16 single-noise batteries projective-clean. Antipodal pair count
varies systematically with training distribution:

- 5 pairs (5 batteries): gaussian, checker, salt_pepper, poisson, rayleigh
  — central-tendency distributions
- 6 pairs (3 batteries): uniform, cauchy, exponential
  — heavy-tailed or symmetric
- 7 pairs (5 batteries): uniform_scaled, laplace, periodic, mixed, structural
  — mid-complexity
- 8 pairs (3 batteries): block, gradient, lognormal
  — structured / asymmetric

13 of 16 batteries show positive deviation (axes slightly more spread
than uniform — the trainer prefers discriminative spread over perfect
uniformity).

---

## Method (named "projective collapse")

1. Run gaussian inputs through trained sphere-solver, collect M [B, V, D]
2. Average across samples → canonical M_avg [V, D]
3. Identify antipodal pairs via mutual-strongest matching:
   - For each row i, find row j with most-negative cosine
   - Pair (i, j) if cos(i, j) < -0.9 AND j's most-negative is i
   - Greedy: strongest pairs claim first
4. For each pair, take (row_i - row_j) / 2, renormalize → axis vector
   - Canonical sign: first nonzero coordinate positive
5. Unpaired rows kept as-is with sign canonicalization
6. Compute pairwise angles wrapped to [0, π/2] via min(θ, π-θ)
   — this is the projective angle on ℝP^(D-1)
7. Compare distribution mean against empirical uniform-ℝP^(D-1) baseline

**Verdict thresholds:**
- PROJECTIVE-CLEAN: |deviation| < 0.05, full rank, silhouette < 0.4,
  secondary antipodal ≤ 3
- PROJECTIVE-MOSTLY: deviation and rank pass, other thresholds slip
- STRUCTURED / DEGENERATE: failures

---

## Repo contents

### `implicit_solver_reports/`

Probe results from the four projective re-probes:

- **`A0_projective_reprobe.json` / `.png`** — G-Cand (D=3, V=32)
  - 10 pairs, 22 axes, deviation -0.004 → PROJECTIVE-CLEAN
- **`A1_projective_reprobe_h2a.json` / `.png`** — H2a (D=4, V=32)
  - 6 pairs, 26 axes, deviation +0.002 → PROJECTIVE-CLEAN
- **`A2_projective_h2_64_singles.json` / `.png`** — h2-64 batteries 0-15
  - All 16 PROJECTIVE-CLEAN, axis count range 24-27
- **`A3_d5_spherical/`** — D=5 spherical training + integrated probe
  - `A3_results.json` / `A3_summary.png` — three D=5 configs at V ∈ {16, 32, 64}
  - `A3a_V16_D5_*/epoch_1_checkpoint.pt` — V=16 D=5 trained model
  - `A3b_V32_D5_*/epoch_1_checkpoint.pt` — V=32 D=5 trained model
  - `A3c_V64_D5_*/epoch_1_checkpoint.pt` — V=64 D=5 trained model

### `phaseQ_reports/`

Q-sweep training artifacts (10 candidates at 1000 batches):

- **`Q_rank02_h64_V32_D4_*`** — H2a (the canonical D=4 sphere-solver
  used in the A1 probe). 40,227 params, MSE 0.00205.
- **`Q_rank09_h64_V32_D3_*`** — G-Cand (the D=3 model probed in A0).
  28,899 params, MSE 0.028.
- 8 other rank-ordered configs from the H2 / G-class characterization

Each variant directory contains `epoch_1_checkpoint.pt` and the
training report JSON.

### `phaseR_reports/`

Sphere-packing test (3 configs, hypothesis falsified — see notes below):

- V=16, D=4 — predicted H2-LIKE, observed HYBRID (stab 0.74)
- V=8, D=4 — predicted H2-LIKE, observed DIFFUSE (failed to converge)
- V=20, D=3 — predicted H2-LIKE, observed HYBRID with 6/10 antipodal

Polytope-vertex-count packing was NOT a sufficient predictor of
H2-LIKE static-row behavior. The geometric pattern that actually holds
is the projective-axis structure, not polytope alignment.

---

## How to load a checkpoint

```python
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="AbstractPhil/geolip-svae-implicit-solver-experiments",
    filename="implicit_solver_reports/A3_d5_spherical/A3b_V32_D5_h64_dp0_nx0_adam/epoch_1_checkpoint.pt",
)
ckpt = torch.load(ckpt_path, map_location='cpu', weights_only=False)
state_dict = ckpt['model_state']
```

To rebuild the model architecture, you need the same training config
used to train it (V, D, hidden, depth, n_cross, etc.). The
`ablation_configs.py` and `ablation_trainer.py` from the geolip-svae
working set are the source of truth.

---

## How to read a probe result

```python
import json
from huggingface_hub import hf_hub_download

p = hf_hub_download(
    repo_id="AbstractPhil/geolip-svae-implicit-solver-experiments",
    filename="implicit_solver_reports/A2_projective_h2_64_singles.json",
)
with open(p) as f:
    data = json.load(f)

# data['results_per_battery'] — per-battery probe metrics (16 batteries)
# data['aggregate'] — summary statistics across all 16
```

Each per-battery entry contains:
- `pairs`, `n_axes`, `unpaired` — collapse counts
- `proj_angle_mean`, `uniform_baseline`, `deviation` — uniformity test
- `best_silhouette`, `best_cluster_k` — residual structure
- `effective_rank`, `utilization` — dimension utilization
- `secondary_antipodal` — further-collapse check
- `verdict` — PROJECTIVE-CLEAN / -MOSTLY / STRUCTURED / DEGENERATE
- `proj_angles_subset` — first 200 pairwise angles for plotting
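
To get a quick overview of a probe file, the verdicts and pair counts can
be tallied across batteries. A small sketch, assuming
`results_per_battery` is a list of dicts carrying the fields above
(verify against the actual JSON schema before relying on this):

```python
from collections import Counter

def summarize_batteries(data):
    """Tally verdicts and antipodal-pair counts across battery entries.
    Assumes data['results_per_battery'] is a list of per-battery dicts
    with 'verdict' and 'pairs' keys (an assumption about the schema)."""
    entries = data['results_per_battery']
    verdicts = Counter(e['verdict'] for e in entries)
    pair_counts = Counter(e['pairs'] for e in entries)
    return verdicts, pair_counts
```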

---

## What this enables

1. **The polygonal omega is not a normalizer — it's an inference-time
   projection.** Training stays spherical (`F.normalize(M, dim=-1)`).
   At inference, apply antipodal-collapse to extract axis codebook.

2. **h2-64 is a library of 16 projective-axis codebooks**, one per
   noise type. Each codebook has 24-27 axes on ℝP³.

3. **A `ProjectiveReader` module** can wrap the collapse + axis
   extraction as a clean inference operator. No D-dependent special
   cases — works at D ∈ {3, 4, 5} with the same code.

4. **For downstream tasks** (image discrimination, quantization,
   generation), the trained sphere-solvers can serve as pre-built
   discrete codebooks. No new training required for the codebook.
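
Point 3's `ProjectiveReader` does not exist yet; one way it could look (a
hypothetical sketch, name and interface invented here, not part of the
geolip-svae codebase) is a module that holds a precomputed axis codebook
and quantizes incoming rows by maximum |cosine|, so v and -v map to the
same axis:

```python
import torch
import torch.nn.functional as F

class ProjectiveReader(torch.nn.Module):
    """Hypothetical inference-time operator: quantize rows of M against
    a fixed axis codebook on RP^(D-1). Taking |cos| makes the match
    sign-blind, which is exactly the antipodal identification."""
    def __init__(self, axes):  # axes: [A, D], one unit row per axis
        super().__init__()
        self.register_buffer('axes', F.normalize(axes, dim=-1))

    def forward(self, M):  # M: [..., D] -> [...] axis indices
        sim = (F.normalize(M, dim=-1) @ self.axes.T).abs()
        return sim.argmax(dim=-1)
```

The same code runs unchanged at D ∈ {3, 4, 5}; the only D-dependence is
the shape of the codebook passed in.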

---

## Open questions (not in this repo)

- Per-input rotation: G-Cand showed row stability 0.531 — meaning
  rows rotate per-input. The projective reading describes WHICH axes
  exist; this asks HOW they activate per input. May be the actual
  capsule-like behavior, operating on top of the codebook substrate.
- Per-noise codebook similarity matrix: how geometrically similar are
  the 16 h2-64 codebooks to each other? Could reveal noise-type
  clustering.
- D ≥ 6 behavior: do antipodal pairs vanish entirely at very high D?
  Cross-D pattern predicts ~1-2 pairs at D=6, ~0 at D=8+.

---

## Reproducibility

The probe scripts (A0/A1/A2/A3/A4) are not in this repo — they live
with the geolip-svae working set and depend on `ablation_configs.py`
and `ablation_trainer.py` from that codebase.

The trained checkpoints + JSON results in this repo are sufficient to
verify the empirical claims without rerunning training.

---

## License

Apache 2.0