---
license: mit
---
# geolip-svae-implicit-solver-experiments
Empirical artifacts from the **projective-axis** discovery in trained
sphere-solver batteries (geolip-svae lineage, 2026-04-24 session).
---
## TL;DR
Every trained sphere-solver tested produces an M tensor whose rows,
when antipodal pairs are collapsed, form a uniformly-distributed
codebook on **βP^(D-1)**. The "32 points on a sphere" reading is a
mislabel. The trained geometry is projective.
Verified across **19 trained models** spanning D=3, D=4, D=5.
This means the "polygonal omega" we were searching for already exists
as the projective reader applied to sphere-trained M. We don't need a
new normalizer or architecture. The trained sphere-solver IS the
polygonal codebook; we just read it through antipodal-collapse.
---
## The data
### Cross-D pattern at V=32
| D | Pairs collapsed | Axes | Deviation from uniform ℝP^(D-1) | Effective rank |
|---|-----------------|------|----------------------------------|----------------|
| 3 | 10 (62.5%) | 22 | -0.004 | 2.96 / 3 (99%) |
| 4 | 6 (37.5%) | 26 | +0.002 | 3.96 / 4 (99%) |
| 5 | 3 (18.7%) | 29 | +0.016 | 4.94 / 5 (99%) |
The pair fraction roughly halves with each D step, and the axis count
climbs toward V=32. Deviation stays within ±0.05 of the uniform
projective baseline at every D.
### Per-noise codebook differentiation (h2-64, V=32 D=4, 16 batteries)
All 16 single-noise batteries projective-clean. Antipodal pair count
varies systematically with training distribution:
- 5 pairs (5 batteries): gaussian, checker, salt_pepper, poisson, rayleigh
  → central-tendency distributions
- 6 pairs (3 batteries): uniform, cauchy, exponential
  → heavy-tailed or symmetric
- 7 pairs (5 batteries): uniform_scaled, laplace, periodic, mixed, structural
  → mid-complexity
- 8 pairs (3 batteries): block, gradient, lognormal
  → structured / asymmetric
13 of 16 batteries show positive deviation (axes slightly more spread
than uniform; the trainer prefers discriminative spread over perfect
uniformity).
---
## Method (named "projective collapse")
1. Run gaussian inputs through the trained sphere-solver, collect M [B, V, D]
2. Average across samples → canonical M_avg [V, D]
3. Identify antipodal pairs via mutual-strongest matching:
   - For each row i, find the row j with the most-negative cosine
   - Pair (i, j) if cos(i, j) < -0.9 AND j's most-negative is i
   - Greedy: strongest pairs claim first
4. For each pair, take (row_i - row_j) / 2, renormalize → axis vector
   - Canonical sign: first nonzero coordinate positive
5. Unpaired rows are kept as-is with sign canonicalization
6. Compute pairwise angles wrapped to [0, π/2] via min(θ, π-θ)
   → this is the projective angle on ℝP^(D-1)
7. Compare the distribution mean against an empirical uniform-ℝP^(D-1) baseline
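Steps 3-6 can be sketched in NumPy. This is an illustrative reimplementation from the description above, not the actual probe script (which lives with the geolip-svae working set):

```python
import numpy as np

def projective_collapse(M_avg, cos_thresh=-0.9):
    """Collapse antipodal row pairs of M_avg [V, D] into axis vectors."""
    M = M_avg / np.linalg.norm(M_avg, axis=1, keepdims=True)
    C = M @ M.T                      # pairwise cosines
    np.fill_diagonal(C, 1.0)         # exclude self-matches from argmin
    partner = C.argmin(axis=1)       # most-negative cosine per row
    used, axes, pairs = set(), [], 0
    # greedy: rows with the strongest (most negative) match claim first
    for i in np.argsort(C.min(axis=1)):
        if i in used:
            continue
        j = partner[i]
        if j not in used and partner[j] == i and C[i, j] < cos_thresh:
            axis = (M[i] - M[j]) / 2.0   # collapse the antipodal pair
            used.update((i, j))
            pairs += 1
        else:
            axis = M[i].copy()           # unpaired row kept as-is
            used.add(i)
        axis /= np.linalg.norm(axis)
        # canonical sign: first nonzero coordinate positive
        nz = np.flatnonzero(np.abs(axis) > 1e-12)[0]
        if axis[nz] < 0:
            axis = -axis
        axes.append(axis)
    A = np.stack(axes)
    # pairwise projective angles: min(theta, pi - theta) == arccos(|cos|)
    G = np.clip(np.abs(A @ A.T), 0.0, 1.0)
    iu = np.triu_indices(len(A), k=1)
    return A, pairs, np.arccos(G[iu])
```

On a toy D=3 codebook with rows e1, -e1, e2, e3 this yields one collapsed pair and three axes, all at projective angle π/2 from each other.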
**Verdict thresholds:**
- PROJECTIVE-CLEAN: |deviation| < 0.05, full rank, silhouette < 0.4,
  secondary antipodal ≤ 3
- PROJECTIVE-MOSTLY: deviation and rank pass, other thresholds slip
- STRUCTURED / DEGENERATE: failures
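The thresholds can be expressed as a small classifier. The 0.95·D cutoff for "full rank" is an assumption (the tables report effective ranks like 3.96 / 4 at 99%), and the STRUCTURED vs DEGENERATE split is not encoded here:

```python
def verdict(deviation, effective_rank, D, silhouette, secondary_antipodal):
    """Apply the verdict thresholds described above (illustrative sketch)."""
    full_rank = effective_rank > 0.95 * D   # assumed cutoff for "full rank"
    if abs(deviation) < 0.05 and full_rank:
        if silhouette < 0.4 and secondary_antipodal <= 3:
            return "PROJECTIVE-CLEAN"
        return "PROJECTIVE-MOSTLY"          # deviation/rank pass, others slip
    return "STRUCTURED"                     # DEGENERATE not distinguished here
```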
---
## Repo contents
### `implicit_solver_reports/`
Probe results from the four projective re-probes:
- **`A0_projective_reprobe.json` / `.png`** → G-Cand (D=3, V=32)
  - 10 pairs, 22 axes, deviation -0.004 → PROJECTIVE-CLEAN
- **`A1_projective_reprobe_h2a.json` / `.png`** → H2a (D=4, V=32)
  - 6 pairs, 26 axes, deviation +0.002 → PROJECTIVE-CLEAN
- **`A2_projective_h2_64_singles.json` / `.png`** → h2-64 batteries 0-15
  - All 16 PROJECTIVE-CLEAN, axis count range 24-27
- **`A3_d5_spherical/`** → D=5 spherical training + integrated probe
  - `A3_results.json` / `A3_summary.png` → three D=5 configs at V ∈ {16, 32, 64}
  - `A3a_V16_D5_*/epoch_1_checkpoint.pt` → V=16 D=5 trained model
  - `A3b_V32_D5_*/epoch_1_checkpoint.pt` → V=32 D=5 trained model
  - `A3c_V64_D5_*/epoch_1_checkpoint.pt` → V=64 D=5 trained model
### `phaseQ_reports/`
Q-sweep training artifacts (10 candidates at 1000 batches):
- **`Q_rank02_h64_V32_D4_*`** → H2a (the canonical D=4 sphere-solver
  used in the A1 probe). 40,227 params, MSE 0.00205.
- **`Q_rank09_h64_V32_D3_*`** → G-Cand (the D=3 model probed in A0).
  28,899 params, MSE 0.028.
- 8 other rank-ordered configs from the H2 / G-class characterization
Each variant directory contains `epoch_1_checkpoint.pt` and the
training report JSON.
### `phaseR_reports/`
Sphere-packing test (3 configs; hypothesis falsified, see notes below):
- V=16, D=4 → predicted H2-LIKE, observed HYBRID (stab 0.74)
- V=8, D=4 → predicted H2-LIKE, observed DIFFUSE (failed to converge)
- V=20, D=3 → predicted H2-LIKE, observed HYBRID with 6/10 antipodal
Polytope-vertex-count packing was NOT a sufficient predictor of
H2-LIKE static-row behavior. The geometric pattern that actually holds
is the projective-axis structure, not polytope alignment.
---
## How to load a checkpoint
```python
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="AbstractPhil/geolip-svae-implicit-solver-experiments",
    filename="implicit_solver_reports/A3_d5_spherical/A3b_V32_D5_h64_dp0_nx0_adam/epoch_1_checkpoint.pt",
)
ckpt = torch.load(ckpt_path, map_location='cpu', weights_only=False)
state_dict = ckpt['model_state']
```
To rebuild the model architecture, you need the same training config
used to train it (V, D, hidden, depth, n_cross, etc.). The
`ablation_configs.py` and `ablation_trainer.py` from the geolip-svae
working set are the source of truth.
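As a convenience, the run-directory names in this repo (e.g. `A3b_V32_D5_h64_dp0_nx0_adam`) appear to encode the key hyperparameters. The field-to-config mapping below (h → hidden, dp → depth, nx → n_cross) is an assumption read off those names; `ablation_configs.py` remains the source of truth:

```python
import re

def parse_run_name(name):
    """Best-effort parse of a run-directory name into config fields.

    The mapping of name fragments to config keys is assumed, not
    confirmed by the trainer code.
    """
    fields = {}
    for key, pat in [("V", r"_V(\d+)"), ("D", r"_D(\d+)"),
                     ("hidden", r"_h(\d+)"), ("depth", r"_dp(\d+)"),
                     ("n_cross", r"_nx(\d+)")]:
        m = re.search(pat, name)
        if m:
            fields[key] = int(m.group(1))
    return fields
```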
---
## How to read a probe result
```python
import json
from huggingface_hub import hf_hub_download

p = hf_hub_download(
    repo_id="AbstractPhil/geolip-svae-implicit-solver-experiments",
    filename="implicit_solver_reports/A2_projective_h2_64_singles.json",
)
with open(p) as f:
    data = json.load(f)

# data['results_per_battery'] → per-battery probe metrics (16 batteries)
# data['aggregate'] → summary statistics across all 16
```
Each per-battery entry contains:
- `pairs`, `n_axes`, `unpaired` → collapse counts
- `proj_angle_mean`, `uniform_baseline`, `deviation` → uniformity test
- `best_silhouette`, `best_cluster_k` → residual structure
- `effective_rank`, `utilization` → dimension utilization
- `secondary_antipodal` → further-collapse check
- `verdict` → PROJECTIVE-CLEAN / -MOSTLY / STRUCTURED / DEGENERATE
- `proj_angles_subset` → first 200 pairwise angles for plotting
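A usage sketch, assuming `results_per_battery` is a list of per-battery dicts carrying the fields listed above:

```python
from collections import Counter

def summarize(results_per_battery):
    """Tally verdicts and the axis-count range across battery entries."""
    verdicts = Counter(r["verdict"] for r in results_per_battery)
    n_axes = [r["n_axes"] for r in results_per_battery]
    return verdicts, (min(n_axes), max(n_axes))

# e.g., with `data` loaded as in the snippet above:
# verdicts, axis_range = summarize(data['results_per_battery'])
```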
---
## What this enables
1. **The polygonal omega is not a normalizer; it's an inference-time
   projection.** Training stays spherical (`F.normalize(M, dim=-1)`).
   At inference, apply antipodal collapse to extract the axis codebook.
2. **h2-64 is a library of 16 projective-axis codebooks**, one per
   noise type. Each codebook has 24-27 axes on ℝP³.
3. **A `ProjectiveReader` module** can wrap the collapse + axis
   extraction as a clean inference operator. No D-dependent special
   cases: the same code works at D ∈ {3, 4, 5}.
4. **For downstream tasks** (image discrimination, quantization,
generation), the trained sphere-solvers can serve as pre-built
discrete codebooks. No new training required for the codebook.
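The quantization use in item 4 can be sketched as nearest-axis assignment under |cos| similarity, which is sign-invariant (x and -x map to the same axis, as projective geometry requires). This is an illustrative downstream use, not code from the probe scripts:

```python
import numpy as np

def projective_quantize(x, axes):
    """Assign each query row of x [N, D] to its nearest projective axis.

    axes: [K, D] unit axis vectors (e.g., from antipodal collapse).
    Nearness is |cos| similarity, so the assignment is sign-invariant.
    """
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sims = np.abs(x @ axes.T)        # [N, K], projective similarity
    return sims.argmax(axis=1)
```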
---
## Open questions (not in this repo)
- Per-input rotation: G-Cand showed row stability 0.531, meaning the
  rows rotate per input. The projective reading describes WHICH axes
  exist; this asks HOW they activate per input. This may be the actual
  capsule-like behavior, operating on top of the codebook substrate.
- Per-noise codebook similarity matrix: how geometrically similar are
the 16 h2-64 codebooks to each other? Could reveal noise-type
clustering.
- D ≥ 6 behavior: do antipodal pairs vanish entirely at very high D?
  The cross-D pattern predicts ~1-2 pairs at D=6 and ~0 at D=8+.
---
## Reproducibility
The probe scripts (A0/A1/A2/A3/A4) are not in this repo β they live
with the geolip-svae working set and depend on `ablation_configs.py`
and `ablation_trainer.py` from that codebase.
The trained checkpoints + JSON results in this repo are sufficient to
verify the empirical claims without rerunning training.
---
## License
Apache 2.0