# Potential Downstream Utilities Clause

**Status:** Forward-looking. Each utility takes the Omega substrate as a
load-bearing assumption: regime-independence of reconstruction quality
across input scale, the projective-axis codebook as a deterministic
property of trained sphere-solvers, and hardware-determined throughput
limits independent of model behavior. Utilities that would work
equivalently on any encoder are excluded; this is a list of capabilities
that are *enabled* by Omega, not capabilities incidentally compatible
with it.

**Methodology.** Per the post-000108 research stage, every utility
section ends with a falsifiable prediction: what would have to be true
for the utility to NOT work. Construction precedes proof. The first
build that fails its prediction tells us where the substrate's
boundary actually is.

---

## 1. Classification

**The utility.** A projective codebook of `n_axes` directions on
ℝP^(D-1) is a vocabulary of feature primitives. The pipeline is
image → patch grid → M tensor → per-patch projection onto codebook
axes → activation pattern of shape `[B, n_patches, V, n_axes]`. A
linear or shallow head over this representation performs
classification.

**Why Omega.** The codebook is model-intrinsic and regime-flat. A
classifier trained on activation patterns at 64×64 should generalize
to 512×512 inputs at inference without retraining, because the
codebook itself doesn't change with input size. Standard CLIP-style
models do not give this property: their representations drift with
input resolution, and their pooling operations bake in a particular
spatial extent.

**Specific construction.** Train a classifier head on per-patch axis
activations averaged across patches (or attention-pooled). For
fine-grained tasks, retain the spatial structure: the classifier sees
the full `[n_patches, n_axes]` matrix as a 2D feature map. Per-patch
aggregation was already validated in scratchpad 000104: patch_idx=0
fails because it discards spatial signal, while patch-mean recovers
most of the gap.
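Below is a minimal numpy sketch of the pooled variant. It makes several assumptions. The M tensor is laid out as `[B, n_patches, V, D]`. The codebook is an `[n_axes, D]` array of unit axes. An activation is the absolute cosine between a row and an axis; it is absolute because an axis and its antipode are the same point of ℝP^(D-1). The names `axis_activations`, `pooled_features` and `train_linear_head` are hypothetical, not geolip-svae API. The softmax-regression head stands in for whatever shallow head is actually used. The sketch illustrates one point: the pooled feature width is `n_axes` regardless of `n_patches`, so a head fit on a small patch grid can be applied unchanged to a larger one.

```python
import numpy as np


def axis_activations(M, codebook):
    """Project per-patch M rows onto codebook axes.

    M:        [B, n_patches, V, D]  (assumed layout)
    codebook: [n_axes, D] unit axes on RP^(D-1)
    returns:  [B, n_patches, V, n_axes]
    """
    norms = np.linalg.norm(M, axis=-1, keepdims=True)
    rows = M / np.maximum(norms, 1e-8)
    # |cos| because an axis and its antipode are the same projective point
    return np.abs(rows @ codebook.T)


def pooled_features(M, codebook):
    """Mean over patches and V rows -> [B, n_axes]; the width ignores n_patches."""
    return axis_activations(M, codebook).mean(axis=(1, 2))


def train_linear_head(X, y, n_classes, lr=0.5, steps=300):
    """Softmax regression by full-batch gradient descent (a stand-in shallow head)."""
    W = np.zeros((X.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]
    for _ in range(steps):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        grad = (P - Y) / len(X)
        W -= lr * X.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(X, W, b):
    return np.argmax(X @ W + b, axis=1)
```

The sketch does not show that accuracy holds across grid sizes; the falsifiable prediction for this section tests that. It only shows that nothing in the head's shape prevents it.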

**Falsifiable prediction.** A classifier trained on 64×64 activation
patterns achieves comparable accuracy on 512×512 test inputs (within
2 percentage points) without any architectural adaptation. If accuracy
drops sharply with input resolution, the codebook activations are not
in fact regime-invariant in the way reconstruction is. Omega would then
cover reconstruction but not classification, which is a meaningful
boundary.

---

## 2. Diffusion

**The utility.** Discrete diffusion in axis-index space. Each patch's
M-tensor row is quantized to its nearest codebook axis (or a top-k
mixture). The "noise" process is gradual randomization of axis
assignments; the "denoise" process is a transformer that predicts
axis indices from corrupted sequences. Sampling runs the denoiser to a
clean axis sequence, then reconstructs the image via codebook →
decoder.

**Why Omega.** Three properties combine here. The codebook is a
finite, deterministic vocabulary, so discrete diffusion is well-defined
without extra quantizer training. The decoder is regime-flat, so a
diffusion model trained on 64×64 axis sequences can sample at any
resolution by predicting longer sequences and decoding at the target
size. The codebook's projective structure means antipodal axes carry
equivalent information, which meaningfully reduces the effective
vocabulary size for the diffusion target.

**Specific construction.** Diffusion target: `[n_patches, top_k]`
discrete indices into the codebook. Loss: cross-entropy over axis
indices. Backbone: any transformer that handles variable-length token
sequences (patch count varies with target resolution). Conditioning:
optional class label or text embedding via cross-attention.
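The sketch below covers the three pieces that do not depend on the denoiser architecture: nearest-axis (top-k) quantization by absolute cosine, a forward corruption process, and the cross-entropy target. The text does not fix a noise schedule. The uniform-transition process used here (each index is replaced by a uniformly drawn index with probability `t`) is one standard choice for discrete diffusion, swapped in for illustration. The function names are hypothetical, and the transformer denoiser is omitted.

```python
import numpy as np


def quantize_topk(rows, codebook, k=1):
    """rows [n_patches, D], codebook [n_axes, D] -> [n_patches, k] axis indices.

    Using |cos| sends a row and its negation to the same axis, which is the
    antipodal identification that shrinks the effective vocabulary.
    """
    rows = rows / np.maximum(np.linalg.norm(rows, axis=-1, keepdims=True), 1e-8)
    scores = np.abs(rows @ codebook.T)
    return np.argsort(-scores, axis=-1)[:, :k]


def corrupt(indices, t, n_axes, rng):
    """Uniform-transition forward process.

    With probability t each index is replaced by a uniformly drawn index
    (which may coincide with the original); otherwise it is kept.
    """
    resample = rng.random(indices.shape) < t
    noise = rng.integers(0, n_axes, size=indices.shape)
    return np.where(resample, noise, indices)


def cross_entropy(logits, targets):
    """Mean cross-entropy of logits [..., n_axes] against integer targets [...]."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    return float(-picked.mean())
```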

**Falsifiable prediction.** A diffusion model trained on 64×64 axis
sequences from h2-64 produces coherent samples at 256×256 by sampling
longer sequences and decoding at the target size, without retraining.
If samples at non-native resolution show mode collapse or boundary
artifacts beyond what the encoder-decoder pair produces directly,
the codebook's discreteness is interfering with the regime-flat
reconstruction, and the usable envelope is narrower than expected.

---

## 3. Processing (image-to-image edits in axis space)

**The utility.** Operations applied to codebook activations rather
than pixels: image → encode → edit activations → decode. Style
transfer, denoising, inpainting, and semantic editing all become
manipulations of the `[n_patches, V, n_axes]` activation tensor,
followed by reconstruction.

**Why Omega.** Edits made at one resolution are coherent when decoded
at another, because the codebook is the same vocabulary at every
scale. A 64×64 inpaint mask can produce a 512×512 inpainted output by
upsampling the edited activations and decoding at the target size.
Critically, edits are bound by the same geometric constraints that
produced the codebook. Operations that move activations *off* the
codebook produce reconstruction artifacts, and those artifacts are
themselves a useful signal.

**Specific construction.** Define edit operations as activation-tensor
transformations: zero-out (denoise), substitute axis-set (style
transfer), spatial-gather + redistribute (inpaint), and interpolate
between two images' activations (semantic morph). Provide a
`process_at_scale` API mirroring `reconstruct_at_scale`.
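Several of these edit operations can be sketched as plain array transformations on a patch grid. The sketch assumes activations are arranged as `[h, w, V, n_axes]`, a reshaped form of `[n_patches, V, n_axes]`. The function names are hypothetical, and `substitute_axes` is one guess at what "substitute axis-set" means. Inpainting is not shown. The encode/decode calls that would wrap these operations inside `process_at_scale` are model-specific and also omitted. At a fixed patch size, taking a 64×64 edit to 512×512 corresponds to nearest-neighbour upsampling of the patch grid by a factor of 8.

```python
import numpy as np


def zero_out(acts, axes):
    """Denoise-style edit: suppress the listed axes. acts: [..., n_axes]."""
    out = acts.copy()
    out[..., axes] = 0.0
    return out


def substitute_axes(acts, src_axes, dst_axes):
    """Style-transfer-style edit: move activation from src axes onto dst axes.

    src entries are cleared first, then dst entries are overwritten with the
    original src values, so overlapping or swapped axis sets behave correctly.
    """
    out = acts.copy()
    out[..., src_axes] = 0.0
    out[..., dst_axes] = acts[..., src_axes]
    return out


def interpolate(acts_a, acts_b, alpha):
    """Semantic morph: linear blend of two images' activation tensors."""
    return (1.0 - alpha) * acts_a + alpha * acts_b


def upsample_grid(acts, factor):
    """Nearest-neighbour upsampling of a patch grid [h, w, V, n_axes]."""
    return np.repeat(np.repeat(acts, factor, axis=0), factor, axis=1)
```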

**Falsifiable prediction.** Style transfer applied to 64×64
activations and decoded at 512×512 produces output indistinguishable
in style consistency from the same operation applied directly to a
512×512 encoding. If the upsampled-edit path produces worse style
transfer than the direct-encode path, the activation upsampling is
losing geometric structure that the encoder captures. In that case,
Omega's regime-flatness has a stricter envelope than reconstruction
MSE alone reveals.

---

## 4. Solving

**The utility.** The most direct framing: use the trained sphere-solver
to solve geometric problems on its native manifold. Given a set of
points in ℝ^D, encode them via the model's projection path to get
their representation on ℝP^(D-1). Given a set of vectors, solve for
the codebook axes that span them. Given two sets of points, find the
optimal projective alignment via Procrustes on their codebooks.

**Why Omega.** This is the closest utility to the model's identity
claim. The model is named "sphere-solver" because that's what it is:
a parametric solver for "what's the best projective representation of
this data on the unit sphere?" The Omega finding is that this solver
is regime-independent: the same machinery handles 64 input points or
65,536 input points and produces structurally consistent answers.

**Specific construction.** Expose three solver primitives:
- `project(points, model) → axes`: encode arbitrary point clouds via
  the model's encoder to get their codebook representation
- `align(codebook_a, codebook_b) → rotation`: Procrustes-align two
  codebooks (already implemented in tests/framework.py)
- `solve_basis(target_vectors, model) → axis_indices`: given target
  vectors, find the codebook axes that best span them
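The `align` primitive is standard orthogonal Procrustes with one projective twist: each codebook axis is defined only up to sign. This sketch alternates between flipping row signs and re-solving Procrustes. It assumes the rows of the two codebooks are already in correspondence. It also assumes the rotation is small enough that the first sign estimate is correct, as in the same-model, different-calibration case. This is not the implementation in tests/framework.py. The relative residual it reports is one possible reading of "rotation distance", which the text does not define.

```python
import numpy as np


def procrustes(A, B):
    """Orthogonal R minimising ||A @ R - B||_F for row-matched A, B of shape [n, D]."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt


def align_projective(codebook_a, codebook_b, iters=5):
    """Sign-aware Procrustes for projective axes (each row's sign is arbitrary).

    Alternates (1) flipping rows of A to agree with B under the current R and
    (2) re-solving R. Assumes rows are in correspondence; returns R and the
    relative residual ||A R - B||_F / ||B||_F.
    """
    A = codebook_a.copy()
    R = np.eye(A.shape[1])
    for _ in range(iters):
        signs = np.sign(np.sum((A @ R) * codebook_b, axis=1))
        signs[signs == 0] = 1.0
        A = A * signs[:, None]
        R = procrustes(A, codebook_b)
    residual = np.linalg.norm(A @ R - codebook_b) / np.linalg.norm(codebook_b)
    return R, residual
```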

**Falsifiable prediction.** Procrustes alignment between codebooks of
the same model on different calibration distributions yields a
rotation distance below 0.1. This is already verified at U5, where
calibration deviations differ by ~0.003. Cross-model alignment between
two sphere-solvers trained on the same data yields a rotation distance
below 0.3 (predicted, not yet measured). If cross-model alignment
turns out to be no better than a random orthogonal map, codebook
structure is data-driven rather than architecture-driven, and the
solver's "intrinsic" status is overstated.

---

## 5. Distillation

Two directions, distinct enough to enumerate separately.

### 5a. Distillation INTO sphere-solvers

**The utility.** Train a sphere-solver student to match a non-Omega
teacher's representations. The student inherits regime-flatness
automatically, so the teacher's representational quality flows into a
deployable encoder that handles arbitrary resolution without extra
machinery.

**Why Omega.** Standard distillation produces a student whose
behavior interpolates the teacher's at training scale. A
sphere-solver student, by virtue of its architecture, additionally
inherits regime-flatness: it behaves consistently at inference scales
the teacher was never tested on. This is a distillation result that
wouldn't follow from teacher quality alone.

**Specific construction.** The loss combines reconstruction (the
sphere-solver's native objective) with representation matching
against the teacher's pooled features at intermediate resolution.
The student emerges with both teacher-like representations AND
resolution-agnosticism. Teacher candidates: CLIP, DINOv2, Whisper
(per the Bertenstein cross-modal alignment work).
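Below is a forward-only numpy sketch of the combined objective. It assumes a mean-squared reconstruction term and a cosine-distance matching term. It also assumes a learned linear map `proj` from the student's pooled features into the teacher's feature space. The weight `lam`, the projection and the choice of cosine distance are all assumptions. A real training loop would express the same loss in an autodiff framework.

```python
import numpy as np


def _unit(v):
    return v / np.maximum(np.linalg.norm(v, axis=1, keepdims=True), 1e-8)


def distill_loss(x, recon, student_pooled, teacher_feat, proj, lam=1.0):
    """Reconstruction MSE + lam * mean(1 - cosine) representation matching.

    x, recon:       input and student reconstruction (same shape)
    student_pooled: [B, d_s] student features pooled over patches
    teacher_feat:   [B, d_t] frozen teacher's pooled features
    proj:           [d_s, d_t] linear map into teacher space (hypothetical)
    """
    rec = float(np.mean((recon - x) ** 2))
    cos = np.sum(_unit(student_pooled @ proj) * _unit(teacher_feat), axis=1)
    match = float(np.mean(1.0 - cos))
    return rec + lam * match, rec, match
```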

**Falsifiable prediction.** A sphere-solver student distilled from
DINOv2 at 224×224 produces representations that, when evaluated on a
standard linear-probe benchmark at 448×448, match or exceed direct
DINOv2 at 448×448. If the student degrades at non-training scale
the way the teacher does, distillation didn't transfer
regime-flatness; it transferred only representational quality, and
the architectural Omega property is more fragile than the
training-from-scratch case suggests.

### 5b. Distillation FROM sphere-solvers (codebook freezing)

**The utility.** Extract a codebook artifact, freeze it, and train
cheap downstream models that consume codebook activations rather than
re-running the encoder. The codebook becomes a portable feature
vocabulary; downstream models are 1-2 orders of magnitude smaller.

**Why Omega.** U5's verdict (as_is_packaging) makes this trivially
feasible: codebooks are stable artifacts, model-intrinsic and
calibration-insensitive. The downstream model never sees the original
encoder; it only sees activation patterns over a fixed vocabulary.
Resolution-agnosticism is inherited because the codebook is the same
at every scale.

**Specific construction.** Pipeline: (1) extract the codebook once and
save it as safetensors+JSON; (2) pre-compute activation patterns for
the training corpus; (3) train any standard architecture (MLP, small
transformer, CNN) with axis activations as input. The codebook stays
frozen forever after step 1.
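Steps (1) and (2) of the pipeline can be sketched as follows. `np.save` plus a JSON sidecar stands in for safetensors+JSON to keep the example dependency-free. The file layout, metadata keys and helper names are hypothetical. The stored features are assumed to be absolute-cosine activations pooled over patches and V rows.

```python
import json

import numpy as np


def save_codebook(prefix, codebook, meta):
    """Step (1): write the frozen axes plus a JSON sidecar.

    np.save stands in for safetensors so the sketch has no extra dependency.
    """
    np.save(prefix + ".npy", codebook.astype(np.float32))
    with open(prefix + ".json", "w") as f:
        json.dump({**meta, "shape": list(codebook.shape)}, f)


def load_codebook(prefix):
    codebook = np.load(prefix + ".npy")
    with open(prefix + ".json") as f:
        meta = json.load(f)
    if list(codebook.shape) != meta["shape"]:
        raise ValueError("codebook array and metadata disagree on shape")
    return codebook, meta


def precompute_activations(batches, codebook):
    """Step (2): encode the corpus once. Each batch is an M tensor [B, n_patches, V, D]."""
    feats = []
    for M in batches:
        rows = M / np.maximum(np.linalg.norm(M, axis=-1, keepdims=True), 1e-8)
        feats.append(np.abs(rows @ codebook.T).mean(axis=(1, 2)))  # [B, n_axes]
    return np.concatenate(feats, axis=0)
```

Step (3) can be any standard trainer fed from the returned `[N, n_axes]` array. Because the feature width is `n_axes`, batches with different patch counts land in the same feature table.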

**Falsifiable prediction.** Already validated by U5 + the geolip-core
pipeline. The failure mode would be a downstream model trained on
codebook activations underperforming an end-to-end model of similar
parameter count. This is predicted not to fail in the regime-flat use
case, where end-to-end models lack regime-flatness anyway. It might
fail in the standard fixed-resolution regime, where the end-to-end
model has a free-parameter advantage.

---

## 6. Tokenization for downstream LLMs / multimodal models

**The utility.** The codebook is a discrete vocabulary of size
`n_axes` (typically 27–230). Images → axis activation sequences →
discrete tokens fed to autoregressive language models. The geolip-svae
becomes an image tokenizer for the existing multimodal-LLM ecosystem.

**Why Omega.** Three properties matter. First, vocabulary size is
small compared to standard learned image tokenizers (VQ-VAE typically
uses ~8K-16K codes). With ~30 axes and one token per axis per patch, a
512-token budget covers ~17 patches (512 / 30). With a top-k=4 mixture
per patch, the same budget covers ~128 patches (512 / 4). Second,
resolution-agnosticism means the same tokenizer handles any input
image without retraining. Third, calibration insensitivity means the
tokenizer is a fixed component, not a learned-per-task module.

**Specific construction.** Wrap codebook quantization as a tokenizer
class with `encode(image) → token_sequence` and
`decode(token_sequence, target_size) → image` methods. Define special
tokens for image-start and image-end, and optionally row-start markers
for spatial structure. Integrate via the standard
transformers/HuggingFace tokenizer interface.
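The sketch below covers only the token layer. It assumes per-patch activations of shape `[h, w, n_axes]`, for example pooled over V. The assumed token layout is:

- image-start;
- for each row, a row-start marker followed by each patch's top-k axis ids;
- image-end.

The class name, id assignment and layout are hypothetical. The image encoder/decoder and the HuggingFace wrapper are out of scope.

```python
import numpy as np


class AxisTokenizer:
    """Top-k axis ids per patch -> flat token ids with image/row markers.

    Ids 0..n_axes-1 are codebook axes; three special ids follow them.
    """

    def __init__(self, n_axes, top_k=4):
        self.n_axes, self.top_k = n_axes, top_k
        self.IMG_START, self.IMG_END, self.ROW_START = n_axes, n_axes + 1, n_axes + 2
        self.vocab_size = n_axes + 3

    def encode(self, acts):
        """acts: [h, w, n_axes] per-patch activations -> list of token ids."""
        top = np.argsort(-acts, axis=-1)[..., : self.top_k]  # [h, w, top_k]
        tokens = [self.IMG_START]
        for row in top:
            tokens.append(self.ROW_START)
            tokens.extend(int(i) for i in row.reshape(-1))
        tokens.append(self.IMG_END)
        return tokens

    def decode(self, tokens):
        """Inverse of encode's layout: token ids -> [h, w, top_k] axis ids."""
        if tokens[0] != self.IMG_START or tokens[-1] != self.IMG_END:
            raise ValueError("missing image-start/end markers")
        rows = []
        for t in tokens[1:-1]:
            if t == self.ROW_START:
                rows.append([])
            else:
                rows[-1].append(t)
        return np.array(rows).reshape(len(rows), -1, self.top_k)
```

With `top_k=4`, an 8×16 grid (128 patches) costs 2 + 8 + 512 = 522 tokens under this layout. The row markers therefore push it slightly past a 512-token budget; dropping them gives 514.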

**Falsifiable prediction.** A small (~100M-parameter) decoder-only LLM
trained on text + axis-token sequences performs image captioning at
the same quality as CLIP+LLM with comparable compute. If quality is
significantly lower, axis tokenization is losing image content that
continuous embeddings preserve, and the discreteness has a real
cost. If quality matches, the small vocabulary is a free reduction
in token budget for image content.

---

## 7. Anomaly / OOD detection

**The utility.** Self-validating inference. Compute the codebook of
the input itself (not the model's reference codebook) and measure its
deviation from the reference. Inputs whose induced codebook deviates
substantially from the model's training-derived codebook are
out-of-distribution; the deviation magnitude is the OOD score.

**Why Omega.** A regime-flat model has a well-defined "in-distribution"
surface in codebook space. The `is_projective_clean` check already
captures this internally for codebook validation. Inverted, the same
machinery becomes an inference-time validity flag: every prediction
ships with a confidence signal derived from the input's geometric
compatibility with the codebook.

**Specific construction.** At inference, extract a per-batch codebook
from the input M tensor and compute its Procrustes distance to the
attached reference codebook. Add two methods to InferenceEngine:
`engine.validity_score(images) → float` and a threshold-based
`engine.predict_with_confidence(images) → (recon, confidence)`.
The throughput sweep already shows MSE ratio is a candidate validity
signal; Procrustes distance on a per-batch codebook is the
finer-grained version.
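The sketch below implements `validity_score` under heavy assumptions:

- **Codebook extraction.** The per-batch codebook is extracted with sign-invariant k-means. This is a stand-in; the real extraction procedure may differ.
- **Score.** The batch axes have no known correspondence with the reference axes, so a true Procrustes distance is not available. The score is instead the mean of 1 - max |cos| from each reference axis to its nearest batch axis.
- **Correlation check.** `deviation_quality_correlation` is the check the falsifiable prediction calls for.

All names are hypothetical.

```python
import numpy as np


def projective_kmeans(rows, n_axes, iters=20, seed=0):
    """Sign-invariant k-means on RP^(D-1): each centroid is its cluster's principal axis."""
    rng = np.random.default_rng(seed)
    X = rows / np.maximum(np.linalg.norm(rows, axis=1, keepdims=True), 1e-8)
    # farthest-point initialisation under |cos| similarity
    chosen = [X[rng.integers(len(X))]]
    for _ in range(n_axes - 1):
        sim = np.max(np.abs(X @ np.array(chosen).T), axis=1)
        chosen.append(X[np.argmin(sim)])
    C = np.array(chosen)
    for _ in range(iters):
        labels = np.argmax(np.abs(X @ C.T), axis=1)
        for j in range(n_axes):
            members = X[labels == j]
            if len(members):
                # top eigenvector of the scatter matrix = best axis, up to sign
                C[j] = np.linalg.eigh(members.T @ members)[1][:, -1]
    return C


def validity_score(M, reference):
    """Mean (1 - best |cos|) from each reference axis to the batch's own codebook."""
    batch = projective_kmeans(M.reshape(-1, M.shape[-1]), reference.shape[0])
    cos = np.abs(reference @ batch.T)
    return float(np.mean(1.0 - cos.max(axis=1)))


def deviation_quality_correlation(scores, mses):
    """Falsification check: Pearson r between validity scores and reconstruction MSE."""
    return float(np.corrcoef(scores, mses)[0, 1])
```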

**Falsifiable prediction.** Inputs whose codebook Procrustes distance
from the reference exceeds 0.5 produce reconstructions with MSE above
5× the native floor. If the correlation between codebook deviation
and reconstruction quality is weak (correlation < 0.5), the codebook
deviation is measuring something independent of model competence,
and it isn't a useful inference-time validity signal.

---

## 8. Cross-modal alignment

**The utility.** Multiple sphere-solvers trained on different
modalities (image, audio, text-as-noise) project into compatible
codebook spaces after Procrustes alignment. Cross-modal retrieval,
joint generation, and modality translation operate in shared axis
space rather than via a learned joint embedding.

**Why Omega.** The Bertenstein work demonstrated this with frozen
expert encoders projecting through a shared text hub. Today's finding
strengthens the claim: cross-modal alignment is *between codebooks*
(deterministic artifacts) rather than between learned projections.
Each modality's sphere-solver produces a codebook on its own
ℝP^(D-1); alignment is a fixed rotation, not a trained mapping.

**Specific construction.** Train sphere-solvers per modality. Extract
codebooks. Compute pairwise Procrustes alignments to a chosen
reference modality. At inference, project inputs through their native
sphere-solver, apply the cross-modal rotation, and operate in shared
axis space. No joint training is required after the per-modality
stage.
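Below is a minimal sketch of the inference path. It assumes both modalities' codebooks share the same dimension `D`. It also assumes their rows are already matched in order and sign. Neither assumption is guaranteed across real modalities, and handling a mismatched `D` or unmatched axes is the hard part this sketch skips. The rotation comes from standard orthogonal Procrustes on the codebooks. It is then applied to query embeddings before cosine retrieval. All names are hypothetical.

```python
import numpy as np


def procrustes(A, B):
    """Orthogonal R minimising ||A @ R - B||_F for row-matched A, B."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt


def unit(x):
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), 1e-8)


def retrieve(queries, keys, R):
    """Rotate query-modality vectors into the reference space; cosine top-1 index."""
    sims = unit(queries @ R) @ unit(keys).T
    return np.argmax(sims, axis=1)


def recall_at_1(pred, truth):
    return float(np.mean(pred == truth))
```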

**Falsifiable prediction.** Image-text retrieval via codebook
alignment matches CLIP-style joint-embedding retrieval at comparable
compute on standard benchmarks (MS-COCO, Flickr30K). If retrieval is
significantly worse, cross-modal information lives in the relations
*between* codebook activations rather than in the codebooks
themselves. The alignment-only approach would then be missing
structure that joint training captures.

---

## 9. Self-supervised pretraining recipes

**The utility.** Bootstrap foundation models on structured noise
alone. The h2-64 batteries already train on noise distributions and
develop projective-clean codebooks; this generalizes to a recipe for
training sphere-solver foundation models without curated real-world
data.

**Why Omega.** The projective-axis codebook emerges deterministically
from sphere-normalized SVD training, regardless of input distribution
(per U5, gaussian and sixteen-noise calibrations produce essentially
identical codebooks for the same model). Strictly, U5 varies the
calibration data of a fixed trained model, so independence from the
*training* corpus is the extrapolation this recipe tests. If it holds,
the model's geometric substrate is largely independent of training
corpus identity. That suggests a useful inverse: a foundation model
can be pretrained on synthetic/structured noise and then fine-tuned to
specific modalities via the cross-modal alignment recipe (Section 8).

**Specific construction.** Define a noise curriculum that exercises
the geometric primitives: gaussian, fractal, structured-but-random,
and adversarial noise. Train the sphere-solver to high reconstruction
quality on this curriculum. Verify the codebook is projective-clean
(built-in quality check). Release as a foundation model.
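The sketch below generates three of the four curriculum families with numpy:

- gaussian noise;
- fractal noise, spectrally shaped so that power falls as 1/f^beta;
- a piecewise-constant block pattern, as one possible reading of "structured-but-random".

Adversarial noise needs the model in the loop and is omitted. The generator names, `beta`, the block size and the round-robin schedule are assumptions.

```python
import numpy as np


def gaussian_noise(rng, n, size):
    return rng.normal(size=(n, size, size))


def fractal_noise(rng, n, size, beta=2.0):
    """1/f^beta power-spectrum noise, normalised to zero mean and unit std per image."""
    fy = np.fft.fftfreq(size)[:, None]
    fx = np.fft.fftfreq(size)[None, :]
    f = np.sqrt(fx ** 2 + fy ** 2)
    f[0, 0] = 1.0  # avoid divide-by-zero at DC (the mean is removed below anyway)
    amp = f ** (-beta / 2.0)  # amplitude ~ f^(-beta/2)  =>  power ~ f^(-beta)
    spec = np.fft.fft2(rng.normal(size=(n, size, size))) * amp
    img = np.real(np.fft.ifft2(spec))
    img = img - img.mean(axis=(1, 2), keepdims=True)
    return img / img.std(axis=(1, 2), keepdims=True)


def block_noise(rng, n, size, block=8):
    """Structured-but-random: piecewise-constant blocks (size must be divisible by block)."""
    coarse = rng.normal(size=(n, size // block, size // block))
    return np.repeat(np.repeat(coarse, block, axis=1), block, axis=2)


CURRICULUM = [gaussian_noise, fractal_noise, block_noise]


def sample_curriculum(rng, step, n, size):
    """Round-robin over generators; adversarial noise is omitted (it needs the model)."""
    return CURRICULUM[step % len(CURRICULUM)](rng, n, size)
```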

**Falsifiable prediction.** A sphere-solver foundation model is
pretrained on noise alone, then fine-tuned on ImageNet via 1% of the
parameters (a small adapter on top of the frozen encoder). It matches
or exceeds equivalent-compute models pretrained directly on
ImageNet. If noise pretraining produces worse downstream performance
than ImageNet pretraining at fixed compute, the geometric substrate
isn't sufficient on its own: there's content in real-world
distributions the model needs to see during pretraining to learn
effectively.

---

## 10. Continual learning / model-merging

**The utility.** Codebooks from independently trained models are
comparable artifacts. Merging two models means aligning their
codebooks via Procrustes, optionally extending the joint axis set to
cover the union of features. Continual learning becomes "extend the
codebook when novel structure appears" rather than "retrain to
incorporate new data."

**Why Omega.** Model identity in the geolip-svae family is largely
captured by the codebook (calibration insensitivity confirms this).
Two models trained on different distributions but with the same
architecture have different codebooks. Aligning them via Procrustes
gives a principled way to combine them without the parameter
interference that plagues standard model-merging methods.

**Specific construction.** Operations on Codebook artifacts:
- `Codebook.merge(other) → Codebook`: union of axes after Procrustes
  alignment, with antipodal-pair re-collapse to deduplicate
- `Codebook.diff(other) → axes`: axes in `self` that don't have a
  near-equivalent in `other` after alignment (the novel structure)
- `Codebook.extend(novel_axes) → Codebook`: append new axes and
  re-validate projective-cleanness
- Continual learning loop: train, extract codebook, diff against the
  prior codebook, decide whether to keep new axes, re-emit the updated
  codebook.
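The sketch below implements the `diff` / `extend` / `merge` operations under three assumptions:

- codebooks are `[n_axes, D]` arrays of unit axes;
- any Procrustes rotation has already been computed and is passed in as `R`;
- two axes count as equivalent when their absolute cosine is at least a hypothetical threshold `tol`.

Taking the absolute cosine is what makes antipodal pairs collapse. `has_no_duplicates` is a crude stand-in for re-validation, not the library's `is_projective_clean`.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Codebook:
    """Hypothetical artifact: unit axes [n_axes, D] on RP^(D-1)."""

    axes: np.ndarray

    def diff(self, other, tol=0.95):
        """Axes in self with no partner in other at |cos| >= tol (antipodes match)."""
        sims = np.abs(self.axes @ other.axes.T)
        return self.axes[sims.max(axis=1) < tol]

    def extend(self, novel_axes):
        """Append new axes (renormalised); the result should be re-validated."""
        novel = novel_axes / np.linalg.norm(novel_axes, axis=1, keepdims=True)
        return Codebook(np.vstack([self.axes, novel]))

    def merge(self, other, R=None, tol=0.95):
        """Union after rotating other by R (e.g. from Procrustes); duplicates collapse."""
        aligned = other if R is None else Codebook(other.axes @ R)
        novel = aligned.diff(self, tol)
        return self.extend(novel) if len(novel) else Codebook(self.axes.copy())

    def has_no_duplicates(self, tol=0.95):
        """Crude stand-in check: no two axes coincide up to sign."""
        sims = np.abs(self.axes @ self.axes.T)
        np.fill_diagonal(sims, 0.0)
        return bool(sims.max() < tol)
```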

**Falsifiable prediction.** Two h2-64 batteries (different noise
distributions) merge into a combined codebook with deviation in the
0.20–0.23 CV band. If the merge produces a codebook that *fails*
projective-cleanness, the two codebooks live on incompatible
projective subspaces and merging is not just a Procrustes alignment:
there's content-level interference that requires retraining.

---

## What this clause does NOT cover

Excluded by methodology. These are useful applications of geolip-svae
but do not depend on the Omega substrate in a load-bearing way:

- **Standard feature extraction** for downstream tasks where the input
  resolution and modality are fixed. Any encoder can do this; nothing
  here is Omega-dependent.
- **Adversarial robustness** as a downstream goal. Possibly correlated
  with codebook quality but not enabled by it specifically.
- **Reinforcement learning state representations.** The geometric
  substrate provides nothing the RL community can't get from a
  standard VAE.
- **Generative pretraining for autoregressive language modeling.**
  Sphere-solvers are not autoregressive; the pathway from this
  substrate to LLM pretraining is speculative.

---

## Build-order considerations

If utilities will be built in sequence rather than in parallel, the
priority ordering by *information value per build* is:

1. **§7 OOD detection**: already mostly present in the codebook
   machinery and easiest to ship. Validates the validity-flag framing
   adopted in this morning's pivot.
2. **§5b distillation FROM sphere-solvers**: also mostly present;
   needs only API wrapping. Demonstrates the codebook as a portable
   artifact for the public release.
3. **§4 solving primitives**: exposes the model's identity claim
   directly. The `project / align / solve_basis` triple is a clean
   API surface.
4. **§1 classification**: the first non-trivial test of
   regime-flatness beyond reconstruction. The falsifiable prediction
   is sharp.
5. **§6 tokenization**: bridge to mainstream multimodal
   architectures. Higher build cost but high impact for adoption.
6. **§8 cross-modal alignment**: extends Bertenstein under the new
   framing. Build cost is moderate; depends on having multiple
   modality-specific sphere-solvers trained.
7. **§5a distillation INTO sphere-solvers**: significant training
   investment. Defer until the smaller utilities validate.
8. **§2 diffusion**: substantial build, novel pathway, high
   uncertainty. Worth doing once the codebook artifact patterns are
   mature.
9. **§9 self-supervised pretraining**: biggest investment, most
   speculative, but if it works it's the largest payoff.
10. **§3 processing**: depends on §1 + §2 maturity for activation
    edits to be principled. Last of the deployment utilities.
11. **§10 model-merging**: research utility rather than deployment
    utility. Useful when there are many trained sphere-solvers to
    consolidate.

The first three are all near-term and reuse existing machinery;
together they constitute a release-ready feature set. The remainder
form the multi-month research agenda.