| # Potential Downstream Utilities Clause |
|
|
| **Status:** Forward-looking. Each utility takes the Omega substrate as a |
| load-bearing assumption — regime-independence of reconstruction quality |
| across input scale, the projective-axis codebook as a deterministic |
| property of trained sphere-solvers, and hardware-determined throughput |
| limits independent of model behavior. Utilities that would work |
| equivalently on any encoder are excluded; this is a list of capabilities |
| that are *enabled* by Omega, not capabilities incidentally compatible |
| with it. |
|
|
| **Methodology.** Per the post-000108 research stage, every utility |
| section ends with a falsifiable prediction — what would have to be true |
| for the utility to NOT work. Construction precedes proof. The first |
| build that fails its prediction tells us where the substrate's |
| boundary actually is. |
|
|
| --- |
|
|
| ## 1. Classification |
|
|
| **The utility.** A projective codebook of `n_axes` directions on |
| ℝP^(D-1) is a vocabulary of feature primitives. Image → patch grid → M |
| tensor → per-patch projection onto codebook axes → activation pattern |
| of shape `[B, n_patches, V, n_axes]`. A linear or shallow head over |
| this representation performs classification. |
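|
The projection step above can be sketched as follows. This is a minimal illustration, not the project's implementation: the `[B, n_patches, V, D]` layout of the M tensor, the function name `axis_activations`, and the use of absolute cosine as the activation are all assumptions made for the example.

```python
import numpy as np

def axis_activations(M, codebook):
    """Project each M-tensor row onto unit codebook axes.

    M:        [B, n_patches, V, D]   (assumed layout)
    codebook: [n_axes, D]
    returns:  [B, n_patches, V, n_axes]
    Rows of M are assumed to be non-zero.
    """
    axes = codebook / np.linalg.norm(codebook, axis=-1, keepdims=True)
    rows = M / np.linalg.norm(M, axis=-1, keepdims=True)
    # |cos| treats v and -v as the same point on the projective space.
    return np.abs(np.einsum("bpvd,ad->bpva", rows, axes))
```

Taking the absolute value is one way to respect the projective structure: flipping the sign of an input row leaves its activation pattern unchanged.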
|
|
| **Why Omega.** The codebook is model-intrinsic and regime-flat. A |
| classifier trained on activation patterns at 64×64 should generalize |
| to 512×512 inputs at inference without retraining, because the |
| codebook itself doesn't change with input size. Standard CLIP-style |
| models do not give this property — their representations drift with |
| input resolution; their pooling operations bake in a particular spatial |
| extent. |
|
|
| **Specific construction.** Train classifier head on per-patch axis |
| activations averaged across patches (or attended-over). For |
| fine-grained tasks, retain the spatial structure: classifier sees the |
| full `[n_patches, n_axes]` matrix as a 2D feature map. Per-patch |
| aggregation already validated in scratchpad 000104 — patch_idx=0 fails |
| because it discards spatial signal; patch-mean recovers most of the |
| gap. |
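|
The patch-mean variant can be sketched as below. The names `pool_patches` and `linear_head` are hypothetical, and the head weights would come from ordinary supervised training that is not shown here.

```python
import numpy as np

def pool_patches(acts):
    """[B, n_patches, V, n_axes] -> [B, V * n_axes] by averaging over patches."""
    pooled = acts.mean(axis=1)
    return pooled.reshape(pooled.shape[0], -1)

def linear_head(features, W, b):
    """Class logits from pooled features; W has shape [V * n_axes, n_classes]."""
    return features @ W + b
```

Because the patch axis is averaged away, a 64-patch input and a 4096-patch input both yield a feature vector of width `V * n_axes`, so the same head applies at either resolution. Whether accuracy actually carries over is what the prediction below tests.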
| |
| **Falsifiable prediction.** A classifier trained on 64×64 activation |
| patterns achieves comparable accuracy on 512×512 test inputs (within |
| 2 percentage points) without any architectural adaptation. If accuracy |
| drops sharply with input resolution, the codebook activations are not |
| in fact regime-invariant in the way reconstruction is, and Omega |
| covers reconstruction but not classification — a meaningful boundary. |
| |
| --- |
| |
| ## 2. Diffusion |
| |
| **The utility.** Discrete diffusion in axis-index space. Each patch's |
| M-tensor row gets quantized to its nearest codebook axis (or top-k |
| mixture). The "noise" process is gradual randomization of axis |
| assignments; the "denoise" process is a transformer that predicts |
| axis indices from corrupted sequences. Sampling = run denoiser to |
| clean axis sequence → reconstruct image via codebook → decoder. |
| |
| **Why Omega.** Three properties combine here. The codebook is a |
| finite, deterministic vocabulary, so discrete diffusion is well-defined |
| without extra quantizer training. The decoder is regime-flat, so a |
| diffusion model trained on 64×64 axis sequences can sample at any |
| resolution by predicting longer sequences and decoding at the target |
| size. The codebook's projective structure means antipodal axes carry |
| equivalent information, which meaningfully reduces the effective |
| vocabulary size for the diffusion target. |
| |
| **Specific construction.** Diffusion target: `[n_patches, top_k]` |
| discrete indices into codebook. Loss: cross-entropy over axis indices. |
| Backbone: any transformer that handles variable-length token sequences |
| (patch count varies with target resolution). Conditioning: optional |
| class label or text embedding via cross-attention. |
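|
A minimal sketch of the diffusion target and a forward corruption step, under stated assumptions: `quantize_top_k` and `corrupt` are hypothetical names, and uniform resampling is only one possible choice of "gradual randomization" (the text does not fix a noise schedule).

```python
import numpy as np

def quantize_top_k(rows, codebook, k=1):
    """Indices of the k axes with largest |cosine| per row: [n, D] -> [n, k]."""
    axes = codebook / np.linalg.norm(codebook, axis=-1, keepdims=True)
    # Row scale does not change the ranking, so rows are left unnormalised.
    sims = np.abs(rows @ axes.T)  # antipodal directions score the same
    return np.argsort(-sims, axis=-1)[:, :k]

def corrupt(indices, t, n_axes, rng):
    """Uniform forward process: each index is resampled with probability t."""
    mask = rng.random(indices.shape) < t
    noise = rng.integers(0, n_axes, size=indices.shape)
    return np.where(mask, noise, indices), mask
```

The denoiser would then be trained with cross-entropy to recover the original indices at the masked positions; at `t = 0` the sequence is untouched and at `t = 1` every position is resampled.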
| |
| **Falsifiable prediction.** A diffusion model trained on 64×64 axis |
| sequences from h2-64 produces coherent samples at 256×256 by sampling |
| longer sequences and decoding at the target size, without retraining. |
| If samples at non-native resolution show mode collapse or boundary |
| artifacts beyond what the encoder-decoder pair produces directly, |
| the codebook's discreteness is interfering with the regime-flat |
| reconstruction, and Omega's envelope is narrower than expected. |
| |
| --- |
| |
| ## 3. Processing (image-to-image edits in axis space) |
| |
| **The utility.** Operations applied to codebook activations rather |
| than pixels. Image → encode → edit activations → decode. Style |
| transfer, denoising, inpainting, semantic editing all become |
| manipulations of the `[n_patches, V, n_axes]` activation tensor, |
| followed by reconstruction. |
| |
| **Why Omega.** Edits made at one resolution are coherent when decoded |
| at another, because the codebook is the same vocabulary at every |
| scale. A 64×64 inpaint mask can produce a 512×512 inpainted output by |
| upsampling the edited activations and decoding at the target size. |
| Critically, the activation edits respect the geometric constraints |
| that produced the codebook — operations that move activations *off* |
| the codebook produce reconstruction artifacts that are themselves a |
| useful signal. |
| |
| **Specific construction.** Define edit operations as activation-tensor |
| transformations: zero-out (denoise), substitute axis-set (style |
| transfer), spatial-gather + redistribute (inpaint), interpolate |
| between two images' activations (semantic morph). Provide a |
| `process_at_scale` API mirroring `reconstruct_at_scale`. |
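|
Three of the edit operations above can be sketched as plain tensor transformations. The function names are hypothetical, the patch order is assumed to be row-major on the grid, and nearest-neighbour upsampling is just one simple choice for moving edited activations to a larger grid.

```python
import numpy as np

def zero_out(acts, threshold):
    """Denoise-style edit: drop activations below a threshold."""
    return np.where(acts >= threshold, acts, 0.0)

def morph(acts_a, acts_b, alpha):
    """Semantic morph: linear interpolation between two activation tensors."""
    return (1.0 - alpha) * acts_a + alpha * acts_b

def upsample_grid(acts, grid_hw, factor):
    """Nearest-neighbour upsampling of [n_patches, V, n_axes] on its patch grid."""
    h, w = grid_hw
    grid = acts.reshape(h, w, *acts.shape[1:])
    grid = grid.repeat(factor, axis=0).repeat(factor, axis=1)
    return grid.reshape(h * factor * w * factor, *acts.shape[1:])
```

A `process_at_scale` call could compose these: edit at the source grid, upsample, then hand the result to the decoder at the target size.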
| |
| **Falsifiable prediction.** Style transfer applied to 64×64 |
| activations and decoded at 512×512 produces output indistinguishable |
| in style consistency from the same operation applied directly to a |
| 512×512 encoding. If the upsampled-edit path produces worse style |
| transfer than the direct-encode path, the activation upsampling is |
| losing geometric structure that the encoder captures — and Omega's |
| regime-flatness has a stricter envelope than reconstruction MSE |
| alone reveals. |
| |
| --- |
| |
| ## 4. Solving |
| |
| **The utility.** The most direct framing: use the trained sphere-solver |
| to solve geometric problems on its native manifold. Given a set of |
| points in ℝ^D, encode them via the model's projection path to get |
| their representation on ℝP^(D-1). Given a set of vectors, solve for |
| the codebook axes that span them. Given two sets of points, find the |
| optimal projective alignment via Procrustes on their codebooks. |
| |
| **Why Omega.** This is the closest utility to the model's identity |
| claim. The model is named "sphere-solver" because that's what it is — |
| a parametric solver for "what's the best projective representation of |
| this data on the unit sphere?" The Omega finding is that this solver |
| is regime-independent: the same machinery handles 64 input points or |
| 65,536 input points and produces structurally consistent answers. |
| |
| **Specific construction.** Expose three solver primitives: |
| - `project(points, model) → axes`: encode arbitrary point clouds via |
| the model's encoder to get their codebook representation |
| - `align(codebook_a, codebook_b) → rotation`: Procrustes-align two |
| codebooks (already implemented in tests/framework.py) |
| - `solve_basis(target_vectors, model) → axis_indices`: given target |
| vectors, find the codebook axes that best span them |
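|
Two of the primitives above can be sketched without the model. This is not the implementation in tests/framework.py. It assumes the two codebooks have matched rows with consistent signs for `align`, and it uses a greedy matching-pursuit-style selection for `solve_basis`, which is one possible reading of "best span".

```python
import numpy as np

def align(codebook_a, codebook_b):
    """Orthogonal Procrustes: R minimising ||codebook_a @ R - codebook_b||_F."""
    u, _, vt = np.linalg.svd(codebook_a.T @ codebook_b)
    return u @ vt

def solve_basis(targets, codebook, n_select):
    """Greedily pick axes that explain the most residual energy of the targets."""
    axes = codebook / np.linalg.norm(codebook, axis=-1, keepdims=True)
    residual = np.array(targets, dtype=float)
    chosen = []
    for _ in range(n_select):
        energy = ((residual @ axes.T) ** 2).sum(axis=0)
        if chosen:
            energy[chosen] = -np.inf  # never pick the same axis twice
        best = int(np.argmax(energy))
        chosen.append(best)
        # Remove the component of every target along the chosen axis.
        residual = residual - np.outer(residual @ axes[best], axes[best])
    return chosen
```

For projective codebooks, rows that differ only by sign would need to be sign-matched before calling `align`; that step is omitted here.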
|
|
| **Falsifiable prediction.** Procrustes alignment between codebooks of |
| the same model on different calibration distributions yields a |
| rotation distance below 0.1 (already verified at U5 — calibration |
| deviations differ by ~0.003). Cross-model alignment between two |
| sphere-solvers trained on the same data yields a rotation distance |
| below 0.3 (predicted, not yet measured). If cross-model alignment |
| turns out to be near-orthogonal random, codebook structure is |
| data-driven not architecture-driven, and the solver's "intrinsic" |
| status is overstated. |
|
|
| --- |
|
|
| ## 5. Distillation |
|
|
| Two directions, distinct enough to enumerate separately. |
|
|
| ### 5a. Distillation INTO sphere-solvers |
|
|
| **The utility.** Train a sphere-solver student to match a non-Omega |
| teacher's representations. Student inherits regime-flatness |
| automatically; teacher's representational quality flows into a |
| deployable encoder that handles arbitrary resolution without extra |
| machinery. |
|
|
| **Why Omega.** Standard distillation produces a student whose |
| behavior interpolates the teacher's at training scale. A |
| sphere-solver student, by virtue of its architecture, additionally |
| inherits regime-flatness — the student behaves consistently at |
| inference scales the teacher was never tested on. This is a |
| distillation result that wouldn't follow from teacher quality alone. |
|
|
| **Specific construction.** Loss combines reconstruction (the |
| sphere-solver's native objective) with representation matching |
| against the teacher's pooled features at intermediate resolution. |
| The student is expected to emerge with both teacher-like |
| representations AND resolution-agnosticism. Teacher candidates: |
| CLIP, DINOv2, Whisper (per the Bertenstein cross-modal alignment |
| work). |
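|
The combined objective might look like the sketch below. The `(1 - cosine)` matching term, the `weight` parameter, and the assumption that student and teacher features already share a dimension are illustrative choices, not something the text specifies. NumPy is used only to show the arithmetic; real training would use an autograd framework.

```python
import numpy as np

def distill_loss(recon, target, student_feat, teacher_feat, weight=1.0):
    """Reconstruction MSE plus (1 - cosine) matching against teacher features."""
    recon_term = np.mean((recon - target) ** 2)
    s = student_feat / np.linalg.norm(student_feat, axis=-1, keepdims=True)
    t = teacher_feat / np.linalg.norm(teacher_feat, axis=-1, keepdims=True)
    match_term = np.mean(1.0 - np.sum(s * t, axis=-1))
    return recon_term + weight * match_term
```

If the feature widths differ, a small learned projection on the student side would be needed before the matching term.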
|
|
| **Falsifiable prediction.** A sphere-solver student distilled from |
| DINOv2 at 224×224 produces representations that, when evaluated on a |
| standard linear-probe benchmark at 448×448, match or exceed direct |
| DINOv2 at 448×448. If the student degrades at non-training scale |
| the way the teacher does, distillation didn't transfer |
| regime-flatness — it transferred only representational quality, and |
| the architectural Omega property is more fragile than the |
| training-from-scratch case suggests. |
|
|
| ### 5b. Distillation FROM sphere-solvers (codebook freezing) |
|
|
| **The utility.** Extract a codebook artifact, freeze it, train cheap |
| downstream models that consume codebook activations rather than |
| re-running the encoder. The codebook becomes a portable feature |
| vocabulary; downstream models are 1-2 orders of magnitude smaller. |
|
|
| **Why Omega.** U5's verdict (as_is_packaging) makes this trivially |
| feasible — codebooks are stable artifacts, model-intrinsic and |
| calibration-insensitive. The downstream model never sees the original |
| encoder; it only sees activation patterns over a fixed vocabulary. |
| Resolution-agnosticism is inherited because the codebook is the same |
| at every scale. |
|
|
| **Specific construction.** Pipeline: (1) extract codebook once, save |
| as safetensors+JSON. (2) Pre-compute activation patterns for |
| training corpus. (3) Train any standard architecture (MLP, small |
| transformer, CNN) with axis activations as input. Codebook stays |
| frozen forever after step 1. |
|
|
| **Falsifiable prediction.** Already validated by U5 + the geolip-core |
| pipeline. Failure mode would be: a downstream model trained on |
| codebook activations underperforms an end-to-end model of similar |
| parameter count. Predicted not to fail in the regime-flat use case |
| (where end-to-end models lack regime-flatness anyway), but might fail |
| in the standard fixed-resolution regime, where the end-to-end model |
| has the advantage of freely trainable encoder parameters. |
|
|
| --- |
|
|
| ## 6. Tokenization for downstream LLMs / multimodal models |
|
|
| **The utility.** The codebook is a discrete vocabulary of size |
| `n_axes` (typically 27–230). Images → axis activation sequences → |
| discrete tokens fed to autoregressive language models. The geolip-svae |
| becomes an image tokenizer for the existing multimodal-LLM ecosystem. |
|
|
| **Why Omega.** Three properties matter. Vocabulary size is small |
| compared to standard learned image tokenizers (VQ-VAE typically |
| ~8K-16K codes). Token cost per patch is also small: with ~30 axes, a |
| dense encoding (one token per axis) fits ~17 patches into a 512-token |
| budget, while a top-k=4 mixture per patch fits ~128 patches into the |
| same budget. Resolution-agnosticism means the same tokenizer handles |
| any input image without retraining. Calibration insensitivity means |
| the tokenizer is a fixed component, not a learned-per-task module. |
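|
The token-budget figures follow from integer division, assuming one token per axis in the dense case and one token per selected axis in the top-k case. The helper name is hypothetical.

```python
def patches_in_budget(token_budget, tokens_per_patch):
    """Whole patches that fit in a fixed token budget."""
    return token_budget // tokens_per_patch

dense = patches_in_budget(512, 30)   # one token per axis, ~30 axes
sparse = patches_in_budget(512, 4)   # top-k = 4 indices per patch
print(dense, sparse)  # 17 128
```

Special tokens (image start, image end, row markers) would eat slightly into either budget.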
|
|
| **Specific construction.** Wrap codebook quantization as a tokenizer |
| class with `encode(image) → token_sequence` and `decode(token_sequence, |
| target_size) → image` methods. Define special tokens for image-start, |
| image-end, optionally row-start markers for spatial structure. |
| Integrate via standard transformers/HuggingFace tokenizer interface. |
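|
A sketch of the token layout only. The image-to-axis and axis-to-image steps need the model and are not shown; the class name `AxisTokenizer`, the method names, and the choice to place special tokens after the axis ids are assumptions for illustration.

```python
import numpy as np

class AxisTokenizer:
    """Hypothetical token layout: axis ids first, then three special tokens."""

    def __init__(self, n_axes):
        self.n_axes = n_axes
        self.image_start = n_axes
        self.image_end = n_axes + 1
        self.row_start = n_axes + 2
        self.vocab_size = n_axes + 3

    def encode_grid(self, axis_grid):
        """[H, W] grid of axis indices -> flat token list with row markers."""
        tokens = [self.image_start]
        for row in np.asarray(axis_grid):
            tokens.append(self.row_start)
            tokens.extend(int(a) for a in row)
        tokens.append(self.image_end)
        return tokens

    def decode_grid(self, tokens):
        """Inverse of encode_grid: recover the [H, W] grid of axis indices."""
        rows = []
        for tok in tokens:
            if tok == self.row_start:
                rows.append([])
            elif tok < self.n_axes:
                rows[-1].append(tok)
        return np.array(rows)
```

Row-start markers let the consumer recover the grid shape from the flat sequence, which matters when patch count varies with resolution.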
|
|
| **Falsifiable prediction.** A small (~100M param) decoder-only LLM |
| trained on text + axis-token sequences performs image captioning at |
| the same quality as CLIP+LLM with comparable compute. If quality is |
| significantly lower, axis tokenization is losing image content that |
| continuous embeddings preserve, and the discreteness has a real |
| cost. If quality matches, the small vocabulary is a free reduction |
| in token budget for image content. |
|
|
| --- |
|
|
| ## 7. Anomaly / OOD detection |
|
|
| **The utility.** Self-validating inference. Compute the codebook of |
| the input itself (not the model's reference codebook) and measure |
| deviation from the reference. Inputs whose induced codebook |
| substantially deviates from the model's training-derived codebook |
| are out-of-distribution; the deviation magnitude is the OOD score. |
|
|
| **Why Omega.** A regime-flat model has a well-defined "in-distribution" |
| surface in codebook space. The `is_projective_clean` check already |
| captures this internally for codebook validation. Inverted, the same |
| machinery becomes an inference-time validity flag: every prediction |
| ships with a confidence signal derived from the input's geometric |
| compatibility with the codebook. |
|
|
| **Specific construction.** At inference, extract a per-batch codebook |
| from the input M tensor and compute Procrustes distance to the |
| attached reference codebook. Add to InferenceEngine as |
| `engine.validity_score(images) → float` and threshold-based |
| `engine.predict_with_confidence(images) → (recon, confidence)`. |
| The throughput sweep already shows MSE ratio is a candidate validity |
| signal — Procrustes distance on a per-batch codebook is the |
| finer-grained version. |
|
|
| **Falsifiable prediction.** Inputs with codebook Procrustes distance |
| > 0.5 from reference produce reconstructions with MSE > 5× native |
| floor. If correlation between codebook deviation and reconstruction |
| quality is weak (correlation < 0.5), the codebook deviation is |
| measuring something independent of model competence, and it isn't a |
| useful inference-time validity signal. |
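|
The correlation check in this prediction can be expressed as a small helper. Pearson correlation is an assumption here (the text does not say which correlation is meant), and `validity_signal_is_useful` is a hypothetical name.

```python
import numpy as np

def validity_signal_is_useful(deviation, mse_ratio, min_corr=0.5):
    """Pearson correlation between codebook deviation and MSE ratio, with verdict."""
    corr = float(np.corrcoef(deviation, mse_ratio)[0, 1])
    return corr, corr >= min_corr
```

The inputs would be one Procrustes distance and one MSE-to-native-floor ratio per evaluated batch.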
|
|
| --- |
|
|
| ## 8. Cross-modal alignment |
|
|
| **The utility.** Multiple sphere-solvers trained on different |
| modalities (image, audio, text-as-noise) project into compatible |
| codebook spaces after Procrustes alignment. Cross-modal retrieval, |
| joint generation, and modality translation operate in shared axis |
| space rather than via a learned joint embedding. |
|
|
| **Why Omega.** The Bertenstein work demonstrated this with frozen |
| expert encoders projecting through a shared text hub. Today's finding |
| strengthens the claim: cross-modal alignment is *between codebooks* |
| (deterministic artifacts) rather than between learned projections. |
| Each modality's sphere-solver produces a codebook on its own |
| ℝP^(D-1); alignment is a fixed rotation, not a trained mapping. |
|
|
| **Specific construction.** Train sphere-solvers per modality. Extract |
| codebooks. Compute pairwise Procrustes alignments to a chosen |
| reference modality. At inference, project inputs through their native |
| sphere-solver, apply the cross-modal rotation, and operate in shared |
| axis space. No joint training required after the per-modality stage. |
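|
The inference step can be sketched as below for the retrieval case. It assumes the cross-modal `rotation` has already been computed by Procrustes alignment, that both modalities share the dimension D, and that absolute cosine is an acceptable similarity in shared axis space. The name `retrieve` is hypothetical.

```python
import numpy as np

def retrieve(query, gallery, rotation, top_n=5):
    """Rank gallery rows by |cosine| to the rotated query vector."""
    q = query @ rotation
    q = q / np.linalg.norm(q)
    g = gallery / np.linalg.norm(gallery, axis=-1, keepdims=True)
    scores = np.abs(g @ q)  # sign-invariant, matching the projective setting
    return np.argsort(-scores)[:top_n]
```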
|
|
| **Falsifiable prediction.** Image-text retrieval via codebook |
| alignment matches CLIP-style joint-embedding retrieval at comparable |
| compute on standard benchmarks (MS-COCO, Flickr30K). If retrieval is |
| significantly worse, cross-modal information lives in the relations |
| *between* codebook activations rather than in the codebooks |
| themselves, and the alignment-only approach is missing structure that |
| joint training captures. |
|
|
| --- |
|
|
| ## 9. Self-supervised pretraining recipes |
|
|
| **The utility.** Bootstrap foundation models on structured noise |
| alone. The h2-64 batteries already train on noise distributions and |
| develop projective-clean codebooks; this generalizes to a recipe for |
| training sphere-solver foundation models without curated real-world |
| data. |
|
|
| **Why Omega.** The projective-axis codebook emerges deterministically |
| from sphere-normalized SVD training, regardless of input distribution |
| (per U5: gaussian and sixteen-noise calibrations produce essentially |
| identical codebooks for the same model). The model's geometric |
| substrate is largely independent of training corpus identity. This |
| suggests a useful inverse: a foundation model can be pretrained on |
| synthetic/structured noise and then fine-tuned to specific modalities |
| via the cross-modal alignment recipe (Section 8). |
|
|
| **Specific construction.** Define a noise curriculum that exercises |
| the geometric primitives — gaussian, fractal, structured-but-random, |
| adversarial noise. Train sphere-solver to high reconstruction quality |
| on this curriculum. Verify the codebook is projective-clean (built-in |
| quality check). Release as foundation model. |
|
|
| **Falsifiable prediction.** A sphere-solver foundation model |
| pretrained on noise alone, then fine-tuned on ImageNet by training |
| only 1% of the parameters (a small adapter on top of the frozen |
| encoder), matches or exceeds equivalent-compute models pretrained |
| directly on ImageNet. If noise-pretraining produces worse downstream |
| performance than ImageNet-pretraining at fixed compute, the geometric substrate |
| isn't sufficient on its own — there's content in real-world |
| distributions the model needs to see during pretraining to learn |
| effectively. |
|
|
| --- |
|
|
| ## 10. Continual learning / model-merging |
|
|
| **The utility.** Codebooks from independently-trained models are |
| comparable artifacts. Merging two models = aligning their codebooks |
| via Procrustes, optionally extending the joint axis set to cover |
| union-of-features. Continual learning becomes "extend the codebook |
| when novel structure appears" rather than "retrain to incorporate new |
| data." |
|
|
| **Why Omega.** Model identity in the geolip-svae family is largely |
| captured by the codebook (calibration insensitivity confirms this). |
| Two models trained on different distributions but the same |
| architecture have different codebooks; aligning them via Procrustes |
| gives a principled way to combine them without the parameter |
| interference that plagues standard model-merging methods. |
|
|
| **Specific construction.** Operations on Codebook artifacts: |
| - `Codebook.merge(other) → Codebook`: union of axes after Procrustes |
| alignment, with antipodal-pair re-collapse to deduplicate |
| - `Codebook.diff(other) → axes`: axes in `self` that don't have a |
| near-equivalent in `other` after alignment — the novel structure |
| - `Codebook.extend(novel_axes) → Codebook`: append new axes, |
| re-validate projective-cleanness |
| - Continual learning loop: train, extract codebook, diff against |
| prior codebook, decide whether to keep new axes, re-emit updated |
| codebook. |
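|
The `diff` and `merge` operations can be sketched on raw axis arrays. Both functions assume the codebooks are already Procrustes-aligned; the similarity threshold `tol=0.95` is a placeholder, and the free-function form stands in for the `Codebook` methods named above.

```python
import numpy as np

def _unit(axes):
    """Normalise each axis to unit length."""
    return axes / np.linalg.norm(axes, axis=-1, keepdims=True)

def diff(self_axes, other_axes, tol=0.95):
    """Axes in self with no near-equivalent (|cos| >= tol) in other."""
    sims = np.abs(_unit(self_axes) @ _unit(other_axes).T)
    return self_axes[sims.max(axis=1) < tol]

def merge(self_axes, other_axes, tol=0.95):
    """Union of axes; axes of other that duplicate self (up to sign) are dropped."""
    novel = diff(other_axes, self_axes, tol)
    return np.concatenate([self_axes, novel], axis=0)
```

Using `|cos|` means an axis and its antipode count as the same direction, which is the antipodal-pair collapse. Near-duplicates within `other` itself are not collapsed in this sketch, and the projective-cleanness re-validation is left to the existing check.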
|
|
| **Falsifiable prediction.** Two h2-64 batteries (different noise |
| distributions) merge into a combined codebook with deviation in the |
| 0.20–0.23 CV band. If the merge produces a codebook that *fails* |
| projective-cleanness, the two codebooks live on incompatible |
| projective subspaces and merging is not just a Procrustes alignment |
| — there's content-level interference that requires retraining. |
|
|
| --- |
|
|
| ## What this clause does NOT cover |
|
|
| Excluded by methodology — these are useful applications of geolip-svae |
| but do not depend on the Omega substrate in a load-bearing way: |
|
|
| - **Standard feature extraction** for downstream tasks where the input |
| resolution and modality are fixed. Any encoder can do this; nothing |
| Omega-dependent. |
| - **Adversarial robustness** as a downstream goal. Possibly correlated |
| with codebook quality but not enabled by it specifically. |
| - **Reinforcement learning state representations.** The geometric |
| substrate provides nothing the RL community can't get from a |
| standard VAE. |
| - **Generative pretraining for autoregressive language modeling.** |
| Sphere-solvers are not autoregressive; pathway from this substrate |
| to LLM pretraining is speculative. |
|
|
| --- |
|
|
| ## Build-order considerations |
|
|
| If utilities will be built in sequence rather than parallel, the |
| priority ordering by *information value per build* is: |
|
|
| 1. **§7 OOD detection** — already mostly present in the codebook |
| machinery, easiest to ship. Tests the validity-flag framing from |
| this morning's pivot. |
| 2. **§5b distillation FROM sphere-solvers** — also mostly present, |
| needs only API wrapping. Demonstrates the codebook as portable |
| artifact for the public release. |
| 3. **§4 solving primitives** — exposes the model's identity claim |
| directly. The `project / align / solve_basis` triple is a clean |
| API surface. |
| 4. **§1 classification** — first non-trivial test of regime-flatness |
| beyond reconstruction. Falsifiable prediction is sharp. |
| 5. **§6 tokenization** — bridge to mainstream multimodal architectures. |
| Higher build cost but high impact for adoption. |
| 6. **§8 cross-modal alignment** — extends Bertenstein under the new |
| framing. Build cost is moderate; depends on having multiple |
| modality-specific sphere-solvers trained. |
| 7. **§5a distillation INTO sphere-solvers** — significant training |
| investment. Defer until after smaller utilities validate. |
| 8. **§2 diffusion** — substantial build, novel pathway, high uncertainty. |
| Worth doing once the codebook artifact patterns are mature. |
| 9. **§9 self-supervised pretraining** — biggest investment, most |
| speculative, but if it works it's the largest payoff. |
| 10. **§3 processing** — depends on §1 + §2 maturity for activation |
| edits to be principled. Last of the deployment utilities. |
| 11. **§10 model-merging** — research utility rather than deployment |
| utility. Useful when there are many trained sphere-solvers to |
| consolidate. |
| |
| The first three are all near-term and reuse existing machinery; |
| together they constitute a release-ready feature set. The remainder |
| are the multi-month research agenda. |