AbstractPhil/geolip-hypersphere-experiments

AbstractPhil committed on Apr 27 · commit a9195c3 (verified) · 1 parent: 2792f1a

Create OMEGA_PROGRESSION.md

OMEGA_PROGRESSION.md (added, +443 -0)

# Potential Downstream Utilities Clause

**Status:** Forward-looking. Each utility takes the Omega substrate as a
load-bearing assumption: regime-independence of reconstruction quality
across input scale, the projective-axis codebook as a deterministic
property of trained sphere-solvers, and hardware-determined throughput
limits that are independent of model behavior. Utilities that would
work equally well on any encoder are excluded. This is a list of
capabilities that Omega *enables*, not capabilities that are merely
compatible with it.

**Methodology.** Per the post-000108 research stage, every utility
section ends with a falsifiable prediction: a statement of what would
have to be true for the utility NOT to work. Construction precedes
proof. The first build that fails its prediction tells us where the
substrate's boundary actually is.

---
## 1. Classification

**The utility.** A projective codebook of `n_axes` directions on
ℝP^(D-1) is a vocabulary of feature primitives. The pipeline is
image → patch grid → M tensor → per-patch projection onto the codebook
axes → activation pattern of shape `[B, n_patches, V, n_axes]`. A
linear or shallow head over this representation performs
classification.

**Why Omega.** The codebook is model-intrinsic and regime-flat. A
classifier trained on activation patterns at 64×64 should generalize
to 512×512 inputs at inference without retraining, because the
codebook itself does not change with input size. Standard CLIP-style
models do not offer this property: their representations drift with
input resolution, and their pooling operations bake in a particular
spatial extent.

**Specific construction.** Train a classifier head on per-patch axis
activations averaged across patches (or pooled with attention). For
fine-grained tasks, retain the spatial structure: the classifier sees
the full `[n_patches, n_axes]` matrix as a 2D feature map. Per-patch
aggregation was already validated in scratchpad 000104: patch_idx=0
fails because it discards spatial signal, and the patch mean recovers
most of the gap.
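
The following NumPy sketch shows the patch-mean path under several assumptions. The M tensor is laid out as `[B, n_patches, V, D]`. The codebook is an `[n_axes, D]` array of unit axes. An activation is the absolute cosine between a row and an axis; the absolute value is an assumed way of honoring the projective ±axis equivalence. The function names are illustrative, and the random linear weights stand in for a trained head. What the sketch demonstrates is structural: the pooled feature has the same size whatever the patch count, so one head serves every resolution.

```python
import numpy as np

def axis_activations(M, codebook):
    """Project every row of M onto every codebook axis.

    M:        [B, n_patches, V, D] rows (assumed layout)
    codebook: [n_axes, D] unit-norm axes
    returns:  [B, n_patches, V, n_axes] activations in [0, 1]
    """
    # Unit-normalize rows so activations are cosines; abs() gives +a and -a the
    # same activation, i.e. the sign-free (projective) reading of an axis.
    norms = np.maximum(np.linalg.norm(M, axis=-1, keepdims=True), 1e-8)
    return np.abs((M / norms) @ codebook.T)

def patch_mean_features(acts):
    """Average over patches so the feature size does not depend on resolution."""
    B, _, V, n_axes = acts.shape
    return acts.mean(axis=1).reshape(B, V * n_axes)

rng = np.random.default_rng(0)
D, V, n_axes, n_classes = 16, 4, 27, 10
codebook = rng.normal(size=(n_axes, D))
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)
W = rng.normal(size=(V * n_axes, n_classes)) * 0.01  # stand-in for a trained linear head
b = np.zeros(n_classes)

# One head, two grid sizes. Assuming 16-pixel patches: 64×64 -> 16 patches, 512×512 -> 1024.
for n_patches in (16, 1024):
    M = rng.normal(size=(2, n_patches, V, D))
    logits = patch_mean_features(axis_activations(M, codebook)) @ W + b
    print(n_patches, logits.shape)
```
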

**Falsifiable prediction.** A classifier trained on 64×64 activation
patterns achieves comparable accuracy on 512×512 test inputs (within
2 percentage points) without any architectural adaptation. If accuracy
drops sharply with input resolution, the codebook activations are not
regime-invariant in the way reconstruction is, and Omega covers
reconstruction but not classification. That would be a meaningful
boundary.

---
## 2. Diffusion

**The utility.** Discrete diffusion in axis-index space. Each patch's
M-tensor row is quantized to its nearest codebook axis (or to a top-k
mixture). The "noise" process gradually randomizes axis assignments;
the "denoise" process is a transformer that predicts axis indices from
corrupted sequences. Sampling means running the denoiser to a clean
axis sequence, mapping it back through the codebook, and decoding.

**Why Omega.** Three properties combine here. First, the codebook is a
finite, deterministic vocabulary, so discrete diffusion is well-defined
without training an extra quantizer. Second, the decoder is
regime-flat, so a diffusion model trained on 64×64 axis sequences can
sample at any resolution by predicting longer sequences and decoding at
the target size. Third, the codebook's projective structure means
antipodal axes carry equivalent information, which meaningfully
reduces the effective vocabulary size for the diffusion target.

**Specific construction.** Diffusion target: `[n_patches, top_k]`
discrete indices into the codebook. Loss: cross-entropy over axis
indices. Backbone: any transformer that handles variable-length token
sequences (the patch count varies with target resolution).
Conditioning: optional class label or text embedding via
cross-attention.
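
The axis-index side of this construction can be sketched in NumPy. The sketch covers top-k quantization of rows against the codebook, one forward corruption step, and the cross-entropy loss. The corruption scheme is a swapped-in assumption, since no schedule is fixed here: each index is resampled uniformly with probability `t`, in the spirit of uniform-transition discrete diffusion such as D3PM. The denoising transformer is omitted, and all names are hypothetical.

```python
import numpy as np

def quantize_top_k(rows, codebook, k):
    """Indices of the k codebook axes with the largest |cosine| to each row.

    rows: [n_patches, D]; codebook: [n_axes, D] unit axes -> [n_patches, k]
    """
    unit = rows / np.maximum(np.linalg.norm(rows, axis=-1, keepdims=True), 1e-8)
    scores = np.abs(unit @ codebook.T)  # |cos| treats +a and -a as one axis
    return np.argsort(-scores, axis=-1)[:, :k]

def corrupt(indices, n_axes, t, rng):
    """Forward process at noise level t in [0, 1]: each index is replaced by a
    uniformly random axis id with probability t (assumed uniform schedule)."""
    mask = rng.random(indices.shape) < t
    random_ids = rng.integers(0, n_axes, size=indices.shape)
    return np.where(mask, random_ids, indices)

def cross_entropy(logits, targets):
    """Mean cross-entropy of axis logits [..., n_axes] against integer targets."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(log_probs, targets[..., None], axis=-1)
    return float(-picked.mean())

rng = np.random.default_rng(0)
n_axes, D = 27, 16
codebook = rng.normal(size=(n_axes, D))
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)
clean = quantize_top_k(rng.normal(size=(64, D)), codebook, k=4)  # [64, 4] target
noisy = corrupt(clean, n_axes, t=0.3, rng=rng)                     # denoiser input
uniform_logits = np.zeros(clean.shape + (n_axes,))                 # an untrained denoiser
loss = cross_entropy(uniform_logits, clean)                        # log(27) ≈ 3.296
```
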

**Falsifiable prediction.** A diffusion model trained on 64×64 axis
sequences from h2-64 produces coherent samples at 256×256 by sampling
longer sequences and decoding at the target size, without retraining.
Suppose samples at non-native resolution show mode collapse or boundary
artifacts beyond what the encoder-decoder pair produces directly. Then
the codebook's discreteness is interfering with regime-flat
reconstruction, and the regime-flat envelope is narrower than expected.

---
## 3. Processing (image-to-image edits in axis space)

**The utility.** Operations are applied to codebook activations rather
than pixels: image → encode → edit activations → decode. Style
transfer, denoising, inpainting, and semantic editing all become
manipulations of the `[n_patches, V, n_axes]` activation tensor,
followed by reconstruction.

**Why Omega.** Edits made at one resolution remain coherent when
decoded at another, because the codebook is the same vocabulary at
every scale. A 64×64 inpaint mask can produce a 512×512 inpainted
output by upsampling the edited activations and decoding at the target
size. Critically, activation edits respect the geometric constraints
that produced the codebook. Operations that move activations *off* the
codebook produce reconstruction artifacts, and those artifacts are
themselves a useful signal.

**Specific construction.** Define edit operations as activation-tensor
transformations:

- zero-out (denoise)
- substitute an axis set (style transfer)
- spatial gather + redistribute (inpaint)
- interpolate between two images' activations (semantic morph)

Provide a `process_at_scale` API mirroring `reconstruct_at_scale`.
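
The four edit operations can be sketched as plain array transformations. The sketch assumes the activations are reshaped to a patch grid `[H, W, V, n_axes]`. The inpaint rule (fill masked patches with the mean of unmasked ones) and the nearest-neighbour upsampling are deliberately simple placeholders, not the document's method. `process_at_scale` itself would also need the decoder, which is out of scope here.

```python
import numpy as np

def zero_out(acts, axis_ids):
    """Denoise-style edit: silence the selected axes everywhere."""
    out = acts.copy()
    out[..., axis_ids] = 0.0
    return out

def substitute_axes(acts, src_ids, dst_ids):
    """Style-transfer-style edit: move activation mass from src axes to dst axes."""
    out = acts.copy()
    moved = acts[..., src_ids].copy()
    out[..., src_ids] = 0.0
    out[..., dst_ids] = moved
    return out

def interpolate(acts_a, acts_b, alpha):
    """Semantic-morph edit: linear blend of two images' activations."""
    return (1.0 - alpha) * acts_a + alpha * acts_b

def inpaint_by_gather(acts, mask):
    """Inpaint edit: fill masked patches ([H, W] bool mask) with the mean of the
    unmasked patches, a crude stand-in for a learned gather/redistribute."""
    out = acts.copy()
    out[mask] = acts[~mask].mean(axis=0)
    return out

def upsample_patches(acts, factor):
    """Nearest-neighbour upsampling of an [H, W, V, n_axes] grid so an edit made
    on a coarse grid can be decoded on a finer one."""
    return acts.repeat(factor, axis=0).repeat(factor, axis=1)

rng = np.random.default_rng(0)
acts = rng.random((4, 4, 2, 27))   # e.g. 64×64 input, 16-px patches -> 4×4 grid (assumed)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
edited = inpaint_by_gather(zero_out(acts, [0, 1]), mask)
fine = upsample_patches(edited, 8)  # 32×32 grid, i.e. 512×512 at the same patch size
```
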

**Falsifiable prediction.** Style transfer applied to 64×64
activations and decoded at 512×512 produces output indistinguishable
in style consistency from the same operation applied directly to a
512×512 encoding. If the upsampled-edit path produces worse style
transfer than the direct-encode path, activation upsampling is losing
geometric structure that the encoder captures. In that case Omega's
regime-flatness has a stricter envelope than reconstruction MSE alone
reveals.

---
## 4. Solving

**The utility.** The most direct framing: use the trained sphere-solver
to solve geometric problems on its native manifold. Given a set of
points in ℝ^D, encode them through the model's projection path to get
their representation on ℝP^(D-1). Given a set of vectors, solve for the
codebook axes that span them. Given two sets of points, find the
optimal projective alignment via Procrustes on their codebooks.

**Why Omega.** This is the utility closest to the model's identity
claim. The model is named "sphere-solver" because that is what it is:
a parametric solver for "what is the best projective representation of
this data on the unit sphere?" The Omega finding is that this solver
is regime-independent. The same machinery handles 64 input points or
65,536 input points and produces structurally consistent answers.

**Specific construction.** Expose three solver primitives:

- `project(points, model) → axes`: encode arbitrary point clouds via
  the model's encoder to get their codebook representation
- `align(codebook_a, codebook_b) → rotation`: Procrustes-align two
  codebooks (already implemented in `tests/framework.py`)
- `solve_basis(target_vectors, model) → axis_indices`: given target
  vectors, find the codebook axes that best span them
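
The `align` already in `tests/framework.py` is not reproduced here. What follows is an independent NumPy sketch of what `align` and `solve_basis` could look like. It assumes a known row-to-row correspondence between the two codebooks. It resolves the projective sign ambiguity from inner products before running orthogonal Procrustes, and `solve_basis` takes a codebook array instead of a model. `alignment_residual` is a hypothetical score, not the rotation distance quoted from U5.

```python
import numpy as np

def align(codebook_a, codebook_b):
    """Orthogonal R minimizing ||(s * A) @ R - B||_F over per-axis signs s.

    Assumes row i of A and row i of B are the same axis. Axes are projective
    (v ~ -v), so relative signs are fixed first: a rotation preserves inner
    products, so sign(<a_i, a_0>) and sign(<b_i, b_0>) differ only by the flips.
    """
    A = np.asarray(codebook_a, dtype=float)
    B = np.asarray(codebook_b, dtype=float)
    s = np.sign((A @ A[0]) * (B @ B[0]))
    s[s == 0] = 1.0  # orthogonal to the reference axis: sign unresolved this way
    U, _, Vt = np.linalg.svd((s[:, None] * A).T @ B)
    return U @ Vt

def alignment_residual(A, B, R):
    """Hypothetical score: mean of 1 - |cos(a_i R, b_i)|; 0 means perfect alignment."""
    AR = A @ R
    cos = np.sum(AR * B, axis=1) / (np.linalg.norm(AR, axis=1) * np.linalg.norm(B, axis=1))
    return float(np.mean(1.0 - np.abs(cos)))

def solve_basis(targets, codebook, k):
    """Greedily choose k codebook axes whose span best fits the targets (least squares)."""
    T = np.asarray(targets, dtype=float).T  # [D, m]
    chosen = []
    for _ in range(k):
        best, best_err = None, np.inf
        for j in range(len(codebook)):
            if j in chosen:
                continue
            basis = codebook[chosen + [j]].T  # [D, len(chosen) + 1]
            coef, *_ = np.linalg.lstsq(basis, T, rcond=None)
            err = np.linalg.norm(basis @ coef - T)
            if err < best_err:
                best, best_err = j, err
        chosen.append(best)
    return chosen

rng = np.random.default_rng(0)
A = rng.normal(size=(27, 16))
A /= np.linalg.norm(A, axis=1, keepdims=True)
R_true, _ = np.linalg.qr(rng.normal(size=(16, 16)))
flips = rng.choice([-1.0, 1.0], size=27)
B = flips[:, None] * (A @ R_true)  # rotated copy with random sign flips
R = align(A, B)
```

The recovered `R` may equal `-R_true` when the reference axis itself was flipped. Projectively, that is the same alignment.
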

**Falsifiable prediction.** Procrustes alignment between codebooks of
the same model on different calibration distributions yields a
rotation distance below 0.1. This is already verified at U5, where
calibration deviations differ by ~0.003. Cross-model alignment between
two sphere-solvers trained on the same data yields a rotation distance
below 0.3 (predicted, not yet measured). If cross-model alignment turns
out to be indistinguishable from a random rotation, codebook structure
is data-driven rather than architecture-driven. In that case the
solver's "intrinsic" status is overstated.

---
## 5. Distillation

Two directions, distinct enough to enumerate separately.

### 5a. Distillation INTO sphere-solvers

**The utility.** Train a sphere-solver student to match a non-Omega
teacher's representations. The student inherits regime-flatness
automatically, so the teacher's representational quality flows into a
deployable encoder that handles arbitrary resolution without extra
machinery.

**Why Omega.** Standard distillation produces a student whose behavior
interpolates the teacher's at training scale. A sphere-solver student,
by virtue of its architecture, additionally inherits regime-flatness.
It behaves consistently at inference scales the teacher was never
tested on. This is a distillation result that would not follow from
teacher quality alone.

**Specific construction.** The loss combines reconstruction (the
sphere-solver's native objective) with representation matching against
the teacher's pooled features at an intermediate resolution. The
student emerges with both teacher-like representations AND
resolution-agnosticism. Teacher candidates: CLIP, DINOv2, Whisper (per
the Bertenstein cross-modal alignment work).
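
One way to write the combined objective is sketched below in NumPy. It uses mean squared error for reconstruction plus a cosine-distance term against pooled teacher features. Several pieces are assumptions: the MSE and cosine choices, the projection `W_proj`, the weight `lam`, and the feature widths. Random arrays stand in for a real sphere-solver and a real teacher such as DINOv2.

```python
import numpy as np

def distillation_loss(recon, target, student_feats, teacher_feats, W_proj, lam=1.0):
    """Combined objective L = L_recon + lam * L_match (assumed form).

    L_recon: mean squared reconstruction error (the sphere-solver's own objective).
    L_match: 1 - mean cosine between projected student features and pooled
             teacher features; W_proj maps student width to teacher width.
    """
    l_recon = float(np.mean((recon - target) ** 2))
    s = student_feats @ W_proj
    s = s / np.linalg.norm(s, axis=-1, keepdims=True)
    t = teacher_feats / np.linalg.norm(teacher_feats, axis=-1, keepdims=True)
    l_match = float(1.0 - np.mean(np.sum(s * t, axis=-1)))
    return l_recon + lam * l_match, l_recon, l_match

rng = np.random.default_rng(0)
images = rng.normal(size=(8, 3, 32, 32))
teacher = rng.normal(size=(8, 64))        # pooled teacher features (width assumed)
student = rng.normal(size=(8, 32))        # pooled student features
W_proj = rng.normal(size=(32, 64)) * 0.1  # would be learned jointly in practice
total, l_recon, l_match = distillation_loss(images + 0.1, images, student, teacher, W_proj, lam=0.5)
```
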

**Falsifiable prediction.** A sphere-solver student distilled from
DINOv2 at 224×224 produces representations that, evaluated on a
standard linear-probe benchmark at 448×448, match or exceed direct
DINOv2 at 448×448. Suppose instead the student degrades at non-training
scale the way the teacher does. Then distillation transferred only
representational quality, not regime-flatness, and the architectural
Omega property is more fragile than the training-from-scratch case
suggests.

### 5b. Distillation FROM sphere-solvers (codebook freezing)

**The utility.** Extract a codebook artifact, freeze it, and train
cheap downstream models that consume codebook activations rather than
re-running the encoder. The codebook becomes a portable feature
vocabulary, and downstream models are 1–2 orders of magnitude smaller.

**Why Omega.** U5's verdict (`as_is_packaging`) makes this
straightforward: codebooks are stable artifacts, model-intrinsic and
calibration-insensitive. The downstream model never sees the original
encoder; it only sees activation patterns over a fixed vocabulary.
Resolution-agnosticism is inherited because the codebook is the same
at every scale.

**Specific construction.** The pipeline has three steps:

1. Extract the codebook once and save it as safetensors + JSON.
2. Pre-compute activation patterns for the training corpus.
3. Train any standard architecture (MLP, small transformer, CNN) with
   axis activations as input.

The codebook stays frozen from step 1 onward.
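
The three pipeline steps can be sketched with NumPy alone, with two substitutions that are assumptions. First, the codebook is written to `.npz` rather than safetensors; the `safetensors` package offers `safetensors.numpy.save_file` for the real format. Second, the "standard architecture" in step 3 is reduced to a closed-form ridge-regression probe. The file names, data and labels are synthetic.

```python
import json
import os
import tempfile
import numpy as np

def save_codebook(path_prefix, axes, meta):
    """Step 1: persist the frozen codebook (.npz stands in for safetensors here)."""
    np.savez(path_prefix + ".npz", axes=axes)
    with open(path_prefix + ".json", "w") as f:
        json.dump(meta, f)

def load_codebook(path_prefix):
    with np.load(path_prefix + ".npz") as data:
        axes = data["axes"]
    with open(path_prefix + ".json") as f:
        meta = json.load(f)
    return axes, meta

def precompute_activations(rows, axes):
    """Step 2: |cosine| of each row against each frozen axis -> [N, n_axes]."""
    unit = rows / np.maximum(np.linalg.norm(rows, axis=1, keepdims=True), 1e-8)
    return np.abs(unit @ axes.T)

def fit_ridge_probe(X, y, n_classes, reg=1e-2):
    """Step 3, smallest possible downstream model: ridge regression onto one-hot labels."""
    Y = np.eye(n_classes)[y]
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    return np.linalg.solve(Xb.T @ Xb + reg * np.eye(Xb.shape[1]), Xb.T @ Y)

def predict(W, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.argmax(Xb @ W, axis=1)

rng = np.random.default_rng(0)
axes = rng.normal(size=(27, 16))
axes /= np.linalg.norm(axes, axis=1, keepdims=True)
prefix = os.path.join(tempfile.mkdtemp(), "codebook")
save_codebook(prefix, axes, {"n_axes": 27, "dim": 16})
frozen_axes, meta = load_codebook(prefix)  # downstream code never touches the encoder
X = precompute_activations(rng.normal(size=(200, 16)), frozen_axes)
labels = (X[:, 0] > X[:, 1]).astype(int)   # synthetic task defined on the activations
W = fit_ridge_probe(X, labels, n_classes=2)
```
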

**Falsifiable prediction.** Already validated by U5 + the geolip-core
pipeline. The failure mode would be a downstream model trained on
codebook activations that underperforms an end-to-end model of similar
parameter count. This is predicted not to happen in the regime-flat use
case, where end-to-end models lack regime-flatness anyway. It might
happen in the standard fixed-resolution regime, where every parameter
of the end-to-end model is free to fit the task while the codebook
stays frozen.

---
## 6. Tokenization for downstream LLMs / multimodal models

**The utility.** The codebook is a discrete vocabulary of size
`n_axes` (typically 27–230). Images → axis activation sequences →
discrete tokens fed to autoregressive language models. The geolip-svae
becomes an image tokenizer for the existing multimodal-LLM ecosystem.

**Why Omega.** Three properties matter.

- **Small vocabulary.** The vocabulary is small compared to standard
  learned image tokenizers (VQ-VAE typically uses ~8K–16K codes). With
  ~30 axes, emitting every axis of every patch lets a 512-token LLM
  budget cover about 17 patches (512 / 30). Keeping only the top-4
  axes per patch stretches the same budget to 128 patches (512 / 4).
- **Resolution-agnosticism.** The same tokenizer handles any input
  image without retraining.
- **Calibration insensitivity.** The tokenizer is a fixed component,
  not a learned per-task module.

**Specific construction.** Wrap codebook quantization as a tokenizer
class with `encode(image) → token_sequence` and
`decode(token_sequence, target_size) → image` methods. Define special
tokens for image-start and image-end, and optionally row-start markers
for spatial structure. Integrate through the standard Hugging Face
`transformers` tokenizer interface.
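
Below is a minimal sketch of the tokenizer's token layout, built on these assumptions:

- There is one D-vector per patch on an `[H, W, D]` grid.
- Three special tokens (image-start, image-end, row-start) sit ahead of the axis ids.
- The top-k axes are chosen by absolute cosine.

`AxisTokenizer` is hypothetical and is not wired into the Hugging Face `transformers` interface. `decode` stops at recovering axis ids, because turning them into pixels at a target size needs the sphere-solver decoder.

```python
import numpy as np

class AxisTokenizer:
    """Hypothetical axis tokenizer: ids 0..2 are specials, then one id per axis."""
    IMG_START, IMG_END, ROW_START = 0, 1, 2
    N_SPECIAL = 3

    def __init__(self, codebook, top_k=4):
        self.codebook = codebook / np.linalg.norm(codebook, axis=1, keepdims=True)
        self.top_k = top_k

    @property
    def vocab_size(self):
        return self.N_SPECIAL + len(self.codebook)

    def encode(self, patch_rows):
        """patch_rows: [H, W, D], one vector per patch (assumed layout) -> list of ids."""
        H, W, _ = patch_rows.shape
        unit = patch_rows / np.maximum(np.linalg.norm(patch_rows, axis=-1, keepdims=True), 1e-8)
        top = np.argsort(-np.abs(unit @ self.codebook.T), axis=-1)[..., : self.top_k]
        tokens = [self.IMG_START]
        for r in range(H):
            tokens.append(self.ROW_START)  # optional spatial marker
            for c in range(W):
                tokens.extend(int(a) + self.N_SPECIAL for a in top[r, c])
        tokens.append(self.IMG_END)
        return tokens

    def decode(self, tokens):
        """Inverse of encode, up to axis ids: returns an [H, W, top_k] array."""
        if tokens[0] != self.IMG_START or tokens[-1] != self.IMG_END:
            raise ValueError("sequence must be wrapped in image-start/image-end")
        rows = []
        for t in tokens[1:-1]:
            if t == self.ROW_START:
                rows.append([])
            else:
                rows[-1].append(t - self.N_SPECIAL)
        return np.array(rows).reshape(len(rows), -1, self.top_k)

rng = np.random.default_rng(0)
tok = AxisTokenizer(rng.normal(size=(30, 16)), top_k=4)
ids = tok.encode(rng.normal(size=(4, 8, 16)))  # 1 + 4 * (1 + 8 * 4) + 1 = 134 tokens
```

With row markers, the overhead beyond the top-4 budget is one token per row plus the two image delimiters.
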

**Falsifiable prediction.** A small (~100M-parameter) decoder-only LLM
trained on text + axis-token sequences performs image captioning at
the same quality as CLIP+LLM with comparable compute. If quality is
significantly lower, axis tokenization is losing image content that
continuous embeddings preserve, and the discreteness has a real cost.
If quality matches, the small vocabulary is a free reduction in token
budget for image content.

---
## 7. Anomaly / OOD detection

**The utility.** Self-validating inference. Compute the codebook of
the input itself (not the model's reference codebook) and measure its
deviation from the reference. Inputs whose induced codebook deviates
substantially from the model's training-derived codebook are
out-of-distribution, and the deviation magnitude is the OOD score.

**Why Omega.** A regime-flat model has a well-defined
"in-distribution" surface in codebook space. The `is_projective_clean`
check already captures this internally for codebook validation.
Inverted, the same machinery becomes an inference-time validity flag:
every prediction ships with a confidence signal derived from the
input's geometric compatibility with the codebook.

**Specific construction.** At inference, extract a per-batch codebook
from the input M tensor and compute its Procrustes distance to the
attached reference codebook. Add this to `InferenceEngine` as two
methods:

- `engine.validity_score(images) → float`
- a threshold-based `engine.predict_with_confidence(images) → (recon, confidence)`

The throughput sweep already shows that the MSE ratio is a candidate
validity signal. Procrustes distance on a per-batch codebook is the
finer-grained version.
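
Per-batch codebook extraction is not specified here, so this NumPy sketch swaps in a simpler coverage score in place of the Procrustes distance. For each input row, the score is one minus the row's best absolute cosine to any reference axis, averaged over the batch. The linear confidence ramp and the `threshold` values are assumptions, and `reconstruct` is any callable standing in for the model.

```python
import numpy as np

def coverage_gap(rows, ref_codebook):
    """Mean over rows of 1 - max_i |cos(row, axis_i)|: 0 when every row lies on a
    reference axis (either sign), larger when rows fall between or away from them."""
    unit = rows / np.maximum(np.linalg.norm(rows, axis=-1, keepdims=True), 1e-8)
    best = np.abs(unit @ ref_codebook.T).max(axis=-1)
    return float(np.mean(1.0 - best))

def predict_with_confidence(rows, ref_codebook, reconstruct, threshold):
    """Run any reconstruct(rows) callable and attach a validity-derived confidence."""
    score = coverage_gap(rows, ref_codebook)
    confidence = max(0.0, 1.0 - score / threshold)  # 1 at score 0, 0 at/above threshold
    return reconstruct(rows), confidence

rng = np.random.default_rng(0)
ref = rng.normal(size=(27, 16))
ref /= np.linalg.norm(ref, axis=1, keepdims=True)
in_dist = ref[rng.integers(0, 27, size=64)] * rng.normal(size=(64, 1))  # rows on reference axes
ood = rng.normal(size=(64, 16))                                          # isotropic rows
recon, conf = predict_with_confidence(in_dist, ref, reconstruct=lambda x: x, threshold=0.5)
```
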

**Falsifiable prediction.** Inputs whose codebook Procrustes distance
from the reference exceeds 0.5 produce reconstructions with MSE more
than 5× the native floor. If the correlation between codebook deviation
and reconstruction quality is weak (correlation < 0.5), codebook
deviation is measuring something independent of model competence. In
that case it is not a useful inference-time validity signal.
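
Checking this prediction needs only three inputs: per-input deviation scores, per-input reconstruction MSE, and the native MSE floor. The NumPy sketch below assumes Pearson correlation. It also takes a strict reading of the prediction and requires every high-deviation input to clear the 5× floor; a fractional threshold would be an equally defensible choice. The function name is hypothetical.

```python
import numpy as np

def check_ood_prediction(deviation, mse, native_floor,
                         dev_threshold=0.5, mse_factor=5.0, min_r=0.5):
    """Evaluate both halves of the prediction on measured per-input arrays."""
    deviation = np.asarray(deviation, dtype=float)
    mse = np.asarray(mse, dtype=float)
    r = float(np.corrcoef(deviation, mse)[0, 1])  # Pearson correlation (assumed metric)
    high = deviation > dev_threshold
    frac = float(np.mean(mse[high] > mse_factor * native_floor)) if high.any() else float("nan")
    return {"pearson_r": r, "frac_high_dev_above_floor": frac,
            "passes": r >= min_r and frac == 1.0}

# Synthetic sanity case: MSE grows linearly with deviation.
floor = 1e-3
dev = np.linspace(0.0, 1.0, 11)
report = check_ood_prediction(dev, floor * (1.0 + 10.0 * dev), floor)
```
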

---
## 8. Cross-modal alignment

**The utility.** Multiple sphere-solvers trained on different
modalities (image, audio, text-as-noise) project into compatible
codebook spaces after Procrustes alignment. Cross-modal retrieval,
joint generation, and modality translation operate in a shared axis
space rather than through a learned joint embedding.

**Why Omega.** The Bertenstein work demonstrated this with frozen
expert encoders projecting through a shared text hub. Today's finding
strengthens the claim: cross-modal alignment is *between codebooks*
(deterministic artifacts) rather than between learned projections.
Each modality's sphere-solver produces a codebook on its own
ℝP^(D-1), and alignment is a fixed rotation, not a trained mapping.

**Specific construction.** The per-modality stage has three steps:

1. Train a sphere-solver per modality.
2. Extract the codebooks.
3. Compute pairwise Procrustes alignments to a chosen reference
   modality.

At inference, project inputs through their native sphere-solver, apply
the cross-modal rotation, and operate in the shared axis space. No
joint training is required after the per-modality stage.
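
A NumPy sketch of the inference path on synthetic data follows. It assumes both modalities' sphere-solvers share the same dimension D. It also assumes their codebook axes are already paired and sign-matched, which is what makes a single Procrustes rotation solvable. Retrieval ranks by absolute cosine to respect the projective sign ambiguity. The function names are hypothetical.

```python
import numpy as np

def procrustes(src, dst):
    """Orthogonal R minimizing ||src @ R - dst||_F (rows paired, signs matched)."""
    U, _, Vt = np.linalg.svd(src.T @ dst)
    return U @ Vt

def retrieve(queries, gallery, R):
    """Rotate queries into the reference modality, then rank gallery items by |cosine|."""
    q = queries @ R
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-np.abs(q @ g.T), axis=1)  # [n_queries, n_gallery] ranking

rng = np.random.default_rng(0)
D = 16
img_codebook = rng.normal(size=(27, D))
Q, _ = np.linalg.qr(rng.normal(size=(D, D)))  # unknown rotation between the two spaces
aud_codebook = img_codebook @ Q               # paired, sign-matched axes (assumption)
R = procrustes(aud_codebook, img_codebook)    # maps audio space -> image space
gallery = rng.normal(size=(10, D))            # image-side embeddings
queries = gallery @ Q                         # the same items seen from the audio side
ranks = retrieve(queries, gallery, R)
```
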

**Falsifiable prediction.** Image-text retrieval via codebook alignment
matches CLIP-style joint-embedding retrieval at comparable compute on
standard benchmarks (MS-COCO, Flickr30K). If retrieval is significantly
worse, cross-modal information lives in the relations *between*
codebook activations rather than in the codebooks themselves. The
alignment-only approach would then be missing structure that joint
training captures.

---
## 9. Self-supervised pretraining recipes

**The utility.** Bootstrap foundation models on structured noise
alone. The h2-64 batteries already train on noise distributions and
develop projective-clean codebooks. This generalizes to a recipe for
training sphere-solver foundation models without curated real-world
data.

**Why Omega.** The projective-axis codebook emerges deterministically
from sphere-normalized SVD training, regardless of input distribution.
Per U5, gaussian and sixteen-noise calibrations produce essentially
identical codebooks for the same model. The model's geometric
substrate is therefore largely independent of training-corpus
identity. This suggests a useful inverse: a foundation model can be
pretrained on synthetic/structured noise and then fine-tuned to
specific modalities via the cross-modal alignment recipe (Section 8).

**Specific construction.** Define a noise curriculum that exercises
the geometric primitives: gaussian, fractal, structured-but-random, and
adversarial noise. Train the sphere-solver to high reconstruction
quality on this curriculum. Verify that the codebook is
projective-clean (the built-in quality check). Release the result as a
foundation model.
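
Three of the four curriculum families are sketched below in NumPy. The concrete readings are assumptions:

- "fractal" is taken as 1/f^β noise, produced by shaping the spectrum of white noise
- "structured-but-random" is taken as random axis-aligned rectangles

Adversarial noise needs the model in the loop and is left out. The generator names are hypothetical.

```python
import numpy as np

def gaussian_noise(rng, size):
    return rng.normal(size=(size, size))

def fractal_noise(rng, size, beta=2.0):
    """1/f^beta noise: power spectrum ~ |f|^-beta, so amplitude is scaled by |f|^(-beta/2)."""
    fx = np.fft.fftfreq(size)[:, None]
    fy = np.fft.fftfreq(size)[None, :]
    f = np.sqrt(fx**2 + fy**2)
    f[0, 0] = 1.0                      # avoid dividing by zero at DC
    spectrum = np.fft.fft2(rng.normal(size=(size, size))) / f ** (beta / 2)
    spectrum[0, 0] = 0.0               # zero-mean output
    img = np.real(np.fft.ifft2(spectrum))
    return img / img.std()             # unit variance

def structured_noise(rng, size, n_shapes=8):
    """Random axis-aligned rectangles with random signed intensities."""
    img = np.zeros((size, size))
    for _ in range(n_shapes):
        x0, y0 = rng.integers(0, size, 2)
        w, h = rng.integers(1, size // 2 + 1, 2)
        img[y0:y0 + h, x0:x0 + w] += rng.normal()
    return img

CURRICULUM = [gaussian_noise, fractal_noise, structured_noise]

def curriculum_batch(rng, size, batch):
    """Cycle through the curriculum families to build one batch [batch, size, size]."""
    return np.stack([CURRICULUM[i % len(CURRICULUM)](rng, size) for i in range(batch)])

rng = np.random.default_rng(0)
batch = curriculum_batch(rng, 64, 6)
```
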

**Falsifiable prediction.** A sphere-solver foundation model
pretrained on noise alone and fine-tuned on ImageNet via 1% of the
parameters (a small adapter on top of the frozen encoder) matches or
exceeds equivalent-compute models pretrained directly on ImageNet.
Suppose noise pretraining produces worse downstream performance than
ImageNet pretraining at fixed compute. Then the geometric substrate is
not sufficient on its own: real-world distributions contain content
the model needs to see during pretraining to learn effectively.

---
## 10. Continual learning / model-merging

**The utility.** Codebooks from independently trained models are
comparable artifacts. Merging two models means aligning their codebooks
via Procrustes, optionally extending the joint axis set to cover the
union of their features. Continual learning becomes "extend the
codebook when novel structure appears" rather than "retrain to
incorporate new data."

**Why Omega.** Model identity in the geolip-svae family is largely
captured by the codebook (calibration insensitivity confirms this).
Two models with the same architecture but trained on different
distributions have different codebooks. Aligning them via Procrustes
gives a principled way to combine them without the parameter
interference that plagues standard model-merging methods.

**Specific construction.** Operations on `Codebook` artifacts:

- `Codebook.merge(other) → Codebook`: union of axes after Procrustes
  alignment, with antipodal-pair re-collapse to deduplicate
- `Codebook.diff(other) → axes`: axes in `self` that have no
  near-equivalent in `other` after alignment (the novel structure)
- `Codebook.extend(novel_axes) → Codebook`: append new axes and
  re-validate projective-cleanness
- Continual learning loop: train, extract the codebook, diff it against
  the prior codebook, decide whether to keep the new axes, and re-emit
  the updated codebook.
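
These operations can be sketched in NumPy under three assumptions. Axes are stored as unit rows. Two axes count as "the same" when their absolute cosine exceeds a tolerance (0.95 here, an arbitrary value). `merge` receives the Procrustes rotation from a separate alignment step. This `Codebook` class is illustrative only, and `is_clean` is a stand-in for the real projective-cleanness check, which is not shown here.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Codebook:
    """Sketch: unit axes [n, D]; v and -v are the same projective point."""
    axes: np.ndarray
    tol: float = 0.95  # |cos| above this means "same axis" (assumed value)

    def __post_init__(self):
        self.axes = self.axes / np.linalg.norm(self.axes, axis=1, keepdims=True)

    def diff(self, other):
        """Axes in self with no near-equivalent (up to sign) in other."""
        best = np.abs(self.axes @ other.axes.T).max(axis=1)
        return self.axes[best < self.tol]

    def is_clean(self):
        """Stand-in cleanness check: no two axes coincide up to sign."""
        if len(self.axes) < 2:
            return True
        G = np.abs(self.axes @ self.axes.T)
        np.fill_diagonal(G, 0.0)
        return bool(G.max() < self.tol)

    def extend(self, novel_axes):
        out = Codebook(np.vstack([self.axes, novel_axes]), self.tol)
        if not out.is_clean():
            raise ValueError("extension re-introduced a duplicate or antipodal axis")
        return out

    def merge(self, other, rotation=None):
        """Union after mapping `other` into our frame; duplicate/antipodal axes collapse."""
        aligned = other.axes if rotation is None else other.axes @ rotation
        novel = Codebook(aligned, self.tol).diff(self)
        return self.extend(novel) if len(novel) else Codebook(self.axes.copy(), self.tol)

rng = np.random.default_rng(0)
base = rng.normal(size=(10, 16))
a = Codebook(base[:7])
b = Codebook(np.vstack([-base[3:7], base[7:]]))  # 4 axes shared with a (sign-flipped), 3 new
merged = a.merge(b)                              # 7 + 3 = 10 axes; antipodal pairs collapse
```
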

**Falsifiable prediction.** Two h2-64 batteries (different noise
distributions) merge into a combined codebook with deviation in the
0.20–0.23 CV band. If the merge produces a codebook that *fails*
projective-cleanness, the two codebooks live on incompatible projective
subspaces. In that case merging is more than a Procrustes alignment:
there is content-level interference that requires retraining.

---
## What this clause does NOT cover

Excluded by methodology. These are useful applications of geolip-svae,
but they do not depend on the Omega substrate in a load-bearing way:

- **Standard feature extraction** for downstream tasks where the input
  resolution and modality are fixed. Any encoder can do this; nothing
  here is Omega-dependent.
- **Adversarial robustness** as a downstream goal. Possibly correlated
  with codebook quality, but not enabled by it specifically.
- **Reinforcement learning state representations.** The geometric
  substrate provides nothing the RL community cannot get from a
  standard VAE.
- **Generative pretraining for autoregressive language modeling.**
  Sphere-solvers are not autoregressive, and the pathway from this
  substrate to LLM pretraining is speculative.

---
## Build-order considerations

If utilities are built in sequence rather than in parallel, the
priority ordering by *information value per build* is:

1. **§7 OOD detection.** Already mostly present in the codebook
   machinery, and the easiest to ship. Validates the validity-flag
   framing from this morning's pivot.
2. **§5b distillation FROM sphere-solvers.** Also mostly present; needs
   only API wrapping. Demonstrates the codebook as a portable artifact
   for the public release.
3. **§4 solving primitives.** Exposes the model's identity claim
   directly. The `project / align / solve_basis` triple is a clean API
   surface.
4. **§1 classification.** The first non-trivial test of regime-flatness
   beyond reconstruction. Its falsifiable prediction is sharp.
5. **§6 tokenization.** A bridge to mainstream multimodal
   architectures. Higher build cost, but high impact for adoption.
6. **§8 cross-modal alignment.** Extends Bertenstein under the new
   framing. Build cost is moderate; depends on having multiple
   modality-specific sphere-solvers trained.
7. **§5a distillation INTO sphere-solvers.** Significant training
   investment. Defer until the smaller utilities validate.
8. **§2 diffusion.** Substantial build, novel pathway, high
   uncertainty. Worth doing once the codebook artifact patterns are
   mature.
9. **§9 self-supervised pretraining.** The biggest investment and the
   most speculative, but the largest payoff if it works.
10. **§3 processing.** Depends on §1 + §2 maturity for activation edits
    to be principled. Late in the sequence.
11. **§10 model-merging.** A research utility rather than a deployment
    utility. Useful once there are many trained sphere-solvers to
    consolidate.

The first three are all near-term and reuse existing machinery.
Together they constitute a release-ready feature set. The remainder
are the multi-month research agenda.