# Constellation Forms Catalogue
## GeoLIP Architecture Reference (March 2026)

Sources:
- geometric-memory-ft1 (GM1)
- geometric-memory-ft2 (GM2)
- geometric-memory-ft3 (GM3)
- procrustes-vit-hypersphere-ft1 (PVH)
- constellation-diffusion-bottleneck (CDB)
- Session benchmarks (SB)

---

## Universal Constants

| Constant | Value | Source |
|----------|-------|--------|
| Pentachoron CV attractor | 0.20–0.23 | Geometry of S^15 itself (CDB §3) |
| Binding/separation boundary | 0.29154 radians | 5+ architectures (CDB §11) |
| Effective geometric dimension | ~16 | All trained models (CDB §3.3) |
| CV precision invariance | Holds from fp64 down to 1-bit | CDB §3.2 |

## Universal Rules

| Rule | Source |
|------|--------|
| SquaredReLU in all constellation paths, never GELU | SB activation tests |
| Patchwork: Linear(tri, tri×2) → SquaredReLU → LN → Linear(tri×2, dim) | SB proven |
| Gate init: -3.0 (sigmoid ≈ 0.047) | SB proven |
| SLERP: only the acos runs in fp32 (a 16 KB tensor); everything else stays in the compute dtype | SB fp32 fix |
| Adam, NO weight decay: the geometry IS the regularization | GM3 §2.4, PVH §12 |
| InfoNCE is the alignment FORCE. Procrustes is the REGULARIZER. | GM1 §4.1 |
| CV loss goes on the BOTTLENECK, not the output | GM1 §4.2 |
| CV loss weight: micro (0.001 or below) | GM3 §2.2 |
| Procrustes calibration is non-negotiable for anchor init | PVH §5.1 |
| Anchor dropout (30%) prevents collapse | PVH §5.2 |
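The SLERP dtype rule can be sketched in a few lines. This is a minimal NumPy illustration, not code from the repo: the function name `slerp` and the epsilon guard are ours; what it demonstrates is the rule's placement of the single fp32 op (the arccos) inside an otherwise fp16 computation.

```python
import numpy as np

def slerp(a, b, t):
    """Spherical interpolation between unit vectors a and b.

    Per the SLERP rule above: only the arccos runs in fp32;
    every other op stays in the incoming compute dtype (here fp16).
    """
    dtype = a.dtype
    dot = np.clip(np.sum(a * b, axis=-1, keepdims=True), -1.0, 1.0)
    omega = np.arccos(dot.astype(np.float32))   # the one fp32 op
    so = np.sin(omega) + np.float32(1e-6)       # guard against a ~ b
    wa = (np.sin((1.0 - t) * omega) / so).astype(dtype)
    wb = (np.sin(t * omega) / so).astype(dtype)
    return wa * a + wb * b

a = np.array([1.0, 0.0], dtype=np.float16)
b = np.array([0.0, 1.0], dtype=np.float16)
mid = slerp(a, b, 0.5)   # midpoint on the arc, still fp16
```

The arccos is the only step whose output is catastrophically sensitive to fp16 rounding near dot ≈ ±1, which is presumably why it alone is promoted.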
---

## Form 1: GeoLIP Core (Classification)

**Source:** CDB §2

**Purpose:** Minimal image classification pipeline. Proves the constellation works as a primary representation layer.

**Pipeline:**
```
Input image
→ Conv encoder (builds channel depth: 3→64→128→256)
→ AdaptiveAvgPool → Linear(encoder_out, D) → L2 normalize to S^(D-1)
→ Triangulate against N anchors at 3 SLERP phases → tri_dim profile
→ Patchwork MLP reads triangulation
→ Classifier head → logits
```

**Key properties:**
- Every embedding is on the unit sphere BEFORE the constellation sees it
- The conv encoder builds channel depth; the constellation operates on the channel dimension
- One global vector per image, not a sequence
- No attention anywhere

**Proven results:** 91.5% CIFAR-10, 1.6M params, CV=0.2045, 62/64 active anchors

**Loss:** CE + CV on embeddings

**When to use:** Single-input classification where the input can be reduced to one D-dimensional vector on S^(D-1).
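The triangulation step can be sketched as follows. This NumPy version is illustrative only: the catalogue does not pin down the SLERP endpoints, so interpolating each anchor toward the anchor centroid at three phase values is our assumption, as are the names `l2norm` and `center`. What it does show faithfully is the output: angles to every anchor at 3 phases, concatenated into a tri_dim profile.

```python
import numpy as np

rng = np.random.default_rng(0)
N_ANCHORS, D, B = 16, 32, 4

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def slerp(a, b, t):
    dot = np.clip(np.sum(a * b, axis=-1, keepdims=True), -1.0, 1.0)
    om = np.arccos(dot)
    so = np.sin(om) + 1e-8
    return np.sin((1 - t) * om) / so * a + np.sin(t * om) / so * b

anchors = l2norm(rng.normal(size=(N_ANCHORS, D)))  # learned parameters in practice
center = l2norm(anchors.mean(axis=0))              # hypothetical SLERP endpoint

emb = l2norm(rng.normal(size=(B, D)))              # unit embeddings from the encoder

# Profile: angle to every anchor at 3 SLERP phases -> (B, 3 * N_ANCHORS)
phases = (0.0, 0.33, 0.66)
profile = np.concatenate(
    [np.arccos(np.clip(emb @ slerp(anchors, center, t).T, -1.0, 1.0))
     for t in phases], axis=-1)
```

With 16 anchors this yields a 48-dimensional angular profile per embedding, which is what the Patchwork MLP consumes.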
---

## Form 2: Expert Soup (Multi-Expert Fusion)

**Source:** PVH §1, §4

**Purpose:** Fuse multiple frozen pretrained experts into a shared geometric representation on S^(d-1).

**Pipeline:**
```
Input image
→ N frozen expert encoders (CLIP, DINOv2, SigLIP, etc.) → N × 768-d
→ GPA alignment at 768-d (iterative Procrustes to the mutual mean)
→ PCA to D_ANCHOR dims
→ Per-expert Procrustes-initialized projectors (768 → D_ANCHOR)
→ L2 normalize → shared constellation on S^(D_ANCHOR-1)
→ Triangulate: each expert through its own Procrustes rotation
→ Patchwork reads the fused triangulation
→ Classifier
```

**Key properties:**
- Experts are FROZEN; they are never modified
- Procrustes initialization is essential (without it: 1/256 active anchors, i.e. collapsed)
- Anchor dropout (30%) → 508/512 active anchors
- Effective dimensionality matches task complexity (76.9 for COCO's 80 classes)
- Pipeline is almost entirely linear: 7 linear ops + 2 nonlinearities (GELU in patchwork + classifier)
- Weight decay explicitly avoided

**Proven results:** mAP=0.84 ceiling (data-limited), perfect hypersphere verified (1000/1000 positive volumes), 508/512 active anchors

**Loss:** InfoNCE(fused, consensus) + MSE + BCE + Procrustes_align + CV + anchor_spread

**Optimizer:** Adam lr=1e-3, NO weight decay

**When to use:** Combining multiple pretrained encoders into a shared geometric space for downstream tasks.
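The "Procrustes-initialized projector" step rests on the closed-form orthogonal Procrustes solution. A minimal NumPy sketch, with illustrative sizes and random stand-ins for the expert embeddings and the post-GPA consensus coordinates (both would come from real data):

```python
import numpy as np

rng = np.random.default_rng(1)
N, D_EXPERT, D_ANCHOR = 200, 768, 128   # sizes are illustrative

expert = rng.normal(size=(N, D_EXPERT))   # frozen expert embeddings
target = rng.normal(size=(N, D_ANCHOR))   # consensus coords after GPA + PCA

# Orthogonal Procrustes: R = argmin ||expert @ R - target||_F s.t. R^T R = I.
# Closed form via SVD of the cross-covariance; using R to initialize the
# per-expert Linear(768, D_ANCHOR) is the "Procrustes-initialized projector".
U, _, Vt = np.linalg.svd(expert.T @ target, full_matrices=False)
R = U @ Vt                                # (768, 128), orthonormal columns

proj = expert @ R                         # the initialized projection
```

Initializing the projector this way starts every expert in a rotation-consistent frame, which is what the "without it: 1/256 active anchors" failure mode above is about.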
---

## Form 3: Geometric Memory / Anchor Bank (Context Extension)

**Source:** GM1 §2, GM2 §2

**Purpose:** Extend a frozen encoder's context window by accumulating segment-level geometric addresses in a memory bank.

**Pipeline:**
```
Long document (N tokens, N >> encoder context)
→ Split into overlapping segments (sized to the encoder window)
→ For each segment:
    → Frozen encoder forward → hidden states at multiple layers
    → Multi-layer fusion (learned weighted sum)
    → Memory tokens cross-attend to the fused hidden states
    → Depth-profile compressor: per-layer CLS → single anchor (L2-normalized)
    → Anchor stored in the geometric memory bank
    → GRU gate updates the rolling memory state
→ Final output: encoder-compatible embedding
```

**Key properties:**
- Frozen encoder, trainable memory wrapper
- Depth-profile anchors encode HOW the encoder processed, not just WHAT
- CV loss goes on the BANK ANCHORS specifically: the bottleneck between segments
- Without CV on the bank: projector-shortcut collapse (m_acc plateaus at 0.670)
- With CV on the bank: m_acc reaches 0.945
- Segment size must produce 5+ anchors for the CV computation (a pentachoron needs 5 points)
- Convergence order: CV locks first → m_acc climbs → s_cos climbs last

**Proven results:**
- GEOLIP-BERT-8192: m_acc=0.927, CV=0.200 (512→8192 context)
- GEOLIP-CLIP-ctx576: m_acc=0.945, CV=0.162 (77→576 context)

**Loss:** InfoNCE(student, teacher) + Procrustes_SVD + |CV(bank_anchors) - 0.20|

**When to use:** Extending frozen encoder context windows while preserving embedding space compatibility.
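The CV term in the loss above can be sketched concretely. The pentachoron-volume formula is standard simplex geometry; the grouping of bank anchors into consecutive windows of 5 is our assumption for illustration (the form only requires that a segment produce 5+ anchors so at least one pentachoron exists), and the function names are ours.

```python
import numpy as np

rng = np.random.default_rng(2)

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def pentachoron_volume(pts):
    """Volume of the 4-simplex spanned by 5 points in R^d (d >= 4)."""
    edges = pts[1:] - pts[0]                 # (4, d) edge vectors from vertex 0
    gram = edges @ edges.T                   # (4, 4) Gram matrix
    return np.sqrt(max(np.linalg.det(gram), 0.0)) / 24.0   # sqrt(det) / 4!

def cv_loss(bank, target=0.20, weight=0.001):
    """|CV(pentachoron volumes) - target|, micro-weighted per the rules table."""
    vols = np.array([pentachoron_volume(bank[i:i + 5])
                     for i in range(len(bank) - 4)])
    cv = vols.std() / (vols.mean() + 1e-8)   # coefficient of variation
    return weight * abs(cv - target), cv

bank = l2norm(rng.normal(size=(12, 16)))     # 12 bank anchors on S^15
loss, cv = cv_loss(bank)
```

Note the micro weight (0.001): per the Universal Rules, CV is a nudge on the bottleneck, not a dominant objective.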
---

## Form 4: Sequence Reconstructor (Per-Position Output)

**Source:** GM2 §2

**Purpose:** Produce full per-position output sequences from memory state for diffusion cross-attention.

**Pipeline:**
```
Memory state (from Form 3 bank accumulation)
→ Context = cat(memory_tokens, bank_anchors, content_tokens)
→ 77 learned query tokens + positional encoding
→ Cross-attend to context (2 layers)
→ Self-attend among the 77 output positions (2 layers)
→ Output: (B, 77, 768) in the frozen encoder's native distribution
```

**Key properties:**
- Must produce output in the distribution the UNet was trained on
- Training target: the frozen encoder's own output on the same caption (truncated to 77 tokens)
- Two teachers: ModernBERT teaches what to remember, CLIP teaches how to say it
- Two-phase training works for CLIP-L but NOT universally
- Rule: if you need per-position output, train the per-position consumer from the start
- Memory format is shaped by gradient loudness, not architectural capacity

**Proven results:**
- CLIP-L s_cos=0.734; tulips appeared in SD 1.5 from elements past token 77
- Meridian (bigG): s_cos=0.425 (limited by the 1280→1024 dimensional mismatch)

**Loss:** MSE(normalize(pred), normalize(target)) + cosine_similarity + InfoNCE(pooled)

**When to use:** When the downstream consumer needs per-position sequences (diffusion cross-attention, token-level tasks).
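The core of the reconstructor is standard cross-attention from learned queries. A single-head, single-layer NumPy sketch under stated assumptions: the context length, the 0.02 init scale, and all weight names are ours; the real module stacks 2 cross-attention and 2 self-attention layers.

```python
import numpy as np

rng = np.random.default_rng(3)
D, N_CTX, N_OUT = 768, 40, 77    # context length is illustrative

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# context = cat(memory_tokens, bank_anchors, content_tokens), already (N_CTX, D)
context = rng.normal(size=(N_CTX, D))
queries = rng.normal(size=(N_OUT, D)) * 0.02   # 77 learned query tokens
pos = rng.normal(size=(N_OUT, D)) * 0.02       # positional encoding

Wq, Wk, Wv = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))

q = (queries + pos) @ Wq
k, v = context @ Wk, context @ Wv
attn = softmax(q @ k.T / np.sqrt(D))           # (77, N_CTX) routing weights
out = attn @ v                                 # (77, 768) per-position output
```

The 77 queries are the mechanism by which a pooled memory state is re-expanded into the fixed-length sequence the UNet expects.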
---

## Form 5: Constellation Relay (Per-Token Geometric Layer)

**Source:** CDB §4, SB

**Purpose:** Replace attention as a per-token processing layer. O(S) complexity. Preserves geometry at depth.

**Pipeline:**
```
Input (B, S, D) or (B, D)
→ LayerNorm
→ Chunk D into P patches of patch_dim (e.g., 16 × 16d = 256d)
→ L2 normalize each patch to S^(patch_dim-1)
→ Triangulate against anchors at 3 SLERP phases → tri_dim profile
→ Patchwork MLP reads triangulation
→ Gated residual (gate init -3.0)
→ Output = residual + gate * patchwork_out
```

**Key properties:**
- Per-token; no cross-token interaction
- O(S) time and memory: no S² term
- Preserves 99.4% cosine similarity to the input at depth 16 (vs 7.4% for attention)
- 3.4× fewer parameters than vanilla attention
- Geometric preservation is sequence-length invariant (identical from S=64 through S=131072)
- Throughput crossover vs attention at S≈32K; 8.4× faster at S=131K
- SquaredReLU wins: better anchor diversity (7.1 vs 4.6), better equivariance, 0.9999 reconstruction

**Proven results:** cos_to_orig=0.994 at depth 16; 8.4× faster than attention at S=131K

**When to use:** Processing token sequences where geometric preservation matters more than cross-token mixing. Stackable. The per-token processing layer.
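The patchwork-plus-gated-residual core of the relay can be sketched directly from the Universal Rules. The triangulation itself is stubbed out here (a random profile stands in for the anchor angles); the weight init scales and the name `relay` are ours, but the block order (Linear → SquaredReLU → LN → Linear) and the -3.0 gate init follow the rules table.

```python
import numpy as np

rng = np.random.default_rng(4)
B, S, D, TRI = 2, 8, 256, 48        # TRI stands in for tri_dim

def layer_norm(x):
    mu = x.mean(-1, keepdims=True)
    return (x - mu) / np.sqrt(x.var(-1, keepdims=True) + 1e-5)

def squared_relu(x):
    return np.maximum(x, 0.0) ** 2

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Patchwork per the universal rule:
# Linear(tri, tri*2) -> SquaredReLU -> LN -> Linear(tri*2, dim)
W1 = rng.normal(size=(TRI, TRI * 2)) / np.sqrt(TRI)
W2 = rng.normal(size=(TRI * 2, D)) / np.sqrt(TRI * 2)
gate = -3.0                          # gate init -3.0 -> sigmoid(gate) ~ 0.047

def relay(x, tri_profile):
    """One gated-residual relay step; tri_profile is the (B, S, TRI) angle profile."""
    h = tri_profile @ W1
    h = layer_norm(squared_relu(h)) @ W2
    return x + sigmoid(gate) * h     # residual dominates at init

x = rng.normal(size=(B, S, D))
tri = rng.normal(size=(B, S, TRI))
y = relay(x, tri)
```

The near-zero gate at init is plausibly why the stack preserves 99.4% cosine similarity at depth 16: each layer starts as a small perturbation of the identity.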
---

## Form 6: Cantor Constellation Router (Cross-Token Routing)

**Source:** SB (cantor_constellation_relay.py)

**Purpose:** O(S) cross-token routing through the constellation's own anchor hierarchy. Replaces attention's cross-token role.

**Pipeline:**
```
Input tokens (B, S, D) + triangulation profiles (B, S, tri_dim) from the relay
→ Compute soft routing weights from phase-0 triangulation distances
→ For each level l in the binary anchor tree (16→8→4→2→1):
    → Merge anchor weights into group weights at level l
    → Weighted scatter: tokens → group summaries (bmm)
    → Transform: per-level MLP(dim→dim×2→dim) + LN
    → Weighted gather: group summaries → token updates (bmm)
    → Gated residual at each level
→ Output: tokens with cross-token information
```

**Key properties:**
- O(S × n_levels × D), where n_levels = log2(A) + 1 = 5 for A=16
- No S² term anywhere: not in compute, not in memory
- The triangulation from the per-token relay IS the routing key (zero redundant computation)
- A binary tree over the anchors defines the hierarchy (16→8→4→2→1 groups)
- At each level: scatter → transform → gather
- Cantor routing holds at distance BETTER than attention (2× stronger at S=4096)
- The router is a geometric REGULARIZER: cos_orig=0.9818 at 8 layers vs 0.6533 for the relay alone
- Geometry IMPROVES with more tokens (0.982→0.986 as S increases)

**Proven results:** 97.0% cross-token task acc, 0.986 cos preservation at 131K tokens, 5.2× faster than attention at 131K

**When to use:** Combined with the Form 5 relay as a complete O(S) transformer-layer replacement (ConstellationCantorRelay).
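One level of the scatter → transform → gather loop can be sketched in NumPy. Assumptions are ours: the softmax over negated angles as the soft routing weights, the normalization of group summaries, the identity transform (the real router has a per-level MLP + LN there), and the fixed 0.047 gate value.

```python
import numpy as np

rng = np.random.default_rng(5)
B, S, D, A = 2, 32, 64, 16           # batch, tokens, dim, anchors

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Routing weights from phase-0 triangulation angles (smaller angle = more weight)
tri0 = rng.uniform(0, np.pi, size=(B, S, A))
w = softmax(-tri0)                    # (B, S, A)

# One level of the binary tree: merge A anchor weights into A//2 group weights
w_groups = w.reshape(B, S, A // 2, 2).sum(-1)          # (B, S, 8)

tokens = rng.normal(size=(B, S, D))

# Scatter: weighted sum of tokens into group summaries (a bmm)
norm = w_groups.sum(1, keepdims=True) + 1e-8           # (B, 1, 8)
summaries = np.einsum('bsg,bsd->bgd', w_groups, tokens) / norm.transpose(0, 2, 1)

# Transform: the per-level MLP(dim -> dim*2 -> dim) + LN would go here (identity in this sketch)

# Gather: broadcast group summaries back to tokens with the same weights
update = np.einsum('bsg,bgd->bsd', w_groups, summaries)
out = tokens + 0.047 * update         # gated residual (gate init -3.0)
```

Because tokens only ever meet group summaries, the cost per level is linear in S; stacking log2(A) + 1 levels gives the full hierarchy.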
---

## Form 7: Diffusion Bottleneck / Geometric Lookup Table

**Source:** CDB §7–9

**Purpose:** The constellation as the sole information bottleneck of a diffusion model. NOT an autoencoder.

**Pipeline:**
```
Encoder features (256×8×8 = 16384-d)
→ Linear(16384, 256) → reshape (B, 16, 16)
→ L2 normalize each 16-d patch to S^15
→ Triangulate: 16 patches × 16 anchors × 3 phases = 768 dims
→ Concat(768 tri dims, conditioning dims)
→ Patchwork MLP → Linear(hidden, 16384) → reshape → decoder
```

**Key properties:**
- Compression ratio: 16384 → 768 = 21.3×
- cos_sim ≈ 0 to the input: the bottleneck does NOT reconstruct
- It's a geometric LOOKUP TABLE: the triangulation profile is an address, and the patchwork generates from that address
- Works for flow matching because the training signal is velocity prediction, not reconstruction
- Skip-bypass experiment: given a 268M-param linear bypass, the model routed 88% of its signal through the 768 constellation dims
- Constellation-only cos_sim=0.945 to the full model; skip-only cos_sim=0.598
- The constellation provides a representational ADVANTAGE over unconstrained capacity

**Proven results:** Loss 0.1749 (beat the 268M skip at 0.1757); 46% anchor convergence to 0.29154 in GLFM

**Loss:** Flow matching velocity loss (MSE on predicted vs target velocity)

**When to use:** Diffusion model bottleneck where geometric addressing replaces reconstruction.
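The bottleneck bookkeeping above is worth checking explicitly; this is pure arithmetic from the section's own numbers, nothing assumed.

```python
# Form 7 bottleneck bookkeeping.
feat = 256 * 8 * 8                  # encoder feature dims
patches, anchors, phases = 16, 16, 3
address = patches * anchors * phases

assert feat == 16384                # 256x8x8
assert address == 768               # 16 patches x 16 anchors x 3 phases

ratio = feat / address
assert round(ratio, 1) == 21.3      # the 21.3x compression the form reports
```

Every one of the 768 address dims is an angle, which is why the form calls the profile an address rather than a code.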
---

## Form 8: Geometric Lookup Flow Matching (GLFM)

**Source:** CDB §10

**Purpose:** Formalized three-stage flow matching variant where velocity prediction is driven by geometric address lookup.

**Pipeline:**
```
Stage 1: Geometric Addressing
  Encoder output → project to S^15 at two scales:
    Coarse: global avg pool → 256d → L2 norm → triangulate (768d)
    Fine: per-spatial → 256d → L2 norm → triangulate → aggregate (768d)
  Total address: 1536 dims of angular measurements

Stage 2: Address Conditioning
  Geometric address + sinusoidal timestep + class embed + noise-level bins
  → Fused projection to the generator input dim

Stage 3: Velocity Generation
  Deep residual MLP generates velocity features from the conditioned address
  4 residual blocks, width 1024 → 16384-d spatial features → decoder
```

**Key properties:**
- Explicit separation of addressing, conditioning, and generation
- Multi-scale collapse observed: cos(coarse, fine)=0.933 (needs pre-differentiated features like DINOv2)
- 46% of anchors converged within ±0.05 of the 0.29154 binding constant
- 59% of anchors crossed the binding boundary into task-specific territory

**Proven results:** Loss 0.1754; accelerated drift convergence vs the pure bottleneck

**When to use:** Flow matching diffusion where you want explicit geometric addressing.
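The "velocity prediction" training signal both Form 7 and Form 8 rely on can be sketched as standard linear-path flow matching. The linear interpolation path and the endpoint convention are assumptions here; the catalogue only states that the loss is MSE on predicted vs target velocity.

```python
import numpy as np

rng = np.random.default_rng(6)

# Standard linear-path flow matching target (assumed convention):
# x_t = (1 - t) * noise + t * data, target velocity v = data - noise.
data = rng.normal(size=(8, 16384))
noise = rng.normal(size=(8, 16384))
t = rng.uniform(size=(8, 1))

x_t = (1 - t) * noise + t * data
v_target = data - noise

def velocity_loss(v_pred, v_target):
    return np.mean((v_pred - v_target) ** 2)
```

Because the target is a velocity field rather than the input itself, a bottleneck that addresses (cos_sim ≈ 0 to the input) rather than reconstructs is not a contradiction.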
---

## Form 9: From-Scratch Encoder (Pixel → Consensus)

**Source:** PVH §4.2

**Purpose:** Train a ViT from random initialization to reproduce the expert-soup consensus embedding from raw pixels.

**Pipeline:**
```
Raw pixels
→ From-scratch ViT (no pretrained weights)
→ Project to D_ANCHOR dims → L2 normalize
→ Train against the frozen soup consensus as a differentiable teacher
```

**Key properties:**
- The soup is the teacher: it provides the target embedding for each image
- Gradient bottleneck: all gradient flows through the D_ANCHOR-dimensional output
- With 77M params and a 128-d output: gradient density = 1.6×10⁻⁶ per param
- Expansion warm-start works: 384-d → 1024-d by padding; recovers in 5 epochs

**Proven results:** The 1024-d ViT reached cos=0.663, mAP=0.500 (limited by the gradient bottleneck and 118K COCO images)

**Loss:** Same as soup training + geometric autograd

**When to use:** When you need a single encoder that reproduces the multi-expert consensus from raw input.
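The gradient-density figure follows directly from the two numbers the form gives; a one-line arithmetic check:

```python
# Gradient-bottleneck arithmetic: all learning signal enters through the
# D_ANCHOR-dimensional output, so the signal per parameter is d_out / params.
params = 77e6
d_out = 128
density = d_out / params
# 128 / 77e6 = 1.66e-6, matching the ~1.6e-6 per param the form cites
```

This is the quantitative reason the section blames the gradient bottleneck, and not model capacity, for the 0.663 ceiling.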
---

## Form 10: Dual-Teacher Consensus Distillation

**Source:** GM3 §4

**Purpose:** Two independently-trained models → GPA consensus → distill into a student that exceeds both.

**Pipeline:**
```
Teacher A (any config) + Teacher B (any config)
→ Extract embeddings on shared data
→ GPA-align iteratively until δ < 1e-8
→ Consensus = L2_normalize(mean_shape)
→ Student initializes anchors from k-means on the consensus
→ Train with: CE + InfoNCE(emb, consensus) + MSE(emb, consensus) + micro CV
→ Geometric autograd: tang=0.01, sep=1.0
```

**Key properties:**
- The student exceeds BOTH teachers (0.761 vs 0.699/0.649)
- The student was still ACCELERATING at epoch 30 (resonant dynamics)
- Consensus is the geometric truth: what both models agree on after removing their rotational frames
- Robust to catastrophic models: a 25.5%-accuracy parent still contributed useful signal
- Diverse parent selection beats top-N selection

**Proven results:** Student 0.761 from parents averaging 0.664; still accelerating at E30

**When to use:** When you have 2+ trained models and want a superior student.
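The GPA step ("align iteratively until δ < 1e-8") can be sketched as classical Generalized Procrustes Analysis in NumPy. The function names are ours and the demo uses two synthetic shapes related by a random rotation; the stopping rule and the final L2-normalization follow the pipeline text.

```python
import numpy as np

rng = np.random.default_rng(7)

def procrustes_rotate(x, target):
    """Orthogonal rotation of x best aligning it to target (SVD closed form)."""
    U, _, Vt = np.linalg.svd(x.T @ target)
    return x @ (U @ Vt)

def gpa_consensus(shapes, tol=1e-8, max_iter=100):
    """GPA: rotate every shape to the running mean until the mean stops moving."""
    shapes = [s.copy() for s in shapes]
    mean = np.mean(shapes, axis=0)
    for _ in range(max_iter):
        shapes = [procrustes_rotate(s, mean) for s in shapes]
        new_mean = np.mean(shapes, axis=0)
        if np.linalg.norm(new_mean - mean) < tol:   # delta < 1e-8
            break
        mean = new_mean
    # Consensus = L2_normalize(mean_shape), per the pipeline above
    return mean / np.linalg.norm(mean, axis=-1, keepdims=True)

base = rng.normal(size=(50, 32))
Q, _ = np.linalg.qr(rng.normal(size=(32, 32)))   # a random rotation frame
consensus = gpa_consensus([base, base @ Q])      # same shape, different frames
```

Two teachers that agree up to rotation collapse onto a single consensus shape, which is exactly the "removing their rotational frames" claim above.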
---

## Form 11: Multi-Generational Geometric Evolution

**Source:** GM3 §5

**Purpose:** Iterated consensus distillation across generations with data diversity.

**Pipeline:**
```
Gen 0: N founders trained independently → GPA → consensus anchors
Gen 1: M offspring from the Gen 0 consensus + a new founder (immigration)
Gen 2+: Previous generation's offspring + founder → GPA → consensus → next gen
Each generation trains on differently-perturbed data
```

**Key properties:**
- Monotonically improving across generations
- Each generation inherits consensus-derived anchor coordinates
- Fresh founders each generation prevent convergence collapse (gene flow)
- Robust: catastrophic models don't poison the lineage
- Diverse data across generations captures INVARIANT structure
- CV converges toward 0.2 naturally across generations

**Proven results:** Gen 0 mean=0.664 → Gen 4 best=0.775; FUSE_distilled=0.830

**When to use:** When you want to compound geometric knowledge across training runs.
---

## Form 12: Geometric Autograd (Optimizer)

**Source:** GM3 §2

**Purpose:** Gradient filtering that replaces weight decay with manifold-aware optimization.

**Components:**
```
Embedding backward:
→ Decompose the gradient into tangential + radial parts relative to S^(d-1)
→ Pass the tangential part fully; attenuate the radial part by (1 - tang_strength)
→ If the gradient moves toward the nearest anchor: attenuate by sep_strength

Anchor backward:
→ Project the gradient tangential to the hypersphere at the anchor position
→ Scale by drift_strength

Forward losses (all differentiable):
→ CV: |CV(pentachoron volumes) - 0.2| × 0.001
→ Spread: anchor cos² off-diagonal mean × 1e-3
→ Ortho: gram off-diagonal → 0, × 1e-3
→ Entropy: -Σ p·log(p) × 1e-4
→ Cluster var: -var(per-anchor mean cosine) × 1e-4
```

**Key properties:**
- Adam + geometric autograd beats AdamW consistently
- Weight decay destroys the geometric harmonic the autograd creates
- tang=0.01, sep=1.0 proven optimal
- The CV loss MUST be a forward loss, never a backward injection
- Enables resonant dynamics: constructive interference compounds across epochs

**When to use:** Training any constellation-based model. The geometry IS the regularization.
---

## Composition Map

| Task | Primary Form | Supporting Forms |
|------|-------------|-----------------|
| Image classification (single image) | Form 1 (Core) | Form 12 (Autograd) |
| Multi-expert fusion | Form 2 (Soup) | Form 12 |
| Context extension | Form 3 (Memory Bank) | Form 4 (Seq Reconstructor) |
| Diffusion cross-attention | Form 3 + Form 4 | |
| Sequence processing (long) | Form 5 (Relay) + Form 6 (Router) | |
| Diffusion bottleneck | Form 7 (Lookup Table) | Form 8 (GLFM) |
| Train encoder from scratch | Form 9 (From-Scratch) | Form 2 (Soup as teacher) |
| Model distillation | Form 10 (Consensus) | Form 12 |
| Compound improvement | Form 11 (Evolution) | Form 10 + Form 12 |
---

## What the Constellation IS

The constellation is a set of learned anchor points on S^(d-1). It is simultaneously:

1. **A measurement instrument:** triangulation computes angular distances to reference points
2. **A coordinate system:** the triangulation profile IS the geometric address
3. **A lookup table:** the patchwork generates from the address rather than reconstructing the input
4. **A routing topology:** anchor proximity determines cross-token interaction (Cantor)
5. **A geometric regularizer:** the anchor structure prevents collapse and preserves manifold health

The constellation is NOT:
- An autoencoder (cos_sim ≈ 0 to the input in the bottleneck form)
- A positional encoding (it measures WHERE on S^(d-1), not WHERE in the sequence)
- Class prototypes (anchors ≠ classes; the anchor count is independent of the class count)
- Patches of an image (constellation "patches" are dimensional subspace slices, not spatial tiles)
|