# Constellation Forms Catalogue
## GeoLIP Architecture Reference — March 2026

Sources:
- geometric-memory-ft1 (GM1)
- geometric-memory-ft2 (GM2)
- geometric-memory-ft3 (GM3)
- procrustes-vit-hypersphere-ft1 (PVH)
- constellation-diffusion-bottleneck (CDB)
- Session benchmarks (SB)

---

## Universal Constants

| Constant | Value | Source |
|----------|-------|--------|
| Pentachoron CV attractor | 0.20–0.23 | Geometry of S^15 itself (CDB §3) |
| Binding/separation boundary | 0.29154 radians | 5+ architectures (CDB §11) |
| Effective geometric dimension | ~16 | All trained models (CDB §3.3) |
| CV precision invariance | fp64 through 1-bit | CDB §3.2 |

## Universal Rules

| Rule | Source |
|------|--------|
| SquaredReLU in all constellation paths, never GELU | SB activation tests |
| Patchwork: Linear(tri, tri×2) → SquaredReLU → LN → Linear(tri×2, dim) | SB proven |
| Gate init: -3.0 (sigmoid ≈ 0.047) | SB proven |
| SLERP: only acos in fp32 (16KB tensor), everything else stays in compute dtype | SB fp32 fix |
| Adam, NO weight decay — geometry IS regularization | GM3 §2.4, PVH §12 |
| InfoNCE is the alignment FORCE. Procrustes is the REGULARIZER. | GM1 §4.1 |
| CV loss on the BOTTLENECK, not the output | GM1 §4.2 |
| CV loss weight: micro (0.001 or below) | GM3 §2.2 |
| Procrustes calibration is non-negotiable for anchor init | PVH §5.1 |
| Anchor dropout (30%) prevents collapse | PVH §5.2 |

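The patchwork and gate rules above can be sketched in PyTorch. This is a minimal sketch, not the repo's implementation: `tri` and `dim` are placeholder sizes, and SquaredReLU is written out by hand since it is not a stock torch module.

```python
import torch
import torch.nn as nn

class SquaredReLU(nn.Module):
    """relu(x)^2 — the activation the benchmarks favor over GELU."""
    def forward(self, x):
        return torch.relu(x) ** 2

def make_patchwork(tri: int, dim: int) -> nn.Module:
    """Patchwork rule: Linear(tri, tri*2) -> SquaredReLU -> LN -> Linear(tri*2, dim)."""
    return nn.Sequential(
        nn.Linear(tri, tri * 2),
        SquaredReLU(),
        nn.LayerNorm(tri * 2),
        nn.Linear(tri * 2, dim),
    )

# Gate rule: init at -3.0 so sigmoid(gate) ~ 0.047 and the gated
# residual starts as a near-identity.
gate = nn.Parameter(torch.tensor(-3.0))
```

At init the gated branch contributes under 5% of its raw magnitude, which is why stacks of these blocks preserve geometry early in training.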
---

## Form 1: GeoLIP Core (Classification)

**Source:** CDB §2

**Purpose:** Minimal image classification pipeline. Proves the constellation works as a primary representation layer.

**Pipeline:**
```
Input image
→ Conv encoder (builds channel depth: 3→64→128→256)
→ AdaptiveAvgPool → Linear(encoder_out, D) → L2 normalize to S^(d-1)
→ Triangulate against N anchors at 3 SLERP phases → tri_dim profile
→ Patchwork MLP reads triangulation
→ Classifier head → logits
```

**Key properties:**
- Every embedding on the unit sphere BEFORE the constellation sees it
- The conv encoder builds channel depth — constellation operates on channel dimension
- One global vector per image, not a sequence
- No attention anywhere

**Proven results:** 91.5% CIFAR-10, 1.6M params, CV=0.2045, 62/64 active anchors

**Loss:** CE + CV on embeddings

**When to use:** Single-input classification where the input can be reduced to one D-dimensional vector on S^(d-1).

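The SLERP-phase triangulation step can be sketched as follows. The SLERP formula and the fp32-only `acos` follow the rules table; the exact profile definition is an assumption here (one scalar per anchor × phase, taken as the cosine between the phase waypoint and its anchor), and the phase values are placeholders.

```python
import torch

def slerp(x, a, t, eps=1e-7):
    """Spherical interpolation from unit vector x toward unit anchor a.
    Per the SLERP rule, only acos runs in fp32; everything else stays
    in the compute dtype."""
    cos = (x * a).sum(-1, keepdim=True).clamp(-1 + eps, 1 - eps)
    theta = torch.acos(cos.float()).to(x.dtype)  # acos in fp32 only
    s = torch.sin(theta)
    return (torch.sin((1 - t) * theta) * x + torch.sin(t * theta) * a) / s

def triangulate(x, anchors, phases=(0.25, 0.5, 0.75)):
    """Assumed profile layout: for each (anchor, phase), the cosine of the
    phase waypoint against the anchor. Returns (batch, n_anchors * n_phases)."""
    feats = []
    for t in phases:
        w = slerp(x[:, None, :], anchors[None, :, :], t)  # (B, A, D) waypoints
        feats.append((w * anchors[None, :, :]).sum(-1))   # (B, A) angular readings
    return torch.cat(feats, dim=-1)
```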
---

## Form 2: Expert Soup (Multi-Expert Fusion)

**Source:** PVH §1, §4

**Purpose:** Fuse multiple frozen pretrained experts into a shared geometric representation on S^(d-1).

**Pipeline:**
```
Input image
→ N frozen expert encoders (CLIP, DINOv2, SigLIP, etc.) → N × 768-d
→ GPA alignment at 768-d (iterative Procrustes to mutual mean)
→ PCA to D_ANCHOR dims
→ Per-expert Procrustes-initialized projectors (768 → D_ANCHOR)
→ L2 normalize → shared constellation on S^(D_ANCHOR-1)
→ Triangulate: each expert through its own Procrustes rotation
→ Patchwork reads fused triangulation
→ Classifier
```

**Key properties:**
- Experts are FROZEN — never modified
- Procrustes initialization essential (without: 1/256 active anchors, collapsed)
- Anchor dropout (30%) → 508/512 active anchors
- Effective dimensionality matches task complexity (76.9 for COCO's 80 classes)
- Pipeline is almost entirely linear: 7 linear ops + 2 nonlinearities (GELU in patchwork + classifier)
- Weight decay explicitly avoided

**Proven results:** mAP=0.84 ceiling (data-limited), perfect hypersphere verified (1000/1000 positive volumes), 508/512 active anchors

**Loss:** InfoNCE(fused, consensus) + MSE + BCE + Procrustes_align + CV + anchor_spread

**Optimizer:** Adam lr=1e-3, NO weight decay

**When to use:** Combining multiple pretrained encoders into a shared geometric space for downstream tasks.

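The Procrustes initialization this form depends on is classical orthogonal Procrustes via SVD. A minimal square-case sketch: the real projectors are rectangular (768 → D_ANCHOR), and `procrustes_rotation` is a name introduced here for illustration.

```python
import numpy as np

def procrustes_rotation(source, target):
    """Orthogonal Procrustes: the rotation R minimizing ||source @ R - target||_F.
    Sketch of how a per-expert projector could be initialized so each frozen
    expert starts aligned to the shared space instead of collapsing."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt

rng = np.random.default_rng(0)
target = rng.standard_normal((100, 32))              # stand-in for consensus coords
true_r = np.linalg.qr(rng.standard_normal((32, 32)))[0]
source = target @ true_r.T                            # expert embeddings: a rotated copy

R = procrustes_rotation(source, target)               # recovers the rotation exactly
```

When the two point sets differ only by a rotation, the SVD solution recovers it exactly; with real expert embeddings it gives the best rigid alignment, which is the "calibration is non-negotiable" starting point.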
---

## Form 3: Geometric Memory / Anchor Bank (Context Extension)

**Source:** GM1 §2, GM2 §2

**Purpose:** Extend a frozen encoder's context window by accumulating segment-level geometric addresses in a memory bank.

**Pipeline:**
```
Long document (N tokens, N >> encoder context)
→ Split into overlapping segments (sized to encoder window)
→ For each segment:
    → Frozen encoder forward → hidden states at multiple layers
    → Multi-layer fusion (learned weighted sum)
    → Memory tokens cross-attend to fused hidden states
    → Depth-profile compressor: per-layer CLS → single anchor (L2-normalized)
    → Anchor stored in geometric memory bank
    → GRU gate updates rolling memory state
→ Final output: encoder-compatible embedding
```

**Key properties:**
- Frozen encoder, trainable memory wrapper
- Depth-profile anchors encode HOW the encoder processed (not just WHAT)
- CV loss on the BANK ANCHORS specifically — the bottleneck between segments
- Without CV on bank: projector shortcut collapse (m_acc plateaus at 0.670)
- With CV on bank: m_acc reaches 0.945
- Segment size must produce 5+ anchors for CV computation (pentachoron needs 5 points)
- Convergence order: CV locks first → m_acc climbs → s_cos climbs last

**Proven results:**
- GEOLIP-BERT-8192: m_acc=0.927, CV=0.200 (512→8192 context)
- GEOLIP-CLIP-ctx576: m_acc=0.945, CV=0.162 (77→576 context)

**Loss:** InfoNCE(student, teacher) + Procrustes_SVD + |CV(bank_anchors) - 0.20|

**When to use:** Extending frozen encoder context windows while preserving embedding space compatibility.

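The `|CV(bank_anchors) - 0.20|` term can be sketched with a differentiable pentachoron-volume computation (Gram determinant of edge vectors; a 4-simplex needs 5 points, hence the 5+ anchor requirement above). The random-subset sampling strategy is an assumption of this sketch.

```python
import math
import torch

def simplex_volume(points):
    """Volume of the simplex spanned by k+1 points (pentachoron: 5 points),
    via sqrt(det(Gram)) / k! on edge vectors from the first vertex."""
    v = points[1:] - points[:1]                # (k, d) edge vectors
    gram = v @ v.T                             # (k, k) Gram matrix
    k = v.shape[0]
    det = torch.det(gram).clamp(min=0.0)
    return torch.sqrt(det) / math.factorial(k)

def cv_loss(anchors, target=0.20, n_samples=64, gen=None):
    """|CV(pentachoron volumes) - target| over random 5-anchor subsets."""
    vols = []
    for _ in range(n_samples):
        idx = torch.randperm(anchors.shape[0], generator=gen)[:5]
        vols.append(simplex_volume(anchors[idx]))
    vols = torch.stack(vols)
    cv = vols.std() / vols.mean().clamp(min=1e-8)
    return (cv - target).abs()
```

Per the Universal Rules, this is applied as a forward loss at micro weight (≤ 0.001) on the bank anchors, not injected into backward.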
---

## Form 4: Sequence Reconstructor (Per-Position Output)

**Source:** GM2 §2

**Purpose:** Produce full per-position output sequences from memory state for diffusion cross-attention.

**Pipeline:**
```
Memory state (from Form 3 bank accumulation)
→ Context = cat(memory_tokens, bank_anchors, content_tokens)
→ 77 learned query tokens + positional encoding
→ Cross-attend to context (2 layers)
→ Self-attend among 77 output positions (2 layers)
→ Output: (B, 77, 768) — in frozen encoder's native distribution
```

**Key properties:**
- Must produce output in the distribution the UNet was trained on
- Training target: frozen encoder's own output on same caption (truncated to 77 tokens)
- Two teachers: ModernBERT teaches what to remember, CLIP teaches how to say it
- Two-phase training works for CLIP-L but NOT universally
- Rule: if you need per-position output, train the per-position consumer from the start
- Memory format shaped by gradient loudness, not architectural capacity

**Proven results:**
- CLIP-L s_cos=0.734, tulips appeared in SD 1.5 from elements past token 77
- Meridian (bigG): s_cos=0.425 (limited by 1280→1024 dimensional mismatch)

**Loss:** MSE(normalize(pred), normalize(target)) + cosine_similarity + InfoNCE(pooled)

**When to use:** When downstream consumer needs per-position sequences (diffusion cross-attention, token-level tasks).

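The query/cross-attend/self-attend skeleton above can be sketched with stock attention modules. Layer counts and the 77×768 shape follow the pipeline; head count, init scale, and the bare residual wiring (no per-layer norms or MLPs) are simplifying assumptions.

```python
import torch
import torch.nn as nn

class SeqReconstructor(nn.Module):
    """Sketch: 77 learned queries read the memory context (2 cross-attention
    layers), then coordinate among themselves (2 self-attention layers)."""
    def __init__(self, dim=768, n_queries=77, n_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, dim) * 0.02)
        self.pos = nn.Parameter(torch.randn(n_queries, dim) * 0.02)  # positional encoding
        self.cross = nn.ModuleList(
            [nn.MultiheadAttention(dim, n_heads, batch_first=True) for _ in range(2)])
        self.self_attn = nn.ModuleList(
            [nn.MultiheadAttention(dim, n_heads, batch_first=True) for _ in range(2)])

    def forward(self, context):                       # context: (B, S, dim)
        b = context.shape[0]
        x = (self.queries + self.pos).expand(b, -1, -1)
        for layer in self.cross:                      # queries read the memory
            x = x + layer(x, context, context)[0]
        for layer in self.self_attn:                  # output positions coordinate
            x = x + layer(x, x, x)[0]
        return x                                      # (B, 77, dim)
```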
---

## Form 5: Constellation Relay (Per-Token Geometric Layer)

**Source:** CDB §4, SB

**Purpose:** Replace attention as a per-token processing layer. O(S) complexity. Preserves geometry at depth.

**Pipeline:**
```
Input (B, S, D) or (B, D)
→ LayerNorm
→ Chunk D into P patches of patch_dim (e.g., 16 × 16d = 256d)
→ L2 normalize each patch to S^(d-1)
→ Triangulate against anchors at 3 SLERP phases → tri_dim profile
→ Patchwork MLP reads triangulation
→ Gated residual (gate init -3.0)
→ Output = residual + gate * patchwork_out
```

**Key properties:**
- Per-token, no cross-token interaction
- O(S) time and memory — no S² term
- Preserves 99.4% cosine similarity to input at depth 16 (vs 7.4% for attention)
- 3.4× fewer parameters than vanilla attention
- Geometric preservation is sequence-length invariant (identical from S=64 through S=131072)
- Throughput crossover vs attention at S≈32K; 8.4× faster at S=131K
- SquaredReLU wins: better anchor diversity (7.1 vs 4.6), better equivariance, 0.9999 reconstruction

**Proven results:** cos_to_orig=0.994 at depth 16, 8.4× faster than attention at S=131K

**When to use:** Processing token sequences where geometric preservation matters more than cross-token mixing. Stackable. The per-token processing layer.

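The full relay pipeline can be condensed into one module. A sketch with one simplification: triangulation here is plain cosine-to-anchor (single phase) rather than the 3 SLERP phases, so `tri_dim` is patches × anchors. Everything else follows the rules table: SquaredReLU patchwork, gate init -3.0, gated residual.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SquaredReLU(nn.Module):
    def forward(self, x):
        return torch.relu(x) ** 2

class ConstellationRelay(nn.Module):
    """Per-token relay sketch: LN -> patch -> L2 norm -> triangulate ->
    patchwork -> gated residual. No cross-token interaction, O(S)."""
    def __init__(self, dim=256, patch_dim=16, n_anchors=16):
        super().__init__()
        self.p = dim // patch_dim
        self.norm = nn.LayerNorm(dim)
        self.anchors = nn.Parameter(F.normalize(torch.randn(n_anchors, patch_dim), dim=-1))
        tri = self.p * n_anchors
        self.patchwork = nn.Sequential(
            nn.Linear(tri, tri * 2), SquaredReLU(), nn.LayerNorm(tri * 2),
            nn.Linear(tri * 2, dim))
        self.gate = nn.Parameter(torch.tensor(-3.0))   # sigmoid ~ 0.047 at init

    def forward(self, x):                              # (B, S, dim)
        h = self.norm(x)
        patches = F.normalize(h.unflatten(-1, (self.p, -1)), dim=-1)   # (B, S, P, pd)
        tri = torch.einsum('bspd,ad->bspa', patches, self.anchors)     # cos to anchors
        out = self.patchwork(tri.flatten(-2))
        return x + torch.sigmoid(self.gate) * out      # near-identity at init
```

The near-identity start is visible immediately: a freshly initialized relay leaves the input geometry almost untouched, consistent with the depth-16 preservation numbers.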
---

## Form 6: Cantor Constellation Router (Cross-Token Routing)

**Source:** SB (cantor_constellation_relay.py)

**Purpose:** O(S) cross-token routing through the constellation's own anchor hierarchy. Replaces attention's cross-token role.

**Pipeline:**
```
Input tokens (B, S, D) + triangulation profiles (B, S, tri_dim) from relay
→ Compute soft routing weights from phase-0 triangulation distances
→ For each level l in binary anchor tree (16→8→4→2→1):
    → Merge anchor weights into group weights at level l
    → Weighted scatter: tokens → group summaries (bmm)
    → Transform: per-level MLP(dim→dim×2→dim) + LN
    → Weighted gather: group summaries → token updates (bmm)
    → Gated residual at each level
→ Output: tokens with cross-token information
```

**Key properties:**
- O(S × n_levels × D) where n_levels = log2(A) + 1 = 5 for A=16
- No S² term anywhere — not in compute, not in memory
- Triangulation from the per-token relay IS the routing key (zero redundant computation)
- Binary tree over anchors defines hierarchy (16→8→4→2→1 groups)
- At each level: scatter → transform → gather
- Cantor routing holds at distance BETTER than attention (2× stronger at S=4096)
- The router is a geometric REGULARIZER: cos_orig=0.9818 at 8 layers vs relay alone 0.6533
- Geometry IMPROVES with more tokens (0.982→0.986 as S increases)

**Proven results:** 97.0% cross-token task acc, 0.986 cos preservation at 131K tokens, 5.2× faster than attention at 131K

**When to use:** Combined with Form 5 relay as a complete O(S) transformer layer replacement (ConstellationCantorRelay).

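The scatter → gather step at one tree level, and the sibling-merge that coarsens the routing weights, can be sketched as below. The per-level MLP transform and the gated residual are omitted for brevity; the normalization of scatter weights is an assumption.

```python
import torch

def route_level(tokens, weights):
    """One level of Cantor routing: tokens -> group summaries -> token updates.
    tokens: (B, S, D); weights: (B, S, G) soft assignment over G groups.
    Cost is O(S * G * D): both bmm ops are (S x G) against (G or S x D),
    so no S^2 term appears."""
    w = weights / weights.sum(dim=1, keepdim=True).clamp(min=1e-8)
    groups = torch.bmm(w.transpose(1, 2), tokens)    # weighted scatter: (B, G, D)
    updates = torch.bmm(weights, groups)             # weighted gather:  (B, S, D)
    return updates

def merge_weights(weights):
    """Binary-tree coarsening: merge sibling anchors, G groups -> G // 2."""
    b, s, g = weights.shape
    return weights.view(b, s, g // 2, 2).sum(-1)
```

Running `merge_weights` four times walks the 16→8→4→2→1 hierarchy, with `route_level` applied at each stage.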
---

## Form 7: Diffusion Bottleneck / Geometric Lookup Table

**Source:** CDB §7–9

**Purpose:** The constellation as the sole information bottleneck of a diffusion model. NOT an autoencoder.

**Pipeline:**
```
Encoder features (256×8×8 = 16384-d)
→ Linear(16384, 256) → L2 normalize to S^15
→ Reshape (B, 16, 16) → per-patch S^15 normalization
→ Triangulate: 16 patches × 16 anchors × 3 phases = 768 dims
→ Concat(768 tri dims, conditioning dims)
→ Patchwork MLP → Linear(hidden, 16384) → reshape → decoder
```

**Key properties:**
- Compression ratio: 16384 → 768 = 21.3×
- cos_sim ≈ 0 to input — the bottleneck does NOT reconstruct
- It's a geometric LOOKUP TABLE: triangulation profile is an address, patchwork generates from that address
- Works for flow matching because training signal is velocity prediction, not reconstruction
- Skip bypass experiment: given 268M linear bypass, model routed 88% through 768 constellation dims
- Constellation-only cos_sim=0.945 to full model; skip-only cos_sim=0.598
- The constellation provides a representational ADVANTAGE over unconstrained capacity

**Proven results:** Loss 0.1749 (beat 268M skip at 0.1757), 46% anchor convergence to 0.29154 in GLFM

**Loss:** Flow matching velocity loss (MSE on predicted vs target velocity)

**When to use:** Diffusion model bottleneck where geometric addressing replaces reconstruction.

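The velocity loss this form trains under can be sketched with the common rectified-flow formulation; whether CDB uses exactly this linear interpolant is an assumption of the sketch.

```python
import torch

def flow_matching_loss(model, x0, x1, t):
    """Flow matching velocity loss sketch: interpolate x_t = (1-t) x0 + t x1,
    target velocity v* = x1 - x0, loss = MSE(model(x_t, t), v*).
    The training signal is velocity prediction, not reconstruction — which is
    why a non-reconstructing bottleneck can sit inside `model`."""
    t = t.view(-1, *([1] * (x0.dim() - 1)))   # broadcast t over feature dims
    xt = (1 - t) * x0 + t * x1
    v_target = x1 - x0
    return torch.mean((model(xt, t) - v_target) ** 2)
```

Here `model` would be the constellation bottleneck stack (triangulate → patchwork → decoder); any callable with the right shapes slots in.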
---

## Form 8: Geometric Lookup Flow Matching (GLFM)

**Source:** CDB §10

**Purpose:** Formalized three-stage flow matching variant where velocity prediction is driven by geometric address lookup.

**Pipeline:**
```
Stage 1 — Geometric Addressing:
Encoder output → project to S^15 at two scales:
    Coarse: global avg pool → 256d → L2 norm → triangulate (768d)
    Fine: per-spatial → 256d → L2 norm → triangulate → aggregate (768d)
Total address: 1536 dims of angular measurements

Stage 2 — Address Conditioning:
Geometric address + sinusoidal timestep + class embed + noise-level bins
→ Fused projection to generator input dim

Stage 3 — Velocity Generation:
Deep residual MLP generates velocity features from conditioned address
4 residual blocks width 1024 → 16384-d spatial features → decoder
```

**Key properties:**
- Explicit separation of addressing, conditioning, and generation
- Multi-scale collapse observed: coarse↔fine cos=0.933 (needs pre-differentiated features like DINOv2)
- 46% of anchors converged within ±0.05 of 0.29154 binding constant
- 59% of anchors crossed binding boundary into task-specific territory

**Proven results:** Loss 0.1754, accelerated drift convergence vs pure bottleneck

**When to use:** Flow matching diffusion where you want explicit geometric addressing.

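Stage 2 can be sketched with the standard sinusoidal timestep embedding plus a concatenate-and-project fusion. The frequency schedule is the usual diffusion recipe, not confirmed from CDB; the noise-level bins are omitted and the dimension values are placeholders.

```python
import math
import torch

def timestep_embedding(t, dim):
    """Standard sinusoidal timestep embedding. t: (B,), returns (B, dim)."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    args = t[:, None].float() * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)

def fuse_condition(address, t_emb, class_emb, proj):
    """Stage 2 sketch: concat geometric address + timestep + class embedding,
    then one fused projection down to the generator input dim."""
    return proj(torch.cat([address, t_emb, class_emb], dim=-1))
```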
---

## Form 9: From-Scratch Encoder (Pixel → Consensus)

**Source:** PVH §4.2

**Purpose:** Train a ViT from random initialization to reproduce the expert soup consensus embedding from raw pixels.

**Pipeline:**
```
Raw pixels
→ From-scratch ViT (no pretrained weights)
→ Project to D_ANCHOR dims → L2 normalize
→ Train against frozen soup consensus as differentiable teacher
```

**Key properties:**
- The soup is the teacher — it provides the target embedding for each image
- Gradient bottleneck: all gradient flows through D_ANCHOR-dimensional output
- With 77M params and 128-d output: gradient density = 1.6×10⁻⁶ per param
- Expansion warm-start works: 384-d→1024-d by padding, recovers in 5 epochs

**Proven results:** 1024-d ViT reached cos=0.663, mAP=0.500 (limited by gradient bottleneck and 118K COCO)

**Loss:** Same as soup training + geometric autograd

**When to use:** When you need a single encoder that reproduces multi-expert consensus from raw input.

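The expansion warm-start ("384-d→1024-d by padding") can be sketched as zero-padding the projection head's output rows, so the old subspace is reproduced exactly at step 0. The exact padding scheme used in PVH is an assumption; `expand_head` is a name introduced here.

```python
import torch

def expand_head(old_weight, new_dim):
    """Widen a (d_old, feat) projection head to (new_dim, feat): copy the old
    rows, zero-init the new ones. The widened head's first d_old output dims
    match the old head exactly, so training resumes from known-good geometry."""
    d_old, feat = old_weight.shape
    new_weight = torch.zeros(new_dim, feat)
    new_weight[:d_old] = old_weight
    return new_weight
```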
---

## Form 10: Dual-Teacher Consensus Distillation

**Source:** GM3 §4

**Purpose:** Two independently-trained models → GPA consensus → distill into a student that exceeds both.

**Pipeline:**
```
Teacher A (any config) + Teacher B (any config)
→ Extract embeddings on shared data
→ GPA-align iteratively until δ < 1e-8
→ Consensus = L2_normalize(mean_shape)
→ Student initializes anchors from k-means on consensus
→ Train with: CE + InfoNCE(emb, consensus) + MSE(emb, consensus) + micro CV
→ Geometric autograd: tang=0.01, sep=1.0
```

**Key properties:**
- Student exceeds BOTH teachers (0.761 vs 0.699/0.649)
- Student still ACCELERATING at epoch 30 (resonant dynamics)
- Consensus is the geometric truth — what both agree on after removing rotational frames
- Robust to catastrophic models: a 25.5% accuracy parent still contributed useful signal
- Diverse parent selection beats top-N selection

**Proven results:** Student 0.761 from parents averaging 0.664; still accelerating at E30

**When to use:** When you have 2+ trained models and want a superior student.

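The GPA-align-until-δ step can be sketched as iterated orthogonal Procrustes against a running mean shape. This is textbook generalized Procrustes analysis (rotation-only, no scaling/translation), which matches "removing rotational frames"; the scale handling in GM3 is an assumption left out here.

```python
import numpy as np

def gpa_consensus(embeddings, tol=1e-8, max_iter=100):
    """GPA sketch: rotate each model's (n, d) embedding matrix toward the
    running mean shape until the mean moves less than tol, then return the
    row-wise L2-normalized mean shape as the consensus."""
    aligned = [e.copy() for e in embeddings]
    mean = np.mean(aligned, axis=0)
    for _ in range(max_iter):
        for i, e in enumerate(embeddings):
            u, _, vt = np.linalg.svd(e.T @ mean)   # orthogonal Procrustes to mean
            aligned[i] = e @ (u @ vt)
        new_mean = np.mean(aligned, axis=0)
        delta = np.linalg.norm(new_mean - mean)
        mean = new_mean
        if delta < tol:
            break
    return mean / np.clip(np.linalg.norm(mean, axis=1, keepdims=True), 1e-12, None)
```

Two embeddings that differ only by a rotation collapse to the same consensus, which is the sense in which the consensus is "what both agree on".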
---

## Form 11: Multi-Generational Geometric Evolution

**Source:** GM3 §5

**Purpose:** Iterated consensus distillation across generations with data diversity.

**Pipeline:**
```
Gen 0: N founders trained independently → GPA → consensus anchors
Gen 1: M offspring from Gen 0 consensus + new founder (immigration)
Gen 2+: Previous gen offspring + founder → GPA → consensus → next gen
Each generation trains on differently-perturbed data
```

**Key properties:**
- Monotonically improving across generations
- Each generation inherits consensus-derived anchor coordinates
- Fresh founders each generation prevent convergence collapse (gene flow)
- Robust: catastrophic models don't poison the lineage
- Diverse data across generations captures INVARIANT structure
- CV converges toward 0.2 naturally across generations

**Proven results:** Gen 0 mean=0.664 → Gen 4 best=0.775; FUSE_distilled=0.830

**When to use:** When you want to compound geometric knowledge across training runs.

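The generational loop reduces to a short orchestration skeleton. Everything domain-specific is injected: `train_model`, `gpa_consensus`, and `perturb_data` are caller-supplied stand-ins for the real training run, GPA alignment, and per-generation data diversity, so this sketch only fixes the control flow.

```python
def evolve(train_model, gpa_consensus, perturb_data, data,
           n_generations=3, offspring=2):
    """Multi-generational evolution skeleton: each generation trains offspring
    from the previous consensus plus one fresh founder (immigration / gene
    flow), then GPA-fuses the population into the next consensus."""
    consensus = None
    for gen in range(n_generations):
        gen_data = perturb_data(data, gen)               # diverse data per generation
        pop = [train_model(gen_data, init=consensus) for _ in range(offspring)]
        pop.append(train_model(gen_data, init=None))     # fresh founder each generation
        consensus = gpa_consensus(pop)                   # inherited anchor coordinates
    return consensus
```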
---

## Form 12: Geometric Autograd (Optimizer)

**Source:** GM3 §2

**Purpose:** Gradient filtering that replaces weight decay with manifold-aware optimization.

**Components:**
```
Embedding backward:
→ Decompose gradient into tangential + radial relative to S^(d-1)
→ Pass tangential fully, attenuate radial by (1 - tang_strength)
→ If gradient moves toward nearest anchor: attenuate by sep_strength

Anchor backward:
→ Project gradient tangential to hypersphere at anchor position
→ Scale by drift_strength

Forward losses (all differentiable):
→ CV: |CV(pentachoron volumes) - 0.2| × 0.001
→ Spread: anchor cos² off-diagonal mean × 1e-3
→ Ortho: gram off-diagonal → 0 × 1e-3
→ Entropy: -Σ p·log(p) × 1e-4
→ Cluster var: -var(per-anchor mean cosine) × 1e-4
```

**Key properties:**
- Adam + geometric autograd > AdamW consistently
- Weight decay destroys the geometric harmonic the autograd creates
- tang=0.01, sep=1.0 proven optimal
- CV loss MUST be forward loss, never backward injection
- Enables resonant dynamics: constructive interference compounds across epochs

**When to use:** Training any constellation-based model. The geometry IS the regularization.

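The embedding-backward decomposition can be sketched directly. One reading assumption: "attenuate radial by (1 - tang_strength)" is taken to mean the radial component keeps only the `tang_strength` fraction (so tang=0.01 suppresses 99% of off-manifold movement); the anchor-separation term is omitted for brevity.

```python
import torch

def filter_embedding_grad(x, grad, tang_strength=0.01):
    """Geometric autograd sketch for the embedding backward pass.
    x: L2-normalized embeddings on S^(d-1); grad: their raw gradient.
    Radial = component along x (off-manifold); tangential = the rest.
    Tangential passes fully; radial is attenuated by (1 - tang_strength),
    i.e. scaled down to the tang_strength fraction (assumed reading)."""
    radial = (grad * x).sum(-1, keepdim=True) * x
    tangential = grad - radial
    return tangential + tang_strength * radial
```

The same tangential projection, scaled by `drift_strength`, gives the anchor-backward rule.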
---

## Composition Map

| Task | Primary Form | Supporting Forms |
|------|-------------|-----------------|
| Image classification (single image) | Form 1 (Core) | Form 12 (Autograd) |
| Multi-expert fusion | Form 2 (Soup) | Form 12 |
| Context extension | Form 3 (Memory Bank) | Form 4 (Seq Reconstructor) |
| Diffusion cross-attention | Form 3 + Form 4 | |
| Sequence processing (long) | Form 5 (Relay) + Form 6 (Router) | |
| Diffusion bottleneck | Form 7 (Lookup Table) | Form 8 (GLFM) |
| Train encoder from scratch | Form 9 (From-Scratch) | Form 2 (Soup as teacher) |
| Model distillation | Form 10 (Consensus) | Form 12 |
| Compound improvement | Form 11 (Evolution) | Form 10 + Form 12 |

---

## What the Constellation IS

The constellation is a set of learned anchor points on S^(d-1). It is simultaneously:

1. **A measurement instrument** — triangulation computes angular distances to reference points
2. **A coordinate system** — the triangulation profile IS the geometric address
3. **A lookup table** — the patchwork generates from the address, not reconstructing the input
4. **A routing topology** — anchor proximity determines cross-token interaction (Cantor)
5. **A geometric regularizer** — anchor structure prevents collapse and preserves manifold health

The constellation is NOT:
- An autoencoder (cos_sim ≈ 0 to input in bottleneck form)
- A positional encoding (it measures WHERE on S^(d-1), not WHERE in sequence)
- Class prototypes (anchors ≠ classes; anchor count independent of class count)
- Patches of an image (constellation "patches" = dimensional subspace slices, not spatial tiles)