AbstractPhil committed · Commit c972e15 · verified · 1 Parent(s): d2cf221

Create FORMULAS.md

Files changed (1)
  1. FORMULAS.md +745 -0
FORMULAS.md ADDED
@@ -0,0 +1,745 @@
# Geometric Formula Catalog
## Token Topology & Loss System (AbstractPhil + Claude)

*ROSE loss discarded. These are the active formulas.*

---

## 1. Multi-Scale Crystal Loss

Classification through learnable crystal prototypes at multiple projection dimensions. Each class has a crystal centroid at each scale. No softmax: geometric distance IS the classifier.

**Scales:** `[64, 128, 256, 512, 1024]` (each is a projection dimension, not spatial)

### 1.1 Per-Scale Crystal Similarity

```
sim(x, c_k) = (x̂ · ĉ_k) / τ

where:
  x̂   = normalize(proj_k(features))   # [B, scale_dim]
  ĉ_k = normalize(crystals_k)         # [num_classes, scale_dim]
  τ   = temperature (default 0.07)
```
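The similarity above can be sketched in a few lines of plain Python. This is a minimal illustration, not the catalog's implementation: it assumes the feature has already been passed through `proj_k`, and the helper names `normalize` and `crystal_similarity` are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm (a tiny epsilon guards against zero vectors)."""
    norm = math.sqrt(sum(a * a for a in v)) + 1e-12
    return [a / norm for a in v]

def crystal_similarity(x, crystals, tau=0.07):
    """Return sim(x, c_j) = (x_hat . c_hat_j) / tau for every class crystal c_j."""
    x_hat = normalize(x)
    sims = []
    for c in crystals:
        c_hat = normalize(c)
        sims.append(sum(a * b for a, b in zip(x_hat, c_hat)) / tau)
    return sims

# Toy data (assumed): one already-projected feature and three crystals in 4-D.
feature = [1.0, 0.2, 0.0, 0.0]
crystals = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
sims = crystal_similarity(feature, crystals)
# The predicted class is the crystal with the highest temperature-scaled cosine.
predicted = max(range(len(sims)), key=sims.__getitem__)
```

Because both vectors are normalized first, dividing by τ = 0.07 stretches the cosine range from [-1, 1] to roughly [-14.3, 14.3], which sharpens the contrast between nearby crystals.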

### 1.2 Per-Scale Coherence Loss

Pull features toward their correct class crystal:

```
L_coherence = -mean(log(exp(sim(x, c_y)) / Σ_j exp(sim(x, c_j))))

where y = true class label
```

### 1.3 Per-Scale Separation Loss

Push class crystals apart with margin:

```
L_separation = Σ_{i≠j} max(0, margin - ||ĉ_i - ĉ_j||₂)² / (C(C-1))

where C = num_classes, margin = 1.0
```
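A small worked example makes the hinge behaviour of the separation term concrete. The sketch below is an assumption-laden illustration in plain Python (the function name `separation_loss` is hypothetical); it sums the squared hinge over all ordered pairs i ≠ j and divides by C(C-1), as in the formula.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(a * a for a in v)) + 1e-12
    return [a / norm for a in v]

def separation_loss(crystals, margin=1.0):
    """Mean squared hinge on pairwise distances between normalized crystals."""
    unit = [normalize(c) for c in crystals]
    num_classes = len(unit)
    total = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            if i == j:
                continue
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(unit[i], unit[j])))
            total += max(0.0, margin - dist) ** 2
    return total / (num_classes * (num_classes - 1))

# Orthogonal crystals are sqrt(2) apart, which already exceeds margin = 1.0.
loss_separated = separation_loss([[1.0, 0.0], [0.0, 1.0]])
# Identical crystals are at distance 0, so every pair pays the full margin.
loss_collapsed = separation_loss([[1.0, 0.0], [1.0, 0.0]])
```

On the unit sphere, two crystals stop contributing once the angle between them exceeds 60° (chord length 1.0), so with margin = 1.0 the term only acts on prototypes that are close together.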

### 1.4 Per-Scale Discretization Loss (Cantor Targets)

Cluster crystal Cantor values toward `{0.0, 0.5, 1.0}`:

```
L_discretization = mean(min_t(||cantor(c_i) - t||²))

where t ∈ {0.0, 0.5, 1.0}
```

### 1.5 Per-Scale Crystal Geometry Loss

Maintain target distance from features to class prototypes:

```
L_geometry = mean((||x - c_y||₂ - d_target)²)

where d_target = 1.0
```

### 1.6 Total Multi-Scale Crystal Loss

```
L_crystal = (1/S) Σ_{k=1}^{S} w_k · (
    w_coh  · L_coherence_k +
    w_sep  · L_separation_k +
    w_disc · L_discretization_k +
    w_geom · L_geometry_k
)

Proven weights: w_coh=1.0, w_sep=0.5, w_disc=1.0, w_geom=0.5
```

### 1.7 Crystal Prediction (No Softmax Head)

```
logits = Σ_k w_k · (α · cos_sim_k + β · cantor_coherence_k + γ · crystal_geometry_k)

where prediction = argmax(logits)
```

**Results:** 86% ImageNet (CLIP bigG features), 74.87% CIFAR-100 (393K params), ~92% CIFAR-100 (78KB model)

---

## 2. Geometric Basin Compatibility Loss

Classification through geometric formula satisfaction. Four structural checks produce compatibility scores ∈ [0,1]. No cross-entropy needed.

### 2.1 Triadic Compatibility

```
T(x, c) = exp(-||proj(x) - c||₂² / (2σ²))

where c = class centroid, σ = learned bandwidth
```

### 2.2 Self-Similarity Check

```
S(x) = exp(-Var(cantor_levels(x)))

where cantor_levels extracts per-level Cantor measures
High self-similarity → low variance across levels → high score
```

### 2.3 Cantor Coherence Check

```
C(x, p_y) = exp(-||cantor(x) - p_y||₂²)

where p_y = class Cantor prototype
```

### 2.4 Hierarchical Check

```
H(x) = Σ_{k=1}^{L} 0.5^k · match(level_k(x), expected_k)
```

### 2.5 Combined Compatibility Score

```
compat(x, class_j) = T(x, c_j) · S(x) · C(x, p_j) · H(x)

Product of four factors ∈ [0,1] → output ∈ [0,1]
```

### 2.6 Basin Loss (Three-Term, No Cross-Entropy)

```
L_correct     = -mean(log(compat(x, y) + ε))
L_incorrect   = -mean(log(1 - compat(x, j≠y) + ε))
L_contrastive = NLL(log_softmax(compat / τ), y)

L_basin = L_correct + 0.5 · L_incorrect + 0.5 · L_contrastive
```
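The three terms can be combined as in the sketch below, which takes precomputed compatibility scores as input. Several details are assumptions made only for illustration: the temperature `tau=0.1` and `eps=1e-8` are placeholder values (the catalog does not state them for this loss), `L_incorrect` is averaged over all wrong (sample, class) pairs, and the name `basin_loss` is hypothetical.

```python
import math

def basin_loss(compat, labels, tau=0.1, eps=1e-8):
    """Three-term basin loss from per-class compatibility scores in [0, 1].

    compat: list of per-sample score lists, one score per class.
    labels: list of true class indices.
    """
    correct, incorrect, contrastive = [], [], []
    for scores, y in zip(compat, labels):
        # Reward high compatibility with the true class.
        correct.append(-math.log(scores[y] + eps))
        # Penalize high compatibility with every other class.
        for j, s in enumerate(scores):
            if j != y:
                incorrect.append(-math.log(1.0 - s + eps))
        # NLL of a temperature-scaled log-softmax over compatibilities.
        scaled = [s / tau for s in scores]
        peak = max(scaled)
        log_z = peak + math.log(sum(math.exp(v - peak) for v in scaled))
        contrastive.append(log_z - scaled[y])

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(correct) + 0.5 * mean(incorrect) + 0.5 * mean(contrastive)

# Same scores, evaluated against the right and the wrong label.
scores = [[0.9, 0.1, 0.05]]
loss_right_label = basin_loss(scores, [0])
loss_wrong_label = basin_loss(scores, [1])
```

The log-sum-exp is shifted by its maximum so that small temperatures do not overflow `math.exp`.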

**Results:** 67.69% CIFAR-100 with NO attention, NO cross-entropy, NO transformers (geo-beatrix). Beat ViT-beatrix (66.0%).

---

## 3. K-Simplex Channel Formulas

Tokens are represented as k-simplices with Cayley-Menger validated geometry. Shape `[B, T, K+1, F]`, where K+1 is the number of vertices.

### 3.1 Template + Deformation

```
v_i = v_i^{template} + α · Δv_i

where:
  v_i^{template} = regular k-simplex vertices (frozen)
  α              = deformation scale (0.05 base, per-k scaled)
  Δv_i           = learned offset from neural network
```

### 3.2 K-Scaled Deformation

Volume scales as `edge^k`, so higher k needs smaller deformation:

```
α_k = α_base / √(k + 1)

k=1: α × 0.71    k=3: α × 0.50
k=2: α × 0.58    k=4: α × 0.45
```

### 3.3 Per-Token Simplex Coordinates

```
coords         = proj(token_embedding)             # [B, T, edim]
vertex_weights = softmax(route(token_embedding))   # [B, T, K+1]
simplex_state  = vertex_weights @ vertices         # [B, T, edim]
```

### 3.4 K-Simplex Attention (Proven Superior to K-Simplex Classification)

```
For each token pair (i, j):
  d²_ij   = ||simplex_i - simplex_j||²   # pairwise simplex distance
  attn_ij = softmax_j(-d²_ij / τ)        # geometric attention weights (softmax over keys j)

Output = attn @ V                        # standard value projection
```

**Results:** 89.13% FMNIST, 84.59% CIFAR-10, 69.08% CIFAR-100 as attention. Entropy decreases through layers (sharpening). Fewer tokens = sharper attention (25 patches > 64 patches).

---

## 4. Cayley-Menger Formulas

The structural invariant. If CM fails, geometry is invalid. Non-negotiable.

### 4.1 Cayley-Menger Matrix

```
CM = | 0   1     1     ...  1    |
     | 1   0     d₀₁²  ...  d₀ₖ² |
     | 1   d₀₁²  0     ...  d₁ₖ² |
     | ⋮   ⋮     ⋮     ⋱    ⋮    |
     | 1   d₀ₖ²  d₁ₖ²  ...  0    |

Size: (K+2) × (K+2) for a K-simplex
```

### 4.2 Volume Formula (Corrected)

```
Vol² = (-1)^(K+1) / (2^K · (K!)²) · det(CM)

Validity: Vol² > 0 indicates non-degenerate simplex
```

### 4.3 Gram Determinant Alternative (More Stable)

```
X_translated = X[:, 1:, :] - X[:, 0:1, :]                # [B, K, D]
G = X_translated @ X_translated.transpose(-1, -2)        # [B, K, K]
Vol = √(det(G)) / K!
```
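As a sanity check, the Cayley-Menger and Gram routes should give the same squared volume. The sketch below verifies this for a single simplex in plain Python; it is unbatched, uses a small Gaussian-elimination determinant instead of a tensor library, and all function names are hypothetical.

```python
import math

def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-15:
            return 0.0  # singular matrix
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result  # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def vol2_cayley_menger(vertices):
    """Vol^2 = (-1)^(K+1) / (2^K (K!)^2) * det(CM) for K+1 vertices."""
    k = len(vertices) - 1
    n = k + 2
    cm = [[0.0] * n for _ in range(n)]
    for i in range(1, n):
        cm[0][i] = cm[i][0] = 1.0
        for j in range(1, n):
            cm[i][j] = squared_distance(vertices[i - 1], vertices[j - 1])
    sign = (-1.0) ** (k + 1)
    return sign * det(cm) / (2 ** k * math.factorial(k) ** 2)

def vol2_gram(vertices):
    """Vol^2 = det(G) / (K!)^2 with G the Gram matrix of edges from vertex 0."""
    k = len(vertices) - 1
    base = vertices[0]
    edges = [[a - b for a, b in zip(v, base)] for v in vertices[1:]]
    gram = [[sum(a * b for a, b in zip(u, w)) for w in edges] for u in edges]
    return det(gram) / math.factorial(k) ** 2

# Right-angled tetrahedron (K=3): volume 1/6, so Vol^2 = 1/36.
tetra = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
vol2_cm = vol2_cayley_menger(tetra)
vol2_g = vol2_gram(tetra)
```

The Gram route works on a K × K matrix of inner products rather than a (K+2) × (K+2) matrix of squared distances, which is one reason it tends to be better conditioned.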

### 4.4 Validity Loss

```
L_validity = mean(ReLU(-Vol²))

Penalizes geometrically invalid simplices (Vol² < 0)
```

### 4.5 Volume Consistency Loss

```
L_vol_consistency = Var(Vol²) across batch

Encourages uniform geometric structure
```

### 4.6 Hierarchical Cell Loss (k=4 pentachoron)

```
5 cells (tetrahedra), each with 4 vertices, 6 edges:

L_cell = mean(ReLU(ε - Vol²_cell_i))

for i = 1..5 cells of the pentachoron
```

### 4.7 Vol² Scaling Reference

```
k=1: Vol² ~ 1e+0  (edge length squared)
k=2: Vol² ~ 1e-1  (triangle area squared)
k=3: Vol² ~ 1e-2  (tetrahedron volume squared)
k=4: Vol² ~ 1e-3  (5-cell hypervolume squared)
```

---

## 5. Cantor Lens Formulas

The Devil's Staircase as a hierarchical lens for viewing token relationships.

### 5.1 Devil's Staircase (Beatrix Staircase)

```
C(x) = Σ_{k=1}^{levels} bit_k × 0.5^k

where:
  y_k   = x × 3^k                                   # scale to level k
  p     = softmax(-d²/τ) over centers [0.5, 1.5, 2.5]
  bit_k = p_right + α × p_middle                    # soft ternary assignment
  α     = learnable middle-third fill (default 0.5)
  τ     = softmax temperature (default 0.25)
```

### 5.2 Branch Path Extraction

```
branch_path(x) = [argmax(p_1), argmax(p_2), ..., argmax(p_L)]

Each level: L (left third), M (middle third), R (right third)
```
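The staircase and the branch path can be computed together in one pass. The sketch below is one possible reading of the formulas, not a confirmed implementation: it assumes `d` is the distance from `(x × 3^k) mod 3` to each center (the catalog does not spell out the reduction modulo 3), and `soft_staircase` is a hypothetical name.

```python
import math

def soft_staircase(x, levels=5, alpha=0.5, tau=0.25):
    """Return (C(x), branch_path) for a scalar x in [0, 1)."""
    centers = (0.5, 1.5, 2.5)  # left, middle, right thirds
    value = 0.0
    path = []
    for k in range(1, levels + 1):
        # Assumption: position inside the current ternary cell, in [0, 3).
        y = (x * 3 ** k) % 3.0
        logits = [-((y - c) ** 2) / tau for c in centers]
        peak = max(logits)
        exps = [math.exp(l - peak) for l in logits]
        total = sum(exps)
        probs = [e / total for e in exps]  # [p_left, p_middle, p_right]
        bit = probs[2] + alpha * probs[1]  # soft ternary assignment
        value += bit * 0.5 ** k
        path.append("LMR"[max(range(3), key=probs.__getitem__)])
    return value, path

low_value, low_path = soft_staircase(0.1)
mid_value, mid_path = soft_staircase(0.5)
high_value, high_path = soft_staircase(0.9)
```

With α = 0.5, a point sitting exactly in the middle third at every level (x = 0.5) contributes a bit of 0.5 per level, so its value with five levels is 0.5 × (1 − 1/32) = 0.484375.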

### 5.3 Hierarchical Alignment (NOT Distance)

**CRITICAL: Distance is meaningless on Cantor set.**

```
alignment(i, j) = Σ_{k=1}^{L} 0.5^k · 𝟙(path_i[k] == path_j[k])

Level weights: [0.5, 0.25, 0.125, 0.0625, 0.03125]
```

Coarse matches = routing highways (wormholes).
Fine matches = local structure only.
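A short example shows how strongly the level weights favour coarse agreement. This is a minimal sketch with a hypothetical `alignment` helper operating on branch paths given as lists of `"L"`, `"M"`, `"R"` labels.

```python
def alignment(path_i, path_j):
    """Sum 0.5^k over the levels k (1-based) where both branch paths agree."""
    return sum(
        0.5 ** (level + 1)
        for level, (a, b) in enumerate(zip(path_i, path_j))
        if a == b
    )

reference = ["L", "M", "R", "L", "M"]
same = alignment(reference, ["L", "M", "R", "L", "M"])
coarse_mismatch = alignment(reference, ["R", "M", "R", "L", "M"])  # differs at level 1
fine_mismatch = alignment(reference, ["L", "M", "R", "L", "R"])    # differs at level 5
```

A disagreement at level 1 removes 0.5 from the score, while a disagreement at level 5 removes only 0.03125, so two tokens that share a coarse branch stay strongly aligned even when their fine structure differs.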

### 5.4 Euclidean Bridge (Lossy but Necessary)

```
distance(i, j) = |C(x_i) - C(x_j)|

Use ONLY when interfacing with Euclidean systems (optimizers, standard losses).
Alignment is the Cantor-native metric.
```

### 5.5 Cantor Routing Bias (for Attention)

```
bias[i,j] = alignment(i, j)                # precomputed [S, S] matrix

attn_scores = (Q @ K.T / √d) + λ · bias

where λ = learnable routing weight
```

### 5.6 Alpha Modulation

```
α → 0.0: Pure ternary (Cantor dust, maximally disconnected)
α → 0.5: Triadic equilibrium (proven stable zone: 0.44-0.50)
α → 1.0: Filled (continuous, no fractal structure)
```

---

## 6. Cantor Topological Ropes

Position encodings that encode structural hierarchy, not just sequence order.

### 6.1 Standard RoPE (Baseline)

```
θ_i  = 10000^(-2i/d)
R(m) = [cos(mθ_i), -sin(mθ_i); sin(mθ_i), cos(mθ_i)]

for dimension pair (2i, 2i+1) at position m
```
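The baseline rotation can be illustrated without any tensor library. The sketch below applies R(m) to each dimension pair of a plain Python list and exposes the property that the warped variants build on: the dot product of two rotated vectors depends only on the difference of their positions. The helper names are hypothetical.

```python
import math

def rope_rotate(vec, position, base=10000.0):
    """Rotate each (2i, 2i+1) pair of vec by the angle position * theta_i."""
    d = len(vec)  # assumed even
    out = []
    for i in range(d // 2):
        theta = base ** (-2.0 * i / d)
        angle = position * theta
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * cos_a - y * sin_a)
        out.append(x * sin_a + y * cos_a)
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

query = [0.3, -1.2, 0.8, 0.5]
key = [1.0, 0.4, -0.7, 0.2]

# Positions (5, 3) and (2, 0) share the same offset of 2.
score_a = dot(rope_rotate(query, 5), rope_rotate(key, 3))
score_b = dot(rope_rotate(query, 2), rope_rotate(key, 0))
```

Position enters only through the angle `position * theta`, so a non-integer, warped position can be passed to the same function unchanged.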

### 6.2 BeatrixRoPE (Devil's Staircase Warping)

```
pos_beatrix(m) = C(m / seq_len)      # Cantor function of normalized position

R_beatrix(m) = R(pos_beatrix(m) × seq_len)
```

Tokens in the same ternary branch get **similar** positions → attend easily.
This creates hierarchical plateaus.

### 6.3 CantorRoPE (Wormhole Shortcuts)

```
pos_cantor(m) = trend × m + deviation × wormhole(m)

where:
  trend       = 1.0 (aligns macro slope with standard RoPE)
  deviation   = learnable perturbation scale
  wormhole(m) = branch_path_alignment signal
```

Tokens with aligned branch paths can shortcut regardless of sequential distance.

### 6.4 Aligned Triad (Proven Configuration)

```
Standard:  linear baseline          "this comes after that"
Beatrix:   hierarchical plateaus    "these belong together"
Cantor:    wormhole perturbations   "these can shortcut"

All share same macro slope (trend=1.0), different micro structure.
```

### 6.5 Tower Assignment

```
Tower_positive = BeatrixRoPE(...)   # hierarchical reasoning
Tower_negative = CantorRoPE(...)    # wormhole reasoning

Signed pairs create differential forces in oscillator fusion.
```

---

## 7. Beatrix Oscillation Formulas (GeoFractal Router)

Physics-based fusion replacing static weighted sums. Tower outputs are force fields, not opinions to average.

### 7.1 Covariant Dynamics

```
dx/dt = v
dv/dt = -2β(t)·v + ω²·Log_x(x_ref) + κ(t)·u_towers + γ(t)·ξ_guide

where:
  x        = position on manifold
  v        = velocity in tangent space
  β(t)     = damping schedule
  ω        = spring frequency
  x_ref    = conditioning anchor
  κ(t)     = tower coupling strength
  u_towers = force from tower opinions
  γ(t)     = guidance strength
  ξ_guide  = external guidance (DINO, text, etc.)

Sign convention: Log_x(x_ref) points from x toward x_ref (see 7.2),
so the spring term +ω²·Log_x(x_ref) pulls the state back to the anchor.
```

### 7.2 Manifold Operations

```
Log_x(y)    = y - x      # tangent vector from x toward y
Exp_x(v)    = x + v      # move along tangent vector
PT_{x→y}(v) = v          # parallel transport (flat approx)
```

### 7.3 Tower Force Generation

```
For N towers with signed pairs:
  force_i  = proj_i(tower_output_i)   # [B, manifold_dim]
  u_towers = Σ_i w_i · force_i        # weighted combination

Positive towers push toward structure.
Negative towers push away from collapse.
```

### 7.4 Tesla 3-6-9 Schedule

```
β(t) = β_base + resonance(t)

resonance(t) = 0.1·sin(3πt) + 0.05·sin(6πt) + 0.025·sin(9πt)

Resonance nodes at t = 1/3, 2/3, 1.0 (all three harmonics cross zero there)
Energy doesn't flow linearly; it oscillates.
```
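Evaluating the resonance formula numerically is a quick way to see where it vanishes and where it is large. The sketch below is illustrative only; `beta_base=0.5` is a placeholder value that the catalog does not specify, and the function names are hypothetical.

```python
import math

def resonance(t):
    """Three-harmonic modulation from the Tesla 3-6-9 schedule."""
    return (
        0.1 * math.sin(3 * math.pi * t)
        + 0.05 * math.sin(6 * math.pi * t)
        + 0.025 * math.sin(9 * math.pi * t)
    )

def tesla_beta(t, beta_base=0.5):
    """Damping schedule beta(t) = beta_base + resonance(t)."""
    return beta_base + resonance(t)

# Sample the schedule on a coarse grid over one unit of integration time.
samples = [(step / 12, tesla_beta(step / 12)) for step in range(13)]
at_third = resonance(1 / 3)
at_sixth = resonance(1 / 6)
```

At t = 1/6 the first harmonic is at its maximum (+0.1), the second is zero, and the third is at its minimum (−0.025), giving 0.075. At t = 1/3 every harmonic is a multiple of sin(π), so the modulation is zero and β(t) equals `beta_base`.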

### 7.5 Schedule Types

| Schedule | Formula |
|----------|---------|
| Constant | `s(t) = start` |
| Linear | `s(t) = start + (end - start) · t` |
| Cosine | `s(t) = end + (start - end) · 0.5(1 + cos(πt))` |
| Sigmoid | `s(t) = start + (end - start) · σ(12(t - 0.5))` |
| Tesla 3-6-9 | `s(t) = linear(t) + resonance(t)` |

### 7.6 Intrinsic Tension τ

```
τ = σ(gain · (Σ_i w_i · invariant_i - equilibrium))

where:
  invariant_i = geometric invariants (Vol², edge stats, etc.)
  w_i         = learned per-invariant weights
  gain        = steepness of sigmoid response
  equilibrium = learned bias

τ → 0: Pure spring (geometric constraint dominates)
τ → 1: Pure control (tower forces dominate)
```

### 7.7 Stability Criterion

```
Eigenvalues of linearized system:
  λ = -β ± √(β² - (1-τ)ω²)

Overdamped:  β² > (1-τ)ω²   (stable, no oscillation)
Underdamped: β² < (1-τ)ω²   (oscillatory)
Critical:    β² = (1-τ)ω²   (fastest convergence)
```
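The three regimes follow directly from the sign of the discriminant β² − (1−τ)ω². A minimal sketch, with hypothetical function names and `cmath` used so the underdamped case yields complex eigenvalues instead of raising an error:

```python
import cmath
import math

def eigenvalues(beta, omega, tau):
    """Roots lambda = -beta +/- sqrt(beta^2 - (1 - tau) * omega^2)."""
    root = cmath.sqrt(beta ** 2 - (1.0 - tau) * omega ** 2)
    return (-beta + root, -beta - root)

def damping_regime(beta, omega, tau, tol=1e-12):
    """Classify the linearized oscillator by the sign of the discriminant."""
    disc = beta ** 2 - (1.0 - tau) * omega ** 2
    if abs(disc) <= tol:
        return "critical"
    return "overdamped" if disc > 0 else "underdamped"

def critical_beta(omega, tau):
    """Damping value that sits exactly on the critical boundary."""
    return math.sqrt(1.0 - tau) * omega

regimes = [
    damping_regime(2.0, 1.0, 0.0),   # strong damping
    damping_regime(0.5, 1.0, 0.0),   # weak damping
    damping_regime(1.0, 1.0, 0.0),   # beta equals omega
]
```

As τ approaches 1 the effective spring term (1−τ)ω² shrinks, so a fixed β that is underdamped at τ = 0 can become overdamped once tower control dominates.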

### 7.8 Energy Tracking

```
E_kinetic   = 0.5 · ||v||²
E_potential = 0.5 · ω² · ||Log_x(x_ref)||²
E_total     = E_kinetic + E_potential

Healthy training: E_total decreases over integration steps.
```

---

## 8. K-Simplex Linear (Near-Zero Params)

Replaces `nn.Linear` with geometric routing through simplex structure.

### 8.1 Architecture

```
Input (B, input_dim)
  → chunk into (B, num_simplices, K+1) groups
  → per-scalar entry into vertex (K+1 options)
  → private hidden projection per vertex (depth = K+1)
  → pairwise signal passages between all vertex pairs
  → attenuation gates on pairwise influence
  → exit: weighted sum of vertex states
Output (B, output_dim)
```

### 8.2 Parameter Count

```
Per simplex (K+1 inputs):
  Entry:     (K+1) × (K+1) × hidden
  Vertex:    (K+1) × hidden
  Pairwise:  C(K+1, 2) × 3 × hidden
  Attenuate: C(K+1, 2) × 2
  Exit:      (K+1) × hidden + (K+1)

For K=4, input_dim=512:
  103 simplices × 300 params = 30,900
  vs nn.Linear: 262,656
  Ratio: 0.118x (11.8% of linear params)
```

### 8.3 Structural Comparison

```
Structure size per simplex: (K+1) × (K+1) × C(K+1,2)

K=2: 3×3×3  = 27
K=4: 5×5×10 = 250
K=6: 7×7×21 = 1029
```

### 8.4 Results

```
Fashion-MNIST:
  KSimplex-k4:  85.94% with 8,511 params
  MLP baseline: 89.00% with 101,770 params
  Ratio:        11.5× more parameter-efficient (accuracy per parameter)

  Epoch 1:  84.28% test (instant useful signal)
  Epoch 19: 85.94% test (stable convergence)
```

---

## 9. K-Simplex Deformation Limitations

Critical stability boundaries from extensive geometric explorer experiments.

### 9.1 Stability Zones by Configuration

| Configuration | Differentiation Zone | Collapse Threshold |
|---------------|---------------------|-------------------|
| k=1-4, edim=16 | 0.15 - 0.35 | ~0.50 |
| k=1-4, edim=32 | 0.15 - 0.50 | >2.0 |
| k=1-6, edim=16 | 0.35 - 0.45 | ~0.50 |
| k=1-6, edim=32 | 0.25 - 0.60 | >2.0 |

### 9.2 Embedding Dimension Safety Ratio

```
stability_ratio = edim / k_max

ratio ≥ 8×  → Very stable, deform up to 2.0
ratio ≥ 4×  → Comfortable margin
ratio ≥ 2×  → Tight but functional
ratio < 2×  → Dangerous, frequent invalidity
```
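The ratio thresholds translate directly into a configuration check that could run before training. A minimal sketch; the function name `stability_zone` and the returned labels are hypothetical shorthand for the four bands listed above.

```python
def stability_zone(edim, k_max):
    """Map the edim / k_max ratio onto the four stability bands."""
    ratio = edim / k_max
    if ratio >= 8:
        return ratio, "very stable"
    if ratio >= 4:
        return ratio, "comfortable"
    if ratio >= 2:
        return ratio, "tight"
    return ratio, "dangerous"

# The four configurations from the stability table, plus one deliberately bad case.
configs = [(32, 4), (16, 4), (32, 6), (16, 6), (8, 6)]
report = {cfg: stability_zone(*cfg) for cfg in configs}
```

For example, k_max = 4 at edim = 32 gives a ratio of exactly 8 and lands in the most stable band, while k_max = 6 at edim = 16 gives about 2.67 and is only "tight".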

### 9.3 Deformation Behavior

```
Low deform (0 - 0.15):
  Clear k-level hierarchy
  Vol² decreases exponentially with k
  Conservative but safe

Medium deform (0.15 - 0.35):  ← OPTIMAL ZONE
  Distinct geometric signatures per k
  Maximum useful differentiation
  Training should target this range

High deform (> 0.5):
  Noise dominates
  k-levels converge (lose meaning)
  Geometric structure destroyed
```

### 9.4 Late-Stage K-Simplex Invalidity

```
As k increases:
  - CM determinant computation becomes numerically unstable
  - More edge configurations become geometrically impossible
  - Deeper layers produce invalid simplex configurations

k=4 in 32D: stable with wide margin
k=5 in 32D: functional but tighter
k=6 in 32D: approaching invalidity ceiling

Recommendation: k=4 (pentachoron) as primary, k≤3 for tight budgets
```

### 9.5 Cross-Entropy Degeneracy Problem

```
Cross-entropy applied directly to simplex features:
  → Vertices converge (minimizing distance to class boundary)
  → Volume → 0 (simplex collapses)
  → α diverges from triadic equilibrium
  → Geometric structure destroyed after sufficient epochs

Solution: Use crystal loss or basin loss, NOT cross-entropy on geometric features.
```

---

## 10. Cross-Contrast Capacity Tests

Validating that geometric structure survives training and provides meaningful classification signal.

### 10.1 Geometric Cross-Contrastive Loss

```
sim_matrix = (x̂ @ x̂.T) / τ       # [B, B] embedding similarity

cantor_positives = (|C(i) - C(j)| < θ_cantor) AND (|Vol(i) - Vol(j)| < θ_vol)

L_cross = -log(Σ_{j∈positives} exp(sim_ij) / Σ_{j∈all} exp(sim_ij))

where positives are defined by geometric proximity, not class labels
```

### 10.2 Capacity Invariants to Monitor

```
1. Vol² > 0 for all simplices (validity)
2. α ∈ [0.44, 0.50] (triadic equilibrium)
3. Edge length variance < threshold (structural uniformity)
4. Cantor prototype separation > margin (class distinctness)
5. Crystal distance to prototype ~ d_target (geometric alignment)
```

### 10.3 Differential Cross-Contrast (Tower Pairs)

```
For positive/negative tower pairs:
  Δ_force = force_positive - force_negative

  L_differential = -log(σ(Δ_force · direction_to_correct_class))
                   + log(σ(Δ_force · direction_to_incorrect_class))

Signed pairs create differential forces, not just different opinions.
```

### 10.4 Cross-Scale Consistency

```
For scales s₁, s₂:
  features_s1 = proj_s1(backbone_features)
  features_s2 = proj_s2(backbone_features)

  L_consistency = ||rank_order(sim_s1) - rank_order(sim_s2)||₂

Ensures geometric relationships are preserved across crystal scales.
```

### 10.5 OOD Detection via Geometric Violation

```
In-distribution:     Vol² > 0, α stable, Cantor coherent
Out-of-distribution: Violations of above

OOD_score = (1 - σ(Vol² · 10⁶)) + (|α - 0.5|) + (1 - compat_max)
```
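The score is a sum of three violation terms, each near zero for well-formed inputs. The sketch below evaluates it for one sample in plain Python; the function names are hypothetical, and the numerically safe sigmoid is an implementation choice made here because Vol² · 10⁶ can be very large in magnitude.

```python
import math

def sigmoid(z):
    """Logistic function written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def ood_score(vol2, alpha, compat_max):
    """(1 - sigmoid(Vol^2 * 1e6)) + |alpha - 0.5| + (1 - compat_max)."""
    validity_term = 1.0 - sigmoid(vol2 * 1e6)
    alpha_term = abs(alpha - 0.5)
    compat_term = 1.0 - compat_max
    return validity_term + alpha_term + compat_term

# Assumed example values: a healthy sample and one that violates all three checks.
in_distribution = ood_score(vol2=1e-3, alpha=0.47, compat_max=0.9)
out_of_distribution = ood_score(vol2=-1e-3, alpha=0.2, compat_max=0.1)
```

The 10⁶ factor makes the validity term behave almost like a step function: at the k=4 scale (Vol² around 1e-3) it is effectively 0 for valid simplices and 1 for invalid ones.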

### 10.6 Scaling Limitation (Known)

```
Cross-contrastive loss across full vocabulary:
  O(V²) pairwise comparisons

  V=100   (CIFAR-100): 10K pairs   → feasible
  V=1000  (ImageNet):  1M pairs    → expensive
  V=50000 (tokenizer): 2.5B pairs  → infeasible

Solution: Hierarchical contrastive within Cantor branches.
  Only contrast within same coarse branch (routing highways).
  Fine branches → local contrast only.
```

---

## Appendix A: Proven Results Summary

| Model | Task | Accuracy | Params | Key Innovation |
|-------|------|----------|--------|----------------|
| David | ImageNet (CLIP bigG) | 86% | ~120K | Multi-scale crystal |
| David | CIFAR-100 | 74.87% | 393K | Crystal prototypes |
| David | CIFAR-100 | ~92% | 78KB | Extreme compression |
| geo-beatrix | CIFAR-100 | 67.69% | - | NO attention, NO CE |
| KSimplex Attention | FMNIST | 89.13% | - | Geometric attention |
| KSimplex Attention | CIFAR-10 | 84.59% | - | Conv stem + geo attn |
| KSimplex Attention | CIFAR-100 | 69.08% | - | Multi-layer sharpening |
| KSimplex Linear | FMNIST | 85.94% | 8,511 | 11.5× efficiency |
| KSimplex LLM | Shakespeare | PPL 113 | 54M | 100% geo validity |
| Beeper v5 | Ethics | Coherent | Random | Architecture IS intelligence |

## Appendix B: Formula Dependencies

```
           ┌──────────────┐
           │ Cayley-Menger│ ← structural invariant
           └──────┬───────┘
                  │
     ┌────────────┼────────────┐
     ▼            ▼            ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ K-Simplex│ │ Crystal  │ │ Basin    │
│ Channel  │ │ Loss     │ │ Compat   │
└────┬─────┘ └────┬─────┘ └────┬─────┘
     │            │            │
     ▼            ▼            ▼
┌──────────────────────────────────┐
│           Cantor Lens            │
│  (Staircase + Alignment + Bias)  │
└──────────────┬───────────────────┘
               │
      ┌────────┼────────┐
      ▼        ▼        ▼
┌─────────┐ ┌──────┐ ┌──────────┐
│ Topo    │ │ Osc  │ │ KSimplex │
│ Ropes   │ │ Fuse │ │ Linear   │
└─────────┘ └──────┘ └──────────┘
```

## Appendix C: What Kills Geometry (Known Failure Modes)

1. **Cross-entropy on geometric features** → simplex collapse
2. **Distance on Cantor set** → meaningless (use alignment)
3. **Deformation > 0.35 at edim/k < 4** → invalidity
4. **k > 4 without edim ≥ 8k** → numerical instability
5. **Uniform Cantor level weights** → hides 8× routing significance difference
6. **Resizing crystal anchors across scales** → destroys pentachoron geometry (use separate init per scale)
7. **Dropout scaling with √dim** → inconsistent information flow across scales