AbstractPhil committed on
Commit
79bc886
·
verified ·
1 Parent(s): b0e7d48

Create README.md

  1. README.md +440 -0
README.md ADDED
@@ -0,0 +1,440 @@
1
+ ---
2
+ license: mit
3
+ ---
4
+ # GeoLIP Spectral Encoder — Test Manifest
5
+ ## Geometric Primitives for Constellation-Anchored Classification
6
+
7
+ **Target**: CIFAR-10 (baseline), then generalize
8
+ **Constraint**: Zero or minimal learned encoder params. All learning in constellation anchors, patchwork, classifier.
9
+ **Metric**: Val accuracy, CV convergence, anchor activation, InfoNCE lock, train/val gap
10
+ **Baseline to beat**: 88.0% (conv encoder + SquaredReLU + full trainer, 1.6M params)
11
+ **Current best spectral**: 46.8% (STFT + Cholesky + SVD, v4, 137K params, CE-only carry)
12
+
13
+ ---
14
+
15
+ ## STATUS KEY
16
+ - `[ ]` — Not started
17
+ - `[R]` — Running
18
+ - `[X]` — Completed
19
+ - `[F]` — Failed (with reason)
20
+ - `[S]` — Skipped (with reason)
21
+ - `[P]` — Partially completed
22
+
23
+ ---
24
+
25
+ ## COMPLETED EXPERIMENTS (prior sessions + this session)
26
+
27
+ ### Conv Encoder Baselines (Form 1 Core)
28
+ - [X] Linear baseline, 100 epochs → **67.0%**, 422K params, overfits at E31
29
+ - [X] MLP baseline, 100 epochs → **65.0%**, 687K params, overfits at E10
30
+ - [X] Core CE-only, 100 epochs → **63.4%**, 820K params, CV=0.70, never converges
31
+ - [X] Core CE+CV, 100 epochs → **62.7%**, 820K params, CV=0.61, worse than CE-only
32
+ - [X] Core 32 anchors, interrupted E20 → **59.2%**, 1.8M params, slow convergence
33
+ - [X] Full trainer GELU, 100 epochs → **88.0%**, 1.6M params (original proven result)
34
+ - [X] Full trainer SquaredReLU, 100 epochs → **88.0%**, 1.6M params, E96 best
35
+
36
+ ### Spectral Encoder Experiments
37
+ - [F] Spectral v1: flat FFT → 768-d → single constellation → **collapsed**
38
+ - Cause: concat norm √48≈6.93 vs anchor norm 1, not on same sphere
39
+ - [F] Spectral v2: per-band constellation (48×64=3072 anchors) → **~35%**
40
+ - Cause: 3072 tri dims too diffuse, InfoNCE dead at 0.45, no cross-band structure
41
+ - [F] Spectral v3: FFT → 8 channels (spherical mean) → 128 anchors → **27%**
42
+ - Cause: cos≈0.99, spherical mean collapsed all images to same point
43
+ - [P] Spectral v4: STFT + Cholesky + SVD → S^43 → 64 anchors → **46.8%** (still running)
44
+ - CE carrying alone, CosineEmbeddingLoss frozen at 0.346, InfoNCE dead at 0.15
45
+ - Cholesky+SVD signature IS discriminative, contrastive losses unable to contribute
46
+
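The v1 norm mismatch can be reproduced in a few lines. This is a minimal sketch, assuming the 768-d vector was a concatenation of 48 unit-norm bands of 16 dimensions each (inferred from 48 × 16 = 768 and the √48 figure; the actual v1 layout is not shown here). The random bands are stand-ins, not real FFT output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed layout: 48 bands, each a unit vector in 16 dimensions (48 * 16 = 768).
bands = rng.normal(size=(48, 16))
bands /= np.linalg.norm(bands, axis=1, keepdims=True)

concat = bands.reshape(-1)                   # flat 768-d embedding
concat_norm = float(np.linalg.norm(concat))  # sqrt(48), not 1

# Renormalizing places the embedding on the same unit sphere as the anchors.
on_sphere = concat / concat_norm

print(concat.shape, round(concat_norm, 2), round(float(np.linalg.norm(on_sphere)), 6))
```

Dividing by the concatenated norm is the simplest repair; whether it would have been enough to rescue v1 is untested.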
47
+ ---
48
+
49
+ ## CATEGORY 1: SIGNAL DECOMPOSITION TO GEOMETRY
50
+
51
+ ### 1.1 Wavelet Scattering Transform (Mallat)
52
+ **Formula**: S_J[p]x(u) = |||x * ψ_{λ₁}| * ψ_{λ₂}| ... | * φ_{2^J}(u)
53
+ **Library**: kymatio (pip install kymatio)
54
+ **Expected output**: ~10K-dim feature vector for 32×32
55
+ **Literature baseline**: ~82% CIFAR-10 with SVM, ~70.5% with linear
56
+ **Properties**: Deterministic, Lipschitz-continuous, approximately energy-preserving
57
+
58
+ - [ ] **1.1a** Scattering order 2, J=2, L=8 → L2 normalize → flat constellation on S^d
59
+ - Hypothesis: scattering features are rich enough that flat constellation should work
60
+ - Compare: direct linear classifier on scattering vs constellation pipeline
61
+ - [ ] **1.1b** Scattering → JL projection to S^127 → constellation (64 anchors)
62
+ - JL preserves distances; S^127 matches our proven dim
63
+ - [ ] **1.1c** Scattering → JL → S^43 → Cholesky/SVD signature → constellation
64
+ - Stack v4's geometric signature on top of scattering features
65
+ - [ ] **1.1d** Scattering order 1 vs order 2 ablation
66
+ - Order 1 is ~Gabor magnitude; order 2 adds inter-frequency structure
67
+ - [ ] **1.1e** Scattering + InfoNCE: does augmentation invariance help or hurt?
68
+ - Scattering is already translation-invariant; InfoNCE may be redundant
69
+ - [ ] **1.1f** Scattering hybrid: scattering front-end + lightweight learned projection + constellation
70
+ - Test minimal learned params needed to bridge the 82→88% gap
71
+
72
+ ### 1.2 Gabor Filter Banks
73
+ **Formula**: g(x,y) = exp(−(x'²+γ²y'²)/(2σ²)) · exp(i(2πx'/λ+ψ))
74
+ **Expected**: S scales × K orientations → S×K magnitude responses
75
+ **Properties**: Deterministic, O(N·S·K), first-order scattering ≈ Gabor modulus
76
+
77
+ - [ ] **1.2a** Gabor bank (4 scales × 8 orientations = 32 filters) → L2 norm → S^31
78
+ - Each filter response is a spatial map; pool to scalar per filter
79
+ - [ ] **1.2b** Gabor → per-filter spatial statistics (mean, std, skew, kurtosis) → S^127
80
+ - 32 filters × 4 stats = 128-d, matches conv encoder output dim
81
+ - [ ] **1.2c** Gabor vs scattering order 1 A/B test
82
+ - Validate that scattering order 1 ≈ Gabor + modulus
83
+
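A possible shape for 1.2a, written directly from the Gabor formula above. This is a sketch under assumptions: the wavelengths, the kernel size of 15, the choice σ = λ/2, γ = 0.5, and mean-magnitude pooling are illustrative values, not settings from any run. `gabor_kernel` and `gabor_embedding` are hypothetical helper names.

```python
import numpy as np

def gabor_kernel(size, lam, theta, sigma, gamma=0.5, psi=0.0):
    """Complex Gabor kernel: Gaussian envelope times complex carrier."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)     # rotated coordinate x'
    yr = -x * np.sin(theta) + y * np.cos(theta)    # rotated coordinate y'
    envelope = np.exp(-(xr**2 + gamma**2 * yr**2) / (2 * sigma**2))
    carrier = np.exp(1j * (2 * np.pi * xr / lam + psi))
    return envelope * carrier

def gabor_embedding(img, wavelengths=(2, 4, 8, 16), n_orient=8, size=15):
    """Mean Gabor magnitude per filter, L2-normalized onto S^31."""
    H, W = img.shape
    F_img = np.fft.fft2(img)
    feats = []
    for lam in wavelengths:
        for k in range(n_orient):
            g = gabor_kernel(size, lam, np.pi * k / n_orient, sigma=0.5 * lam)
            # Circular convolution via FFT; the kernel is zero-padded to (H, W).
            resp = np.fft.ifft2(F_img * np.fft.fft2(g, s=(H, W)))
            feats.append(np.abs(resp).mean())      # pool spatial map to a scalar
    v = np.array(feats)
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
image = rng.random((32, 32))        # stand-in for one grayscale CIFAR-10 image
emb = gabor_embedding(image)
print(emb.shape, float(np.linalg.norm(emb)))
```

Replacing the mean with the four statistics of 1.2b (mean, std, skew, kurtosis) would give the 128-d variant.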
84
+ ### 1.3 Radon Transform
85
+ **Formula**: Rf(ω,t) = ∫ f(x) δ(x·ω − t) dx
86
+ **Properties**: Deterministic; exactly invertible in the continuous setting via filtered back-projection (approximate when sampled at finitely many angles)
87
+
88
+ - [ ] **1.3a** Radon at K angles → sinogram → L2 norm per angle → K points on S^d
89
+ - K angles = K geometric addresses, constellation measures the cloud
90
+ - [ ] **1.3b** Radon → 1D wavelet per projection (= ridgelet) → aggregate to S^d
91
+ - Composition: Radon → Ridgelet, captures linear singularities
92
+
93
+ ### 1.4 Curvelet Transform
94
+ **Formula**: c_{j,l,k} = ⟨f, φ_{j,l,k}⟩, parabolic scaling: width ≈ length²
95
+ **Properties**: Deterministic, exactly invertible (tight frame), O(N² log N)
96
+
97
+ - [ ] **1.4a** Curvelet energy per (scale, orientation) band → L2 norm → S^d
98
+ - Captures directional frequency that scattering misses
99
+ - [ ] **1.4b** Curvelet + scattering concatenation → JL → constellation
100
+ - Test complementarity of isotropic (scattering) + anisotropic (curvelet) features
101
+
102
+ ### 1.5 Persistent Homology (TDA)
103
+ **Formula**: Track birth/death of β₀ (components), β₁ (loops) across filtration
104
+ **Library**: giotto-tda or ripser
105
+ **Properties**: Deterministic, O(n³), captures topology no other transform sees
106
+
107
+ - [ ] **1.5a** Sublevel set filtration on grayscale → persistence image → L2 norm → S^d
108
+ - [ ] **1.5b** PH on scattering feature maps (topology of the representation)
109
+ - Captures whether scattering features form clusters, loops, voids
110
+ - [ ] **1.5c** PH Betti curve as additional channel in multi-signature pipeline
111
+ - [ ] **1.5d** PH standalone classification baseline on CIFAR-10
112
+ - Literature suggests ~60-70% standalone; valuable as complementary signal
113
+
114
+ ### 1.6 STFT Variants (improving v4)
115
+ - [ ] **1.6a** 2D STFT via patch-wise FFT (overlapping patches) instead of row/col STFT
116
+ - True spatial-frequency decomposition vs row+col approximation
117
+ - [ ] **1.6b** STFT with larger n_fft=32 (current: 16) → more frequency resolution
118
+ - [ ] **1.6c** STFT preserving phase (not just magnitude) via analytic signal
119
+ - Phase encodes spatial structure; current pipeline discards it
120
+ - [ ] **1.6d** Multi-window STFT (different window sizes for different frequency ranges)
121
+
122
+ ---
123
+
124
+ ## CATEGORY 2: MANIFOLD STRUCTURES
125
+
126
+ ### 2.1 Hopf Fibration
127
+ **Formula**: h(z₁,z₂) = (2z̄₁z₂, |z₁|²−|z₂|²) : S³ → S²
128
+ **Properties**: Deterministic, O(1), hierarchical (base + fiber)
129
+
130
+ - [ ] **2.1a** Encode 4-d feature vectors on S³ → Hopf project to S² + fiber coordinate
131
+ - Coarse triangulation on S², fine discrimination in fiber
132
+ - [ ] **2.1b** Quaternionic Hopf S⁷ → S⁴ for 8-d features
133
+ - Natural for 8-channel spectral decomposition (v3/v4 channel count)
134
+ - [ ] **2.1c** Hopf foliation spherical codes for anchor initialization
135
+ - Replace uniform_hypersphere_init with Hopf-structured codes
136
+ - [ ] **2.1d** Hierarchical constellation: coarse anchors on base S², fine anchors per fiber
137
+
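The Hopf formula in 2.1 can be checked numerically. The sketch below treats a unit 4-vector as a pair of complex numbers, applies h, and confirms that moving along the fiber (multiplying both coordinates by a common phase) leaves the base point unchanged. Using arg(z₁) as the fiber coordinate is an assumption of this sketch; it is undefined when z₁ = 0.

```python
import numpy as np

def hopf_map(v):
    """Map unit vectors in R^4 (viewed as S^3 in C^2) to S^2 plus a fiber phase."""
    v = np.asarray(v, dtype=float)
    z1 = v[..., 0] + 1j * v[..., 1]
    z2 = v[..., 2] + 1j * v[..., 3]
    w = 2 * np.conj(z1) * z2
    base = np.stack([w.real, w.imag, np.abs(z1)**2 - np.abs(z2)**2], axis=-1)
    fiber = np.angle(z1)            # one possible fiber coordinate (assumption)
    return base, fiber

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
x /= np.linalg.norm(x, axis=1, keepdims=True)
base, fiber = hopf_map(x)

# Rotate every point along its fiber by a common phase e^{i*0.7}.
phase = np.exp(0.7j)
z1 = (x[:, 0] + 1j * x[:, 1]) * phase
z2 = (x[:, 2] + 1j * x[:, 3]) * phase
x_rot = np.stack([z1.real, z1.imag, z2.real, z2.imag], axis=1)
base_rot, fiber_rot = hopf_map(x_rot)

print(np.linalg.norm(base, axis=1))      # all ones: base points lie on S^2
print(np.abs(base - base_rot).max())     # ~0: base is invariant along the fiber
```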
138
+ ### 2.2 Grassmannian Class Representations
139
+ **Formula**: Class = k-dim subspace of ℝⁿ, distances via principal angles
140
+ **Properties**: Requires SVD, O(nk²)
141
+
142
+ - [ ] **2.2a** Replace class vectors with class subspaces on Gr(k,n)
143
+ - Each class owns a k-dim subspace; classification = nearest subspace
144
+ - Literature: +1.3% on ImageNet over single class vectors
145
+ - [ ] **2.2b** Grassmannian distance metrics ablation: geodesic vs chordal vs projection
146
+ - [ ] **2.2c** Per-class anchor subspace: each anchor defines a subspace, not a point
147
+
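For 2.2, principal angles fall out of one SVD of the product of two orthonormal bases. This is a sketch, assuming subspaces are given as column spans. Naming conventions for Grassmannian metrics vary across the literature, so each key below is defined explicitly in the code. `nearest_subspace` is a hypothetical helper for the classification rule in 2.2a.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles between span(A) and span(B); columns span the subspaces."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

def grassmann_distances(A, B):
    theta = principal_angles(A, B)
    return {
        "geodesic": float(np.linalg.norm(theta)),            # ||theta||_2
        "chordal": float(np.linalg.norm(np.sin(theta))),     # ||sin theta||_2
        "projection_2norm": float(np.sin(theta).max()),      # max_i sin theta_i
    }

def nearest_subspace(x, bases):
    """Index of the orthonormal basis Q with the smallest residual ||x - Q Q^T x||."""
    residuals = [np.linalg.norm(x - Q @ (Q.T @ x)) for Q in bases]
    return int(np.argmin(residuals))

rng = np.random.default_rng(0)
A = rng.normal(size=(16, 3))
M = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]])  # change of basis
same = grassmann_distances(A, A @ M)        # same span, different basis

E = np.eye(6)
orth = grassmann_distances(E[:, :2], E[:, 2:4])   # mutually orthogonal planes
pred = nearest_subspace(E[:, 0], [E[:, 2:4], E[:, :2]])

print(same, orth, pred)
```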
148
+ ### 2.3 Flag Manifold (Nested Subspace Hierarchy)
149
+ **Formula**: V₁ ⊂ V₂ ⊂ ... ⊂ Vₖ, nested subspaces
150
+ **Properties**: Generalizes Grassmannian, natural for multi-resolution
151
+
152
+ - [ ] **2.3a** Flag decomposition of frequency channels (DC ⊂ low ⊂ mid ⊂ high)
153
+ - Test whether nesting constraint improves spectral encoder
154
+ - [ ] **2.3b** Flag-structured anchors: coarse-to-fine anchor hierarchy
155
+
156
+ ### 2.4 Von Mises-Fisher Mixture
157
+ **Formula**: f(x; μ, κ) = C_p(κ) exp(κ μᵀx), soft clustering on S^d
158
+ **Properties**: Natural density model for hyperspherical data
159
+
160
+ - [ ] **2.4a** Replace hard nearest-anchor assignment with vMF soft posteriors
161
+ - p(j|x) = α_j f(x;μ_j,κ_j) / Σ α_k f(x;μ_k,κ_k)
162
+ - Learned κ per anchor = adaptive influence radius
163
+ - [ ] **2.4b** vMF mixture EM for anchor initialization (replace uniform hypersphere init)
164
+ - [ ] **2.4c** vMF concentration κ as a diagnostic: track per-class κ convergence
165
+
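The soft assignment in 2.4a reduces to a softmax when every anchor shares the same κ, because the normalizer C_p(κ) cancels in the ratio. The sketch below makes that simplifying assumption. Per-anchor κ_j, as proposed above, would additionally need log C_p(κ_j), which involves a modified Bessel function and is not implemented here. `vmf_soft_assign` is a hypothetical name.

```python
import numpy as np

def vmf_soft_assign(x, mu, kappa, log_alpha=None):
    """Posterior p(j|x) for a vMF mixture with one shared concentration kappa.

    x:  (n, d) unit vectors, mu: (K, d) unit anchor directions.
    log_alpha: optional (K,) log mixture weights (uniform if omitted).
    """
    logits = kappa * (x @ mu.T)
    if log_alpha is not None:
        logits = logits + log_alpha
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
anchors = rng.normal(size=(64, 128))
anchors /= np.linalg.norm(anchors, axis=1, keepdims=True)
x = rng.normal(size=(5, 128))
x /= np.linalg.norm(x, axis=1, keepdims=True)

post = vmf_soft_assign(x, anchors, kappa=20.0)
hard = np.argmax(x @ anchors.T, axis=1)     # hard nearest-anchor assignment
print(post.sum(axis=1), np.argmax(post, axis=1), hard)
```

With uniform weights the argmax of the posterior matches the hard nearest-anchor rule, so the soft version is a strict relaxation of the current behaviour.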
166
+ ### 2.5 Optimal Anchor Placement
167
+ - [ ] **2.5a** E₈ lattice anchors for 8-d constellation (240 maximally separated points)
168
+ - [ ] **2.5b** Spherical t-design initialization vs uniform hypersphere init
169
+ - [ ] **2.5c** Thomson problem solver for N anchors on S^d (energy minimization)
170
+ - Compare: QR + iterative repulsion (current) vs Coulomb energy minimization
171
+
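The 240 points in 2.5a are the roots of E₈, which can be enumerated directly. A minimal sketch using the standard construction: 112 vectors with two entries of ±1, plus 128 vectors with all entries ±1/2 and an even number of minus signs. Once normalized to S⁷, the largest cosine between distinct points is 1/2, a 60° minimum angle.

```python
import itertools
import numpy as np

def e8_roots():
    """Return the 240 E8 roots, normalized to unit length, as a (240, 8) array."""
    roots = []
    # 112 roots: exactly two nonzero entries, each +1 or -1.
    for i, j in itertools.combinations(range(8), 2):
        for si, sj in itertools.product((1.0, -1.0), repeat=2):
            r = np.zeros(8)
            r[i], r[j] = si, sj
            roots.append(r)
    # 128 roots: every entry +1/2 or -1/2, with an even number of minus signs.
    for signs in itertools.product((0.5, -0.5), repeat=8):
        if sum(s < 0 for s in signs) % 2 == 0:
            roots.append(np.array(signs))
    roots = np.array(roots)
    return roots / np.linalg.norm(roots, axis=1, keepdims=True)

anchors = e8_roots()
gram = anchors @ anchors.T
np.fill_diagonal(gram, -1.0)        # ignore self-similarity
max_cos = float(gram.max())
print(anchors.shape, max_cos)
```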
172
+ ---
173
+
174
+ ## CATEGORY 3: COMPACT REPRESENTATIONS
175
+
176
+ ### 3.1 Random Fourier Features
177
+ **Formula**: z(x) = √(2/D) [cos(ω₁ᵀx+b₁), ..., cos(ω_Dᵀx+b_D)]
178
+ **Properties**: Pseudo-deterministic, preserves kernel structure, maps to S^d via cos/sin
179
+
180
+ - [ ] **3.1a** RFF on raw pixels → S^d → constellation
181
+ - Baseline: how much does nonlinear kernel approximation help raw pixels?
182
+ - [ ] **3.1b** RFF on scattering features → constellation
183
+ - Composition: scattering (linear invariants) → RFF (nonlinear kernel)
184
+ - [ ] **3.1c** Fourier feature positional encoding (Tancik/Mildenhall style)
185
+ - γ(v) = [cos(2πBv), sin(2πBv)]ᵀ explicitly maps to hypersphere
186
+
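One way to make RFF land exactly on a sphere is the paired cos/sin variant rather than the random-phase form in the formula above. Each frequency contributes cos² + sin² = 1, so the feature vector has unit norm by construction. The sketch below assumes a Gaussian kernel exp(−γ‖x−y‖²); the bandwidth γ and the stand-in data are illustrative.

```python
import numpy as np

def rff_sphere(X, D=2048, gamma=1.0, seed=0):
    """Paired cos/sin random Fourier features; every row has unit L2 norm."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies drawn from the kernel's spectral density, N(0, 2*gamma*I).
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
    proj = X @ W
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=1) / np.sqrt(D)

rng = np.random.default_rng(1)
X = 0.3 * rng.normal(size=(6, 10))          # stand-in inputs
Z = rff_sphere(X, D=2048, gamma=1.0)

K_approx = Z @ Z.T
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K_exact = np.exp(-1.0 * sq_dists)
max_err = float(np.abs(K_approx - K_exact).max())

print(np.linalg.norm(Z, axis=1), max_err)
```

Since the rows are already unit vectors, the inner product Z Zᵀ is both the cosine similarity seen by the constellation and the kernel estimate.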
187
+ ### 3.2 Johnson-Lindenstrauss Projection
188
+ **Formula**: f(x) = (1/√k)Ax, preserves distances with k = O(ε⁻² log n)
189
+ **Properties**: Pseudo-deterministic, near-isometric
190
+
191
+ - [ ] **3.2a** JL from scattering (~10K) to 128-d → L2 norm → constellation
192
+ - Test: does JL + L2 norm preserve enough structure?
193
+ - [ ] **3.2b** JL target dimension sweep: 32, 64, 128, 256, 512
194
+ - Find minimum k where constellation accuracy saturates
195
+ - [ ] **3.2c** Fast JL (randomized Hadamard) vs Gaussian JL speed/accuracy tradeoff
196
+
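A Gaussian JL projection for 3.2a is a single matrix multiply. The sketch below uses random 2000-d vectors as a stand-in for scattering features (an assumption; real scattering output is about 10K-d and far from Gaussian) and checks how well pairwise distances survive projection to 128-d before L2 normalization.

```python
import numpy as np

def jl_project(X, k=128, seed=0):
    """Gaussian JL map f(x) = (1/sqrt(k)) A x with A ~ N(0, 1) entries."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(X.shape[1], k))
    return (X @ A) / np.sqrt(k)

def pairwise_dists(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

rng = np.random.default_rng(42)
X = rng.normal(size=(10, 2000))             # stand-in for scattering features
Y = jl_project(X, k=128)

iu = np.triu_indices(len(X), k=1)           # distinct pairs only
ratios = pairwise_dists(Y)[iu] / pairwise_dists(X)[iu]

# L2 normalization places the projected points on S^127.
Y_sphere = Y / np.linalg.norm(Y, axis=1, keepdims=True)
print(float(ratios.min()), float(ratios.max()), Y_sphere.shape)
```

Note that JL bounds Euclidean distances before normalization. How much the subsequent L2 step distorts them depends on how much the feature norms vary, which is what 3.2a would measure.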
197
+ ### 3.3 Compressed Sensing on Scattering Coefficients
198
+ **Formula**: y = Φx, recover via ℓ₁ minimization if x is k-sparse
199
+ **Properties**: Exact recovery for sparse signals, O(k log(N/k)) measurements
200
+
201
+ - [ ] **3.3a** Measure sparsity of scattering coefficients (how many are near-zero?)
202
+ - If sparse: CS can compress much more than JL
203
+ - [ ] **3.3b** CS measurement matrix → L2 norm → constellation
204
+ - Compare: CS vs JL at same target dimension
205
+
206
+ ### 3.4 Spherical Harmonics
207
+ **Formula**: Y_l^m(θ,φ), complete basis on S², (l_max+1)² coefficients
208
+ **Properties**: Deterministic, native Fourier on sphere, exactly invertible
209
+
210
+ - [ ] **3.4a** Expand constellation triangulation profile in spherical harmonics
211
+ - Which angular frequencies carry discriminative info?
212
+ - [ ] **3.4b** Spherical harmonic coefficients of embedding distribution as class signature
213
+ - [ ] **3.4c** Hyperspherical harmonics for S^15 and S^43 (higher-dim generalization)
214
+
215
+ ---
216
+
217
+ ## CATEGORY 4: INVERTIBLE GEOMETRIC TRANSFORMS
218
+
219
+ ### 4.1 Stereographic Projection
220
+ **Formula**: σ(x) = x_{1:n}/(1−x_{n+1}), σ⁻¹(y) = (2y, ‖y‖²−1)/(‖y‖²+1)
221
+ **Properties**: Conformal bijection S^n\{pole} ↔ ℝⁿ, preserves angles
222
+
223
+ - [ ] **4.1a** Stereographic → Euclidean scattering → inverse stereographic → S^d
224
+ - Apply scattering in flat space, project back to sphere
225
+ - [ ] **4.1b** Stereographic projection as constellation readout alternative
226
+ - Instead of triangulation distances, read local coordinates via stereographic
227
+
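The stereographic pair in 4.1 translates directly to code. A minimal sketch, projecting from the pole at x_{n+1} = 1 as in the formula above; points at that pole are excluded.

```python
import numpy as np

def stereo(x):
    """sigma: S^n minus the pole -> R^n, x_{1:n} / (1 - x_{n+1})."""
    return x[..., :-1] / (1.0 - x[..., -1:])

def stereo_inv(y):
    """sigma^{-1}: R^n -> S^n, (2y, |y|^2 - 1) / (|y|^2 + 1)."""
    n2 = np.sum(y ** 2, axis=-1, keepdims=True)
    return np.concatenate([2.0 * y, n2 - 1.0], axis=-1) / (n2 + 1.0)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
x /= np.linalg.norm(x, axis=1, keepdims=True)     # points on S^7

y = stereo(x)                # flat coordinates in R^7
x_back = stereo_inv(y)       # back on the sphere

print(y.shape, float(np.abs(x - x_back).max()))
```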
228
+ ### 4.2 Exponential / Logarithmic Maps
229
+ **Formula**: exp_p(v) = cos(‖v‖)·p + sin(‖v‖)·v/‖v‖
230
+ **Formula**: log_p(q) = arccos(⟨q,p⟩) · (q−⟨q,p⟩p)/‖q−⟨q,p⟩p‖
231
+ **Properties**: Deterministic, locally invertible, O(n)
232
+
233
+ - [ ] **4.2a** Replace triangulation (1−cos) with log map coordinates at each anchor
234
+ - Log map gives direction + distance in tangent space (richer than scalar distance)
235
+ - Each anchor contributes d-dim tangent vector instead of 1-d distance
236
+ - [ ] **4.2b** Log map triangulation → parallel transport to common tangent space → aggregate
237
+ - Geometrically principled alternative to patchwork concatenation
238
+
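The exp and log maps of 4.2 can be sketched as below. The round trip exp_p(log_p(q)) = q gives a quick correctness check. The last lines illustrate 4.2a: each anchor yields a d-dimensional tangent vector rather than a single scalar distance. The small-norm guards are an implementation choice of this sketch.

```python
import numpy as np

def exp_map(p, v):
    """exp_p(v) = cos(|v|) p + sin(|v|) v/|v| for v tangent to the sphere at p."""
    n = np.linalg.norm(v)
    if n < 1e-12:
        return p.copy()
    return np.cos(n) * p + np.sin(n) * v / n

def log_map(p, q):
    """log_p(q): tangent vector at p pointing to q, with length arccos(<p, q>)."""
    c = np.clip(p @ q, -1.0, 1.0)
    u = q - c * p                       # component of q orthogonal to p
    nu = np.linalg.norm(u)
    if nu < 1e-12:                      # q == p (q == -p is not handled)
        return np.zeros_like(p)
    return np.arccos(c) * u / nu

def unit(v):
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
p, q = unit(rng.normal(size=16)), unit(rng.normal(size=16))

v = log_map(p, q)
q_back = exp_map(p, v)

# Log-map "triangulation": one tangent vector per anchor, shape (K, d).
anchors = np.stack([unit(rng.normal(size=16)) for _ in range(4)])
tangent_features = np.stack([log_map(a, q) for a in anchors])
print(float(np.abs(q - q_back).max()), tangent_features.shape)
```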
239
+ ### 4.3 Parallel Transport
240
+ **Formula**: Γ^q_p(v) = v − (⟨v,q⟩/(1+⟨p,q⟩))·(p+q) on S^n, for v ∈ T_pS^n (so ⟨v,p⟩ = 0) and q ≠ −p
241
+ **Properties**: Isometric between tangent spaces, exactly invertible
242
+
243
+ - [ ] **4.3a** Compute log maps at K anchors → parallel transport all to north pole → aggregate
244
+ - Creates a canonical tangent-space representation independent of anchor positions
245
+ - [ ] **4.3b** Parallel transport as inter-anchor communication in constellation
246
+ - How does the same input look from different anchor tangent spaces?
247
+
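A sketch of 4.3a. For a tangent vector v at p (so ⟨v,p⟩ = 0), transport along the geodesic to q is v − (⟨v,q⟩/(1+⟨p,q⟩))·(p+q), valid for q ≠ −p. The code computes the log map at each anchor, transports it to a common pole, and stacks the results. Taking the last basis vector as the "north pole" and the name `canonical_tangent_features` are assumptions for illustration.

```python
import numpy as np

def log_map(p, q):
    """Tangent vector at p pointing toward q with length arccos(<p, q>)."""
    c = np.clip(p @ q, -1.0, 1.0)
    u = q - c * p
    nu = np.linalg.norm(u)
    return np.zeros_like(p) if nu < 1e-12 else np.arccos(c) * u / nu

def parallel_transport(p, q, v):
    """Transport v in T_p S^n to T_q S^n along the geodesic (requires q != -p)."""
    return v - (v @ q) / (1.0 + p @ q) * (p + q)

def canonical_tangent_features(x, anchors, pole):
    """Log map of x at every anchor, each transported to the tangent space at pole."""
    return np.stack([parallel_transport(a, pole, log_map(a, x)) for a in anchors])

def unit(v):
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
d = 16
p, q = unit(rng.normal(size=d)), unit(rng.normal(size=d))
w = rng.normal(size=d)
v = w - (w @ p) * p                       # project w onto the tangent space at p
v_q = parallel_transport(p, q, v)

pole = np.zeros(d)
pole[-1] = 1.0                            # assumed "north pole"
anchors = np.stack([unit(rng.normal(size=d)) for _ in range(8)])
feats = canonical_tangent_features(q, anchors, pole)
print(float(v_q @ q), float(np.linalg.norm(v) - np.linalg.norm(v_q)), feats.shape)
```

The transported vectors all lie in the tangent space at the pole, so they can be averaged or concatenated without mixing coordinate frames.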
248
+ ### 4.4 Möbius Transformations
249
+ **Formula**: h_ω(z) = [(1−‖ω‖²)/‖z−ω‖²](z−ω) − ω
250
+ **Properties**: Conformal automorphism of S^d, invertible, O(d)
251
+
252
+ - [ ] **4.4a** Möbius "geometric attention": transform sphere to zoom into anchor regions
253
+ - Expand region near anchor, compress far regions
254
+ - Each anchor applies its own Möbius transform before measuring distance
255
+ - [ ] **4.4b** Composition of Möbius transforms as normalizing flow on S^d
256
+ - Learned flow that warps embedding distribution toward better separation
257
+
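The Möbius map in 4.4 can be probed numerically. The sketch below assumes ω = r·a for a unit anchor a and 0 < r < 1; this parameterization is a choice made here, not something specified above. With it, the anchor stays fixed, the sphere maps to itself, and points on the equator relative to a are pushed away from it, so the neighbourhood of the anchor is stretched.

```python
import numpy as np

def mobius(z, omega):
    """h_omega(z) = (1 - |omega|^2) / |z - omega|^2 * (z - omega) - omega."""
    diff = z - omega
    scale = (1.0 - omega @ omega) / np.sum(diff ** 2, axis=-1, keepdims=True)
    return scale * diff - omega

d, r = 8, 0.5
a = np.zeros(d)
a[0] = 1.0                      # anchor direction
omega = r * a                   # assumed parameterization: omega = r * anchor

rng = np.random.default_rng(0)
z = rng.normal(size=(10, d))
z /= np.linalg.norm(z, axis=1, keepdims=True)
hz = mobius(z, omega)

z_perp = np.zeros(d)
z_perp[1] = 1.0                 # a point 90 degrees away from the anchor
h_perp = mobius(z_perp, omega)
h_anchor = mobius(a, omega)

print(np.linalg.norm(hz, axis=1))          # all ones: the map preserves the sphere
print(float(h_perp @ a))                   # -2r/(1+r^2) = -0.8, pushed past the equator
```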
258
+ ### 4.5 Procrustes + Polar Decomposition
259
+ **Formula**: R* = argmin_R ‖RA−B‖_F = UVᵀ, where BAᵀ = UΣVᵀ (SVD; columns of A, B are points)
260
+ **Formula**: A = UP (rotation × stretch)
261
+
262
+ - [ ] **4.5a** Procrustes-align channel cloud to canonical pose before Cholesky/SVD
263
+ - Remove rotation variability, isolate shape information
264
+ - [ ] **4.5b** Polar decomposition of channel matrix: U (rotation) + P (stretch) as separate features
265
+ - U encodes orientation of frequency cloud; P encodes shape/scale
266
+ - Both are geometric, both are deterministic from the channel matrix
267
+
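Both decompositions in 4.5 come from a single SVD. This is a sketch, assuming points are stored as columns so that the alignment problem is ‖RA − B‖_F and the SVD is taken of BAᵀ. The solution is the orthogonal Procrustes answer, which may include a reflection. The 8×20 and 8×8 matrices are random stand-ins for a channel cloud.

```python
import numpy as np

def procrustes_rotation(A, B):
    """Orthogonal R minimizing ||R A - B||_F (columns of A, B are points)."""
    U, _, Vt = np.linalg.svd(B @ A.T)
    return U @ Vt

def polar_decomposition(A):
    """A = U_p P with U_p having orthonormal columns and P symmetric PSD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ Vt, Vt.T @ np.diag(s) @ Vt

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 20))                       # stand-in channel cloud
R_true, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix
B = R_true @ A
R_hat = procrustes_rotation(A, B)

M = rng.normal(size=(8, 8))                        # stand-in channel matrix
U_p, P = polar_decomposition(M)

print(float(np.abs(R_hat - R_true).max()), float(np.abs(U_p @ P - M).max()))
```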
268
+ ---
269
+
270
+ ## CATEGORY 5: MATRIX DECOMPOSITION SIGNATURES
271
+
272
+ ### 5.1 Already Tested
273
+ - [X] Cholesky of Gram matrix → 36 lower-tri values (in v4, working)
274
+ - [X] SVD singular values → 8 values (in v4, working)
275
+ - [X] Concatenated 44-d signature on S^43 → 46.8% with CE-only
276
+
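The 44-d signature can be sketched as follows. The channel matrix layout (8 channel vectors as columns of a d×8 matrix), the jitter `eps`, and the function name are assumptions; the v4 code itself is not shown here. The last lines check the relation between Gram eigenvalues and singular values, λ_i = σ_i² for Gram = AᵀA.

```python
import numpy as np

def cholesky_svd_signature(A, eps=1e-6):
    """44-d unit signature: 36 Cholesky lower-triangular values + 8 singular values.

    A: (d, 8) matrix whose columns are the 8 channel vectors (assumed layout).
    """
    G = A.T @ A                                        # 8x8 Gram matrix
    L = np.linalg.cholesky(G + eps * np.eye(G.shape[0]))   # jitter keeps G positive definite
    tri = L[np.tril_indices(L.shape[0])]               # 8*9/2 = 36 values
    sv = np.linalg.svd(A, compute_uv=False)            # 8 singular values
    sig = np.concatenate([tri, sv])
    return sig / np.linalg.norm(sig)                   # point on S^43

rng = np.random.default_rng(0)
A = rng.normal(size=(32, 8))          # stand-in for per-image channel features
sig = cholesky_svd_signature(A)

eig_desc = np.linalg.eigvalsh(A.T @ A)[::-1]           # Gram eigenvalues, descending
sv = np.linalg.svd(A, compute_uv=False)
print(sig.shape, float(np.linalg.norm(sig)), bool(np.allclose(eig_desc, sv ** 2)))
```

Because the eigenvalues are just the squared singular values, an eigenvalue signature would carry the same information as the SVD part at a different scale. The Cholesky values are the part that encodes inter-channel angles.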
277
+ ### 5.2 Remaining Decompositions
278
+ - [ ] **5.2a** QR decomposition: Q (rotation) and R diagonal (scale per channel)
279
+ - R diagonal = per-channel magnitude; Q = inter-channel angular structure
280
+ - [ ] **5.2b** Schur decomposition: T diagonal = eigenvalues, T off-diagonal = coupling
281
+ - For the Gram matrix: Schur gives eigenstructure in triangular form
282
+ - [ ] **5.2c** Eigendecomposition of Gram: eigenvalues as spectral signature
283
+ - Compare: eigenvalues vs SVD singular values vs Cholesky diagonal
284
+ - These are related but not identical (λ_i = σ_i² for Gram = AᵀA)
285
+ - [ ] **5.2d** NMF of magnitude spectrum: parts-based decomposition
286
+ - Requires iterative optimization (not fully deterministic)
287
+ - But finds additive, non-negative parts — texture components
288
+ - [ ] **5.2e** Tucker tensor decomposition of spatial×frequency×channel tensor
289
+ - 3D structure: (H, W, freq_bins) per color channel
290
+ - Core tensor encodes interactions between spatial, frequency, channel modes
291
+
292
+ ---
293
+
294
+ ## CATEGORY 6: INFORMATION-THEORETIC LOSSES
295
+
296
+ ### 6.1 Already Tested
297
+ - [X] InfoNCE (self-contrastive, two augmented views) — dead at 0.15 in spectral v4
298
+ - [X] CosineEmbeddingLoss — frozen at 0.346 (margin-saturated)
299
+ - [X] CV loss (Cayley-Menger volume) — running but not in 0.18-0.25 band
300
+
301
+ ### 6.2 Loss Modifications
302
+ - [ ] **6.2a** Drop contrastive losses entirely, CE-only + geometric losses
303
+ - v4 shows CE is the only contributor; contrastive is dead weight
304
+ - Hypothesis: removing dead losses may speed convergence
305
+ - [ ] **6.2b** Class-conditional InfoNCE: positive = same class, not same image
306
+ - Requires labels but gives much stronger supervision signal
307
+ - [ ] **6.2c** vMF-based contrastive loss: replace dot-product similarity with vMF log-likelihood
308
+ - κ-adaptive: high-κ for nearby pairs, low-κ for far pairs
309
+ - [ ] **6.2d** Fisher-Rao distance as loss: d_FR(p,q) = 2·arccos(∫√(pq))
310
+ - Natural distance for distributions on the sphere
311
+ - [ ] **6.2e** Sliced spherical Wasserstein distance as distribution matching loss
312
+ - Matches embedding distribution to target (e.g., uniform on sphere)
313
+ - [ ] **6.2f** Geometric autograd (from GM3): tangential projection + separation preservation
314
+ - Adam + geometric autograd > AdamW on geometric tasks (proven)
315
+ - Operates on gradient direction, not loss value
316
+
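A NumPy sketch of the class-conditional InfoNCE in 6.2b, in the style of a supervised contrastive loss: for each sample, positives are all other samples with the same label, and the denominator runs over every other sample. The temperature of 0.1, the averaging over positives, and the toy data are assumptions; a training version would be written in an autograd framework.

```python
import numpy as np

def class_conditional_infonce(z, labels, tau=0.1):
    """Mean over anchors of -average log-softmax probability of same-class samples."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, z @ z.T / tau)      # exclude self-pairs
    sim_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - sim_max - np.log(np.exp(sim - sim_max).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    n_pos = pos.sum(axis=1)
    valid = n_pos > 0                                      # skip anchors with no positive
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1)[valid] / n_pos[valid]
    return float(per_anchor.mean())

rng = np.random.default_rng(0)
centers = np.eye(8)[:3]                                    # three well-separated classes
labels_true = np.repeat(np.arange(3), 4)                   # 4 samples per class
z = centers[labels_true] + 0.05 * rng.normal(size=(12, 8))

labels_bad = np.tile(np.arange(3), 4)                      # labels that ignore the clusters
loss_true = class_conditional_infonce(z, labels_true)
loss_bad = class_conditional_infonce(z, labels_bad)
print(loss_true, loss_bad)
```

The loss is low when same-label samples are close on the sphere and high when labels cut across clusters, which is the supervision signal the self-contrastive version lacks.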
317
+ ### 6.3 Anchor Management
318
+ - [ ] **6.3a** Anchor push frequency sweep: every 10, 25, 50, 100, 200 batches
319
+ - [ ] **6.3b** Anchor push with vMF-weighted centroids instead of hard class centroids
320
+ - [ ] **6.3c** Anchor birth/death: add anchors where density is high, remove where unused
321
+ - [ ] **6.3d** Anchor dropout sweep: 0%, 5%, 15%, 30%, 50%
322
+
323
+ ---
324
+
325
+ ## CATEGORY 7: COMPOSITE PIPELINE TESTS
326
+
327
+ ### 7.1 The Reference Pipeline (from research article)
328
+ - [ ] **7.1a** Scattering(J=2,L=8) → JL(128) → L2 norm → constellation(64) → classify
329
+ - The "canonical" pipeline; expected ~75-80% based on literature
330
+ - [ ] **7.1b** Same as 7.1a but with learned 2-layer projection replacing JL
331
+ - Minimal learned params (~16K), test if projection adaptation matters
332
+ - [ ] **7.1c** Scattering → curvelet energy → concat → JL → constellation
333
+ - Test complementarity
334
+
335
+ ### 7.2 Hybrid: Spectral + Scattering
336
+ - [ ] **7.2a** STFT channels (v4) + scattering features → concat → JL → S^d → constellation
337
+ - STFT gives spatial-frequency; scattering gives multi-scale invariants
338
+ - [ ] **7.2b** Scattering → Cholesky Gram + SVD signature → constellation
339
+ - Apply v4's geometric signature to scattering output instead of STFT
340
+
341
+ ### 7.3 Multi-Signature Constellation
342
+ - [ ] **7.3a** Parallel extraction: scattering + Gabor + Radon → separate constellations → fusion
343
+ - Each primitive captures different geometric aspect
344
+ - Fusion: concatenate patchwork outputs → shared classifier
345
+ - [ ] **7.3b** Hierarchical constellation: scattering → coarse anchors → residual → fine anchors
346
+ - Two-stage: first stage identifies broad category, second refines
347
+
348
+ ### 7.4 Minimal Learned Params Tests
349
+ - [ ] **7.4a** Best deterministic pipeline + 1 learned linear layer (d_in → 128) before constellation
350
+ - Measure: how much does a single projection layer help?
351
+ - Count: exact learned param count
352
+ - [ ] **7.4b** Same as 7.4a but with SquaredReLU + LayerNorm (the proven patchwork block)
353
+ - [ ] **7.4c** Sweep learned projection sizes: 0, 1K, 5K, 10K, 50K, 100K params
354
+ - Find the elbow where adding params stops helping
355
+
356
+ ---
357
+
358
+ ## PRIORITY QUEUE (recommended execution order)
359
+
360
+ ### Tier 1: Highest Expected Impact
361
+ 1. **1.1a** — Scattering + flat constellation (the literature leader)
362
+ 2. **1.1b** — Scattering + JL → S^127 + constellation
363
+ 3. **6.2a** — Drop dead contrastive losses from v4, measure CE-only ceiling
364
+ 4. **2.4a** — vMF soft assignment replacing hard nearest-anchor
365
+ 5. **4.2a** — Log map triangulation (richer than scalar distance)
366
+
367
+ ### Tier 2: High Expected Impact
368
+ 6. **7.1a** — Full reference pipeline
369
+ 7. **1.1f** — Scattering hybrid with minimal learned projection
370
+ 8. **1.2b** — Gabor spatial statistics → S^127
371
+ 9. **5.2c** — Eigendecomposition vs SVD vs Cholesky ablation
372
+ 10. **2.1b** — Quaternionic Hopf S⁷→S⁴ for 8-channel data
373
+
374
+ ### Tier 3: Exploratory
375
+ 11. **1.5a** — Persistent homology standalone
376
+ 12. **3.1b** — RFF on scattering features
377
+ 13. **4.4a** — Möbius geometric attention
378
+ 14. **7.3a** — Multi-signature parallel constellations
379
+ 15. **2.2a** — Grassmannian class subspaces
380
+
381
+ ### Tier 4: Deep Exploration
382
+ 16. **1.3a** — Radon cloud on S^d
383
+ 17. **1.4b** — Curvelet + scattering concat
384
+ 18. **2.3a** — Flag decomposition of frequency channels
385
+ 19. **4.3a** — Parallel transport aggregation
386
+ 20. **3.4c** — Hyperspherical harmonics analysis
387
+
388
+ ---
389
+
390
+ ## RUNNING SCOREBOARD
391
+
392
+ | Experiment | Val Acc | Params (learned) | CV | Anchors Active | InfoNCE | Key Finding |
393
+ |---|---|---|---|---|---|---|
394
+ | Linear baseline | 67.0% | 423K | — | — | — | Overfits E31 |
395
+ | MLP baseline | 65.0% | 687K | — | — | — | Overfits E10 |
396
+ | Core CE-only | 63.4% | 820K | 0.70 | — | — | CV never converges |
397
+ | Core CE+CV | 62.7% | 820K | 0.61 | — | — | CV hurts accuracy |
398
+ | Full GELU | 88.0% | 1.6M | 0.14-0.17 | 64/64 | 1.00 | Reference |
399
+ | Full SquaredReLU | 88.0% | 1.6M | 0.15 | 64/64 | 1.00 | Matches GELU |
400
+ | Spectral v1 (flat FFT) | FAIL | — | — | 1/64 | — | Norm mismatch |
401
+ | Spectral v2 (per-band) | ~35% | 1.2M | 0.17-0.19 | 900/3072 | 0.45 | Too diffuse |
402
+ | Spectral v3 (sph mean) | ~27% | 130K | 0.27-0.34 | 110/128 | 0.35 | Collapsed to point |
403
+ | Spectral v4 (STFT+Chol+SVD) | 46.8% | 137K | 0.52-0.66 | 53/64 | 0.15 | CE-only carry |
404
+ | *Scattering baseline* | *~82%* | *0* | *—* | *—* | *—* | *Literature (SVM)* |
405
+
406
+ *Italicized entries are literature values, not our runs*
407
+
408
+ ---
409
+
410
+ ## NOTES & INSIGHTS
411
+
412
+ ### Why contrastive losses die on deterministic encoders
413
+ The STFT/FFT faithfully reports every pixel-level difference between augmented views.
414
+ Two crops of the same image produce signatures as different as two different images.
415
+ Without a learned layer to absorb augmentation variance, InfoNCE has nothing to align.
416
+ Solutions: (a) augmentation-invariant features (scattering), (b) thin learned projection,
417
+ (c) class-conditional contrastive (6.2b), (d) drop contrastive entirely (6.2a).
418
+
419
+ ### The Cholesky insight
420
+ L diagonal encodes "new angular information per tier given all lower tiers."
421
+ This IS discriminative (shown by v4 reaching 46.8% with CE alone).
422
+ The 44-d signature on S^43 carries real inter-channel geometry.
423
+ Next question: is the STFT front-end the bottleneck, or the 44-d signature?
424
+
425
+ ### Scattering is the clear next step
426
+ 82% on CIFAR-10 with zero learned params (literature) vs our 46.8%.
427
+ Scattering is translation-invariant AND deformation-stable (Lipschitz).
428
+ This directly addresses the augmentation sensitivity problem.
429
+ kymatio provides GPU-accelerated PyTorch implementation.
430
+
431
+ ### The dimension question
432
+ S^15 (band_dim=16) vs S^43 (signature) vs S^127 (conv encoder output)
433
+ E₈ lattice gives 240 optimal anchors on S^7
434
+ Proven CV attractor at ~0.20 is on S^15
435
+ Need to test which target sphere dimension is optimal for spectral features
436
+
437
+ ---
438
+
439
+ *Last updated: 2026-03-18, session with Opus*
440
+ *Next: run scattering baseline (1.1a), then decide pipeline direction*