AbstractPhil
/

geolip-hypersphere-experiments

AbstractPhil committed 16 days ago

Commit

79bc886

verified ·

1 Parent(s): b0e7d48

Create README.md

README.md +440 -0

	@@ -0,0 +1,440 @@

+---
+license: mit
+---
+# GeoLIP Spectral Encoder — Test Manifest
+## Geometric Primitives for Constellation-Anchored Classification
+**Target**: CIFAR-10 (baseline), then generalize
+**Constraint**: Zero or minimal learned encoder params. All learning in constellation anchors, patchwork, classifier.
+**Metric**: Val accuracy, CV convergence, anchor activation, InfoNCE lock, train/val gap
+**Baseline to beat**: 88.0% (conv encoder + SquaredReLU + full trainer, 1.6M params)
+**Current best spectral**: 46.8% (STFT + Cholesky + SVD, v4, 137K params, CE-only carry)
+---
+## STATUS KEY
+- `[ ]` — Not started
+- `[R]` — Running
+- `[X]` — Completed
+- `[F]` — Failed (with reason)
+- `[S]` — Skipped (with reason)
+- `[P]` — Partially completed
+---
+## COMPLETED EXPERIMENTS (prior sessions + this session)
+### Conv Encoder Baselines (Form 1 Core)
+- [X] Linear baseline, 100 epochs → **67.0%**, 422K params, overfits at E31
+- [X] MLP baseline, 100 epochs → **65.0%**, 687K params, overfits at E10
+- [X] Core CE-only, 100 epochs → **63.4%**, 820K params, CV=0.70, never converges
+- [X] Core CE+CV, 100 epochs → **62.7%**, 820K params, CV=0.61, worse than CE-only
+- [X] Core 32 anchors, interrupted E20 → **59.2%**, 1.8M params, slow convergence
+- [X] Full trainer GELU, 100 epochs → **88.0%**, 1.6M params (original proven result)
+- [X] Full trainer SquaredReLU, 100 epochs → **88.0%**, 1.6M params, E96 best
+### Spectral Encoder Experiments
+- [F] Spectral v1: flat FFT → 768-d → single constellation → **collapsed**
+  - Cause: concat norm √48≈6.93 vs anchor norm 1, not on same sphere
+- [F] Spectral v2: per-band constellation (48×64=3072 anchors) → **~35%**
+  - Cause: 3072 tri dims too diffuse, InfoNCE dead at 0.45, no cross-band structure
+- [F] Spectral v3: FFT → 8 channels (spherical mean) → 128 anchors → **27%**
+  - Cause: cos≈0.99, spherical mean collapsed all images to same point
+- [P] Spectral v4: STFT + Cholesky + SVD → S^43 → 64 anchors → **46.8%** (still running)
+  - CE carrying alone, CosineEmbeddingLoss frozen at 0.346, InfoNCE dead at 0.15
+  - Cholesky+SVD signature IS discriminative, contrastive losses unable to contribute
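The v1 norm mismatch is easy to reproduce. The sketch below assumes 48 bands that are each L2-normalized to 16 dimensions before concatenation (48 × 16 = 768, consistent with the numbers above); the random band contents and the function names are illustrative stand-ins, not code from the repository.

```python
import math
import random

def l2_normalize(v):
    """Scale v to unit Euclidean norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def concat_unit_bands(num_bands=48, band_dim=16, seed=0):
    """Concatenate num_bands unit vectors (hypothetical stand-in for per-band FFT features)."""
    rng = random.Random(seed)
    out = []
    for _ in range(num_bands):
        band = [rng.gauss(0.0, 1.0) for _ in range(band_dim)]
        out.extend(l2_normalize(band))
    return out

flat = concat_unit_bands()
print(len(flat), round(norm(flat), 2))       # 768 6.93
print(round(norm(l2_normalize(flat)), 2))    # 1.0
```

The concatenation has norm √48 regardless of band content, so it has to be renormalized before it can be compared against unit-norm anchors.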
+---
+## CATEGORY 1: SIGNAL DECOMPOSITION TO GEOMETRY
+### 1.1 Wavelet Scattering Transform (Mallat)
+**Formula**: S_J[p]x(u) = |||x * ψ_{λ₁}| * ψ_{λ₂}| ... | * φ_{2^J}(u)
+**Library**: kymatio (pip install kymatio)
+**Expected output**: ~10K-dim feature vector for 32×32
+**Literature baseline**: ~82% CIFAR-10 with SVM, ~70.5% with linear
+**Properties**: Deterministic, Lipschitz-continuous, approximately energy-preserving
+- [ ] **1.1a** Scattering order 2, J=2, L=8 → L2 normalize → flat constellation on S^d
+  - Hypothesis: scattering features are rich enough that flat constellation should work
+  - Compare: direct linear classifier on scattering vs constellation pipeline
+- [ ] **1.1b** Scattering → JL projection to S^127 → constellation (64 anchors)
+  - JL approximately preserves pairwise distances; S^127 matches our proven dim
+- [ ] **1.1c** Scattering → JL → S^43 → Cholesky/SVD signature → constellation
+  - Stack v4's geometric signature on top of scattering features
+- [ ] **1.1d** Scattering order 1 vs order 2 ablation
+  - Order 1 is ~Gabor magnitude; order 2 adds inter-frequency structure
+- [ ] **1.1e** Scattering + InfoNCE: does augmentation invariance help or hurt?
+  - Scattering is already translation-invariant; InfoNCE may be redundant
+- [ ] **1.1f** Scattering hybrid: scattering front-end + lightweight learned projection + constellation
+  - Test minimal learned params needed to bridge the 82→88% gap
+### 1.2 Gabor Filter Banks
+**Formula**: g(x,y) = exp(−(x'²+γ²y'²)/(2σ²)) · exp(i(2πx'/λ+ψ))
+**Expected**: S scales × K orientations → S×K magnitude responses
+**Properties**: Deterministic, O(N·S·K), first-order scattering ≈ Gabor modulus
+- [ ] **1.2a** Gabor bank (4 scales × 8 orientations = 32 filters) → L2 norm → S^31
+  - Each filter response is a spatial map; pool to scalar per filter
+- [ ] **1.2b** Gabor → per-filter spatial statistics (mean, std, skew, kurtosis) → S^127
+  - 32 filters × 4 stats = 128-d, matches conv encoder output dim
+- [ ] **1.2c** Gabor vs scattering order 1 A/B test
+  - Validate that scattering order 1 ≈ Gabor + modulus
+### 1.3 Radon Transform
+**Formula**: Rf(ω,t) = ∫ f(x) δ(x·ω − t) dx
+**Properties**: Deterministic, invertible via filtered back-projection (exact in the continuum, approximate once discretized)
+- [ ] **1.3a** Radon at K angles → sinogram → L2 norm per angle → K points on S^d
+  - K angles = K geometric addresses, constellation measures the cloud
+- [ ] **1.3b** Radon → 1D wavelet per projection (= ridgelet) → aggregate to S^d
+  - Composition: Radon → Ridgelet, captures linear singularities
+### 1.4 Curvelet Transform
+**Formula**: c_{j,l,k} = ⟨f, φ_{j,l,k}⟩, parabolic scaling: width ≈ length²
+**Properties**: Deterministic, exactly invertible (tight frame), O(N² log N)
+- [ ] **1.4a** Curvelet energy per (scale, orientation) band → L2 norm → S^d
+  - Captures directional frequency that scattering misses
+- [ ] **1.4b** Curvelet + scattering concatenation → JL → constellation
+  - Test complementarity of isotropic (scattering) + anisotropic (curvelet) features
+### 1.5 Persistent Homology (TDA)
+**Formula**: Track birth/death of β₀ (components), β₁ (loops) across filtration
+**Library**: giotto-tda or ripser
+**Properties**: Deterministic, O(n³), captures topology no other transform sees
+- [ ] **1.5a** Sublevel set filtration on grayscale → persistence image → L2 norm → S^d
+- [ ] **1.5b** PH on scattering feature maps (topology of the representation)
+  - Captures whether scattering features form clusters, loops, voids
+- [ ] **1.5c** PH Betti curve as additional channel in multi-signature pipeline
+- [ ] **1.5d** PH standalone classification baseline on CIFAR-10
+  - Literature suggests ~60-70% standalone; valuable as complementary signal
+### 1.6 STFT Variants (improving v4)
+- [ ] **1.6a** 2D STFT via patch-wise FFT (overlapping patches) instead of row/col STFT
+  - True spatial-frequency decomposition vs row+col approximation
+- [ ] **1.6b** STFT with larger n_fft=32 (current: 16) → more frequency resolution
+- [ ] **1.6c** STFT preserving phase (not just magnitude) via analytic signal
+  - Phase encodes spatial structure; current pipeline discards it
+- [ ] **1.6d** Multi-window STFT (different window sizes for different frequency ranges)
+---
+## CATEGORY 2: MANIFOLD STRUCTURES
+### 2.1 Hopf Fibration
+**Formula**: h(z₁,z₂) = (2z̄₁z₂, |z₁|²−|z₂|²) : S³ → S²
+**Properties**: Deterministic, O(1), hierarchical (base + fiber)
+- [ ] **2.1a** Encode 4-d feature vectors on S³ → Hopf project to S² + fiber coordinate
+  - Coarse triangulation on S², fine discrimination in fiber
+- [ ] **2.1b** Quaternionic Hopf S⁷ → S⁴ for 8-d features
+  - Natural for 8-channel spectral decomposition (v3/v4 channel count)
+- [ ] **2.1c** Hopf foliation spherical codes for anchor initialization
+  - Replace uniform_hypersphere_init with Hopf-structured codes
+- [ ] **2.1d** Hierarchical constellation: coarse anchors on base S², fine anchors per fiber
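As a quick check of the Hopf formula in 2.1, the sketch below reads an L2-normalized 4-d vector as a pair of complex numbers and maps it to S². The helper names `to_s3` and `hopf` are illustrative, and pairing the components as (x₁+ix₂, x₃+ix₄) is an assumption; other pairings are equally valid.

```python
import math

def to_s3(v):
    """L2-normalize a 4-d real vector and read it as two complex numbers (z1, z2)."""
    n = math.sqrt(sum(x * x for x in v))
    a, b, c, d = (x / n for x in v)
    return complex(a, b), complex(c, d)

def hopf(z1, z2):
    """Hopf map S^3 -> S^2: h(z1, z2) = (2*conj(z1)*z2, |z1|^2 - |z2|^2)."""
    w = 2 * z1.conjugate() * z2
    return (w.real, w.imag, abs(z1) ** 2 - abs(z2) ** 2)

z1, z2 = to_s3([1.0, 2.0, 3.0, 4.0])
base = hopf(z1, z2)

# Multiplying both components by a common phase moves the point along its
# fiber (a circle) but leaves the base point on S^2 unchanged.
phase = complex(math.cos(0.7), math.sin(0.7))
base_rot = hopf(phase * z1, phase * z2)
```

The base point is what a coarse constellation on S² would see; the common phase that `hopf` discards is the fiber coordinate available for fine discrimination.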
+### 2.2 Grassmannian Class Representations
+**Formula**: Class = k-dim subspace of ℝⁿ, distances via principal angles
+**Properties**: Requires SVD, O(nk²)
+- [ ] **2.2a** Replace class vectors with class subspaces on Gr(k,n)
+  - Each class owns a k-dim subspace; classification = nearest subspace
+  - Literature: +1.3% on ImageNet over single class vectors
+- [ ] **2.2b** Grassmannian distance metrics ablation: geodesic vs chordal vs projection
+- [ ] **2.2c** Per-class anchor subspace: each anchor defines a subspace, not a point
+### 2.3 Flag Manifold (Nested Subspace Hierarchy)
+**Formula**: V₁ ⊂ V₂ ⊂ ... ⊂ Vₖ, nested subspaces
+**Properties**: Generalizes Grassmannian, natural for multi-resolution
+- [ ] **2.3a** Flag decomposition of frequency channels (DC ⊂ low ⊂ mid ⊂ high)
+  - Test whether nesting constraint improves spectral encoder
+- [ ] **2.3b** Flag-structured anchors: coarse-to-fine anchor hierarchy
+### 2.4 Von Mises-Fisher Mixture
+**Formula**: f(x; μ, κ) = C_p(κ) exp(κ μᵀx), soft clustering on S^d
+**Properties**: Natural density model for hyperspherical data
+- [ ] **2.4a** Replace hard nearest-anchor assignment with vMF soft posteriors
+  - p(j|x) = α_j f(x;μ_j,κ_j) / Σ α_k f(x;μ_k,κ_k)
+  - Learned κ per anchor = adaptive influence radius
+- [ ] **2.4b** vMF mixture EM for anchor initialization (replace uniform hypersphere init)
+- [ ] **2.4c** vMF concentration κ as a diagnostic: track per-class κ convergence
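A minimal sketch of the soft assignment in 2.4a, under one simplifying assumption: all anchors share a single κ. In that case the normalizer C_p(κ) is identical for every component and cancels, so the posterior is a softmax over κ·μ_jᵀx plus log α_j. The function name and default values are illustrative.

```python
import numpy as np

def vmf_soft_assign(x, anchors, kappa=10.0, log_alpha=None):
    """Posterior p(j|x) for a vMF mixture with ONE shared kappa (assumption).

    x:        (n, d) unit vectors
    anchors:  (m, d) unit mean directions mu_j
    """
    logits = kappa * (x @ anchors.T)               # (n, m) scaled cosine similarities
    if log_alpha is not None:
        logits = logits + log_alpha                # log mixture weights alpha_j, shape (m,)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)
```

With a learned κ per anchor, as proposed above, the per-component term log C_p(κ_j) no longer cancels and has to be added to the logits; it involves a modified Bessel function and is left out of this sketch.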
+### 2.5 Optimal Anchor Placement
+- [ ] **2.5a** E₈ lattice anchors for 8-d constellation (240 maximally separated points)
+- [ ] **2.5b** Spherical t-design initialization vs uniform hypersphere init
+- [ ] **2.5c** Thomson problem solver for N anchors on S^d (energy minimization)
+  - Compare: QR + iterative repulsion (current) vs Coulomb energy minimization
+---
+## CATEGORY 3: COMPACT REPRESENTATIONS
+### 3.1 Random Fourier Features
+**Formula**: z(x) = √(2/D) [cos(ω₁ᵀx+b₁), ..., cos(ωDᵀx+bD)]
+**Properties**: Pseudo-deterministic, preserves kernel structure, maps to S^d via cos/sin
+- [ ] **3.1a** RFF on raw pixels → S^d → constellation
+  - Baseline: how much does nonlinear kernel approximation help raw pixels?
+- [ ] **3.1b** RFF on scattering features → constellation
+  - Composition: scattering (linear invariants) → RFF (nonlinear kernel)
+- [ ] **3.1c** Fourier feature positional encoding (Tancik/Mildenhall style)
+  - γ(v) = [cos(2πBv), sin(2πBv)]ᵀ explicitly maps to hypersphere
+### 3.2 Johnson-Lindenstrauss Projection
+**Formula**: f(x) = (1/√k)Ax, preserves distances with k = O(ε⁻² log n)
+**Properties**: Pseudo-deterministic, near-isometric
+- [ ] **3.2a** JL from scattering (~10K) to 128-d → L2 norm → constellation
+  - Test: does JL + L2 norm preserve enough structure?
+- [ ] **3.2b** JL target dimension sweep: 32, 64, 128, 256, 512
+  - Find minimum k where constellation accuracy saturates
+- [ ] **3.2c** Fast JL (randomized Hadamard) vs Gaussian JL speed/accuracy tradeoff
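The Gaussian JL step followed by L2 normalization can be sketched as follows. Random data stands in for the ~10K-dim scattering features, and `jl_project` / `to_sphere` are illustrative names rather than functions from the repository.

```python
import numpy as np

def jl_project(X, k, seed=0):
    """Gaussian JL map f(x) = (1/sqrt(k)) A x applied to the rows of X (n x D)."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(k, X.shape[1]))
    return (X @ A.T) / np.sqrt(k)

def to_sphere(Y, eps=1e-12):
    """L2-normalize rows so each point lies on S^(k-1)."""
    return Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), eps)

rng = np.random.default_rng(1)
X = rng.normal(size=(32, 10_000))   # stand-in for scattering features (assumption)
Y = jl_project(X, k=128)
Z = to_sphere(Y)                    # points on S^127

# Ratio of one projected pairwise distance to the original; near 1 for large k.
ratio = np.linalg.norm(Y[0] - Y[1]) / np.linalg.norm(X[0] - X[1])
```

JL only bounds the distortion of `Y`; the subsequent L2 normalization discards feature magnitude, which is the part 3.2a needs to measure.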
+### 3.3 Compressed Sensing on Scattering Coefficients
+**Formula**: y = Φx, recover via ℓ₁ minimization if x is k-sparse
+**Properties**: Exact recovery for sparse signals, O(k log(N/k)) measurements
+- [ ] **3.3a** Measure sparsity of scattering coefficients (how many are near-zero?)
+  - If sparse: CS can compress much more than JL
+- [ ] **3.3b** CS measurement matrix → L2 norm → constellation
+  - Compare: CS vs JL at same target dimension
+### 3.4 Spherical Harmonics
+**Formula**: Y_l^m(θ,φ), complete basis on S², (l_max+1)² coefficients
+**Properties**: Deterministic, native Fourier on sphere, exactly invertible
+- [ ] **3.4a** Expand constellation triangulation profile in spherical harmonics
+  - Which angular frequencies carry discriminative info?
+- [ ] **3.4b** Spherical harmonic coefficients of embedding distribution as class signature
+- [ ] **3.4c** Hyperspherical harmonics for S^15 and S^43 (higher-dim generalization)
+---
+## CATEGORY 4: INVERTIBLE GEOMETRIC TRANSFORMS
+### 4.1 Stereographic Projection
+**Formula**: σ(x) = x_{1:n}/(1−x_{n+1}), σ⁻¹(y) = (2y, ‖y‖²−1)/(‖y‖²+1)
+**Properties**: Conformal bijection S^n\{pole} ↔ ℝⁿ, preserves angles
+- [ ] **4.1a** Stereographic → Euclidean scattering → inverse stereographic → S^d
+  - Apply scattering in flat space, project back to sphere
+- [ ] **4.1b** Stereographic projection as constellation readout alternative
+  - Instead of triangulation distances, read local coordinates via stereographic
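The two stereographic formulas in 4.1 can be checked with a round trip. This is a plain-Python sketch with illustrative function names; the pole x_{n+1} = 1 is excluded, as in the formula.

```python
import math

def stereo(x):
    """sigma(x) = x_{1:n} / (1 - x_{n+1}); undefined at the pole x_{n+1} = 1."""
    last = x[-1]
    return [xi / (1.0 - last) for xi in x[:-1]]

def stereo_inv(y):
    """sigma^{-1}(y) = (2y, |y|^2 - 1) / (|y|^2 + 1); the result lies on the unit sphere."""
    s = sum(yi * yi for yi in y)
    return [2.0 * yi / (s + 1.0) for yi in y] + [(s - 1.0) / (s + 1.0)]

y = [0.3, -1.2, 2.0]          # a point in R^3
x = stereo_inv(y)             # its image on S^3
y_back = stereo(x)            # recovers y up to floating-point error
```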
+### 4.2 Exponential / Logarithmic Maps
+**Formula**: exp_p(v) = cos(‖v‖)·p + sin(‖v‖)·v/‖v‖
+**Formula**: log_p(q) = arccos(⟨q,p⟩) · (q−⟨q,p⟩p)/‖q−⟨q,p⟩p‖
+**Properties**: Deterministic, locally invertible, O(n)
+- [ ] **4.2a** Replace triangulation (1−cos) with log map coordinates at each anchor
+  - Log map gives direction + distance in tangent space (richer than scalar distance)
+  - Each anchor contributes d-dim tangent vector instead of 1-d distance
+- [ ] **4.2b** Log map triangulation → parallel transport to common tangent space → aggregate
+  - Geometrically principled alternative to patchwork concatenation
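A sketch of the exp and log maps from 4.2, plus a hypothetical `log_map_features` helper that stacks one tangent vector per anchor as described in 4.2a. The `eps` guards and the zero return at q = p are implementation choices made here, not something specified above.

```python
import numpy as np

def exp_map(p, v, eps=1e-12):
    """exp_p(v) = cos(|v|) p + sin(|v|) v/|v| for a tangent vector v at p."""
    n = np.linalg.norm(v)
    if n < eps:
        return p.copy()
    return np.cos(n) * p + np.sin(n) * v / n

def log_map(p, q, eps=1e-12):
    """log_p(q) = arccos(<q,p>) * (q - <q,p>p) / |q - <q,p>p|."""
    c = np.clip(np.dot(p, q), -1.0, 1.0)
    u = q - c * p                      # component of q orthogonal to p
    n = np.linalg.norm(u)
    if n < eps:                        # q == p (log is undefined at the antipode)
        return np.zeros_like(p)
    return np.arccos(c) * u / n

def log_map_features(x, anchors):
    """Stack log_{a_j}(x) for every anchor a_j: shape (m, d) instead of m scalars."""
    return np.stack([log_map(a, x) for a in anchors])
```

The norm of each row equals the geodesic angle θ to that anchor, so the scalar triangulation value 1 − cos θ is still recoverable from these features.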
+### 4.3 Parallel Transport
+**Formula**: Γ_{p→q}(v) = v − [⟨v,q⟩/(1+⟨p,q⟩)]·(p+q) on S^n, for tangent v at p (⟨v,p⟩ = 0) and q ≠ −p
+**Properties**: Isometric between tangent spaces, exactly invertible
+- [ ] **4.3a** Compute log maps at K anchors → parallel transport all to north pole → aggregate
+  - Creates a canonical tangent-space representation independent of anchor positions
+- [ ] **4.3b** Parallel transport as inter-anchor communication in constellation
+  - How does the same input look from different anchor tangent spaces?
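A sketch of parallel transport along the geodesic from p to q on the unit sphere. For a tangent vector v at p we have ⟨v,p⟩ = 0, so the coefficient reduces to ⟨v,q⟩/(1+⟨p,q⟩). The function name and the random test vectors are illustrative.

```python
import numpy as np

def parallel_transport(v, p, q):
    """Transport tangent vector v from T_p S^n to T_q S^n along the geodesic.

    Assumes |p| = |q| = 1, <v, p> = 0 and q != -p.
    """
    return v - (np.dot(v, q) / (1.0 + np.dot(p, q))) * (p + q)

rng = np.random.default_rng(0)
p = rng.normal(size=16)
p /= np.linalg.norm(p)
q = rng.normal(size=16)
q /= np.linalg.norm(q)
v = rng.normal(size=16)
v -= np.dot(v, p) * p                  # project onto the tangent space at p
w = parallel_transport(v, p, q)        # tangent at q, same length as v
```

For 4.3a, q would be fixed to the north pole and p would range over the anchors, so that all log-map vectors end up in one shared tangent space.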
+### 4.4 Möbius Transformations
+**Formula**: h_ω(z) = [(1−‖ω‖²)/‖z−ω‖²](z−ω) − ω
+**Properties**: Conformal automorphism of S^d, invertible, O(d)
+- [ ] **4.4a** Möbius "geometric attention": transform sphere to zoom into anchor regions
+  - Expand region near anchor, compress far regions
+  - Each anchor applies its own Möbius transform before measuring distance
+- [ ] **4.4b** Composition of Möbius transforms as normalizing flow on S^d
+  - Learned flow that warps embedding distribution toward better separation
+### 4.5 Procrustes + Polar Decomposition
+**Formula**: R* = argmin_R ‖RA−B‖_F = UVᵀ, where UΣVᵀ = SVD(BAᵀ) (points as columns of A, B; R orthogonal)
+**Formula**: A = UP (rotation × stretch)
+- [ ] **4.5a** Procrustes-align channel cloud to canonical pose before Cholesky/SVD
+  - Remove rotation variability, isolate shape information
+- [ ] **4.5b** Polar decomposition of channel matrix: U (rotation) + P (stretch) as separate features
+  - U encodes orientation of frequency cloud; P encodes shape/scale
+  - Both are geometric, both are deterministic from the channel matrix
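Both decompositions in 4.5 reduce to a single SVD. The sketch below assumes points are stored as columns (d × n), so the d × d cross-covariance is B Aᵀ; `procrustes_rotation` and `polar` are illustrative names. The result is the best orthogonal matrix, which may include a reflection.

```python
import numpy as np

def procrustes_rotation(A, B):
    """Orthogonal R minimizing ||R A - B||_F, with points as columns (d x n)."""
    U, _, Vt = np.linalg.svd(B @ A.T)
    return U @ Vt

def polar(A):
    """Polar decomposition A = U P of a square matrix.

    U is orthogonal (the rotation part), P is symmetric PSD (the stretch part).
    """
    W, s, Vt = np.linalg.svd(A)
    return W @ Vt, Vt.T @ np.diag(s) @ Vt
```

If a proper rotation (det = +1) is required for 4.5a, the usual fix is to flip the sign of the last column of U when det(UVᵀ) is negative; that step is omitted here.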
+---
+## CATEGORY 5: MATRIX DECOMPOSITION SIGNATURES
+### 5.1 Already Tested
+- [X] Cholesky of Gram matrix → 36 lower-tri values (in v4, working)
+- [X] SVD singular values → 8 values (in v4, working)
+- [X] Concatenated 44-d signature on S^43 → 46.8% with CE-only
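One way the 36 + 8 = 44 values could be assembled is sketched below. This is a reconstruction under assumptions, not the v4 code: the channel matrix is taken as 8 × n with channels as rows, the Gram matrix as C Cᵀ, and a small jitter is added so that the Cholesky factorization always succeeds.

```python
import numpy as np

def chol_svd_signature(C, jitter=1e-6):
    """44-d unit signature from an 8 x n channel matrix C (rows = channels).

    36 lower-triangular Cholesky entries of the Gram matrix C C^T
    + 8 singular values of C, concatenated and L2-normalized onto S^43.
    """
    k = C.shape[0]
    gram = C @ C.T + jitter * np.eye(k)       # jitter keeps the matrix positive definite
    L = np.linalg.cholesky(gram)              # lower-triangular factor
    tri = L[np.tril_indices(k)]               # k(k+1)/2 = 36 values for k = 8
    sv = np.linalg.svd(C, compute_uv=False)   # min(k, n) singular values
    sig = np.concatenate([tri, sv])
    return sig / np.linalg.norm(sig)
```

Without any per-block scaling, whichever of the two parts has the larger magnitude dominates the normalized vector; whether v4 rescales them separately is not stated here.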
+### 5.2 Remaining Decompositions
+- [ ] **5.2a** QR decomposition: Q (rotation) and R diagonal (scale per channel)
+  - R diagonal = per-channel magnitude; Q = inter-channel angular structure
+- [ ] **5.2b** Schur decomposition: T diagonal = eigenvalues, T off-diagonal = coupling
+  - For the Gram matrix: Schur gives eigenstructure in triangular form
+- [ ] **5.2c** Eigendecomposition of Gram: eigenvalues as spectral signature
+  - Compare: eigenvalues vs SVD singular values vs Cholesky diagonal
+  - These are related but not identical (λ_i = σ_i² for Gram = AᵀA)
+- [ ] **5.2d** NMF of magnitude spectrum: parts-based decomposition
+  - Requires iterative optimization (not fully deterministic)
+  - But finds additive, non-negative parts — texture components
+- [ ] **5.2e** Tucker tensor decomposition of spatial×frequency×channel tensor
+  - 3D structure: (H, W, freq_bins) per color channel
+  - Core tensor encodes interactions between spatial, frequency, channel modes
+---
+## CATEGORY 6: INFORMATION-THEORETIC LOSSES
+### 6.1 Already Tested
+- [X] InfoNCE (self-contrastive, two augmented views) — dead at 0.15 in spectral v4
+- [X] CosineEmbeddingLoss — frozen at 0.346 (margin-saturated)
+- [X] CV loss (Cayley-Menger volume) — running but not in 0.18-0.25 band
+### 6.2 Loss Modifications
+- [ ] **6.2a** Drop contrastive losses entirely, CE-only + geometric losses
+  - v4 shows CE is the only contributor; contrastive is dead weight
+  - Hypothesis: removing dead losses may speed convergence
+- [ ] **6.2b** Class-conditional InfoNCE: positive = same class, not same image
+  - Requires labels but gives much stronger supervision signal
+- [ ] **6.2c** vMF-based contrastive loss: replace dot-product similarity with vMF log-likelihood
+  - κ-adaptive: high-κ for nearby pairs, low-κ for far pairs
+- [ ] **6.2d** Fisher-Rao distance as loss: d_FR(p,q) = 2·arccos(∫√(pq))
+  - Natural distance for distributions on the sphere
+- [ ] **6.2e** Sliced spherical Wasserstein distance as distribution matching loss
+  - Matches embedding distribution to target (e.g., uniform on sphere)
+- [ ] **6.2f** Geometric autograd (from GM3): tangential projection + separation preservation
+  - Adam + geometric autograd > AdamW on geometric tasks (proven)
+  - Operates on gradient direction, not loss value
+### 6.3 Anchor Management
+- [ ] **6.3a** Anchor push frequency sweep: every 10, 25, 50, 100, 200 batches
+- [ ] **6.3b** Anchor push with vMF-weighted centroids instead of hard class centroids
+- [ ] **6.3c** Anchor birth/death: add anchors where density is high, remove where unused
+- [ ] **6.3d** Anchor dropout sweep: 0%, 5%, 15%, 30%, 50%
+---
+## CATEGORY 7: COMPOSITE PIPELINE TESTS
+### 7.1 The Reference Pipeline (from research article)
+- [ ] **7.1a** Scattering(J=2,L=8) → JL(128) → L2 norm → constellation(64) → classify
+  - The "canonical" pipeline; expected ~75-80% based on literature
+- [ ] **7.1b** Same as 7.1a but with learned 2-layer projection replacing JL
+  - Minimal learned params (~16K), test if projection adaptation matters
+- [ ] **7.1c** Scattering → curvelet energy → concat → JL → constellation
+  - Test complementarity
+### 7.2 Hybrid: Spectral + Scattering
+- [ ] **7.2a** STFT channels (v4) + scattering features → concat → JL → S^d → constellation
+  - STFT gives spatial-frequency; scattering gives multi-scale invariants
+- [ ] **7.2b** Scattering → Cholesky Gram + SVD signature → constellation
+  - Apply v4's geometric signature to scattering output instead of STFT
+### 7.3 Multi-Signature Constellation
+- [ ] **7.3a** Parallel extraction: scattering + Gabor + Radon → separate constellations → fusion
+  - Each primitive captures different geometric aspect
+  - Fusion: concatenate patchwork outputs → shared classifier
+- [ ] **7.3b** Hierarchical constellation: scattering → coarse anchors → residual → fine anchors
+  - Two-stage: first stage identifies broad category, second refines
+### 7.4 Minimal Learned Params Tests
+- [ ] **7.4a** Best deterministic pipeline + 1 learned linear layer (d_in → 128) before constellation
+  - Measure: how much does a single projection layer help?
+  - Count: exact learned param count
+- [ ] **7.4b** Same as 7.4a but with SquaredReLU + LayerNorm (the proven patchwork block)
+- [ ] **7.4c** Sweep learned projection sizes: 0, 1K, 5K, 10K, 50K, 100K params
+  - Find the elbow where adding params stops helping
+---
+## PRIORITY QUEUE (recommended execution order)
+### Tier 1: Highest Expected Impact
+1. **1.1a** — Scattering + flat constellation (the literature leader)
+2. **1.1b** — Scattering + JL → S^127 + constellation
+3. **6.2a** — Drop dead contrastive losses from v4, measure CE-only ceiling
+4. **2.4a** — vMF soft assignment replacing hard nearest-anchor
+5. **4.2a** — Log map triangulation (richer than scalar distance)
+### Tier 2: High Expected Impact
+6. **7.1a** — Full reference pipeline
+7. **1.1f** — Scattering hybrid with minimal learned projection
+8. **1.2b** — Gabor spatial statistics → S^127
+9. **5.2c** — Eigendecomposition vs SVD vs Cholesky ablation
+10. **2.1b** — Quaternionic Hopf S⁷→S⁴ for 8-channel data
+### Tier 3: Exploratory
+11. **1.5a** — Persistent homology standalone
+12. **3.1b** — RFF on scattering features
+13. **4.4a** — Möbius geometric attention
+14. **7.3a** — Multi-signature parallel constellations
+15. **2.2a** — Grassmannian class subspaces
+### Tier 4: Deep Exploration
+16. **1.3a** — Radon cloud on S^d
+17. **1.4b** — Curvelet + scattering concat
+18. **2.3a** — Flag decomposition of frequency channels
+19. **4.3a** — Parallel transport aggregation
+20. **3.4c** — Hyperspherical harmonics analysis
+---
+## RUNNING SCOREBOARD
+| Experiment | Val Acc | Params (learned) | CV | Anchors Active | InfoNCE | Key Finding |
+|---|---|---|---|---|---|---|
+| Linear baseline | 67.0% | 423K | — | — | — | Overfits E31 |
+| MLP baseline | 65.0% | 687K | — | — | — | Overfits E10 |
+| Core CE-only | 63.4% | 820K | 0.70 | — | — | CV never converges |
+| Core CE+CV | 62.7% | 820K | 0.61 | — | — | CV hurts accuracy |
+| Full GELU | 88.0% | 1.6M | 0.14-0.17 | 64/64 | 1.00 | Reference |
+| Full SquaredReLU | 88.0% | 1.6M | 0.15 | 64/64 | 1.00 | Matches GELU |
+| Spectral v1 (flat FFT) | FAIL | — | — | 1/64 | — | Norm mismatch |
+| Spectral v2 (per-band) | ~35% | 1.2M | 0.17-0.19 | 900/3072 | 0.45 | Too diffuse |
+| Spectral v3 (sph mean) | ~27% | 130K | 0.27-0.34 | 110/128 | 0.35 | Collapsed to point |
+| Spectral v4 (STFT+Chol+SVD) | 46.8% | 137K | 0.52-0.66 | 53/64 | 0.15 | CE-only carry |
+| *Scattering baseline* | *~82%* | *0* | *—* | *—* | *—* | *Literature (SVM)* |
+*Italicized entries are literature values, not our runs*
+---
+## NOTES & INSIGHTS
+### Why contrastive losses die on deterministic encoders
+The STFT/FFT faithfully reports every pixel-level difference between augmented views.
+Two crops of the same image produce signatures as different as two different images.
+Without a learned layer to absorb augmentation variance, InfoNCE has nothing to align.
+Solutions: (a) augmentation-invariant features (scattering), (b) thin learned projection,
+(c) class-conditional contrastive (6.2b), (d) drop contrastive entirely (6.2a).
+### The Cholesky insight
+L diagonal encodes "new angular information per tier given all lower tiers."
+This IS discriminative (shown by v4 reaching 46.8% with CE alone).
+The 44-d signature on S^43 carries real inter-channel geometry.
+Next question: is the STFT front-end the bottleneck, or the 44-d signature?
+### Scattering is the clear next step
+82% on CIFAR-10 with zero learned params (literature) vs our 46.8%.
+Scattering is translation-invariant AND deformation-stable (Lipschitz).
+This directly addresses the augmentation sensitivity problem.
+kymatio provides GPU-accelerated PyTorch implementation.
+### The dimension question
+S^15 (band_dim=16) vs S^43 (signature) vs S^127 (conv encoder output)
+E₈ lattice gives 240 optimal anchors on S^7
+Proven CV attractor at ~0.20 is on S^15
+Need to test which target sphere dimension is optimal for spectral features
+---
+*Last updated: 2026-03-18, session with Opus*
+*Next: run scattering baseline (1.1a), then decide pipeline direction*