Commit 41f7796 (verified) by AbstractPhil · 1 parent: b6baa53 · Update README.md
---
license: mit
tags:
- geometric-deep-learning
- vae-analysis
- latent-space
- diffusion-models
- 3d-classification
- pytorch
- research
datasets:
- synthetic-geometric-primitives
metrics:
- accuracy
pipeline_tag: image-classification
---

# Grid Geometric Classifier — Sliding Window VAE Analysis

A **638K-parameter classifier** trained on 38 synthetic geometric primitives that reads the intrinsic manifold structure of diffusion-model VAE latent spaces. This tool enables geometric fingerprinting of any VAE by extracting and classifying local geometric patterns at multiple scales.

## Key Finding

**Diffusion model VAEs learn consistent geometric structure — not noise.**

| VAE | Dominant Geometry | Confidence |
|-----|-------------------|------------|
| SD 1.5 | Saddle (57%) + Pentachoron (35%) | 0.880 |
| SDXL | Saddle (53%) + Pentachoron (30%) | 0.874 |
| Flux.1 | **Pentachoron (31%) + Plane (29%) + Saddle (15%)** | 0.878 |
| Flux.2 | Saddle (70%) + Pentachoron (21%) | 0.875 |

**Flux.1 is the geometric outlier** — it learned a richer, more diverse latent geometry, while SD 1.5, SDXL, and Flux.2 converged to saddle-dominated hyperbolic manifolds.

## Architecture

```
Input: (B, 8, 16, 16) binary voxel grid
  ↓
Patch Decomposition: 2×4×4 patches → 64 patches per volume
  ↓
Shared Patch Encoder (MLP + handcrafted features)
  ↓
3× Cross-Attention Blocks (patches attend to each other)
  ↓
Global Pool + Classification Heads
  ↓
Output: 38 classes + dimension (0-3D) + curvature type
```

- **Parameters:** 638,387
- **Patch Grid:** 4×4×4 macro grid of 2×4×4 local patches
- **Attention:** 8 heads, 128 embed dim, 3 layers
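
The patch-decomposition step can be sketched in a few lines. This is a minimal stand-in, not the repository's `cell2_model.py` implementation (which also appends handcrafted features per patch):

```python
import torch

def decompose_patches(grid, patch=(2, 4, 4)):
    """Split a (B, 8, 16, 16) volume into a flat sequence of 2x4x4 patches.

    Hypothetical sketch of the decomposition stage: a 4x4x4 macro grid of
    local patches, i.e. 64 patches of 32 voxels each per volume.
    """
    B = grid.shape[0]
    pd, ph, pw = patch
    patches = (grid
               .unfold(1, pd, pd)   # (B, 4, 16, 16, 2)
               .unfold(2, ph, ph)   # (B, 4, 4, 16, 2, 4)
               .unfold(3, pw, pw))  # (B, 4, 4, 4, 2, 4, 4)
    return patches.reshape(B, -1, pd * ph * pw)  # (B, 64, 32)

x = torch.zeros(1, 8, 16, 16)
print(decompose_patches(x).shape)  # torch.Size([1, 64, 32])
```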

## 38 Geometric Classes

| Dimension | Flat | Curved |
|-----------|------|--------|
| **0D** | point | — |
| **1D** | line_x, line_y, line_z, line_diag, cross, l_shape, collinear | arc, helix |
| **2D** | triangle_xy, triangle_xz, triangle_3d, square_xy, square_xz, rectangle, coplanar, plane | circle, ellipse, disc |
| **3D** | tetrahedron, pyramid, pentachoron, cube, cuboid, triangular_prism, octahedron | sphere, hemisphere, cylinder, cone, capsule, torus, shell, tube, bowl, saddle |

**Curvature Types:** none, convex, concave, cylindrical, conical, toroidal, hyperbolic, helical
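
To get a feel for the input format, here is an illustrative way to voxelize one such primitive (a sphere) into the classifier's binary grid. This is a stand-in, not the repository's `ShapeGenerator`:

```python
import torch

# Voxelize a sphere into the classifier's (1, 8, 16, 16) binary input format.
# Illustrative only; the repository's ShapeGenerator covers all 38 classes.
d, h, w = 8, 16, 16
z, y, x = torch.meshgrid(
    torch.arange(d, dtype=torch.float32),
    torch.arange(h, dtype=torch.float32),
    torch.arange(w, dtype=torch.float32),
    indexing="ij",
)
dist = ((z - 3.5) ** 2 + (y - 7.5) ** 2 + (x - 7.5) ** 2).sqrt()
grid = (dist <= 3.0).float().unsqueeze(0)  # (1, 8, 16, 16), ready for model(grid)
print(grid.shape)  # torch.Size([1, 8, 16, 16])
```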

## Quick Start

```python
import torch
from cell2_model import PatchCrossAttentionClassifier, CLASS_NAMES, CURVATURE_NAMES

# Load classifier
model = PatchCrossAttentionClassifier(n_classes=38)
model.load_state_dict(torch.load('best_vae_ca_classifier.pt', map_location='cpu'))
model.eval()

# Classify a binary voxel grid
grid = torch.zeros(1, 8, 16, 16)  # Your binarized patch
with torch.no_grad():
    out = model(grid)

pred_class = CLASS_NAMES[out['class_logits'].argmax()]
pred_dim = out['dim_logits'].argmax().item()
is_curved = (out['is_curved_pred'].squeeze() > 0).item()  # logit threshold at 0
pred_curv = CURVATURE_NAMES[out['curv_type_logits'].argmax()]

print(f"Shape: {pred_class}, Dimension: {pred_dim}D, Curved: {is_curved}, Curvature: {pred_curv}")
```

## Full VAE Analysis Pipeline

```python
# Cell 1: Shape generator (training data)
from cell1_shape_generator import ShapeGenerator, CLASS_NAMES, NUM_CLASSES

# Cell 2: Model architecture
from cell2_model import PatchCrossAttentionClassifier

# Cell 3: Training (if retraining)
# python cell3_trainer.py

# Cell 4: Multi-scale extraction from VAE latents
from cell4_vae_pipeline import MultiScaleExtractor, ExtractionConfig

# Cell 5: Single VAE analysis
# python cell5_quad_vae_geometric_analysis.py

# Cell 6: Multi-VAE comparison
# python cell6_quad_vae_analysis_mega_liminal.py
```

## Extraction Pipeline

The pipeline extracts geometric structure from VAE latents at multiple scales:

```python
config = ExtractionConfig(
    scales=[(16, 64, 64), (8, 32, 32), (8, 16, 16), (4, 8, 8)],
    canonical_shape=(8, 16, 16),
    confidence_threshold=0.6,
    overlap=0.5,
)

extractor = MultiScaleExtractor(classifier, config)
result = extractor.extract_from_latent(vae_latent, channel_groups)

# Returns: raw_annotations, deviance_annotations
# Each annotation contains: class, confidence, scale, dimension, curvature, location
```
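
With `overlap=0.5`, window placement presumably follows the usual sliding-window scheme. A minimal sketch, assuming stride = window * (1 - overlap) with the final window clamped to the edge (the actual pipeline in `cell4_vae_pipeline.py` may differ):

```python
def window_starts(size, window, overlap=0.5):
    """Start offsets of overlapping sliding windows along one axis.

    Assumed scheme: stride = window * (1 - overlap); a last window is
    appended flush with the edge if the stride does not land there exactly.
    """
    stride = max(1, int(window * (1 - overlap)))
    starts = list(range(0, size - window + 1, stride))
    if starts[-1] != size - window:
        starts.append(size - window)
    return starts

print(window_starts(64, 16))  # [0, 8, 16, 24, 32, 40, 48]
```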

**Two extraction modes:**
1. **Raw:** Treat channels as the depth dimension directly
2. **Deviance:** Compute inter-channel differences and classify the relational geometry
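
The deviance mode can be sketched roughly as follows. `deviance_volume` is a hypothetical helper, and the pairing and thresholding scheme is an assumption (the real implementation lives in `cell4_vae_pipeline.py`):

```python
import torch

def deviance_volume(latent, threshold=0.5):
    """Stack all pairwise inter-channel differences of a (C, H, W) latent
    and binarize them, so the classifier sees the relational geometry
    between channels rather than the raw activations."""
    C = latent.shape[0]
    diffs = [latent[i] - latent[j] for i in range(C) for j in range(i + 1, C)]
    vol = torch.stack(diffs)  # (C*(C-1)/2, H, W)
    # Binarize relative to the largest difference (assumed scheme).
    return (vol.abs() > threshold * vol.abs().max()).float()

lat = torch.randn(4, 16, 16)       # e.g. a 4-channel SD-style latent
print(deviance_volume(lat).shape)  # torch.Size([6, 16, 16])
```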

## Results: Why Saddles?

Saddle points dominate because **they are optimal for generative models**:

- **Steering capacity:** Small noise changes push trajectories toward different modes
- **Mode separation:** Unstable directions at saddles act as decision boundaries between outputs
- **Exponential coverage:** Hyperbolic geometry packs more representations per dimension

The VAE didn't learn saddles by accident — saddles are the natural geometry for a diffusion decoder's latent manifold.

**Flux.1's difference:** More planar cross-sections (29%) and more balanced primitives suggest a different optimization path. The batch-norm statistics in Flux.2 (`bn.running_var`, `bn.running_mean`) may be collapsing this richer structure back to hyperbolic.
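
The "exponential coverage" point is standard hyperbolic geometry: in the hyperbolic plane a circle of radius r has circumference 2*pi*sinh(r) rather than 2*pi*r, so the room available for distinct representations grows exponentially with radius. A quick numeric illustration:

```python
import math

# Circumference of a radius-r circle: 2*pi*r in the Euclidean plane versus
# 2*pi*sinh(r) in the hyperbolic plane (constant curvature -1).
for r in [1, 2, 4, 8]:
    euclid = 2 * math.pi * r
    hyper = 2 * math.pi * math.sinh(r)
    print(f"r={r}: euclidean={euclid:9.1f}  hyperbolic={hyper:9.1f}")
```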

## Per-Scale Findings

| Scale | Dominant Class | Interpretation |
|-------|----------------|----------------|
| L0 (16×64×64) | Pentachoron 73% | Macro-level 5-simplex structure |
| L1 (8×32×32) | Pentachoron 60% | Transitional |
| L2 (8×16×16) | Plane 40% | Mid-level planar cross-sections |
| L3 (4×8×8) | Saddle 59% | Local hyperbolic curvature |

The hierarchy: **pentachorons organize the global structure, saddles dominate locally.**

## Files

| File | Description |
|------|-------------|
| `best_vae_ca_classifier.pt` | Trained classifier weights (2.58 MB) |
| `cell1_shape_generator.py` | 38-class synthetic shape generator |
| `cell2_model.py` | PatchCrossAttentionClassifier architecture |
| `cell3_trainer.py` | Training pipeline with augmentation |
| `cell4_vae_pipeline.py` | Multi-scale batched extraction |
| `cell5_quad_vae_geometric_analysis.py` | Single-VAE analysis script |
| `cell6_quad_vae_analysis_mega_liminal.py` | Multi-VAE comparison script |
| `liminal.zip` | Test image dataset (957 images) |
| `mega_liminal_captioned.zip` | Extended dataset (2074 images) |
| `multi_vae_comparison_*.json` | Raw comparison results |

## Training

The classifier was trained on **76,000 synthetic shapes** (2,000 per class × 38 classes) generated procedurally:

```python
gen = ShapeGenerator(seed=42)
train_data = gen.generate_dataset(n_per_class=2000, seed=42)
```

**Training config:**
- 60 epochs, batch size 1024
- AdamW, lr=3e-3, cosine annealing
- Multi-task loss: classification + dimension + curved flag + curvature type
- Augmentation: voxel dropout, boundary addition, small translation
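
A plausible shape for the multi-task objective, assuming equal term weights and the output keys from the Quick Start example (`cell3_trainer.py` is the authoritative version):

```python
import torch
import torch.nn.functional as F

def multi_task_loss(out, target):
    """Equal-weight multi-task objective (an assumption; the real trainer
    may weight or schedule the four terms differently)."""
    return (F.cross_entropy(out['class_logits'], target['class'])        # 38-way shape class
            + F.cross_entropy(out['dim_logits'], target['dim'])          # dimension 0-3
            + F.binary_cross_entropy_with_logits(                        # curved yes/no
                out['is_curved_pred'].squeeze(-1), target['is_curved'].float())
            + F.cross_entropy(out['curv_type_logits'], target['curv_type']))  # 8 curvature types

# Dummy batch of 2, matching the head sizes described in this README.
out = {'class_logits': torch.zeros(2, 38), 'dim_logits': torch.zeros(2, 4),
       'is_curved_pred': torch.zeros(2, 1), 'curv_type_logits': torch.zeros(2, 8)}
target = {'class': torch.zeros(2, dtype=torch.long), 'dim': torch.zeros(2, dtype=torch.long),
          'is_curved': torch.zeros(2), 'curv_type': torch.zeros(2, dtype=torch.long)}
print(float(multi_task_loss(out, target)))
```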

## Citation

```bibtex
@misc{abstractphil2025geometric,
  author = {AbstractPhil},
  title = {Grid Geometric Classifier: Reading VAE Latent Manifold Structure},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/AbstractPhil/grid-geometric-classifier-sliding-proto}
}
```

## Related Work

This classifier is part of a broader research program on **geometric deep learning with pentachoron structures** — replacing learned embeddings with navigable k-simplex lattices. Key results include:

- **85% MNIST accuracy with ~750 parameters** (geometry encodes structure, learning only navigates)
- **72 KB ImageNet classification head** (parameter efficiency through geometric priors)
- **Crystalline vocabulary systems** representing tokens as 5-vertex structures

## License

MIT