---
license: mit
tags:
- geometric-deep-learning
- vae-analysis
- latent-space
- diffusion-models
- 3d-classification
- pytorch
- research
datasets:
- synthetic-geometric-primitives
metrics:
- accuracy
pipeline_tag: image-classification
---
# Grid Geometric Classifier: Sliding Window VAE Analysis
A **638K parameter classifier** trained on 38 synthetic geometric primitives that reads the intrinsic manifold structure of diffusion model VAE latent spaces. This tool enables geometric fingerprinting of any VAE by extracting and classifying local geometric patterns at multiple scales.
## Key Finding
**Diffusion model VAEs learn consistent geometric structure, not noise.**
| VAE | Dominant Geometry | Confidence |
|-----|-------------------|------------|
| SD 1.5 | Saddle (57%) + Pentachoron (35%) | 0.880 |
| SDXL | Saddle (53%) + Pentachoron (30%) | 0.874 |
| Flux.1 | **Pentachoron (31%) + Plane (29%) + Saddle (15%)** | 0.878 |
| Flux.2 | Saddle (70%) + Pentachoron (21%) | 0.875 |
**Flux.1 is the geometric outlier:** it learned a richer, more diverse latent geometry, while SD 1.5, SDXL, and Flux.2 converged to saddle-dominated hyperbolic manifolds.
## Architecture
```
Input: (B, 8, 16, 16) binary voxel grid
        ↓
Patch Decomposition: 2×4×4 patches → 64 patches per volume
        ↓
Shared Patch Encoder (MLP + handcrafted features)
        ↓
3× Cross-Attention Blocks (patches attend to each other)
        ↓
Global Pool + Classification Heads
        ↓
Output: 38 classes + dimension (0-3D) + curvature type
```
**Parameters:** 638,387
**Patch Grid:** 4×4×4 macro grid of 2×4×4 local patches
**Attention:** 8 heads, 128 embed dim, 3 layers
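The patch decomposition above can be sketched with tensor unfolding: the 8×16×16 volume tiles exactly into a 4×4×4 macro grid of 2×4×4 patches, i.e. 64 patch tokens per volume. This is an illustrative sketch, not the shipped encoder code:

```python
import torch

def decompose_into_patches(grid: torch.Tensor) -> torch.Tensor:
    """Split a (B, 8, 16, 16) volume into 64 flattened 2x4x4 patches."""
    B = grid.shape[0]
    patches = (
        grid.unfold(1, 2, 2)   # depth:  8  -> 4 blocks of size 2
            .unfold(2, 4, 4)   # height: 16 -> 4 blocks of size 4
            .unfold(3, 4, 4)   # width:  16 -> 4 blocks of size 4
    )
    # (B, 4, 4, 4, 2, 4, 4) -> (B, 64, 32): one token per patch
    return patches.reshape(B, 64, 2 * 4 * 4)

tokens = decompose_into_patches(torch.zeros(1, 8, 16, 16))
print(tokens.shape)  # torch.Size([1, 64, 32])
```

Each of the 64 tokens then passes through the shared patch encoder before cross-attention.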
## 38 Geometric Classes
| Dimension | Flat | Curved |
|-----------|------|--------|
| **0D** | point | – |
| **1D** | line_x, line_y, line_z, line_diag, cross, l_shape, collinear | arc, helix |
| **2D** | triangle_xy, triangle_xz, triangle_3d, square_xy, square_xz, rectangle, coplanar, plane | circle, ellipse, disc |
| **3D** | tetrahedron, pyramid, pentachoron, cube, cuboid, triangular_prism, octahedron | sphere, hemisphere, cylinder, cone, capsule, torus, shell, tube, bowl, saddle |
**Curvature Types:** none, convex, concave, cylindrical, conical, toroidal, hyperbolic, helical
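The class/dimension/curvature taxonomy can be held in a small lookup table. The entries below are a hypothetical metadata dict that mirrors the table above; it is not the structure used by the training code:

```python
# Hypothetical metadata table, transcribed from the class table above.
CLASS_META = {
    "saddle":      {"dim": 3, "curved": True,  "curvature": "hyperbolic"},
    "pentachoron": {"dim": 3, "curved": False, "curvature": "none"},
    "plane":       {"dim": 2, "curved": False, "curvature": "none"},
    "helix":       {"dim": 1, "curved": True,  "curvature": "helical"},
}

def describe(name: str) -> str:
    """Render one class as 'name: <dim>D, <curvature or flat>'."""
    m = CLASS_META[name]
    kind = m["curvature"] if m["curved"] else "flat"
    return f"{name}: {m['dim']}D, {kind}"

print(describe("saddle"))  # saddle: 3D, hyperbolic
```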
## Quick Start
```python
import torch
from cell2_model import PatchCrossAttentionClassifier, CLASS_NAMES, CURVATURE_NAMES
# Load classifier
model = PatchCrossAttentionClassifier(n_classes=38)
model.load_state_dict(torch.load('best_vae_ca_classifier.pt', map_location='cpu'))
model.eval()
# Classify a binary voxel grid
grid = torch.zeros(1, 8, 16, 16)  # your binarized patch
with torch.no_grad():
    out = model(grid)

pred_class = CLASS_NAMES[out['class_logits'].argmax().item()]
pred_dim = out['dim_logits'].argmax().item()
is_curved = bool(out['is_curved_pred'].squeeze() > 0)
pred_curv = CURVATURE_NAMES[out['curv_type_logits'].argmax().item()]
print(f"Shape: {pred_class}, Dimension: {pred_dim}D, Curved: {is_curved}, Curvature: {pred_curv}")
```
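The Quick Start assumes you already have a binarized patch. One simple way to get one from a VAE latent channel group is quantile thresholding; the threshold choice here is an assumption for illustration, and the shipped cell4 pipeline may binarize differently:

```python
import torch

def binarize_latent_group(latent: torch.Tensor, quantile: float = 0.7) -> torch.Tensor:
    """Turn a (C, H, W) latent channel group into a (1, C, H, W) binary grid.

    Channels become the depth axis; the top (1 - quantile) fraction of
    activations become occupied voxels. The quantile is an assumption.
    """
    thresh = latent.flatten().quantile(quantile)
    return (latent > thresh).float().unsqueeze(0)

latent = torch.randn(8, 16, 16)       # e.g. one 8-channel group of a VAE latent
grid = binarize_latent_group(latent)  # (1, 8, 16, 16), values in {0, 1}
```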
## Full VAE Analysis Pipeline
```python
# Cell 1: Shape generator (training data)
from cell1_shape_generator import ShapeGenerator, CLASS_NAMES, NUM_CLASSES
# Cell 2: Model architecture
from cell2_model import PatchCrossAttentionClassifier
# Cell 3: Training (if retraining)
# python cell3_trainer.py
# Cell 4: Multi-scale extraction from VAE latents
from cell4_vae_pipeline import MultiScaleExtractor, ExtractionConfig
# Cell 5: Single VAE analysis
# python cell5_quad_vae_geometric_analysis.py
# Cell 6: Multi-VAE comparison
# python cell6_quad_vae_analysis_mega_liminal.py
```
## Extraction Pipeline
The pipeline extracts geometric structure from VAE latents at multiple scales:
```python
config = ExtractionConfig(
    scales=[(16, 64, 64), (8, 32, 32), (8, 16, 16), (4, 8, 8)],
    canonical_shape=(8, 16, 16),
    confidence_threshold=0.6,
    overlap=0.5,
)
extractor = MultiScaleExtractor(classifier, config)
result = extractor.extract_from_latent(vae_latent, channel_groups)
# Returns: raw_annotations, deviance_annotations
# Each annotation contains: class, confidence, scale, dimension, curvature, location
extractor = MultiScaleExtractor(classifier, config)
result = extractor.extract_from_latent(vae_latent, channel_groups)
# Returns: raw_annotations, deviance_annotations
# Each annotation contains: class, confidence, scale, dimension, curvature, location
```
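With `overlap=0.5`, windows slide by half their width along each axis. The helper below is a hypothetical sketch of that offset arithmetic, not the shipped `MultiScaleExtractor` logic:

```python
def window_starts(size: int, win: int, overlap: float = 0.5) -> list[int]:
    """Start offsets of sliding windows covering one axis.

    Stride is win * (1 - overlap); the final window is clamped so the
    far edge of the axis is always covered. Illustrative helper only.
    """
    stride = max(1, int(win * (1 - overlap)))
    starts = list(range(0, size - win + 1, stride))
    if starts[-1] != size - win:
        starts.append(size - win)   # clamp a last window to the edge
    return starts

# 64-wide axis, 16-wide window, 50% overlap -> stride 8
print(window_starts(64, 16))  # [0, 8, 16, 24, 32, 40, 48]
```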
**Two extraction modes:**
1. **Raw:** Treat channels as depth dimension directly
2. **Deviance:** Compute inter-channel differences, classify the relational geometry
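One plausible reading of the deviance mode is stacking differences between consecutive channels, so the classifier sees relational geometry rather than raw activations. This is a sketch under that assumption; the actual cell4 implementation may pair channels differently:

```python
import torch

def deviance_volume(latent: torch.Tensor) -> torch.Tensor:
    """Inter-channel difference volume from a (C, H, W) latent.

    Consecutive-channel differences yield (C-1, H, W); with C=9 this
    matches the (8, 16, 16) canonical shape. Assumption, not the
    shipped pairing scheme.
    """
    return latent[1:] - latent[:-1]

lat = torch.randn(9, 16, 16)
vol = deviance_volume(lat)  # (8, 16, 16), ready to binarize and classify
```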
## Results: Why Saddles?
Saddle points dominate because **they're optimal for generative models**:
- **Steering capacity:** Small noise changes push trajectories toward different modes
- **Mode separation:** Unstable directions at saddles act as decision boundaries between outputs
- **Exponential coverage:** Hyperbolic geometry packs more representations per dimension
The VAE didn't learn saddles by accident; it's the natural geometry for a diffusion decoder's latent manifold.
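For intuition, the canonical saddle is the hyperbolic surface z = x² − y². The snippet below voxelizes it into the classifier's input shape; the shell thickness and resolution are illustrative choices, not the training generator's parameters:

```python
import torch

def saddle_grid(depth: int = 8, size: int = 16, thickness: float = 1.2) -> torch.Tensor:
    """Voxelize z = x^2 - y^2, the canonical hyperbolic (saddle) surface."""
    z = torch.linspace(-1, 1, depth).view(-1, 1, 1)
    x = torch.linspace(-1, 1, size).view(1, -1, 1)
    y = torch.linspace(-1, 1, size).view(1, 1, -1)
    surface = x**2 - y**2                              # saddle height in [-1, 1]
    # occupy voxels within a thin shell around the surface
    return ((z - surface).abs() < thickness / depth).float()

grid = saddle_grid().unsqueeze(0)  # (1, 8, 16, 16) binary volume
```

Feeding such a grid to the classifier should produce the `saddle` class with `hyperbolic` curvature.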
**Flux.1's difference:** More planar cross-sections (29%) and balanced primitives suggest a different optimization path. The batch norm weights in Flux.2 (`bn.running_var`, `bn.running_mean`) may be collapsing this richer structure back to hyperbolic.
## Per-Scale Findings
| Scale | Dominant Class | Interpretation |
|-------|----------------|----------------|
| L0 (16×64×64) | Pentachoron 73% | Macro-level 5-simplex structure |
| L1 (8×32×32) | Pentachoron 60% | Transitional |
| L2 (8×16×16) | Plane 40% | Mid-level planar cross-sections |
| L3 (4×8×8) | Saddle 59% | Local hyperbolic curvature |
The hierarchy: **pentachorons organize the global structure, saddles dominate locally.**
## Files
| File | Description |
|------|-------------|
| `best_vae_ca_classifier.pt` | Trained classifier weights (2.58 MB) |
| `cell1_shape_generator.py` | 38-class synthetic shape generator |
| `cell2_model.py` | PatchCrossAttentionClassifier architecture |
| `cell3_trainer.py` | Training pipeline with augmentation |
| `cell4_vae_pipeline.py` | Multi-scale batched extraction |
| `cell5_quad_vae_geometric_analysis.py` | Single VAE analysis script |
| `cell6_quad_vae_analysis_mega_liminal.py` | Multi-VAE comparison script |
| `liminal.zip` | Test image dataset (957 images) |
| `mega_liminal_captioned.zip` | Extended dataset (2074 images) |
| `multi_vae_comparison_*.json` | Raw comparison results |
## Training
The classifier was trained on **76,000 synthetic shapes** (2,000 per class × 38 classes) generated procedurally:
```python
gen = ShapeGenerator(seed=42)
train_data = gen.generate_dataset(n_per_class=2000, seed=42)
```
**Training config:**
- 60 epochs, batch size 1024
- AdamW, lr=3e-3, cosine annealing
- Multi-task loss: classification + dimension + curved + curvature type
- Augmentation: voxel dropout, boundary addition, small translation
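The augmentations listed above can be sketched as follows. The dropout rate and shift range here are illustrative assumptions, not the values in `cell3_trainer.py`:

```python
import torch

def augment(grid: torch.Tensor, p_drop: float = 0.05) -> torch.Tensor:
    """Voxel dropout plus a small integer translation of a (B, D, H, W) grid."""
    out = grid.clone()
    # voxel dropout: randomly zero a small fraction of voxels
    out = out * (torch.rand_like(out) > p_drop).float()
    # small translation: roll by -1..1 voxels along each spatial axis
    shifts = [int(torch.randint(-1, 2, (1,))) for _ in range(3)]
    return torch.roll(out, shifts=shifts, dims=(1, 2, 3))

aug = augment(torch.ones(1, 8, 16, 16))
```

Rolling rather than padding keeps the occupancy count stable up to the dropped voxels, which matters for small sparse shapes.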
## Citation
```bibtex
@misc{abstractphil2025geometric,
author = {AbstractPhil},
title = {Grid Geometric Classifier: Reading VAE Latent Manifold Structure},
year = {2025},
publisher = {HuggingFace},
url = {https://huggingface.co/AbstractPhil/grid-geometric-classifier-sliding-proto}
}
```
## Related Work
This classifier is part of a broader research program on **geometric deep learning with pentachoron structures**: replacing learned embeddings with navigable k-simplex lattices. Key results include:
- **85% MNIST with ~750 parameters** (geometry encodes structure, learning only navigates)
- **72KB ImageNet classification head** (parameter efficiency through geometric priors)
- **Crystalline vocabulary systems** representing tokens as 5-vertex structures
## License
MIT