---
license: mit
tags:
  - geometric-deep-learning
  - vae-analysis
  - latent-space
  - diffusion-models
  - 3d-classification
  - pytorch
  - research
datasets:
  - synthetic-geometric-primitives
metrics:
  - accuracy
pipeline_tag: image-classification
---

# Grid Geometric Classifier — Sliding Window VAE Analysis

A **638K-parameter classifier** trained on 38 synthetic geometric primitives that reads the intrinsic manifold structure of diffusion model VAE latent spaces. This tool enables geometric fingerprinting of any VAE by extracting and classifying local geometric patterns at multiple scales.

## Key Finding

**Diffusion model VAEs learn consistent geometric structure — not noise.**

| VAE | Dominant Geometry | Confidence |
|-----|-------------------|------------|
| SD 1.5 | Saddle (57%) + Pentachoron (35%) | 0.880 |
| SDXL | Saddle (53%) + Pentachoron (30%) | 0.874 |
| Flux.1 | **Pentachoron (31%) + Plane (29%) + Saddle (15%)** | 0.878 |
| Flux.2 | Saddle (70%) + Pentachoron (21%) | 0.875 |

**Flux.1 is the geometric outlier** — it learned a richer, more diverse latent geometry, while SD 1.5, SDXL, and Flux.2 converged to saddle-dominated hyperbolic manifolds.
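The classifier consumes binary `(8, 16, 16)` voxel grids, so a latent patch must be binarized before classification. Below is a minimal sketch of one plausible binarization scheme (per-patch normalization followed by thresholding); the function name and the threshold are illustrative assumptions, not the exact preprocessing used by the extraction pipeline:

```python
import torch

def binarize_latent_patch(latent: torch.Tensor, threshold: float = 0.0) -> torch.Tensor:
    """Turn an (8, 16, 16) float latent patch into the binary voxel grid
    the classifier expects, batched as (1, 8, 16, 16).

    Illustrative scheme: normalize per patch, then threshold on deviation
    from the mean. Not the exact preprocessing in cell4_vae_pipeline.
    """
    # Zero-center so the threshold acts on deviation from the patch mean
    centered = latent - latent.mean()
    # Scale by the patch std so the threshold is comparable across patches
    normalized = centered / latent.std().clamp_min(1e-8)
    # Binarize and add a batch dimension
    return (normalized > threshold).float().unsqueeze(0)

patch = torch.randn(8, 16, 16)  # stand-in for a real VAE latent patch
grid = binarize_latent_patch(patch)
```

The resulting `grid` can be fed straight to the classifier as shown in the Quick Start section.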
## Architecture

```
Input: (B, 8, 16, 16) binary voxel grid
  ↓
Patch Decomposition: 2×4×4 patches → 64 patches per volume
  ↓
Shared Patch Encoder (MLP + handcrafted features)
  ↓
3× Cross-Attention Blocks (patches attend to each other)
  ↓
Global Pool + Classification Heads
  ↓
Output: 38 classes + dimension (0-3D) + curvature type
```

**Parameters:** 638,387

**Patch Grid:** 4×4×4 macro grid of 2×4×4 local patches

**Attention:** 8 heads, 128 embed dim, 3 layers

## 38 Geometric Classes

| Dimension | Flat | Curved |
|-----------|------|--------|
| **0D** | point | — |
| **1D** | line_x, line_y, line_z, line_diag, cross, l_shape, collinear | arc, helix |
| **2D** | triangle_xy, triangle_xz, triangle_3d, square_xy, square_xz, rectangle, coplanar, plane | circle, ellipse, disc |
| **3D** | tetrahedron, pyramid, pentachoron, cube, cuboid, triangular_prism, octahedron | sphere, hemisphere, cylinder, cone, capsule, torus, shell, tube, bowl, saddle |

**Curvature Types:** none, convex, concave, cylindrical, conical, toroidal, hyperbolic, helical

## Quick Start

```python
import torch
from cell2_model import PatchCrossAttentionClassifier, CLASS_NAMES, CURVATURE_NAMES

# Load classifier
model = PatchCrossAttentionClassifier(n_classes=38)
model.load_state_dict(torch.load('best_vae_ca_classifier.pt', map_location='cpu'))
model.eval()

# Classify a binary voxel grid
grid = torch.zeros(1, 8, 16, 16)  # Your binarized patch
with torch.no_grad():
    out = model(grid)

pred_class = CLASS_NAMES[out['class_logits'].argmax()]
pred_dim = out['dim_logits'].argmax().item()
is_curved = bool(out['is_curved_pred'].squeeze() > 0)  # cast so it prints True/False, not a tensor
pred_curv = CURVATURE_NAMES[out['curv_type_logits'].argmax()]

print(f"Shape: {pred_class}, Dimension: {pred_dim}D, Curved: {is_curved}, Curvature: {pred_curv}")
```

## Full VAE Analysis Pipeline

```python
# Cell 1: Shape generator (training data)
from cell1_shape_generator import ShapeGenerator, CLASS_NAMES, NUM_CLASSES

# Cell 2: Model architecture
from cell2_model import PatchCrossAttentionClassifier

# Cell 3: Training (if retraining)
# python cell3_trainer.py

# Cell 4: Multi-scale extraction from VAE latents
from cell4_vae_pipeline import MultiScaleExtractor, ExtractionConfig

# Cell 5: Single VAE analysis
# python cell5_quad_vae_geometric_analysis.py

# Cell 6: Multi-VAE comparison
# python cell6_quad_vae_analysis_mega_liminal.py
```

## Extraction Pipeline

The pipeline extracts geometric structure from VAE latents at multiple scales:

```python
config = ExtractionConfig(
    scales=[(16, 64, 64), (8, 32, 32), (8, 16, 16), (4, 8, 8)],
    canonical_shape=(8, 16, 16),
    confidence_threshold=0.6,
    overlap=0.5,
)
extractor = MultiScaleExtractor(classifier, config)
result = extractor.extract_from_latent(vae_latent, channel_groups)

# Returns: raw_annotations, deviance_annotations
# Each annotation contains: class, confidence, scale, dimension, curvature, location
```

**Two extraction modes:**

1. **Raw:** Treat channels as the depth dimension directly
2. **Deviance:** Compute inter-channel differences, then classify the relational geometry

## Results: Why Saddles?

Saddle points dominate because **they're optimal for generative models**:

- **Steering capacity:** Small noise changes push trajectories toward different modes
- **Mode separation:** Unstable directions at saddles act as decision boundaries between outputs
- **Exponential coverage:** Hyperbolic geometry packs more representations per dimension

The VAE didn't learn saddles by accident — saddles are the natural geometry for a diffusion decoder's latent manifold.

**Flux.1's difference:** More planar cross-sections (29%) and a more balanced mix of primitives suggest a different optimization path. The batch-norm running statistics in Flux.2 (`bn.running_var`, `bn.running_mean`) may be collapsing this richer structure back to a hyperbolic one.
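The `overlap=0.5` setting in the extraction config corresponds to sliding each window by half its size along every axis. A minimal sketch of that tiling logic (the function and variable names here are hypothetical, not the actual implementation in `cell4_vae_pipeline`):

```python
import torch

def sliding_windows(volume: torch.Tensor, window=(8, 16, 16), overlap=0.5):
    """Yield (location, patch) pairs tiling a (C, H, W) volume with
    overlapping windows. overlap=0.5 means a stride of half the window size."""
    strides = [max(1, int(w * (1 - overlap))) for w in window]
    C, H, W = volume.shape
    for c in range(0, max(C - window[0], 0) + 1, strides[0]):
        for h in range(0, max(H - window[1], 0) + 1, strides[1]):
            for w in range(0, max(W - window[2], 0) + 1, strides[2]):
                patch = volume[c:c + window[0], h:h + window[1], w:w + window[2]]
                yield (c, h, w), patch

# A stand-in latent at the L1 scale (8×32×32) tiled by canonical 8×16×16 windows
latent = torch.randn(8, 32, 32)
patches = list(sliding_windows(latent))
```

Each yielded patch can then be binarized and classified; the location tuple provides the `location` field recorded in each annotation.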
## Per-Scale Findings

| Scale | Dominant Class | Interpretation |
|-------|----------------|----------------|
| L0 (16×64×64) | Pentachoron 73% | Macro-level 5-simplex structure |
| L1 (8×32×32) | Pentachoron 60% | Transitional |
| L2 (8×16×16) | Plane 40% | Mid-level planar cross-sections |
| L3 (4×8×8) | Saddle 59% | Local hyperbolic curvature |

The hierarchy: **pentachorons organize the global structure, saddles dominate locally.**

## Files

| File | Description |
|------|-------------|
| `best_vae_ca_classifier.pt` | Trained classifier weights (2.58 MB) |
| `cell1_shape_generator.py` | 38-class synthetic shape generator |
| `cell2_model.py` | PatchCrossAttentionClassifier architecture |
| `cell3_trainer.py` | Training pipeline with augmentation |
| `cell4_vae_pipeline.py` | Multi-scale batched extraction |
| `cell5_quad_vae_geometric_analysis.py` | Single VAE analysis script |
| `cell6_quad_vae_analysis_mega_liminal.py` | Multi-VAE comparison script |
| `liminal.zip` | Test image dataset (957 images) |
| `mega_liminal_captioned.zip` | Extended dataset (2074 images) |
| `multi_vae_comparison_*.json` | Raw comparison results |

## Training

The classifier was trained on **76,000 synthetic shapes** (2,000 per class × 38 classes) generated procedurally:

```python
gen = ShapeGenerator(seed=42)
train_data = gen.generate_dataset(n_per_class=2000, seed=42)
```

**Training config:**

- 60 epochs, batch size 1024
- AdamW, lr=3e-3, cosine annealing
- Multi-task loss: classification + dimension + curved + curvature type
- Augmentation: voxel dropout, boundary addition, small translation

## Citation

```bibtex
@misc{abstractphil2025geometric,
  author = {AbstractPhil},
  title = {Grid Geometric Classifier: Reading VAE Latent Manifold Structure},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/AbstractPhil/grid-geometric-classifier-sliding-proto}
}
```

## Related Work

This classifier is part of a broader research program on **geometric deep learning with pentachoron structures** — replacing learned embeddings with navigable k-simplex lattices. Key results include:

- **85% MNIST accuracy with ~750 parameters** (geometry encodes structure, learning only navigates)
- **72KB ImageNet classification head** (parameter efficiency through geometric priors)
- **Crystalline vocabulary systems** representing tokens as 5-vertex structures

## License

MIT
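As an appendix, the multi-task objective listed under Training (classification + dimension + curved + curvature type) can be sketched as follows. The head names match the output dict from the Quick Start section; the equal weighting and target names are illustrative assumptions, not the exact scheme in `cell3_trainer.py`:

```python
import torch
import torch.nn.functional as F

def multi_task_loss(out: dict, target: dict) -> torch.Tensor:
    """Sum the four training objectives. Equal weights are an assumption."""
    cls_loss = F.cross_entropy(out['class_logits'], target['class'])      # 38-way shape class
    dim_loss = F.cross_entropy(out['dim_logits'], target['dim'])          # dimension 0-3
    curved_loss = F.binary_cross_entropy_with_logits(                     # flat vs curved
        out['is_curved_pred'].squeeze(-1), target['is_curved'].float())
    curv_loss = F.cross_entropy(out['curv_type_logits'], target['curv_type'])  # 8 curvature types
    return cls_loss + dim_loss + curved_loss + curv_loss

# Dummy batch (B=4) showing the expected tensor shapes per head
out = {
    'class_logits': torch.randn(4, 38),
    'dim_logits': torch.randn(4, 4),
    'is_curved_pred': torch.randn(4, 1),
    'curv_type_logits': torch.randn(4, 8),
}
target = {
    'class': torch.randint(0, 38, (4,)),
    'dim': torch.randint(0, 4, (4,)),
    'is_curved': torch.randint(0, 2, (4,)),
    'curv_type': torch.randint(0, 8, (4,)),
}
loss = multi_task_loss(out, target)
```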