---
license: apache-2.0
tags:
- geometric-deep-learning
- vae
- patch-analysis
- gate-vectors
- text-to-geometry
- rosetta-stone
- multimodal
- experimental
- custom_code
datasets:
- AbstractPhil/synthetic-characters
---

# GeoVocab Patch Maker

**A geometric vocabulary extractor that reads structural properties from latent patches — and proved that text carries the same geometric structure as images.**

This is a two-tier gated geometric transformer trained on 27 geometric primitives (point through channel) in 8×16×16 voxel grids. It extracts 17-dimensional gate vectors (explicit geometric properties) and 256-dimensional patch features (learned representations) from any compatible latent input.

## What It Does

Takes an `(8, 16, 16)` tensor — originally voxel grids, but proven to work on adapted FLUX VAE latents and text-derived latent patches — and produces per-patch geometric descriptors:

```python
from geometric_model import load_from_hub, extract_features

model = load_from_hub()
gate_vectors, patch_features = extract_features(model, patches)
# gate_vectors: (N, 64, 17) — interpretable geometric properties
# patch_features: (N, 64, 256) — learned representations
```

### Gate Vector Anatomy (17 dimensions)

| Dims | Property | Type | Meaning |
|---|---|---|---|
| 0–3 | dimensionality | softmax(4) | 0D point, 1D line, 2D surface, 3D volume |
| 4–6 | curvature | softmax(3) | rigid, curved, combined |
| 7 | boundary | sigmoid(1) | partial fill (surface patch) |
| 8–10 | axis_active | sigmoid(3) | which axes have spatial extent |
| 11–12 | topology | softmax(2) | open vs closed (neighbor-based) |
| 13 | neighbor_density | sigmoid(1) | normalized neighbor count |
| 14–16 | surface_role | softmax(3) | isolated, boundary, interior |

Dimensions 0–10 are **local** (intrinsic to each patch, no cross-patch info). Dimensions 11–16 are **structural** (relational, computed after attention sees neighborhood context).
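The table above fully specifies the gate-vector layout, so it can be sliced into named properties without any model code. The helper below is a minimal sketch and not part of the repo's API; the `GATE_SLICES` layout is taken directly from the table, and the activation tags are recorded for reference only (the sketch assumes `extract_features` already returns activated values).

```python
import torch

# Hypothetical helper: slice a (N, 64, 17) gate-vector tensor into named
# properties, following the layout in the Gate Vector Anatomy table.
GATE_SLICES = {
    "dimensionality":   (slice(0, 4),   "softmax"),  # 0D point … 3D volume
    "curvature":        (slice(4, 7),   "softmax"),  # rigid, curved, combined
    "boundary":         (slice(7, 8),   "sigmoid"),  # partial fill
    "axis_active":      (slice(8, 11),  "sigmoid"),  # which axes have extent
    "topology":         (slice(11, 13), "softmax"),  # open vs closed
    "neighbor_density": (slice(13, 14), "sigmoid"),  # normalized neighbor count
    "surface_role":     (slice(14, 17), "softmax"),  # isolated, boundary, interior
}

def split_gates(gate_vectors: torch.Tensor) -> dict:
    """Slice the last (17-dim) axis into one tensor per named property."""
    return {name: gate_vectors[..., idx] for name, (idx, _) in GATE_SLICES.items()}

gates = split_gates(torch.rand(2, 64, 17))
print(gates["dimensionality"].shape)  # torch.Size([2, 64, 4])
```

For example, `gates["dimensionality"].argmax(-1)` then gives each patch's most likely dimensionality class (0D–3D).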
## Architecture

```
(8, 16, 16) input
  ↓ PatchEmbedding3D → (B, 64, 64)          # 64 patches of 32 voxels each
  ↓ Stage 0: Local Encoder + Gate Heads     # dims, curvature, boundary, axes
  ↓ proj([embedding, local_gates]) → (B, 64, 128)
  ↓ Stage 1: Bootstrap Transformer ×2       # standard attention with local context
  ↓ Stage 1.5: Structural Gate Heads        # topology, neighbors, surface role
  ↓ Stage 2: Geometric Transformer ×2       # gated attention modulated by all 17 gates
  ↓ Stage 3: Classification Heads           # 27-class shape recognition
```

The geometric transformer blocks use gate-modulated attention: Q and K are projected from `[hidden, all_gates]`, V is multiplicatively gated, and per-head compatibility scores are computed from gate interactions.

## The Rosetta Stone Discovery

This model was used as the analyzer in the [GeoVAE Proto experiments](https://huggingface.co/AbstractPhil/geovae-proto), which proved that text descriptions produce **2.5–3.5× stronger geometric differentiation** than actual images when projected through a lightweight VAE into this model's patch space.

| Source | patch_feat discriminability |
|---|---|
| FLUX images (49k) | +0.020 |
| flan-t5-small text | +0.053 |
| bert-base-uncased text | +0.053 |
| bert-beatrix-2048 text | +0.050 |

Three architecturally different text encoders converge to within ±5% of each other — the geometric structure is in the language, not the encoder. This model reads it.

## Training

Trained on procedurally generated multi-shape superposition grids (2–4 overlapping geometric primitives per sample, 27 shape classes). Two-tier gate supervision with ground truth computed from voxel analysis:

- **Local gates**: dimensionality from axis extent, curvature from fill ratio, boundary from partial occupancy
- **Structural gates**: topology from 3D convolution neighbor counting, surface role from neighbor density thresholds

Training ran for 200 epochs, reaching 93.8% recall on shape classification, with explicit geometric property prediction as auxiliary objectives.
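The gate-modulated attention described above can be sketched as follows. This is an illustrative reconstruction, not the repo's exact code: the class name, layer names, and the sigmoid used for the multiplicative V gate are assumptions; only the overall pattern (Q/K projected from `[hidden, gates]`, V multiplicatively gated) comes from the description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttentionSketch(nn.Module):
    """Sketch of gate-modulated attention: Q and K see [hidden, all_gates],
    V is multiplicatively gated by a learned per-channel gate."""

    def __init__(self, hidden: int = 128, gate_dim: int = 17, heads: int = 4):
        super().__init__()
        self.heads, self.head_dim = heads, hidden // heads
        self.q = nn.Linear(hidden + gate_dim, hidden)  # Q from [hidden, gates]
        self.k = nn.Linear(hidden + gate_dim, hidden)  # K from [hidden, gates]
        self.v = nn.Linear(hidden, hidden)
        self.v_gate = nn.Linear(gate_dim, hidden)      # multiplicative V gate
        self.out = nn.Linear(hidden, hidden)

    def forward(self, x: torch.Tensor, gates: torch.Tensor) -> torch.Tensor:
        B, N, _ = x.shape
        hg = torch.cat([x, gates], dim=-1)  # concatenate hidden state and gates
        q = self.q(hg).view(B, N, self.heads, self.head_dim).transpose(1, 2)
        k = self.k(hg).view(B, N, self.heads, self.head_dim).transpose(1, 2)
        v = self.v(x) * torch.sigmoid(self.v_gate(gates))  # gated values
        v = v.view(B, N, self.heads, self.head_dim).transpose(1, 2)
        attn = F.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(B, N, -1)
        return self.out(y)

block = GatedAttentionSketch()
y = block(torch.randn(2, 64, 128), torch.rand(2, 64, 17))
print(y.shape)  # torch.Size([2, 64, 128])
```

The hidden size of 128 matches the projection shown in the architecture diagram; the per-head gate-interaction compatibility scores mentioned in the text are omitted here for brevity.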
## Files

| File | Description |
|---|---|
| `geometric_model.py` | Standalone model + `load_from_hub()` + `extract_features()` |
| `model.pt` | Pretrained weights (epoch 200) |

## Usage

```python
import torch
from geometric_model import SuperpositionPatchClassifier, load_from_hub, extract_features

# Load pretrained
model = load_from_hub()

# From any (8, 16, 16) source
patches = torch.randn(16, 8, 16, 16).cuda()
gate_vectors, patch_features = extract_features(model, patches)

# Or full output dict
out = model(patches)
out["local_dim_logits"]    # (B, 64, 4) dimensionality
out["local_curv_logits"]   # (B, 64, 3) curvature
out["struct_topo_logits"]  # (B, 64, 2) topology
out["patch_features"]      # (B, 64, 128) learned features
out["patch_shape_logits"]  # (B, 64, 27) shape classification
```

## Related

- [AbstractPhil/geovae-proto](https://huggingface.co/AbstractPhil/geovae-proto) — The Rosetta Stone experiments (text→geometry VAEs)
- [AbstractPhil/synthetic-characters](https://huggingface.co/datasets/AbstractPhil/synthetic-characters) — 49k FLUX-generated character dataset
- [AbstractPhil/grid-geometric-multishape](https://huggingface.co/AbstractPhil/grid-geometric-multishape) — Original training repo with checkpoints

## Citation

Geometric deep learning research by [AbstractPhil](https://huggingface.co/AbstractPhil). The model demonstrates that geometric structure is a universal language bridging text and visual modalities — symbolic association through geometric language.