---
license: apache-2.0
tags:
- geometric-deep-learning
- vae
- patch-analysis
- gate-vectors
- text-to-geometry
- rosetta-stone
- multimodal
- experimental
- custom_code
datasets:
- AbstractPhil/synthetic-characters
---

# GeoVocab Patch Maker

**A geometric vocabulary extractor that reads structural properties from latent patches; it was used to show that text carries the same geometric structure as images.**

This is a two-tier gated geometric transformer trained on 27 geometric primitives (point through channel) in 8×16×16 voxel grids. It extracts 17-dimensional gate vectors (explicit geometric properties) and 256-dimensional patch features (learned representations) from any compatible latent input.

## What It Does

Takes an `(8, 16, 16)` tensor (originally voxel grids, but proven to work on adapted FLUX VAE latents and text-derived latent patches) and produces per-patch geometric descriptors:

```python
from geometric_model import load_from_hub, extract_features

model = load_from_hub()
gate_vectors, patch_features = extract_features(model, patches)
# gate_vectors:   (N, 64, 17)  – interpretable geometric properties
# patch_features: (N, 64, 256) – learned representations
```

### Gate Vector Anatomy (17 dimensions)

| Dims | Property | Type | Meaning |
|---|---|---|---|
| 0–3 | dimensionality | softmax(4) | 0D point, 1D line, 2D surface, 3D volume |
| 4–6 | curvature | softmax(3) | rigid, curved, combined |
| 7 | boundary | sigmoid(1) | partial fill (surface patch) |
| 8–10 | axis_active | sigmoid(3) | which axes have spatial extent |
| 11–12 | topology | softmax(2) | open vs. closed (neighbor-based) |
| 13 | neighbor_density | sigmoid(1) | normalized neighbor count |
| 14–16 | surface_role | softmax(3) | isolated, boundary, interior |

Dimensions 0–10 are **local** (intrinsic to each patch, no cross-patch information). Dimensions 11–16 are **structural** (relational, computed after attention sees neighborhood context).

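As a minimal sketch, a gate vector can be split back into its named properties using the slice boundaries from the table above (the `split_gates` helper is illustrative, not part of the released API):

```python
import torch

# Slice layout taken from the gate-vector anatomy table.
GATE_SLICES = {
    "dimensionality":   (0, 4),    # softmax(4)
    "curvature":        (4, 7),    # softmax(3)
    "boundary":         (7, 8),    # sigmoid(1)
    "axis_active":      (8, 11),   # sigmoid(3)
    "topology":         (11, 13),  # softmax(2)
    "neighbor_density": (13, 14),  # sigmoid(1)
    "surface_role":     (14, 17),  # softmax(3)
}

def split_gates(gate_vectors: torch.Tensor) -> dict:
    """Split (N, 64, 17) gate vectors into named per-property tensors."""
    assert gate_vectors.shape[-1] == 17
    return {name: gate_vectors[..., a:b] for name, (a, b) in GATE_SLICES.items()}

gates = torch.rand(2, 64, 17)                    # stand-in for real model output
props = split_gates(gates)
dim_class = props["dimensionality"].argmax(-1)   # (2, 64), values in {0: point, 1: line, 2: surface, 3: volume}
```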
## Architecture

```
(8, 16, 16) input
        ↓
PatchEmbedding3D → (B, 64, 64)         # 64 patches of 32 voxels each
        ↓
Stage 0: Local Encoder + Gate Heads    # dims, curvature, boundary, axes
        ↓
proj([embedding, local_gates]) → (B, 64, 128)
        ↓
Stage 1: Bootstrap Transformer ×2      # standard attention with local context
        ↓
Stage 1.5: Structural Gate Heads       # topology, neighbors, surface role
        ↓
Stage 2: Geometric Transformer ×2      # gated attention modulated by all 17 gates
        ↓
Stage 3: Classification Heads          # 27-class shape recognition
```

The geometric transformer blocks use gate-modulated attention: Q and K are projected from `[hidden, all_gates]`, V is multiplicatively gated, and per-head compatibility scores are computed from gate interactions.

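A minimal single-head sketch of that gate-modulated attention idea (layer names, sizes, and the sigmoid gating choice here are illustrative assumptions; `geometric_model.py` is the reference implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttentionSketch(nn.Module):
    """Toy single-head gate-modulated attention: Q/K see gates, V is gated."""
    def __init__(self, hidden: int = 128, gates: int = 17):
        super().__init__()
        self.q = nn.Linear(hidden + gates, hidden)  # Q projected from [hidden, gates]
        self.k = nn.Linear(hidden + gates, hidden)  # K projected from [hidden, gates]
        self.v = nn.Linear(hidden, hidden)
        self.v_gate = nn.Linear(gates, hidden)      # multiplicative gate on V

    def forward(self, x: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        # x: (B, patches, hidden), g: (B, patches, 17)
        xg = torch.cat([x, g], dim=-1)
        q, k = self.q(xg), self.k(xg)
        v = self.v(x) * torch.sigmoid(self.v_gate(g))            # gated values
        attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v

blk = GatedAttentionSketch()
out = blk(torch.randn(2, 64, 128), torch.rand(2, 64, 17))  # (2, 64, 128)
```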
## The Rosetta Stone Discovery

This model was used as the analyzer in the [GeoVAE Proto experiments](https://huggingface.co/AbstractPhil/geovae-proto), which showed that text descriptions produce **2.5–3.5× stronger geometric differentiation** than actual images when projected through a lightweight VAE into this model's patch space.

| Source | patch_feat discriminability |
|---|---|
| FLUX images (49k) | +0.020 |
| flan-t5-small text | +0.053 |
| bert-base-uncased text | +0.053 |
| bert-beatrix-2048 text | +0.050 |

Three architecturally different text encoders converge to within ±5% of each other: the geometric structure is in the language, not the encoder. This model reads it.

## Training

Trained on procedurally generated multi-shape superposition grids (2–4 overlapping geometric primitives per sample, 27 shape classes). Two-tier gate supervision uses ground truth computed from voxel analysis:

- **Local gates**: dimensionality from axis extent, curvature from fill ratio, boundary from partial occupancy
- **Structural gates**: topology from 3D-convolution neighbor counting, surface role from neighbor density thresholds

Trained for 200 epochs, reaching 93.8% recall on shape classification, with explicit geometric-property prediction as auxiliary objectives.

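The neighbor-counting ground truth above can be sketched with a single 3D convolution over the binary occupancy grid (the 6-connected kernel is an illustrative assumption; the training code may use a different neighborhood):

```python
import torch
import torch.nn.functional as F

def neighbor_counts(occ: torch.Tensor) -> torch.Tensor:
    """Count occupied 6-connected neighbors per occupied voxel via conv3d.

    occ: (D, H, W) binary occupancy grid.
    """
    kernel = torch.zeros(1, 1, 3, 3, 3)
    kernel[0, 0, 1, 1, 0] = kernel[0, 0, 1, 1, 2] = 1  # ±W neighbors
    kernel[0, 0, 1, 0, 1] = kernel[0, 0, 1, 2, 1] = 1  # ±H neighbors
    kernel[0, 0, 0, 1, 1] = kernel[0, 0, 2, 1, 1] = 1  # ±D neighbors
    counts = F.conv3d(occ[None, None].float(), kernel, padding=1)[0, 0]
    return counts * occ  # keep counts only where voxels are occupied

occ = torch.zeros(8, 16, 16)
occ[4, 8, 6:10] = 1            # a short 1D line segment along W
counts = neighbor_counts(occ)
# interior voxels of the line have 2 neighbors, endpoints have 1
```

Thresholding these counts gives the surface-role labels (isolated, boundary, interior) used as structural-gate supervision.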
## Files

| File | Description |
|---|---|
| `geometric_model.py` | Standalone model + `load_from_hub()` + `extract_features()` |
| `model.pt` | Pretrained weights (epoch 200) |

## Usage

```python
import torch
from geometric_model import SuperpositionPatchClassifier, load_from_hub, extract_features

# Load the pretrained model
model = load_from_hub()

# From any (8, 16, 16) source
patches = torch.randn(16, 8, 16, 16).cuda()
gate_vectors, patch_features = extract_features(model, patches)

# Or the full output dict
out = model(patches)
out["local_dim_logits"]    # (B, 64, 4)   dimensionality
out["local_curv_logits"]   # (B, 64, 3)   curvature
out["struct_topo_logits"]  # (B, 64, 2)   topology
out["patch_features"]      # (B, 64, 128) learned features
out["patch_shape_logits"]  # (B, 64, 27)  shape classification
```

## Related

- [AbstractPhil/geovae-proto](https://huggingface.co/AbstractPhil/geovae-proto) – the Rosetta Stone experiments (text→geometry VAEs)
- [AbstractPhil/synthetic-characters](https://huggingface.co/datasets/AbstractPhil/synthetic-characters) – 49k FLUX-generated character dataset
- [AbstractPhil/grid-geometric-multishape](https://huggingface.co/AbstractPhil/grid-geometric-multishape) – original training repo with checkpoints

## Citation

Geometric deep learning research by [AbstractPhil](https://huggingface.co/AbstractPhil). The model demonstrates that geometric structure is a universal language bridging text and visual modalities: symbolic association through geometric language.