---
license: apache-2.0
tags:
  - geometric-deep-learning
  - vae
  - patch-analysis
  - gate-vectors
  - text-to-geometry
  - rosetta-stone
  - multimodal
  - experimental
  - custom_code
datasets:
  - AbstractPhil/synthetic-characters
---

# GeoVocab Patch Maker

**A geometric vocabulary extractor that reads structural properties from latent patches — and proved that text carries the same geometric structure as images.**

This is a two-tier gated geometric transformer trained on 27 geometric primitives (point through channel) in 8×16×16 voxel grids. It extracts 17-dimensional gate vectors (explicit geometric properties) and 256-dimensional patch features (learned representations) from any compatible latent input.

## What It Does

Takes an `(8, 16, 16)` tensor — originally voxel grids, but proven to work on adapted FLUX VAE latents and text-derived latent patches — and produces per-patch geometric descriptors:

```python
from geometric_model import load_from_hub, extract_features

model = load_from_hub()
gate_vectors, patch_features = extract_features(model, patches)
# gate_vectors:   (N, 64, 17)  — interpretable geometric properties
# patch_features: (N, 64, 256) — learned representations
```

### Gate Vector Anatomy (17 dimensions)

| Dims | Property | Type | Meaning |
|---|---|---|---|
| 0–3 | dimensionality | softmax(4) | 0D point, 1D line, 2D surface, 3D volume |
| 4–6 | curvature | softmax(3) | rigid, curved, combined |
| 7 | boundary | sigmoid(1) | partial fill (surface patch) |
| 8–10 | axis_active | sigmoid(3) | which axes have spatial extent |
| 11–12 | topology | softmax(2) | open vs closed (neighbor-based) |
| 13 | neighbor_density | sigmoid(1) | normalized neighbor count |
| 14–16 | surface_role | softmax(3) | isolated, boundary, interior |

Dimensions 0–10 are **local** (intrinsic to each patch, no cross-patch info). Dimensions 11–16 are **structural** (relational, computed after attention sees neighborhood context).
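Concretely, a single gate vector can be split by the slice layout in the table above into human-readable properties. A minimal decoding sketch (the helper and label names are ours, not part of the released API):

```python
import torch

DIM_LABELS = ["point", "line", "surface", "volume"]
CURV_LABELS = ["rigid", "curved", "combined"]
TOPO_LABELS = ["open", "closed"]
ROLE_LABELS = ["isolated", "boundary", "interior"]

def decode_gate_vector(g: torch.Tensor) -> dict:
    """Decode one (17,) gate vector using the dimension layout above.

    Assumes the model has already applied softmax/sigmoid to each slice;
    if you pass raw logits, apply them here first.
    """
    return {
        "dimensionality": DIM_LABELS[int(g[0:4].argmax())],
        "curvature": CURV_LABELS[int(g[4:7].argmax())],
        "boundary": float(g[7]),                   # partial-fill score in [0, 1]
        "axis_active": (g[8:11] > 0.5).tolist(),   # per-axis extent flags
        "topology": TOPO_LABELS[int(g[11:13].argmax())],
        "neighbor_density": float(g[13]),
        "surface_role": ROLE_LABELS[int(g[14:17].argmax())],
    }

# Example on a dummy gate vector:
props = decode_gate_vector(torch.rand(17))
```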

## Architecture

```
(8, 16, 16) input
    ↓
PatchEmbedding3D → (B, 64, 64)          # 64 patches of 32 voxels each
    ↓
Stage 0: Local Encoder + Gate Heads     # dims, curvature, boundary, axes
    ↓
proj([embedding, local_gates]) → (B, 64, 128)
    ↓
Stage 1: Bootstrap Transformer ×2       # standard attention with local context
    ↓
Stage 1.5: Structural Gate Heads        # topology, neighbors, surface role
    ↓
Stage 2: Geometric Transformer ×2       # gated attention modulated by all 17 gates
    ↓
Stage 3: Classification Heads           # 27-class shape recognition
```

The geometric transformer blocks use gate-modulated attention: Q and K are projected from `[hidden, all_gates]`, V is multiplicatively gated, and per-head compatibility scores are computed from gate interactions.
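That mechanism can be sketched as a standalone PyTorch module. This is an illustrative reconstruction of the one-sentence description above, not the released implementation; the head count, the sigmoid form of the V gate, and the layer shapes are assumptions:

```python
import torch
import torch.nn as nn

class GateModulatedAttention(nn.Module):
    """Sketch of gate-modulated attention: Q and K are projected from
    [hidden, gates] concatenated, and V is multiplicatively gated by a
    per-channel function of the 17 gate values."""

    def __init__(self, hidden: int = 128, n_gates: int = 17, heads: int = 4):
        super().__init__()
        self.heads = heads
        self.q = nn.Linear(hidden + n_gates, hidden)
        self.k = nn.Linear(hidden + n_gates, hidden)
        self.v = nn.Linear(hidden, hidden)
        self.v_gate = nn.Linear(n_gates, hidden)   # multiplicative gate on V
        self.out = nn.Linear(hidden, hidden)

    def forward(self, x: torch.Tensor, gates: torch.Tensor) -> torch.Tensor:
        # x: (B, N, hidden), gates: (B, N, n_gates)
        B, N, H = x.shape
        hg = torch.cat([x, gates], dim=-1)          # Q/K see hidden + gates
        q = self.q(hg).view(B, N, self.heads, -1).transpose(1, 2)
        k = self.k(hg).view(B, N, self.heads, -1).transpose(1, 2)
        v = self.v(x) * torch.sigmoid(self.v_gate(gates))   # gated values
        v = v.view(B, N, self.heads, -1).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5), dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(B, N, H)
        return self.out(y)
```

Shapes match the diagram above: 64 patches with a 128-dimensional hidden state, modulated by the 17-dimensional gate vector per patch.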

## The Rosetta Stone Discovery

This model was used as the analyzer in the [GeoVAE Proto experiments](https://huggingface.co/AbstractPhil/geovae-proto), which proved that text descriptions produce **2.5–3.5× stronger geometric differentiation** than actual images when projected through a lightweight VAE into this model's patch space.

| Source | patch_feat discriminability |
|---|---|
| FLUX images (49k) | +0.020 |
| flan-t5-small text | +0.053 |
| bert-base-uncased text | +0.053 |
| bert-beatrix-2048 text | +0.050 |

Three architecturally different text encoders converge to within ±5% of each other — the geometric structure is in the language, not the encoder. This model reads it.
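The card does not spell out how the `patch_feat discriminability` score is computed. One plausible reading, offered purely as an assumption, is the gap between within-class and between-class cosine similarity over pooled patch features:

```python
import torch

def discriminability(feats: torch.Tensor, labels: torch.Tensor) -> float:
    """Assumed metric sketch (not the card's definition): mean within-class
    cosine similarity minus mean between-class similarity over patch
    features pooled per sample. Higher means classes separate more."""
    x = torch.nn.functional.normalize(feats.mean(dim=1), dim=-1)  # (N, D) pooled
    sim = x @ x.T
    same = labels[:, None] == labels[None, :]
    eye = torch.eye(x.shape[0], dtype=torch.bool)
    within = sim[same & ~eye].mean()     # same class, excluding self-pairs
    between = sim[~same].mean()          # different classes
    return float(within - between)
```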

## Training

Trained on procedurally generated multi-shape superposition grids (2–4 overlapping geometric primitives per sample, 27 shape classes). Two-tier gate supervision with ground truth computed from voxel analysis:

- **Local gates**: dimensionality from axis extent, curvature from fill ratio, boundary from partial occupancy
- **Structural gates**: topology from 3D convolution neighbor counting, surface role from neighbor density thresholds

Trained for 200 epochs, reaching 93.8% recall on shape classification, with explicit geometric property prediction as an auxiliary objective.
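The local ground-truth signals named above (axis extent, fill ratio, partial occupancy) can be sketched for a single voxel patch. The card names the signals but not the formulas, so the thresholds and heuristics here are illustrative assumptions:

```python
import torch

def local_gate_targets(patch: torch.Tensor, occ_thresh: float = 0.5) -> dict:
    """Illustrative local gate targets for one (D, H, W) voxel patch.
    Thresholds and the fill-ratio heuristic are assumptions."""
    occ = patch > occ_thresh
    n = int(occ.sum())
    if n == 0:
        return {"dimensionality": 0, "axis_active": [False] * 3,
                "boundary": 0.0, "fill_ratio": 0.0}
    idx = occ.nonzero().float()                    # (n, 3) occupied coordinates
    extent = idx.max(0).values - idx.min(0).values + 1
    axis_active = (extent > 1).tolist()            # spatial extent per axis
    dimensionality = sum(axis_active)              # 0D point .. 3D volume
    fill_ratio = n / float(extent.prod())          # occupancy of bounding box,
                                                   # a proxy for curvature
    boundary = float(0 < n < patch.numel())        # partial-occupancy flag
    return {"dimensionality": dimensionality, "axis_active": axis_active,
            "boundary": boundary, "fill_ratio": fill_ratio}
```

A fully occupied patch comes out as a 3D volume with no partial-fill boundary; a single voxel comes out as a 0D point.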

## Files

| File | Description |
|---|---|
| `geometric_model.py` | Standalone model + `load_from_hub()` + `extract_features()` |
| `model.pt` | Pretrained weights (epoch 200) |

## Usage

```python
import torch
from geometric_model import SuperpositionPatchClassifier, load_from_hub, extract_features

# Load pretrained
model = load_from_hub()

# From any (8, 16, 16) source
patches = torch.randn(16, 8, 16, 16).cuda()
gate_vectors, patch_features = extract_features(model, patches)

# Or full output dict
out = model(patches)
out["local_dim_logits"]       # (B, 64, 4)  dimensionality
out["local_curv_logits"]      # (B, 64, 3)  curvature
out["struct_topo_logits"]     # (B, 64, 2)  topology
out["patch_features"]         # (B, 64, 128) learned features
out["patch_shape_logits"]     # (B, 64, 27) shape classification
```

## Related

- [AbstractPhil/geovae-proto](https://huggingface.co/AbstractPhil/geovae-proto) — The Rosetta Stone experiments (text→geometry VAEs)
- [AbstractPhil/synthetic-characters](https://huggingface.co/datasets/AbstractPhil/synthetic-characters) — 49k FLUX-generated character dataset
- [AbstractPhil/grid-geometric-multishape](https://huggingface.co/AbstractPhil/grid-geometric-multishape) — Original training repo with checkpoints

## Citation

Geometric deep learning research by [AbstractPhil](https://huggingface.co/AbstractPhil). The model demonstrates that geometric structure is a universal language bridging text and visual modalities — symbolic association through geometric language.