AbstractPhil committed · Commit 4cd439a · verified · 1 Parent(s): 1fe1ab5

Create README.md

Files changed (1): README.md (+131 -3)
---
license: apache-2.0
tags:
- geometric-deep-learning
- vae
- patch-analysis
- gate-vectors
- text-to-geometry
- rosetta-stone
- multimodal
- experimental
- custom_code
datasets:
- AbstractPhil/synthetic-characters
---

# GeoVocab Patch Maker

**A geometric vocabulary extractor that reads structural properties from latent patches — and proved that text carries the same geometric structure as images.**

This is a two-tier gated geometric transformer trained on 27 geometric primitives (point through channel) in 8×16×16 voxel grids. It extracts 17-dimensional gate vectors (explicit geometric properties) and 256-dimensional patch features (learned representations) from any compatible latent input.

## What It Does

Takes an `(8, 16, 16)` tensor — originally voxel grids, but proven to work on adapted FLUX VAE latents and text-derived latent patches — and produces per-patch geometric descriptors:

```python
from geometric_model import load_from_hub, extract_features

model = load_from_hub()
gate_vectors, patch_features = extract_features(model, patches)
# gate_vectors: (N, 64, 17) — interpretable geometric properties
# patch_features: (N, 64, 256) — learned representations
```

### Gate Vector Anatomy (17 dimensions)

| Dims | Property | Type | Meaning |
|---|---|---|---|
| 0–3 | dimensionality | softmax(4) | 0D point, 1D line, 2D surface, 3D volume |
| 4–6 | curvature | softmax(3) | rigid, curved, combined |
| 7 | boundary | sigmoid(1) | partial fill (surface patch) |
| 8–10 | axis_active | sigmoid(3) | which axes have spatial extent |
| 11–12 | topology | softmax(2) | open vs closed (neighbor-based) |
| 13 | neighbor_density | sigmoid(1) | normalized neighbor count |
| 14–16 | surface_role | softmax(3) | isolated, boundary, interior |

Dimensions 0–10 are **local** (intrinsic to each patch, no cross-patch information). Dimensions 11–16 are **structural** (relational, computed after attention has seen neighborhood context).
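The slice layout in the table maps directly onto a gate-vector tensor. A minimal helper for unpacking it, assuming only the table above (the function name is illustrative, not part of the repo API):

```python
import torch

def split_gate_vector(gates: torch.Tensor) -> dict:
    """Split (..., 17) gate vectors into the named property groups
    from the gate-anatomy table. Illustrative helper, not repo API."""
    return {
        "dimensionality":   gates[..., 0:4],    # softmax over 0D/1D/2D/3D
        "curvature":        gates[..., 4:7],    # softmax: rigid/curved/combined
        "boundary":         gates[..., 7:8],    # sigmoid: partial fill
        "axis_active":      gates[..., 8:11],   # sigmoid per axis
        "topology":         gates[..., 11:13],  # softmax: open/closed
        "neighbor_density": gates[..., 13:14],  # sigmoid: normalized count
        "surface_role":     gates[..., 14:17],  # softmax: isolated/boundary/interior
    }

gate_vectors = torch.rand(2, 64, 17)  # e.g. the output of extract_features
fields = split_gate_vector(gate_vectors)
```

The ellipsis indexing keeps any leading batch/patch dimensions intact, so the same helper works on `(N, 64, 17)` batches or single `(17,)` vectors.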

## Architecture

```
(8, 16, 16) input
↓
PatchEmbedding3D → (B, 64, 64)       # 64 patches of 32 voxels each
↓
Stage 0: Local Encoder + Gate Heads  # dims, curvature, boundary, axes
↓
proj([embedding, local_gates]) → (B, 64, 128)
↓
Stage 1: Bootstrap Transformer ×2    # standard attention with local context
↓
Stage 1.5: Structural Gate Heads     # topology, neighbors, surface role
↓
Stage 2: Geometric Transformer ×2    # gated attention modulated by all 17 gates
↓
Stage 3: Classification Heads        # 27-class shape recognition
```

The geometric transformer blocks use gate-modulated attention: Q and K are projected from `[hidden, all_gates]`, V is multiplicatively gated, and per-head compatibility scores are computed from gate interactions.
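A single-head sketch of that gated attention, assuming hidden size 128 and the 17 concatenated gates. The class and layer names are illustrative (not the repo's actual modules), and the per-head gate-compatibility scores are omitted for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttentionSketch(nn.Module):
    """Illustrative single-head gate-modulated attention; the real
    blocks are multi-head and add gate-compatibility scores."""
    def __init__(self, hidden: int = 128, n_gates: int = 17):
        super().__init__()
        self.q_proj = nn.Linear(hidden + n_gates, hidden)  # Q from [hidden, gates]
        self.k_proj = nn.Linear(hidden + n_gates, hidden)  # K from [hidden, gates]
        self.v_proj = nn.Linear(hidden, hidden)
        self.v_gate = nn.Linear(n_gates, hidden)           # multiplicative V gate
        self.scale = hidden ** -0.5

    def forward(self, x: torch.Tensor, gates: torch.Tensor) -> torch.Tensor:
        # x: (B, 64, hidden), gates: (B, 64, n_gates)
        xg = torch.cat([x, gates], dim=-1)
        q, k = self.q_proj(xg), self.k_proj(xg)
        v = self.v_proj(x) * torch.sigmoid(self.v_gate(gates))  # gated values
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return attn @ v

x = torch.randn(2, 64, 128)
gates = torch.rand(2, 64, 17)
out = GatedAttentionSketch()(x, gates)  # (2, 64, 128)
```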

## The Rosetta Stone Discovery

This model was used as the analyzer in the [GeoVAE Proto experiments](https://huggingface.co/AbstractPhil/geovae-proto), which proved that text descriptions produce **2.5–3.5× stronger geometric differentiation** than actual images when projected through a lightweight VAE into this model's patch space.

| Source | patch_feat discriminability |
|---|---|
| FLUX images (49k) | +0.020 |
| flan-t5-small text | +0.053 |
| bert-base-uncased text | +0.053 |
| bert-beatrix-2048 text | +0.050 |

Three architecturally different text encoders converge to within ±5% of each other — the geometric structure is in the language, not the encoder. This model reads it.
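The exact discriminability score is defined in the geovae-proto experiments; a common formulation for this kind of number, used here purely as an illustrative assumption, is the gap between mean same-class and mean cross-class cosine similarity of pooled patch features:

```python
import torch
import torch.nn.functional as F

def cosine_gap(features: torch.Tensor, labels: torch.Tensor) -> float:
    """Illustrative discriminability score: mean same-class minus mean
    cross-class cosine similarity. The metric actually used in the
    geovae-proto experiments may differ."""
    f = F.normalize(features, dim=-1)        # unit-norm feature vectors
    sim = f @ f.T                            # pairwise cosine similarities
    same = labels[:, None] == labels[None, :]
    diag = torch.eye(len(labels), dtype=torch.bool)
    intra = sim[same & ~diag].mean()         # same class, excluding self-pairs
    inter = sim[~same].mean()                # different classes
    return float(intra - inter)

feats = torch.tensor([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])
labels = torch.tensor([0, 0, 1, 1])
gap = cosine_gap(feats, labels)  # perfectly separated classes -> gap of 1.0
```

Higher is better: a positive gap means samples of the same class sit closer together in feature space than samples of different classes.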

## Training

Trained on procedurally generated multi-shape superposition grids (2–4 overlapping geometric primitives per sample, 27 shape classes). Two-tier gate supervision uses ground truth computed from voxel analysis:

- **Local gates**: dimensionality from axis extent, curvature from fill ratio, boundary from partial occupancy
- **Structural gates**: topology from 3D-convolution neighbor counting, surface role from neighbor density thresholds
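A sketch of that ground-truth computation for a single `(8, 16, 16)` occupancy grid. The function name, 0.5 occupancy threshold, and "extent > 1" rule are assumptions for illustration, not the actual training code:

```python
import torch
import torch.nn.functional as F

def voxel_gate_targets(grid: torch.Tensor):
    """Illustrative ground-truth gates for one (8, 16, 16) grid:
    dimensionality from axis extent, neighbor density via 3D conv."""
    occ = grid > 0.5  # assumed occupancy threshold
    # Dimensionality from axis extent: count axes spanning more than one voxel.
    extents = [
        int((occ.sum(dim=tuple(d for d in range(3) if d != a)) > 0).sum())
        for a in range(3)
    ]
    dim_class = sum(e > 1 for e in extents)   # 0 -> point ... 3 -> volume
    # Neighbor counting with a 3x3x3 convolution (26-neighborhood).
    kernel = torch.ones(1, 1, 3, 3, 3)
    kernel[0, 0, 1, 1, 1] = 0.0               # exclude the center voxel
    neigh = F.conv3d(occ.float()[None, None], kernel, padding=1)[0, 0]
    density = (neigh / 26.0)[occ]             # normalized count per filled voxel
    return dim_class, density
```

An isolated voxel yields dimensionality class 0 and zero neighbor density; a filled line along one axis yields class 1, with interior voxels at 2/26 density.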

Trained for 200 epochs, achieving 93.8% recall on shape classification with explicit geometric property prediction as auxiliary objectives.

## Files

| File | Description |
|---|---|
| `geometric_model.py` | Standalone model + `load_from_hub()` + `extract_features()` |
| `model.pt` | Pretrained weights (epoch 200) |

## Usage

```python
import torch
from geometric_model import SuperpositionPatchClassifier, load_from_hub, extract_features

# Load the pretrained model
model = load_from_hub()

# From any (8, 16, 16) source
patches = torch.randn(16, 8, 16, 16).cuda()
gate_vectors, patch_features = extract_features(model, patches)

# Or get the full output dict
out = model(patches)
out["local_dim_logits"]    # (B, 64, 4) dimensionality
out["local_curv_logits"]   # (B, 64, 3) curvature
out["struct_topo_logits"]  # (B, 64, 2) topology
out["patch_features"]      # (B, 64, 128) learned features
out["patch_shape_logits"]  # (B, 64, 27) shape classification
```

## Related

- [AbstractPhil/geovae-proto](https://huggingface.co/AbstractPhil/geovae-proto) — The Rosetta Stone experiments (text→geometry VAEs)
- [AbstractPhil/synthetic-characters](https://huggingface.co/datasets/AbstractPhil/synthetic-characters) — 49k FLUX-generated character dataset
- [AbstractPhil/grid-geometric-multishape](https://huggingface.co/AbstractPhil/grid-geometric-multishape) — Original training repo with checkpoints

## Citation

Geometric deep learning research by [AbstractPhil](https://huggingface.co/AbstractPhil). The model demonstrates that geometric structure is a universal language bridging text and visual modalities — symbolic association through geometric language.