---
license: apache-2.0
tags:
- geometric-deep-learning
- vae
- patch-analysis
- gate-vectors
- text-to-geometry
- rosetta-stone
- multimodal
- experimental
- custom_code
datasets:
- AbstractPhil/synthetic-characters
---

# GeoVocab Patch Maker

**A geometric vocabulary extractor that reads structural properties from latent patches; it was used to show that text carries the same geometric structure as images.**

This is a two-tier gated geometric transformer trained on 27 geometric primitives (point through channel) in 8×16×16 voxel grids. It extracts 17-dimensional gate vectors (explicit geometric properties) and 256-dimensional patch features (learned representations) from any compatible latent input.

## What It Does

Takes an `(8, 16, 16)` tensor (originally voxel grids, but proven to work on adapted FLUX VAE latents and text-derived latent patches) and produces per-patch geometric descriptors:

```python
from geometric_model import load_from_hub, extract_features

model = load_from_hub()
gate_vectors, patch_features = extract_features(model, patches)
# gate_vectors:   (N, 64, 17)  – interpretable geometric properties
# patch_features: (N, 64, 256) – learned representations
```

### Gate Vector Anatomy (17 dimensions)

| Dims | Property | Type | Meaning |
|---|---|---|---|
| 0–3 | dimensionality | softmax(4) | 0D point, 1D line, 2D surface, 3D volume |
| 4–6 | curvature | softmax(3) | rigid, curved, combined |
| 7 | boundary | sigmoid(1) | partial fill (surface patch) |
| 8–10 | axis_active | sigmoid(3) | which axes have spatial extent |
| 11–12 | topology | softmax(2) | open vs. closed (neighbor-based) |
| 13 | neighbor_density | sigmoid(1) | normalized neighbor count |
| 14–16 | surface_role | softmax(3) | isolated, boundary, interior |

Dimensions 0–10 are **local** (intrinsic to each patch, no cross-patch information). Dimensions 11–16 are **structural** (relational, computed after attention sees neighborhood context).

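As a minimal sketch, a gate vector can be split back into its named properties using the slice boundaries from the table above (the `split_gates` helper is illustrative, not part of the released API):

```python
import torch

# Slice layout taken from the gate-vector anatomy table.
GATE_SLICES = {
    "dimensionality":   (0, 4),    # softmax(4)
    "curvature":        (4, 7),    # softmax(3)
    "boundary":         (7, 8),    # sigmoid(1)
    "axis_active":      (8, 11),   # sigmoid(3)
    "topology":         (11, 13),  # softmax(2)
    "neighbor_density": (13, 14),  # sigmoid(1)
    "surface_role":     (14, 17),  # softmax(3)
}

def split_gates(gate_vectors: torch.Tensor) -> dict:
    """Split (N, 64, 17) gate vectors into named per-property tensors."""
    assert gate_vectors.shape[-1] == 17
    return {name: gate_vectors[..., a:b] for name, (a, b) in GATE_SLICES.items()}

gates = torch.rand(2, 64, 17)                    # stand-in for real model output
props = split_gates(gates)
dim_class = props["dimensionality"].argmax(-1)   # (2, 64), values in {0: point, 1: line, 2: surface, 3: volume}
```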
## Architecture

```
(8, 16, 16) input
        ↓
PatchEmbedding3D → (B, 64, 64)         # 64 patches of 32 voxels each
        ↓
Stage 0: Local Encoder + Gate Heads    # dims, curvature, boundary, axes
        ↓
proj([embedding, local_gates]) → (B, 64, 128)
        ↓
Stage 1: Bootstrap Transformer ×2      # standard attention with local context
        ↓
Stage 1.5: Structural Gate Heads       # topology, neighbors, surface role
        ↓
Stage 2: Geometric Transformer ×2      # gated attention modulated by all 17 gates
        ↓
Stage 3: Classification Heads          # 27-class shape recognition
```

The geometric transformer blocks use gate-modulated attention: Q and K are projected from `[hidden, all_gates]`, V is multiplicatively gated, and per-head compatibility scores are computed from gate interactions.

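A minimal single-head sketch of that gate-modulated attention idea (layer names, sizes, and the sigmoid gating choice here are illustrative assumptions; `geometric_model.py` is the reference implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttentionSketch(nn.Module):
    """Toy single-head gate-modulated attention: Q/K see gates, V is gated."""
    def __init__(self, hidden: int = 128, gates: int = 17):
        super().__init__()
        self.q = nn.Linear(hidden + gates, hidden)  # Q projected from [hidden, gates]
        self.k = nn.Linear(hidden + gates, hidden)  # K projected from [hidden, gates]
        self.v = nn.Linear(hidden, hidden)
        self.v_gate = nn.Linear(gates, hidden)      # multiplicative gate on V

    def forward(self, x: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        # x: (B, patches, hidden), g: (B, patches, 17)
        xg = torch.cat([x, g], dim=-1)
        q, k = self.q(xg), self.k(xg)
        v = self.v(x) * torch.sigmoid(self.v_gate(g))            # gated values
        attn = F.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return attn @ v

blk = GatedAttentionSketch()
out = blk(torch.randn(2, 64, 128), torch.rand(2, 64, 17))  # (2, 64, 128)
```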
## The Rosetta Stone Discovery

This model was used as the analyzer in the [GeoVAE Proto experiments](https://huggingface.co/AbstractPhil/geovae-proto), which showed that text descriptions produce **2.5–3.5× stronger geometric differentiation** than actual images when projected through a lightweight VAE into this model's patch space.

| Source | patch_feat discriminability |
|---|---|
| FLUX images (49k) | +0.020 |
| flan-t5-small text | +0.053 |
| bert-base-uncased text | +0.053 |
| bert-beatrix-2048 text | +0.050 |

Three architecturally different text encoders converge to within ±5% of each other: the geometric structure is in the language, not the encoder. This model reads it.

## Training

Trained on procedurally generated multi-shape superposition grids (2–4 overlapping geometric primitives per sample, 27 shape classes). Two-tier gate supervision uses ground truth computed from voxel analysis:

- **Local gates**: dimensionality from axis extent, curvature from fill ratio, boundary from partial occupancy
- **Structural gates**: topology from 3D-convolution neighbor counting, surface role from neighbor density thresholds

Trained for 200 epochs, reaching 93.8% recall on shape classification, with explicit geometric-property prediction as auxiliary objectives.

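The neighbor-counting ground truth above can be sketched with a single 3D convolution over the binary occupancy grid (the 6-connected kernel is an illustrative assumption; the training code may use a different neighborhood):

```python
import torch
import torch.nn.functional as F

def neighbor_counts(occ: torch.Tensor) -> torch.Tensor:
    """Count occupied 6-connected neighbors per occupied voxel via conv3d.

    occ: (D, H, W) binary occupancy grid.
    """
    kernel = torch.zeros(1, 1, 3, 3, 3)
    kernel[0, 0, 1, 1, 0] = kernel[0, 0, 1, 1, 2] = 1  # ±W neighbors
    kernel[0, 0, 1, 0, 1] = kernel[0, 0, 1, 2, 1] = 1  # ±H neighbors
    kernel[0, 0, 0, 1, 1] = kernel[0, 0, 2, 1, 1] = 1  # ±D neighbors
    counts = F.conv3d(occ[None, None].float(), kernel, padding=1)[0, 0]
    return counts * occ  # keep counts only where voxels are occupied

occ = torch.zeros(8, 16, 16)
occ[4, 8, 6:10] = 1            # a short 1D line segment along W
counts = neighbor_counts(occ)
# interior voxels of the line have 2 neighbors, endpoints have 1
```

Thresholding these counts gives the surface-role labels (isolated, boundary, interior) used as structural-gate supervision.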
## Files

| File | Description |
|---|---|
| `geometric_model.py` | Standalone model + `load_from_hub()` + `extract_features()` |
| `model.pt` | Pretrained weights (epoch 200) |

## Usage

```python
import torch
from geometric_model import SuperpositionPatchClassifier, load_from_hub, extract_features

# Load the pretrained model
model = load_from_hub()

# From any (8, 16, 16) source
patches = torch.randn(16, 8, 16, 16).cuda()
gate_vectors, patch_features = extract_features(model, patches)

# Or the full output dict
out = model(patches)
out["local_dim_logits"]    # (B, 64, 4)   dimensionality
out["local_curv_logits"]   # (B, 64, 3)   curvature
out["struct_topo_logits"]  # (B, 64, 2)   topology
out["patch_features"]      # (B, 64, 128) learned features
out["patch_shape_logits"]  # (B, 64, 27)  shape classification
```

## Related

- [AbstractPhil/geovae-proto](https://huggingface.co/AbstractPhil/geovae-proto) – the Rosetta Stone experiments (text→geometry VAEs)
- [AbstractPhil/synthetic-characters](https://huggingface.co/datasets/AbstractPhil/synthetic-characters) – 49k FLUX-generated character dataset
- [AbstractPhil/grid-geometric-multishape](https://huggingface.co/AbstractPhil/grid-geometric-multishape) – original training repo with checkpoints

## Citation

Geometric deep learning research by [AbstractPhil](https://huggingface.co/AbstractPhil). The model demonstrates that geometric structure is a universal language bridging text and visual modalities: symbolic association through geometric language.