	---
	license: apache-2.0
	tags:
	- geometric-deep-learning
	- vae
	- patch-analysis
	- gate-vectors
	- text-to-geometry
	- rosetta-stone
	- multimodal
	- experimental
	- custom_code
	datasets:
	- AbstractPhil/synthetic-characters
	---

	# GeoVocab Patch Maker

	A geometric vocabulary extractor that reads structural properties from latent patches, and that was used to show that text carries the same geometric structure as images.

	This is a two-tier gated geometric transformer trained on 27 geometric primitives (point through channel) in 8×16×16 voxel grids. It extracts 17-dimensional gate vectors (explicit geometric properties) and 256-dimensional patch features (learned representations) from any compatible latent input.

	## What It Does

	Takes an `(8, 16, 16)` tensor — originally voxel grids, but proven to work on adapted FLUX VAE latents and text-derived latent patches — and produces per-patch geometric descriptors:

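	```python
	import torch
	from geometric_model import load_from_hub, extract_features

	model = load_from_hub()
	device = next(model.parameters()).device
	# Placeholder batch of N=4 inputs; substitute real (8, 16, 16) patches here
	patches = torch.randn(4, 8, 16, 16, device=device)
	gate_vectors, patch_features = extract_features(model, patches)
	# gate_vectors: (N, 64, 17), interpretable geometric properties
	# patch_features: (N, 64, 256), learned representations
	```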

	### Gate Vector Anatomy (17 dimensions)

	| Dims | Property | Type | Meaning |
	|---|---|---|---|
	| 0–3 | dimensionality | softmax(4) | 0D point, 1D line, 2D surface, 3D volume |
	| 4–6 | curvature | softmax(3) | rigid, curved, combined |
	| 7 | boundary | sigmoid(1) | partial fill (surface patch) |
	| 8–10 | axis_active | sigmoid(3) | which axes have spatial extent |
	| 11–12 | topology | softmax(2) | open vs closed (neighbor-based) |
	| 13 | neighbor_density | sigmoid(1) | normalized neighbor count |
	| 14–16 | surface_role | softmax(3) | isolated, boundary, interior |

	Dimensions 0–10 are local (intrinsic to each patch, no cross-patch info). Dimensions 11–16 are structural (relational, computed after attention sees neighborhood context).
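
The layout in the table can be turned into a small lookup for slicing a gate vector by name. This is a minimal sketch in plain Python for a single patch; `GATE_LAYOUT`, `LOCAL_FIELDS`, and `split_gate_vector` are hypothetical helper names, not part of `geometric_model.py`, and the example values are made up.

```python
# Field layout copied from the table above: name -> (start, stop), stop exclusive.
GATE_LAYOUT = {
    "dimensionality": (0, 4),
    "curvature": (4, 7),
    "boundary": (7, 8),
    "axis_active": (8, 11),
    "topology": (11, 13),
    "neighbor_density": (13, 14),
    "surface_role": (14, 17),
}
# Dimensions 0-10 are local; the remaining fields are structural.
LOCAL_FIELDS = ("dimensionality", "curvature", "boundary", "axis_active")


def split_gate_vector(gate):
    """Split one 17-element gate vector into named fields (hypothetical helper)."""
    if len(gate) != 17:
        raise ValueError(f"expected 17 gate values, got {len(gate)}")
    return {name: list(gate[a:b]) for name, (a, b) in GATE_LAYOUT.items()}


# Made-up gate vector for a single patch, for illustration only.
example = [0.1, 0.7, 0.1, 0.1,   # dimensionality
           0.8, 0.1, 0.1,        # curvature
           0.3,                  # boundary
           0.9, 0.2, 0.1,        # axis_active
           0.6, 0.4,             # topology
           0.5,                  # neighbor_density
           0.2, 0.5, 0.3]        # surface_role
fields = split_gate_vector(example)
dim_index = max(range(4), key=lambda i: fields["dimensionality"][i])
print(fields["axis_active"], dim_index)
```

For a batched tensor of shape `(N, 64, 17)`, the same `(start, stop)` pairs can be applied on the last axis, for example `gate_vectors[..., 0:4]`.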

	## Architecture

	```
	(8, 16, 16) input
	↓
	PatchEmbedding3D → (B, 64, 64) # 64 patches of 32 voxels each
	↓
	Stage 0: Local Encoder + Gate Heads # dims, curvature, boundary, axes
	↓
	proj([embedding, local_gates]) → (B, 64, 128)
	↓
	Stage 1: Bootstrap Transformer ×2 # standard attention with local context
	↓
	Stage 1.5: Structural Gate Heads # topology, neighbors, surface role
	↓
	Stage 2: Geometric Transformer ×2 # gated attention modulated by all 17 gates
	↓
	Stage 3: Classification Heads # 27-class shape recognition
	```
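
The first step of the diagram, splitting an `(8, 16, 16)` grid into 64 patches of 32 voxels, can be sketched as a reshape. The patch shape `(2, 4, 4)` is an assumption chosen because it gives 64 patches of 32 voxels; the README does not state it, and the real `PatchEmbedding3D` may be implemented differently (for example with a strided convolution). The `nn.Linear(32, 64)` projection is likewise a stand-in for whatever produces the `(B, 64, 64)` embedding.

```python
import torch

GRID = (8, 16, 16)   # input grid, from the diagram above
PATCH = (2, 4, 4)    # ASSUMED patch shape: 2 * 4 * 4 = 32 voxels


def patchify(x: torch.Tensor) -> torch.Tensor:
    """(B, 8, 16, 16) -> (B, 64, 32): flatten non-overlapping 3D patches."""
    b = x.shape[0]
    (d, h, w), (pd, ph, pw) = GRID, PATCH
    x = x.reshape(b, d // pd, pd, h // ph, ph, w // pw, pw)
    x = x.permute(0, 1, 3, 5, 2, 4, 6)   # patch-grid axes first, then in-patch axes
    return x.reshape(b, -1, pd * ph * pw)


grids = torch.arange(2 * 8 * 16 * 16, dtype=torch.float32).reshape(2, 8, 16, 16)
patches = patchify(grids)
print(patches.shape)   # torch.Size([2, 64, 32])

# Hypothetical stand-in for the learned embedding: 32 voxels -> 64 features.
embed = torch.nn.Linear(32, 64)
tokens = embed(patches)
print(tokens.shape)    # torch.Size([2, 64, 64])
```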

	The geometric transformer blocks use gate-modulated attention: Q and K are projected from `[hidden, all_gates]`, V is multiplicatively gated, and per-head compatibility scores are computed from gate interactions.
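
One way to read that description is sketched below. It is illustrative only: the layer names, the head count, the sigmoid gate on V, and the bilinear form used for the per-head compatibility term are all assumptions, not the code in `geometric_model.py`.

```python
import torch
import torch.nn as nn


class GatedAttentionSketch(nn.Module):
    """Illustrative gate-modulated attention; not the released implementation."""

    def __init__(self, dim=128, gate_dim=17, heads=4):
        super().__init__()
        self.heads, self.head_dim = heads, dim // heads
        self.q = nn.Linear(dim + gate_dim, dim)    # Q from [hidden, all_gates]
        self.k = nn.Linear(dim + gate_dim, dim)    # K from [hidden, all_gates]
        self.v = nn.Linear(dim, dim)
        self.v_gate = nn.Linear(gate_dim, dim)     # multiplicative gate on V
        # Assumed form: one learned gate-to-gate map per head.
        self.compat = nn.Linear(gate_dim, heads * gate_dim)
        self.out = nn.Linear(dim, dim)

    def _split(self, t):
        b, n, _ = t.shape
        return t.view(b, n, self.heads, -1).transpose(1, 2)   # (b, heads, n, d)

    def forward(self, hidden, gates):
        b, n, _ = hidden.shape
        hg = torch.cat([hidden, gates], dim=-1)
        q, k = self._split(self.q(hg)), self._split(self.k(hg))
        v = self._split(self.v(hidden) * torch.sigmoid(self.v_gate(gates)))
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5   # (b, heads, n, n)
        # Per-head compatibility from gate interactions, added as an attention bias.
        gq = self._split(self.compat(gates))                      # (b, heads, n, gate_dim)
        scores = scores + gq @ gates.unsqueeze(1).transpose(-2, -1)
        attn = scores.softmax(dim=-1)
        mixed = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        return self.out(mixed)


layer = GatedAttentionSketch()
hidden = torch.randn(2, 64, 128)   # (B, patches, hidden), as in the diagram
gates = torch.rand(2, 64, 17)      # all 17 gate values per patch
print(layer(hidden, gates).shape)  # torch.Size([2, 64, 128])
```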

	## The Rosetta Stone Discovery

	This model was used as the analyzer in the [GeoVAE Proto experiments](https://huggingface.co/AbstractPhil/geovae-proto), which proved that text descriptions produce 2.5–3.5× stronger geometric differentiation than actual images when projected through a lightweight VAE into this model's patch space.

	| Source | patch_feat discriminability |
	|---|---|
	| FLUX images (49k) | +0.020 |
	| flan-t5-small text | +0.053 |
	| bert-base-uncased text | +0.053 |
	| bert-beatrix-2048 text | +0.050 |

	Three architecturally different text encoders land within ±5% of each other: the geometric structure is in the language, not the encoder. This model reads it.
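
The arithmetic behind these comparisons can be checked directly from the four table rows. The sketch below uses only those values; the variable names are illustrative.

```python
# Discriminability values copied from the table above.
scores = {
    "FLUX images (49k)": 0.020,
    "flan-t5-small text": 0.053,
    "bert-base-uncased text": 0.053,
    "bert-beatrix-2048 text": 0.050,
}
image = scores["FLUX images (49k)"]
text = {k: v for k, v in scores.items() if k.endswith("text")}

# Ratio of each text encoder to the image baseline.
ratios = {k: v / image for k, v in text.items()}

# Relative deviation of each text encoder from the mean of the three.
mean_text = sum(text.values()) / len(text)
spread = {k: (v - mean_text) / mean_text for k, v in text.items()}

for name in text:
    print(f"{name}: {ratios[name]:.2f}x image, {spread[name]:+.1%} vs text mean")
```

For the rows shown here the ratios come out at 2.50× to 2.65×, the low end of the quoted 2.5–3.5× range, and each text encoder sits within 4% of the mean of the three.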

	## Training

	Trained on procedurally generated multi-shape superposition grids (2–4 overlapping geometric primitives per sample, 27 shape classes). Two-tier gate supervision with ground truth computed from voxel analysis:

	- Local gates: dimensionality from axis extent, curvature from fill ratio, boundary from partial occupancy
	- Structural gates: topology from 3D convolution neighbor counting, surface role from neighbor density thresholds
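
As a rough sketch of the structural ground truth, neighbor counting with a 3D convolution might look like the following. The 6-connectivity kernel, the thresholds `low` and `high`, and the per-voxel granularity are assumptions for illustration; the README does not state them, and the actual supervision is per patch.

```python
import torch
import torch.nn.functional as F


def neighbor_counts(occ: torch.Tensor) -> torch.Tensor:
    """Count occupied face-adjacent neighbors of each voxel in a (D, H, W) grid."""
    kernel = torch.zeros(1, 1, 3, 3, 3)
    faces = [(0, 1, 1), (2, 1, 1), (1, 0, 1), (1, 2, 1), (1, 1, 0), (1, 1, 2)]
    for dz, dy, dx in faces:
        kernel[0, 0, dz, dy, dx] = 1.0            # 6-connectivity (an assumption)
    x = occ.float()[None, None]                   # -> (1, 1, D, H, W)
    return F.conv3d(x, kernel, padding=1)[0, 0]   # zero padding outside the grid


def surface_role(occ: torch.Tensor, low: float = 0.2, high: float = 0.9) -> torch.Tensor:
    """0 = isolated, 1 = boundary, 2 = interior, -1 = empty; thresholds are illustrative."""
    density = neighbor_counts(occ) / 6.0          # normalized neighbor count
    role = torch.ones_like(density, dtype=torch.long)
    role[density <= low] = 0
    role[density >= high] = 2
    return role.masked_fill(occ == 0, -1)


grid = torch.zeros(8, 16, 16)
grid[2:5, 4:7, 4:7] = 1    # solid 3x3x3 cube
grid[7, 15, 15] = 1        # lone voxel
roles = surface_role(grid)
print(roles[3, 5, 5].item(), roles[2, 4, 4].item(), roles[7, 15, 15].item())
```

In this toy grid the cube center has all six neighbors filled and is marked interior, a cube corner has three and is marked boundary, and the lone voxel has none and is marked isolated.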

	Training ran for 200 epochs and reached 93.8% recall on shape classification, with explicit geometric property prediction as auxiliary objectives.

	## Files

	| File | Description |
	|---|---|
	| `geometric_model.py` | Standalone model + `load_from_hub()` + `extract_features()` |
	| `model.pt` | Pretrained weights (epoch 200) |

	## Usage

	```python
	import torch
	from geometric_model import SuperpositionPatchClassifier, load_from_hub, extract_features

	# Load pretrained weights and switch to inference mode
	model = load_from_hub().eval()
	device = next(model.parameters()).device

	# From any (8, 16, 16) source; the batch must live on the model's device
	patches = torch.randn(16, 8, 16, 16, device=device)
	gate_vectors, patch_features = extract_features(model, patches)

	# Or full output dict
	with torch.no_grad():
	    out = model(patches)
	out["local_dim_logits"] # (B, 64, 4) dimensionality
	out["local_curv_logits"] # (B, 64, 3) curvature
	out["struct_topo_logits"] # (B, 64, 2) topology
	out["patch_features"] # (B, 64, 128) learned features
	out["patch_shape_logits"] # (B, 64, 27) shape classification
	```
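
To turn the logits in the output dict into per-patch labels, a softmax and argmax over the last axis is usually enough. The sketch below runs on a random stand-in dict so it does not need the weights. The dict keys come from the snippet above, but the label order within each head is an assumption taken from the gate table, and `decode` is a hypothetical helper.

```python
import torch

# Label order follows the gate table; the index order is an assumption.
LABELS = {
    "local_dim_logits": ["0D point", "1D line", "2D surface", "3D volume"],
    "local_curv_logits": ["rigid", "curved", "combined"],
    "struct_topo_logits": ["open", "closed"],
}


def decode(out: dict) -> dict:
    """Map each (B, 64, C) logits tensor to class probabilities and indices."""
    decoded = {}
    for key, names in LABELS.items():
        probs = out[key].softmax(dim=-1)
        decoded[key] = {"probs": probs, "index": probs.argmax(dim=-1), "names": names}
    return decoded


# Stand-in for `out = model(patches)` with a batch of 2.
fake_out = {key: torch.randn(2, 64, len(names)) for key, names in LABELS.items()}
result = decode(fake_out)

dims = result["local_dim_logits"]
print(dims["names"][dims["index"][0, 0].item()])   # label of the first patch
```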

	## Related

	- [AbstractPhil/geovae-proto](https://huggingface.co/AbstractPhil/geovae-proto) — The Rosetta Stone experiments (text→geometry VAEs)
	- [AbstractPhil/synthetic-characters](https://huggingface.co/datasets/AbstractPhil/synthetic-characters) — 49k FLUX-generated character dataset
	- [AbstractPhil/grid-geometric-multishape](https://huggingface.co/AbstractPhil/grid-geometric-multishape) — Original training repo with checkpoints

	## Citation

	Geometric deep learning research by [AbstractPhil](https://huggingface.co/AbstractPhil). The model demonstrates that geometric structure is a universal language bridging text and visual modalities — symbolic association through geometric language.