---
license: mit
tags:
- geometric-deep-learning
- voxel-classifier
- cross-contrast
- pentachoron
- contrastive-learning
- 3d-classification
pipeline_tag: other
---

# Grid Geometric Classifier Proto

A prototype system for geometric primitive classification and text-geometry alignment. A voxel classifier learns to identify 38 shape classes from 5×5×5 binary occupancy grids using capacity cascades, curvature analysis, differentiation gates, and a rectified flow arbiter. A cross-contrast module then aligns the classifier's learned features with Qwen 2.5-1.5B text embeddings via InfoNCE, producing a shared latent space where geometric structure and natural language descriptions are jointly represented.

This is a research prototype exploring whether a geometric vocabulary learned from pure structure can meaningfully align with linguistic semantics.

## Repository Structure

```
geometric_classifier/            # Voxel classifier (~1.85M params)
├── config.json                  # Architecture: dims, classes, shape catalog
├── training_config.json         # Hyperparams, loss weights, results
└── model.safetensors            # Weights

crosscontrast/                   # Text-voxel alignment heads
├── config.json                  # Projection dims, latent space config
├── training_config.json         # Contrastive training params & results
├── text_proj.safetensors        # Text → latent projection
├── voxel_proj.safetensors       # Voxel → latent projection
└── temperature.safetensors      # Learned temperature scalar

qwen_embeddings/                 # Cached Qwen 2.5-1.5B embeddings
├── config.json                  # Model name, hidden dim, extraction method
├── embeddings.safetensors       # (38, 1536) class embeddings
└── descriptions.json            # Natural language shape descriptions
```

## Shape Vocabulary: 38 Classes

The vocabulary spans 0D to 3D primitives, both rigid and curved, organized by intrinsic dimensionality:

| Dim | Rigid | Curved |
|-----|-------|--------|
| 0D | point | - |
| 1D | line_x, line_y, line_z, line_diag, cross, l_shape, collinear | arc, helix |
| 2D | triangle_xy, triangle_xz, triangle_3d, square_xy, square_xz, rectangle, coplanar, plane | circle, ellipse, disc |
| 3D | tetrahedron, pyramid, pentachoron, cube, cuboid, triangular_prism, octahedron | sphere, hemisphere, cylinder, cone, capsule, torus, shell, tube, bowl, saddle |

Eight curvature types: `none`, `convex`, `concave`, `cylindrical`, `conical`, `toroidal`, `hyperbolic`, `helical`.

## Architecture

### GeometricShapeClassifier (v8)

Input is a 5×5×5 binary voxel grid. The forward pass has four stages:

**1. Tracer Attention.** Five learned tracer tokens attend over 125 voxel embeddings (occupancy + normalized 3D position → 64-dim via MLP). Interaction features and edge detection scores are computed for all C(5,2) = 10 tracer pairs via SwiGLU heads. Pool dimension: 320 (5 tracers × 64-dim).

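A minimal sketch of the tracer stage, assuming standard multi-head cross-attention and positions linearly spaced in [0, 1]. The class name `TracerAttentionSketch`, the head count, and the pair-feature layout (simple concatenation) are hypothetical, and the SwiGLU interaction and edge heads are omitted. Only the shapes (125 voxels, 5 tracers, 64-dim embeddings, 10 pairs, 320-dim pool) come from the description above.

```python
import itertools

import torch
import torch.nn as nn


class TracerAttentionSketch(nn.Module):
    """Hypothetical stand-in: learned tracer tokens cross-attend over voxel embeddings."""

    def __init__(self, n_tracers=5, dim=64, n_heads=4):
        super().__init__()
        # Per-voxel input: occupancy (1) + normalized xyz position (3) -> dim via an MLP
        self.embed = nn.Sequential(nn.Linear(4, dim), nn.GELU(), nn.Linear(dim, dim))
        self.tracers = nn.Parameter(torch.randn(n_tracers, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        # All unordered tracer pairs: C(5, 2) = 10
        self.pairs = list(itertools.combinations(range(n_tracers), 2))

    def forward(self, grid):
        b, g = grid.shape[0], grid.shape[-1]
        axis = torch.linspace(0.0, 1.0, g, device=grid.device)
        pos = torch.stack(torch.meshgrid(axis, axis, axis, indexing="ij"), dim=-1)
        pos = pos.reshape(1, -1, 3).expand(b, -1, -1)            # (B, 125, 3)
        occ = grid.reshape(b, -1, 1).float()                     # (B, 125, 1)
        voxels = self.embed(torch.cat([occ, pos], dim=-1))       # (B, 125, 64)
        queries = self.tracers.unsqueeze(0).expand(b, -1, -1)    # (B, 5, 64)
        traced, _ = self.attn(queries, voxels, voxels)           # (B, 5, 64)
        pooled = traced.flatten(1)                               # (B, 320)
        left, right = zip(*self.pairs)
        # Concatenate the two tracers of each pair as raw pair features
        pair_feats = torch.cat([traced[:, list(left)], traced[:, list(right)]], dim=-1)
        return pooled, pair_feats                                # (B, 320), (B, 10, 128)
```

Calling `TracerAttentionSketch()(torch.zeros(2, 5, 5, 5))` returns a `(2, 320)` pooled tensor and a `(2, 10, 128)` pair tensor that downstream heads could consume.
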
**2. Capacity Cascade.** Four `CapacityHead` modules with learned capacities (initialized at 0.5, 1.0, 1.5, 2.0) process features sequentially. Each outputs a fill ratio (sigmoid), an overflow signal, and residual features. The cascade partitions representation capacity across intrinsic dimensions (0D to 3D), with fill ratios serving as soft dimensionality indicators.

**3. Curvature Analysis.** A `DifferentiationGate` computes radial distance profiles binned into 5 shells, producing sigmoid gates and additive directional features that differentiate convex from concave curvature. A `CurvatureHead` combines rigid features with gated curvature features to predict is_curved (binary), curvature_type (8-class), and a curvature embedding used downstream.

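To make the "radial distance profile binned into 5 shells" concrete, here is a minimal sketch. It assumes distances are measured from the grid center and that the shells are equal-width bins up to the corner radius; the README does not state either choice, and the function name `radial_shell_profile` is hypothetical. The actual gate is learned, whereas this only computes the raw profile it might start from.

```python
import torch
import torch.nn.functional as F


def radial_shell_profile(grid, n_shells=5):
    """Fraction of occupied voxels falling in each radial shell.

    grid: (B, G, G, G) binary occupancy. Returns (B, n_shells).
    """
    b, g = grid.shape[0], grid.shape[-1]
    axis = torch.arange(g, dtype=torch.float32) - (g - 1) / 2   # centered coordinates
    zz, yy, xx = torch.meshgrid(axis, axis, axis, indexing="ij")
    radius = torch.sqrt(xx**2 + yy**2 + zz**2).reshape(-1)      # (G^3,)
    # Equal-width shell edges from 0 to the corner radius (assumption)
    edges = torch.linspace(0.0, float(radius.max()), n_shells + 1)
    shell = torch.bucketize(radius, edges[1:-1])                # indices 0..n_shells-1
    one_hot = F.one_hot(shell, n_shells).float()                # (G^3, n_shells)
    occ = grid.reshape(b, -1).float()                           # (B, G^3)
    counts = occ @ one_hot                                      # occupied voxels per shell
    return counts / occ.sum(dim=1, keepdim=True).clamp(min=1.0)
```

A single voxel at the grid center puts all mass in the innermost shell, while a single corner voxel puts all mass in the outermost shell; hollow shapes such as `shell` or `bowl` would concentrate mass in the outer bins.
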
**4. Rectified Flow Arbiter.** For ambiguous cases, a `RectifiedFlowArbiter` integrates a learned velocity field over 4 flow-matching steps from noise to class prototypes. It produces refined logits, trajectory logits at each step, confidence scores, and a blend weight that gates between initial and refined predictions. It is trained with an OT-conditioned flow matching loss.

The final class prediction blends initial and arbiter-refined logits via the learned blend weight.

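The arbiter and the final blend can be sketched as a 4-step Euler integration of a conditional velocity field, followed by a sigmoid-gated mix of logits. Everything below is an illustrative assumption: the class name `FlowArbiterSketch`, the feature and latent sizes, distance-to-prototype logits, and the convex blend formula. The loss shown is plain rectified-flow matching with independent noise/target pairing, not the OT-conditioned variant named above, and confidence scores are omitted.

```python
import torch
import torch.nn as nn


class FlowArbiterSketch(nn.Module):
    """Hypothetical arbiter: noise -> class prototype via a learned velocity field."""

    def __init__(self, feat_dim=64, latent_dim=32, n_classes=38, n_steps=4):
        super().__init__()
        self.n_steps, self.latent_dim = n_steps, latent_dim
        self.prototypes = nn.Parameter(torch.randn(n_classes, latent_dim))
        # Velocity field v(z, features, t)
        self.velocity = nn.Sequential(
            nn.Linear(latent_dim + feat_dim + 1, 128), nn.SiLU(), nn.Linear(128, latent_dim)
        )
        self.blend = nn.Linear(feat_dim, 1)

    def logits_from(self, z):
        # Closer to a prototype -> larger logit (negative squared distance)
        return -torch.cdist(z, self.prototypes).pow(2)

    def forward(self, feats, initial_logits):
        b = feats.shape[0]
        z = torch.randn(b, self.latent_dim, device=feats.device)
        dt = 1.0 / self.n_steps
        trajectory = []
        for k in range(self.n_steps):
            t = torch.full((b, 1), k * dt, device=feats.device)
            z = z + dt * self.velocity(torch.cat([z, feats, t], dim=-1))  # Euler step
            trajectory.append(self.logits_from(z))
        w = torch.sigmoid(self.blend(feats))                              # (B, 1) blend weight
        final = (1 - w) * initial_logits + w * trajectory[-1]
        return final, trajectory, w


def flow_matching_loss(model, feats, labels):
    """Plain rectified-flow target: velocity should equal (z1 - z0) along the straight path."""
    z1 = model.prototypes[labels]
    z0 = torch.randn_like(z1)
    t = torch.rand(z1.shape[0], 1, device=feats.device)
    zt = (1 - t) * z0 + t * z1
    pred = model.velocity(torch.cat([zt, feats, t], dim=-1))
    return ((pred - (z1 - z0)) ** 2).mean()
```

With a blend weight near 0 the prediction stays with the initial logits; near 1 it defers to the arbiter, which matches the gating role described above.
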
### CrossContrastModel

Two MLP projection heads map frozen voxel features (645-dim) and frozen Qwen text embeddings (1536-dim) into a shared 256-dim latent space. Architecture per head: `Linear → LayerNorm → GELU → Linear → LayerNorm → GELU → Linear`. Trained with symmetric InfoNCE loss and a learned temperature parameter.

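A minimal sketch of one projection head and the symmetric InfoNCE objective. The layer order and the 645/1536/256 dimensions follow the description above; the hidden width of 512, the log-temperature parametrization, and the function names are assumptions. The loss treats row `i` of each batch as the only positive for row `i` of the other; how the actual training handles batches containing several samples of the same class is not stated here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def projection_head(in_dim, hidden_dim=512, out_dim=256):
    """Linear -> LayerNorm -> GELU -> Linear -> LayerNorm -> GELU -> Linear."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden_dim), nn.LayerNorm(hidden_dim), nn.GELU(),
        nn.Linear(hidden_dim, hidden_dim), nn.LayerNorm(hidden_dim), nn.GELU(),
        nn.Linear(hidden_dim, out_dim),
    )


def symmetric_info_nce(voxel_z, text_z, log_temp):
    """Average of voxel->text and text->voxel cross-entropy over cosine similarities."""
    v = F.normalize(voxel_z, dim=-1)
    t = F.normalize(text_z, dim=-1)
    logits = v @ t.T / log_temp.exp()                  # (B, B) similarity matrix
    targets = torch.arange(v.shape[0], device=v.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))


voxel_head = projection_head(645)    # frozen classifier features -> latent
text_head = projection_head(1536)    # frozen Qwen embeddings -> latent
log_temp = nn.Parameter(torch.log(torch.tensor(0.07)))  # learned; init 0.07
```

Storing the temperature in log space keeps it positive during optimization, which is a common choice rather than something the configuration confirms.
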
### Text Embeddings

Class descriptions are encoded by Qwen 2.5-1.5B-Instruct using mean-pooled last hidden states. Each of the 38 classes has a 2-shot geometric description (e.g., *"A flat triangular outline formed by three connected edges lying in the horizontal xy-plane, the simplest polygon"*).

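Mean pooling over the last hidden states might look like the sketch below. Masking out padding tokens is an assumption (the cached embeddings may have been pooled differently), and `mean_pool` is a hypothetical helper name. The commented lines show how the inputs would typically be obtained with the `transformers` library; they are left as comments so the block runs without downloading the model.

```python
import torch


def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors, ignoring padded positions.

    last_hidden_state: (B, T, H); attention_mask: (B, T) with 1 for real tokens.
    """
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # (B, T, 1)
    summed = (last_hidden_state * mask).sum(dim=1)                   # (B, H)
    counts = mask.sum(dim=1).clamp(min=1.0)                          # avoid divide-by-zero
    return summed / counts


# Typical source of the inputs (not executed here):
# from transformers import AutoModel, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
# lm = AutoModel.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
# batch = tok(descriptions, padding=True, return_tensors="pt")
# with torch.no_grad():
#     hidden = lm(**batch).last_hidden_state      # (38, T, 1536)
# class_embeddings = mean_pool(hidden, batch["attention_mask"])

hidden = torch.randn(38, 12, 1536)                # stand-in for real hidden states
mask = torch.ones(38, 12, dtype=torch.long)
class_embeddings = mean_pool(hidden, mask)        # (38, 1536)
```
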
## Training

### Classifier (Cell 3)

| Parameter | Value |
|-----------|-------|
| Dataset | 500K procedurally generated samples (400K train / 100K val) |
| Grid size | 5×5×5 binary occupancy |
| Batch size | 4,096 |
| Optimizer | AdamW (lr=3e-3, wd=1e-4) |
| Schedule | Cosine with 5-epoch warmup |
| Precision | BF16 autocast (no GradScaler) |
| Compile | torch.compile (default mode) |
| Augmentation | Voxel dropout (5%), random addition (5%), spatial shift (8%) |
| Epochs | 80 |

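The augmentation row can be read in more than one way. The sketch below assumes the dropout and addition rates are per-voxel probabilities, the shift rate is a per-sample probability, and a shift moves the grid by one voxel along a random axis with zero fill. All function names and these interpretations are assumptions, not the training code.

```python
import torch


def shift_zero_fill(vol, dim, step):
    """Shift a (G, G, G) volume by +1 or -1 along `dim`, filling the vacated slice with zeros."""
    shifted = torch.roll(vol, shifts=step, dims=dim)
    wrapped = 0 if step > 0 else vol.shape[dim] - 1   # slice that wrapped around
    shifted.select(dim, wrapped).zero_()
    return shifted


def augment(grid, p_drop=0.05, p_add=0.05, p_shift=0.08, generator=None):
    """grid: (B, G, G, G) float tensor of 0/1 occupancy. Returns an augmented copy."""
    drop = (grid > 0) & (torch.rand(grid.shape, generator=generator) < p_drop)
    add = (grid == 0) & (torch.rand(grid.shape, generator=generator) < p_add)
    out = grid.clone()
    out[drop] = 0.0   # voxel dropout: remove occupied voxels
    out[add] = 1.0    # random addition: fill empty voxels
    for b in range(out.shape[0]):
        if torch.rand(1, generator=generator).item() < p_shift:
            dim = int(torch.randint(0, 3, (1,), generator=generator).item())
            step = 1 if torch.rand(1, generator=generator).item() < 0.5 else -1
            out[b] = shift_zero_fill(out[b], dim, step)
    return out
```

Passing a seeded `torch.Generator` makes the augmentation reproducible, which helps when comparing runs.
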
The classifier is trained with a composite loss: cross-entropy on initial and refined logits, capacity fill ratio supervision, peak dimension classification, overflow regularization, capacity diversity, volume regression (log1p MSE), Cayley-Menger determinant sign prediction, curvature binary/type classification, flow matching loss, arbiter confidence calibration, and blend weight supervision, for 13 weighted terms in total.

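For readers unfamiliar with the Cayley-Menger determinant, the sketch below computes it from vertex coordinates and derives a simplex volume and a log1p regression target from it. How the training pipeline picks vertices from a voxel grid, and what exactly the "sign" target encodes, is not specified here; the function names are hypothetical and the code only illustrates the underlying formula.

```python
import math

import torch


def cayley_menger_det(points):
    """Determinant of the bordered squared-distance matrix for (n, d) vertex coordinates."""
    n = points.shape[0]
    diff = points[:, None, :] - points[None, :, :]
    d2 = (diff**2).sum(dim=-1)                       # (n, n) squared distances
    cm = torch.ones(n + 1, n + 1, dtype=points.dtype)
    cm[0, 0] = 0.0
    cm[1:, 1:] = d2
    return torch.linalg.det(cm)


def simplex_volume(points):
    """Volume of the k-simplex spanned by k + 1 points: V^2 = (-1)^(k+1) det / (2^k (k!)^2)."""
    k = points.shape[0] - 1
    coef = (-1) ** (k + 1) / (2**k * math.factorial(k) ** 2)
    return torch.sqrt(torch.clamp(coef * cayley_menger_det(points), min=0.0))


tetra = torch.tensor(
    [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=torch.float64
)
volume = simplex_volume(tetra)          # unit right tetrahedron, volume 1/6
volume_target = torch.log1p(volume)     # target form used by a log1p-MSE regression
```

For this tetrahedron the determinant equals 8, consistent with the identity 288 V² = det for four points in 3D. A determinant of zero indicates degenerate (for example coplanar) vertices.
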
### Cross-Contrast (Cell 4)

| Parameter | Value |
|-----------|-------|
| Dataset | Reuses Cell 3 cached dataset |
| Voxel encoder | Frozen GeometricShapeClassifier |
| Text encoder | Frozen Qwen 2.5-1.5B-Instruct |
| Latent dim | 256 |
| Batch size | 4,096 |
| Optimizer | AdamW (lr=2e-3, wd=1e-4) |
| Schedule | Cosine with 3-epoch warmup |
| Loss | Symmetric InfoNCE |
| Temperature | Learned (init 0.07) |
| Epochs | 40 |

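The optimizer and schedule rows above could be wired up roughly as follows. Linear warmup, per-epoch stepping, and decay to zero are assumptions, and `warmup_cosine` is a hypothetical helper; only the learning rate, weight decay, warmup length, and epoch count come from the table. The same helper with `(5, 80)` and `lr=3e-3` would correspond to the classifier settings.

```python
import math

import torch


def warmup_cosine(epoch, warmup_epochs, total_epochs):
    """LR multiplier: linear ramp during warmup, then cosine decay toward zero."""
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


proj = torch.nn.Linear(645, 256)  # stand-in for the trainable projection heads
optimizer = torch.optim.AdamW(proj.parameters(), lr=2e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: warmup_cosine(epoch, 3, 40)
)
# In a training loop: call optimizer.step() per batch and scheduler.step() once per epoch.
```
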
## Quick Start

```python
import torch
from safetensors.torch import load_file

# Load classifier weights
weights = load_file("geometric_classifier/model.safetensors")
# Instantiate GeometricShapeClassifier (its class definition is not part of this README),
# then load the weights and switch to eval mode:
# model = GeometricShapeClassifier(...)
# model.load_state_dict(weights)
# model.eval()

# Load cross-contrast
text_proj_w = load_file("crosscontrast/text_proj.safetensors")
voxel_proj_w = load_file("crosscontrast/voxel_proj.safetensors")
temp = load_file("crosscontrast/temperature.safetensors")

# Load cached embeddings
emb = load_file("qwen_embeddings/embeddings.safetensors")
text_embeddings = emb["embeddings"]  # (38, 1536)

# Classify a voxel grid (requires `model` from the step above)
grid = torch.zeros(1, 5, 5, 5)  # your binary occupancy grid
grid[0, 2, 2, 2] = 1            # single point
with torch.no_grad():
    out = model(grid)
predicted_class = out["class_logits"].argmax(1)
```

## What This Is (and Isn't)

This is a **prototype** exploring geometric-linguistic alignment at small scale. The 5×5×5 grid is intentionally minimal: large enough to represent 38 distinct geometric primitives with curvature distinctions, small enough to train in minutes on a single GPU. The interesting questions are about the structure of the shared latent space: whether text-space confusions mirror geometric failure modes, whether the alignment generalizes beyond the training vocabulary, and what happens at scale.

This is not a production classifier. The procedural dataset is synthetic, the grid resolution is toy-scale, and the cross-contrast vocabulary is fixed at 38 classes.

## License

MIT