bayang/shape-foundation-small-v3

@@ -1,24 +1,28 @@
 ---
-license: apache-2.0
 language:
-  - en
 library_name: pytorch
 tags:
-  - 3d
-  - geometry
-  - point-cloud
-  - mesh
-  - cad
-  - foundation-model
-  - self-supervised
-  - masked-modeling
-  - contrastive-learning
-pipeline_tag: feature-extraction
 ---
 # Shape Foundation Model — Small v3
-A 3D geometry foundation model for industrial CAD analysis. Takes a mesh and produces dense geometric embeddings plus a self-supervised reconstruction prior that enables per-token attribution for explainable predictions.
 ## Model Details
@@ -26,67 +30,30 @@ A 3D geometry foundation model for industrial CAD analysis. Takes a mesh and pro
 |---|---|
 | **Architecture** | GAOTBackbone (MAGNO Encoder → Transformer Processor → Task Heads) |
 | **Parameters** | 10,913,297 |
-| **Training objective** | Self-supervised masked token reconstruction (SmoothL1 β=1.0, per-dim normalized targets) + multi-resolution contrastive learning (InfoNCE τ=0.07) |
 | **Training data** | 61,052 industrial CAD meshes from Fusion360, MFCAD, and Thingi10K |
 | **Precision** | bf16 mixed precision |
-| **Compute** | 8 × NVIDIA H100 80GB, 50 epochs |
-| **Val reconstruction R²** | **0.729** |
-| **Val SmoothL1 (β=1.0)** | **0.024** |
-| **Contrastive top-1 accuracy** | **98.1%** |
-| **Status** | Self-supervised backbone only — supervised task heads are present but disabled (see Limitations) |
 ## Evaluation Results
 Metrics on the held-out validation split (N = 2,983 meshes, deterministic hash-based split):
-**Reconstruction (pretraining objective, in normalized target space):**
 | Metric | Value |
 |---|---:|
 | SmoothL1 loss (β=1.0) at masked positions | 0.024 |
-| MSE at masked positions | 0.326 |
 | Coefficient of determination (R²) | **0.729** |
-**Contrastive embedding quality** (Wang & Isola 2020 framework, pool size 2048):
 | Metric | Value |
 |---|---:|
 | Top-1 positive-pair retrieval accuracy | **98.1%** |
-| InfoNCE loss (τ=0.07) | 0.146 |
 | Alignment (positive pairs) | 0.132 |
 | Uniformity (random pairs) | −3.84 |
-**Embedding geometry:**
-| Metric | Value |
-|---|---:|
-| Random-pair cosine mean | 0.002 |
-| Random-pair cosine std | 0.139 |
-| Embedding L2 norm (mean) | 1.33 |
-## Ablation: what made training work
-A controlled 2×2 ablation over `{MSE, SmoothL1} × {raw targets, per-dim normalized targets}` — 20 epochs per variant, identical model, data, optimizer — shows that **per-dimension target normalization is the single decisive intervention**:
-| Loss | Target norm | R² | Top-1 | Alignment | Uniformity |
-|---|---|---:|---:|---:|---:|
-| MSE | none | 0.133 | 76.1% | 0.366 | −3.37 |
-| SmoothL1 | none | 0.061 | 87.6% | 0.318 | −3.71 |
-| MSE | per-dim | **0.777** | 96.7% | 0.191 | −3.80 |
-| SmoothL1 (this model) | per-dim | 0.702 | **97.1%** | **0.180** | **−3.82** |
-Without normalization both losses fail (R² < 0.14, top-1 < 88%); with normalization both succeed (R² > 0.70, top-1 > 96%). The choice of SmoothL1 over MSE is a secondary stability hedge for long bf16 training runs, not a primary performance driver. R² is reported in the target space each variant was trained on (raw for no-normalization rows, z-scored per-dimension for normalized rows); top-1, alignment, and uniformity are scale-free and directly comparable across all rows.
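The note that R² is reported in each variant's own target space matters when reading the table. The sketch below uses made-up targets, not the model's real geometry statistics, to show the same predictor scoring very differently in raw and z-scored space. The helper names `fit_normalizer` and `r2_score` are illustrative and do not come from the `shape_foundation` package.

```python
import numpy as np

def fit_normalizer(targets):
    """Per-dimension mean and std, computed once over a calibration set."""
    mean = targets.mean(axis=0)
    std = targets.std(axis=0) + 1e-8
    return mean, std

def r2_score(pred, target):
    """Coefficient of determination pooled over all dimensions."""
    ss_res = np.sum((target - pred) ** 2)
    ss_tot = np.sum((target - target.mean(axis=0)) ** 2)
    return float(1.0 - ss_res / ss_tot)

rng = np.random.default_rng(0)
scales = np.array([1e-3, 1.0, 50.0, 1e3])      # hypothetical per-dimension scales
raw = rng.normal(size=(1000, 4)) * scales
mean, std = fit_normalizer(raw)

# A toy predictor that gets only the largest-scale dimension right
# and predicts the mean everywhere else.
pred_raw = np.tile(mean, (len(raw), 1))
pred_raw[:, 3] = raw[:, 3]

r2_raw = r2_score(pred_raw, raw)
r2_norm = r2_score((pred_raw - mean) / std, (raw - mean) / std)
print(f"R2 in raw space:        {r2_raw:.3f}")
print(f"R2 in normalized space: {r2_norm:.3f}")
```

In raw space the largest-scale dimension dominates the score. After z-scoring, every dimension contributes equally, so the same predictor is credited with only one dimension out of four.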
-## Files
-| File | Size | Purpose |
-|---|---|---|
-| `checkpoint_final.pt` | ~45 MB | Full model state (backbone + loss_computer + optimizer + config) |
-| `small.yaml` | 2 KB | Training config (required to instantiate the model) |
-| `embeddings.npy` | 31 MB | Precomputed 128-dim pooled embeddings for all 61,052 training meshes |
-| `point_clouds.npy` | 358 MB | 512-point samples per training mesh (for retrieval visualization) |
-| `metadata.json` | 6 MB | File names and source dataset per training mesh |
 ## Usage
 ### Install dependencies
@@ -95,24 +62,7 @@ Without normalization both losses fail (R² < 0.14, top-1 < 88%); with normaliza
 pip install torch trimesh einops numpy scipy huggingface-hub
 ```
-You also need the `shape_foundation` package from the [training repo](https://github.com/simd-ai/shape-v2).
-### Download the model
-```python
-from huggingface_hub import snapshot_download
-local_dir = snapshot_download(
-    repo_id="bayang/shape-foundation-small-v3",
-    local_dir="./shape-foundation-small-v3",
-)
-```
-Or from the command line:
-```bash
-hf download bayang/shape-foundation-small-v3 --local-dir ./shape-foundation-small-v3
-```
 ### Load and run inference
@@ -158,71 +108,16 @@ pooled = out["pooled_embedding"]        # (1, 128) — global mesh embedding
 tokens = out["token_embeddings"]        # (1, 13824, 128) — per-token features
 ```
-### Shape retrieval
-Use the precomputed embedding index to find similar shapes:
-```python
-import numpy as np
-import json
-embeddings = np.load("shape-foundation-small-v3/embeddings.npy")  # (61052, 128)
-with open("shape-foundation-small-v3/metadata.json") as f:
-    metadata = json.load(f)
-query = pooled.squeeze(0).cpu().numpy()
-query = query / (np.linalg.norm(query) + 1e-8)
-index_norm = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + 1e-8)
-similarities = index_norm @ query
-top_k = np.argsort(similarities)[::-1][:5]
-for idx in top_k:
-    print(f"{metadata[idx]['name']:30s}  {metadata[idx]['source']:12s}  {similarities[idx]:.3f}")
-```
-### Masked reconstruction heatmap (explainability)
-The model was pretrained to reconstruct masked token geometry statistics from surrounding context — this gives you a built-in per-region attribution map. High reconstruction error regions are where the model's learned geometric prior considers the input novel or surprising.
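As a rough illustration of the idea, the sketch below turns per-token reconstruction error into a heatmap. It assumes you already have predicted and target statistics for each token plus a boolean mask. The arrays here are synthetic, the 8 statistics per token is a made-up number, and `token_error_map` is a hypothetical helper rather than part of the `shape_foundation` API.

```python
import numpy as np

def smooth_l1(diff, beta=1.0):
    """Elementwise SmoothL1 penalty: quadratic below beta, linear above."""
    a = np.abs(diff)
    return np.where(a < beta, 0.5 * a ** 2 / beta, a - 0.5 * beta)

def token_error_map(pred, target, mask):
    """Mean SmoothL1 error per token; NaN where the token was not masked."""
    per_token = smooth_l1(pred - target).mean(axis=-1)
    return np.where(mask, per_token, np.nan)

rng = np.random.default_rng(0)
n_tokens, n_stats = 13824, 8                  # n_stats is an assumption
target = rng.normal(size=(n_tokens, n_stats))
pred = target + 0.01 * rng.normal(size=target.shape)
pred[100] += 3.0                              # simulate one region the prior fails to predict
mask = rng.random(n_tokens) < 0.5             # roughly half of the tokens masked
mask[100] = True

heat = token_error_map(pred, target, mask)
print("most surprising token:", int(np.nanargmax(heat)))
```

Because only masked tokens receive an error value, covering the whole shape would need several passes with different masks, with the per-token errors averaged afterwards.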
-## Training Data
-| Dataset | Meshes | Share | Domain |
-|---|---|---|---|
-| **Fusion360** | 35,681 | 58.4% | Parametric CAD designs |
-| **MFCAD** | 15,488 | 25.4% | Manufacturing CAD parts |
-| **Thingi10K** | 9,883 | 16.2% | Community 3D printing / misc geometries |
-| **Total** | **61,052** | 100% | — |
-All three sources are industrial / CAD-focused, aligned with the target application domain (engineering, manufacturing, simulation setup).
-Train / val split is deterministic (58,069 train / 2,983 val, 4.89%) using md5 hashing of file paths so assignments are stable across runs, ranks, and machines.
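A hash-based split of this kind can be sketched as follows. The exact bucketing rule and threshold used for this model are not given here, so the `is_validation` helper and its `val_fraction` value are assumptions for illustration only.

```python
import hashlib

def is_validation(path: str, val_fraction: float = 0.05) -> bool:
    """Assign a mesh to the validation split from the md5 hash of its path."""
    digest = hashlib.md5(path.encode("utf-8")).hexdigest()
    # Map the first 8 hex digits (32 bits) to a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return bucket < val_fraction

paths = [f"meshes/part_{i:05d}.stl" for i in range(10_000)]   # made-up file paths
val = [p for p in paths if is_validation(p)]
print(f"{len(val)} of {len(paths)} paths fall in the validation split")
```

Since the assignment depends only on the path string, every process computes the same split without sharing a random seed or an index file. The realized fraction only approximates the threshold, which would be consistent with a split such as 4.89% rather than a round number.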
-## Training Details
-- **Masked token reconstruction** (weight 1.0): 50% of latent tokens masked, SmoothL1 loss (β=1.0) on normalized geometry statistics
-- **Multi-resolution contrastive** (weight 0.2): InfoNCE with jitter σ=0.02 and 30% point dropout
-- **Per-dimension target normalization**: calibrated once on 56M tokens from the training split, stored as buffers on the loss computer
-- **Optimizer**: AdamW, lr 3e-4, cosine schedule, 500 warmup steps
-- **Epochs**: 50
-- **Mixed precision**: bf16 + DDP + torch.compile on 8 × H100 80GB
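For reference, a one-directional InfoNCE loss with τ=0.07 can be sketched in NumPy as below. This is a generic formulation, not the training code: the multi-resolution pairing and the augmentation pipeline are not reproduced, and the embeddings are random stand-ins.

```python
import numpy as np

def info_nce(z_a, z_b, tau=0.07):
    """InfoNCE where row i of z_a and row i of z_b form the positive pair."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = (z_a @ z_b.T) / tau
    logits = logits - logits.max(axis=1, keepdims=True)        # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
z = rng.normal(size=(16, 128))       # batch of 16, so 15 negatives per anchor
loss_matched = info_nce(z, z + 0.05 * rng.normal(size=z.shape))
loss_random = info_nce(z, rng.normal(size=z.shape))
print(f"matched views: {loss_matched:.4f}")
print(f"random views:  {loss_random:.4f}")
```

With only 16 candidates per anchor, equal logits would give a loss of ln 16 ≈ 2.77, and well-matched views drive the loss close to zero.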
 ## Limitations
-**Supervised task heads are disabled.** The checkpoint contains symmetry, primitive, part, and reduction heads from the architecture, but all supervised loss weights are set to `0.0` during training. Attempting to use these heads for inference will return near-random outputs because they were never updated by gradient descent. The stock synthetic labels in the training data do not generalize across unseen meshes (train CE ~1e-4 vs val CE ~2.5 in prior runs), which is why they were disabled. Only use the backbone embeddings and the reconstruction head.
-**Domain is industrial CAD.** The training data is 100% CAD / engineering parts. The model will transfer poorly to organic shapes (humans, animals, plants) or to reconstructed 3D scans with heavy noise. If your target domain differs, you should fine-tune or retrain.
-**Contrastive signal saturates.** With per-rank batch size 16, the InfoNCE objective only has 15 negatives per anchor. This is too easy once the backbone has basic shape awareness. The embeddings are still useful for retrieval but the contrastive loss stops providing gradient after the first few epochs.
-## Intended Use
-- Dense geometric feature extraction for downstream CAD / engineering tasks
-- Shape retrieval via learned embedding similarity
-- Per-region anomaly detection via masked reconstruction error heatmaps
-- Foundation for fine-tuning on domain-specific labels (once high-quality labels are available)
-**Not suitable for**: reconstructing 3D geometry from scratch, generating new meshes, classifying general 3D objects outside the CAD domain, or any task that requires the supervised heads to be active.
 ## Citation
@@ -233,5 +128,4 @@ Train / val split is deterministic (58,069 train / 2,983 val, 4.89%) using md5 h
   year   = {2026},
   url    = {https://huggingface.co/bayang/shape-foundation-small-v3}
 }
-```

 ---
 language:
+- en
 library_name: pytorch
+license: apache-2.0
+pipeline_tag: other
 tags:
+- 3d
+- geometry
+- point-cloud
+- mesh
+- cad
+- foundation-model
+- self-supervised
+- masked-modeling
+- contrastive-learning
 ---
 # Shape Foundation Model — Small v3
+Shape is a self-supervised foundation model that converts surface meshes into dense per-token embeddings for industrial CAD analysis. It combines a structured 3D latent grid, a multi-scale geometry-aware tokenizer (MAGNO), and a transformer processor to enable accurate geometric representations and explainable predictions through a learned reconstruction prior.
+The model was introduced in the paper: [Shape: A Self-Supervised 3D Geometry Foundation Model for Industrial CAD Analysis](https://huggingface.co/papers/2604.22826).
+[Code](https://github.com/simd-ai/shape) | [Project Page & Demo](https://shape.simd.space)
 ## Model Details
 |---|---|
 | **Architecture** | GAOTBackbone (MAGNO Encoder → Transformer Processor → Task Heads) |
 | **Parameters** | 10,913,297 |
+| **Training objective** | Self-supervised masked token reconstruction + multi-resolution contrastive learning |
 | **Training data** | 61,052 industrial CAD meshes from Fusion360, MFCAD, and Thingi10K |
 | **Precision** | bf16 mixed precision |
+| **Status** | Self-supervised backbone only — supervised task heads are present but disabled |
 ## Evaluation Results
 Metrics on the held-out validation split (N = 2,983 meshes, deterministic hash-based split):
+**Reconstruction (pretraining objective):**
 | Metric | Value |
 |---|---:|
 | SmoothL1 loss (β=1.0) at masked positions | 0.024 |
 | Coefficient of determination (R²) | **0.729** |
+**Contrastive embedding quality (Wang & Isola 2020):**
 | Metric | Value |
 |---|---:|
 | Top-1 positive-pair retrieval accuracy | **98.1%** |
 | Alignment (positive pairs) | 0.132 |
 | Uniformity (random pairs) | −3.84 |
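The three contrastive metrics can be sketched as follows, using the alignment and uniformity definitions from Wang & Isola (2020) with their default exponents (α=2, t=2). Whether this evaluation used exactly these settings is an assumption, and the embeddings below are synthetic stand-ins for two views of the same meshes.

```python
import numpy as np

def l2_normalize(x):
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)

def alignment(z_a, z_b, alpha=2):
    """Mean distance**alpha between positive pairs (row i with row i)."""
    return float(np.mean(np.linalg.norm(z_a - z_b, axis=1) ** alpha))

def uniformity(z, t=2.0):
    """Log of the mean Gaussian potential over all distinct pairs of unit vectors."""
    sq_dist = np.clip(2.0 - 2.0 * (z @ z.T), 0.0, None)   # valid for unit-norm rows
    upper = np.triu_indices(len(z), k=1)
    return float(np.log(np.mean(np.exp(-t * sq_dist[upper]))))

def top1_accuracy(z_a, z_b):
    """Fraction of anchors whose most similar candidate is their own positive."""
    sims = z_a @ z_b.T
    return float(np.mean(np.argmax(sims, axis=1) == np.arange(len(z_a))))

rng = np.random.default_rng(0)
z_a = l2_normalize(rng.normal(size=(256, 128)))
z_b = l2_normalize(z_a + 0.05 * rng.normal(size=z_a.shape))

print(f"alignment:  {alignment(z_a, z_b):.3f}")
print(f"uniformity: {uniformity(z_a):.3f}")
print(f"top-1:      {top1_accuracy(z_a, z_b):.3f}")
```

Lower alignment means positive pairs sit closer together, and more negative uniformity means embeddings are spread more evenly over the unit sphere.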
 ## Usage
 ### Install dependencies
 pip install torch trimesh einops numpy scipy huggingface-hub
 ```
+Note: You also need the `shape_foundation` package from the [official repository](https://github.com/simd-ai/shape).
 ### Load and run inference
 tokens = out["token_embeddings"]        # (1, 13824, 128) — per-token features
 ```
+## Intended Use
+- **Dense geometric feature extraction** for downstream CAD / engineering tasks.
+- **Shape retrieval** via learned embedding similarity.
+- **Per-region anomaly detection** via masked reconstruction error heatmaps.
 ## Limitations
+- **Supervised task heads are disabled.** This checkpoint only supports backbone embeddings and masked reconstruction.
+- **Domain specificity.** The model is trained on 100% industrial CAD data and will transfer poorly to organic shapes or noisy 3D scans.
 ## Citation
   year   = {2026},
   url    = {https://huggingface.co/bayang/shape-foundation-small-v3}
 }
+```