Add paper link, GitHub link, and update metadata
#1
by nielsr HF Staff - opened
README.md
CHANGED
|
@@ -1,24 +1,28 @@
|
|
| 1 |
---
|
| 2 |
-
license: apache-2.0
|
| 3 |
language:
|
| 4 |
-
|
| 5 |
library_name: pytorch
|
| 6 |
tags:
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
pipeline_tag: feature-extraction
|
| 17 |
---
|
| 18 |
|
| 19 |
# Shape Foundation Model — Small v3
|
| 20 |
|
| 21 |
-
|
| 22 |
|
| 23 |
## Model Details
|
| 24 |
|
|
@@ -26,67 +30,30 @@ A 3D geometry foundation model for industrial CAD analysis. Takes a mesh and pro
|
|
| 26 |
|---|---|
|
| 27 |
| **Architecture** | GAOTBackbone (MAGNO Encoder → Transformer Processor → Task Heads) |
|
| 28 |
| **Parameters** | 10,913,297 |
|
| 29 |
-
| **Training objective** | Self-supervised masked token reconstruction
|
| 30 |
| **Training data** | 61,052 industrial CAD meshes from Fusion360, MFCAD, and Thingi10K |
|
| 31 |
| **Precision** | bf16 mixed precision |
|
| 32 |
-
| **
|
| 33 |
-
| **Val reconstruction R²** | **0.729** |
|
| 34 |
-
| **Val SmoothL1 (β=1.0)** | **0.024** |
|
| 35 |
-
| **Contrastive top-1 accuracy** | **98.1%** |
|
| 36 |
-
| **Status** | Self-supervised backbone only — supervised task heads are present but disabled (see Limitations) |
|
| 37 |
|
| 38 |
## Evaluation Results
|
| 39 |
|
| 40 |
Metrics on the held-out validation split (N = 2,983 meshes, deterministic hash-based split):
|
| 41 |
|
| 42 |
-
**Reconstruction (pretraining objective
|
| 43 |
|
| 44 |
| Metric | Value |
|
| 45 |
|---|---:|
|
| 46 |
| SmoothL1 loss (β=1.0) at masked positions | 0.024 |
|
| 47 |
-
| MSE at masked positions | 0.326 |
|
| 48 |
| Coefficient of determination (R²) | **0.729** |
|
| 49 |
|
| 50 |
-
**Contrastive embedding quality
|
| 51 |
|
| 52 |
| Metric | Value |
|
| 53 |
|---|---:|
|
| 54 |
| Top-1 positive-pair retrieval accuracy | **98.1%** |
|
| 55 |
-
| InfoNCE loss (τ=0.07) | 0.146 |
|
| 56 |
| Alignment (positive pairs) | 0.132 |
|
| 57 |
| Uniformity (random pairs) | −3.84 |
|
| 58 |
|
| 59 |
-
**Embedding geometry:**
|
| 60 |
-
|
| 61 |
-
| Metric | Value |
|
| 62 |
-
|---|---:|
|
| 63 |
-
| Random-pair cosine mean | 0.002 |
|
| 64 |
-
| Random-pair cosine std | 0.139 |
|
| 65 |
-
| Embedding L2 norm (mean) | 1.33 |
|
| 66 |
-
|
| 67 |
-
## Ablation: what made training work
|
| 68 |
-
|
| 69 |
-
A controlled 2×2 ablation over `{MSE, SmoothL1} × {raw targets, per-dim normalized targets}` — 20 epochs per variant, identical model, data, optimizer — shows that **per-dimension target normalization is the single decisive intervention**:
|
| 70 |
-
|
| 71 |
-
| Loss | Target norm | R² | Top-1 | Alignment | Uniformity |
|
| 72 |
-
|---|---|---:|---:|---:|---:|
|
| 73 |
-
| MSE | none | 0.133 | 76.1% | 0.366 | −3.37 |
|
| 74 |
-
| SmoothL1 | none | 0.061 | 87.6% | 0.318 | −3.71 |
|
| 75 |
-
| MSE | per-dim | **0.777** | 96.7% | 0.191 | −3.80 |
|
| 76 |
-
| SmoothL1 (this model) | per-dim | 0.702 | **97.1%** | **0.180** | **−3.82** |
|
| 77 |
-
|
| 78 |
-
Without normalization both losses fail (R² < 0.14, top-1 < 88%); with normalization both succeed (R² > 0.70, top-1 > 96%). The choice of SmoothL1 over MSE is a secondary stability hedge for long bf16 training runs, not a primary performance driver. R² is reported in the target space each variant was trained on (raw for no-normalization rows, z-scored per-dimension for normalized rows); top-1, alignment, and uniformity are scale-free and directly comparable across all rows.
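
The per-dimension target normalization described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's implementation: the function names `fit_target_stats`, `normalize_targets`, and `smooth_l1`, the `eps` value, and the synthetic targets are all assumptions made for the example.

```python
import numpy as np

def fit_target_stats(targets, eps=1e-6):
    """Per-dimension mean and std over all tokens; targets has shape (N, D)."""
    mean = targets.mean(axis=0)
    std = targets.std(axis=0) + eps  # eps guards against constant dimensions
    return mean, std

def normalize_targets(targets, mean, std):
    """Z-score every target dimension independently."""
    return (targets - mean) / std

def smooth_l1(pred, target, beta=1.0):
    """SmoothL1 (Huber-style) loss: quadratic below beta, linear above it."""
    diff = np.abs(pred - target)
    loss = np.where(diff < beta, 0.5 * diff**2 / beta, diff - 0.5 * beta)
    return loss.mean()

# Synthetic targets whose dimensions differ in scale by orders of magnitude
# (hypothetical data, only to show the effect of the normalization).
rng = np.random.default_rng(0)
scales = np.array([0.01, 1.0, 100.0, 5.0])
raw = rng.normal(size=(1000, 4)) * scales + scales

mean, std = fit_target_stats(raw)
z = normalize_targets(raw, mean, std)
print("raw std per dim:       ", raw.std(axis=0).round(3))
print("normalized std per dim:", z.std(axis=0).round(3))
```

After normalization every dimension contributes on a comparable scale to the loss, which is one plausible reading of why the normalized rows in the table behave so differently from the raw ones.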
|
| 79 |
-
|
| 80 |
-
## Files
|
| 81 |
-
|
| 82 |
-
| File | Size | Purpose |
|
| 83 |
-
|---|---|---|
|
| 84 |
-
| `checkpoint_final.pt` | ~45 MB | Full model state (backbone + loss_computer + optimizer + config) |
|
| 85 |
-
| `small.yaml` | 2 KB | Training config (required to instantiate the model) |
|
| 86 |
-
| `embeddings.npy` | 31 MB | Precomputed 128-dim pooled embeddings for all 61,052 training meshes |
|
| 87 |
-
| `point_clouds.npy` | 358 MB | 512-point samples per training mesh (for retrieval visualization) |
|
| 88 |
-
| `metadata.json` | 6 MB | File names and source dataset per training mesh |
|
| 89 |
-
|
| 90 |
## Usage
|
| 91 |
|
| 92 |
### Install dependencies
|
|
@@ -95,24 +62,7 @@ Without normalization both losses fail (R² < 0.14, top-1 < 88%); with normaliza
|
|
| 95 |
pip install torch trimesh einops numpy scipy huggingface-hub
|
| 96 |
```
|
| 97 |
|
| 98 |
-
You also need the `shape_foundation` package from the [
|
| 99 |
-
|
| 100 |
-
### Download the model
|
| 101 |
-
|
| 102 |
-
```python
|
| 103 |
-
from huggingface_hub import snapshot_download
|
| 104 |
-
|
| 105 |
-
local_dir = snapshot_download(
|
| 106 |
-
repo_id="bayang/shape-foundation-small-v3",
|
| 107 |
-
local_dir="./shape-foundation-small-v3",
|
| 108 |
-
)
|
| 109 |
-
```
|
| 110 |
-
|
| 111 |
-
Or from the command line:
|
| 112 |
-
|
| 113 |
-
```bash
|
| 114 |
-
hf download bayang/shape-foundation-small-v3 --local-dir ./shape-foundation-small-v3
|
| 115 |
-
```
|
| 116 |
|
| 117 |
### Load and run inference
|
| 118 |
|
|
@@ -158,71 +108,16 @@ pooled = out["pooled_embedding"] # (1, 128) — global mesh embedding
|
|
| 158 |
tokens = out["token_embeddings"] # (1, 13824, 128) — per-token features
|
| 159 |
```
|
| 160 |
|
| 161 |
-
##
|
| 162 |
-
|
| 163 |
-
Use the precomputed embedding index to find similar shapes:
|
| 164 |
-
|
| 165 |
-
```python
|
| 166 |
-
import numpy as np
|
| 167 |
-
import json
|
| 168 |
-
|
| 169 |
-
embeddings = np.load("shape-foundation-small-v3/embeddings.npy") # (61052, 128)
|
| 170 |
-
with open("shape-foundation-small-v3/metadata.json") as f:
|
| 171 |
-
metadata = json.load(f)
|
| 172 |
-
|
| 173 |
-
query = pooled.squeeze(0).cpu().numpy()
|
| 174 |
-
query = query / (np.linalg.norm(query) + 1e-8)
|
| 175 |
-
index_norm = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + 1e-8)
|
| 176 |
-
|
| 177 |
-
similarities = index_norm @ query
|
| 178 |
-
top_k = np.argsort(similarities)[::-1][:5]
|
| 179 |
-
|
| 180 |
-
for idx in top_k:
|
| 181 |
-
print(f"{metadata[idx]['name']:30s} {metadata[idx]['source']:12s} {similarities[idx]:.3f}")
|
| 182 |
-
```
|
| 183 |
-
|
| 184 |
-
### Masked reconstruction heatmap (explainability)
|
| 185 |
-
|
| 186 |
-
The model was pretrained to reconstruct masked token geometry statistics from surrounding context — this gives you a built-in per-region attribution map. High reconstruction error regions are where the model's learned geometric prior considers the input novel or surprising.
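
As a rough sketch of how such a heatmap could be computed, the snippet below scores each token by its SmoothL1 reconstruction error and reshapes the scores into a 3D grid. Everything model-specific here is an assumption: the arrays `pred` and `target` are synthetic stand-ins for the reconstruction head's output and its targets, the feature width `D = 8` is hypothetical, and the 24 × 24 × 24 grid is inferred only from the token count 13,824 = 24³.

```python
import numpy as np

def per_token_error(pred, target, beta=1.0):
    """SmoothL1 error averaged over feature dims; inputs have shape (T, D)."""
    diff = np.abs(pred - target)
    loss = np.where(diff < beta, 0.5 * diff**2 / beta, diff - 0.5 * beta)
    return loss.mean(axis=-1)  # shape (T,)

def to_heatmap(errors, grid=(24, 24, 24)):
    """Min-max scale errors to [0, 1] and reshape to the (assumed) token grid."""
    lo, hi = errors.min(), errors.max()
    scaled = (errors - lo) / (hi - lo + 1e-12)
    return scaled.reshape(grid)

# Synthetic stand-ins for model outputs (hypothetical shapes and values).
T, D = 13824, 8
rng = np.random.default_rng(0)
target = np.zeros((T, D))
pred = 0.01 * rng.normal(size=(T, D))
pred[5000] += 3.0  # one badly reconstructed token

errors = per_token_error(pred, target)
heat = to_heatmap(errors)
print("most surprising cell:", np.unravel_index(heat.argmax(), heat.shape))
```

In practice the errors would come from running the model with tokens masked and comparing against the normalized targets; the grid cell indices can then be mapped back to regions of the mesh.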
|
| 187 |
-
|
| 188 |
-
## Training Data
|
| 189 |
-
|
| 190 |
-
| Dataset | Meshes | Share | Domain |
|
| 191 |
-
|---|---|---|---|
|
| 192 |
-
| **Fusion360** | 35,681 | 58.4% | Parametric CAD designs |
|
| 193 |
-
| **MFCAD** | 15,488 | 25.4% | Manufacturing CAD parts |
|
| 194 |
-
| **Thingi10K** | 9,883 | 16.2% | Community 3D printing / misc geometries |
|
| 195 |
-
| **Total** | **61,052** | 100% | — |
|
| 196 |
-
|
| 197 |
-
All three sources are industrial / CAD-focused, aligned with the target application domain (engineering, manufacturing, simulation setup).
|
| 198 |
-
|
| 199 |
-
Train / val split is deterministic (58,069 train / 2,983 val, 4.89%) using md5 hashing of file paths so assignments are stable across runs, ranks, and machines.
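
A hash-based split of this kind can be sketched as follows. This is an illustration under assumptions, not the project's code: the function name `assign_split`, the way the digest is mapped to a bucket, the 5% threshold, and the example paths are all hypothetical.

```python
import hashlib

def assign_split(file_path, val_fraction=0.05):
    """Map a file path to 'train' or 'val' using only its md5 digest."""
    digest = hashlib.md5(file_path.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) / 16**32  # roughly uniform value between 0 and 1
    return "val" if bucket < val_fraction else "train"

# Hypothetical file names, only to show the resulting proportions.
paths = [f"meshes/part_{i:05d}.stl" for i in range(10000)]
splits = [assign_split(p) for p in paths]
val_share = splits.count("val") / len(splits)
print(f"val share: {val_share:.3%}")
```

Because the assignment depends only on the path string, every rank and every machine computes the same split without sharing state or a random seed. The realized validation share only approximates the threshold, which would be consistent with a figure such as 4.89% rather than a round number.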
|
| 200 |
-
|
| 201 |
-
## Training Details
|
| 202 |
|
| 203 |
-
- **
|
| 204 |
-
- **
|
| 205 |
-
- **Per-
|
| 206 |
-
- **Optimizer**: AdamW, lr 3e-4, cosine schedule, 500 warmup steps
|
| 207 |
-
- **Epochs**: 50
|
| 208 |
-
- **Mixed precision**: bf16 + DDP + torch.compile on 8 × H100 80GB
|
| 209 |
|
| 210 |
## Limitations
|
| 211 |
|
| 212 |
-
**Supervised task heads are disabled.**
|
| 213 |
-
|
| 214 |
-
**Domain is industrial CAD.** The training data is 100% CAD / engineering parts. The model will transfer poorly to organic shapes (humans, animals, plants) or to reconstructed 3D scans with heavy noise. If your target domain differs, you should fine-tune or retrain.
|
| 215 |
-
|
| 216 |
-
**Contrastive signal saturates.** With a per-rank batch size of 16, the InfoNCE objective sees only 15 negatives per anchor, which becomes too easy once the backbone has basic shape awareness. The embeddings remain useful for retrieval, but the contrastive loss contributes little gradient after the first few epochs.
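
To make the saturation concrete, here is a minimal NumPy sketch of an in-batch InfoNCE loss with τ = 0.07 and a batch of 16, so each anchor sees 15 negatives. It assumes a one-directional loss over cosine similarities and uses synthetic embeddings; the model's actual multi-resolution contrastive objective may be formulated differently.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.07):
    """In-batch InfoNCE: row i of `positives` is the positive for anchor i,
    all other rows act as negatives. Returns (loss, top-1 accuracy)."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature                     # (B, B) scaled cosines
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(a))
    loss = -log_prob[idx, idx].mean()
    top1 = (logits.argmax(axis=1) == idx).mean()
    return float(loss), float(top1)

rng = np.random.default_rng(0)
anchors = rng.normal(size=(16, 128))                    # batch 16 -> 15 negatives
positives = anchors + 0.1 * rng.normal(size=(16, 128))  # lightly perturbed views
loss, top1 = info_nce(anchors, positives)

# Uninformative embeddings give the chance-level loss log(16).
chance_loss, _ = info_nce(np.ones((16, 8)), np.ones((16, 8)))
print(f"loss={loss:.6f} top1={top1:.2f} chance={chance_loss:.4f}")
```

With so few negatives and a low temperature, even modestly separated embeddings drive the loss close to zero, which leaves little gradient signal; larger effective batches or harder negatives are the usual remedies.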
|
| 217 |
-
|
| 218 |
-
## Intended Use
|
| 219 |
-
|
| 220 |
-
- Dense geometric feature extraction for downstream CAD / engineering tasks
|
| 221 |
-
- Shape retrieval via learned embedding similarity
|
| 222 |
-
- Per-region anomaly detection via masked reconstruction error heatmaps
|
| 223 |
-
- Foundation for fine-tuning on domain-specific labels (once high-quality labels are available)
|
| 224 |
-
|
| 225 |
-
**Not suitable for**: reconstructing 3D geometry from scratch, generating new meshes, classifying general 3D objects outside the CAD domain, or any task that requires the supervised heads to be active.
|
| 226 |
|
| 227 |
## Citation
|
| 228 |
|
|
@@ -233,5 +128,4 @@ Train / val split is deterministic (58,069 train / 2,983 val, 4.89%) using md5 h
|
|
| 233 |
year = {2026},
|
| 234 |
url = {https://huggingface.co/bayang/shape-foundation-small-v3}
|
| 235 |
}
|
| 236 |
-
```
|
| 237 |
-
|
|
|
|
| 1 |
---
|
| 2 |
language:
|
| 3 |
+
- en
|
| 4 |
library_name: pytorch
|
| 5 |
+
license: apache-2.0
|
| 6 |
+
pipeline_tag: other
|
| 7 |
tags:
|
| 8 |
+
- 3d
|
| 9 |
+
- geometry
|
| 10 |
+
- point-cloud
|
| 11 |
+
- mesh
|
| 12 |
+
- cad
|
| 13 |
+
- foundation-model
|
| 14 |
+
- self-supervised
|
| 15 |
+
- masked-modeling
|
| 16 |
+
- contrastive-learning
|
| 17 |
---
|
| 18 |
|
| 19 |
# Shape Foundation Model — Small v3
|
| 20 |
|
| 21 |
+
Shape is a self-supervised foundation model that converts surface meshes into dense per-token embeddings for industrial CAD analysis. It combines a structured 3D latent grid, a multi-scale geometry-aware tokenizer (MAGNO), and a transformer processor to enable accurate geometric representations and explainable predictions through a learned reconstruction prior.
|
| 22 |
+
|
| 23 |
+
The model was introduced in the paper: [Shape: A Self-Supervised 3D Geometry Foundation Model for Industrial CAD Analysis](https://huggingface.co/papers/2604.22826).
|
| 24 |
+
|
| 25 |
+
[Code](https://github.com/simd-ai/shape) | [Project Page & Demo](https://shape.simd.space)
|
| 26 |
|
| 27 |
## Model Details
|
| 28 |
|
|
|
|
| 30 |
|---|---|
|
| 31 |
| **Architecture** | GAOTBackbone (MAGNO Encoder → Transformer Processor → Task Heads) |
|
| 32 |
| **Parameters** | 10,913,297 |
|
| 33 |
+
| **Training objective** | Self-supervised masked token reconstruction + multi-resolution contrastive learning |
|
| 34 |
| **Training data** | 61,052 industrial CAD meshes from Fusion360, MFCAD, and Thingi10K |
|
| 35 |
| **Precision** | bf16 mixed precision |
|
| 36 |
+
| **Status** | Self-supervised backbone only — supervised task heads are present but disabled |
|
| 37 |
|
| 38 |
## Evaluation Results
|
| 39 |
|
| 40 |
Metrics on the held-out validation split (N = 2,983 meshes, deterministic hash-based split):
|
| 41 |
|
| 42 |
+
**Reconstruction (pretraining objective):**
|
| 43 |
|
| 44 |
| Metric | Value |
|
| 45 |
|---|---:|
|
| 46 |
| SmoothL1 loss (β=1.0) at masked positions | 0.024 |
|
| 47 |
| Coefficient of determination (R²) | **0.729** |
|
| 48 |
|
| 49 |
+
**Contrastive embedding quality (Wang & Isola 2020):**
|
| 50 |
|
| 51 |
| Metric | Value |
|
| 52 |
|---|---:|
|
| 53 |
| Top-1 positive-pair retrieval accuracy | **98.1%** |
|
| 54 |
| Alignment (positive pairs) | 0.132 |
|
| 55 |
| Uniformity (random pairs) | −3.84 |
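
For readers unfamiliar with these two metrics, the sketch below computes alignment and uniformity on L2-normalized embeddings following the definitions in Wang & Isola (2020), using the commonly used exponents α = 2 and t = 2. Whether the reported numbers used exactly these settings is an assumption, and the embeddings here are synthetic.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Project each row onto the unit hypersphere."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def alignment(a, b):
    """Mean squared distance between positive pairs (lower is better)."""
    a, b = l2_normalize(a), l2_normalize(b)
    return float(np.mean(np.sum((a - b) ** 2, axis=1)))

def uniformity(x, t=2.0):
    """Log of the mean Gaussian potential over all distinct pairs
    (more negative means embeddings are spread more evenly)."""
    x = l2_normalize(x)
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    iu = np.triu_indices(len(x), k=1)  # each unordered pair once
    return float(np.log(np.mean(np.exp(-t * sq_dist[iu]))))

rng = np.random.default_rng(0)
emb = rng.normal(size=(128, 16))                  # synthetic embeddings
views = emb + 0.1 * rng.normal(size=emb.shape)    # synthetic positive views
print("alignment: ", round(alignment(emb, views), 4))
print("uniformity:", round(uniformity(emb), 4))
```

Both quantities are computed on unit-normalized vectors, so they do not depend on the raw embedding scale.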
|
| 56 |
|
| 57 |
## Usage
|
| 58 |
|
| 59 |
### Install dependencies
|
|
|
|
| 62 |
pip install torch trimesh einops numpy scipy huggingface-hub
|
| 63 |
```
|
| 64 |
|
| 65 |
+
Note: You also need the `shape_foundation` package from the [official repository](https://github.com/simd-ai/shape).
|
| 66 |
|
| 67 |
### Load and run inference
|
| 68 |
|
|
|
|
| 108 |
tokens = out["token_embeddings"] # (1, 13824, 128) — per-token features
|
| 109 |
```
|
| 110 |
|
| 111 |
+
## Intended Use
|
| 112 |
|
| 113 |
+
- **Dense geometric feature extraction** for downstream CAD / engineering tasks.
|
| 114 |
+
- **Shape retrieval** via learned embedding similarity.
|
| 115 |
+
- **Per-region anomaly detection** via masked reconstruction error heatmaps.
|
| 116 |
|
| 117 |
## Limitations
|
| 118 |
|
| 119 |
+
- **Supervised task heads are disabled.** This checkpoint only supports backbone embeddings and masked reconstruction.
|
| 120 |
+
- **Domain specificity.** The training data is entirely industrial CAD, so the model will transfer poorly to organic shapes or noisy 3D scans.
|
| 121 |
|
| 122 |
## Citation
|
| 123 |
|
|
|
|
| 128 |
year = {2026},
|
| 129 |
url = {https://huggingface.co/bayang/shape-foundation-small-v3}
|
| 130 |
}
|
| 131 |
+
```
|
|
|