Add paper link, GitHub link, and update metadata

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +29 -135
README.md CHANGED
@@ -1,24 +1,28 @@
  ---
- license: apache-2.0
  language:
- - en
  library_name: pytorch
  tags:
- - 3d
- - geometry
- - point-cloud
- - mesh
- - cad
- - foundation-model
- - self-supervised
- - masked-modeling
- - contrastive-learning
- pipeline_tag: feature-extraction
  ---
 
  # Shape Foundation Model — Small v3
 
- A 3D geometry foundation model for industrial CAD analysis. Takes a mesh and produces dense geometric embeddings plus a self-supervised reconstruction prior that enables per-token attribution for explainable predictions.
 
  ## Model Details
 
@@ -26,67 +30,30 @@ A 3D geometry foundation model for industrial CAD analysis. Takes a mesh and pro
  |---|---|
  | **Architecture** | GAOTBackbone (MAGNO Encoder → Transformer Processor → Task Heads) |
  | **Parameters** | 10,913,297 |
- | **Training objective** | Self-supervised masked token reconstruction (SmoothL1 β=1.0, per-dim normalized targets) + multi-resolution contrastive learning (InfoNCE τ=0.07) |
  | **Training data** | 61,052 industrial CAD meshes from Fusion360, MFCAD, and Thingi10K |
  | **Precision** | bf16 mixed precision |
- | **Compute** | 8 × NVIDIA H100 80GB, 50 epochs |
- | **Val reconstruction R²** | **0.729** |
- | **Val SmoothL1 (β=1.0)** | **0.024** |
- | **Contrastive top-1 accuracy** | **98.1%** |
- | **Status** | Self-supervised backbone only — supervised task heads are present but disabled (see Limitations) |
 
  ## Evaluation Results
 
  Metrics on the held-out validation split (N = 2,983 meshes, deterministic hash-based split):
 
- **Reconstruction (pretraining objective, in normalized target space):**
 
  | Metric | Value |
  |---|---:|
  | SmoothL1 loss (β=1.0) at masked positions | 0.024 |
- | MSE at masked positions | 0.326 |
  | Coefficient of determination (R²) | **0.729** |
 
- **Contrastive embedding quality** (Wang & Isola 2020 framework, pool size 2048):
 
  | Metric | Value |
  |---|---:|
  | Top-1 positive-pair retrieval accuracy | **98.1%** |
- | InfoNCE loss (τ=0.07) | 0.146 |
  | Alignment (positive pairs) | 0.132 |
  | Uniformity (random pairs) | −3.84 |
 
- **Embedding geometry:**
-
- | Metric | Value |
- |---|---:|
- | Random-pair cosine mean | 0.002 |
- | Random-pair cosine std | 0.139 |
- | Embedding L2 norm (mean) | 1.33 |
-
- ## Ablation: what made training work
-
- A controlled 2×2 ablation over `{MSE, SmoothL1} × {raw targets, per-dim normalized targets}` — 20 epochs per variant, identical model, data, optimizer — shows that **per-dimension target normalization is the single decisive intervention**:
-
- | Loss | Target norm | R² | Top-1 | Alignment | Uniformity |
- |---|---|---:|---:|---:|---:|
- | MSE | none | 0.133 | 76.1% | 0.366 | −3.37 |
- | SmoothL1 | none | 0.061 | 87.6% | 0.318 | −3.71 |
- | MSE | per-dim | **0.777** | 96.7% | 0.191 | −3.80 |
- | SmoothL1 (this model) | per-dim | 0.702 | **97.1%** | **0.180** | **−3.82** |
-
- Without normalization both losses fail (R² < 0.14, top-1 < 88%); with normalization both succeed (R² > 0.70, top-1 > 96%). The choice of SmoothL1 over MSE is a secondary stability hedge for long bf16 training runs, not a primary performance driver. R² is reported in the target space each variant was trained on (raw for no-normalization rows, z-scored per-dimension for normalized rows); top-1, alignment, and uniformity are scale-free and directly comparable across all rows.
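The ablation hinges on per-dimension target normalization, which the removed Training Details list describes as "calibrated once on 56M tokens from the training split, stored as buffers on the loss computer." A minimal NumPy sketch of such a calibration follows, assuming targets arrive as `(n, D)` chunks. The helper names `calibrate_target_stats` and `normalize_targets` are hypothetical, and the streaming merge (Chan et al.'s pairwise mean/variance update) is an illustrative choice, not the repository's code.

```python
import numpy as np


def calibrate_target_stats(chunks, eps=1e-6):
    """Stream over (n_i, D) target chunks; return per-dimension mean and std.

    Chunk statistics are merged with the pairwise update of Chan et al., so the
    full token set never has to sit in memory at once.
    """
    count, mean, m2 = 0, None, None  # m2: running sum of squared deviations
    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=np.float64)
        n_b = chunk.shape[0]
        if n_b == 0:
            continue
        mean_b = chunk.mean(axis=0)
        m2_b = ((chunk - mean_b) ** 2).sum(axis=0)
        if mean is None:
            count, mean, m2 = n_b, mean_b, m2_b
            continue
        total = count + n_b
        delta = mean_b - mean
        mean = mean + delta * (n_b / total)
        m2 = m2 + m2_b + delta ** 2 * (count * n_b / total)
        count = total
    std = np.sqrt(m2 / count)  # population std per dimension
    return mean, np.maximum(std, eps)  # eps guards constant dimensions


def normalize_targets(targets, mean, std):
    """Z-score every target dimension with the frozen calibration statistics."""
    return (targets - mean) / std


# Hypothetical token statistics whose dimensions differ in scale by 10^5.
rng = np.random.default_rng(0)
scales = np.array([1e-3, 1.0, 100.0])
data = rng.normal(size=(10_000, 3)) * scales + np.array([0.0, 5.0, -20.0])
mean, std = calibrate_target_stats(np.array_split(data, 7))
z = normalize_targets(data, mean, std)
print(np.round(z.std(axis=0), 3))  # → [1. 1. 1.]
```

Without this step, a raw-target MSE or SmoothL1 loss weights each dimension by its raw scale, so large-magnitude statistics can dominate the gradient. That is one plausible reading of the failed unnormalized rows, though the card does not state the mechanism.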
-
- ## Files
-
- | File | Size | Purpose |
- |---|---|---|
- | `checkpoint_final.pt` | ~45 MB | Full model state (backbone + loss_computer + optimizer + config) |
- | `small.yaml` | 2 KB | Training config (required to instantiate the model) |
- | `embeddings.npy` | 31 MB | Precomputed 128-dim pooled embeddings for all 61,052 training meshes |
- | `point_clouds.npy` | 358 MB | 512-point samples per training mesh (for retrieval visualization) |
- | `metadata.json` | 6 MB | File names and source dataset per training mesh |
-
  ## Usage
 
  ### Install dependencies
@@ -95,24 +62,7 @@ Without normalization both losses fail (R² < 0.14, top-1 < 88%); with normaliza
  pip install torch trimesh einops numpy scipy huggingface-hub
  ```
 
- You also need the `shape_foundation` package from the [training repo](https://github.com/simd-ai/shape-v2).
-
- ### Download the model
-
- ```python
- from huggingface_hub import snapshot_download
-
- local_dir = snapshot_download(
-     repo_id="bayang/shape-foundation-small-v3",
-     local_dir="./shape-foundation-small-v3",
- )
- ```
-
- Or from the command line:
-
- ```bash
- hf download bayang/shape-foundation-small-v3 --local-dir ./shape-foundation-small-v3
- ```
 
  ### Load and run inference
 
@@ -158,71 +108,16 @@ pooled = out["pooled_embedding"] # (1, 128) — global mesh embedding
  tokens = out["token_embeddings"] # (1, 13824, 128) — per-token features
  ```
 
- ### Shape retrieval
-
- Use the precomputed embedding index to find similar shapes:
-
- ```python
- import numpy as np
- import json
-
- embeddings = np.load("shape-foundation-small-v3/embeddings.npy") # (61052, 128)
- with open("shape-foundation-small-v3/metadata.json") as f:
-     metadata = json.load(f)
-
- query = pooled.squeeze(0).cpu().numpy()
- query = query / (np.linalg.norm(query) + 1e-8)
- index_norm = embeddings / (np.linalg.norm(embeddings, axis=1, keepdims=True) + 1e-8)
-
- similarities = index_norm @ query
- top_k = np.argsort(similarities)[::-1][:5]
-
- for idx in top_k:
-     print(f"{metadata[idx]['name']:30s} {metadata[idx]['source']:12s} {similarities[idx]:.3f}")
- ```
-
- ### Masked reconstruction heatmap (explainability)
-
- The model was pretrained to reconstruct masked token geometry statistics from surrounding context — this gives you a built-in per-region attribution map. High reconstruction error regions are where the model's learned geometric prior considers the input novel or surprising.
-
- ## Training Data
-
- | Dataset | Meshes | Share | Domain |
- |---|---|---|---|
- | **Fusion360** | 35,681 | 58.4% | Parametric CAD designs |
- | **MFCAD** | 15,488 | 25.4% | Manufacturing CAD parts |
- | **Thingi10K** | 9,883 | 16.2% | Community 3D printing / misc geometries |
- | **Total** | **61,052** | 100% | — |
-
- All three sources are industrial / CAD-focused, aligned with the target application domain (engineering, manufacturing, simulation setup).
-
- Train / val split is deterministic (58,069 train / 2,983 val, 4.89%) using md5 hashing of file paths so assignments are stable across runs, ranks, and machines.
-
- ## Training Details
 
- - **Masked token reconstruction** (weight 1.0): 50% of latent tokens masked, SmoothL1 loss (β=1.0) on normalized geometry statistics
- - **Multi-resolution contrastive** (weight 0.2): InfoNCE with jitter σ=0.02 and 30% point dropout
- - **Per-dimension target normalization**: calibrated once on 56M tokens from the training split, stored as buffers on the loss computer
- - **Optimizer**: AdamW, lr 3e-4, cosine schedule, 500 warmup steps
- - **Epochs**: 50
- - **Mixed precision**: bf16 + DDP + torch.compile on 8 × H100 80GB
 
  ## Limitations
 
- **Supervised task heads are disabled.** The checkpoint contains symmetry, primitive, part, and reduction heads from the architecture, but all supervised loss weights are set to `0.0` during training. Attempting to use these heads for inference will return near-random outputs because they were never updated by gradient descent. The stock synthetic labels in the training data do not generalize across unseen meshes (train CE ~1e-4 vs val CE ~2.5 in prior runs), which is why they were disabled. Only use the backbone embeddings and the reconstruction head.
-
- **Domain is industrial CAD.** The training data is 100% CAD / engineering parts. The model will transfer poorly to organic shapes (humans, animals, plants) or to reconstructed 3D scans with heavy noise. If your target domain differs, you should fine-tune or retrain.
-
- **Contrastive signal saturates.** With per-rank batch size 16, the InfoNCE objective only has 15 negatives per anchor. This is too easy once the backbone has basic shape awareness. The embeddings are still useful for retrieval but the contrastive loss stops providing gradient after the first few epochs.
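The 15-negatives figure follows from in-batch InfoNCE: each anchor's positive competes only with the other B - 1 items of its batch. A minimal sketch, assuming a one-directional loss over L2-normalized embeddings with τ = 0.07 and no negative gathering across DDP ranks; whether the training code symmetrizes the loss is not stated, and `info_nce` is a hypothetical helper name.

```python
import numpy as np


def info_nce(z1, z2, tau=0.07):
    """In-batch InfoNCE: anchor z1[i] must pick z2[i] out of all B rows of z2.

    Each anchor therefore sees one positive and B - 1 in-batch negatives.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = (z1 @ z2.T) / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # stable log-softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


B = 16  # per-rank batch size quoted above -> 15 negatives per anchor
rng = np.random.default_rng(0)

collapsed = np.ones((B, 128))            # no shape information at all
chance = info_nce(collapsed, collapsed)  # equals log(16), about 2.773

shapes = rng.normal(size=(B, 128))       # 16 distinct hypothetical shapes
views = shapes + 0.1 * rng.normal(size=shapes.shape)
easy = info_nce(shapes, views)           # close to 0: the task is already solved
print(f"chance-level loss {chance:.3f}, separated batch {easy:.4f}")
```

At chance the loss sits at log B ≈ 2.77 for B = 16, and once the 16 shapes are separable it falls toward zero, which is consistent with the saturation described above.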
-
- ## Intended Use
-
- - Dense geometric feature extraction for downstream CAD / engineering tasks
- - Shape retrieval via learned embedding similarity
- - Per-region anomaly detection via masked reconstruction error heatmaps
- - Foundation for fine-tuning on domain-specific labels (once high-quality labels are available)
-
- **Not suitable for**: reconstructing 3D geometry from scratch, generating new meshes, classifying general 3D objects outside the CAD domain, or any task that requires the supervised heads to be active.
 
  ## Citation
 
@@ -233,5 +128,4 @@ Train / val split is deterministic (58,069 train / 2,983 val, 4.89%) using md5 h
  year = {2026},
  url = {https://huggingface.co/bayang/shape-foundation-small-v3}
  }
- ```
-
 
  ---
  language:
+ - en
  library_name: pytorch
+ license: apache-2.0
+ pipeline_tag: other
  tags:
+ - 3d
+ - geometry
+ - point-cloud
+ - mesh
+ - cad
+ - foundation-model
+ - self-supervised
+ - masked-modeling
+ - contrastive-learning
  ---
 
  # Shape Foundation Model — Small v3
 
+ Shape is a self-supervised foundation model that converts surface meshes into dense per-token embeddings for industrial CAD analysis. It combines a structured 3D latent grid, a multi-scale geometry-aware tokenizer (MAGNO), and a transformer processor to enable accurate geometric representations and explainable predictions through a learned reconstruction prior.
+
+ The model was introduced in the paper: [Shape: A Self-Supervised 3D Geometry Foundation Model for Industrial CAD Analysis](https://huggingface.co/papers/2604.22826).
+
+ [Code](https://github.com/simd-ai/shape) | [Project Page & Demo](https://shape.simd.space)
 
  ## Model Details
 
  |---|---|
  | **Architecture** | GAOTBackbone (MAGNO Encoder → Transformer Processor → Task Heads) |
  | **Parameters** | 10,913,297 |
+ | **Training objective** | Self-supervised masked token reconstruction + multi-resolution contrastive learning |
  | **Training data** | 61,052 industrial CAD meshes from Fusion360, MFCAD, and Thingi10K |
  | **Precision** | bf16 mixed precision |
+ | **Status** | Self-supervised backbone only; supervised task heads are present but disabled |
 
  ## Evaluation Results
 
  Metrics on the held-out validation split (N = 2,983 meshes, deterministic hash-based split):
 
+ **Reconstruction (pretraining objective):**
 
  | Metric | Value |
  |---|---:|
  | SmoothL1 loss (β=1.0) at masked positions | 0.024 |
  | Coefficient of determination (R²) | **0.729** |
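Both reconstruction metrics are evaluated only at masked token positions, and the pre-change README states they are computed in the per-dimension normalized target space. A minimal NumPy sketch of that evaluation, assuming `(B, T, D)` prediction and target arrays plus a `(B, T)` boolean mask; pooling every masked element into a single R² (rather than averaging per-dimension R² values) is an assumption, as are the helper names `smooth_l1` and `masked_recon_metrics`.

```python
import numpy as np


def smooth_l1(pred, target, beta=1.0):
    """Elementwise SmoothL1, same piecewise form as torch.nn.SmoothL1Loss."""
    diff = np.abs(pred - target)
    return np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)


def masked_recon_metrics(pred, target, mask, beta=1.0):
    """Mean SmoothL1 and R² computed over masked tokens only.

    pred, target: (B, T, D) arrays; mask: (B, T) bool, True = token was masked.
    R² pools every masked element: 1 - SS_res / SS_tot, with SS_tot measured
    around the per-dimension mean of the masked targets.
    """
    p, t = pred[mask], target[mask]  # both (N_masked, D)
    loss = smooth_l1(p, t, beta).mean()
    ss_res = np.sum((t - p) ** 2)
    ss_tot = np.sum((t - t.mean(axis=0)) ** 2)
    return float(loss), float(1.0 - ss_res / ss_tot)


rng = np.random.default_rng(0)
target = rng.normal(size=(2, 64, 8))  # stand-in for normalized targets
mask = rng.random((2, 64)) < 0.5      # ~50% masking, as in pretraining
pred = target + 0.3 * rng.normal(size=target.shape)
loss, r2 = masked_recon_metrics(pred, target, mask)
print(f"SmoothL1={loss:.3f}  R²={r2:.3f}")
```

Because the targets are z-scored, an R² of 0 corresponds to always predicting the per-dimension mean, which makes the reported 0.729 directly interpretable as the fraction of target variance the model recovers at masked positions.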
 
+ **Contrastive embedding quality (Wang & Isola 2020):**
 
  | Metric | Value |
  |---|---:|
  | Top-1 positive-pair retrieval accuracy | **98.1%** |
  | Alignment (positive pairs) | 0.132 |
  | Uniformity (random pairs) | −3.84 |
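Alignment and uniformity follow Wang & Isola (2020): alignment is the mean squared distance between L2-normalized positive pairs, uniformity is the log of the mean Gaussian potential exp(−t‖zᵢ − zⱼ‖²) over distinct pairs, and lower is better for both. The sketch below assumes the paper's default α = 2 and t = 2, and a top-1 protocol in which each anchor retrieves its partner among all second-view embeddings (the pre-change README quotes a pool size of 2048). None of these settings is confirmed by the card, and the helper names are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Project each row onto the unit hypersphere."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def alignment(z1, z2, alpha=2.0):
    """Mean ||z1_i - z2_i||^alpha over positive pairs (lower is better)."""
    return float(np.mean(np.linalg.norm(z1 - z2, axis=1) ** alpha))


def uniformity(z, t=2.0):
    """log of mean exp(-t ||z_i - z_j||^2) over distinct pairs (lower = more uniform)."""
    sq = np.sum(z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (z @ z.T), 0.0)
    i, j = np.triu_indices(len(z), k=1)  # each unordered pair once
    return float(np.log(np.mean(np.exp(-t * d2[i, j]))))


def top1_accuracy(z1, z2):
    """Fraction of anchors z1_i whose most cosine-similar z2 row is z2_i."""
    return float(np.mean(np.argmax(z1 @ z2.T, axis=1) == np.arange(len(z1))))


# Two noisy "views" of 512 hypothetical shapes in a 128-dim embedding space.
rng = np.random.default_rng(0)
base = l2_normalize(rng.normal(size=(512, 128)))
z1 = l2_normalize(base + 0.05 * rng.normal(size=base.shape))
z2 = l2_normalize(base + 0.05 * rng.normal(size=base.shape))
print(alignment(z1, z2), uniformity(z1), top1_accuracy(z1, z2))
```

For reference, nearly orthogonal unit vectors give a uniformity close to −4 under t = 2, which is the regime the random synthetic embeddings above fall into.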
 
  ## Usage
 
  ### Install dependencies
  pip install torch trimesh einops numpy scipy huggingface-hub
  ```
 
+ Note: You also need the `shape_foundation` package from the [official repository](https://github.com/simd-ai/shape).
 
  ### Load and run inference
 
  tokens = out["token_embeddings"] # (1, 13824, 128) — per-token features
  ```
 
+ ## Intended Use
 
+ - **Dense geometric feature extraction** for downstream CAD / engineering tasks.
+ - **Shape retrieval** via learned embedding similarity.
+ - **Per-region anomaly detection** via masked reconstruction error heatmaps.
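The card lists anomaly detection via reconstruction-error heatmaps but does not document how to obtain per-token predictions from the checkpoint, so the sketch below covers only the post-processing step. It assumes you already have predictions and normalized targets for several masking passes (here two complementary 50% masks, so every token is reconstructed once), averages per-token error into a heatmap, and flags tokens whose robust median/MAD z-score (the Iglewicz-Hoaglin modified z-score) exceeds 3.5. The function names, array shapes, and threshold are hypothetical choices, not the authors' method.

```python
import numpy as np


def token_error_heatmap(preds, targets, masks):
    """Average per-token reconstruction error over several masking passes.

    preds, targets: (P, T, D) predictions and normalized targets for P passes.
    masks: (P, T) bool, True where the token was masked (and so predicted
    from context) in that pass. Tokens never masked get NaN.
    """
    err = np.mean((preds - targets) ** 2, axis=-1)  # (P, T) per-token MSE
    err = np.where(masks, err, 0.0)                 # ignore visible tokens
    counts = masks.sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, err.sum(axis=0) / counts, np.nan)


def flag_anomalies(heatmap, threshold=3.5):
    """Robust z-score (median / MAD) per token; True marks surprising regions."""
    valid = ~np.isnan(heatmap)
    med = np.median(heatmap[valid])
    mad = np.median(np.abs(heatmap[valid] - med)) + 1e-12
    robust_z = 0.6745 * (heatmap - med) / mad
    return valid & (robust_z > threshold)


# Synthetic demo: two complementary 50% masks, so every token is masked once.
rng = np.random.default_rng(0)
P, T, D = 2, 1000, 8
targets = rng.normal(size=(P, T, D))
preds = targets + 0.1 * rng.normal(size=(P, T, D))
preds[:, 42] += 3.0  # a hypothetical "novel" region at token 42
first = rng.random(T) < 0.5
masks = np.stack([first, ~first])
heat = token_error_heatmap(preds, targets, masks)
flagged = np.flatnonzero(flag_anomalies(heat))
print(flagged)  # includes 42, possibly alongside a few chance outliers
```

If the 13,824 tokens in the inference output form a 24 × 24 × 24 latent grid (24³ = 13,824; an assumption, not stated in the card), the heatmap could be reshaped to that grid for visualization; mapping grid cells back to mesh locations depends on the `shape_foundation` package.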
 
  ## Limitations
 
+ - **Supervised task heads are disabled.** This checkpoint only supports backbone embeddings and masked reconstruction.
+ - **Domain specificity.** The model is trained on 100% industrial CAD data and will transfer poorly to organic shapes or noisy 3D scans.
 
  ## Citation
 
  year = {2026},
  url = {https://huggingface.co/bayang/shape-foundation-small-v3}
  }
+ ```