donglaix committed on
Commit ccf24c1 · verified · 1 Parent(s): 624ee0f

Update README.md

Files changed (1):
  1. README.md +31 -16

README.md CHANGED
@@ -14,9 +14,18 @@ This model is ready for commercial/non-commercial use.
 
 ## License/Terms of Use:
 
- This model is released under the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
 
- Important Note: If you bypass, disable, reduce the efficacy of, or circumvent any technical limitation, safety guardrail or associated safety guardrail hyperparameter, encryption, security, digital rights management, or authentication mechanism contained in the Model, your rights under NVIDIA Open Model License Agreement will automatically terminate.
 
 ## Deployment Geography:
 Global
@@ -47,7 +56,7 @@ VoMP: Predicting Volumetric Mechanical Property Fields. Rishit Dagli, Donglai Xi
 
 ### MatVAE (Material VAE)
 - Role: Defines the material manifold and decodes latents to (E, ν, ρ)
 - Design: 2‑D latent, 3‑layer MLP encoder/decoder (256 hidden), radial normalizing flow posterior
- - Objective: Modified β‑TC‑VAE with KL/MI/TC weighting and annealing
 
 Training uses AdamW optimizer with learning rate 1×10^-4, weight decay 5×10^-2 for Geometry Transformer and 1×10^-4 for MatVAE. Geometry Transformer trained with FP16 precision for 200,000 steps with batch size 16 (4 per GPU on 4 A100s), gradient clipping at 1.0, cosine annealing LR scheduler, EMA rate 0.9999, and ℓ2 loss. MatVAE trained with FP32 precision for 850 epochs with batch size 256, gradient clipping at 5.0, cosine annealing to final LR 1×10^-5, modified β-TC-VAE objective with weights α=1.0 (KL), β=2.0 (TC), γ=1.0 (MI), free nats constraint δ=0.1, KL annealing over 200 epochs, and log min-max normalization. Regularization via weight decay, dropout rate 0.05 in MatVAE, capacity constraints to prevent posterior collapse, and radial normalizing flow for flexible posterior distribution.
 
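As a concrete illustration of the schedule above (not the released training code), cosine annealing from the initial LR 1×10^-4 down to the MatVAE final LR 1×10^-5 can be sketched as a plain function:

```python
import math

def cosine_annealed_lr(step: int, total_steps: int,
                       lr_init: float = 1e-4, lr_final: float = 1e-5) -> float:
    """Cosine annealing from lr_init down to lr_final over total_steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_final + 0.5 * (lr_init - lr_final) * (1.0 + math.cos(math.pi * progress))

# At step 0 the LR equals lr_init; at the final step it reaches lr_final.
```

The same shape applies to the Geometry Transformer's schedule; only the endpoint values differ per model.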
@@ -123,18 +132,18 @@ Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems
 
 **Preferred/Supported Operating System(s):**
 * Linux (development and testing performed on Linux environments)
- * Should work on Windows and macOS but not tested
 
- The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
 
 The VOMP model can be integrated into an AI system by: (1) Using the voxelization pipeline to convert input 3D geometry into 64³ voxel grid, (2) Rendering 150 multi-view images using provided camera sampling, (3) Extracting DINOv2 features and aggregating to voxel centers, (4) Running the Geometry Transformer to predict per-voxel material latent codes, (5) Decoding latents with MatVAE to obtain (E, ν, ρ) values, (6) Transferring voxel materials to target representation via nearest-neighbor interpolation.
 
 ## Model Version(s):
- v1.0
 
 ## Training Dataset (Per Model):
 ### For Geometry Transformer — GVM
- **Link:** [GVM (Geometry with Volumetric Materials)](https://huggingface.co/datasets/nvidia/VoMP-GVM-Dataset) — NVIDIA assets with VLM‑assisted labels (Qwen 2.5 VL‑72B)
 
 **Data Modality:**
 * 3D Geometry (meshes with part segmentation)
@@ -166,9 +175,7 @@ v1.0
 
 Training set contains 1,333 objects with 6,477 segments and 28.7M voxels. Data modalities include (i) 3D meshes (triangle meshes in USD format with part-level segmentation), multi-view RGB images (512×512 rendered views), text (English material names), and tabular material properties (E, ν, ρ triplets); (ii) Nature of content is non-personal, proprietary 3D assets (NVIDIA asset libraries) combined with public domain material science data, no copyright-protected creative content, machine-generated annotations (VLM) constrained by human-measured physical properties; (iii) Linguistic characteristics include English material names and semantic object labels. No sensors were used for data collection; 3D assets are human-modeled, material properties are from laboratory measurements (ASTM standard testing), and images are path-traced renders. Average 4.86 segments per object (±11.97 std dev), 21,537 voxels per object (±23,431 std dev). Material property ranges: E [1.0×10^5, 2.8×10^11 Pa], ν [0.16, 0.49], ρ [50, 19,300 kg/m³].
 
 ## Testing Dataset:
- **Link:**
- - [GVM test split](https://huggingface.co/datasets/nvidia/VoMP-GVM-Dataset)
- - [ABO-500](https://amazon-berkeley-objects.s3.amazonaws.com/index.html) - Amazon Berkeley Objects
 
 **Data Collection Method by dataset:**
 * Hybrid: Human, Automated (VLM-assisted)
@@ -181,7 +188,7 @@ Test set contains 166 objects with 1,060 segments and 4.9M voxels (13.1% of total
 
 ## Evaluation Dataset:
 **Link:**
- - Primary: [GVM test split](https://huggingface.co/datasets/nvidia/VoMP-GVM-Dataset) (166 objects, 4.9M voxel annotations)
 - Secondary: [ABO-500](https://amazon-berkeley-objects.s3.amazonaws.com/index.html) mass estimation benchmark - 500 objects with ground truth mass labels
 
 **Benchmark Score:**
@@ -191,7 +198,7 @@ Test set contains 166 objects with 1,060 segments and 4.9M voxels (13.1% of total
 - ARE: Average Relative Error
 - MnRE: Minimum Ratio Error
 
- On [GVM test set](https://huggingface.co/datasets/nvidia/VoMP-GVM-Dataset) (per-object averaged):
 - Young's Modulus: ALDE 0.3793 (±0.29), ALRE 0.0409 (±0.04)
 - Poisson's Ratio: ADE 0.0241 (±0.01), ARE 0.0818 (±0.03)
 - Density: ADE 142.69 kg/m³ (±166.90), ARE 0.0921 (±0.07)
@@ -201,10 +208,10 @@ On ABO-500 mass estimation benchmark:
 - ALDE: 0.631, ADE: 8.433, ARE: 0.887, MnRE: 0.576
 
 **Data Collection Method by dataset:**
- * Hybrid: Human, Automated (for [GVM test set](https://huggingface.co/datasets/nvidia/VoMP-GVM-Dataset)); Human (for ABO-500)
 
 **Labeling Method by dataset:**
- * Hybrid: Human, Automated (for [GVM test set](https://huggingface.co/datasets/nvidia/VoMP-GVM-Dataset)); Human (for ABO-500 ground truth mass)
 
 **Properties (Quantity, Dataset Descriptions, Sensor(s)):**
 GVM test set: 166 objects, 4.9M annotated voxels with ground truth (E, ν, ρ) per voxel. Significantly larger evaluation benchmark than prior works (e.g., NeRF2Physics used 31 points across 11 objects). ABO-500: 500 Amazon objects with measured mass labels, used for density validation via volume integration. Both evaluation sets contain non-personal, non-sensitive 3D geometry and material/mass labels.
@@ -224,6 +231,14 @@ GVM test set: 166 objects, 4.9M annotated voxels with ground truth (E, ν, ρ) per voxel
 - Geometry Transformer: 0.0082s
 - MatVAE decoding: 0.00032s
 
 ## Ethical Considerations:
 
 NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
@@ -254,7 +269,7 @@ Technical Limitations & Mitigation:
 Verified to have met prescribed NVIDIA quality standards: | Yes
 Performance Metrics: | Average Log Displacement Error (ALDE), Average Displacement Error (ADE), Average Log Relative Error (ALRE), Average Relative Error (ARE) for Young's modulus, Poisson's ratio, and density; Mass Estimation metrics (ALDE, ADE, ARE, MnRE); Material Validity (relative error to nearest real-world material range); Wall-clock inference time; Throughput; Simulation fidelity through visual inspection of elastodynamic simulations.
 Potential Known Risks: | If the model does not work as intended: (1) Inaccurate material property predictions may lead to unrealistic physics simulations with incorrect deformation behavior, affecting Digital Twin fidelity, Sim-2-Real transfer, and design validation workflows; (2) Overestimated stiffness (Young's modulus) may cause objects to appear overly rigid in simulation, while underestimation causes excessive deformation; (3) Incorrect density predictions impact mass distribution, gravitational effects, and momentum in simulations; (4) Invalid Poisson's ratio values could cause simulation instability or non-physical volume changes.
- Licensing: | This model is for commercial/non-commercial use under the [NV Open Model License license](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
 
 ## Privacy
 
@@ -275,7 +290,7 @@ Field | Response
 :---------------------------------------------------|:----------------------------------
 Model Application Field(s): | Robotics; Media & Entertainment; Digital Twins; Design and Engineering
 Describe the life critical impact (if present). | Not Applicable
- Use Case Restrictions: | Abide by NVIDIA Terms of Use (https://www.nvidia.com/en-us/about-nvidia/terms-of-use/) and [NV Open Model License license](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
 Model and dataset restrictions: | The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to.
 
 ## Citation
 
 
 ## License/Terms of Use:
 
+ Use of the model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
 
+ ## Models at a Glance
+ ### Geometry Transformer
+ - Predicts per-voxel 2D material latent codes from 3D geometry and multi-view images
+ - Consumes aggregated DINOv2 visual features and sparse voxel grids; scales to 32,768 non-empty voxels per object
+ - Drives end-to-end material field prediction when paired with MatVAE for decoding into (E, ν, ρ)
+
+ ### MatVAE (Material VAE)
+ - Decodes 2D material latents into physically valid mechanical property triplets (E, ν, ρ)
+ - Enforces real-world plausibility learned from MTD using VAE + flow posterior
+ - Provides a compact, continuous material manifold used by the Geometry Transformer at inference
 
 ## Deployment Geography:
 Global
 
 ### MatVAE (Material VAE)
 - Role: Defines the material manifold and decodes latents to (E, ν, ρ)
 - Design: 2‑D latent, 3‑layer MLP encoder/decoder (256 hidden), radial normalizing flow posterior
+ - Objective: Modified β‑TC‑VAE with KL/MI/TC weighting and annealing
 
 Training uses AdamW optimizer with learning rate 1×10^-4, weight decay 5×10^-2 for Geometry Transformer and 1×10^-4 for MatVAE. Geometry Transformer trained with FP16 precision for 200,000 steps with batch size 16 (4 per GPU on 4 A100s), gradient clipping at 1.0, cosine annealing LR scheduler, EMA rate 0.9999, and ℓ2 loss. MatVAE trained with FP32 precision for 850 epochs with batch size 256, gradient clipping at 5.0, cosine annealing to final LR 1×10^-5, modified β-TC-VAE objective with weights α=1.0 (KL), β=2.0 (TC), γ=1.0 (MI), free nats constraint δ=0.1, KL annealing over 200 epochs, and log min-max normalization. Regularization via weight decay, dropout rate 0.05 in MatVAE, capacity constraints to prevent posterior collapse, and radial normalizing flow for flexible posterior distribution.
 
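The modified β-TC-VAE objective above can be sketched as follows. The decomposition of the KL term into mutual information (MI), total correlation (TC), and dimension-wise KL follows Chen et al.'s β-TC-VAE; the exact placement of the free-nats floor and the linear KL annealing is an assumption for illustration, not the released implementation:

```python
def matvae_objective(recon_nll, mi, tc, dwkl, epoch,
                     alpha=1.0, beta=2.0, gamma=1.0, delta=0.1,
                     anneal_epochs=200):
    """Sketch of a modified β-TC-VAE loss with the weights reported above.

    KL(q(z|x) || p(z)) is assumed pre-decomposed into mutual information
    (mi), total correlation (tc), and dimension-wise KL (dwkl). `delta` is
    a free-nats floor on the dimension-wise KL, and the weighted KL terms
    are linearly annealed over the first `anneal_epochs` epochs.
    """
    anneal = min(epoch / anneal_epochs, 1.0)
    dwkl_constrained = max(dwkl - delta, 0.0)  # free nats: no penalty below delta
    return recon_nll + anneal * (gamma * mi + beta * tc + alpha * dwkl_constrained)
```

With α=1.0, β=2.0, γ=1.0 the total correlation term is penalized twice as strongly as the other KL components, encouraging a disentangled 2-D latent.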
 
 
 **Preferred/Supported Operating System(s):**
 * Linux (development and testing performed on Linux environments)
+ * Should work on Windows and macOS but not tested
 
+ **The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.**
 
 The VOMP model can be integrated into an AI system by: (1) Using the voxelization pipeline to convert input 3D geometry into 64³ voxel grid, (2) Rendering 150 multi-view images using provided camera sampling, (3) Extracting DINOv2 features and aggregating to voxel centers, (4) Running the Geometry Transformer to predict per-voxel material latent codes, (5) Decoding latents with MatVAE to obtain (E, ν, ρ) values, (6) Transferring voxel materials to target representation via nearest-neighbor interpolation.
 
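Step (6) of the integration recipe above, nearest-neighbor transfer of voxel materials to a target representation, can be sketched with a brute-force search; the helper name and data layout are illustrative, not the released API:

```python
def transfer_materials(voxel_centers, voxel_materials, query_points):
    """Assign each query point (e.g., a mesh vertex or simulation node) the
    (E, nu, rho) triplet of its nearest voxel center (brute-force L2 search)."""
    out = []
    for q in query_points:
        best = min(range(len(voxel_centers)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(voxel_centers[i], q)))
        out.append(voxel_materials[best])
    return out

# Example: two voxels, one steel-like and one foam-like
centers = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
mats = [(2.0e11, 0.30, 7850.0), (1.0e5, 0.20, 50.0)]
props = transfer_materials(centers, mats, [(0.1, 0.0, 0.0)])  # nearest is the steel-like voxel
```

In practice a spatial index (e.g., a k-d tree) replaces the linear scan for large voxel grids.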
 ## Model Version(s):
+ v1.0 - Initial research release with Geometry Transformer (85.81M parameters + 304.3M parameters for DINOv2) and MatVAE (403.7K parameters). Trained on GVM dataset (1,664 objects, 8,089 segments, 37M voxels) and MTD dataset (100,562 material triplets).
 
 ## Training Dataset (Per Model):
 ### For Geometry Transformer — GVM
+ **Link:** GVM (Geometry with Volumetric Materials) — NVIDIA assets with VLM‑assisted labels (Qwen 2.5 VL‑72B)
 
 **Data Modality:**
 * 3D Geometry (meshes with part segmentation)
 
 Training set contains 1,333 objects with 6,477 segments and 28.7M voxels. Data modalities include (i) 3D meshes (triangle meshes in USD format with part-level segmentation), multi-view RGB images (512×512 rendered views), text (English material names), and tabular material properties (E, ν, ρ triplets); (ii) Nature of content is non-personal, proprietary 3D assets (NVIDIA asset libraries) combined with public domain material science data, no copyright-protected creative content, machine-generated annotations (VLM) constrained by human-measured physical properties; (iii) Linguistic characteristics include English material names and semantic object labels. No sensors were used for data collection; 3D assets are human-modeled, material properties are from laboratory measurements (ASTM standard testing), and images are path-traced renders. Average 4.86 segments per object (±11.97 std dev), 21,537 voxels per object (±23,431 std dev). Material property ranges: E [1.0×10^5, 2.8×10^11 Pa], ν [0.16, 0.49], ρ [50, 19,300 kg/m³].
 
 ## Testing Dataset:
+ **Link:** GVM test split — internal NVIDIA dataset; ABO-500 — [Amazon Berkeley Objects](https://amazon-berkeley-objects.s3.amazonaws.com/index.html)
 
 **Data Collection Method by dataset:**
 * Hybrid: Human, Automated (VLM-assisted)
 
 
 ## Evaluation Dataset:
 **Link:**
+ - Primary: GVM test split (166 objects, 4.9M voxel annotations)
 - Secondary: [ABO-500](https://amazon-berkeley-objects.s3.amazonaws.com/index.html) mass estimation benchmark - 500 objects with ground truth mass labels
 
 **Benchmark Score:**
 
 - ARE: Average Relative Error
 - MnRE: Minimum Ratio Error
 
+ On GVM test set (per-object averaged):
 - Young's Modulus: ALDE 0.3793 (±0.29), ALRE 0.0409 (±0.04)
 - Poisson's Ratio: ADE 0.0241 (±0.01), ARE 0.0818 (±0.03)
 - Density: ADE 142.69 kg/m³ (±166.90), ARE 0.0921 (±0.07)
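Assuming the conventional definitions these metric names carry in prior physical-property estimation work such as NeRF2Physics (an assumption; the paper's exact formulas may differ, and ALRE would replace the absolute log difference with a relative one), the scores above can be computed as:

```python
import math

def alde(pred, gt):  # Average Log Displacement Error: mean |log(pred) - log(gt)|
    return sum(abs(math.log(p) - math.log(g)) for p, g in zip(pred, gt)) / len(gt)

def ade(pred, gt):   # Average Displacement Error: mean |pred - gt|
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(gt)

def are(pred, gt):   # Average Relative Error: mean |pred - gt| / gt
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)

def mnre(pred, gt):  # Minimum Ratio Error: mean min(pred/gt, gt/pred); 1.0 is perfect
    return sum(min(p / g, g / p) for p, g in zip(pred, gt)) / len(gt)
```

Log-domain metrics (ALDE) are the natural choice for Young's modulus, whose ground-truth range spans six orders of magnitude.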
 
 - ALDE: 0.631, ADE: 8.433, ARE: 0.887, MnRE: 0.576
 
 **Data Collection Method by dataset:**
+ * Hybrid: Human, Automated (for GVM test split); Human (for ABO-500)
 
 **Labeling Method by dataset:**
+ * Hybrid: Human, Automated (for GVM test split); Human (for ABO-500 ground truth mass)
 
 **Properties (Quantity, Dataset Descriptions, Sensor(s)):**
 GVM test set: 166 objects, 4.9M annotated voxels with ground truth (E, ν, ρ) per voxel. Significantly larger evaluation benchmark than prior works (e.g., NeRF2Physics used 31 points across 11 objects). ABO-500: 500 Amazon objects with measured mass labels, used for density validation via volume integration. Both evaluation sets contain non-personal, non-sensitive 3D geometry and material/mass labels.
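The density validation via volume integration mentioned above, predicting an object's mass by summing per-voxel density over occupied voxel volume, can be sketched as follows (the voxel edge length and function name are illustrative):

```python
def estimate_mass(densities, voxel_edge_m):
    """Integrate predicted per-voxel density (kg/m^3) over occupied voxels
    of uniform size to obtain total object mass in kg."""
    voxel_volume = voxel_edge_m ** 3  # cubic voxels
    return sum(densities) * voxel_volume

# e.g., 1000 occupied voxels of water-like density at 1 cm resolution
mass_kg = estimate_mass([1000.0] * 1000, voxel_edge_m=0.01)
```

The estimated mass is then compared against the measured ABO-500 ground-truth mass with the ALDE/ADE/ARE/MnRE metrics reported above.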
 
 - Geometry Transformer: 0.0082s
 - MatVAE decoding: 0.00032s
 
+ ## Model Limitations and Responsible Use:
+
+ (1) Material property predictions should be validated through physical testing before deployment in safety-critical applications (structural engineering, robotics, medical devices), as inaccurate predictions could lead to unsafe designs if used without verification.
+ (2) The model is trained on common objects (furniture, architectural elements, vegetation) and may not generalize to specialized materials (advanced composites, metamaterials, biological tissues) outside the training distribution.
+ (3) Fixed 64³ voxel resolution limits capture of fine-grained heterogeneous material structures; users should assess whether resolution is sufficient for their application.
+ (4) Isotropic material assumption may not hold for anisotropic materials (wood grain, fiber composites); users should verify appropriateness for their use case.
+ (5) Model outputs (E, ν, ρ) are compatible with accurate FEM simulators but may require adaptation for fast approximate simulators (XPBD, MPM) that use simulator-specific parameter scales.
+
 ## Ethical Considerations:
 
 NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
 Verified to have met prescribed NVIDIA quality standards: | Yes
 Performance Metrics: | Average Log Displacement Error (ALDE), Average Displacement Error (ADE), Average Log Relative Error (ALRE), Average Relative Error (ARE) for Young's modulus, Poisson's ratio, and density; Mass Estimation metrics (ALDE, ADE, ARE, MnRE); Material Validity (relative error to nearest real-world material range); Wall-clock inference time; Throughput; Simulation fidelity through visual inspection of elastodynamic simulations.
 Potential Known Risks: | If the model does not work as intended: (1) Inaccurate material property predictions may lead to unrealistic physics simulations with incorrect deformation behavior, affecting Digital Twin fidelity, Sim-2-Real transfer, and design validation workflows; (2) Overestimated stiffness (Young's modulus) may cause objects to appear overly rigid in simulation, while underestimation causes excessive deformation; (3) Incorrect density predictions impact mass distribution, gravitational effects, and momentum in simulations; (4) Invalid Poisson's ratio values could cause simulation instability or non-physical volume changes.
+ Licensing: | Use of the model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
 
 ## Privacy
 
 
 :---------------------------------------------------|:----------------------------------
 Model Application Field(s): | Robotics; Media & Entertainment; Digital Twins; Design and Engineering
 Describe the life critical impact (if present). | Not Applicable
+ Use Case Restrictions: | Abide by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
 Model and dataset restrictions: | The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to.
 
 ## Citation