donglaix committed on
Commit ccf24c1 · verified · 1 Parent(s): 624ee0f

Update README.md

Files changed (1):
  1. README.md +31 -16

README.md CHANGED
@@ -14,9 +14,18 @@ This model is ready for commercial/non-commercial use.
 
 ## License/Terms of Use:
 
- This model is released under the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
 
- Important Note: If you bypass, disable, reduce the efficacy of, or circumvent any technical limitation, safety guardrail or associated safety guardrail hyperparameter, encryption, security, digital rights management, or authentication mechanism contained in the Model, your rights under NVIDIA Open Model License Agreement will automatically terminate.
 
 ## Deployment Geography:
 Global
@@ -47,7 +56,7 @@ VoMP: Predicting Volumetric Mechanical Property Fields. Rishit Dagli, Donglai Xi
 
 ### MatVAE (Material VAE)
 - Role: Defines the material manifold and decodes latents to (E, ν, ρ)
 - Design: 2‑D latent, 3‑layer MLP encoder/decoder (256 hidden), radial normalizing flow posterior
- - Objective: Modified β‑TC‑VAE with KL/MI/TC weighting and annealing
 
 Training uses AdamW optimizer with learning rate 1×10^-4, weight decay 5×10^-2 for Geometry Transformer and 1×10^-4 for MatVAE. Geometry Transformer trained with FP16 precision for 200,000 steps with batch size 16 (4 per GPU on 4 A100s), gradient clipping at 1.0, cosine annealing LR scheduler, EMA rate 0.9999, and ℓ2 loss. MatVAE trained with FP32 precision for 850 epochs with batch size 256, gradient clipping at 5.0, cosine annealing to final LR 1×10^-5, modified β-TC-VAE objective with weights α=1.0 (KL), β=2.0 (TC), γ=1.0 (MI), free nats constraint δ=0.1, KL annealing over 200 epochs, and log min-max normalization. Regularization via weight decay, dropout rate 0.05 in MatVAE, capacity constraints to prevent posterior collapse, and radial normalizing flow for flexible posterior distribution.
 
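As a concrete illustration of the schedule above (not the released training code), cosine annealing from the initial LR 1×10^-4 down to the MatVAE final LR 1×10^-5 can be sketched as a plain function:

```python
import math

def cosine_annealed_lr(step: int, total_steps: int,
                       lr_init: float = 1e-4, lr_final: float = 1e-5) -> float:
    """Cosine annealing from lr_init down to lr_final over total_steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_final + 0.5 * (lr_init - lr_final) * (1.0 + math.cos(math.pi * progress))

# At step 0 the LR equals lr_init; at the final step it reaches lr_final.
```

The same shape applies to the Geometry Transformer's schedule; only the endpoint values differ per model.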
@@ -123,18 +132,18 @@ Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems
 
 **Preferred/Supported Operating System(s):**
 * Linux (development and testing performed on Linux environments)
- * Should work on Windows and macOS but not tested
 
- The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
 
 The VOMP model can be integrated into an AI system by: (1) Using the voxelization pipeline to convert input 3D geometry into 64³ voxel grid, (2) Rendering 150 multi-view images using provided camera sampling, (3) Extracting DINOv2 features and aggregating to voxel centers, (4) Running the Geometry Transformer to predict per-voxel material latent codes, (5) Decoding latents with MatVAE to obtain (E, ν, ρ) values, (6) Transferring voxel materials to target representation via nearest-neighbor interpolation.
 
 ## Model Version(s):
- v1.0
 
 ## Training Dataset (Per Model):
 ### For Geometry Transformer — GVM
- **Link:** [GVM (Geometry with Volumetric Materials)](https://huggingface.co/datasets/nvidia/VoMP-GVM-Dataset) — NVIDIA assets with VLM‑assisted labels (Qwen 2.5 VL‑72B)
 
 **Data Modality:**
 * 3D Geometry (meshes with part segmentation)
@@ -166,9 +175,7 @@ v1.0
 
 Training set contains 1,333 objects with 6,477 segments and 28.7M voxels. Data modalities include (i) 3D meshes (triangle meshes in USD format with part-level segmentation), multi-view RGB images (512×512 rendered views), text (English material names), and tabular material properties (E, ν, ρ triplets); (ii) Nature of content is non-personal, proprietary 3D assets (NVIDIA asset libraries) combined with public domain material science data, no copyright-protected creative content, machine-generated annotations (VLM) constrained by human-measured physical properties; (iii) Linguistic characteristics include English material names and semantic object labels. No sensors were used for data collection; 3D assets are human-modeled, material properties are from laboratory measurements (ASTM standard testing), and images are path-traced renders. Average 4.86 segments per object (±11.97 std dev), 21,537 voxels per object (±23,431 std dev). Material property ranges: E [1.0×10^5, 2.8×10^11 Pa], ν [0.16, 0.49], ρ [50, 19,300 kg/m³].
 
 ## Testing Dataset:
- **Link:**
- - [GVM test split](https://huggingface.co/datasets/nvidia/VoMP-GVM-Dataset)
- - [ABO-500](https://amazon-berkeley-objects.s3.amazonaws.com/index.html) - Amazon Berkeley Objects
 
 **Data Collection Method by dataset:**
 * Hybrid: Human, Automated (VLM-assisted)
@@ -181,7 +188,7 @@ Test set contains 166 objects with 1,060 segments and 4.9M voxels (13.1% of total
 
 ## Evaluation Dataset:
 **Link:**
- - Primary: [GVM test split](https://huggingface.co/datasets/nvidia/VoMP-GVM-Dataset) (166 objects, 4.9M voxel annotations)
 - Secondary: [ABO-500](https://amazon-berkeley-objects.s3.amazonaws.com/index.html) mass estimation benchmark - 500 objects with ground truth mass labels
 
 **Benchmark Score:**
@@ -191,7 +198,7 @@ Test set contains 166 objects with 1,060 segments and 4.9M voxels (13.1% of total
 - ARE: Average Relative Error
 - MnRE: Minimum Ratio Error
 
- On [GVM test set](https://huggingface.co/datasets/nvidia/VoMP-GVM-Dataset) (per-object averaged):
 - Young's Modulus: ALDE 0.3793 (±0.29), ALRE 0.0409 (±0.04)
 - Poisson's Ratio: ADE 0.0241 (±0.01), ARE 0.0818 (±0.03)
 - Density: ADE 142.69 kg/m³ (±166.90), ARE 0.0921 (±0.07)
@@ -201,10 +208,10 @@ On ABO-500 mass estimation benchmark:
 - ALDE: 0.631, ADE: 8.433, ARE: 0.887, MnRE: 0.576
 
 **Data Collection Method by dataset:**
- * Hybrid: Human, Automated (for [GVM test set](https://huggingface.co/datasets/nvidia/VoMP-GVM-Dataset)); Human (for ABO-500)
 
 **Labeling Method by dataset:**
- * Hybrid: Human, Automated (for [GVM test set](https://huggingface.co/datasets/nvidia/VoMP-GVM-Dataset)); Human (for ABO-500 ground truth mass)
 
 **Properties (Quantity, Dataset Descriptions, Sensor(s)):**
 GVM test set: 166 objects, 4.9M annotated voxels with ground truth (E, ν, ρ) per voxel. Significantly larger evaluation benchmark than prior works (e.g., NeRF2Physics used 31 points across 11 objects). ABO-500: 500 Amazon objects with measured mass labels, used for density validation via volume integration. Both evaluation sets contain non-personal, non-sensitive 3D geometry and material/mass labels.
@@ -224,6 +231,14 @@ GVM test set: 166 objects, 4.9M annotated voxels with ground truth (E, ν, ρ) per voxel
 - Geometry Transformer: 0.0082s
 - MatVAE decoding: 0.00032s
 
 ## Ethical Considerations:
 
 NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
@@ -254,7 +269,7 @@ Technical Limitations & Mitigation:
 Verified to have met prescribed NVIDIA quality standards: | Yes
 Performance Metrics: | Average Log Displacement Error (ALDE), Average Displacement Error (ADE), Average Log Relative Error (ALRE), Average Relative Error (ARE) for Young's modulus, Poisson's ratio, and density; Mass Estimation metrics (ALDE, ADE, ARE, MnRE); Material Validity (relative error to nearest real-world material range); Wall-clock inference time; Throughput; Simulation fidelity through visual inspection of elastodynamic simulations.
 Potential Known Risks: | If the model does not work as intended: (1) Inaccurate material property predictions may lead to unrealistic physics simulations with incorrect deformation behavior, affecting Digital Twin fidelity, Sim-2-Real transfer, and design validation workflows; (2) Overestimated stiffness (Young's modulus) may cause objects to appear overly rigid in simulation, while underestimation causes excessive deformation; (3) Incorrect density predictions impact mass distribution, gravitational effects, and momentum in simulations; (4) Invalid Poisson's ratio values could cause simulation instability or non-physical volume changes.
- Licensing: | This model is for commercial/non-commercial use under the [NV Open Model License license](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
 
 ## Privacy
 
@@ -275,7 +290,7 @@ Field | Response
 :---------------------------------------------------|:----------------------------------
 Model Application Field(s): | Robotics; Media & Entertainment; Digital Twins; Design and Engineering
 Describe the life critical impact (if present). | Not Applicable
- Use Case Restrictions: | Abide by NVIDIA Terms of Use (https://www.nvidia.com/en-us/about-nvidia/terms-of-use/) and [NV Open Model License license](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
 Model and dataset restrictions: | The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to.
 
 ## Citation
 
 
 ## License/Terms of Use:
 
+ Use of the model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
 
+ ## Models at a Glance
+ ### Geometry Transformer
+ - Predicts per-voxel 2D material latent codes from 3D geometry and multi-view images
+ - Consumes aggregated DINOv2 visual features and sparse voxel grids; scales to 32,768 non-empty voxels per object
+ - Drives end-to-end material field prediction when paired with MatVAE for decoding into (E, ν, ρ)
+
+ ### MatVAE (Material VAE)
+ - Decodes 2D material latents into physically valid mechanical property triplets (E, ν, ρ)
+ - Enforces real-world plausibility learned from MTD using VAE + flow posterior
+ - Provides a compact, continuous material manifold used by the Geometry Transformer at inference
 
 ## Deployment Geography:
 Global
 
 ### MatVAE (Material VAE)
 - Role: Defines the material manifold and decodes latents to (E, ν, ρ)
 - Design: 2‑D latent, 3‑layer MLP encoder/decoder (256 hidden), radial normalizing flow posterior
+ - Objective: Modified β‑TC‑VAE with KL/MI/TC weighting and annealing
 
 Training uses AdamW optimizer with learning rate 1×10^-4, weight decay 5×10^-2 for Geometry Transformer and 1×10^-4 for MatVAE. Geometry Transformer trained with FP16 precision for 200,000 steps with batch size 16 (4 per GPU on 4 A100s), gradient clipping at 1.0, cosine annealing LR scheduler, EMA rate 0.9999, and ℓ2 loss. MatVAE trained with FP32 precision for 850 epochs with batch size 256, gradient clipping at 5.0, cosine annealing to final LR 1×10^-5, modified β-TC-VAE objective with weights α=1.0 (KL), β=2.0 (TC), γ=1.0 (MI), free nats constraint δ=0.1, KL annealing over 200 epochs, and log min-max normalization. Regularization via weight decay, dropout rate 0.05 in MatVAE, capacity constraints to prevent posterior collapse, and radial normalizing flow for flexible posterior distribution.
 
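The modified β-TC-VAE objective above can be sketched as follows. The decomposition of the KL term into mutual information (MI), total correlation (TC), and dimension-wise KL follows Chen et al.'s β-TC-VAE; the exact placement of the free-nats floor and the linear KL annealing is an assumption for illustration, not the released implementation:

```python
def matvae_objective(recon_nll, mi, tc, dwkl, epoch,
                     alpha=1.0, beta=2.0, gamma=1.0, delta=0.1,
                     anneal_epochs=200):
    """Sketch of a modified β-TC-VAE loss with the weights reported above.

    KL(q(z|x) || p(z)) is assumed pre-decomposed into mutual information
    (mi), total correlation (tc), and dimension-wise KL (dwkl). `delta` is
    a free-nats floor on the dimension-wise KL, and the weighted KL terms
    are linearly annealed over the first `anneal_epochs` epochs.
    """
    anneal = min(epoch / anneal_epochs, 1.0)
    dwkl_constrained = max(dwkl - delta, 0.0)  # free nats: no penalty below delta
    return recon_nll + anneal * (gamma * mi + beta * tc + alpha * dwkl_constrained)
```

With α=1.0, β=2.0, γ=1.0 the total correlation term is penalized twice as strongly as the other KL components, encouraging a disentangled 2-D latent.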
 
 
 **Preferred/Supported Operating System(s):**
 * Linux (development and testing performed on Linux environments)
+ * Should work on Windows and macOS but not tested
 
+ **The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.**
 
 The VOMP model can be integrated into an AI system by: (1) Using the voxelization pipeline to convert input 3D geometry into 64³ voxel grid, (2) Rendering 150 multi-view images using provided camera sampling, (3) Extracting DINOv2 features and aggregating to voxel centers, (4) Running the Geometry Transformer to predict per-voxel material latent codes, (5) Decoding latents with MatVAE to obtain (E, ν, ρ) values, (6) Transferring voxel materials to target representation via nearest-neighbor interpolation.
 
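Step (6) of the integration recipe above, nearest-neighbor transfer of voxel materials to a target representation, can be sketched with a brute-force search; the helper name and data layout are illustrative, not the released API:

```python
def transfer_materials(voxel_centers, voxel_materials, query_points):
    """Assign each query point (e.g., a mesh vertex or simulation node) the
    (E, nu, rho) triplet of its nearest voxel center (brute-force L2 search)."""
    out = []
    for q in query_points:
        best = min(range(len(voxel_centers)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(voxel_centers[i], q)))
        out.append(voxel_materials[best])
    return out

# Example: two voxels, one steel-like and one foam-like
centers = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
mats = [(2.0e11, 0.30, 7850.0), (1.0e5, 0.20, 50.0)]
props = transfer_materials(centers, mats, [(0.1, 0.0, 0.0)])  # nearest is the steel-like voxel
```

In practice a spatial index (e.g., a k-d tree) replaces the linear scan for large voxel grids.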
 ## Model Version(s):
+ v1.0 - Initial research release with Geometry Transformer (85.81M parameters + 304.3M parameters for DINOv2) and MatVAE (403.7K parameters). Trained on GVM dataset (1,664 objects, 8,089 segments, 37M voxels) and MTD dataset (100,562 material triplets).
 
 ## Training Dataset (Per Model):
 ### For Geometry Transformer — GVM
+ **Link:** GVM (Geometry with Volumetric Materials) — NVIDIA assets with VLM‑assisted labels (Qwen 2.5 VL‑72B)
 
 **Data Modality:**
 * 3D Geometry (meshes with part segmentation)
 
 Training set contains 1,333 objects with 6,477 segments and 28.7M voxels. Data modalities include (i) 3D meshes (triangle meshes in USD format with part-level segmentation), multi-view RGB images (512×512 rendered views), text (English material names), and tabular material properties (E, ν, ρ triplets); (ii) Nature of content is non-personal, proprietary 3D assets (NVIDIA asset libraries) combined with public domain material science data, no copyright-protected creative content, machine-generated annotations (VLM) constrained by human-measured physical properties; (iii) Linguistic characteristics include English material names and semantic object labels. No sensors were used for data collection; 3D assets are human-modeled, material properties are from laboratory measurements (ASTM standard testing), and images are path-traced renders. Average 4.86 segments per object (±11.97 std dev), 21,537 voxels per object (±23,431 std dev). Material property ranges: E [1.0×10^5, 2.8×10^11 Pa], ν [0.16, 0.49], ρ [50, 19,300 kg/m³].
 
 ## Testing Dataset:
+ **Link:** GVM test split — internal NVIDIA dataset; ABO-500 — [Amazon Berkeley Objects](https://amazon-berkeley-objects.s3.amazonaws.com/index.html)
 
 **Data Collection Method by dataset:**
 * Hybrid: Human, Automated (VLM-assisted)
 
 
 ## Evaluation Dataset:
 **Link:**
+ - Primary: GVM test split (166 objects, 4.9M voxel annotations)
 - Secondary: [ABO-500](https://amazon-berkeley-objects.s3.amazonaws.com/index.html) mass estimation benchmark - 500 objects with ground truth mass labels
 
 **Benchmark Score:**
 
 - ARE: Average Relative Error
 - MnRE: Minimum Ratio Error
 
+ On GVM test set (per-object averaged):
 - Young's Modulus: ALDE 0.3793 (±0.29), ALRE 0.0409 (±0.04)
 - Poisson's Ratio: ADE 0.0241 (±0.01), ARE 0.0818 (±0.03)
 - Density: ADE 142.69 kg/m³ (±166.90), ARE 0.0921 (±0.07)
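Assuming the conventional definitions these metric names carry in prior physical-property estimation work such as NeRF2Physics (an assumption; the paper's exact formulas may differ, and ALRE would replace the absolute log difference with a relative one), the scores above can be computed as:

```python
import math

def alde(pred, gt):  # Average Log Displacement Error: mean |log(pred) - log(gt)|
    return sum(abs(math.log(p) - math.log(g)) for p, g in zip(pred, gt)) / len(gt)

def ade(pred, gt):   # Average Displacement Error: mean |pred - gt|
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(gt)

def are(pred, gt):   # Average Relative Error: mean |pred - gt| / gt
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)

def mnre(pred, gt):  # Minimum Ratio Error: mean min(pred/gt, gt/pred); 1.0 is perfect
    return sum(min(p / g, g / p) for p, g in zip(pred, gt)) / len(gt)
```

Log-domain metrics (ALDE) are the natural choice for Young's modulus, whose ground-truth range spans six orders of magnitude.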
 
 - ALDE: 0.631, ADE: 8.433, ARE: 0.887, MnRE: 0.576
 
 **Data Collection Method by dataset:**
+ * Hybrid: Human, Automated (for GVM test split); Human (for ABO-500)
 
 **Labeling Method by dataset:**
+ * Hybrid: Human, Automated (for GVM test split); Human (for ABO-500 ground truth mass)
 
 **Properties (Quantity, Dataset Descriptions, Sensor(s)):**
 GVM test set: 166 objects, 4.9M annotated voxels with ground truth (E, ν, ρ) per voxel. Significantly larger evaluation benchmark than prior works (e.g., NeRF2Physics used 31 points across 11 objects). ABO-500: 500 Amazon objects with measured mass labels, used for density validation via volume integration. Both evaluation sets contain non-personal, non-sensitive 3D geometry and material/mass labels.
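The density validation via volume integration mentioned above, predicting an object's mass by summing per-voxel density over occupied voxel volume, can be sketched as follows (the voxel edge length and function name are illustrative):

```python
def estimate_mass(densities, voxel_edge_m):
    """Integrate predicted per-voxel density (kg/m^3) over occupied voxels
    of uniform size to obtain total object mass in kg."""
    voxel_volume = voxel_edge_m ** 3  # cubic voxels
    return sum(densities) * voxel_volume

# e.g., 1000 occupied voxels of water-like density at 1 cm resolution
mass_kg = estimate_mass([1000.0] * 1000, voxel_edge_m=0.01)
```

The estimated mass is then compared against the measured ABO-500 ground-truth mass with the ALDE/ADE/ARE/MnRE metrics reported above.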
 
 - Geometry Transformer: 0.0082s
 - MatVAE decoding: 0.00032s
 
+ ## Model Limitations and Responsible Use:
+
+ (1) Material property predictions should be validated through physical testing before deployment in safety-critical applications (structural engineering, robotics, medical devices), as inaccurate predictions could lead to unsafe designs if used without verification.
+ (2) The model is trained on common objects (furniture, architectural elements, vegetation) and may not generalize to specialized materials (advanced composites, metamaterials, biological tissues) outside the training distribution.
+ (3) Fixed 64³ voxel resolution limits capture of fine-grained heterogeneous material structures; users should assess whether resolution is sufficient for their application.
+ (4) Isotropic material assumption may not hold for anisotropic materials (wood grain, fiber composites); users should verify appropriateness for their use case.
+ (5) Model outputs (E, ν, ρ) are compatible with accurate FEM simulators but may require adaptation for fast approximate simulators (XPBD, MPM) that use simulator-specific parameter scales.
+
 ## Ethical Considerations:
 
 NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
 Verified to have met prescribed NVIDIA quality standards: | Yes
 Performance Metrics: | Average Log Displacement Error (ALDE), Average Displacement Error (ADE), Average Log Relative Error (ALRE), Average Relative Error (ARE) for Young's modulus, Poisson's ratio, and density; Mass Estimation metrics (ALDE, ADE, ARE, MnRE); Material Validity (relative error to nearest real-world material range); Wall-clock inference time; Throughput; Simulation fidelity through visual inspection of elastodynamic simulations.
 Potential Known Risks: | If the model does not work as intended: (1) Inaccurate material property predictions may lead to unrealistic physics simulations with incorrect deformation behavior, affecting Digital Twin fidelity, Sim-2-Real transfer, and design validation workflows; (2) Overestimated stiffness (Young's modulus) may cause objects to appear overly rigid in simulation, while underestimation causes excessive deformation; (3) Incorrect density predictions impact mass distribution, gravitational effects, and momentum in simulations; (4) Invalid Poisson's ratio values could cause simulation instability or non-physical volume changes.
+ Licensing: | Use of the model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
 
 ## Privacy
 
 
 :---------------------------------------------------|:----------------------------------
 Model Application Field(s): | Robotics; Media & Entertainment; Digital Twins; Design and Engineering
 Describe the life critical impact (if present). | Not Applicable
+ Use Case Restrictions: | Abide by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
 Model and dataset restrictions: | The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to.
 
 ## Citation