nvpanoptix-3d / README.md
Hien Mai

Description

The 3D Panoptic Reconstruction model reconstructs complete 3D indoor scenes from single RGB images, simultaneously performing 2D panoptic segmentation, depth estimation, 3D scene reconstruction, and 3D panoptic segmentation. It builds on the Uni-3D (ICCV 2023) baseline architecture, replacing the backbone with VGGT (Visual Geometry Grounded Transformer) and integrating multi-plane occupancy-aware lifting from BUOL (CVPR 2023) for improved 3D scene re-projection. Both object instances (things) and scene layout (stuff) are reconstructed in a unified framework.
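To make the idea of lifting a 2D prediction into 3D more concrete, here is a minimal sketch of plain pinhole back-projection of a depth map. It is not the model's method: the model uses the multi-plane occupancy-aware lifting from BUOL, which is more involved. The function name `backproject_depth` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def backproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point3D]:
    """Lift every pixel with a valid depth to a camera-space 3D point.

    Assumes a pinhole camera (an illustrative assumption, not taken
    from the model card): pixel (u, v) with depth z maps to
    x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                # Non-positive depth is treated as "no measurement".
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    # Tiny 2x2 depth map with one invalid pixel.
    toy_depth = [[1.0, 2.0], [0.0, 4.0]]
    for point in backproject_depth(toy_depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0):
        print(point)
```

A single back-projected point per pixel only describes visible surfaces. Reconstructing the complete scene, including occluded regions, is what the 3D stage of the model is for.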

This model is ready for non-commercial use.

License/Terms of Use

GOVERNING TERMS: Use of this model is governed by the NVIDIA License. Additional Information: Apache-2.0 License for https://github.com/mlpc-ucsd/Uni-3D?tab=readme-ov-file; https://github.com/facebookresearch/vggt/blob/main/LICENSE.txt for https://github.com/facebookresearch/vggt.

Deployment Geography

Global

Use Case

This model is intended for researchers and developers building 3D scene understanding applications for indoor environments, including robotics navigation, augmented reality, virtual reality, and architectural visualization.

Release Date

Hugging Face: 11/25/2025 via https://huggingface.co/nvidia/3d_panoptic_reconstruction

References

Model Architecture

Architecture Type: Two-Stage Architecture (Transformer + Sparse Convolutional Network)

Network Architecture:

  • 2D Stage: Transformer-based (VGGT Backbone + Mask2Former-style Decoder)

  • 3D Stage: Sparse 3D CNN Frustum Decoder

  • Number of parameters: 1.4 × 10^9 (about 1.4 billion)

  • This model was developed based on: Uni-3D (ICCV 2023) with VGGT backbone replacement and BUOL occupancy-aware lifting integration.
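As a rough sizing aid, the parameter count listed above can be turned into a memory estimate for the weights alone. This is a back-of-the-envelope sketch: the card does not state the checkpoint precision, so the dtypes below are assumptions, and the helper name `weight_memory_gib` is hypothetical.

```python
# Parameter count taken from the model card (1.4 * 10^9).
NUM_PARAMS = 1.4e9

# Bytes per parameter for common dtypes. Which precision the released
# checkpoint uses is an assumption here; the card does not say.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gib(NUM_PARAMS, dtype):.2f} GiB")
```

Under these assumptions the weights take about 5.2 GiB in float32 and about 2.6 GiB in half precision. Activations, intermediate 2D features, and the sparse 3D convolution workspace come on top of that, so actual GPU memory use will be higher.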