---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
library_name: pytorch
pipeline_tag: image-to-3d
license: mit
---
Visual Geometry Grounded Transformer (VGGT, CVPR 2025) is a feed-forward neural network that directly infers all key 3D attributes of a scene, including extrinsic and intrinsic camera parameters, point maps, depth maps, and 3D point tracks, from one, a few, or hundreds of its views, within seconds.
Paper: VGGT: Visual Geometry Grounded Transformer

Code: https://github.com/facebookresearch/vggt

Project Page: https://vgg-t.github.io/

Demo: https://huggingface.co/spaces/facebook/vggt
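
The outputs listed above are geometrically related: a depth map combined with intrinsic and extrinsic camera parameters determines a point map. The NumPy sketch below illustrates that relationship under a standard pinhole model. It is not VGGT's API, and VGGT predicts these quantities directly rather than deriving one from another. The function name `unproject_depth` and the conventions used (z-depth, world-to-camera extrinsics `x_cam = R @ x_world + t`) are assumptions made for illustration; check the code repository for the conventions the model actually uses.

```python
import numpy as np

def unproject_depth(depth, K, R, t):
    """Lift a depth map to world-space 3D points (illustrative helper).

    Assumed conventions, not taken from the VGGT code base:
    - depth: (H, W) array of z-depth along the camera's optical axis
    - K: 3x3 pinhole intrinsic matrix
    - R, t: world-to-camera extrinsics, x_cam = R @ x_world + t
    """
    H, W = depth.shape
    # Pixel coordinates: u runs along columns, v along rows; each is (H, W).
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    # Back-project each pixel to a camera-space ray with z == 1.
    rays = pixels @ np.linalg.inv(K).T
    # Scale each ray so that its z coordinate equals the depth value.
    cam_points = rays * depth[..., None]
    # Invert the extrinsics: x_world = R^T (x_cam - t), in row-vector form.
    return (cam_points - t) @ R

# Toy inputs: a 3x5 depth map at constant depth 2, principal point at (2, 1).
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.0],
              [0.0, 0.0, 1.0]])
depth = np.full((3, 5), 2.0)
points = unproject_depth(depth, K, np.eye(3), np.zeros(3))
print(points.shape)   # one 3D point per pixel
print(points[1, 2])   # the point seen through the principal point
```

With an identity rotation and zero translation, the camera frame coincides with the world frame, so the pixel at the principal point maps to a point on the optical axis at the given depth.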