---
pipeline_tag: image-text-to-text
library_name: transformers
license: apache-2.0
---

# 3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding

3D-R1 is a foundation model designed to enhance the reasoning capabilities of 3D Vision-Language Models (VLMs) for unified scene understanding. Existing 3D VLMs often struggle with robust reasoning and generalization. 3D-R1 addresses this in three ways: it trains on a high-quality synthetic dataset (Scene-30K), applies RLHF policies during reinforcement learning with novel reward functions (perception, semantic similarity, and format), and introduces a dynamic view selection strategy. Together, these aim to improve robust reasoning and generalization in 3D scene understanding.
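To make the three reward signals more concrete, here is a minimal, self-contained sketch of how perception, semantic-similarity, and format rewards could be combined into one scalar for RL training. This is **not** the 3D-R1 implementation. Every function name is hypothetical, and the following are all assumptions made for illustration: the `<think>`/`<answer>` response template, the axis-aligned box format, 3D IoU as the perception score, cosine similarity over precomputed embeddings as the semantic score, and equal weighting.

```python
import math
import re
from typing import Sequence, Tuple

# Hypothetical box format: axis-aligned (x_min, y_min, z_min, x_max, y_max, z_max).
Box3D = Tuple[float, float, float, float, float, float]

# Assumed response template: reasoning inside <think>, final answer inside <answer>.
_FORMAT_RE = re.compile(r"^\s*<think>.*?</think>\s*<answer>.*?</answer>\s*$", re.DOTALL)


def format_reward(response: str) -> float:
    """Return 1.0 if the response follows the assumed <think>/<answer> template, else 0.0."""
    return 1.0 if _FORMAT_RE.match(response) else 0.0


def perception_reward(pred: Box3D, gt: Box3D) -> float:
    """3D IoU between two axis-aligned boxes (an assumed stand-in for a perception score)."""
    inter = 1.0
    for i in range(3):
        lo = max(pred[i], gt[i])
        hi = min(pred[i + 3], gt[i + 3])
        if hi <= lo:
            return 0.0  # boxes do not overlap along this axis
        inter *= hi - lo
    vol_pred = math.prod(pred[i + 3] - pred[i] for i in range(3))
    vol_gt = math.prod(gt[i + 3] - gt[i] for i in range(3))
    union = vol_pred + vol_gt - inter
    return inter / union if union > 0 else 0.0


def semantic_similarity_reward(pred_emb: Sequence[float], ref_emb: Sequence[float]) -> float:
    """Cosine similarity between precomputed embeddings of the predicted and reference answers.

    Which text encoder produces these embeddings is left open (an assumption).
    """
    dot = sum(x * y for x, y in zip(pred_emb, ref_emb))
    norm = math.sqrt(sum(x * x for x in pred_emb)) * math.sqrt(sum(y * y for y in ref_emb))
    return dot / norm if norm > 0 else 0.0


def total_reward(
    response: str,
    pred_box: Box3D,
    gt_box: Box3D,
    pred_emb: Sequence[float],
    ref_emb: Sequence[float],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),  # hypothetical equal weights
) -> float:
    """Weighted sum of the perception, semantic-similarity, and format rewards."""
    w_perc, w_sem, w_fmt = weights
    return (
        w_perc * perception_reward(pred_box, gt_box)
        + w_sem * semantic_similarity_reward(pred_emb, ref_emb)
        + w_fmt * format_reward(response)
    )


if __name__ == "__main__":
    resp = "<think>The chair is left of the table.</think><answer>chair</answer>"
    print(total_reward(resp, (0, 0, 0, 2, 2, 2), (1, 1, 1, 3, 3, 3), [1.0, 0.0], [1.0, 0.0]))
```

In the usage example, the response matches the template (format reward 1.0), the embeddings are identical (similarity 1.0), and the two boxes overlap in a unit cube out of a union volume of 15 (IoU 1/15). The total is therefore about 2.067. Keeping each reward as a separate function makes it easy to reweight or swap one signal without touching the others.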

The model was presented in the paper:

For more details, visit the project page and code repository: