Ground3D-LMM is a large multimodal model for fine-grained 3D point grounding and spatial reasoning in indoor scenes. It produces per-point segmentation masks alongside grounded natural-language answers across 8 sub-tasks on ScanNet and ScanNet++.
This checkpoint is the 3D-only variant: it takes a 3D point cloud as input and requires no RGB frames.
pytorch_model.pth (full PyTorch state dict, ~11.7 GB)

from huggingface_hub import snapshot_download
ckpt_dir = snapshot_download(repo_id="amolharsh/Ground3D-LMM-4B-3D")
# checkpoint at: f"{ckpt_dir}/pytorch_model.pth"
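Because the checkpoint is a single large file, it can help to confirm that the download actually produced it before starting an evaluation run. The following is a minimal sketch, not part of the released code: `locate_checkpoint` and its `min_bytes` parameter are hypothetical names introduced here, and only the file name `pytorch_model.pth` comes from this card.

```python
from pathlib import Path

# File name as listed on this model card.
CKPT_NAME = "pytorch_model.pth"


def locate_checkpoint(ckpt_dir, min_bytes=0):
    """Return the checkpoint path inside ckpt_dir.

    Hypothetical helper: raises if the file is missing or smaller than
    min_bytes, which may indicate an interrupted download.
    """
    path = Path(ckpt_dir) / CKPT_NAME
    if not path.is_file():
        raise FileNotFoundError(f"{CKPT_NAME} not found in {ckpt_dir}")
    size = path.stat().st_size
    if size < min_bytes:
        raise ValueError(f"{path} is {size} bytes, expected at least {min_bytes}")
    return path
```

With the snippet above, this could be called as `locate_checkpoint(ckpt_dir, min_bytes=10 * 1024**3)`, where the 10 GiB threshold is an assumed lower bound derived from the ~11.7 GB size stated on this card.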
CKPT="$(huggingface-cli download amolharsh/Ground3D-LMM-4B-3D --quiet)/pytorch_model.pth"
PYTHONPATH=. python tools/test.py configs/ground3dlmm_eval_ground3d_scannet.py "$CKPT" --work-dir work_dirs/quickstart
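The same evaluation can be launched from Python, which may be convenient when the download and the test run live in one script. This is a sketch under assumptions: `build_eval_command` and `run_eval` are hypothetical helpers, the config path and work directory are copied from the command above, and the script is assumed to run from the root of the cloned GitHub repository.

```python
import os
import subprocess
import sys


def build_eval_command(
    ckpt_path,
    config="configs/ground3dlmm_eval_ground3d_scannet.py",
    work_dir="work_dirs/quickstart",
):
    """Build the argument list mirroring the shell command on this card."""
    return [
        sys.executable,
        "tools/test.py",
        config,
        str(ckpt_path),
        "--work-dir",
        work_dir,
    ]


def run_eval(ckpt_path, repo_root="."):
    """Run tools/test.py with PYTHONPATH=. from repo_root (assumed repo checkout)."""
    env = dict(os.environ, PYTHONPATH=".")
    return subprocess.run(
        build_eval_command(ckpt_path), cwd=repo_root, env=env, check=True
    )
```

Passing the arguments as a list avoids shell quoting issues if the cache path contains spaces, and `check=True` raises if the evaluation script exits with a non-zero status.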
See the GitHub README for full inference and training instructions.
License: Apache-2.0
Base model: Qwen/Qwen3-VL-4B-Instruct