Ground3D-LMM-4B (3D-only)

Ground3D-LMM is a large multimodal model for fine-grained 3D point grounding and spatial reasoning in indoor scenes. It produces per-point segmentation masks alongside grounded natural-language answers across 8 sub-tasks on ScanNet and ScanNet++.
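The card does not say how the per-point masks are scored. A common choice for point-level segmentation is intersection-over-union computed directly over the scene's points. The minimal sketch below assumes masks are boolean (or 0/1) arrays of shape (N,) over the same N points. The function name is hypothetical, and scoring two empty masks as 1.0 is a convention chosen here, not something the card specifies.

```python
import numpy as np


def point_mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """IoU between two per-point binary masks of shape (N,)."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    if pred.shape != gt.shape:
        raise ValueError("masks must cover the same set of points")
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks empty: treated as a perfect match (assumed convention).
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)
```

For example, a prediction covering points {0, 1, 4} against ground truth {0, 4} shares 2 points out of a union of 3, giving an IoU of 2/3.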

This checkpoint is the 3D-only variant: it takes only the 3D point cloud as input and requires no RGB frames.
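Sparse-convolution encoders such as the SpConvUNet named in the model summary typically consume voxelized input rather than raw points. The sketch below illustrates that preprocessing step under several assumptions: an (N, 3+C) array with xyz first, a 2 cm voxel size, and mean-pooled features per voxel. The function and its parameters are hypothetical and do not reflect the repository's actual data pipeline. It also returns the point-to-voxel index, which lets voxel-level outputs be mapped back to per-point masks.

```python
import numpy as np


def voxelize(points: np.ndarray, voxel_size: float = 0.02):
    """Quantize an (N, 3+C) point cloud into unique sparse voxels.

    Returns integer voxel coordinates (M, 3), per-voxel mean features
    (M, 3+C), and an (N,) index mapping every input point to its voxel.
    voxel_size (in scene units, assumed metres) is an illustrative default.
    """
    points = np.asarray(points, dtype=np.float64)
    xyz = points[:, :3]
    # Shift to a non-negative origin, then floor to integer grid cells.
    grid = np.floor((xyz - xyz.min(axis=0)) / voxel_size).astype(np.int64)
    coords, inverse = np.unique(grid, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # some NumPy versions return shape (N, 1)
    counts = np.bincount(inverse, minlength=len(coords)).astype(np.float64)
    # Sum the points falling into each voxel, then divide by the count.
    feats = np.zeros((len(coords), points.shape[1]), dtype=np.float64)
    np.add.at(feats, inverse, points)
    feats /= counts[:, None]
    return coords, feats, inverse
```

Indexing any per-voxel array with `inverse` (for example `voxel_mask[inverse]`) broadcasts it back to the original N points.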

Backbone: Qwen3-VL-4B (4.47B params, ~400M LoRA-trainable)
3D encoder: SpConvUNet (from UniSeg3D)
Decoder: Mask2Former-style with 2000 learned queries
Training data: Ground3D Dataset (2.47M Q/A pairs)
Code: https://github.com/amolharsh/Ground3D-LMM
File: pytorch_model.pth (full PyTorch state dict, ~11.7 GB)
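In Mask2Former, each learned query yields a mask embedding, and the query's mask is its dot product with per-pixel features passed through a sigmoid. The sketch below shows how that mechanism would carry over to points. It assumes the decoder's 2000 queries produce (Q, D) embeddings and the 3D encoder produces (N, D) per-point features. The function name, the 0.5 threshold, and the feature width are illustrative assumptions, not values read from this checkpoint.

```python
import numpy as np


def predict_point_masks(query_embed: np.ndarray, point_feats: np.ndarray, threshold: float = 0.5):
    """Mask2Former-style mask prediction over points.

    query_embed: (Q, D) per-query mask embeddings from the decoder.
    point_feats: (N, D) per-point features from the 3D encoder.
    Returns (Q, N) mask probabilities and the thresholded boolean masks.
    """
    logits = query_embed @ point_feats.T      # (Q, N) dot-product scores
    probs = 1.0 / (1.0 + np.exp(-logits))     # independent sigmoid per point
    return probs, probs > threshold
```

With 2000 queries and N points, the output is a (2000, N) array of candidate masks. This card does not describe how the model selects among them when answering a question.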

Quick start

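Download the checkpoint from Python:

from huggingface_hub import snapshot_download

# Download the repository snapshot (cached locally) and locate the weights file.
ckpt_dir = snapshot_download(repo_id="amolharsh/Ground3D-LMM-4B-3D")
ckpt_path = f"{ckpt_dir}/pytorch_model.pth"

Or download and run the ScanNet evaluation config from the shell, inside a clone of the GitHub repository (PYTHONPATH=. makes the repository's modules importable):

# With --quiet, huggingface-cli download prints only the local snapshot path.
CKPT="$(huggingface-cli download amolharsh/Ground3D-LMM-4B-3D --quiet)/pytorch_model.pth"
PYTHONPATH=. python tools/test.py configs/ground3dlmm_eval_ground3d_scannet.py "$CKPT" --work-dir work_dirs/quickstart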
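To check what the ~11.7 GB file contains before wiring it into the repository's configs, the sketch below loads it on CPU and tallies parameters by top-level key prefix. The helper names are hypothetical. The "state_dict" unwrapping handles a wrapper format some training frameworks use and is a no-op when the file is already a plain state dict, as this card describes. Loading requires enough free CPU memory to hold the whole file.

```python
import math
from collections import Counter


def summarize_state_dict(state_dict, depth=1):
    """Count parameters per key prefix (the first `depth` dotted components)."""
    totals = Counter()
    for name, tensor in state_dict.items():
        prefix = ".".join(name.split(".")[:depth])
        totals[prefix] += math.prod(tuple(tensor.shape))
    return dict(totals)


def load_checkpoint(path):
    import torch  # imported lazily so summarize_state_dict works without PyTorch

    # On PyTorch >= 2.6, torch.load defaults to weights_only=True. If that
    # rejects non-tensor metadata in the file, pass weights_only=False, but
    # only for a file you trust.
    ckpt = torch.load(path, map_location="cpu")
    # Unwrap {"state_dict": ..., ...} checkpoints; plain state dicts pass through.
    if isinstance(ckpt, dict) and "state_dict" in ckpt:
        ckpt = ckpt["state_dict"]
    return ckpt
```

For example, `summarize_state_dict(load_checkpoint(f"{ckpt_dir}/pytorch_model.pth"))` returns a prefix-to-count mapping. The actual prefixes depend on the repository's module names, and the totals can be compared with the 4.47B figure in the summary.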

See the GitHub README for full inference and training instructions.

License

Apache-2.0

Model tree

Base model: Qwen/Qwen3-VL-4B-Instruct (this checkpoint is fine-tuned from it)