---
tags:
- 3d-object-detection
- open-vocabulary
- point-cloud
datasets:
- lvis
- sunrgbd
- scannet
pipeline_tag: object-detection
---

# ImOV3D: Learning Open-Vocabulary Point Clouds 3D Object Detection from Only 2D Images

**NeurIPS 2024** | [Paper](https://arxiv.org/abs/2410.24001) | [Project Page](https://yangtiming.github.io/ImOV3D_Page/) | [Code](https://github.com/yangtiming/ImOV3D)

> Timing Yang\*, Yuanliang Ju\*, Li Yi
>
> Shanghai Qi Zhi Institute, IIIS Tsinghua University, Shanghai AI Lab

## Overview

ImOV3D is the **first open-vocabulary 3D object detector trained entirely from 2D images**; it requires no 3D ground truth. It bridges the 2D-3D modality gap through flexible modality conversion in both directions: 2D images are lifted to pseudo point clouds via monocular depth estimation, and point clouds are rendered back to pseudo images using ControlNet. The resulting unified image-point-cloud representation is used to train a multimodal 3D detector.
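
Both conversions in the paragraph above rest on pinhole-camera geometry. The sketch below illustrates only that geometry and is not the ImOV3D code. It assumes a depth map has already been produced by some monocular depth estimator and that the camera intrinsics `fx`, `fy`, `cx`, `cy` are known. These parameters, and the function names `lift_depth_to_points` and `render_points_to_depth`, are hypothetical and do not come from this card. For the reverse direction it stops at a rendered depth image, assuming that is the kind of input a depth-conditioned ControlNet would take; the ControlNet generation step itself is not shown.

```python
from typing import List, Optional, Tuple

# A 3D point in the camera frame (X right, Y down, Z forward), in the depth map's units.
Point = Tuple[float, float, float]


def lift_depth_to_points(
    depth: List[List[float]], fx: float, fy: float, cx: float, cy: float
) -> List[Point]:
    """2D -> 3D: back-project every valid pixel of a depth map into a pseudo point cloud.

    Pinhole model: X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy,  Z = depth[v][u].
    Pixels with non-positive depth are treated as missing and skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def render_points_to_depth(
    points: List[Point],
    width: int,
    height: int,
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[List[Optional[float]]]:
    """3D -> 2D: project points into a depth image with a simple z-buffer.

    Each pixel keeps the depth of the nearest point that lands on it; pixels that
    receive no point stay None. (Assumption: a depth-conditioned image generator
    would consume an image like this; that step is not implemented here.)
    """
    image: List[List[Optional[float]]] = [[None] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:
            continue  # point is behind the camera
        u = round(fx * x / z + cx)
        v = round(fy * y / z + cy)
        if 0 <= u < width and 0 <= v < height:
            current = image[v][u]
            if current is None or z < current:
                image[v][u] = z  # nearer point wins
    return image


if __name__ == "__main__":
    # Toy 2x3 depth map; 0.0 marks a pixel with no depth estimate.
    depth_map = [[2.0, 2.0, 0.0], [3.0, 1.5, 2.5]]
    intrinsics = dict(fx=100.0, fy=100.0, cx=1.0, cy=1.0)

    cloud = lift_depth_to_points(depth_map, **intrinsics)
    rendered = render_points_to_depth(cloud, width=3, height=2, **intrinsics)
    print(len(cloud))  # 5
    print(rendered)    # [[2.0, 2.0, None], [3.0, 1.5, 2.5]]
```

Lifting and then rendering with the same intrinsics recovers the valid pixels of the original depth map. Plain lists keep the sketch dependency-free; a practical pipeline would more likely vectorize these loops and account for lens distortion.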

## Citation

```bibtex
@article{yang2024imov3d,
  title={ImOV3D: Learning Open Vocabulary Point Clouds 3D Object Detection from Only 2D Images},
  author={Yang, Timing and Ju, Yuanliang and Yi, Li},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={141261--141291},
  year={2024}
}
```

## Contact

Timing Yang: timingya@usc.edu · Yuanliang Ju: yuanliang.ju@mail.utoronto.ca