---
license: cc-by-nc-4.0
pipeline_tag: image-to-3d
---

# TUN3D: Towards Real-World Scene Understanding from Unposed Images

This repository contains an implementation of TUN3D, a method for joint layout estimation and 3D object detection in real-world indoor scenes from multi-view images.

Paper: [TUN3D: Towards Real-World Scene Understanding from Unposed Images](https://arxiv.org/abs/2509.21388)

TUN3D works with ground-truth point clouds, posed images (with known camera poses), or fully unposed image sets (without camera poses or depths); the three input modes are sketched below.
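A minimal sketch of what each input mode requires. `SceneInput` and its field names are hypothetical placeholders for illustration, not the repository's actual API:

```python
# Illustrative only: `SceneInput` is NOT the repository's API, just a way to
# show what each of the three input modes needs.
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SceneInput:
    """One scene, in any of the three supported input modes."""
    point_cloud: Optional[str] = None       # path to a ground-truth point cloud
    images: Optional[Sequence[str]] = None  # paths to multi-view RGB frames
    poses: Optional[Sequence[str]] = None   # camera poses; None means unposed


# (i) ground-truth point cloud
gt_scene = SceneInput(point_cloud="scans/scene0000_00.ply")

# (ii) posed images: RGB frames plus known camera poses
posed_scene = SceneInput(
    images=["frames/0000.jpg", "frames/0010.jpg"],
    poses=["poses/0000.txt", "poses/0010.txt"],
)

# (iii) fully unposed images: RGB frames only, no poses or depths
unposed_scene = SceneInput(images=["frames/0000.jpg", "frames/0010.jpg"])
```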

## Abstract

Layout estimation and 3D object detection are two fundamental tasks in indoor scene understanding. When combined, they enable the creation of a compact yet semantically rich spatial representation of a scene. Existing approaches typically rely on point cloud input, which poses a major limitation since most consumer cameras lack depth sensors and visual-only data remains far more common. We address this issue with TUN3D, the first method that tackles joint layout estimation and 3D object detection in real scans, given multi-view images as input, and does not require ground-truth camera poses or depth supervision. Our approach builds on a lightweight sparse-convolutional backbone and employs two dedicated heads: one for 3D object detection and one for layout estimation, leveraging a novel and effective parametric wall representation. Extensive experiments show that TUN3D achieves state-of-the-art performance across three challenging scene understanding benchmarks: (i) using ground-truth point clouds, (ii) using posed images, and (iii) using unposed images. While performing on par with specialized 3D object detection methods, TUN3D significantly advances layout estimation, setting a new benchmark in holistic indoor scene understanding.

## Installation and Usage

The repository is split into two modules, Reconstruction and Recognition, each with its own set of dependencies to install. Please refer to the GitHub repository for detailed installation instructions, data preprocessing steps, and guidance on running the model.
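As a rough illustration of the resulting two-stage workflow, the sketch below runs Reconstruction (recovering scene geometry from images) followed by Recognition (3D object detection and layout estimation). The script paths and flags are hypothetical; the real commands are documented in the GitHub repository.

```python
# Hypothetical two-stage driver; the actual entry points, script names, and
# flags differ -- see the GitHub repository for the real commands.
import subprocess

scene = "scene0000_00"

# Stage 1 (Reconstruction environment): recover scene geometry from images.
subprocess.run(
    ["python", "reconstruction/run.py", "--scene", scene], check=True
)

# Stage 2 (Recognition environment): run 3D object detection and layout
# estimation on the reconstructed scene.
subprocess.run(
    ["python", "recognition/run.py", "--scene", scene], check=True
)
```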

## Prediction examples

**ScanNet** (prediction visualization omitted)

**S3DIS** (prediction visualization omitted)

## Citation

If you find this work useful for your research, please cite our paper:

```bibtex
@misc{konushin2025tun3drealworldsceneunderstanding,
      title={TUN3D: Towards Real-World Scene Understanding from Unposed Images},
      author={Anton Konushin and Nikita Drozdov and Bulat Gabdullin and Alexey Zakharov and Anna Vorontsova and Danila Rukhovich and Maksim Kolodiazhnyi},
      year={2025},
      eprint={2509.21388},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.21388},
}
```