Improve model card: Add pipeline tag, paper, project page, and code links

#1 opened by nielsr (HF Staff)
Files changed (1): README.md (+52 −3)
README.md CHANGED
---
license: cc-by-nc-4.0
pipeline_tag: image-to-3d
---

# TUN3D: Towards Real-World Scene Understanding from Unposed Images

This repository contains an implementation of TUN3D, a method for real-world indoor scene understanding from multi-view images.

* **Paper:** [TUN3D: Towards Real-World Scene Understanding from Unposed Images](https://huggingface.co/papers/2509.21388)
* **Project Page:** https://bulatko.github.io/tun3d/
* **Code:** https://github.com/col14m/TUN3D

<div align="center">
  <video src="https://github.com/user-attachments/assets/8644a6d7-3a4e-4b1b-b58e-023276ea12ee"> </video>
  <p><i>TUN3D works with ground-truth point clouds, posed images (with known camera poses), or fully unposed image sets (without poses or depths).</i></p>
</div>

## Abstract
Layout estimation and 3D object detection are two fundamental tasks in indoor scene understanding. When combined, they enable the creation of a compact yet semantically rich spatial representation of a scene. Existing approaches typically rely on point cloud input, which poses a major limitation since most consumer cameras lack depth sensors and visual-only data remains far more common. We address this issue with TUN3D, the first method that tackles joint layout estimation and 3D object detection in real scans, given multi-view images as input, and does not require ground-truth camera poses or depth supervision. Our approach builds on a lightweight sparse-convolutional backbone and employs two dedicated heads: one for 3D object detection and one for layout estimation, leveraging a novel and effective parametric wall representation. Extensive experiments show that TUN3D achieves state-of-the-art performance across three challenging scene understanding benchmarks: (i) using ground-truth point clouds, (ii) using posed images, and (iii) using unposed images. While performing on par with specialized 3D object detection methods, TUN3D significantly advances layout estimation, setting a new benchmark in holistic indoor scene understanding.

## Installation and Usage
The repository is divided into two modules: `Reconstruction` and `Recognition`. Each module requires a separate installation of its dependencies. Please refer to the [GitHub repository](https://github.com/col14m/TUN3D) for detailed installation instructions, data preprocessing steps, and guidance on running the model.
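As a rough sketch of the setup flow, the steps above amount to cloning the repository and installing each module's dependencies separately. The `requirements.txt` paths below are assumptions for illustration; the GitHub README is the authoritative source for the exact files and versions.

```shell
# Clone the TUN3D repository (URL from the links above).
git clone https://github.com/col14m/TUN3D
cd TUN3D

# Each module is installed on its own; the requirements file
# locations here are assumed, not confirmed by this card.
pip install -r reconstruction/requirements.txt
pip install -r recognition/requirements.txt
```

In practice the two modules are often installed into separate environments, since reconstruction and recognition stacks can pin conflicting dependency versions.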

## Prediction examples

#### ScanNet

<p float="left">
  <img src="https://github.com/col14m/TUN3D/raw/main/recognition/imgs/predictions_scannet.png" width="900" height="396" />
</p>

#### S3DIS

<p float="left">
  <img src="https://github.com/col14m/TUN3D/raw/main/recognition/imgs/predictions_s3dis.png" width="900" height="396" />
</p>

## Citation

If you find this work useful for your research, please cite our paper:

```bibtex
@misc{konushin2025tun3drealworldsceneunderstanding,
  title={TUN3D: Towards Real-World Scene Understanding from Unposed Images},
  author={Anton Konushin and Nikita Drozdov and Bulat Gabdullin and Alexey Zakharov and Anna Vorontsova and Danila Rukhovich and Maksim Kolodiazhnyi},
  year={2025},
  eprint={2509.21388},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2509.21388},
}
```