---
title: UniSH (Unified Scene & Human Reconstruction)
emoji: 🏃♂️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.12.0
python_version: "3.10"
app_file: app.py
pinned: false
license: cc-by-nc-4.0
---
# UniSH: Unifying Scene and Human Reconstruction in a Feed-Forward Pass

<div align="center">

Mengfei Li<sup>1</sup>, Peng Li<sup>1</sup>, Zheng Zhang<sup>2</sup>, Jiahao Lu<sup>1</sup>, Chengfeng Zhao<sup>1</sup>, Wei Xue<sup>1</sup>, <br>
Qifeng Liu<sup>1</sup>, Sida Peng<sup>3</sup>, Wenxiao Zhang<sup>1</sup>, Wenhan Luo<sup>1</sup>, Yuan Liu<sup>1†</sup>, Yike Guo<sup>1†</sup>

<sup>1</sup>The Hong Kong University of Science and Technology, <sup>2</sup>Beijing University of Posts and Telecommunications, <sup>3</sup>Zhejiang University

<a href="https://murphylmf.github.io/UniSH/"><img src="https://img.shields.io/badge/Project-Page-8A2BE2" alt="Project Page"></a>
<a href="https://arxiv.org/abs/2601.01222"><img src="https://img.shields.io/badge/arXiv-2601.01222-b31b1b.svg" alt="arXiv"></a>
<a href="https://github.com/murphylmf/UniSH"><img src="https://img.shields.io/badge/GitHub-Code-black.svg" alt="Code"></a>

</div>
## Abstract

We present UniSH, a unified, feed-forward framework for joint metric-scale 3D scene and human reconstruction. A key challenge in this domain is the scarcity of large-scale, annotated real-world data, which forces a reliance on synthetic datasets. This reliance introduces a significant sim-to-real domain gap, leading to poor generalization, low-fidelity human geometry, and weak scene-human alignment on in-the-wild videos.

To address this, we propose a training paradigm that effectively leverages unlabeled in-the-wild data. Our framework bridges strong but disparate priors from scene reconstruction and human mesh recovery (HMR), and is trained with two core components: (1) a robust distillation strategy that refines human surfaces by distilling high-frequency detail from an expert depth model, and (2) a two-stage supervision scheme that first learns coarse localization on synthetic data, then fine-tunes on real data by directly optimizing the geometric correspondence between the SMPL mesh and the human point cloud. This enables our feed-forward model to jointly recover high-fidelity scene geometry, human point clouds, camera parameters, and coherent, metric-scale SMPL bodies in a single forward pass. Extensive experiments show that our model achieves state-of-the-art performance on human-centric scene reconstruction and delivers highly competitive results on global human motion estimation, comparing favorably against both optimization-based frameworks and HMR-only methods.
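
As a rough illustration of the second component, the sketch below shows one way a mesh-to-point-cloud correspondence objective could be written in PyTorch. The one-directional Chamfer-style distance, the confidence weighting, and the function name are our assumptions for illustration only, not the exact loss used by UniSH.

```python
# Hypothetical sketch of a mesh-to-point-cloud correspondence loss;
# the exact objective used by UniSH may differ.
import torch

def correspondence_loss(smpl_vertices: torch.Tensor,
                        human_points: torch.Tensor,
                        confidence: torch.Tensor) -> torch.Tensor:
    """smpl_vertices: (V, 3) posed SMPL mesh vertices in metric space.
    human_points: (N, 3) predicted human point cloud for the same frame.
    confidence:   (N,)   per-point confidence from the reconstruction branch.
    """
    # Pairwise distances between every predicted human point and every SMPL vertex.
    dists = torch.cdist(human_points, smpl_vertices)   # (N, V)
    # For each human point, distance to its nearest SMPL vertex.
    nearest, _ = dists.min(dim=1)                       # (N,)
    # Confidence-weighted mean pulls the SMPL body onto the human point cloud.
    return (confidence * nearest).sum() / confidence.sum().clamp(min=1e-6)
```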
## Method

**The network architecture of UniSH.**

UniSH takes a monocular video as input. The video frames are processed by the **Reconstruction Branch** to predict per-frame camera extrinsics *E*, confidence maps *C*, and pointmaps *P*. Camera intrinsics *K* are derived from the pointmaps. Human crops from the video are fed into the **Human Body Branch** along with *K* to estimate global SMPL shape parameters *β* and per-frame pose parameters *θ<sub>i</sub>*. Features from both branches are processed by **AlignNet** to predict the global scene scale *s* and per-frame SMPL translations *t<sub>i</sub>* for coherent scene and human alignment.
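
For readers who prefer pseudo-code, the sketch below mirrors the data flow described above. All module and function names, signatures, and return shapes are illustrative placeholders, not the actual API of this repository.

```python
# Illustrative pseudo-code of UniSH's single forward pass, mirroring the
# description above. Names and signatures are assumptions, not this Space's API.
def unish_forward(model, frames, human_crops):
    # Reconstruction branch: per-frame extrinsics E, confidence maps C, pointmaps P.
    E, C, P, scene_feats = model.reconstruction_branch(frames)

    # Camera intrinsics K are derived from the predicted pointmaps.
    K = model.intrinsics_from_pointmaps(P)

    # Human body branch: global SMPL shape beta and per-frame poses theta.
    beta, theta, human_feats = model.human_body_branch(human_crops, K)

    # AlignNet: global scene scale s and per-frame SMPL translations t,
    # placing the SMPL bodies coherently in the metric-scale scene.
    s, t = model.align_net(scene_feats, human_feats)

    return dict(extrinsics=E, confidence=C, pointmaps=P, intrinsics=K,
                betas=beta, thetas=theta, scale=s, translations=t)
```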
## Usage

This Space provides an interactive demo for UniSH; a scripted alternative is sketched after the steps below.

1. **Upload a Video**: Upload a monocular video containing a human.
2. **Set Duration**: Choose the duration to process (default: 3 seconds).
3. **Run Inference**: Click "Run Inference" to generate the 3D reconstruction.
4. **Visualize**: The result is displayed in an interactive 3D viewer where you can rotate, pan, and zoom.
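
If you prefer to call the demo from a script, one possible approach uses the official `gradio_client` package. The Space ID, endpoint name, and argument order below are assumptions and may not match this app's actual interface; check the Space's "Use via API" panel for the real signature.

```python
# Hypothetical scripted call to the demo via gradio_client; the Space ID,
# api_name, and argument names are guesses and may differ from app.py.
from gradio_client import Client, handle_file

client = Client("your-username/UniSH")   # replace with the actual Space ID
result = client.predict(
    handle_file("walking.mp4"),          # monocular video containing a human
    3,                                   # duration to process, in seconds
    api_name="/run_inference",           # endpoint name is an assumption
)
print(result)                            # path(s) to the reconstruction output
```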
## BibTeX

```bibtex
@misc{li2026unishunifyingscenehuman,
      title={UniSH: Unifying Scene and Human Reconstruction in a Feed-Forward Pass},
      author={Mengfei Li and Peng Li and Zheng Zhang and Jiahao Lu and Chengfeng Zhao and Wei Xue and Qifeng Liu and Sida Peng and Wenxiao Zhang and Wenhan Luo and Yuan Liu and Yike Guo},
      year={2026},
      eprint={2601.01222},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2601.01222},
}
```
## Acknowledgements

This website is licensed under a [Creative Commons Attribution-ShareAlike 4.0 International License](http://creativecommons.org/licenses/by-sa/4.0/).
Template borrowed from [Nerfies](https://github.com/nerfies/nerfies.github.io).