metadata
license: apache-2.0
pipeline_tag: image-to-video
tags:
  - autonomous-driving
  - world-model
  - computer-vision
  - 4D

UniFuture: A 4D Driving World Model for Future Generation and Perception

UniFuture is a unified 4D Driving World Model designed to simulate the dynamic evolution of the 3D physical world. Unlike existing driving world models that focus solely on 2D pixel-level video generation or static perception, UniFuture bridges appearance and geometry to construct a holistic 4D representation.

Introduction

UniFuture treats future RGB images and depth maps as coupled projections of the same 4D reality and models them jointly within a single framework. To achieve this, it introduces two key components:

  • Dual-Latent Sharing (DLS): A scheme that maps visual and geometric modalities into a shared spatio-temporal latent space, implicitly entangling texture with structure.
  • Multi-scale Latent Interaction (MLI): A mechanism that enforces bidirectional consistency: geometry constrains visual synthesis to prevent structural hallucinations, while visual semantics refine geometric estimation.
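
As a rough illustration of the two ideas above, the toy sketch below encodes both modalities with one shared function and then lets the two latent streams exchange information at several scales. Everything in it is a hypothetical stand-in chosen for readability: `shared_encode`, `interact`, `multi_scale_interaction`, the mixing weight `alpha`, the scale set, and the use of plain Python lists as latents are all assumptions. This card does not describe the actual layers, so the sketch should not be read as the UniFuture implementation.

```python
from typing import List, Sequence, Tuple

Latent = List[float]  # toy 1D latent; real latents would be spatio-temporal tensors


def shared_encode(signal: Sequence[float], scale: int) -> Latent:
    """Hypothetical shared encoder: average-pool a 1D signal by `scale`.

    The same function is applied to RGB and depth, so both modalities
    land in one latent space (an assumption about what "sharing" means).
    """
    return [
        sum(signal[i:i + scale]) / scale
        for i in range(0, len(signal) - scale + 1, scale)
    ]


def interact(rgb: Latent, depth: Latent, alpha: float) -> Tuple[Latent, Latent]:
    """Toy bidirectional exchange: each stream gets a residual from the other."""
    new_rgb = [r + alpha * d for r, d in zip(rgb, depth)]    # geometry -> appearance
    new_depth = [d + alpha * r for r, d in zip(rgb, depth)]  # appearance -> geometry
    return new_rgb, new_depth


def multi_scale_interaction(
    rgb_signal: Sequence[float],
    depth_signal: Sequence[float],
    scales: Sequence[int] = (1, 2, 4),
    alpha: float = 0.1,
) -> List[Tuple[Latent, Latent]]:
    """Encode both modalities at each scale, then let them interact."""
    pyramid = []
    for s in scales:
        z_rgb = shared_encode(rgb_signal, s)
        z_depth = shared_encode(depth_signal, s)
        pyramid.append(interact(z_rgb, z_depth, alpha))
    return pyramid


if __name__ == "__main__":
    levels = multi_scale_interaction([1.0] * 8, [2.0] * 8)
    for (z_rgb, z_depth), s in zip(levels, (1, 2, 4)):
        print(f"scale {s}: {len(z_rgb)} rgb latents, {len(z_depth)} depth latents")
```

The symmetric residual update is only one simple way to express "bidirectional"; a learned interaction such as cross-attention would fit the same interface.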

During inference, UniFuture can forecast high-fidelity, geometrically consistent 4D scene sequences (image-depth pairs) from a single current frame.
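
The input/output contract described above (one current frame in, a sequence of image-depth pairs out) can be sketched as follows. This is a minimal sketch under stated assumptions: `ImageDepthPair`, `placeholder_model`, `forecast`, and the `horizon` parameter are hypothetical names, and the placeholder simply repeats the input frame with a zero depth map. It is not the released inference API.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[float]]  # toy H x W grid; real frames would carry color channels


@dataclass
class ImageDepthPair:
    rgb: Image
    depth: Image  # expected to share the H x W layout of `rgb`


def placeholder_model(frame: Image, horizon: int) -> List[ImageDepthPair]:
    """Stand-in for the real network: repeats the input and emits zero depth."""
    h, w = len(frame), len(frame[0])
    return [
        ImageDepthPair(
            rgb=[row[:] for row in frame],
            depth=[[0.0] * w for _ in range(h)],
        )
        for _ in range(horizon)
    ]


def forecast(
    frame: Image,
    horizon: int,
    model: Callable[[Image, int], List[ImageDepthPair]] = placeholder_model,
) -> List[ImageDepthPair]:
    """Return `horizon` future image-depth pairs predicted from one frame."""
    pairs = model(frame, horizon)
    for pair in pairs:
        # Each depth map must align pixel-for-pixel with its RGB frame.
        if [len(r) for r in pair.rgb] != [len(d) for d in pair.depth]:
            raise ValueError("rgb and depth shapes differ")
    return pairs


if __name__ == "__main__":
    future = forecast([[0.5, 0.5], [0.5, 0.5]], horizon=3)
    print(len(future), "image-depth pairs")
```

Any callable with the same signature can be passed as `model`, so the shape check applies to whichever predictor is used.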

Citation

If you find this work useful in your research, please consider citing:

@inproceedings{liang2026UniFuture,
  title={UniFuture: A 4D Driving World Model for Future Generation and Perception},
  author={Liang, Dingkang and Zhang, Dingyuan and Zhou, Xin and Tu, Sifan and Feng, Tianrui and Li, Xiaofan and Zhang, Yumeng and Du, Mingyang and Tan, Xiao and Bai, Xiang},
  booktitle={IEEE International Conference on Robotics and Automation},
  year={2026}
}