Improve model card with metadata and paper links
Hi! I'm Niels, part of the community science team at Hugging Face. I'm opening this PR to improve the model card for UniFuture. I've added:
- Relevant metadata (license, pipeline tag, and tags).
- Links to the paper, project page, and GitHub repository.
- A summary of the model's architecture and the proper BibTeX citation.
This will make the model more discoverable and provide users with the necessary context and resources.
README.md CHANGED

````diff
@@ -1,7 +1,38 @@
----
-license: apache-2.0
-
-
-
-
-
+---
+license: apache-2.0
+pipeline_tag: image-to-video
+tags:
+- autonomous-driving
+- world-model
+- computer-vision
+- 4D
+---
+
+# UniFuture: A 4D Driving World Model for Future Generation and Perception
+
+UniFuture is a unified 4D Driving World Model designed to simulate the dynamic evolution of the 3D physical world. Unlike existing driving world models that focus solely on 2D pixel-level video generation or static perception, UniFuture bridges appearance and geometry to construct a holistic 4D representation.
+
+- **Paper:** [UniFuture: A 4D Driving World Model for Future Generation and Perception](https://arxiv.org/abs/2503.13587)
+- **Project Page:** [https://dk-liang.github.io/UniFuture/](https://dk-liang.github.io/UniFuture/)
+- **Repository:** [https://github.com/dk-liang/unifuture](https://github.com/dk-liang/unifuture)
+
+## Introduction
+
+UniFuture treats future RGB images and depth maps as coupled projections of the same 4D reality and models them jointly within a single framework. To achieve this, it introduces two key components:
+- **Dual-Latent Sharing (DLS):** A scheme that maps visual and geometric modalities into a shared spatio-temporal latent space, implicitly entangling texture with structure.
+- **Multi-scale Latent Interaction (MLI):** A mechanism that enforces bidirectional consistency: geometry constrains visual synthesis to prevent structural hallucinations, while visual semantics refine geometric estimation.
+
+During inference, UniFuture can forecast high-fidelity, geometrically consistent 4D scene sequences (image-depth pairs) from a single current frame.
+
+## Citation
+
+If you find this work useful in your research, please consider citing:
+
+```bibtex
+@inproceedings{liang2026UniFuture,
+  title={UniFuture: A 4D Driving World Model for Future Generation and Perception},
+  author={Liang, Dingkang and Zhang, Dingyuan and Zhou, Xin and Tu, Sifan and Feng, Tianrui and Li, Xiaofan and Zhang, Yumeng and Du, Mingyang and Tan, Xiao and Bai, Xiang},
+  booktitle={IEEE International Conference on Robotics and Automation},
+  year={2026}
+}
+```
````