TesserAct: Learning 4D Embodied World Models

Haoyu Zhen*, Qiao Sun*, Hongxin Zhang, Junyan Li, Siyuan Zhou, Yilun Du, Chuang Gan

Paper PDF | Project Page | Model on Hugging Face | Code

We propose TesserAct, a 4D embodied world model that takes input images and a text instruction and generates RGB, depth, and normal videos, from which it reconstructs a 4D scene and predicts actions.
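To make the step from depth video to 4D scene more concrete, here is a minimal sketch of how per-frame depth maps could be lifted into a time-indexed sequence of 3D point clouds. It is not the TesserAct reconstruction pipeline: it assumes a simple pinhole camera, and the function names and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical values chosen for illustration only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
DepthMap = Sequence[Sequence[float]]  # rows of per-pixel depth values


def backproject_depth(depth: DepthMap, fx: float, fy: float,
                      cx: float, cy: float) -> List[Point]:
    """Lift one depth map to 3D points under an assumed pinhole camera model.

    fx, fy, cx, cy are hypothetical intrinsics, not values from the model.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):        # v: pixel row
        for u, z in enumerate(row):        # u: pixel column
            if z <= 0:                     # skip invalid or missing depth
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def lift_depth_video(frames: Sequence[DepthMap], fx: float, fy: float,
                     cx: float, cy: float) -> List[List[Point]]:
    """Apply the back-projection to every frame: one point cloud per time step."""
    return [backproject_depth(d, fx, fy, cx, cy) for d in frames]


# Toy example: a single 2x2 depth frame with one invalid pixel.
frame = [[1.0, 2.0],
         [0.0, 4.0]]
clouds = lift_depth_video([frame], fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

Stacking the per-frame point clouds along the time axis gives a simple 4D (3D plus time) representation. Color from the RGB video and orientation from the normal video could be attached to each point in the same loop.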


Paper for anyeZHY/tesseract: "TesserAct: Learning 4D Embodied World Models", paper 2504.20995, published Apr 29, 2025.