---
license: apache-2.0
library_name: diffusers
---

# RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer

Liu Liu, Xiaofeng Wang, Guosheng Zhao, Keyu Li, Wenkang Qin, Jiaxiong Qiu, Zheng Zhu, Guan Huang, Zhizhong Su

---

## 🔍 Abstract

![RoboTransfer Pipeline](assets/robotransfer.jpg)

**RoboTransfer** is a novel diffusion-based video generation framework tailored for robotic visual policy transfer. Unlike conventional approaches, RoboTransfer introduces **geometry-aware synthesis** by injecting **depth and normal priors**, ensuring multi-view consistency across dynamic robotic scenes. The method further supports **explicit control over scene components**, such as **background editing**, **object identity swapping**, and **motion specification**, offering a fine-grained video generation pipeline that benefits embodied learning.

---

## 🧠 Key Features

- 📐 **Geometry-Consistent Diffusion**: Injects global 3D cues (depth, normal) and cross-view interactions for multi-view realism.
- 🧩 **Scene Component Control**: Enables manipulation of object attributes (pose, identity) and background features.
- 🔁 **Cross-View Conditioning**: Learns representations from multiple camera views with spatial correspondence.
- 🤖 **Robotic Policy Transfer**: Facilitates domain adaptation by generating synthetic training data in target domains.

---

## 📖 BibTeX

```bibtex
@article{liu2025robotransfer,
  title={RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer},
  author={Liu, Liu and Wang, Xiaofeng and Zhao, Guosheng and Li, Keyu and Qin, Wenkang and Qiu, Jiaxiong and Zhu, Zheng and Huang, Guan and Su, Zhizhong},
  journal={arXiv preprint arXiv:2505.23171},
  year={2025}
}
```
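
As a rough illustration of the depth and normal priors listed under Key Features, the sketch below builds a per-view, per-frame geometry conditioning array from depth maps. It is not RoboTransfer's actual implementation: the function names `normals_from_depth` and `build_geometry_condition`, the finite-difference normal estimate (which ignores camera intrinsics), and the 4-channel layout are all assumptions made for illustration.

```python
import numpy as np


def normals_from_depth(depth: np.ndarray) -> np.ndarray:
    """Approximate unit surface normals from an (H, W) depth map.

    Hypothetical helper: uses image-space finite differences and ignores
    camera intrinsics, so it is only a coarse geometric cue.
    """
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (cols, x).
    dz_dy, dz_dx = np.gradient(depth.astype(np.float32))
    normals = np.stack([-dz_dx, -dz_dy, np.ones_like(dz_dx)], axis=-1)
    # The z component is 1, so the norm is never zero.
    normals /= np.linalg.norm(normals, axis=-1, keepdims=True)
    return normals


def build_geometry_condition(depth_views: np.ndarray) -> np.ndarray:
    """Stack depth and normals into a (V, T, H, W, 4) conditioning array.

    depth_views has shape (V, T, H, W): V camera views, T frames.
    Channel 0 holds depth; channels 1..3 hold the estimated normal.
    The channel layout is an assumption, not the paper's format.
    """
    v, t, h, w = depth_views.shape
    cond = np.empty((v, t, h, w, 4), dtype=np.float32)
    for view in range(v):
        for frame in range(t):
            depth = depth_views[view, frame]
            cond[view, frame, ..., 0] = depth
            cond[view, frame, ..., 1:] = normals_from_depth(depth)
    return cond


if __name__ == "__main__":
    # Two views, three frames of a flat surface at unit depth.
    flat = np.ones((2, 3, 4, 5), dtype=np.float32)
    print(build_geometry_condition(flat).shape)
```

Keeping the same channel layout across all views is one simple way to let a model compare geometry between cameras; how RoboTransfer actually encodes and injects these cues is described in the paper.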