Update README.md
README.md
CHANGED
@@ -3,19 +3,84 @@ license: apache-2.0
 library_name: diffusers
 ---
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
 }
-```
 library_name: diffusers
 ---
 
+# RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer
+
+<div align="center" class="authors">
+Liu Liu,
+Xiaofeng Wang,
+Guosheng Zhao,
+Keyu Li,
+Wenkang Qin,
+Jiaxiong Qiu,
+Zheng Zhu,
+Guan Huang,
+Zhizhong Su
+</div>
+
+<div align="center" style="line-height: 3;">
+<a href="https://github.com/horizonrobotics/robot_lab" target="_blank" style="margin: 2px;">
+<img alt="Code" src="https://img.shields.io/badge/Code-Github-blue" style="display: inline-block; vertical-align: middle;"/>
+</a>
+<a href="https://horizonrobotics.github.io/robot_lab/robotransfer" target="_blank" style="margin: 2px;">
+<img alt="Project Page" src="https://img.shields.io/badge/Project_Page-blue" style="display: inline-block; vertical-align: middle;"/>
+</a>
+<a href="https://arxiv.org/abs/2505.23171" target="_blank" style="margin: 2px;">
+<img alt="arXiv" src="https://img.shields.io/badge/arXiv-b31b1b" style="display: inline-block; vertical-align: middle;"/>
+</a>
+<a href="https://youtu.be/dGXKtqDnm5Q" target="_blank" style="margin: 2px;">
+<img alt="Video" src="https://img.shields.io/badge/Video-red" style="display: inline-block; vertical-align: middle;"/>
+</a>
+<a href="https://mp.weixin.qq.com/s/c9-1HPBMHIy4oEwyKnsT7Q" target="_blank" style="margin: 2px;">
+<img alt="Chinese Introduction" src="https://img.shields.io/badge/中文介绍-07C160?logo=wechat&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+</a>
+</div>
+
+<div align="center">
+<img src="assets/pin/robotransfer.png" width="90%" alt="RoboTransfer Overview"/>
+<p style="font-size:0.8em; color:#555;">The RoboTransfer framework integrates multi-view geometry and video diffusion, enabling controllable and geometry-consistent robotic video synthesis for policy transfer.</p>
+</div>
+
+---
+
+## Abstract
+
+**RoboTransfer** is a novel diffusion-based video generation framework tailored for robotic visual policy transfer. Unlike conventional approaches, RoboTransfer introduces **geometry-aware synthesis** by injecting **depth and normal priors**, ensuring multi-view consistency across dynamic robotic scenes. The method further supports **explicit control over scene components**, such as **background editing**, **object identity swapping**, and **motion specification**, offering a fine-grained video generation pipeline that benefits embodied learning.
+
+---
+
+## Key Features
+
+- **Geometry-Consistent Diffusion**: Injects global 3D cues (depth, normal) and cross-view interactions for multi-view realism.
+- **Scene Component Control**: Enables manipulation of object attributes (pose, identity) and background features.
+- **Cross-View Conditioning**: Learns representations from multiple camera views with spatial correspondence.
+- **Robotic Policy Transfer**: Facilitates domain adaptation by generating synthetic training data in target domains.
+
+---
+
+## Resources
+
+- **[Paper (arXiv)](https://arxiv.org/abs/2505.23171)**
+- **[Project Page](https://horizonrobotics.github.io/robot_lab/robotransfer)**
+- **[Video Demo](https://youtu.be/dGXKtqDnm5Q)**
+- **[GitHub Code (Coming Soon)](https://github.com/horizonrobotics/robot_lab)**
+- **[Chinese Introduction](https://mp.weixin.qq.com/s/c9-1HPBMHIy4oEwyKnsT7Q)**
+
+---
+
+## Framework Overview
+
+
+
+> The overall architecture includes view-specific encoding, geometry injection, diffusion denoising with spatial constraints, and component-level editing modules. Our system enables compositional control over scene dynamics while preserving physical and geometric consistency.
+
+---
+
+## BibTeX
+
+```bibtex
+@article{liu2025robotransfer,
+  title={RoboTransfer: Geometry-Consistent Video Diffusion for Robotic Visual Policy Transfer},
+  author={Liu, Liu and Wang, Xiaofeng and Zhao, Guosheng and Li, Keyu and Qin, Wenkang and Qiu, Jiaxiong and Zhu, Zheng and Huang, Guan and Su, Zhizhong},
+  journal={arXiv preprint arXiv:2505.23171},
+  year={2025}
 }