---
license: mit
datasets:
- cadene/droid_1.0.1
language:
- en
base_model:
- stabilityai/stable-video-diffusion-img2vid
pipeline_tag: robotics
tags:
- action_conditioned_video_model
---
|
|
<div align="center"> |
|
|
<h2>👉 Ctrl-World: A Controllable Generative World Model for Robot Manipulation</h2>
|
|
|
|
|
[Yanjiang Guo*](https://robert-gyj.github.io), [Lucy Xiaoyang Shi*](https://lucys0.github.io), [Jianyu Chen](http://people.iiis.tsinghua.edu.cn/~jychen/), [Chelsea Finn](https://ai.stanford.edu/~cbfinn/) |
|
|
|
|
|
\*Equal contribution; Stanford University, Tsinghua University |
|
|
|
|
|
|
|
|
<a href='https://arxiv.org/abs/2510.10125'><img src='https://img.shields.io/badge/ArXiv-2510.10125-red'></a> |
|
|
<a href='https://ctrl-world.github.io/'><img src='https://img.shields.io/badge/Project-Page-Blue'></a> |
|
|
|
|
|
</div> |
|
|
|
|
|
## TL;DR
|
|
[**Ctrl-World**](https://sites.google.com/view/ctrl-world) is an action-conditioned world model compatible with modern VLA policies. It enables policy-in-the-loop rollouts entirely in imagination, which can be used to evaluate and improve the **instruction-following** ability of VLA policies.
|
|
|
|
|
<p> |
|
|
<img src="ctrl_world.jpg" alt="wild-data" width="100%" /> |
|
|
</p> |
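The policy-in-the-loop idea above can be sketched as a simple loop: the policy proposes an action from the current (imagined) observation, and the world model generates the next frame, with no real robot involved. The sketch below uses stand-in stub functions; `world_model_step` and `policy` are hypothetical names, not the real Ctrl-World or openpi APIs.

```python
import numpy as np

def world_model_step(frame, action):
    # Stand-in: the real world model generates the next video frame
    # conditioned on the current frame and the commanded action.
    return frame + action.mean()

def policy(frame, instruction):
    # Stand-in VLA policy: maps an observation + instruction to an action.
    return np.full(7, 0.01)  # hypothetical 7-DoF action

frame = np.zeros((256, 256, 3))          # initial observation
instruction = "pick up the red block"
imagined_frames = [frame]
for _ in range(10):                      # rollout happens entirely in imagination
    action = policy(frame, instruction)
    frame = world_model_step(frame, action)
    imagined_frames.append(frame)
print(len(imagined_frames))  # -> 11
```

Scoring the imagined rollout (e.g., did the policy follow the instruction?) is what lets the world model serve as a cheap evaluation and improvement signal for the VLA.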
|
|
|
|
|
## Model Details
|
|
This repo contains the Ctrl-World model checkpoint trained on the open-source [**DROID dataset**](https://droid-dataset.github.io/) (~95k trajectories, 564 scenes).
|
|
The DROID platform consists of a Franka Panda robotic arm equipped with a Robotiq gripper and three cameras: two randomly placed third-person cameras and one wrist-mounted camera. |
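The three synchronized views per timestep described above (two third-person cameras plus one wrist camera) can be assembled into a single multi-view observation. A minimal sketch, assuming an RGB `(H, W, 3)` layout per camera; the stacking order and resolution here are illustrative assumptions, not the model's actual input spec:

```python
import numpy as np

H, W = 256, 256  # assumed resolution for illustration

# One frame from each of the three DROID cameras (placeholder data).
third_person_1 = np.zeros((H, W, 3), dtype=np.uint8)
third_person_2 = np.zeros((H, W, 3), dtype=np.uint8)
wrist = np.zeros((H, W, 3), dtype=np.uint8)

# Stack into a (views, H, W, channels) conditioning tensor.
obs = np.stack([third_person_1, third_person_2, wrist], axis=0)
print(obs.shape)  # -> (3, 256, 256, 3)
```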
|
|
|
|
|
## Usage |
|
|
See the official [**Ctrl-World github repo**](https://github.com/Robert-gyj/Ctrl-World/tree/main) for detailed usage. |
|
|
|
|
|
## Acknowledgement |
|
|
|
|
|
Ctrl-World is built on the open-source video foundation model [Stable-Video-Diffusion](https://github.com/Stability-AI/generative-models). The VLA model used in this repo is from [openpi](https://github.com/Physical-Intelligence/openpi). We thank the authors for their efforts!
|
|
|
|
|
|
|
|
## Bibtex |
|
|
If you find our work helpful, please leave us a star and cite our paper. Thank you! |
|
|
```
@article{guo2025ctrl,
  title={Ctrl-World: A Controllable Generative World Model for Robot Manipulation},
  author={Guo, Yanjiang and Shi, Lucy Xiaoyang and Chen, Jianyu and Finn, Chelsea},
  journal={arXiv preprint arXiv:2510.10125},
  year={2025}
}
```