---
license: apache-2.0
pipeline_tag: image-to-video
---

# MultiWorld: Scalable Multi-Agent Multi-View Video World Models

MultiWorld is a unified framework for multi-agent multi-view world modeling that enables precise control of multiple agents while maintaining multi-view consistency. It is formulated as an action-conditioned video generation model that takes historical frames and current actions as input and predicts future frames.

- **Paper:** [MultiWorld: Scalable Multi-Agent Multi-View Video World Models](https://huggingface.co/papers/2604.18564)
- **Project Page:** [https://multi-world.github.io/](https://multi-world.github.io/)
- **GitHub Repository:** [https://github.com/CIntellifusion/MultiWorld](https://github.com/CIntellifusion/MultiWorld)

## Overview

MultiWorld introduces two key components:
1. **Multi-Agent Condition Module**: Employs Agent Identity Embedding and Adaptive Action Weighting to achieve precise multi-agent controllability.
2. **Global State Encoder**: Uses a frozen VGGT backbone to extract implicit 3D global environmental information, ensuring multi-view consistency.
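
To make the Multi-Agent Condition Module concrete, here is a rough, framework-agnostic sketch of how identity embeddings and adaptive action weighting could combine per-agent action features. All names (`fuse_agent_actions`, the embedding table, the score vector) are hypothetical illustrations, not from the released code:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_agent_actions(agent_ids, action_feats, id_embeddings, score_vec):
    """Sketch: add each agent's identity embedding to its action features,
    then combine agents via softmax weights (adaptive action weighting)."""
    # agent identity embedding: keeps each agent's actions distinguishable
    fused = [[a + e for a, e in zip(feat, id_embeddings[i])]
             for i, feat in zip(agent_ids, action_feats)]
    # adaptive action weighting: score each agent, normalize with softmax
    scores = [sum(w * f for w, f in zip(score_vec, feat)) for feat in fused]
    weights = softmax(scores)
    # weighted sum over agents yields a single conditioning vector
    dim = len(fused[0])
    return [sum(weights[k] * fused[k][d] for k in range(len(fused)))
            for d in range(dim)]

# two agents, 3-dim action features, toy identity embeddings
emb = {0: [0.1, 0.0, 0.0], 1: [0.0, 0.1, 0.0]}
cond = fuse_agent_actions([0, 1], [[1, 0, 0], [0, 1, 0]], emb, [1.0, 1.0, 1.0])
```

In the paper's actual module the embeddings and weights are learned; this sketch only shows the shape of the computation.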

The model scales effectively across varying agent counts and camera views, supporting autoregressive inference to generate video sequences beyond the training context length.
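
The autoregressive rollout above boils down to a sliding context window: each generated frame is appended to the history that conditions the next prediction. A minimal sketch with a toy predictor (frame = one number; the real model predicts video latents):

```python
from collections import deque

def rollout(predict, init_frames, actions, context_len=4):
    """Autoregressive rollout: keep a sliding window of past frames and
    feed it, with the current action, to the next-frame predictor."""
    ctx = deque(init_frames, maxlen=context_len)
    out = []
    for a in actions:
        nxt = predict(list(ctx), a)
        out.append(nxt)
        ctx.append(nxt)  # generated frame becomes context for the next step
    return out

# toy predictor: next "frame" = mean of context + action
toy = lambda ctx, a: sum(ctx) / len(ctx) + a
frames = rollout(toy, [0.0, 1.0], [1, 1, 1], context_len=2)
# frames == [1.5, 2.25, 2.875]
```

Because the window has fixed length, the sequence can be extended indefinitely past the training context length, at the cost of gradually accumulating prediction error.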

## Setup and Usage

### Environment Setup

```bash
conda create -n multiworld python=3.13
conda activate multiworld
# install torch
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 \
    --index-url https://download.pytorch.org/whl/cu128

pip install -r requirements.txt
```
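
A quick optional self-check that the core dependencies resolved in the new environment (a generic snippet, not part of the repository):

```python
import importlib.util

def check_packages(names):
    """Return {package: bool} for whether each top-level package is importable."""
    return {n: importlib.util.find_spec(n) is not None for n in names}

# report install status of the packages pinned above
report = check_packages(["torch", "torchvision", "torchaudio"])
for name, ok in report.items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```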

### Inference Example

To run inference on the "It Takes Two" game dataset:

```bash
python -m torch.distributed.run --nproc_per_node=8 \
    ittakestwo/parallel_inference.py \
    --inference-seed 0 \
    --num-inference-steps 50 \
    --config-path ittakestwo/configs/inference_480P_full.yaml \
    --model-path <path_to_model_checkpoint> \
    --output-dir outputs/eval_480P_full
```
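
The flags above map naturally onto a standard `argparse` interface. A minimal sketch of what the script's CLI might look like (flag names taken from the command above; defaults and `required` choices are assumptions, and the actual `parallel_inference.py` may differ):

```python
import argparse

def build_parser():
    """CLI mirroring the inference command's flags (defaults assumed)."""
    p = argparse.ArgumentParser(description="MultiWorld parallel inference")
    p.add_argument("--inference-seed", type=int, default=0)
    p.add_argument("--num-inference-steps", type=int, default=50)
    p.add_argument("--config-path", required=True)
    p.add_argument("--model-path", required=True)
    p.add_argument("--output-dir", default="outputs")
    return p

# argparse converts dashed flags to underscored attributes
args = build_parser().parse_args([
    "--config-path", "ittakestwo/configs/inference_480P_full.yaml",
    "--model-path", "checkpoint.safetensors",
])
print(args.num_inference_steps)  # 50 (the default)
```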

For robotics tasks:

```bash
python -m torch.distributed.run --nproc_per_node=8 \
    robots/parallel_inference.py \
    --config-path robots/configs/inference.yaml \
    --model-path <path_to_model_checkpoint> \
    --output-dir outputs/test_robotics_output
```

## Citation

```bibtex
@article{wu2025multiworld,
  title={MultiWorld: Scalable Multi-Agent Multi-View Video World Models},
  author={Wu, Haoyu and Yu, Jiwen and Zou, Yingtian and Liu, Xihui},
  journal={arXiv preprint arXiv:2604.18564},
  year={2026}
}
```