---
license: apache-2.0
pipeline_tag: image-to-video
---
# MultiWorld: Scalable Multi-Agent Multi-View Video World Models

MultiWorld is a unified framework for multi-agent multi-view world modeling that enables accurate control of multiple agents while maintaining multi-view consistency. The framework is formulated as an action-conditioned video generation model: it takes historical frames and current actions as input and predicts future frames.
- Paper: MultiWorld: Scalable Multi-Agent Multi-View Video World Models
- Project Page: https://multi-world.github.io/
- GitHub Repository: https://github.com/CIntellifusion/MultiWorld
## Overview
MultiWorld introduces two key components:
- Multi-Agent Condition Module: Employs Agent Identity Embedding and Adaptive Action Weighting to achieve precise multi-agent controllability.
- Global State Encoder: Uses a frozen VGGT backbone to extract implicit 3D global environmental information, ensuring multi-view consistency.
The model scales effectively across varying agent counts and camera views, supporting autoregressive inference to generate video sequences beyond the training context length.
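The autoregressive rollout described above can be sketched as a sliding-window loop: the model repeatedly conditions on the most recent frames and the current actions, appends the predicted frames, and slides the context window forward. The sketch below is illustrative only; `predict_next_frames`, `CONTEXT_LEN`, and `CHUNK` are placeholders, not the MultiWorld API (see the GitHub repository for the real interface).

```python
# Minimal sketch of autoregressive inference beyond the training context length.
# All names and values here are illustrative placeholders, not the real API.
from collections import deque

CONTEXT_LEN = 4  # frames the model conditions on (illustrative value)
CHUNK = 2        # frames predicted per forward pass (illustrative value)

def predict_next_frames(history, actions):
    """Placeholder for the world model: returns CHUNK dummy frames
    'conditioned' on the history and the current actions."""
    return [f"pred({history[-1]},{actions},{i})" for i in range(CHUNK)]

def rollout(init_frames, action_stream, total_frames):
    frames = list(init_frames)
    # deque with maxlen acts as the sliding context window:
    # appending new frames automatically evicts the oldest ones.
    history = deque(init_frames, maxlen=CONTEXT_LEN)
    while len(frames) < total_frames:
        actions = next(action_stream)          # current multi-agent actions
        new = predict_next_frames(list(history), actions)
        frames.extend(new)
        history.extend(new)                    # slide the window forward
    return frames[:total_frames]

# Generate 10 frames from 2 seed frames and a constant "noop" action stream.
video = rollout(["f0", "f1"], iter(lambda: "noop", None), total_frames=10)
print(len(video))
```

Because the window is bounded, the loop can run for arbitrarily many steps, which is how sequences longer than the training context are produced.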
## Setup and Usage

### Environment Setup
```bash
conda create -n multiworld python=3.13
conda activate multiworld

# install PyTorch (CUDA 12.8 build)
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 \
    --index-url https://download.pytorch.org/whl/cu128

pip install -r requirements.txt
```
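A quick way to confirm the environment resolved correctly is to check that the core packages are importable before launching a distributed run (this check is a generic sanity sketch, not part of the MultiWorld repository):

```python
# Sanity check: verify the packages installed above are importable
# from the active conda environment.
import importlib.util

for pkg in ("torch", "torchvision", "torchaudio"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'found' if found else 'MISSING'}")
```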
### Inference Example
To run inference on the "It Takes Two" game dataset:
```bash
python -m torch.distributed.run --nproc_per_node=8 \
    ittakestwo/parallel_inference.py \
    --inference-seed 0 \
    --num-inference-steps 50 \
    --config-path ittakestwo/configs/inference_480P_full.yaml \
    --model-path <path_to_model_checkpoint> \
    --output-dir outputs/eval_480P_full
```
For robotics tasks:
```bash
python -m torch.distributed.run --nproc_per_node=8 \
    robots/parallel_inference.py \
    --config-path robots/configs/inference.yaml \
    --model-path <path_to_model_checkpoint> \
    --output-dir outputs/test_robotics_output
```
## Citation
```bibtex
@article{wu2025multiworld,
  title={MultiWorld: Scalable Multi-Agent Multi-View Video World Models},
  author={Wu, Haoyu and Yu, Jiwen and Zou, Yingtian and Liu, Xihui},
  journal={arXiv preprint arXiv:2604.18564},
  year={2026}
}
```