Haoyuwu/MultiWorldCheckpoint

Add model card and metadata

#1

by nielsr HF Staff - opened 16 days ago

base: refs/heads/main ← from: refs/pr/1

README.md +69 -3

@@ -1,3 +1,69 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+pipeline_tag: image-to-video
+---
+# MultiWorld: Scalable Multi-Agent Multi-View Video World Models
+MultiWorld is a unified framework for multi-agent multi-view world modeling that enables accurate control of multiple agents while maintaining multi-view consistency. It is formulated as an action-conditioned video generation model: given historical frames and the current actions, it predicts future frames.
+- **Paper:** [MultiWorld: Scalable Multi-Agent Multi-View Video World Models](https://huggingface.co/papers/2604.18564)
+- **Project Page:** [https://multi-world.github.io/](https://multi-world.github.io/)
+- **GitHub Repository:** [https://github.com/CIntellifusion/MultiWorld](https://github.com/CIntellifusion/MultiWorld)
+## Overview
+MultiWorld introduces two key components:
+1. **Multi-Agent Condition Module**: Employs Agent Identity Embedding and Adaptive Action Weighting to achieve precise multi-agent controllability.
+2. **Global State Encoder**: Uses a frozen VGGT backbone to extract implicit 3D global environmental information, ensuring multi-view consistency.
+The model scales to varying numbers of agents and camera views, and it supports autoregressive inference to generate video sequences longer than the training context length.
+## Setup and Usage
+### Environment Setup
+```bash
+# Create and activate an isolated environment
+conda create -n multiworld python=3.13
+conda activate multiworld
+# Install PyTorch 2.7.1 built against CUDA 12.8
+pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 \
+    --index-url https://download.pytorch.org/whl/cu128
+# Install the remaining dependencies
+pip install -r requirements.txt
+```
+### Inference Example
+To run inference on the "It Takes Two" game dataset (the command launches 8 worker processes; set `--nproc_per_node` to the number of GPUs available):
+```bash
+python -m torch.distributed.run --nproc_per_node=8 \
+    ittakestwo/parallel_inference.py \
+    --inference-seed 0 \
+    --num-inference-steps 50 \
+    --config-path ittakestwo/configs/inference_480P_full.yaml \
+    --model-path <path_to_model_checkpoint> \
+    --output-dir outputs/eval_480P_full
+```
+To run inference on robotics tasks:
+```bash
+python -m torch.distributed.run --nproc_per_node=8 \
+    robots/parallel_inference.py \
+    --config-path robots/configs/inference.yaml \
+    --model-path <path_to_model_checkpoint> \
+    --output-dir outputs/test_robotics_output
+```
+## Citation
+```bibtex
+@article{wu2025multiworld,
+  title={MultiWorld: Scalable Multi-Agent Multi-View Video World Models},
+  author={Wu, Haoyu and Yu, Jiwen and Zou, Yingtian and Liu, Xihui},
+  journal={arXiv preprint arXiv:2604.18564},
+  year={2026}
+}
+```
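
Reviewer note on the Overview: the model card says the model supports autoregressive inference beyond the training context length. A minimal sketch of how such a rollout is commonly structured follows. It assumes a sliding window over the most recent frames, which the model card does not confirm. The names `autoregressive_rollout`, `predict_next`, `toy_predictor`, and `context_len` are hypothetical and are not part of the MultiWorld codebase; frames and actions are plain lists of floats standing in for real tensors.

```python
from collections import deque
from typing import Callable, List, Sequence

Frame = List[float]   # stand-in for an image or latent (assumption)
Action = List[float]  # stand-in for the agents' actions at one step (assumption)


def autoregressive_rollout(
    predict_next: Callable[[Sequence[Frame], Action], Frame],
    history: Sequence[Frame],
    actions: Sequence[Action],
    context_len: int,
) -> List[Frame]:
    """Generate one frame per action, feeding each prediction back as history."""
    if context_len < 1:
        raise ValueError("context_len must be at least 1")
    if not history:
        raise ValueError("history must contain at least one frame")

    # A deque with maxlen keeps only the newest `context_len` frames,
    # dropping the oldest one whenever a new frame is appended.
    window = deque(history, maxlen=context_len)
    generated: List[Frame] = []
    for action in actions:
        frame = predict_next(list(window), action)
        generated.append(frame)
        window.append(frame)  # the prediction becomes part of the context
    return generated


def toy_predictor(context: Sequence[Frame], action: Action) -> Frame:
    """Hypothetical stand-in for the model: shift the last frame by the action."""
    return [value + delta for value, delta in zip(context[-1], action)]


# Roll out 5 frames with a context of only 2 frames.
frames = autoregressive_rollout(
    toy_predictor,
    history=[[0.0, 0.0]],
    actions=[[1.0, 0.0]] * 5,
    context_len=2,
)
print(len(frames))   # 5
print(frames[-1])    # [5.0, 0.0]
```

Because the window is bounded, the predictor never sees more than `context_len` frames no matter how long the rollout runs, which is what lets the generated sequence exceed the context length. The actual inference scripts in the repository may manage context differently.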