Add model card and metadata

#1, opened by nielsr (HF Staff)

Files changed (1): README.md (+69 -3)
@@ -1,3 +1,69 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ pipeline_tag: image-to-video
+ ---
+
+ # MultiWorld: Scalable Multi-Agent Multi-View Video World Models
+
+ MultiWorld is a unified framework for multi-agent, multi-view world modeling that enables accurate control of multiple agents while maintaining multi-view consistency. It is formulated as an action-conditioned video generation model that takes historical frames and current actions as input and predicts future frames.
+
+ - **Paper:** [MultiWorld: Scalable Multi-Agent Multi-View Video World Models](https://huggingface.co/papers/2604.18564)
+ - **Project Page:** [https://multi-world.github.io/](https://multi-world.github.io/)
+ - **GitHub Repository:** [https://github.com/CIntellifusion/MultiWorld](https://github.com/CIntellifusion/MultiWorld)
+
+ ## Overview
+
+ MultiWorld introduces two key components:
+ 1. **Multi-Agent Condition Module**: Employs Agent Identity Embedding and Adaptive Action Weighting to achieve precise multi-agent controllability.
+ 2. **Global State Encoder**: Uses a frozen VGGT backbone to extract implicit 3D global environmental information, ensuring multi-view consistency.
+
+ The model scales effectively across varying agent counts and camera views, supporting autoregressive inference to generate video sequences beyond the training context length.
+
+ ## Setup and Usage
+
+ ### Environment Setup
+
+ ```bash
+ conda create -n multiworld python=3.13
+ conda activate multiworld
+ # Install PyTorch wheels built for CUDA 12.8
+ pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 \
+   --index-url https://download.pytorch.org/whl/cu128
+
+ pip install -r requirements.txt
+ ```
+
+ ### Inference Example
+
+ To run inference on the "It Takes Two" game dataset:
+
+ ```bash
+ python -m torch.distributed.run --nproc_per_node=8 \
+   ittakestwo/parallel_inference.py \
+   --inference-seed 0 \
+   --num-inference-steps 50 \
+   --config-path ittakestwo/configs/inference_480P_full.yaml \
+   --model-path <path_to_model_checkpoint> \
+   --output-dir outputs/eval_480P_full
+ ```
+
+ For robotics tasks:
+
+ ```bash
+ python -m torch.distributed.run --nproc_per_node=8 \
+   robots/parallel_inference.py \
+   --config-path robots/configs/inference.yaml \
+   --model-path <path_to_model_checkpoint> \
+   --output-dir outputs/test_robotics_output
+ ```
+
+ ## Citation
+
+ ```bibtex
+ @article{wu2025multiworld,
+   title={MultiWorld: Scalable Multi-Agent Multi-View Video World Models},
+   author={Wu, Haoyu and Yu, Jiwen and Zou, Yingtian and Liu, Xihui},
+   journal={arXiv preprint arXiv:2604.18564},
+   year={2026}
+ }
+ ```
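
A note on the autoregressive inference mentioned in the Overview: the README says the model takes historical frames and current actions as input and can generate sequences longer than the training context, but it does not show how the rollout works. The following is only a minimal Python sketch of a generic sliding-window rollout, not the repository's implementation. The names `autoregressive_rollout`, `predict_next`, and `toy_predictor` are hypothetical, and frames and actions are plain lists of floats standing in for image tensors and per-agent action vectors.

```python
from collections import deque
from typing import Callable, List, Sequence

Frame = List[float]   # assumption: stand-in for an image tensor
Action = List[float]  # assumption: stand-in for the agents' action vector


def autoregressive_rollout(
    predict_next: Callable[[Sequence[Frame], Action], Frame],
    initial_frames: Sequence[Frame],
    actions: Sequence[Action],
    context_length: int,
) -> List[Frame]:
    """Generate one frame per action, feeding predictions back as history."""
    if context_length < 1:
        raise ValueError("context_length must be at least 1")
    if not initial_frames:
        raise ValueError("need at least one initial frame")

    # A deque with maxlen keeps only the most recent `context_length` frames,
    # so the predictor never sees more history than it was trained on.
    history = deque(initial_frames, maxlen=context_length)
    generated: List[Frame] = []
    for action in actions:
        frame = predict_next(list(history), action)
        generated.append(frame)
        history.append(frame)  # the oldest frame is dropped once the window is full
    return generated


def toy_predictor(history: Sequence[Frame], action: Action) -> Frame:
    # Hypothetical stand-in for the video model: last frame plus the action.
    return [pixel + delta for pixel, delta in zip(history[-1], action)]


frames = autoregressive_rollout(
    toy_predictor,
    initial_frames=[[0.0, 0.0]],
    actions=[[1.0, 0.5]] * 6,
    context_length=4,
)
print(len(frames), frames[-1])
```

Here six frames are generated with a context window of four, so the later steps condition only on the most recent four frames. A real model would predict chunks of frames per view rather than one list per step, but the feedback loop has the same shape.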