# Magi-1: Autoregressive Video Generation as a Scalable World Model

<!-- TODO: add image -->
<div align="center" style="margin-top: 0px; margin-bottom: 0px;">
<img src=https://github.com/user-attachments/.... width="30%"/>
(TODO: add the official image here)
</div>

-----

This repository contains the code, pre-trained weights, and inference code for the Magi-1 model. You can find more information on our [project page](http://sand.ai).


## 1. Introduction

We present Magi-1, a world model that generates videos by autoregressively predicting a sequence of video chunks, defined as fixed-length segments of consecutive frames. Trained to denoise per-chunk noise that increases monotonically over time, Magi-1 enables causal temporal modeling and naturally supports streaming generation. It achieves strong performance on image-to-video (I2V) tasks conditioned on text instructions, providing high temporal consistency and scalability, made possible by several algorithmic innovations and a dedicated infrastructure stack. Magi-1 further supports controllable generation via chunk-wise prompting, enabling smooth scene transitions, long-horizon synthesis, and fine-grained text-driven control. We believe Magi-1 offers a promising direction for unifying high-fidelity video generation with flexible instruction control and real-time deployment.
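
The chunk-wise schedule described above can be sketched as a toy loop. Everything below (`CHUNK_FRAMES`, `denoise_step`, the step sizes) is a hypothetical illustration of the mechanics, not the actual Magi-1 API:

```python
# Toy sketch of chunk-wise autoregressive generation (assumed mechanics,
# not the Magi-1 implementation). A new, fully noisy chunk enters the
# window while older chunks are further along in denoising, so noise
# increases monotonically from the oldest chunk to the newest, and
# finished chunks can be streamed out oldest-first.

CHUNK_FRAMES = 24  # hypothetical fixed chunk length


def denoise_step(chunk):
    """Stand-in for one denoising step conditioned on cleaner past chunks."""
    return {"frames": chunk["frames"], "noise": max(0.0, chunk["noise"] - 0.25)}


def generate(num_chunks, stride=2):
    window, done, started, step = [], [], 0, 0
    while len(done) < num_chunks:
        # Admit a new fully noisy chunk every `stride` steps (pipelining).
        if step % stride == 0 and started < num_chunks:
            window.append({"frames": [0.0] * CHUNK_FRAMES, "noise": 1.0})
            started += 1
        window = [denoise_step(c) for c in window]
        # The oldest chunk always finishes first: streaming output.
        while window and window[0]["noise"] <= 0.0:
            done.append(window.pop(0))
        step += 1
    return done


video = generate(num_chunks=3)
print(len(video), all(c["noise"] == 0.0 for c in video))  # 3 True
```

Because each chunk only conditions on chunks ahead of it in the denoising schedule, generation is causal in time, which is what makes streaming and long-horizon rollout natural in this setup.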


## 2. Model and Checkpoints

We provide pre-trained weights for Magi-1, including the 24B and 4.5B models as well as the corresponding distilled and distilled+quantized variants. The model weight links are shown in the table below.

| Model | Link | Recommended Hardware |
| ----------------------------- | ------------------------------------------------------------ | ------------------------------- |
| Magi-1-24B | [Magi-1-24B](https://huggingface.co/sand-ai/Magi-1/tree/main/ckpt/magi/24B_base) | H100/H800 \* 8 |
| Magi-1-24B-distill | [Magi-1-24B-distill](https://huggingface.co/sand-ai/Magi-1/tree/main/ckpt/magi/24B_distill) | H100/H800 \* 8 |
| Magi-1-24B-distill+fp8_quant | [Magi-1-24B-distill+quant](https://huggingface.co/sand-ai/Magi-1/tree/main/ckpt/magi/24B_distill_quant) | H100/H800 \* 4 or RTX 4090 \* 8 |
| Magi-1-4.5B | Magi-1-4.5B (Coming Soon) | RTX 4090 \* 1 |
| Magi-1-4.5B-distill | Magi-1-4.5B-distill (Coming Soon) | RTX 4090 \* 1 |
| Magi-1-4.5B-distill+fp8_quant | Magi-1-4.5B-distill+fp8_quant (Coming Soon) | RTX 4090 \* 1 |


## 3. How to run

### 3.1 Environment preparation

We provide two ways to run Magi-1; the Docker environment is the recommended option.

**Run with Docker environment (recommended)**

```bash
docker pull sandai/magi:latest

docker run -it --gpus all --privileged --shm-size=32g --name magi --net=host --ipc=host --ulimit memlock=-1 --ulimit stack=6710886 sandai/magi:latest /bin/bash
```

**Run with source code**

```bash
# Create a new conda environment
conda create -n magi python==3.10.12

# Install PyTorch
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia

# Install other dependencies
pip install -r requirements.txt

# Install the attention kernel (prebuilt flash_attn_3 wheel)
pip install --no-cache-dir "https://python-artifacts.oss-cn-shanghai.aliyuncs.com/flash_attn_3-3.0.0b2-cp310-cp310-linux_x86_64.whl" --no-deps
```

### 3.2 Inference command

```bash
# Run 24B Magi-1 model
bash example/24B/run.sh

# Run 4.5B Magi-1 model
bash example/4.5B/run.sh
```

### 3.3 Useful configs

| Config | Help |
| -------------- | ------------------------------------------------------------ |
| seed | Random seed used for video generation |
| video_size_h | Height of the generated video |
| video_size_w | Width of the generated video |
| num_frames | Number of frames, which controls the duration of the generated video |
| fps | Frames per second; 4 video frames correspond to 1 latent frame |
| cfg_number | Use cfg_number=2 for the base model and cfg_number=1 for the distilled and quantized models |
| load | Directory containing a model checkpoint |
| t5_pretrained | Path to the pretrained T5 model |
| vae_pretrained | Path to the pretrained VAE model |
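
As a quick sanity check on the frame accounting above, the helpers below relate `num_frames`, `fps`, and latent frames, assuming the stated 4-video-frames-per-latent-frame ratio. The helper names are illustrative, not part of the Magi-1 codebase:

```python
# Frame accounting for the configs above: 4 decoded video frames correspond
# to 1 latent frame, and fps sets playback duration. These helpers are
# illustrative, not part of the Magi-1 codebase.

FRAMES_PER_LATENT = 4


def latent_frames(num_frames: int) -> int:
    # Ceiling division: a partial group of 4 frames still occupies one latent frame.
    return -(-num_frames // FRAMES_PER_LATENT)


def duration_seconds(num_frames: int, fps: int) -> float:
    return num_frames / fps


print(latent_frames(96))         # 24 latent frames for a 96-frame video
print(duration_seconds(96, 24))  # 4.0 seconds at fps=24
```

So to target a given clip length, pick `num_frames = duration * fps`; the number of latent frames the model actually generates is a quarter of that.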


## 4. Acknowledgements

## 5. Contact

Please feel free to cite our paper if you find our code or model useful in your research.

<!-- TODO: add the correct citation -->
```
@article{magi1,
  title={Magi-1: Autoregressive Video Generation as a Scalable World Model},
  author={Magi-1},
  journal={arXiv preprint arXiv:2504.06165},
  year={2025}
}
```

If you have any questions, please feel free to raise an issue.