# Magi-1: Autoregressive Video Generation as a Scalable World Model

<!-- TODO: add image -->
<div align="center" style="margin-top: 0px; margin-bottom: 0px;">
<img src=https://github.com/user-attachments/.... width="30%"/>
(TODO: add the official image here)
</div>

-----

This repository contains the code, pre-trained weights, and inference code for the Magi-1 model. You can find more information on our [project page](http://sand.ai).


## 1. Introduction

We present Magi-1, a world model that generates videos by autoregressively predicting a sequence of video chunks, defined as fixed-length segments of consecutive frames. Trained to denoise per-chunk noise that increases monotonically over time, Magi-1 enables causal temporal modeling and naturally supports streaming generation. It achieves strong performance on image-to-video (I2V) tasks conditioned on text instructions, providing high temporal consistency and scalability, made possible by several algorithmic innovations and a dedicated infrastructure stack. Magi-1 further supports controllable generation via chunk-wise prompting, enabling smooth scene transitions, long-horizon synthesis, and fine-grained text-driven control. We believe Magi-1 offers a promising direction for unifying high-fidelity video generation with flexible instruction control and real-time deployment.
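
The chunk-wise schedule described above can be sketched as a toy loop. Everything below (`CHUNK_FRAMES`, `denoise_step`, the step sizes) is a hypothetical illustration of the mechanics, not the actual Magi-1 API:

```python
# Toy sketch of chunk-wise autoregressive generation (assumed mechanics,
# not the Magi-1 implementation). A new, fully noisy chunk enters the
# window while older chunks are further along in denoising, so noise
# increases monotonically from the oldest chunk to the newest, and
# finished chunks can be streamed out oldest-first.

CHUNK_FRAMES = 24  # hypothetical fixed chunk length


def denoise_step(chunk):
    """Stand-in for one denoising step conditioned on cleaner past chunks."""
    return {"frames": chunk["frames"], "noise": max(0.0, chunk["noise"] - 0.25)}


def generate(num_chunks, stride=2):
    window, done, started, step = [], [], 0, 0
    while len(done) < num_chunks:
        # Admit a new fully noisy chunk every `stride` steps (pipelining).
        if step % stride == 0 and started < num_chunks:
            window.append({"frames": [0.0] * CHUNK_FRAMES, "noise": 1.0})
            started += 1
        window = [denoise_step(c) for c in window]
        # The oldest chunk always finishes first: streaming output.
        while window and window[0]["noise"] <= 0.0:
            done.append(window.pop(0))
        step += 1
    return done


video = generate(num_chunks=3)
print(len(video), all(c["noise"] == 0.0 for c in video))  # 3 True
```

Because each chunk only conditions on chunks ahead of it in the denoising schedule, generation is causal in time, which is what makes streaming and long-horizon rollout natural in this setup.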


## 2. Model and Checkpoints

We provide pre-trained weights for Magi-1, including the 24B and 4.5B models as well as the corresponding distilled and distilled+quantized variants. The model weight links are shown in the table below.

| Model | Link | Recommended Hardware |
| ----------------------------- | ------------------------------------------------------------ | ------------------------------- |
| Magi-1-24B | [Magi-1-24B](https://huggingface.co/sand-ai/Magi-1/tree/main/ckpt/magi/24B_base) | H100/H800 \* 8 |
| Magi-1-24B-distill | [Magi-1-24B-distill](https://huggingface.co/sand-ai/Magi-1/tree/main/ckpt/magi/24B_distill) | H100/H800 \* 8 |
| Magi-1-24B-distill+fp8_quant | [Magi-1-24B-distill+quant](https://huggingface.co/sand-ai/Magi-1/tree/main/ckpt/magi/24B_distill_quant) | H100/H800 \* 4 or RTX 4090 \* 8 |
| Magi-1-4.5B | Magi-1-4.5B (Coming Soon) | RTX 4090 \* 1 |
| Magi-1-4.5B-distill | Magi-1-4.5B-distill (Coming Soon) | RTX 4090 \* 1 |
| Magi-1-4.5B-distill+fp8_quant | Magi-1-4.5B-distill+fp8_quant (Coming Soon) | RTX 4090 \* 1 |


## 3. How to run

### 3.1 Environment preparation

We provide two ways to run Magi-1; the Docker environment is the recommended option.

**Run with Docker environment (recommended)**

```bash
docker pull sandai/magi:latest

docker run -it --gpus all --privileged --shm-size=32g --name magi --net=host --ipc=host --ulimit memlock=-1 --ulimit stack=6710886 sandai/magi:latest /bin/bash
```

**Run with source code**

```bash
# Create a new conda environment
conda create -n magi python==3.10.12

# Install PyTorch
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia

# Install other dependencies
pip install -r requirements.txt

# Install the attention kernel (prebuilt flash_attn_3 wheel)
pip install --no-cache-dir "https://python-artifacts.oss-cn-shanghai.aliyuncs.com/flash_attn_3-3.0.0b2-cp310-cp310-linux_x86_64.whl" --no-deps
```

### 3.2 Inference command

```bash
# Run 24B Magi-1 model
bash example/24B/run.sh

# Run 4.5B Magi-1 model
bash example/4.5B/run.sh
```

### 3.3 Useful configs

| Config | Help |
| -------------- | ------------------------------------------------------------ |
| seed | Random seed used for video generation |
| video_size_h | Height of the generated video |
| video_size_w | Width of the generated video |
| num_frames | Number of frames, which controls the duration of the generated video |
| fps | Frames per second; 4 video frames correspond to 1 latent frame |
| cfg_number | Use cfg_number=2 for the base model and cfg_number=1 for the distilled and quantized models |
| load | Directory containing a model checkpoint |
| t5_pretrained | Path to the pretrained T5 model |
| vae_pretrained | Path to the pretrained VAE model |
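
As a quick sanity check on the frame accounting above, the helpers below relate `num_frames`, `fps`, and latent frames, assuming the stated 4-video-frames-per-latent-frame ratio. The helper names are illustrative, not part of the Magi-1 codebase:

```python
# Frame accounting for the configs above: 4 decoded video frames correspond
# to 1 latent frame, and fps sets playback duration. These helpers are
# illustrative, not part of the Magi-1 codebase.

FRAMES_PER_LATENT = 4


def latent_frames(num_frames: int) -> int:
    # Ceiling division: a partial group of 4 frames still occupies one latent frame.
    return -(-num_frames // FRAMES_PER_LATENT)


def duration_seconds(num_frames: int, fps: int) -> float:
    return num_frames / fps


print(latent_frames(96))         # 24 latent frames for a 96-frame video
print(duration_seconds(96, 24))  # 4.0 seconds at fps=24
```

So to target a given clip length, pick `num_frames = duration * fps`; the number of latent frames the model actually generates is a quarter of that.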


## 4. Acknowledgements

## 5. Contact

Please feel free to cite our paper if you find our code or model useful in your research.

<!-- TODO: add the correct citation -->
```
@article{magi1,
  title={Magi-1: Autoregressive Video Generation as a Scalable World Model},
  author={Magi-1},
  journal={arXiv preprint arXiv:2504.06165},
  year={2025}
}
```

If you have any questions, please feel free to raise an issue.