---
license: mit
datasets:
- NumlockUknowSth/Cine250K
language:
- en
base_model:
- Wan-AI/Wan2.1-T2V-1.3B
pipeline_tag: text-to-video
tags:
- multi-shot
---
<div align="center">
<h1>CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models</h1>
<p><a href="https://uknowsth.github.io/CineTrans/">Project Page</a> | <a href="https://arxiv.org/abs/2508.11484">arXiv</a> | <a href="https://github.com/Vchitect/CineTrans">GitHub</a> | <a href="https://huggingface.co/datasets/NumlockUknowSth/Cine250K">Dataset (Cine250K)</a></p>
<p><a href="https://scholar.google.com/citations?hl=zh-CN&user=TbZZSVgAAAAJ">Xiaoxue Wu</a><sup>1,2*</sup>,
<a href="https://scholar.google.com/citations?user=0gY2o7MAAAAJ&hl=zh-CN" target="_blank">Bingjie Gao</a><sup>2,3</sup>,
<a href="https://scholar.google.com.hk/citations?user=gFtI-8QAAAAJ&hl=zh-CN">Yu Qiao</a><sup>2†</sup>,
<a href="https://wyhsirius.github.io/">Yaohui Wang</a><sup>2†</sup>,
<a href="https://scholar.google.com/citations?user=3fWSC8YAAAAJ">Xinyuan Chen</a><sup>2†</sup></p>
<span class="author-block"><sup>1</sup>Fudan University</span>
<span class="author-block"><sup>2</sup>Shanghai Artificial Intelligence Laboratory</span>
<span class="author-block"><sup>3</sup>Shanghai Jiao Tong University</span>
<span class="author-block"><sup>*</sup>Work done during internship at Shanghai AI Laboratory</span> <span class="author-block"><sup>†</sup>Corresponding author</span>
</div>
## Installation
1. Clone the Repository
```
git clone https://github.com/UknowSth/CineTrans.git
cd CineTrans
```
2. Set up the Environment
```
conda create -n cinetrans python==3.11.9
conda activate cinetrans
pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```
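After installation, a quick sanity check can confirm that the CUDA 11.8 build of PyTorch was picked up. The sketch below wraps this in a hypothetical `check_torch_env` shell function (not part of the CineTrans repository). It only reads standard `torch` and `torchvision` attributes.

```shell
# Hypothetical helper (not shipped with CineTrans): print the installed
# torch/torchvision versions, the CUDA version torch was built against,
# and whether a GPU is visible from this environment.
check_torch_env() {
  python - <<'EOF'
import torch
import torchvision

print("torch", torch.__version__, "| torchvision", torchvision.__version__)
print("CUDA build:", torch.version.cuda, "| GPU available:", torch.cuda.is_available())
EOF
}
```

Run `check_torch_env` inside the activated `cinetrans` environment. With the wheels installed above, the CUDA build should read `11.8`, and `GPU available` should be `True` on a machine with a working NVIDIA driver.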
## Checkpoint
### CineTrans-DiT
Download the [Wan2.1-T2V-1.3B](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B/tree/main) base weights and the CineTrans [LoRA weights](https://huggingface.co/NumlockUknowSth/CineTrans-DiT/tree/main), then arrange them as follows:
```
Wan2.1-T2V-1.3B/                          # original weights
├── google/
│   └── umt5-xxl/
├── config.json
├── diffusion_pytorch_model.safetensors
├── models_t5_umt5-xxl-enc-bf16.pth
└── Wan2.1_VAE.pth
ckpt/
└── weights.pt                            # lora weights
```
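The following is a minimal sketch for fetching both sets of weights and checking the layout above. It assumes that the `huggingface-cli` tool from `huggingface_hub` is installed (`pip install -U huggingface_hub`) and that the CineTrans-DiT repository provides its LoRA checkpoint as `weights.pt` at its top level. Verify that filename against the repository before relying on it. The function names `fetch_cinetrans_weights` and `check_cinetrans_layout` are hypothetical helpers, not part of the CineTrans codebase.

```shell
# Hypothetical helpers (not shipped with CineTrans) for the checkpoint layout above.

# Download the Wan2.1 base weights and the CineTrans LoRA repository under $1.
# Assumes `huggingface-cli` is on PATH; files keep their repository-relative paths.
fetch_cinetrans_weights() {
  local root="${1:-.}"
  huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B \
    --local-dir "$root/Wan2.1-T2V-1.3B" || return 1
  huggingface-cli download NumlockUknowSth/CineTrans-DiT \
    --local-dir "$root/ckpt" || return 1
}

# Report every expected file or directory missing under $1.
# Returns 0 if the layout is complete, 1 otherwise.
check_cinetrans_layout() {
  local root="${1:-.}"
  local missing=0
  local f
  for f in \
    Wan2.1-T2V-1.3B/config.json \
    Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors \
    Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth \
    Wan2.1-T2V-1.3B/Wan2.1_VAE.pth \
    ckpt/weights.pt; do
    if [ ! -f "$root/$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  if [ ! -d "$root/Wan2.1-T2V-1.3B/google/umt5-xxl" ]; then
    echo "missing: Wan2.1-T2V-1.3B/google/umt5-xxl/" >&2
    missing=1
  fi
  return "$missing"
}
```

For example, `fetch_cinetrans_weights . && check_cinetrans_layout .` downloads into the current directory and then lists anything that did not land where the tree above expects it.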
For more inference details, please refer to our [GitHub repository](https://github.com/Vchitect/CineTrans).
## BibTeX
If you find [CineTrans](https://github.com/Vchitect/CineTrans.git) useful for your research and applications, please cite using this BibTeX:
```
@misc{wu2025cinetranslearninggeneratevideos,
title={CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models},
author={Xiaoxue Wu and Bingjie Gao and Yu Qiao and Yaohui Wang and Xinyuan Chen},
year={2025},
eprint={2508.11484},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2508.11484},
}
```