---
license: mit
datasets:
- NumlockUknowSth/Cine250K
language:
- en
pipeline_tag: text-to-video
tags:
- multi-shot
---
# CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models

[Project Page](https://uknowsth.github.io/CineTrans/) · [arXiv](https://arxiv.org/abs/2508.11484) · [GitHub](https://github.com/Vchitect/CineTrans) · [Dataset](https://huggingface.co/datasets/NumlockUknowSth/Cine250K)
Xiaoxue Wu<sup>1,2*</sup>, Bingjie Gao<sup>2,3</sup>, Yu Qiao<sup>2†</sup>, Yaohui Wang<sup>2†</sup>, Xinyuan Chen<sup>2†</sup>

<sup>1</sup>Fudan University &nbsp; <sup>2</sup>Shanghai Artificial Intelligence Laboratory &nbsp; <sup>3</sup>Shanghai Jiao Tong University

<sup>*</sup>Work done during internship at Shanghai AI Laboratory. <sup>†</sup>Corresponding author.
## 📥 Installation
1. Clone the repository
```shell
git clone https://github.com/UknowSth/CineTrans.git
cd CineTrans
```
2. Set up the environment
```shell
conda create -n cinetrans python==3.11.9
conda activate cinetrans
pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```
## 🤗 Checkpoint
### CineTrans-Unet
Download the required [model weights](https://huggingface.co/NumlockUknowSth/CineTrans-Unet/tree/main) and place them in the `ckpt/` directory.
```
ckpt/
├── stable-diffusion-v1-4/
│   ├── scheduler/
│   ├── text_encoder/
│   ├── tokenizer/
│   ├── unet/
│   └── vae_temporal_decoder/
├── checkpoint.pt
└── longclip-L.pt
```
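After downloading, the layout above can be sanity-checked with a short stdlib-only script. This is a sketch, not part of the CineTrans repository; `missing_files` and the `EXPECTED` list are hypothetical names, with the paths taken from the tree above:

```python
from pathlib import Path

# Expected entries under ckpt/, copied from the directory tree above.
EXPECTED = [
    "stable-diffusion-v1-4/scheduler",
    "stable-diffusion-v1-4/text_encoder",
    "stable-diffusion-v1-4/tokenizer",
    "stable-diffusion-v1-4/unet",
    "stable-diffusion-v1-4/vae_temporal_decoder",
    "checkpoint.pt",
    "longclip-L.pt",
]

def missing_files(ckpt_dir: str = "ckpt") -> list[str]:
    """Return the expected entries that are absent from ckpt_dir."""
    root = Path(ckpt_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]

if __name__ == "__main__":
    gaps = missing_files()
    if gaps:
        print("Missing from ckpt/:", *gaps, sep="\n  ")
    else:
        print("All expected checkpoint files found.")
```

Run it from the repository root; an empty result means the weights are in place.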
For more inference details, please refer to the [GitHub repository](https://github.com/Vchitect/CineTrans).
---
## 📑 BibTeX
If you find [CineTrans](https://github.com/Vchitect/CineTrans.git) useful for your research and applications, please cite it using this BibTeX entry:
```bibtex
@misc{wu2025cinetranslearninggeneratevideos,
  title={CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models},
  author={Xiaoxue Wu and Bingjie Gao and Yu Qiao and Yaohui Wang and Xinyuan Chen},
  year={2025},
  eprint={2508.11484},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2508.11484},
}
```