CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models


Xiaoxue Wu1,2*, Bingjie Gao2,3, Yu Qiao2†, Yaohui Wang2†, Xinyuan Chen2†

1Fudan University 2Shanghai Artificial Intelligence Laboratory 3Shanghai Jiao Tong University

*Work done during internship at Shanghai AI Laboratory †Corresponding author

📥 Installation

1. Clone the repository:

```bash
git clone https://github.com/UknowSth/CineTrans.git
cd CineTrans
```

2. Set up the environment:

```bash
conda create -n cinetrans python==3.11.9
conda activate cinetrans

pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```
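After installation, a quick sanity check can confirm that PyTorch is importable and whether CUDA is visible. This helper is not part of the CineTrans repository; it is a minimal sketch you can run as `python -c` or save to a file:

```python
import importlib.util


def check_env():
    """Return a short status string describing the PyTorch install."""
    # Probe for torch without raising ImportError if it is absent.
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch

    tag = "cuda" if torch.cuda.is_available() else "cpu-only"
    return f"torch {torch.__version__} ({tag})"


if __name__ == "__main__":
    print(check_env())
```

If this reports `cpu-only`, verify that the CUDA 11.8 wheel from the index URL above matches your driver.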

🤗 Checkpoint

CineTrans-Unet

Download the required model weights and place them in the ckpt/ directory.

```
ckpt/
├── stable-diffusion-v1-4/
│   ├── scheduler/
│   ├── text_encoder/
│   ├── tokenizer/
│   ├── unet/
│   └── vae_temporal_decoder/
├── checkpoint.pt
└── longclip-L.pt
```
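To catch a misplaced download before running inference, a small script can verify that every path in the layout above exists. This checker is not part of the official repository; the expected paths are taken directly from the tree shown here:

```python
from pathlib import Path

# Expected layout, copied from the directory tree in this card.
REQUIRED = [
    "stable-diffusion-v1-4/scheduler",
    "stable-diffusion-v1-4/text_encoder",
    "stable-diffusion-v1-4/tokenizer",
    "stable-diffusion-v1-4/unet",
    "stable-diffusion-v1-4/vae_temporal_decoder",
    "checkpoint.pt",
    "longclip-L.pt",
]


def missing_weights(ckpt_dir="ckpt"):
    """Return the relative paths from REQUIRED that are absent under ckpt_dir."""
    base = Path(ckpt_dir)
    return [rel for rel in REQUIRED if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing under ckpt/:", *missing, sep="\n  ")
    else:
        print("All checkpoint files found.")
```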

For more inference details, please refer to the GitHub repository.


📑 BibTeX

If you find CineTrans useful for your research and applications, please cite using this BibTeX:

```bibtex
@misc{wu2025cinetranslearninggeneratevideos,
      title={CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models},
      author={Xiaoxue Wu and Bingjie Gao and Yu Qiao and Yaohui Wang and Xinyuan Chen},
      year={2025},
      eprint={2508.11484},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.11484},
}
```