---
license: mit
datasets:
- NumlockUknowSth/Cine250K
language:
- en
base_model:
- Wan-AI/Wan2.1-T2V-1.3B
pipeline_tag: text-to-video
tags:
- multi-shot
---

<div align="center">

<h1>CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models</h1>

[Project Page](https://uknowsth.github.io/CineTrans/) | [arXiv](https://arxiv.org/abs/2508.11484) | [GitHub](https://github.com/Vchitect/CineTrans) | [Dataset](https://huggingface.co/datasets/NumlockUknowSth/Cine250K)

<p><a href="https://scholar.google.com/citations?hl=zh-CN&user=TbZZSVgAAAAJ">Xiaoxue Wu</a><sup>1,2*</sup>,
<a href="https://scholar.google.com/citations?user=0gY2o7MAAAAJ&hl=zh-CN" target="_blank">Bingjie Gao</a><sup>2,3</sup>,
<a href="https://scholar.google.com.hk/citations?user=gFtI-8QAAAAJ&hl=zh-CN">Yu Qiao</a><sup>2†</sup>,
<a href="https://wyhsirius.github.io/">Yaohui Wang</a><sup>2†</sup>,
<a href="https://scholar.google.com/citations?user=3fWSC8YAAAAJ">Xinyuan Chen</a><sup>2†</sup></p>

<span class="author-block"><sup>1</sup>Fudan University</span>
<span class="author-block"><sup>2</sup>Shanghai Artificial Intelligence Laboratory</span>
<span class="author-block"><sup>3</sup>Shanghai Jiao Tong University</span>

<span class="author-block"><sup>*</sup>Work done during internship at Shanghai AI Laboratory</span> <span class="author-block"><sup>†</sup>Corresponding author</span>

</div>

## Installation

1. Clone the Repository

```bash
git clone https://github.com/UknowSth/CineTrans.git
cd CineTrans
```

2. Set up Environment

```bash
conda create -n cinetrans python==3.11.9
conda activate cinetrans

pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
```

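Before downloading any weights, it can help to confirm that the environment actually matches the pinned versions. The check below is a minimal sketch and not part of the CineTrans codebase: `installed_version`, `check_versions`, and `EXPECTED` are hypothetical names, and the script reads versions through the standard-library `importlib.metadata`, so it still runs when `torch` is missing.

```python
"""Hypothetical helper: compare installed versions with the pins used above."""
from __future__ import annotations

import platform
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the installation commands above.
EXPECTED = {"python": "3.11.9", "torch": "2.5.1", "torchvision": "0.20.1"}


def installed_version(package: str) -> str | None:
    """Return the installed version of `package`, or None if it is absent."""
    if package == "python":
        return platform.python_version()
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_versions(expected: dict[str, str] = EXPECTED) -> dict[str, tuple[str, str | None]]:
    """Map each mismatched name to (expected, found); empty means all pins match.

    Local build tags are dropped before comparing, because wheels from the
    cu118 index report versions such as "2.5.1+cu118".
    """
    mismatches: dict[str, tuple[str, str | None]] = {}
    for name, want in expected.items():
        found = installed_version(name)
        base = found.split("+", 1)[0] if found is not None else None
        if base != want:
            mismatches[name] = (want, found)
    return mismatches


if __name__ == "__main__":
    for name, (want, found) in check_versions().items():
        print(f"{name}: expected {want}, found {found or 'not installed'}")
```

Run it inside the activated `cinetrans` environment; no output means every pin matched. Stripping the `+cu118` local tag is a deliberate choice, since the CUDA 11.8 wheels carry it while the pins in the commands do not.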

## Checkpoint

### CineTrans-DiT

Download the [Wan2.1-T2V-1.3B](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B/tree/main) base weights and the [LoRA weights](https://huggingface.co/NumlockUknowSth/CineTrans-DiT/tree/main), then arrange them as follows:

```
Wan2.1-T2V-1.3B/          # original weights
├── google/
│   └── umt5-xxl/
├── config.json
├── diffusion_pytorch_model.safetensors
├── models_t5_umt5-xxl-enc-bf16.pth
└── Wan2.1_VAE.pth
ckpt/
└── weights.pt            # LoRA weights
```

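The download and placement step can also be scripted. The sketch below assumes the `huggingface_hub` package is installed (it is not listed in this card) and that `weights.pt` sits at the root of the CineTrans-DiT repository; `download_checkpoints`, `missing_paths`, and `EXPECTED_PATHS` are hypothetical names used only for illustration. `snapshot_download` may also fetch extra files from each repository, which does not affect the layout check.

```python
"""Hypothetical helper: fetch both checkpoints and verify the layout above."""
from __future__ import annotations

from pathlib import Path

# Paths that should exist afterwards, relative to the chosen root directory.
EXPECTED_PATHS = [
    "Wan2.1-T2V-1.3B/google/umt5-xxl",
    "Wan2.1-T2V-1.3B/config.json",
    "Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors",
    "Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth",
    "Wan2.1-T2V-1.3B/Wan2.1_VAE.pth",
    "ckpt/weights.pt",
]


def download_checkpoints(root: str | Path = ".") -> None:
    """Download the base model and the LoRA repo into `root`.

    Assumption: weights.pt lives at the root of NumlockUknowSth/CineTrans-DiT.
    """
    # Imported lazily so the layout check works without huggingface_hub.
    from huggingface_hub import snapshot_download

    root = Path(root)
    snapshot_download(repo_id="Wan-AI/Wan2.1-T2V-1.3B",
                      local_dir=root / "Wan2.1-T2V-1.3B")
    snapshot_download(repo_id="NumlockUknowSth/CineTrans-DiT",
                      local_dir=root / "ckpt")


def missing_paths(root: str | Path = ".") -> list[str]:
    """Return the expected paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]
```

Call `download_checkpoints()` once from the directory where the folders should live; afterwards `missing_paths()` should return an empty list. Keeping the check separate from the download means it can also verify a layout that was arranged by hand.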

For more inference details, please refer to our [GitHub repository](https://github.com/Vchitect/CineTrans).

## BibTeX

If you find [CineTrans](https://github.com/Vchitect/CineTrans.git) useful for your research and applications, please cite it using the following BibTeX entry:

```bibtex
@misc{wu2025cinetranslearninggeneratevideos,
  title={CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models},
  author={Xiaoxue Wu and Bingjie Gao and Yu Qiao and Yaohui Wang and Xinyuan Chen},
  year={2025},
  eprint={2508.11484},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2508.11484},
}
```