---
pipeline_tag: text-to-video
license: apache-2.0
language:
- en
base_model:
- Wan-AI/Wan2.1-T2V-1.3B
---

# ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling

**ShotStream** is a novel causal multi-shot architecture for interactive storytelling and efficient on-the-fly frame generation. By reformulating the task as next-shot generation conditioned on historical context, it achieves sub-second latency and 16 FPS on a single NVIDIA GPU.
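Next-shot generation can be illustrated with a toy loop. The sketch below is our own simplification, not code from the ShotStream repository: `NextShotGenerator`, `generate_shot`, and `dummy_model` are hypothetical names, and a "shot" is just a list of strings standing in for video frames. It shows only the causal structure: shot *k* is produced from the *k*-th prompt plus shots 0 to *k*-1, and never from future shots.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A "shot" is modelled here as a plain list of frame labels (illustrative only).
Shot = List[str]


@dataclass
class NextShotGenerator:
    """Toy next-shot loop: shot k is generated from prompt k plus shots 0..k-1."""

    # Hypothetical per-shot model: (prompt, history) -> frames of the new shot.
    generate_shot: Callable[[str, List[Shot]], Shot]
    history: List[Shot] = field(default_factory=list)

    def step(self, prompt: str) -> Shot:
        # Condition only on already-generated shots (causal, no future context).
        shot = self.generate_shot(prompt, list(self.history))
        # The finished shot becomes context for every later shot.
        self.history.append(shot)
        return shot


def dummy_model(prompt: str, history: List[Shot]) -> Shot:
    # Stand-in for a video model: tag each frame with its shot index and prompt.
    k = len(history)
    return [f"shot{k}:{prompt}:frame{i}" for i in range(3)]


if __name__ == "__main__":
    gen = NextShotGenerator(dummy_model)
    for prompt in ["a knight enters a hall", "close-up of the knight's face"]:
        print(gen.step(prompt))
```

Because each call to `step` only looks backward, a new prompt can be supplied after every shot, which is what makes streaming, interactive prompting possible in this formulation.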

[**Project Page**](https://luo0207.github.io/ShotStream/) | [**Paper**](https://arxiv.org/abs/2603.25746) | [**Code**](https://github.com/KlingAIResearch/ShotStream)

## Introduction
Multi-shot video generation is crucial for long-form narrative storytelling. ShotStream lets users steer an ongoing narrative on the fly through streaming prompts. It preserves visual coherence through a dual-cache memory mechanism and mitigates error accumulation with a two-stage self-forcing distillation strategy based on Distribution Matching Distillation (DMD).
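The dual-cache memory can be pictured as two bounded buffers. This is a hypothetical sketch under our own assumptions, not ShotStream's actual design: we assume one cache keeps compact context from earlier shots (for consistency across shots) and the other keeps the most recent frames of the current shot (for continuity within a shot). The `DualCache` class, its method names, and the capacities are illustrative; a real model would cache attention keys and values rather than strings.

```python
from collections import deque
from typing import Deque, List


class DualCache:
    """Hypothetical two-level memory: earlier-shot context plus current-shot frames."""

    def __init__(self, max_shots: int = 2, max_frames: int = 4) -> None:
        # Shot-level cache: one entry per finished shot; the oldest is evicted first.
        self.shot_cache: Deque[str] = deque(maxlen=max_shots)
        # Frame-level cache: the most recent frames of the shot in progress.
        self.frame_cache: Deque[str] = deque(maxlen=max_frames)

    def add_frame(self, frame: str) -> None:
        self.frame_cache.append(frame)

    def end_shot(self, shot_summary: str) -> None:
        # At a shot boundary, keep a compact record of the finished shot
        # and start the next shot with an empty frame-level cache.
        self.shot_cache.append(shot_summary)
        self.frame_cache.clear()

    def context(self) -> List[str]:
        # What the next frame would condition on: earlier shots, then recent frames.
        return list(self.shot_cache) + list(self.frame_cache)


if __name__ == "__main__":
    cache = DualCache(max_shots=2, max_frames=2)
    for i, prompt in enumerate(["wide", "close", "react"]):  # streaming prompts
        for f in range(3):
            cache.add_frame(f"{prompt}/f{f}")
        if i < 2:
            cache.end_shot(f"shot{i}:{prompt}")
    print(cache.context())
    # → ['shot0:wide', 'shot1:close', 'react/f1', 'react/f2']
```

In this toy, bounding both buffers keeps the context length fixed no matter how long the story grows, which is the kind of property a streaming generator needs; see the paper for how ShotStream actually selects and encodes its historical context.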

## Usage

**Training and inference code, as well as the models, have all been released.** For the full implementation and **training details**, please refer to the [official GitHub repository](https://github.com/KlingAIResearch/ShotStream).

### 1. Environment Setup

```bash
git clone https://github.com/KlingAIResearch/ShotStream.git
cd ShotStream
# Set up the environment using the provided script
bash tools/setup/env.sh
```

### 2. Download Checkpoints

```bash
# Download the Wan2.1-T2V-1.3B base model and the ShotStream checkpoints
bash tools/setup/download_ckpt.sh
```

### 3. Run Inference

To run autoregressive, 4-step generation of long multi-shot videos:

```bash
bash tools/inference/causal_fewsteps.sh
```
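The "4-step" setting refers to few-step sampling, in which the distilled model denoises its output in a handful of network calls rather than the dozens a standard diffusion sampler uses. The following is a generic few-step sampler written under our own assumptions; the `TIMESTEPS` schedule, the linear noise schedule in `alpha_sigma`, and the `denoise` callable are hypothetical placeholders, not the values or functions used by `causal_fewsteps.sh`.

```python
import random
from typing import Callable, List, Tuple

Latent = List[float]

# Hypothetical 4-step schedule on a 0..1000 timestep scale (illustrative values only).
TIMESTEPS = [1000, 750, 500, 250]


def alpha_sigma(t: int) -> Tuple[float, float]:
    # Assumed linear schedule: x_t = (1 - t/1000) * x0 + (t/1000) * noise.
    s = t / 1000.0
    return 1.0 - s, s


def few_step_sample(
    denoise: Callable[[Latent, int], Latent], size: int, seed: int = 0
) -> Latent:
    """Predict the clean sample, re-noise it to the next timestep, repeat."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(size)]  # pure noise at t = 1000
    x0_pred = x
    for i, t in enumerate(TIMESTEPS):
        x0_pred = denoise(x, t)  # exactly one network call per step
        if i + 1 < len(TIMESTEPS):
            a, s = alpha_sigma(TIMESTEPS[i + 1])
            # Re-noise the clean prediction to the next, lower noise level.
            x = [a * v + s * rng.gauss(0.0, 1.0) for v in x0_pred]
    return x0_pred  # the last clean prediction is the output


if __name__ == "__main__":
    seen = []

    def toy_denoiser(x: Latent, t: int) -> Latent:
        # Stand-in "network" that records its timestep and always predicts 0.5.
        seen.append(t)
        return [0.5] * len(x)

    print(few_step_sample(toy_denoiser, size=3))  # → [0.5, 0.5, 0.5]
    print(seen)  # → [1000, 750, 500, 250]
```

In an autoregressive pipeline, a loop like this would run once per block of new frames, with the denoiser additionally conditioned on cached context from earlier frames and shots.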

## Citation
If you find our work helpful, please cite our paper:

```bibtex
@article{luo2026shotstream,
  title={ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling},
  author={Luo, Yawen and Shi, Xiaoyu and Zhuang, Junhao and Chen, Yutian and Liu, Quande and Wang, Xintao and Wan, Pengfei and Xue, Tianfan},
  journal={arXiv preprint arXiv:2603.25746},
  year={2026}
}
```