
	---
	pipeline_tag: text-to-video
	license: apache-2.0
	language:
	- en
	base_model:
	- Wan-AI/Wan2.1-T2V-1.3B
	---

	# ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling

	ShotStream is a novel causal multi-shot architecture that enables interactive storytelling and efficient on-the-fly frame generation. It achieves sub-second latency and 16 FPS on a single NVIDIA GPU by reformulating the task as next-shot generation conditioned on historical context.

	[Project Page](https://luo0207.github.io/ShotStream/) | [Paper](https://arxiv.org/abs/2603.25746) | [Code](https://github.com/KlingAIResearch/ShotStream)

	## Introduction
	Multi-shot video generation is crucial for long narrative storytelling. ShotStream allows users to dynamically instruct ongoing narratives via streaming prompts. It preserves visual coherence through a dual-cache memory mechanism and mitigates error accumulation using a two-stage self-forcing distillation strategy (Distribution Matching Distillation).

	## Usage

	The training code, inference code, and model checkpoints are all released. For the full implementation and training details, see the [official GitHub repository](https://github.com/KlingAIResearch/ShotStream).

	### 1. Environment Setup

	```bash
	git clone https://github.com/KlingAIResearch/ShotStream.git
	cd ShotStream
	# Set up the environment using the provided script
	bash tools/setup/env.sh
	```
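
Before running the setup script, it can help to confirm that the basic tools are on your `PATH`. The helper below is a minimal sketch and is not part of the ShotStream repository; the function names `check_tool` and `preflight` are hypothetical, and the list of tools to check is your choice.

```bash
# Hypothetical preflight helper; not shipped with ShotStream.
check_tool() {
  # Succeeds if the named command can be found on PATH.
  command -v "$1" >/dev/null 2>&1
}

preflight() {
  # Reports each tool as ok or missing; returns non-zero if any is missing.
  missing=0
  for tool in "$@"; do
    if check_tool "$tool"; then
      echo "ok: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `preflight git bash nvidia-smi` reports each tool on its own line and returns a non-zero status if any of them is absent.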

	### 2. Download Checkpoints

	```bash
	# Download the Wan2.1-T2V-1.3B and ShotStream checkpoints
	bash tools/setup/download_ckpt.sh
	```
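
If you would rather fetch the weights by hand, the sketch below uses `huggingface-cli download` with the two repository IDs named on this model card. The directory layout that `download_ckpt.sh` and the inference scripts expect is not described here, so `CKPT_ROOT` and the per-repo subdirectories are placeholders; the provided script remains the safer route. The `fetch_repo` function and the `DRY_RUN` switch are hypothetical helpers.

```bash
# Assumption: target directories are placeholders, not the layout the repo scripts expect.
WAN_REPO="Wan-AI/Wan2.1-T2V-1.3B"
SHOTSTREAM_REPO="KlingTeam/ShotStream"
CKPT_ROOT="${CKPT_ROOT:-./ckpts}"

fetch_repo() {
  # Downloads one Hub repo into $CKPT_ROOT/<repo name>.
  # With DRY_RUN=1 it only prints the command it would run.
  repo="$1"
  dest="$CKPT_ROOT/${repo##*/}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "huggingface-cli download $repo --local-dir $dest"
  else
    huggingface-cli download "$repo" --local-dir "$dest"
  fi
}
```

Calling `fetch_repo "$WAN_REPO"` and then `fetch_repo "$SHOTSTREAM_REPO"` downloads both repositories; set `DRY_RUN=1` first to preview the commands.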

	### 3. Run Inference

	To run autoregressive, 4-step generation of long multi-shot videos:

	```bash
	bash tools/inference/causal_fewsteps.sh
	```
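
On a multi-GPU machine you may want to pin the run to one device. The wrapper below is a sketch under two assumptions: that the command is run from the repository root, and that the inference script does not override `CUDA_VISIBLE_DEVICES` itself. The function name `run_inference` is hypothetical.

```bash
# Hypothetical wrapper; not part of the ShotStream repository.
run_inference() {
  # First argument: GPU index to expose to the process (defaults to 0).
  gpu="${1:-0}"
  script="tools/inference/causal_fewsteps.sh"
  if [ ! -f "$script" ]; then
    echo "not found: $script (run from the ShotStream repo root)" >&2
    return 1
  fi
  # CUDA_VISIBLE_DEVICES limits which GPUs the launched process can see.
  CUDA_VISIBLE_DEVICES="$gpu" bash "$script"
}
```

For example, `run_inference 1` launches the script with only GPU 1 visible, and fails early with a message if the script cannot be found.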

	## Citation
	If you find our work helpful, please cite our paper:

	```bibtex
	@article{luo2026shotstream,
	title={ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling},
	author={Luo, Yawen and Shi, Xiaoyu and Zhuang, Junhao and Chen, Yutian and Liu, Quande and Wang, Xintao and Wan, Pengfei and Xue, Tianfan},
	journal={arXiv preprint arXiv:2603.25746},
	year={2026}
	}
	```