nielsr (HF Staff) committed · verified · Commit e1dd59f · 1 parent: 3bfa3fe

Improve model card and add pipeline tag


Hi, I'm Niels from the Hugging Face community science team. I noticed this model repository could benefit from improved documentation and metadata.

This PR adds:
- The `text-to-video` pipeline tag to the YAML metadata to improve discoverability.
- A more structured README including links to the project page, paper, and official GitHub repository.
- Basic usage instructions (environment setup and inference) adapted from the official README.
- The BibTeX citation for the paper.

These changes help users understand how to run and cite the model effectively!
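Concretely, the discoverability change amounts to adding YAML front matter at the top of `README.md` (this mirrors the metadata block added in the diff):

```yaml
---
pipeline_tag: text-to-video
---
```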

Files changed (1)
  1. README.md +48 -5
README.md CHANGED
@@ -1,9 +1,52 @@
  # ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling

- TL;DR: We propose ShotStream, a novel causal multi-shot architecture that enables interactive storytelling and efficient on-the-fly frame generation, achieving 16 FPS on a single NVIDIA GPU.

- Please refer to the [Github](https://github.com/KlingAIResearch/ShotStream/blob/main/README.md) README for usage.

- * Paper:[https://arxiv.org/abs/2603.25746](https://arxiv.org/abs/2603.25746)
- * Project Page:[https://luo0207.github.io/ShotStream/](https://luo0207.github.io/ShotStream/)
- * Code:[https://github.com/KlingAIResearch/ShotStream](https://github.com/KlingAIResearch/ShotStream)
+ ---
+ pipeline_tag: text-to-video
+ ---
+
  # ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling

+ **ShotStream** is a novel causal multi-shot architecture that enables interactive storytelling and efficient on-the-fly frame generation. It achieves sub-second latency and 16 FPS on a single NVIDIA GPU by reformulating the task as next-shot generation conditioned on historical context.
+
+ [**Project Page**](https://luo0207.github.io/ShotStream/) | [**Paper**](https://arxiv.org/abs/2603.25746) | [**Code**](https://github.com/KlingAIResearch/ShotStream)
+
+ ## Introduction
+ Multi-shot video generation is crucial for long narrative storytelling. ShotStream allows users to dynamically instruct ongoing narratives via streaming prompts. It preserves visual coherence through a dual-cache memory mechanism and mitigates error accumulation using a two-stage distillation strategy (Distribution Matching Distillation).
+
+ ## Usage
+
+ For the full implementation and training details, please refer to the [official GitHub repository](https://github.com/KlingAIResearch/ShotStream).
+
+ ### 1. Environment Setup
+
+ ```bash
+ git clone https://github.com/KlingAIResearch/ShotStream.git
+ cd ShotStream
+ # Set up the environment using the provided script
+ bash tools/setup/env.sh
+ ```
+
+ ### 2. Download Checkpoints
+
+ ```bash
+ # Download the checkpoints for Wan-T2V-1.3B and ShotStream
+ bash tools/setup/download_ckpt.sh
+ ```
+
+ ### 3. Run Inference
+
+ To perform autoregressive few-step long multi-shot video generation:
+
+ ```bash
+ bash tools/inference/causal_fewsteps.sh
+ ```

+ ## Citation
+ If you find our work helpful, please cite our paper:

+ ```bibtex
+ @article{luo2026shotstream,
+   title={ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling},
+   author={Luo, Yawen and Shi, Xiaoyu and Zhuang, Junhao and Chen, Yutian and Liu, Quande and Wang, Xintao and Wan, Pengfei and Xue, Tianfan},
+   journal={arXiv preprint arXiv:2603.25746},
+   year={2026}
+ }
+ ```