Improve model card: Add pipeline tag, update paper & GitHub links
This PR enhances the model card for `MemFlow` by:
- Adding `pipeline_tag: text-to-video` to the metadata, improving discoverability on the Hub.
- Updating the primary paper link (both the badge and the entry in the "Updates" section) to the official Hugging Face paper page: [MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives](https://huggingface.co/papers/2512.14699), as the previous links were broken or incorrect.
- Adding a prominent GitHub badge linking to the repository: [https://github.com/KlingTeam/MemFlow](https://github.com/KlingTeam/MemFlow).
- Correcting placeholder values in the BibTeX citation block with the accurate arXiv information.
These changes provide more comprehensive and accurate information for users.
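
For reference, the metadata added at the top of the README is standard model card YAML front matter. A minimal sketch showing only the field this PR introduces (any other existing metadata fields would live in the same block):

```yaml
---
pipeline_tag: text-to-video
---
```

With this tag set, the model should appear under the Hub's text-to-video task filter.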
README.md (CHANGED)

@@ -1,3 +1,7 @@
+---
+pipeline_tag: text-to-video
+---
+
 <p align="center" >
 <img src="assets/logo.png" width="30%" >
 </p>
@@ -29,7 +33,9 @@
 
 <a href='https://www.youtube.com/watch?v=7l7-WlIrgHg'><img src='https://img.shields.io/static/v1?label=Youtube&message=DemoVideo&color=yellow&logo=youtube'></a>
 
-<a href=""><img src="https://img.shields.io/
+<a href="https://huggingface.co/papers/2512.14699"><img src="https://img.shields.io/badge/Paper-MemFlow-red?logo=huggingface"></a>
+
+<a href='https://github.com/KlingTeam/MemFlow'><img src='https://img.shields.io/badge/GitHub-Code-blue?logo=github'></a>
 
 <a href='https://huggingface.co/KlingTeam/MemFlow'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-orange'></a>
 </p>
@@ -40,14 +46,14 @@
 - __[2025.12.14]__: Training and inference code, [model checkpoints](https://huggingface.co/KlingTeam/MemFlow) are available.
 <!-- - __[2025.09.25]__: [CamCloneMaster](https://arxiv.org/abs/2506.03140) has been accepted by SIGGRAPH Aisa 2025. -->
 <!-- - __[2025.09.08]__: [CameraClone Dataset](https://huggingface.co/datasets/KwaiVGI/CameraClone-Dataset/) is avaliable. -->
-- __[2025.12.14]__: Release the [project page](https://sihuiji.github.io/MemFlow.github.io/) and the [
+- __[2025.12.14]__: Release the [project page](https://sihuiji.github.io/MemFlow.github.io/) and the [Paper](https://huggingface.co/papers/2512.14699) version.
 
 ## 📷 Introduction
 **TL;DR:**
 We propose MemFlow to address the core challenge of long-context consistency and narrative coherence in streaming video generation.
 Specifically, before generating the coming chunk, we dynamically update the memory bank by retrieving the most relevant historical frames with the text prompt of this chunk.
 In addition, during generation, we only activate the most relevant tokens in the memory bank for each query in the attention layers, which effectively guarantees the generation efficiency.
-In this way, MemFlow achieves outstanding long-context consistency with negligible computation burden and keeps the compatibility with any streaming video generation model with KV cache.
+In this way, MemFlow achieves outstanding long-context consistency with negligible computation burden (7.9% speed reduction compared with the memory-free baseline) and keeps the compatibility with any streaming video generation model with KV cache.
 
 
 <div align="center">
@@ -133,7 +139,7 @@ bash interactive_inference.sh
 
 1. For each subject and background appearing in a video, maintaining consistent descriptions across different prompts within the same video greatly improves global coherence during prompt switches. See the example for the exact prompt set we used to produce some of our videos on the demo page.
 
-2. MemFlow supports diverse interaction—action changes, introducing/removing objects, background shifts, and more. While large-scale continuous camera motions can be achieved through appropriate cinematic language (see [`prompts/interactive_example.jsonl`](prompts/interactive_example.jsonl)), rapid shot-to-shot transitions or fast cutscene-style edits are not supported.
+2. MemFlow supports diverse interaction—action changes, introducing/removing objects, background shifts, and more. While large-scale continuous camera motions can be achieved through appropriate cinematic language (see [`prompts/interactive_example.jsonl`](https://github.com/KlingTeam/MemFlow/blob/main/prompts/interactive_example.jsonl)), rapid shot-to-shot transitions or fast cutscene-style edits are not supported.
 
 ## ⚙️ Training
 **Download checkpoints**
@@ -157,7 +163,7 @@ bash train_long.sh
 
 **Hints for two stage training**
 
-The `bank_size` is a tunable hyperparameter specified in [`configs/train_init.yaml`](configs/train_init.yaml) and [`configs/train_long.yaml`](configs/train_long.yaml). It controls the number of latent frames stored in the memory bank. When `bank_size` matches the number of latent frames of frame sink in [LongLive](https://github.com/NVlabs/LongLive) (as in our default setting), training can optionally start directly from Stage 2 (Streaming Long Tuning). Specifically, we initialize from the checkpoint [`longlive_base.pt`](https://huggingface.co/Efficient-Large-Model/LongLive-1.3B/blob/main/models/longlive_base.pt) obtained in Stage 1 of [LongLive](https://github.com/NVlabs/LongLive) and fine-tune only the LoRA parameters, which significantly improves training efficiency.
+The `bank_size` is a tunable hyperparameter specified in [`configs/train_init.yaml`](https://github.com/KlingTeam/MemFlow/blob/main/configs/train_init.yaml) and [`configs/train_long.yaml`](https://github.com/KlingTeam/MemFlow/blob/main/configs/train_long.yaml). It controls the number of latent frames stored in the memory bank. When `bank_size` matches the number of latent frames of frame sink in [LongLive](https://github.com/NVlabs/LongLive) (as in our default setting), training can optionally start directly from Stage 2 (Streaming Long Tuning). Specifically, we initialize from the checkpoint [`longlive_base.pt`](https://huggingface.co/Efficient-Large-Model/LongLive-1.3B/blob/main/models/longlive_base.pt) obtained in Stage 1 of [LongLive](https://github.com/NVlabs/LongLive) and fine-tune only the LoRA parameters, which significantly improves training efficiency.
 
 
 <!-- ## How to contribute
@@ -182,9 +188,9 @@ Please leave us a star 🌟 and cite our paper if you find our work helpful.
   title={MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives},
   author={Ji, Sihui and Chen, Xi and Yang, Shuai and Tao, Xin and Wan, Pengfei and Zhao, Hengshuang},
   year={2025},
-  eprint={2512.
+  eprint={2512.14699},
   archivePrefix={arXiv},
   primaryClass={cs.CV},
-  url={https://arxiv.org/abs/2512.
+  url={https://arxiv.org/abs/2512.14699},
 }
 ```