---
title: StreamDiffusionV2 Realtime
emoji: 🌀
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 6.10.0
app_file: app.py
python_version: "3.10"
short_description: Realtime webcam video diffusion (StreamDiffusionV2 / Wan2.1)
startup_duration_timeout: 1h
pinned: false
---
# StreamDiffusionV2 · Realtime Webcam Diffusion
Live ZeroGPU demo of [**StreamDiffusionV2**](https://streamdiffusionv2.github.io/)
(MLSys 2026 Best Paper) on **Wan2.1-T2V-1.3B**, with a custom `gradio.Server`
frontend.
Unlike a fixed-length generator, StreamDiffusionV2 is **designed for continuous
streaming**: a causal Diffusion-Transformer with a **sink-token-guided rolling KV
cache**, a motion-aware noise controller, and StreamVAE. Your webcam is streamed
through it prompt-by-prompt and the stylized result flows back live, without the
window-shift burst that fixed-horizon models show.
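To make the rolling-cache idea concrete, here is a minimal toy sketch in Python. It is not StreamDiffusionV2's implementation: the `RollingCache` class, its parameters, and the integer "entries" are hypothetical stand-ins for real key/value tensors. It only illustrates the eviction pattern, assuming the first few entries (the sink) stay pinned while the remaining entries roll in a fixed-size window.

```python
from collections import deque


class RollingCache:
    """Toy rolling cache (hypothetical): sink entries are pinned, the rest roll."""

    def __init__(self, num_sink: int, window: int):
        self.num_sink = num_sink
        self.sink = []                      # pinned entries, never evicted
        self.recent = deque(maxlen=window)  # oldest entry is dropped automatically

    def append(self, entry):
        # Fill the sink first; afterwards every new entry goes to the rolling window.
        if len(self.sink) < self.num_sink:
            self.sink.append(entry)
        else:
            self.recent.append(entry)

    def contents(self):
        # What an attention step would see: sink entries followed by the recent window.
        return self.sink + list(self.recent)


cache = RollingCache(num_sink=2, window=3)
for frame_id in range(8):
    cache.append(frame_id)
print(cache.contents())  # [0, 1, 5, 6, 7]
```

Because the cache size stays constant no matter how long the stream runs, memory and per-step cost do not grow with stream length in this toy model.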
The browser captures the webcam and posts frames to a lightweight FastAPI route;
a held `@spaces.GPU` session runs StreamDiffusionV2's single-GPU streaming loop
(`start_stream_session` → `run_stream_batch`) and streams frames back over the
Gradio JS client, paced by a client jitter buffer.
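The pacing idea behind a jitter buffer can be sketched as follows. This is an illustration only: the Space's actual buffer runs in the browser, and the `JitterBuffer` class, `target_depth`, and `max_depth` below are hypothetical names chosen for this example. The sketch assumes playback starts once a few frames are queued, pauses to re-buffer on underrun, and drops the oldest frames when a burst would otherwise add latency.

```python
from collections import deque


class JitterBuffer:
    """Toy jitter buffer (hypothetical): smooths bursty arrivals into steady playback."""

    def __init__(self, target_depth: int = 3, max_depth: int = 12):
        self.target_depth = target_depth  # frames to queue before playback starts
        self.max_depth = max_depth        # cap on queued frames, bounds added latency
        self.frames = deque()
        self.playing = False

    def push(self, frame):
        self.frames.append(frame)
        # Drop the oldest frames if a burst overfills the buffer.
        while len(self.frames) > self.max_depth:
            self.frames.popleft()
        if len(self.frames) >= self.target_depth:
            self.playing = True

    def pop(self):
        """Call once per display tick; None means 'keep showing the last frame'."""
        if not self.playing:
            return None
        if not self.frames:
            self.playing = False  # underrun: wait until the buffer refills
            return None
        return self.frames.popleft()


buf = JitterBuffer(target_depth=2, max_depth=4)
shown = []
buf.push("f0")
shown.append(buf.pop())  # still buffering
buf.push("f1")
shown.append(buf.pop())  # playback starts
shown.append(buf.pop())
shown.append(buf.pop())  # underrun, back to buffering
print(shown)  # [None, 'f0', 'f1', None]
```

A larger `target_depth` trades extra latency for smoother playback; a smaller `max_depth` keeps the stream closer to live at the cost of dropped frames.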
One of three rolling/streaming demos:
- StreamDiffusionV2 (this) — video-to-video webcam.
- LongLive — interactive long text-to-video.
- Rolling Forcing — real-time multi-minute text-to-video.