---
arxiv: "2602.14381"
tags:
- video-generation
- vace
- real-time
- autoregressive
- diffusion
- wan
license: apache-2.0
---

# Adapting VACE for Real-Time Autoregressive Video Diffusion

This is the companion model card for the paper [Adapting VACE for Real-Time Autoregressive Video Diffusion](https://arxiv.org/abs/2602.14381).

## Overview

This work presents modifications to [VACE](https://github.com/ali-vilab/VACE) that enable real-time autoregressive generation. The original VACE system uses bidirectional attention across full sequences, which is incompatible with streaming requirements. The key innovation moves reference frames from the diffusion latent space into a parallel conditioning pathway, preserving the fixed chunk sizes and KV caching that autoregressive models require. The adaptation reuses existing pretrained weights without retraining. Testing across 1.3B and 14B model scales shows that structural control adds 20-30% latency overhead with minimal additional memory cost.

## Real-Time Demo

Resolume Arena streamed as live input into Scope via Spout:

## VACE Control Examples

These comparisons show the adapted VACE conditioning across different control modes (corresponding to figures in the paper):

| Control Mode | Video |
|---|---|
| Depth | |
| Scribble | |
| Optical Flow | |
| Image-to-Video | |
| Inpainting | |
| Outpainting | |
| Layout | |

## Reference Implementation

The reference implementation is available in [Daydream Scope](https://github.com/daydreamlive/scope), a tool for running real-time, interactive generative AI video pipelines.

## Author

[ryanontheinside.com](https://ryanontheinside.com)

## Citation

```bibtex
@article{fosdick2026adapting,
  title={Adapting VACE for Real-Time Autoregressive Video Diffusion},
  author={Fosdick, Ryan},
  journal={arXiv preprint arXiv:2602.14381},
  year={2026}
}
```
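
The idea behind the parallel conditioning pathway in the Overview can be sketched as follows. This is a minimal conceptual sketch, not the reference implementation: all function names, shapes, and the toy "denoiser" are hypothetical. The point it illustrates is that reference features are injected additively per chunk rather than prepended to the latent sequence, so the sequence length, and therefore the KV cache layout, never changes between autoregressive steps.

```python
import numpy as np

CHUNK = 4   # fixed latent chunk size (frames per autoregressive step)
DIM = 8     # toy latent channel dimension

rng = np.random.default_rng(0)

def denoise_chunk(latent_chunk, kv_cache, cond_features):
    """Stand-in for one autoregressive denoising step (hypothetical).

    Conditioning features enter through a parallel additive pathway,
    so the latent sequence length (and KV cache layout) never changes.
    """
    # Toy stand-in for attention over cached context: mean of past chunks.
    context = np.mean(kv_cache, axis=0) if kv_cache else np.zeros((CHUNK, DIM))
    return latent_chunk + 0.5 * context + cond_features

def generate(num_chunks, reference_frame):
    """Chunked autoregressive generation with a parallel conditioning pathway.

    Prepending `reference_frame` to the latent sequence would grow the
    sequence and break fixed chunk sizes; instead it is encoded once and
    added to every chunk.
    """
    cond = np.tile(reference_frame, (CHUNK, 1))  # broadcast reference to chunk
    kv_cache, video = [], []
    for _ in range(num_chunks):
        noise = rng.standard_normal((CHUNK, DIM))
        chunk = denoise_chunk(noise, kv_cache, cond)
        kv_cache.append(chunk)  # each cache entry keeps a fixed per-chunk shape
        video.append(chunk)
    return np.concatenate(video)

ref = rng.standard_normal(DIM)
frames = generate(num_chunks=3, reference_frame=ref)
print(frames.shape)  # (12, 8): 3 chunks of 4 frames; chunk shape never grew
```

In a real model the additive injection would happen inside transformer blocks and the cache would hold keys/values, but the invariant is the same: every step processes a fixed-size chunk, which is what makes streaming generation with KV caching possible.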