LongLive-RAG: Checkpoints & Toy Data
This repository hosts the model checkpoints, prompt files, and a toy latent set for
LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation.
- Paper: https://arxiv.org/abs/2606.02553
- HF Paper Page: https://huggingface.co/papers/2606.02553
- Code: https://github.com/qixinhu11/LongLive-RAG
- TL;DR: LongLive-RAG turns long video generation into a retrieval problem. An autoregressive (AR) video generator looks back over the video it has already generated and retrieves the most relevant past latents as additional context, reducing error accumulation, identity drift, and background flicker over long horizons without retraining the base generator.
What's in here
checkpoints/
├── causal_forcing.pt                 # Causal-Forcing AR backbone
├── self_forcing.pt                   # Self-Forcing AR backbone
├── longlive_base.pt                  # LongLive AR backbone
├── longlive_lora.pt                  # LongLive LoRA, paired with longlive_base.pt
├── ae_latent_mem.pt                  # Retrieval autoencoder, default for inference
├── moviegenbench_128_refined.txt     # 128 MovieGenBench prompts
└── vidprom_filtered_extended.txt     # Self-Forcing prompt pool for generate_latent.py
toydatasets/
└── latent_0000xx.pt                  # Tiny example latent set for the training demo
- AR backbones (causal_forcing.pt, self_forcing.pt, and longlive_base.pt + longlive_lora.pt) are the frozen base generators that LongLive-RAG plugs into.
- ae_latent_mem.pt is the trainable retrieval encoder, implemented as a small latent autoencoder. This is the only component LongLive-RAG trains.
- toydatasets/ contains a tiny set of clean latent blocks for smoke-testing the autoencoder training pipeline end-to-end.
The base WAN VAE (Wan-AI/Wan2.1-T2V-1.3B), in whose latent space LongLive-RAG operates, is not included here. Please download it separately, as described below.
Download
Everything restores into the expected layout of the code repository. Run the following commands from the root of your local LongLive-RAG checkout:
# Base WAN VAE: LongLive-RAG operates in its latent space
hf download Wan-AI/Wan2.1-T2V-1.3B --local-dir wan_models/Wan2.1-T2V-1.3B
# All LongLive-RAG assets: restores checkpoints/ and toydatasets/ in place
hf download qixinhu11/LongLive-RAG --local-dir . --include "checkpoints/*" "toydatasets/*"
The --include filter pulls only checkpoints/ and toydatasets/, so it will not overwrite
your local README.md. Older setups can replace hf download with
huggingface-cli download using the same arguments.
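The fallback between the two CLIs can be wrapped in a small helper. This is a minimal sketch, not part of the repository: hf_download is a hypothetical function name, and it simply forwards its arguments to whichever of hf or huggingface-cli is found on PATH.

```shell
# Hypothetical helper: forward "download" arguments to whichever CLI is installed.
hf_download() {
    if command -v hf > /dev/null 2>&1; then
        # Newer setups expose the "hf" entry point.
        hf download "$@"
    elif command -v huggingface-cli > /dev/null 2>&1; then
        # Older setups only ship "huggingface-cli"; the arguments are the same.
        huggingface-cli download "$@"
    else
        echo "neither 'hf' nor 'huggingface-cli' is on PATH" >&2
        return 127
    fi
}
```

With this defined, hf_download qixinhu11/LongLive-RAG --local-dir . --include "checkpoints/*" "toydatasets/*" behaves like the command above on either kind of setup.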
To pull a single file instead:
hf download qixinhu11/LongLive-RAG checkpoints/ae_latent_mem.pt --local-dir .
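Once the downloads finish, a quick presence check can confirm that the layout matches the listing under "What's in here". The sketch below is illustrative and not part of the repository: check_assets is a hypothetical helper, the file names are copied from that listing, and the toy set is matched with the pattern latent_*.pt because the listing only shows a placeholder name. It checks that files exist, not that they are intact.

```shell
# Hypothetical helper: report any expected asset that is missing under a given root.
# The body runs in a subshell, so its variables do not leak into the caller.
check_assets() (
    root="${1:-.}"
    missing=0

    # Checkpoints and prompt files, as listed in "What's in here".
    for rel in \
        checkpoints/causal_forcing.pt \
        checkpoints/self_forcing.pt \
        checkpoints/longlive_base.pt \
        checkpoints/longlive_lora.pt \
        checkpoints/ae_latent_mem.pt \
        checkpoints/moviegenbench_128_refined.txt \
        checkpoints/vidprom_filtered_extended.txt
    do
        if [ ! -f "$root/$rel" ]; then
            echo "missing: $rel"
            missing=$((missing + 1))
        fi
    done

    # Toy latents: require at least one file matching the (assumed) pattern.
    found_latent=0
    for f in "$root"/toydatasets/latent_*.pt; do
        if [ -f "$f" ]; then
            found_latent=1
            break
        fi
    done
    if [ "$found_latent" -eq 0 ]; then
        echo "missing: toydatasets/latent_*.pt"
        missing=$((missing + 1))
    fi

    # WAN weights are downloaded separately into wan_models/.
    if [ ! -d "$root/wan_models/Wan2.1-T2V-1.3B" ]; then
        echo "missing: wan_models/Wan2.1-T2V-1.3B/"
        missing=$((missing + 1))
    fi

    if [ "$missing" -eq 0 ]; then
        echo "all assets present"
    fi
    # Exit status: 0 when nothing is missing, non-zero otherwise.
    [ "$missing" -eq 0 ]
)
```

Run check_assets . from the root of the checkout; each missing path is printed on its own line and the exit status is non-zero if anything is absent.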
Usage
Clone the code repository, download the assets above, then run the shipped 3 × 2 grid, covering three AR backbones and two context-assembly methods:
# Main result: LongLive backbone + LongLive-RAG retrieval
bash inference.sh longlive latentmem
# Baseline: native sliding-window context
bash inference.sh causal_forcing native
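The remaining cells of the grid can be enumerated with a loop. The following is a minimal sketch under one assumption: the third backbone is selected with the argument self_forcing, by analogy with the two backbone names shown above (check inference.sh for the accepted values). run_grid is a hypothetical helper; it prints the six commands by default and only launches them when called with run.

```shell
# Hypothetical helper: enumerate the 3 x 2 grid of backbones and context-assembly methods.
# "self_forcing" is an assumed argument name; the other values appear in the examples above.
run_grid() (
    mode="${1:-dry-run}"
    for backbone in longlive causal_forcing self_forcing; do
        for method in latentmem native; do
            if [ "$mode" = "run" ]; then
                # Stop at the first failing run instead of continuing silently.
                bash inference.sh "$backbone" "$method" || exit 1
            else
                # Dry run: print the command without executing it.
                echo "bash inference.sh $backbone $method"
            fi
        done
    done
)
```

Calling run_grid lists the commands for review, and run_grid run executes them in sequence from the root of the checkout.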
See the GitHub README for full installation, inference, and training instructions.
Paper
LongLive-RAG is described in the following paper:
LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation
Qixin Hu, Shuai Yang, Wei Huang, Song Han, Yukang Chen
- arXiv: https://arxiv.org/abs/2606.02553
- Hugging Face Papers: https://huggingface.co/papers/2606.02553
If you find this repository useful, please cite our paper.
License
Released under the Apache 2.0 license.
The included AR backbones and the WAN VAE latent space derive from their respective upstream projects. Please also respect the original licenses of these upstream projects.
Citation
@article{hu2026longliverag,
  title = {LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation},
  author = {Hu, Qixin and Yang, Shuai and Huang, Wei and Han, Song and Chen, Yukang},
  journal = {arXiv preprint arXiv:2606.02553},
  year = {2026},
  eprint = {2606.02553},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV}
}