LongLive-RAG — Checkpoints & Toy Data

This repository hosts the model checkpoints, prompt files, and a toy latent set for
LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation.

📄 Paper: https://arxiv.org/abs/2606.02553
🤗 HF Paper Page: https://huggingface.co/papers/2606.02553
💻 Code: https://github.com/qixinhu11/LongLive-RAG
🔍 TL;DR: LongLive-RAG turns long video generation into a retrieval problem. An autoregressive (AR) video generator looks back over the video it has already generated and retrieves the most relevant past latents as additional context — reducing error accumulation, identity drift, and background flicker over long horizons, without retraining the base generator.
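To make the retrieval idea in the TL;DR concrete, here is a minimal sketch of top-k lookup over a memory of past embeddings. It is an illustration only, not the paper's method: the cosine-similarity score, the 2-D toy vectors, and the names cosine, retrieve_top_k, memory, and query are all assumptions. In LongLive-RAG the embeddings would presumably come from the retrieval encoder rather than being written by hand.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_top_k(query, memory, k=2):
    """Return the indices of the k memory entries most similar to the query, best first."""
    ranked = sorted(range(len(memory)),
                    key=lambda i: cosine(query, memory[i]),
                    reverse=True)
    return ranked[:k]


# Hypothetical embeddings of four previously generated blocks.
memory = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
# Hypothetical embedding of the block currently being generated.
query = [1.0, 0.05]

print(retrieve_top_k(query, memory, k=2))
```

The returned indices point at the past blocks that would be handed to the generator as additional context; the frozen backbone itself is untouched.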

What's in here

checkpoints/
├── causal_forcing.pt              # Causal-Forcing AR backbone
├── self_forcing.pt                # Self-Forcing AR backbone
├── longlive_base.pt               # LongLive AR backbone
├── longlive_lora.pt               # LongLive LoRA, paired with longlive_base.pt
├── ae_latent_mem.pt               # Retrieval autoencoder, default for inference
├── moviegenbench_128_refined.txt  # 128 MovieGenBench prompts
└── vidprom_filtered_extended.txt  # Self-Forcing prompt pool for generate_latent.py

toydatasets/
└── latent_0000xx.pt               # Tiny example latent set for the training demo

AR backbones — causal_forcing.pt, self_forcing.pt, and longlive_base.pt + longlive_lora.pt — are the frozen base generators that LongLive-RAG plugs into.
ae_latent_mem.pt is the trainable retrieval encoder, implemented as a small latent autoencoder. This is the only component LongLive-RAG trains.
toydatasets/ contains a tiny set of clean latent blocks for smoke-testing the autoencoder training pipeline end-to-end.

The base WAN VAE (Wan-AI/Wan2.1-T2V-1.3B), in whose latent space LongLive-RAG operates, is not included here. Download it separately, as described below.
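Once the assets are in place, a small script can confirm that the files listed above are present. This is a sketch, not part of the repository: the helper name missing_assets is hypothetical, and the latent_*.pt glob is an assumption about how the toy latents are named, based on the latent_0000xx.pt placeholder in the listing.

```python
from pathlib import Path

# File names copied from the checkpoints/ listing above.
EXPECTED_CHECKPOINTS = [
    "causal_forcing.pt",
    "self_forcing.pt",
    "longlive_base.pt",
    "longlive_lora.pt",
    "ae_latent_mem.pt",
    "moviegenbench_128_refined.txt",
    "vidprom_filtered_extended.txt",
]


def missing_assets(root="."):
    """Return the relative paths of expected assets that are not found under root."""
    root = Path(root)
    missing = [
        f"checkpoints/{name}"
        for name in EXPECTED_CHECKPOINTS
        if not (root / "checkpoints" / name).is_file()
    ]
    # Assumption: toy latents are files matching latent_*.pt inside toydatasets/.
    toy_dir = root / "toydatasets"
    if not toy_dir.is_dir() or not any(toy_dir.glob("latent_*.pt")):
        missing.append("toydatasets/latent_*.pt")
    return missing


if __name__ == "__main__":
    # Run from the root of the LongLive-RAG checkout.
    for path in missing_assets("."):
        print("missing:", path)
```

An empty result means every listed file was found; the WAN VAE directory is not checked here.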

Download

All files download into the layout that the code repository expects. Run the following commands from the root of your local LongLive-RAG checkout:

# Base WAN VAE — LongLive-RAG operates in its latent space
hf download Wan-AI/Wan2.1-T2V-1.3B --local-dir wan_models/Wan2.1-T2V-1.3B

# All LongLive-RAG assets — restores checkpoints/ and toydatasets/ in place
hf download qixinhu11/LongLive-RAG --local-dir . --include "checkpoints/*" "toydatasets/*"

The --include filter pulls only checkpoints/ and toydatasets/, so it will not overwrite your local README.md. Older setups can replace hf download with huggingface-cli download using the same arguments.

To pull a single file instead:

hf download qixinhu11/LongLive-RAG checkpoints/ae_latent_mem.pt --local-dir .
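If you prefer to script the download from Python, the same transfers can be expressed with the huggingface_hub library. The sketch below mirrors the two bulk commands above using snapshot_download; the helper names download_plan and download_all are hypothetical, and the library is imported lazily so the plan can be inspected without it installed.

```python
from pathlib import Path

REPO_ID = "qixinhu11/LongLive-RAG"
VAE_REPO_ID = "Wan-AI/Wan2.1-T2V-1.3B"


def download_plan(root="."):
    """Build keyword arguments for snapshot_download, one dict per repository."""
    root = Path(root)
    return [
        # Base WAN VAE, placed where the CLI command above puts it.
        {"repo_id": VAE_REPO_ID,
         "local_dir": str(root / "wan_models" / "Wan2.1-T2V-1.3B")},
        # LongLive-RAG assets; allow_patterns plays the role of --include.
        {"repo_id": REPO_ID,
         "local_dir": str(root),
         "allow_patterns": ["checkpoints/*", "toydatasets/*"]},
    ]


def download_all(root="."):
    """Download everything in the plan. Requires network access and huggingface_hub."""
    from huggingface_hub import snapshot_download

    for kwargs in download_plan(root):
        snapshot_download(**kwargs)
```

Call download_all() from the root of your checkout to perform the download; nothing is fetched until that call is made.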

Usage

Clone the code repository and download the assets above. You can then run any cell of the shipped 3 × 2 grid, which covers three AR backbones and two context-assembly methods. Two examples:

# Main result: LongLive backbone + LongLive-RAG retrieval
bash inference.sh longlive latentmem

# Baseline: native sliding-window context
bash inference.sh causal_forcing native

See the GitHub README for full installation, inference, and training instructions.
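To enumerate the whole 3 × 2 grid rather than one cell, the commands can be generated programmatically. This is a sketch under assumptions: longlive, causal_forcing, latentmem, and native appear in the examples above, but self_forcing as a backbone argument is a guess based on the checkpoint name, so confirm it against the GitHub README before relying on it.

```python
from itertools import product

# "self_forcing" is an assumed argument name; the other two appear in the examples above.
BACKBONES = ["longlive", "causal_forcing", "self_forcing"]
# Both method names appear in the examples above.
METHODS = ["latentmem", "native"]


def grid_commands():
    """Return one argument list per (backbone, method) pair, in the order inference.sh expects."""
    return [["bash", "inference.sh", backbone, method]
            for backbone, method in product(BACKBONES, METHODS)]


for cmd in grid_commands():
    print(" ".join(cmd))
```

To actually launch the runs, each argument list can be passed to subprocess.run(cmd, check=True) from the root of the code repository.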

Paper

LongLive-RAG is described in the following paper:

LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation
Qixin Hu, Shuai Yang, Wei Huang, Song Han, Yukang Chen

arXiv: https://arxiv.org/abs/2606.02553
Hugging Face Papers: https://huggingface.co/papers/2606.02553

If you find this repository useful, please cite our paper.

License

Released under the Apache 2.0 license.

The included AR backbones and the WAN VAE latent space derive from their respective upstream projects. Please also respect the original licenses of those projects.

Citation

@article{hu2026longliverag,
  title         = {LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation},
  author        = {Hu, Qixin and Yang, Shuai and Huang, Wei and Han, Song and Chen, Yukang},
  journal       = {arXiv preprint arXiv:2606.02553},
  year          = {2026},
  eprint        = {2606.02553},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}

Hugging Face paper page: LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation (2606.02553), published Jun 1.