LongLive-RAG: Checkpoints & Toy Data

This repository hosts the model checkpoints, prompt files, and a toy latent set for
LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation.

What's in here

checkpoints/
├── causal_forcing.pt              # Causal-Forcing AR backbone
├── self_forcing.pt                # Self-Forcing AR backbone
├── longlive_base.pt               # LongLive AR backbone
├── longlive_lora.pt               # LongLive LoRA, paired with longlive_base.pt
├── ae_latent_mem.pt               # Retrieval autoencoder, default for inference
├── moviegenbench_128_refined.txt  # 128 MovieGenBench prompts
└── vidprom_filtered_extended.txt  # Self-Forcing prompt pool for generate_latent.py

toydatasets/
└── latent_0000xx.pt               # Tiny example latent set for the training demo

  • AR backbones (causal_forcing.pt, self_forcing.pt, and longlive_base.pt + longlive_lora.pt) are the frozen base generators that LongLive-RAG plugs into.
  • ae_latent_mem.pt is the trainable retrieval encoder, implemented as a small latent autoencoder. This is the only component LongLive-RAG trains.
  • toydatasets/ contains a tiny set of clean latent blocks for smoke-testing the autoencoder training pipeline end-to-end.

The base WAN VAE (Wan-AI/Wan2.1-T2V-1.3B), in whose latent space LongLive-RAG operates, is not included here. Download it separately, as described below.

Download

All assets download directly into the layout the code repository expects. Run the following commands from the root of your local LongLive-RAG checkout:

# Base WAN VAE: LongLive-RAG operates in its latent space
hf download Wan-AI/Wan2.1-T2V-1.3B --local-dir wan_models/Wan2.1-T2V-1.3B

# All LongLive-RAG assets: restores checkpoints/ and toydatasets/ in place
hf download qixinhu11/LongLive-RAG --local-dir . --include "checkpoints/*" "toydatasets/*"

The --include filter pulls only checkpoints/ and toydatasets/, so it will not overwrite your local README.md. Older setups can replace hf download with huggingface-cli download using the same arguments.
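For setup scripts that need to work with either CLI name, one option is to detect which one is installed before downloading. This is a minimal sketch; the helper name pick_hf_cli is hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Sketch: print the name of whichever Hugging Face CLI is on PATH.
# pick_hf_cli is an illustrative helper, not something the repository ships.

pick_hf_cli() {
  if command -v hf >/dev/null 2>&1; then
    echo "hf"
  elif command -v huggingface-cli >/dev/null 2>&1; then
    echo "huggingface-cli"
  else
    echo "no Hugging Face CLI found on PATH" >&2
    return 1
  fi
}

# Example (same arguments as the commands above):
#   cli="$(pick_hf_cli)" && "$cli" download qixinhu11/LongLive-RAG \
#     --local-dir . --include "checkpoints/*" "toydatasets/*"
```

The helper prefers hf when both names are available and returns a non-zero status when neither is, so a calling script can stop early.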

To pull a single file instead:

hf download qixinhu11/LongLive-RAG checkpoints/ae_latent_mem.pt --local-dir .
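After downloading, it can be useful to confirm that the files landed where the code expects them. The sketch below checks the paths listed in "What's in here" plus the WAN VAE directory used by the download command. The function name check_assets is hypothetical, and the toy latents are checked only at directory level because their exact file names are not spelled out here.

```shell
#!/usr/bin/env bash
# Sketch: report which expected assets are missing under a checkout root.
# check_assets is an illustrative helper, not part of the repository.
# The file list is copied from the tree in "What's in here".

check_assets() {
  local root="${1:-.}"
  local missing=0
  local rel
  local files=(
    checkpoints/causal_forcing.pt
    checkpoints/self_forcing.pt
    checkpoints/longlive_base.pt
    checkpoints/longlive_lora.pt
    checkpoints/ae_latent_mem.pt
    checkpoints/moviegenbench_128_refined.txt
    checkpoints/vidprom_filtered_extended.txt
  )
  # Directories are only tested for existence, not for their contents.
  local dirs=(
    toydatasets
    wan_models/Wan2.1-T2V-1.3B
  )

  for rel in "${files[@]}"; do
    if [ ! -f "$root/$rel" ]; then
      echo "missing: $rel"
      missing=1
    fi
  done
  for rel in "${dirs[@]}"; do
    if [ ! -d "$root/$rel" ]; then
      echo "missing: $rel/"
      missing=1
    fi
  done
  return "$missing"
}

# Example, run from the checkout root:
#   check_assets . && echo "all assets present"
```

The function prints one line per missing path and returns a non-zero status if anything is absent, so it can gate a later inference or training step.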

Usage

Clone the code repository, download the assets above, then run any cell of the shipped 3 × 2 grid (three AR backbones × two context-assembly methods). For example:

# Main result: LongLive backbone + LongLive-RAG retrieval
bash inference.sh longlive latentmem

# Baseline: native sliding-window context
bash inference.sh causal_forcing native
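The two commands above are two cells of the grid. To see all six combinations at a glance, a dry-run loop like the following prints each invocation without running it. Note the assumption: the backbone argument self_forcing is inferred from the checkpoint name self_forcing.pt. Only longlive and causal_forcing appear in the examples above, so check inference.sh for the exact spelling.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the six inference.sh invocations of the 3 x 2 grid.
# ASSUMPTION: "self_forcing" as a backbone argument is inferred from the
# checkpoint file name; it is not confirmed by the examples in this README.

print_grid() {
  local backbone method
  for backbone in longlive causal_forcing self_forcing; do
    for method in latentmem native; do
      echo "bash inference.sh $backbone $method"
    done
  done
}

print_grid
```

Once the argument names are confirmed, replacing echo with a direct call would run the whole grid sequentially.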

See the GitHub README for full installation, inference, and training instructions.

Paper

LongLive-RAG is described in the following paper:

LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation
Qixin Hu, Shuai Yang, Wei Huang, Song Han, Yukang Chen

If you find this repository useful, please cite our paper.

License

Released under the Apache 2.0 license.

The included AR backbones and the WAN VAE latent space derive from their respective upstream projects. Please also respect the original licenses of those projects.

Citation

@article{hu2026longliverag,
  title         = {LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation},
  author        = {Hu, Qixin and Yang, Shuai and Huang, Wei and Han, Song and Chen, Yukang},
  journal       = {arXiv preprint arXiv:2606.02553},
  year          = {2026},
  eprint        = {2606.02553},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}