LongLive2.0 5B NVFP4 Denoising Step 4

This repository hosts the 4-step denoising LongLive2.0 5B NVFP4 checkpoint, intended for inference with the LongLive2.0 release code:

https://github.com/NVlabs/LongLive

LongLive2.0 inference loads the Wan2.2-TI2V-5B generator, applies the few-step DMD adapter when a separate LoRA checkpoint is provided, and runs the generator with NVFP4 weight quantization plus optional FP4 KV-cache quantization.

Installation

The NVFP4 path uses a stricter environment than the default BF16 release path. We recommend keeping it in a separate conda environment.

git clone https://github.com/wileewang/LongLive2.0.git
cd LongLive2.0

conda create -n longlive2_nvfp4 python=3.12 -y
conda activate longlive2_nvfp4

pip install -r requirements.txt
pip install --upgrade --index-url https://download.pytorch.org/whl/cu128 \
  torch==2.10.0 torchvision==0.25.0

Build the NVFP4 / FP4 extensions:

cd fouroversix
pip install ninja packaging psutil "setuptools>=77.0.3"

# B200 / GB200 / GB300
export CUDA_ARCHS=100

# RTX 50/60 series, if needed
# export CUDA_ARCHS=120

pip install --no-build-isolation -e .
cd ..

git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
git checkout v2.8.3
pip install -U pip setuptools wheel ninja packaging
pip install --no-build-isolation -e .
cd ..

cd utils/kernel
python setup.py build_ext --inplace
cd ../..
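
If it is unclear which CUDA_ARCHS value applies to the local GPU, a helper along the following lines can suggest one before the fouroversix build. This is only a sketch: the lookup table covers just the two values listed above, keyed by the major compute capability the device reports (10 for the B200 family, 12 for RTX 50-series cards), and the names suggest_cuda_archs and detect_major are hypothetical, not part of the release code.

```python
from typing import Optional

# Only the two CUDA_ARCHS values named in the build steps above, keyed by the
# major compute capability. Any other GPU is deliberately left unmapped.
_CUDA_ARCHS_BY_MAJOR = {10: "100", 12: "120"}


def suggest_cuda_archs(major: Optional[int]) -> Optional[str]:
    """Return a CUDA_ARCHS value for a compute-capability major, or None."""
    if major is None:
        return None
    return _CUDA_ARCHS_BY_MAJOR.get(major)


def detect_major(device_index: int = 0) -> Optional[int]:
    """Ask PyTorch for the device's major compute capability, if possible."""
    try:
        import torch  # lazy import so the lookup works without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    major, _minor = torch.cuda.get_device_capability(device_index)
    return major
```

Calling `suggest_cuda_archs(detect_major())` returns None when no mapping is known, in which case the value should be set by hand.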

Quick environment check:

python -c "import torch, torchvision; print(torch.__version__, torch.version.cuda); print(torchvision.__version__)"
python -c "import flash_attn; print(flash_attn.__version__)"
python -c "import fouroversix; from utils.quant import LongLiveQuantizationConfig, quantize_to_fp4"
python -c "from utils.kernel.kv_dequant import dequantize_kv_cache_fp4"
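
The same checks can be collected into one report, which may be easier to read when several extensions fail at once. The sketch below uses only the module names from the commands above; check_imports is a hypothetical helper, and it should be run from the repository root so that the utils package resolves.

```python
import importlib

# Module names taken from the one-line checks above.
REQUIRED_MODULES = [
    "torch",
    "torchvision",
    "flash_attn",
    "fouroversix",
    "utils.quant",
    "utils.kernel.kv_dequant",
]


def check_imports(module_names):
    """Import each module and map its name to a version, 'ok', or an error."""
    report = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # broken builds can raise more than ImportError
            report[name] = f"FAILED: {type(exc).__name__}: {exc}"
        else:
            # Not every module exposes __version__; fall back to a plain "ok".
            report[name] = str(getattr(module, "__version__", "ok"))
    return report


def print_report(module_names=REQUIRED_MODULES):
    """Print one line per module so failures are easy to spot."""
    for name, status in check_imports(module_names).items():
        print(f"{name:28s} {status}")
```

Running `print_report()` lists every module with its version or the reason the import failed.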

The released LongLive2.0 checkpoint is sufficient for standard inference. You only need to download the original Wan2.2-TI2V-5B components if you want to run training, initialize from the original Wan weights, or use code paths that explicitly load the base Wan model files:

huggingface-cli download Wan-AI/Wan2.2-TI2V-5B \
  --local-dir wan_models/Wan2.2-TI2V-5B

Download this checkpoint repository:

huggingface-cli download Perflow-Shuai/LongLive-2.0-5B-NVFP4-4Step \
  --local-dir checkpoints/longlive2_5b_nvfp4_4step
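
For scripted setups, the same downloads can be driven from Python through huggingface_hub.snapshot_download. The repository IDs and target folders below are copied from the commands above; the helper names download_targets and fetch are hypothetical, and the sketch assumes huggingface_hub is installed.

```python
# (repo_id, local_dir) pairs copied from the huggingface-cli commands above.
CHECKPOINT = (
    "Perflow-Shuai/LongLive-2.0-5B-NVFP4-4Step",
    "checkpoints/longlive2_5b_nvfp4_4step",
)
BASE_WAN = ("Wan-AI/Wan2.2-TI2V-5B", "wan_models/Wan2.2-TI2V-5B")


def download_targets(include_base_wan=False):
    """List what to download; the base Wan files are optional for inference."""
    targets = [CHECKPOINT]
    if include_base_wan:
        targets.append(BASE_WAN)
    return targets


def fetch(include_base_wan=False):
    """Download each target and return the local paths."""
    from huggingface_hub import snapshot_download  # lazy: needs huggingface_hub

    return [
        snapshot_download(repo_id=repo_id, local_dir=local_dir)
        for repo_id, local_dir in download_targets(include_base_wan)
    ]
```

`fetch()` downloads only this checkpoint; `fetch(include_base_wan=True)` also pulls the original Wan2.2-TI2V-5B components.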

Configure Inference

Edit configs/nvfp4/inference_nvfp4.yaml.

For the released 4-step NVFP4 checkpoint, keep inference.sampling_steps: 4:

checkpoints:
  generator_ckpt: checkpoints/longlive2_5b_nvfp4_4step/path/to/generator.pt
  lora_ckpt: null

merge_lora: false

data:
  data_path: /path/to/inference_prompts
  image_or_video_shape:
  - 1
  - 384
  - 48
  - 44
  - 80

output_folder: videos/longlive2_nvfp4_4step
num_samples: 1
num_output_frames: 384

inference:
  sampling_steps: 4
  sink_size: 8
  guidance_scale: 1.0
  multi_shot_sink: true
  multi_shot_rope_offset: 8
  kv_quant: true
  kv_quant_scale_rule: mse
  kv_quant_backend: cuda
  streaming_vae: false
  async_vae: false
  vae_type: wan

model_quant: true
model_quant_use_transformer_engine: false
model_quant_scale_rule: mse
model_quant_activation_scale_rule: mse
model_quant_weight_scale_rule: mse
model_quant_gradient_scale_rule: mse

Replace the placeholder generator_ckpt path above with the actual checkpoint file in this repository. If this repository provides a separate DMD LoRA checkpoint rather than a merged generator, set checkpoints.lora_ckpt to that LoRA file, set merge_lora: true, and add the LoRA adapter config:

adapter:
  type: lora
  rank: 128
  alpha: 128
  dropout: 0.0
  dtype: bfloat16
  apply_to_critic: true
  verbose: true

If checkpoints.lora_ckpt is null, remove the adapter section.

Do not set model_quant_use_transformer_engine: true when loading a FourOverSix materialized NVFP4 checkpoint. FourOverSix checkpoints store quantized_weight_* buffers and should be loaded through the FourOverSix path.
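
The configuration rules in this section can be checked mechanically before launching a long run. The following is a minimal sketch, assuming the YAML file has already been parsed into a plain dict (for example with PyYAML's yaml.safe_load); validate_nvfp4_config is a hypothetical helper, not part of the release code, and it only encodes the rules stated in this model card.

```python
def validate_nvfp4_config(cfg):
    """Return a list of problems; an empty list means no rule was violated."""
    problems = []
    inference = cfg.get("inference") or {}
    checkpoints = cfg.get("checkpoints") or {}

    # This checkpoint is the 4-step variant.
    if inference.get("sampling_steps") != 4:
        problems.append("inference.sampling_steps should be 4")

    lora_ckpt = checkpoints.get("lora_ckpt")
    if lora_ckpt is None:
        # Merged generator: no adapter section is expected.
        if "adapter" in cfg:
            problems.append("remove the adapter section when lora_ckpt is null")
    else:
        # Separate DMD LoRA: merge it and describe the adapter.
        if cfg.get("merge_lora") is not True:
            problems.append("set merge_lora: true when lora_ckpt is set")
        if "adapter" not in cfg:
            problems.append("add the adapter section when lora_ckpt is set")

    # FourOverSix materialized checkpoints must not use the Transformer Engine path.
    if cfg.get("model_quant_use_transformer_engine"):
        problems.append("keep model_quant_use_transformer_engine: false")

    return problems
```

A typical use is to print the returned list and abort when it is not empty.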

Prompt Folder

data.data_path can be either:

a .txt file, where each line is one single-shot prompt; or
a directory of multi-shot prompt folders.

Example multi-shot prompt folder:

inference_prompts/
  robot_lab_demo/
    0.json
    1.json
    2.json
    shot_durations.txt

Each JSON file contains:

{
  "caption": "A compact silver robot with one blue optic explores a clean robotics lab."
}

shot_durations.txt is optional. If provided, it lists one number per caption, giving the number of temporal chunks assigned to that caption. For example:

2 2 4
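
A multi-shot prompt folder with this layout can be generated from a list of captions. The sketch below assumes that shot_durations.txt holds space-separated integers on a single line, as in the example above; write_multishot_prompt is a hypothetical helper and is not part of the release code.

```python
import json
from pathlib import Path


def write_multishot_prompt(root, name, captions, shot_durations=None):
    """Write 0.json, 1.json, ... and optionally shot_durations.txt."""
    if shot_durations is not None and len(shot_durations) != len(captions):
        raise ValueError("shot_durations needs exactly one entry per caption")

    folder = Path(root) / name
    folder.mkdir(parents=True, exist_ok=True)

    # One JSON file per shot, numbered in caption order.
    for index, caption in enumerate(captions):
        payload = json.dumps({"caption": caption}, indent=2)
        (folder / f"{index}.json").write_text(payload + "\n", encoding="utf-8")

    # Assumed format: chunk counts on one line, separated by spaces.
    if shot_durations is not None:
        line = " ".join(str(int(d)) for d in shot_durations)
        (folder / "shot_durations.txt").write_text(line + "\n", encoding="utf-8")

    return folder
```

For instance, `write_multishot_prompt("inference_prompts", "robot_lab_demo", captions, [2, 2, 4])` reproduces the folder shown above for three captions.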

Run

Single node, 4 GPUs:

torchrun --standalone --nnodes=1 --nproc_per_node=4 inference.py \
  --config_path configs/nvfp4/inference_nvfp4.yaml

Single GPU:

python inference.py --config_path configs/nvfp4/inference_nvfp4.yaml

Or use the helper script, which reads NUM_GPUS / num_gpus when provided:

scripts/inference_nvfp4.sh configs/nvfp4/inference_nvfp4.yaml

Outputs are written to output_folder.
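
When launching from another Python program, the choice between the two commands above can be made from the GPU count. This is a sketch that only mirrors the single-node commands shown in this section; build_launch_command is a hypothetical helper and does not reproduce the logic of scripts/inference_nvfp4.sh.

```python
import shlex

DEFAULT_CONFIG = "configs/nvfp4/inference_nvfp4.yaml"


def build_launch_command(num_gpus, config_path=DEFAULT_CONFIG):
    """Return the argv list for a single-GPU or single-node multi-GPU run."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if num_gpus == 1:
        return ["python", "inference.py", "--config_path", config_path]
    return [
        "torchrun",
        "--standalone",
        "--nnodes=1",
        f"--nproc_per_node={num_gpus}",
        "inference.py",
        "--config_path",
        config_path,
    ]


def as_shell(command):
    """Render the argv list as a copy-pasteable shell line."""
    return shlex.join(command)
```

The returned list can be passed to `subprocess.run(command, check=True)` from the repository root.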

Notes

This model card is for the 4-step NVFP4 checkpoint. Use inference.sampling_steps: 4.
model_quant enables NVFP4 generator inference.
inference.kv_quant enables FP4 KV-cache storage and requires the utils/kernel extension.
inference.multi_shot_sink enables the multi-shot attention sink.
inference.multi_shot_rope_offset controls the multi-shot RoPE offset.
inference.streaming_vae, inference.async_vae, inference.vae_type, and inference.vae_device control streaming or asynchronous VAE decode.
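
For intuition about what the NVFP4 options above control, the following pure-Python sketch quantizes one block of values to the 4-bit E2M1 grid with a shared scale. It is an illustration under stated assumptions, not the fouroversix kernels: real NVFP4 uses blocks of 16 values with an FP8 (E4M3) scale per block plus a per-tensor FP32 scale, while this sketch keeps the scale as a Python float. The quantize_block_mse function shows one plausible reading of an "mse" scale rule, trying a block maximum of 6 and of 4 and keeping the lower-error choice; whether the release code selects scales exactly this way is an assumption.

```python
# Magnitudes representable by a 4-bit E2M1 float (sign, 2 exponent, 1 mantissa bit).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def round_to_e2m1(x):
    """Round to the nearest grid value, clamping at +/-6.

    Ties go to the smaller magnitude here; hardware rounding may differ.
    """
    magnitude = min(E2M1_MAGNITUDES, key=lambda g: abs(abs(x) - g))
    return -magnitude if x < 0 else magnitude


def quantize_block(block, target_max=6.0):
    """Scale the block so its largest magnitude maps to target_max, then round."""
    amax = max(abs(v) for v in block)
    scale = amax / target_max if amax > 0 else 1.0
    codes = [round_to_e2m1(v / scale) for v in block]
    return codes, scale


def dequantize_block(codes, scale):
    """Recover approximate values from the grid codes and the shared scale."""
    return [c * scale for c in codes]


def quantize_block_mse(block, candidates=(6.0, 4.0)):
    """Assumed 'mse' rule: keep the candidate maximum with the lowest error."""
    best = None
    for target_max in candidates:
        codes, scale = quantize_block(block, target_max)
        restored = dequantize_block(codes, scale)
        mse = sum((a - b) ** 2 for a, b in zip(block, restored)) / len(block)
        if best is None or mse < best[0]:
            best = (mse, codes, scale, target_max)
    return best  # (mse, codes, scale, chosen target_max)
```

Scaling to 4 instead of 6 gives up the top grid value but can place the remaining values closer to grid points, which is why comparing both can reduce error for some blocks.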

License/Terms of Use

GOVERNING TERMS: This trial service is governed by the NVIDIA API Trial Terms of Service. Use of this model is governed by the NVIDIA Open Model License Agreement.

Citation

@article{longlive_2,
  title={LongLive2.0: An NVFP4 Parallel Infrastructure for Long Video Generation},
  author={Chen, Yukang and Wang, Luozhou and Huang, Wei and Yang, Shuai and Zhang, Bohan and Xiao, Yicheng and Chu, Ruihang and Mao, Weian and Hu, Qixin and Liu, Shaoteng and Zhao, Yuyang and Mao, Huizi and Chen, Ying-Cong and Xie, Enze and Qi, Xiaojuan and Han, Song},
  journal={arXiv preprint arXiv:2605.18739},
  year={2026}
}


Paper for Efficient-Large-Model/LongLive-2.0-5B-NVFP4-S4

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation (arXiv:2605.18739, published May 18)