LongLive2.0 5B NVFP4 Denoising Step 4
This repository hosts the LongLive2.0 5B NVFP4 denoising step 4 checkpoint for inference with the LongLive2.0 release code:
https://github.com/NVlabs/LongLive
LongLive2.0 inference loads the Wan2.2-TI2V-5B generator, applies the few-step DMD adapter when a separate LoRA checkpoint is provided, and runs the generator with NVFP4 weight quantization plus optional FP4 KV-cache quantization.
Installation
The NVFP4 path uses a stricter environment than the default BF16 release path. We recommend keeping it in a separate conda environment.
git clone https://github.com/wileewang/LongLive2.0.git
cd LongLive2.0
conda create -n longlive2_nvfp4 python=3.12 -y
conda activate longlive2_nvfp4
pip install -r requirements.txt
pip install --upgrade --index-url https://download.pytorch.org/whl/cu128 \
torch==2.10.0 torchvision==0.25.0
Build the NVFP4 / FP4 extensions:
cd fouroversix
pip install ninja packaging psutil "setuptools>=77.0.3"
# B200 / GB200 / GB300
export CUDA_ARCHS=100
# RTX 50/60 series, if needed
# export CUDA_ARCHS=120
pip install --no-build-isolation -e .
cd ..
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention
git checkout v2.8.3
pip install -U pip setuptools wheel ninja packaging
pip install --no-build-isolation -e .
cd ..
cd utils/kernel
python setup.py build_ext --inplace
cd ../..
Quick environment check:
python -c "import torch, torchvision; print(torch.__version__, torch.version.cuda); print(torchvision.__version__)"
python -c "import flash_attn; print(flash_attn.__version__)"
python -c "import fouroversix; from utils.quant import LongLiveQuantizationConfig, quantize_to_fp4"
python -c "from utils.kernel.kv_dequant import dequantize_kv_cache_fp4"
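The one-line checks above can also be gathered into a single script that reports every failing import in one pass. The following is a minimal sketch and is not part of the release code: `check_imports` and `format_report` are hypothetical helpers, and the module list simply mirrors the imports used in the commands above.

```python
import importlib

# Module names taken from the one-line checks above.
REQUIRED_MODULES = [
    "torch",
    "torchvision",
    "flash_attn",
    "fouroversix",
    "utils.quant",
    "utils.kernel.kv_dequant",
]


def check_imports(module_names):
    """Try to import each module; map its name to None on success or to the error text."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # compiled extensions can fail with more than ImportError
            results[name] = f"{type(exc).__name__}: {exc}"
    return results


def format_report(results):
    """Turn the result mapping into one printable line per module."""
    return [
        f"{name}: ok" if error is None else f"{name}: FAILED ({error})"
        for name, error in results.items()
    ]
```

Run from the repository root (so that `utils` resolves), `print("\n".join(format_report(check_imports(REQUIRED_MODULES))))` prints one status line per module.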
The released LongLive2.0 checkpoint is sufficient for standard inference. You only need to download the original Wan2.2-TI2V-5B components if you want to run training, initialize from the original Wan weights, or use code paths that explicitly load the base Wan model files:
huggingface-cli download Wan-AI/Wan2.2-TI2V-5B \
--local-dir wan_models/Wan2.2-TI2V-5B
Download this checkpoint repository:
huggingface-cli download Efficient-Large-Model/LongLive-2.0-5B-NVFP4-S4 \
--local-dir checkpoints/longlive2_5b_nvfp4_4step
Configure Inference
Edit configs/nvfp4/inference_nvfp4.yaml.
For the released 4-step NVFP4 checkpoint, keep inference.sampling_steps: 4:
checkpoints:
generator_ckpt: checkpoints/longlive2_5b_nvfp4_4step/path/to/generator.pt
lora_ckpt: null
merge_lora: false
data:
data_path: /path/to/inference_prompts
image_or_video_shape:
- 1
- 384
- 48
- 44
- 80
output_folder: videos/longlive2_nvfp4_4step
num_samples: 1
num_output_frames: 384
inference:
sampling_steps: 4
sink_size: 8
guidance_scale: 1.0
multi_shot_sink: true
multi_shot_rope_offset: 8
kv_quant: true
kv_quant_scale_rule: mse
kv_quant_backend: cuda
streaming_vae: false
async_vae: false
vae_type: wan
model_quant: true
model_quant_use_transformer_engine: false
model_quant_scale_rule: mse
model_quant_activation_scale_rule: mse
model_quant_weight_scale_rule: mse
model_quant_gradient_scale_rule: mse
Replace the checkpoint filename above with the actual file in this repository.
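Since the exact filename depends on what the repository ships, one way to find it is to list the weight files under the download directory. This is a minimal sketch, assuming the checkpoint was downloaded to the --local-dir used above and that weight files carry one of the common extensions listed in the code (an assumption; this card does not name the extension). `list_checkpoint_files` is a hypothetical helper.

```python
from pathlib import Path

# Assumed extensions for weight files; adjust if the repository uses another one.
WEIGHT_SUFFIXES = {".pt", ".pth", ".safetensors", ".bin"}


def list_checkpoint_files(root):
    """Return all weight-like files under `root`, sorted by path."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(
        path for path in root.rglob("*")
        if path.is_file() and path.suffix in WEIGHT_SUFFIXES
    )
```

For example, `list_checkpoint_files("checkpoints/longlive2_5b_nvfp4_4step")` returns the candidate paths to copy into generator_ckpt.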
If this repository contains a separate DMD LoRA checkpoint instead of a merged
generator, set checkpoints.lora_ckpt to that LoRA file and set
merge_lora: true, then add the LoRA adapter config:
adapter:
type: lora
rank: 128
alpha: 128
dropout: 0.0
dtype: bfloat16
apply_to_critic: true
verbose: true
If checkpoints.lora_ckpt is null, remove the adapter section.
Do not set model_quant_use_transformer_engine: true when loading a FourOverSix
materialized NVFP4 checkpoint. FourOverSix checkpoints store
quantized_weight_* buffers and should be loaded through the FourOverSix path.
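The rules in this section can be checked before launching a run. The sketch below is a hypothetical pre-flight check and is not part of the release code. It takes the config as an already-parsed dict (for example, from a YAML loader) and looks keys up by name at any nesting depth, so it makes no assumption about where a key such as merge_lora sits in the file.

```python
_MISSING = object()  # sentinel: distinguishes "key absent" from "key set to null"


def find_key(node, key):
    """Depth-first search for `key` in nested dicts; return _MISSING if absent."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        for value in node.values():
            found = find_key(value, key)
            if found is not _MISSING:
                return found
    return _MISSING


def check_nvfp4_config(cfg):
    """Return a list of problems; an empty list means none of the rules was violated."""
    problems = []

    if find_key(cfg, "sampling_steps") != 4:
        problems.append("sampling_steps should be 4 for the 4-step checkpoint")

    lora_ckpt = find_key(cfg, "lora_ckpt")
    has_lora = lora_ckpt is not _MISSING and lora_ckpt is not None
    has_adapter = find_key(cfg, "adapter") is not _MISSING
    if has_lora and not has_adapter:
        problems.append("lora_ckpt is set but the adapter section is missing")
    if not has_lora and has_adapter:
        problems.append("lora_ckpt is null, so the adapter section should be removed")
    if has_lora and find_key(cfg, "merge_lora") is not True:
        problems.append("merge_lora should be true when lora_ckpt is set")

    if find_key(cfg, "model_quant_use_transformer_engine") is True:
        problems.append(
            "model_quant_use_transformer_engine should be false when loading "
            "a FourOverSix materialized NVFP4 checkpoint"
        )

    return problems
```

Printing the returned list before calling inference.py makes a misconfigured adapter section visible without waiting for the model to load.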
Prompt Folder
data.data_path can be either:
- a .txt file, where each line is one single-shot prompt; or
- a directory of multi-shot prompt folders.
Example multi-shot prompt folder:
inference_prompts/
  robot_lab_demo/
    0.json
    1.json
    2.json
    shot_durations.txt
Each JSON file contains:
{
"caption": "A compact silver robot with one blue optic explores a clean robotics lab."
}
shot_durations.txt is optional. If provided, each number is the number of
temporal chunks assigned to the corresponding caption, for example:
2 2 4
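A multi-shot prompt folder in the layout above can be generated from a list of captions. This is a minimal sketch: `write_multi_shot_prompt` is a hypothetical helper, and writing the durations space-separated on a single line is an assumption that mirrors the "2 2 4" example.

```python
import json
from pathlib import Path


def write_multi_shot_prompt(folder, captions, shot_durations=None):
    """Write one <index>.json per caption and, optionally, shot_durations.txt."""
    if shot_durations is not None and len(shot_durations) != len(captions):
        raise ValueError("need exactly one duration per caption")

    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)

    # Shots are numbered from 0 in caption order: 0.json, 1.json, ...
    for index, caption in enumerate(captions):
        with open(folder / f"{index}.json", "w", encoding="utf-8") as handle:
            json.dump({"caption": caption}, handle, ensure_ascii=False, indent=2)

    if shot_durations is not None:
        # One integer per caption: the number of temporal chunks for that shot.
        text = " ".join(str(int(chunks)) for chunks in shot_durations)
        (folder / "shot_durations.txt").write_text(text + "\n", encoding="utf-8")

    return folder
```

Calling it with three captions and `shot_durations=[2, 2, 4]` on `inference_prompts/robot_lab_demo` reproduces the example folder; data.data_path then points at `inference_prompts`.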
Run
Single node, 4 GPUs:
torchrun --standalone --nnodes=1 --nproc_per_node=4 inference.py \
--config_path configs/nvfp4/inference_nvfp4.yaml
Single GPU:
python inference.py --config_path configs/nvfp4/inference_nvfp4.yaml
Or use the helper script, which reads NUM_GPUS / num_gpus when provided:
scripts/inference_nvfp4.sh configs/nvfp4/inference_nvfp4.yaml
Outputs are written to output_folder.
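When launching from Python rather than from a shell, the choice between the two commands above can be made from the GPU count. This is a minimal sketch: `build_launch_command` is a hypothetical helper, and using the torchrun form for GPU counts other than 4 is an assumption, since the card only shows the 4-GPU case.

```python
def build_launch_command(config_path, num_gpus=1):
    """Return the argv list for a single-node run, following the two commands above."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")

    script_args = ["inference.py", "--config_path", config_path]
    if num_gpus == 1:
        # Single GPU: plain Python launch.
        return ["python"] + script_args

    # Multiple GPUs on one node: one torchrun worker per GPU.
    return [
        "torchrun",
        "--standalone",
        "--nnodes=1",
        f"--nproc_per_node={num_gpus}",
    ] + script_args
```

The returned list can be passed to `subprocess.run(..., check=True)` from the repository root.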
Notes
- This model card is for the 4-step NVFP4 checkpoint. Use inference.sampling_steps: 4.
- model_quant enables NVFP4 generator inference.
- inference.kv_quant enables FP4 KV-cache storage and requires the utils/kernel extension.
- inference.multi_shot_sink enables the multi-shot attention sink.
- inference.multi_shot_rope_offset controls the multi-shot RoPE offset.
- inference.streaming_vae, inference.async_vae, inference.vae_type, and inference.vae_device control streaming or asynchronous VAE decode.
License/Terms of Use
GOVERNING TERMS: This trial service is governed by the NVIDIA API Trial Terms of Service. Use of this model is governed by the NVIDIA Open Model License Agreement.
Citation
@article{longlive_2,
title={LongLive2.0: An NVFP4 Parallel Infrastructure for Long Video Generation},
author={Chen, Yukang and Wang, Luozhou and Huang, Wei and Yang, Shuai and Zhang, Bohan and Xiao, Yicheng and Chu, Ruihang and Mao, Weian and Hu, Qixin and Liu, Shaoteng and Zhao, Yuyang and Mao, Huizi and Chen, Ying-Cong and Xie, Enze and Qi, Xiaojuan and Han, Song},
journal={arXiv preprint arXiv},
year={2026}
}