Wan2.2 NVFP4 Sparse to ComfyUI Conversion Analysis
Sources checked
- Kijai Hugging Face repo: https://huggingface.co/Kijai/WanVideo_comfy_nvfp4
- ComfyUI Wan2.2 workflow docs: https://docs.comfy.org/tutorials/video/wan/wan2_2
- ComfyUI Wan2.2 examples: https://comfyanonymous.github.io/ComfyUI_examples/wan22/
- ComfyUI mixed precision loader reference: https://huggingface.co/mhnakif/comfy/blob/main/comfy/ops.py
- ComfyUI quant op reference: https://huggingface.co/mhnakif/comfy/blob/main/comfy/quant_ops.py
- Comfy Kitchen hardware/backend reference: https://github.com/Comfy-Org/comfy-kitchen
- Local ComfyUI source checkout used for verification:
Comfy-Org/ComfyUI commit 5aa71b9bc28809a16596bb9fa3d0a6300d8e3f0e
Script provenance
convert_lightx2v_nvfp4_to_comfy.py is a local conversion script written for
this directory. It is not copied from one upstream script. The implementation is
derived from these upstream pages and ComfyUI source conventions:
- Kijai model page:
https://huggingface.co/Kijai/WanVideo_comfy_nvfp4
- This page gives the actual LightX2V NVFP4 to Comfy NVFP4 conversion rule: nibble-swap packed U8 weights, keep weight_scale, set weight_scale_2 = alpha * input_global_scale, and set input_scale = 1 / input_global_scale.
- ComfyUI quantized loader:
https://github.com/Comfy-Org/ComfyUI/blob/5aa71b9bc28809a16596bb9fa3d0a6300d8e3f0e/comfy/ops.py#L1058-L1091
- This loader reads {layer}.comfy_quant, branches on format == "nvfp4", then requires {layer}.weight_scale_2 and {layer}.weight_scale.
- ComfyUI quant algorithm registry:
https://github.com/Comfy-Org/ComfyUI/blob/5aa71b9bc28809a16596bb9fa3d0a6300d8e3f0e/comfy/quant_ops.py#L190-L205
- This defines the nvfp4 storage dtype as torch.uint8 and the parameter set as weight_scale, weight_scale_2, and input_scale.
- ComfyUI quantization metadata handling:
https://github.com/Comfy-Org/ComfyUI/blob/5aa71b9bc28809a16596bb9fa3d0a6300d8e3f0e/comfy/utils.py#L1360-L1421
- This shows that _quantization_metadata.layers is converted into {layer}.comfy_quant JSON byte tensors and that the presence of .comfy_quant enables mixed quantized ops.
- ComfyUI native NVFP4 hardware gate:
https://github.com/Comfy-Org/ComfyUI/blob/5aa71b9bc28809a16596bb9fa3d0a6300d8e3f0e/comfy/model_management.py#L1877-L1885
- This returns true only for NVIDIA GPUs with compute capability major version >= 10, which is why H100 can validate/load files but is not expected to use native Blackwell NVFP4 tensor-core compute.
Format findings
The original files in this directory are LightX2V NVFP4 Sparse safetensors. Each file has 400 quantized Linear layers with these LightX2V-side tensors:
- {layer}.weight: packed NVFP4 values in torch.uint8
- {layer}.weight_scale: FP8 E4M3 block scale tensor
- {layer}.alpha: scalar post-matmul rescaler
- {layer}.input_global_scale: scalar input scale convention
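To make the packed layout concrete, the sketch below decodes one byte into two FP4 (E2M1) values. The helper names are hypothetical, and treating the low nibble as the first element is an assumption made only for illustration; the nibble order is the convention that the conversion changes.

```python
# FP4 E2M1 magnitudes for the 3-bit codes 0..7; bit 3 of the nibble is the sign.
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def decode_fp4(code: int) -> float:
    """Decode a single 4-bit E2M1 code into a Python float."""
    sign = -1.0 if code & 0x8 else 1.0
    return sign * E2M1_MAGNITUDES[code & 0x7]


def unpack_byte(byte: int) -> tuple:
    """Return (low-nibble value, high-nibble value) for one packed byte.

    Assumption: the low nibble is listed first. Swapping the nibbles of the
    byte reverses the order of the returned pair.
    """
    return decode_fp4(byte & 0x0F), decode_fp4(byte >> 4)


print(unpack_byte(0x1A))  # (-1.0, 0.5)
print(unpack_byte(0xA1))  # (0.5, -1.0)
```

The two calls show why the packing order matters: the same pair of values appears in opposite order once the nibbles are swapped.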
Kijai's model card says the ComfyUI conversion is still NVFP4 and uses the same datatype, but changes conventions:
- Swap the high/low nibbles in each packed uint8 weight byte.
- Keep {layer}.weight_scale as-is.
- Convert {layer}.alpha * {layer}.input_global_scale into {layer}.weight_scale_2.
- Convert 1 / {layer}.input_global_scale into {layer}.input_scale.
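The four rules above can be sketched for a single layer as follows. This is a dependency-free illustration that uses Python bytes and floats in place of torch tensors; the function names and the sample values are hypothetical and are not taken from convert_lightx2v_nvfp4_to_comfy.py.

```python
def swap_nibbles(packed: bytes) -> bytes:
    """Swap the high and low 4-bit halves of every packed byte."""
    return bytes(((b & 0x0F) << 4) | (b >> 4) for b in packed)


def convert_scalars(alpha: float, input_global_scale: float) -> dict:
    """Map the two LightX2V scalars onto the two ComfyUI scalars."""
    return {
        "weight_scale_2": alpha * input_global_scale,
        "input_scale": 1.0 / input_global_scale,
    }


# Made-up sample data for one layer.
packed = bytes([0x1A, 0xF0, 0x37])
swapped = swap_nibbles(packed)
print(swapped.hex())  # a10f73

scalars = convert_scalars(alpha=0.002, input_global_scale=64.0)
# By construction, weight_scale_2 * input_scale is equal to alpha again.
product = scalars["weight_scale_2"] * scalars["input_scale"]
```

Two properties make the rule easy to check: the nibble swap is its own inverse, and the product of the two new scalars recovers the original alpha.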
ComfyUI's mixed precision loader expects a {layer}.comfy_quant tensor
containing JSON bytes. For NVFP4 it then loads:
- {layer}.weight
- {layer}.weight_scale_2
- {layer}.weight_scale
- optional registered parameters such as {layer}.input_scale
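As a rough illustration of that contract, the sketch below encodes a comfy_quant payload as JSON bytes and reports which required NVFP4 tensors are missing for a layer. Both helpers are hypothetical and only mimic the key layout described above; they are not ComfyUI functions.

```python
import json


def encode_comfy_quant(fmt: str = "nvfp4") -> bytes:
    """Serialize the per-layer quantization config as UTF-8 JSON bytes."""
    return json.dumps({"format": fmt}).encode("utf-8")


def missing_nvfp4_tensors(keys, layer: str) -> list:
    """List the tensors an NVFP4 layer needs that are absent from `keys`."""
    required = [
        f"{layer}.weight",
        f"{layer}.weight_scale_2",
        f"{layer}.weight_scale",
    ]
    return [name for name in required if name not in keys]


payload = encode_comfy_quant()
config = json.loads(payload.decode("utf-8"))

# A deliberately incomplete key set: weight_scale_2 is absent.
keys = {
    "blocks.0.cross_attn.k.comfy_quant",
    "blocks.0.cross_attn.k.weight",
    "blocks.0.cross_attn.k.weight_scale",
}
missing = missing_nvfp4_tensors(keys, "blocks.0.cross_attn.k")
print(missing)  # ['blocks.0.cross_attn.k.weight_scale_2']
```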
The converted file metadata also includes _quantization_metadata with one
nvfp4 layer entry per quantized layer so ComfyUI can select mixed precision
operations for the model.
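A minimal sketch of how such a metadata entry could be assembled is shown below. It assumes only what is stated above, namely a layers mapping with one nvfp4 entry per quantized layer; the function name is hypothetical, and real files may carry additional fields that are not reproduced here. The value is serialized to a string because safetensors header metadata is a string-to-string map.

```python
import json


def build_quantization_metadata(layer_names) -> dict:
    """Build a header metadata dict with one nvfp4 entry per layer."""
    layers = {name: {"format": "nvfp4"} for name in layer_names}
    # Assumption: only the "layers" key is modelled in this sketch.
    return {"_quantization_metadata": json.dumps({"layers": layers})}


metadata = build_quantization_metadata(
    ["blocks.0.cross_attn.k", "blocks.0.cross_attn.q"]
)
decoded = json.loads(metadata["_quantization_metadata"])
print(len(decoded["layers"]))  # 2
```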
H100 note
The conversion itself does not require a Blackwell GPU; it is a safetensors
layout conversion. However, Comfy Kitchen documents TensorCoreNVFP4Layout as
requiring SM >= 10.0 / Blackwell for native NVFP4 tensor-core acceleration. H100
is Hopper, so ComfyUI may disable native NVFP4 compute and run a fallback path.
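The gate described above reduces to a comparison on the major compute capability. The function below is a simplified stand-in rather than ComfyUI's implementation; its name and signature are hypothetical. On a CUDA machine the major version could be taken from torch.cuda.get_device_capability(), which returns a (major, minor) tuple.

```python
def supports_native_nvfp4(major: int, is_nvidia: bool = True) -> bool:
    """Return True when native NVFP4 tensor-core compute is expected.

    Mirrors the rule stated in this document: NVIDIA GPU with compute
    capability major version >= 10.
    """
    return is_nvidia and major >= 10


print(supports_native_nvfp4(9))   # H100 is compute capability 9.0: False
print(supports_native_nvfp4(10))  # compute capability 10.x: True
```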
Script
The conversion script is:
python convert_lightx2v_nvfp4_to_comfy.py
Useful options:
python convert_lightx2v_nvfp4_to_comfy.py --dry-run
python convert_lightx2v_nvfp4_to_comfy.py --overwrite
python convert_lightx2v_nvfp4_to_comfy.py input.safetensors --output-dir /path/to/out
The script writes <original_stem>_comfy.safetensors and uses a temporary file
before renaming into place.
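The naming and write-then-rename behaviour can be sketched with the standard library as follows. This is an illustration of the pattern, not the script's actual code: the helper names are hypothetical, and the input file name in the example is inferred from the output names listed in this document.

```python
import os
import tempfile
from pathlib import Path


def output_path_for(src, output_dir=None) -> Path:
    """Derive <original_stem>_comfy.safetensors next to src or in output_dir."""
    src = Path(src)
    out_dir = Path(output_dir) if output_dir else src.parent
    return out_dir / f"{src.stem}_comfy.safetensors"


def write_atomically(dst, payload: bytes) -> None:
    """Write to a temporary file in the target directory, then rename it."""
    dst = Path(dst)
    dst.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=dst.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
        # os.replace overwrites dst in one step, so readers never see a partial file.
        os.replace(tmp_name, dst)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


name = output_path_for("Wan2.2-T2V-A14B_NVFP4_Sparse_low.safetensors").name
print(name)  # Wan2.2-T2V-A14B_NVFP4_Sparse_low_comfy.safetensors
```

Creating the temporary file in the destination directory keeps the rename on one filesystem, which is what allows it to be atomic.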
Converted outputs
- Wan2.2-I2V-A14B_NVFP4_Sparse_high_comfy.safetensors
- Wan2.2-I2V-A14B_NVFP4_Sparse_low_comfy.safetensors
- Wan2.2-T2V-A14B_NVFP4_Sparse_high_comfy.safetensors
- Wan2.2-T2V-A14B_NVFP4_Sparse_low_comfy.safetensors
For ComfyUI native workflows, place these diffusion model files under:
ComfyUI/models/diffusion_models/
The Wan2.2 14B workflows still need the normal text encoder and VAE files in their ComfyUI locations.
Verification performed
For each converted file:
- Tensor count is 2695.
- _quantization_metadata contains 400 quantized layers.
- alpha count is 0.
- input_global_scale count is 0.
- input_scale count is 400.
- weight_scale count is 400.
- weight_scale_2 count is 400.
- comfy_quant count is 400.
- {layer}.comfy_quant decodes to {"format": "nvfp4"}.
- A sampled blocks.0.cross_attn.k.weight block equals the expected nibble swap from the original.
- The sampled weight_scale_2 equals alpha * input_global_scale.
- The sampled input_scale equals 1 / input_global_scale.
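The per-suffix counts in this list could be reproduced with a helper along these lines. It is a sketch that works on a plain list of tensor names, for example the keys of a loaded state dict; the function name and sample keys are hypothetical. Matching on the exact last dotted component keeps weight_scale from being counted as weight_scale_2, and input_scale from being counted as input_global_scale.

```python
SUFFIXES = (
    "alpha",
    "input_global_scale",
    "input_scale",
    "weight_scale",
    "weight_scale_2",
    "comfy_quant",
)


def count_by_suffix(keys) -> dict:
    """Count tensor names whose final dotted component is a tracked suffix."""
    counts = {suffix: 0 for suffix in SUFFIXES}
    for key in keys:
        suffix = key.rsplit(".", 1)[-1]
        if suffix in counts:
            counts[suffix] += 1
    return counts


# Sample keys for one converted layer plus one unquantized tensor.
sample_keys = [
    "blocks.0.cross_attn.k.weight",
    "blocks.0.cross_attn.k.weight_scale",
    "blocks.0.cross_attn.k.weight_scale_2",
    "blocks.0.cross_attn.k.input_scale",
    "blocks.0.cross_attn.k.comfy_quant",
    "blocks.0.norm3.weight",
]
counts = count_by_suffix(sample_keys)
print(counts["alpha"], counts["weight_scale"], counts["weight_scale_2"])  # 0 1 1
```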