# Wan2.2 NVFP4 Sparse to ComfyUI Conversion Analysis
## Sources checked
- Kijai Hugging Face repo: https://huggingface.co/Kijai/WanVideo_comfy_nvfp4
- ComfyUI Wan2.2 workflow docs: https://docs.comfy.org/tutorials/video/wan/wan2_2
- ComfyUI Wan2.2 examples: https://comfyanonymous.github.io/ComfyUI_examples/wan22/
- ComfyUI mixed precision loader reference:
https://huggingface.co/mhnakif/comfy/blob/main/comfy/ops.py
- ComfyUI quant op reference:
https://huggingface.co/mhnakif/comfy/blob/main/comfy/quant_ops.py
- Comfy Kitchen hardware/backend reference:
https://github.com/Comfy-Org/comfy-kitchen
- Local ComfyUI source checkout used for verification:
`Comfy-Org/ComfyUI` commit
`5aa71b9bc28809a16596bb9fa3d0a6300d8e3f0e`
## Script provenance
`convert_lightx2v_nvfp4_to_comfy.py` is a local conversion script written for
this directory. It is not copied from any single upstream script; the
implementation is derived from the following upstream pages and ComfyUI source
conventions:
- Kijai model page:
  https://huggingface.co/Kijai/WanVideo_comfy_nvfp4
  - This page gives the actual LightX2V NVFP4 to Comfy NVFP4 conversion rule:
    nibble-swap packed U8 weights, keep `weight_scale`, set
    `weight_scale_2 = alpha * input_global_scale`, and set
    `input_scale = 1 / input_global_scale`.
- ComfyUI quantized loader:
  https://github.com/Comfy-Org/ComfyUI/blob/5aa71b9bc28809a16596bb9fa3d0a6300d8e3f0e/comfy/ops.py#L1058-L1091
  - This loader reads `{layer}.comfy_quant`, branches on `format == "nvfp4"`,
    then requires `{layer}.weight_scale_2` and `{layer}.weight_scale`.
- ComfyUI quant algorithm registry:
  https://github.com/Comfy-Org/ComfyUI/blob/5aa71b9bc28809a16596bb9fa3d0a6300d8e3f0e/comfy/quant_ops.py#L190-L205
  - This defines the `nvfp4` storage dtype as `torch.uint8` and the parameter
    set as `weight_scale`, `weight_scale_2`, and `input_scale`.
- ComfyUI quantization metadata handling:
  https://github.com/Comfy-Org/ComfyUI/blob/5aa71b9bc28809a16596bb9fa3d0a6300d8e3f0e/comfy/utils.py#L1360-L1421
  - This shows that `_quantization_metadata.layers` is converted into
    `{layer}.comfy_quant` JSON byte tensors and that the presence of
    `.comfy_quant` enables mixed quantized ops.
- ComfyUI native NVFP4 hardware gate:
  https://github.com/Comfy-Org/ComfyUI/blob/5aa71b9bc28809a16596bb9fa3d0a6300d8e3f0e/comfy/model_management.py#L1877-L1885
  - This returns true only for NVIDIA GPUs with compute capability major
    version >= 10, which is why H100 can validate/load files but is not expected
    to use native Blackwell NVFP4 tensor-core compute.
## Format findings
The original files in this directory are LightX2V NVFP4 Sparse safetensors. Each
file has 400 quantized Linear layers with these LightX2V-side tensors:
- `{layer}.weight`: packed NVFP4 values in `torch.uint8`
- `{layer}.weight_scale`: FP8 E4M3 block scale tensor
- `{layer}.alpha`: scalar post-matmul rescaler
- `{layer}.input_global_scale`: scalar global input scale (LightX2V convention)
Kijai's model card says the ComfyUI-converted weights are still NVFP4 and use
the same datatype; only the storage conventions change:
- Swap the high/low nibbles in each packed `uint8` weight byte.
- Keep `{layer}.weight_scale` as-is.
- Set `{layer}.weight_scale_2` to
  `{layer}.alpha * {layer}.input_global_scale`.
- Set `{layer}.input_scale` to `1 / {layer}.input_global_scale`.
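The rules above can be sketched in plain Python. This is a minimal illustration, not the code in `convert_lightx2v_nvfp4_to_comfy.py`: the helper names `swap_nibbles` and `convert_scales` are hypothetical, the packed weight is modelled as `bytes`, and the scalars are Python floats rather than tensors.

```python
def swap_nibbles(packed: bytes) -> bytes:
    """Swap the high and low 4-bit halves of every packed byte."""
    return bytes(((b & 0x0F) << 4) | (b >> 4) for b in packed)


def convert_scales(alpha: float, input_global_scale: float) -> dict:
    """Map the LightX2V scalars onto the two Comfy-side scalars (hypothetical helper)."""
    return {
        "weight_scale_2": alpha * input_global_scale,
        "input_scale": 1.0 / input_global_scale,
    }


# Each byte packs two 4-bit values, so 0x12 becomes 0x21 after the swap.
swapped = swap_nibbles(bytes([0x12, 0xAB, 0xF0]))

# Illustrative scalar values only; real files store one pair per layer.
scales = convert_scales(alpha=0.5, input_global_scale=4.0)
```

Swapping twice returns the original bytes, which makes the transform easy to check against the source file.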
ComfyUI's mixed precision loader expects a `{layer}.comfy_quant` tensor
containing JSON bytes. For NVFP4 it then loads:
- `{layer}.weight`
- `{layer}.weight_scale_2`
- `{layer}.weight_scale`
- optional registered parameters such as `{layer}.input_scale`
The converted file metadata also includes `_quantization_metadata` with one
`nvfp4` layer entry per quantized layer so ComfyUI can select mixed precision
operations for the model.
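As a rough sketch of the two metadata pieces described above, the per-layer `comfy_quant` payload can be modelled as the UTF-8 bytes of a small JSON object, and `_quantization_metadata` as a JSON string with a `layers` mapping. The function names are hypothetical, the byte values are held in a plain list instead of a `torch.uint8` tensor, and any fields beyond `layers` that the real metadata may carry are left out.

```python
import json


def encode_comfy_quant(config: dict) -> list:
    """Serialize a per-layer quant config to UTF-8 byte values (0-255)."""
    return list(json.dumps(config).encode("utf-8"))


def decode_comfy_quant(values) -> dict:
    """Inverse of encode_comfy_quant: byte values back to a dict."""
    return json.loads(bytes(values).decode("utf-8"))


def build_quantization_metadata(layers) -> str:
    """Build a JSON string with one nvfp4 entry per quantized layer.

    Assumption: only the `layers` mapping is shown here.
    """
    return json.dumps({"layers": {name: {"format": "nvfp4"} for name in layers}})


payload = encode_comfy_quant({"format": "nvfp4"})
metadata = build_quantization_metadata(["blocks.0.cross_attn.k"])
```

Decoding `payload` with `decode_comfy_quant` gives back `{"format": "nvfp4"}`, the value the verification step expects.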
## H100 note
The conversion itself does not require a Blackwell GPU; it is only a safetensors
layout conversion. However, Comfy Kitchen documents `TensorCoreNVFP4Layout` as
requiring SM >= 10.0 (Blackwell) for native NVFP4 tensor-core acceleration. H100
is Hopper (compute capability 9.0), so ComfyUI may disable native NVFP4 compute
and run a fallback path.
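The capability check can be illustrated with a small stand-alone function. This is a sketch of the rule as described in this note, not ComfyUI's actual gate: the name `supports_native_nvfp4` is hypothetical, and the vendor check (NVIDIA only) is omitted.

```python
def supports_native_nvfp4(capability: tuple) -> bool:
    """Return True when the compute capability major version is >= 10.

    `capability` is a (major, minor) pair, the same shape that
    torch.cuda.get_device_capability() returns.
    """
    major, _minor = capability
    return major >= 10


h100_ok = supports_native_nvfp4((9, 0))        # Hopper
blackwell_ok = supports_native_nvfp4((10, 0))  # Blackwell
```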
## Script
Run the conversion script with:
```bash
python convert_lightx2v_nvfp4_to_comfy.py
```
Useful options:
```bash
python convert_lightx2v_nvfp4_to_comfy.py --dry-run
python convert_lightx2v_nvfp4_to_comfy.py --overwrite
python convert_lightx2v_nvfp4_to_comfy.py input.safetensors --output-dir /path/to/out
```
The script writes each output as `<original_stem>_comfy.safetensors`. It writes
to a temporary file first and then renames it into place.
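The write-then-rename pattern can be sketched as follows. This is an assumption about how such a step is commonly written, not an excerpt from the script; `output_path_for` and `write_atomically` are hypothetical names, and the payload is a placeholder rather than real safetensors data.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def output_path_for(src: Path, output_dir: Optional[Path] = None) -> Path:
    """Derive `<original_stem>_comfy.safetensors` next to the input or in output_dir."""
    return (output_dir or src.parent) / f"{src.stem}_comfy.safetensors"


def write_atomically(dest: Path, payload: bytes) -> None:
    """Write to a temp file in the destination directory, then rename into place."""
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
        os.replace(tmp_name, dest)  # same directory, so the rename stays on one filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # do not leave a partial file behind
        raise


with tempfile.TemporaryDirectory() as workdir:
    dest = output_path_for(Path(workdir) / "example.safetensors")
    write_atomically(dest, b"placeholder bytes")
    written = dest.read_bytes()
    leftovers = [p.name for p in Path(workdir).iterdir()]
```

Keeping the temporary file in the destination directory means an interrupted run leaves no partially written `_comfy.safetensors` file.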
## Converted outputs
- `Wan2.2-I2V-A14B_NVFP4_Sparse_high_comfy.safetensors`
- `Wan2.2-I2V-A14B_NVFP4_Sparse_low_comfy.safetensors`
- `Wan2.2-T2V-A14B_NVFP4_Sparse_high_comfy.safetensors`
- `Wan2.2-T2V-A14B_NVFP4_Sparse_low_comfy.safetensors`
For ComfyUI native workflows, place these diffusion model files under:
```text
ComfyUI/models/diffusion_models/
```
The Wan2.2 14B workflows still need the normal text encoder and VAE files in
their ComfyUI locations.
## Verification performed
For each converted file:
- Tensor count is 2695.
- `_quantization_metadata` contains 400 quantized layers.
- `alpha` count is 0.
- `input_global_scale` count is 0.
- `input_scale` count is 400.
- `weight_scale` count is 400.
- `weight_scale_2` count is 400.
- `comfy_quant` count is 400.
- `{layer}.comfy_quant` decodes to `{"format": "nvfp4"}`.
- A sampled block of `blocks.0.cross_attn.k.weight` equals the nibble-swapped
  bytes of the same block in the original file.
- The sampled `weight_scale_2` equals `alpha * input_global_scale`.
- The sampled `input_scale` equals `1 / input_global_scale`.
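The per-suffix counts above can be reproduced with a short helper over the tensor names. This is a sketch under assumptions: `count_by_suffix` is a hypothetical name, and the key list here is a one-layer stand-in. In practice the names would come from the converted file, for example through the `keys()` method of a `safetensors.safe_open` handle.

```python
SUFFIXES = (
    "alpha",
    "input_global_scale",
    "input_scale",
    "weight_scale",
    "weight_scale_2",
    "comfy_quant",
)


def count_by_suffix(keys) -> dict:
    """Count tensor names by their final dot-separated component."""
    counts = {suffix: 0 for suffix in SUFFIXES}
    for key in keys:
        suffix = key.rsplit(".", 1)[-1]
        if suffix in counts:
            counts[suffix] += 1
    return counts


# Stand-in for the key list of a converted file with a single quantized layer.
layer = "blocks.0.cross_attn.k"
converted_keys = [
    f"{layer}.{name}"
    for name in ("weight", "weight_scale", "weight_scale_2", "input_scale", "comfy_quant")
]
counts = count_by_suffix(converted_keys)
```

Matching on the exact final component rather than a substring keeps `weight_scale` and `weight_scale_2` from being counted together.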