temp: Disable MXFP8/NVFP4, remove comfy-kitchen (build failure)
aa41047

TODO: MXFP8/NVFP4 Support

Status: Temporarily Disabled

The MXFP8 and NVFP4 quantization formats are temporarily disabled because comfy-kitchen fails to build on Hugging Face Spaces infrastructure.

Issue

The comfy-kitchen CUDA build fails because of a header conflict between CUDA 12.9 and glibc:

  • The cospi/sinpi declarations in CUDA's math_functions.h have exception specifications that do not match the declarations in the system headers

Planned Resolution

Options being considered:

  1. Pre-built wheel: Host a pre-compiled comfy-kitchen wheel
  2. Custom Dockerfile: Build comfy-kitchen in a controlled environment
  3. PyTorch fallback: Implement pure PyTorch quantization as fallback
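
Whichever option is chosen, the app needs a way to tell whether comfy-kitchen is installed before it offers the extra formats. A minimal sketch of that check follows. It assumes the package installs an importable module named `comfy_kitchen`; the format identifiers and helper names are hypothetical and not taken from the project.

```python
import importlib.util

# Assumption: the comfy-kitchen package exposes a module named "comfy_kitchen".
OPTIONAL_BACKEND = "comfy_kitchen"

# Hypothetical identifiers for the formats listed in this document.
BASE_FORMATS = ["fp8_tensorwise", "fp8_block", "int8_block"]
BACKEND_FORMATS = ["mxfp8", "nvfp4"]


def backend_available(module_name: str = OPTIONAL_BACKEND) -> bool:
    """Return True if the module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def enabled_formats(has_backend: bool) -> list:
    """Always offer the base formats; add the others only with the backend."""
    return BASE_FORMATS + (BACKEND_FORMATS if has_backend else [])


if __name__ == "__main__":
    print(enabled_formats(backend_available()))
```

Using `find_spec` rather than a bare `import` avoids loading a compiled extension just to populate a format list.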

Currently Available Formats

  • FP8 Tensorwise (per-tensor scaling)
  • FP8 Block (per-block scaling, block size 64 or 128)
  • INT8 Block (Triton-based, block size 128)
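
To illustrate the difference between per-tensor and per-block scaling, here is a rough sketch in plain Python. It assumes the FP8 target is E4M3 (largest finite value 448) and uses the common convention scale = amax / format maximum. The project's actual kernels may use a different convention, and the function names are hypothetical. Values are treated as a flat list for simplicity.

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def tensorwise_scale(values):
    """One scale for the whole tensor: amax / FP8 max (1.0 if all zeros)."""
    amax = max((abs(v) for v in values), default=0.0)
    return amax / FP8_E4M3_MAX if amax > 0 else 1.0


def block_scales(values, block_size=128):
    """One scale per contiguous block of `block_size` values."""
    return [
        tensorwise_scale(values[i:i + block_size])
        for i in range(0, len(values), block_size)
    ]


if __name__ == "__main__":
    # A small-magnitude block followed by a large-magnitude block.
    data = [1.0] * 64 + [448.0] * 64
    print(tensorwise_scale(data))   # a single scale, set by the largest value
    print(block_scales(data, 64))   # each block gets its own scale
```

With a single scale, the small-magnitude half of `data` has to share the range dictated by the largest value. Per-block scaling gives that half its own, smaller scale.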

Reference

  • comfy-kitchen branch: sc_mm_mxfp8_sync
  • MXFP8 requires SM >= 10.0 (Blackwell GPU)
  • NVFP4 requires SM >= 10.0/12.0 (Blackwell GPU)