# TODO: MXFP8/NVFP4 Support
## Status: Temporarily Disabled
The MXFP8 and NVFP4 quantization formats are temporarily disabled because comfy-kitchen fails to build on the Hugging Face Spaces infrastructure.
## Issue
The comfy-kitchen CUDA build fails because of a header conflict between CUDA 12.9 and glibc:
- The exception specifications of `cospi`/`sinpi` in CUDA's `math_functions.h` do not match the declarations in the system headers
## Planned Resolution
Options being considered:
1. **Pre-built wheel**: Host a pre-compiled comfy-kitchen wheel
2. **Custom Dockerfile**: Build comfy-kitchen in a controlled environment
3. **PyTorch fallback**: Implement pure PyTorch quantization as fallback
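
Whichever option is chosen, the app will likely need a runtime check that picks between comfy-kitchen and a fallback path. A minimal sketch of that check follows. The import name `comfy_kitchen`, the function `pick_backend`, and the returned labels are assumptions made for illustration, not existing code in this repo.

```python
import importlib.util


def pick_backend(module_name: str = "comfy_kitchen") -> str:
    """Return a label for the quantization backend to use.

    `module_name` is the assumed import name of comfy-kitchen (hypothetical).
    """
    # find_spec returns None when a top-level module is not installed,
    # so the check does not trigger the (possibly failing) import itself.
    if importlib.util.find_spec(module_name) is not None:
        return "comfy-kitchen"
    return "pytorch-fallback"


if __name__ == "__main__":
    print(pick_backend())
```

With a check like this, MXFP8/NVFP4 could be re-enabled only when the compiled backend is present, while the formats listed below keep working either way.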
## Currently Available Formats
- FP8 Tensorwise (per-tensor scaling)
- FP8 Block (per-block scaling, 64 or 128 block size)
- INT8 Block (Triton-based, 128 block size)
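
To illustrate the difference between per-tensor and per-block scaling, here is a rough sketch of how the scale factors could be computed. It assumes the FP8 target is E4M3 (largest finite value 448) and treats the weights as a flat list. The function names are hypothetical and this is not the implementation used by the Space.

```python
FP8_E4M3_MAX = 448.0  # largest finite value of float8 E4M3 (assumed target format)


def tensorwise_scale(values):
    """One scale for the whole tensor: maps the absolute maximum to the FP8 maximum."""
    amax = max((abs(v) for v in values), default=0.0)
    # Avoid a zero scale for all-zero (or empty) input.
    return amax / FP8_E4M3_MAX if amax > 0 else 1.0


def block_scales(values, block_size=128):
    """One scale per contiguous block of `block_size` elements."""
    if block_size not in (64, 128):
        raise ValueError("block_size must be 64 or 128")
    return [
        tensorwise_scale(values[i:i + block_size])
        for i in range(0, len(values), block_size)
    ]


if __name__ == "__main__":
    weights = [0.5] * 64 + [448.0] * 64
    print(tensorwise_scale(weights))   # a single scale, dominated by the outliers
    print(block_scales(weights, 64))   # the first block keeps a much smaller scale
```

Per-block scaling keeps outliers in one block from reducing the precision of every other block, at the cost of storing more scale values.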
## Reference
- comfy-kitchen branch: `sc_mm_mxfp8_sync`
- MXFP8 requires SM >= 10.0 (Blackwell GPU)
- NVFP4 requires SM 10.0 or SM 12.0 and later (Blackwell GPU)
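
Once the formats are re-enabled, the SM requirement above could be enforced with a small guard. The sketch below only compares a `(major, minor)` compute capability tuple against 10.0; the helper name `meets_blackwell_requirement` is hypothetical.

```python
def meets_blackwell_requirement(capability, minimum=(10, 0)):
    """Return True if a (major, minor) compute capability is at least `minimum`."""
    major, minor = capability
    # Tuple comparison checks the major version first, then the minor version.
    return (major, minor) >= tuple(minimum)


if __name__ == "__main__":
    # With PyTorch installed, the tuple can come from
    # torch.cuda.get_device_capability(), which returns (major, minor).
    print(meets_blackwell_requirement((12, 0)))
    print(meets_blackwell_requirement((8, 9)))
```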