RedstoneWhite/HunyuanImage-3.0-Instruct-Distil-FP8-FlashInfer

hunyuan_image_3_moe


RedstoneWhite committed about 1 month ago

Commit 2f86691 (verified) · Parent: 9d0ae80

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -6,7 +6,7 @@ base_model:
 # What's New
-This repo contains an FP8 quantized HunyuanImage-3.0-Instruct-Distil using LLM-compressor with a similar recipe from [HunyuanImage-3.0-Instruct-Distil-INT8-v2](https://huggingface.co/EricRollei/HunyuanImage-3.0-Instruct-Distil-INT8-v2). Model codes are patched to enable inference with FlashInfer CUTLASS FP8 MoE kernel. Tested on a DGX Spark Founder Edition with FlashInfer==0.6.8 and a modified [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) node in ComfyUI. Inference time with cot_recaption reduced from ~1400s to ~200s. Patched ComfyUI nodes will be uploaded soon.
+This repo contains an FP8 quantized HunyuanImage-3.0-Instruct-Distil using LLM-compressor with a similar recipe from [HunyuanImage-3.0-Instruct-Distil-INT8-v2](https://huggingface.co/EricRollei/HunyuanImage-3.0-Instruct-Distil-INT8-v2). Model codes are patched to enable inference with FlashInfer CUTLASS FP8 MoE kernel. Tested on a DGX Spark Founder Edition with FlashInfer==0.6.8 and a modified [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) node in ComfyUI. Inference time with cot_recaption reduced from ~1400s to ~200s. ~~Patched ComfyUI nodes will be uploaded soon.~~ Patched nodes can be found at [Here](https://github.com/redstonewhite/Comfy_HunyuanImage3). Use it just like the original one and you should be fine.
 Feel free to open an issue if you encounter any problem when trying to use it.