Update README.md
README.md
CHANGED
@@ -6,7 +6,7 @@ base_model:
# What's New
-This repo contains an FP8-quantized HunyuanImage-3.0-Instruct-Distil, produced with LLM-compressor using a recipe similar to that of [HunyuanImage-3.0-Instruct-Distil-INT8-v2](https://huggingface.co/EricRollei/HunyuanImage-3.0-Instruct-Distil-INT8-v2). The model code is patched to enable inference with the FlashInfer CUTLASS FP8 MoE kernel. Tested on a DGX Spark Founder Edition with FlashInfer==0.6.8 and a modified [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) node in ComfyUI. Inference time with cot_recaption dropped from ~1400s to ~200s. Patched ComfyUI nodes will be uploaded soon.
+This repo contains an FP8-quantized HunyuanImage-3.0-Instruct-Distil, produced with LLM-compressor using a recipe similar to that of [HunyuanImage-3.0-Instruct-Distil-INT8-v2](https://huggingface.co/EricRollei/HunyuanImage-3.0-Instruct-Distil-INT8-v2). The model code is patched to enable inference with the FlashInfer CUTLASS FP8 MoE kernel. Tested on a DGX Spark Founder Edition with FlashInfer==0.6.8 and a modified [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) node in ComfyUI. Inference time with cot_recaption dropped from ~1400s to ~200s. ~~Patched ComfyUI nodes will be uploaded soon.~~ Patched nodes are available [here](https://github.com/redstonewhite/Comfy_HunyuanImage3). Use them the same way as the original nodes and they should work.
Feel free to open an issue if you encounter any problem when trying to use it.