Update README.md
README.md
CHANGED
@@ -6,11 +6,9 @@ base_model:
 
 # What's New
 
-
+This repo contains an FP8-quantized HunyuanImage-3.0-Instruct-Distil, produced with LLM Compressor using a recipe similar to that of [HunyuanImage-3.0-Instruct-Distil-INT8-v2](https://huggingface.co/EricRollei/HunyuanImage-3.0-Instruct-Distil-INT8-v2). The model code is patched to enable inference with the FlashInfer CUTLASS FP8 MoE kernel. Tested on a DGX Spark Founders Edition with FlashInfer==0.6.8 and a modified [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) node in ComfyUI. Inference time with cot_recaption dropped from ~1400s to ~200s. Patched ComfyUI nodes will be uploaded soon.
 
-
-
-Inference time with cot_recaption reduced from ~1400s to ~200s on DGX Spark.
+Feel free to open an issue if you run into any problems using it.
 
 # Original README
 