RedstoneWhite/HunyuanImage-3.0-Instruct-Distil-FP8-FlashInfer

hunyuan_image_3_moe


RedstoneWhite committed on Apr 21

Commit 9d0ae80 · verified · 1 Parent(s): 4f7de87

Update README.md

Files changed (1)

README.md +2 -4


@@ -6,11 +6,9 @@ base_model:
 # What's New
-Quantized to FP8 using LLM-compressor with similar recipe from [HunyuanImage-3.0-Instruct-Distil-INT8-v2](https://huggingface.co/EricRollei/HunyuanImage-3.0-Instruct-Distil-INT8-v2)
-Enabling quantized inference with FlashInfer that fits in a single DGX Spark. Tested on a DGX Spark Founder Edition with FlashInfer==0.6.8 and a modified [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) node in ComfyUI.
-Inference time with cot_recaption reduced from ~1400s to ~200s on DGX Spark.
 # Original README

 # What's New
+This repo contains HunyuanImage-3.0-Instruct-Distil quantized to FP8 with LLM-compressor, using a recipe similar to the one in [HunyuanImage-3.0-Instruct-Distil-INT8-v2](https://huggingface.co/EricRollei/HunyuanImage-3.0-Instruct-Distil-INT8-v2). The model code is patched to enable inference with the FlashInfer CUTLASS FP8 MoE kernel. Tested on a DGX Spark Founder Edition with FlashInfer==0.6.8 and a modified [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) node in ComfyUI. Inference time with cot_recaption dropped from ~1400s to ~200s. Patched ComfyUI nodes will be uploaded soon.
+Feel free to open an issue if you run into any problems using it.
 # Original README
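
The README change pins its test setup to FlashInfer==0.6.8. A minimal sketch of checking a local environment against that version before running inference follows. The distribution names `flashinfer-python` and `flashinfer` are assumptions about how the package is installed, and both helper functions are hypothetical, not part of this repo.

```python
from importlib.metadata import PackageNotFoundError, version

TESTED_FLASHINFER = "0.6.8"  # version named in the README

# Assumption: FlashInfer is installed under one of these distribution names.
CANDIDATE_DISTS = ("flashinfer-python", "flashinfer")


def installed_flashinfer_version(dists=CANDIDATE_DISTS):
    """Return the installed FlashInfer version string, or None if not found."""
    for name in dists:
        try:
            return version(name)
        except PackageNotFoundError:
            continue
    return None


def check_flashinfer(found, tested=TESTED_FLASHINFER):
    """Return a short status message comparing the found version to the tested one."""
    if found is None:
        return "FlashInfer not found"
    if found == tested:
        return f"FlashInfer {found} matches the tested version"
    return f"FlashInfer {found} differs from tested {tested}"


if __name__ == "__main__":
    print(check_flashinfer(installed_flashinfer_version()))
```

The comparison is an exact string match, so a build with a local suffix would be reported as differing. A different version is not necessarily incompatible, it is only untested by the README.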