RedstoneWhite committed on
Commit 9d0ae80 · verified · 1 Parent(s): 4f7de87

Update README.md

Files changed (1)
  1. README.md +2 -4
README.md CHANGED
@@ -6,11 +6,9 @@ base_model:
 
  # What's New
 
- Quantized to FP8 using LLM-compressor with similar recipe from [HunyuanImage-3.0-Instruct-Distil-INT8-v2](https://huggingface.co/EricRollei/HunyuanImage-3.0-Instruct-Distil-INT8-v2)
+ This repo contains an FP8 quantized HunyuanImage-3.0-Instruct-Distil using LLM-compressor with a similar recipe from [HunyuanImage-3.0-Instruct-Distil-INT8-v2](https://huggingface.co/EricRollei/HunyuanImage-3.0-Instruct-Distil-INT8-v2). Model codes are patched to enable inference with FlashInfer CUTLASS FP8 MoE kernel. Tested on a DGX Spark Founder Edition with FlashInfer==0.6.8 and a modified [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) node in ComfyUI. Inference time with cot_recaption reduced from ~1400s to ~200s. Patched ComfyUI nodes will be uploaded soon.
 
- Enabling quantized inference with FlashInfer that fits in a single DGX Spark. Tested on a DGX Spark Founder Edition with FlashInfer==0.6.8 and a modified [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) node in ComfyUI.
-
- Inference time with cot_recaption reduced from ~1400s to ~200s on DGX Spark.
+ Feel free to open an issue if you encounter any problem when trying to use it.
 
  # Original README
 