RedstoneWhite/HunyuanImage-3.0-Instruct-Distil-FP8-FlashInfer

hunyuan_image_3_moe

RedstoneWhite committed on Apr 20

Commit

4f7de87

verified

1 Parent(s): 339455b

Update README.md

Files changed (1)

README.md +12 -2

@@ -1,8 +1,18 @@
 ---
-license: other
 pipeline_tag: image-to-image
 ---
 [中文文档](./README_zh_CN.md)
@@ -533,4 +543,4 @@ We extend our heartfelt gratitude to the following open-source projects and comm
 [![GitHub forks](https://img.shields.io/github/forks/Tencent-Hunyuan/HunyuanImage-3.0?style=social)](https://github.com/Tencent-Hunyuan/HunyuanImage-3.0)
-[![Star History Chart](https://api.star-history.com/svg?repos=Tencent-Hunyuan/HunyuanImage-3.0&type=Date)](https://www.star-history.com/#Tencent-Hunyuan/HunyuanImage-3.0&Date)

 ---
 pipeline_tag: image-to-image
+base_model:
+- tencent/HunyuanImage-3.0-Instruct-Distil
 ---
+# What's New
+Quantized to FP8 with LLM-compressor, using a recipe similar to that of [HunyuanImage-3.0-Instruct-Distil-INT8-v2](https://huggingface.co/EricRollei/HunyuanImage-3.0-Instruct-Distil-INT8-v2).
+This enables quantized inference with FlashInfer that fits on a single DGX Spark. Tested on a DGX Spark Founder Edition with FlashInfer==0.6.8 and a modified [Comfy_HunyuanImage3](https://github.com/EricRollei/Comfy_HunyuanImage3) node in ComfyUI.
+Inference time with cot_recaption dropped from ~1400s to ~200s on DGX Spark, roughly a 7x speedup.
+# Original README
 [中文文档](./README_zh_CN.md)
 [![GitHub forks](https://img.shields.io/github/forks/Tencent-Hunyuan/HunyuanImage-3.0?style=social)](https://github.com/Tencent-Hunyuan/HunyuanImage-3.0)
+[![Star History Chart](https://api.star-history.com/svg?repos=Tencent-Hunyuan/HunyuanImage-3.0&type=Date)](https://www.star-history.com/#Tencent-Hunyuan/HunyuanImage-3.0&Date)
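
For readers unfamiliar with what "quantized to FP8" involves, the following is a minimal pure-Python sketch of per-tensor FP8 quantization. It is not the LLM-compressor recipe used for this checkpoint, which the README does not spell out. The E4M3 format, the per-tensor scale, and the helper names `round_to_e4m3`, `quantize_per_tensor`, and `dequantize` are all assumptions made for illustration.

```python
import math

# Assumption: FP8 E4M3 (4 exponent bits, 3 mantissa bits), max finite value 448.
E4M3_MAX = 448.0


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-representable value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)
    # frexp returns mag = m * 2**e with 0.5 <= m < 1, so floor(log2(mag)) = e - 1.
    _, e = math.frexp(mag)
    # -6 is the smallest normal exponent; smaller values use the subnormal step.
    exp = max(e - 1, -6)
    step = 2.0 ** (exp - 3)  # 3 mantissa bits give 8 steps per power of two
    # Python's round() rounds ties to even, matching the usual FP rounding mode.
    return sign * round(mag / step) * step


def quantize_per_tensor(weights):
    """Scale weights so the largest magnitude maps to E4M3_MAX, then round."""
    amax = max(abs(w) for w in weights)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(w / scale) for w in weights], scale


def dequantize(q, scale):
    """Recover approximate original values from quantized values and scale."""
    return [v * scale for v in q]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.03, 2.0]  # hypothetical weight values
    q, scale = quantize_per_tensor(weights)
    for w, r in zip(weights, dequantize(q, scale)):
        print(f"{w:+.4f} -> {r:+.4f}")
```

With three mantissa bits, the rounding error for values in the normal range stays within about 6.25% of the value, which is why a well-chosen scale matters more than the rounding itself.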