@@ -13,18 +13,15 @@ tags:
 pipeline_tag: text-to-image
 extra_gated_eu_disallowed: true
 ---
-<p align="center">
   <img src="https://cdn-uploads.huggingface.co/production/uploads/63473b59e5c0717e6737b872/5DZez8C7TeFwRn3FcKDix.png" alt="HunyuanImage-2.1 Banner" />
-</p>
-<div align="center">
-# **HunyuanImage-2.1**
-### An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation
 </div>
 <div align="center">
   <a href="https://github.com/Tencent-Hunyuan/HunyuanImage-2.1" target="_blank"><img src="https://img.shields.io/badge/Code-black.svg?logo=github" height="22px"></a>
   <a href="https://huggingface.co/spaces/tencent/HunyuanImage-2.1" target="_blank">
@@ -41,33 +38,27 @@ extra_gated_eu_disallowed: true
 > When using **HunyuanImage-2.1** with the **quantized encoder** + **quantized base model**,
 > the VRAM usage on an **NVIDIA RTX 5090** typically ranges between **26 GB and 30 GB**, with an average
 > inference time of 16 seconds, depending on resolution, batch size, and prompt complexity.
 ⚠ **Important Note:**
-The **refiner** and **distilled model** are **not yet implemented** and are **not ready for use in ComfyUI**.
-Currently, **only the base model** is supported.
 ---
 <p align="center">
   <img src="https://cdn-uploads.huggingface.co/production/uploads/63473b59e5c0717e6737b872/auZ_xmiKPw0QdBYUrTLn-.png" alt="Image1"/>
 </p>
 <p align="center">
   <img src="https://cdn-uploads.huggingface.co/production/uploads/63473b59e5c0717e6737b872/qod1zCPWjzOZSNcOWx49-.png" alt="Image2"/>
 </p>
 ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/63473b59e5c0717e6737b872/drMNYMjvB01RvgZKS6kX6.jpeg)
 ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/63473b59e5c0717e6737b872/uxhsoLKjzJu24eCZh_RQ8.jpeg)
 ---
 ## **Download Quantized Model (FP8 e4m3fn)**
 [**Download hunyuanimage2.1_fp8_e4m3fn.safetensors**](https://huggingface.co/drbaph/HunyuanImage-2.1_fp8/blob/main/hunyuanimage2.1_fp8_e4m3fn.safetensors)
 ---
 ### **Workflow Notes**
 - **Model:** HunyuanImage-2.1
 - **Mode:** Quantized Encoder + Quantized Base Model
@@ -75,11 +66,10 @@ Currently, **only the base model** is supported.
 - **Resolution Tested:** 2K (2048×2048)
 - **Frameworks:** ComfyUI & Diffusers
 - **Optimisations:** Works with Patch Sage Attention + Lazycache / TeaCache ✅
-- **Refiner & Distilled Model:** ❌ Not implemented yet, **not available in ComfyUI**
 - **License:** [tencent-hunyuan-community](https://github.com/Tencent-Hunyuan/HunyuanImage-2.1/blob/master/LICENSE)
 ---
 <p align="center">
   🚀 **Optimized for High-Resolution, Memory-Efficient Text-to-Image Generation**
-</p>

 pipeline_tag: text-to-image
 extra_gated_eu_disallowed: true
 ---
+<div align="center">
   <img src="https://cdn-uploads.huggingface.co/production/uploads/63473b59e5c0717e6737b872/5DZez8C7TeFwRn3FcKDix.png" alt="HunyuanImage-2.1 Banner" />
+  <h1> HunyuanImage-2.1 fp8 e4m3fn  </h1>
+  <h2>An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation</h2>
+</div>
 </div>
 <div align="center">
   <a href="https://github.com/Tencent-Hunyuan/HunyuanImage-2.1" target="_blank"><img src="https://img.shields.io/badge/Code-black.svg?logo=github" height="22px"></a>
   <a href="https://huggingface.co/spaces/tencent/HunyuanImage-2.1" target="_blank">
 > When using **HunyuanImage-2.1** with the **quantized encoder** + **quantized base model**,
 > the VRAM usage on an **NVIDIA RTX 5090** typically ranges between **26 GB and 30 GB**, with an average
 > inference time of 16 seconds, depending on resolution, batch size, and prompt complexity.
+> **Reportedly also works on GPUs with 16 GB of VRAM.**
 ⚠ **Important Note:**
+The **refiner** is still not implemented and is **not ready for use in ComfyUI**.
+However, the **distilled model now works in ComfyUI** with recommended settings of **8 steps / 1.5-2.5 CFG**.
 ---
 <p align="center">
   <img src="https://cdn-uploads.huggingface.co/production/uploads/63473b59e5c0717e6737b872/auZ_xmiKPw0QdBYUrTLn-.png" alt="Image1"/>
 </p>
 <p align="center">
   <img src="https://cdn-uploads.huggingface.co/production/uploads/63473b59e5c0717e6737b872/qod1zCPWjzOZSNcOWx49-.png" alt="Image2"/>
 </p>
 ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/63473b59e5c0717e6737b872/drMNYMjvB01RvgZKS6kX6.jpeg)
 ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/63473b59e5c0717e6737b872/uxhsoLKjzJu24eCZh_RQ8.jpeg)
 ---
 ## **Download Quantized Model (FP8 e4m3fn)**
 [**Download hunyuanimage2.1_fp8_e4m3fn.safetensors**](https://huggingface.co/drbaph/HunyuanImage-2.1_fp8/blob/main/hunyuanimage2.1_fp8_e4m3fn.safetensors)
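
The download link above points at the Hub's `/blob/` page, which renders a file viewer rather than serving the raw file. For scripted downloads, the Hub serves the raw file under `/resolve/` with the same path. The following is a minimal sketch of that rewrite; the helper name `blob_to_resolve` is hypothetical, and since the card metadata sets `extra_gated_eu_disallowed`, the request may also need an authenticated session, which is not shown here.

```python
from urllib.parse import urlsplit, urlunsplit

# Link copied from the model card (points at the web viewer).
BLOB_URL = (
    "https://huggingface.co/drbaph/HunyuanImage-2.1_fp8/"
    "blob/main/hunyuanimage2.1_fp8_e4m3fn.safetensors"
)


def blob_to_resolve(url: str) -> str:
    """Turn a Hub '/blob/' viewer URL into the matching '/resolve/' URL.

    Hypothetical helper for illustration; it only rewrites the path.
    """
    parts = urlsplit(url)
    # Expected path layout: /<owner>/<repo>/blob/<revision>/<file...>
    segments = parts.path.split("/")
    if len(segments) < 6 or segments[3] != "blob":
        raise ValueError(f"not a Hub blob URL: {url}")
    segments[3] = "resolve"
    return urlunsplit(parts._replace(path="/".join(segments)))


if __name__ == "__main__":
    print(blob_to_resolve(BLOB_URL))
```

The resulting URL can be passed to any HTTP client; for ComfyUI, the downloaded `.safetensors` file then goes wherever your installation keeps diffusion model weights.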
 ---
 ### **Workflow Notes**
 - **Model:** HunyuanImage-2.1
 - **Mode:** Quantized Encoder + Quantized Base Model
 - **Resolution Tested:** 2K (2048×2048)
 - **Frameworks:** ComfyUI & Diffusers
 - **Optimisations:** Works with Patch Sage Attention + Lazycache / TeaCache ✅
+- **Distilled Model:** ✅ Now works in ComfyUI with **8 steps / 1.5-2.5 CFG**
+- **Refiner:** ❌ Still not implemented, **not available in ComfyUI**
 - **License:** [tencent-hunyuan-community](https://github.com/Tencent-Hunyuan/HunyuanImage-2.1/blob/master/LICENSE)
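
The recommended distilled-model settings from the notes above (8 steps, CFG between 1.5 and 2.5) can be captured in a small helper for scripted workflows. This is only a sketch: `SamplerSettings` and `distilled_settings` are hypothetical names, not part of ComfyUI or Diffusers, and the default CFG of 2.0 is an assumed midpoint rather than a value stated in the card.

```python
from dataclasses import dataclass

# Values taken from the model card's recommendation for the distilled model.
DISTILLED_STEPS = 8
DISTILLED_CFG_RANGE = (1.5, 2.5)


@dataclass(frozen=True)
class SamplerSettings:
    """Hypothetical container for the two sampler knobs the card mentions."""
    steps: int
    cfg: float


def distilled_settings(cfg: float = 2.0) -> SamplerSettings:
    """Return settings for the distilled model, rejecting out-of-range CFG.

    The 2.0 default is an assumption (midpoint of the recommended range).
    """
    low, high = DISTILLED_CFG_RANGE
    if not low <= cfg <= high:
        raise ValueError(f"CFG {cfg} is outside the recommended range {low}-{high}")
    return SamplerSettings(steps=DISTILLED_STEPS, cfg=cfg)


if __name__ == "__main__":
    print(distilled_settings())
```

The `steps` and `cfg` fields map onto the equivalent sampler inputs in whichever frontend you use; the range check simply flags values outside what the card recommends.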
 ---
 <p align="center">
   🚀 **Optimized for High-Resolution, Memory-Efficient Text-to-Image Generation**
+</p>