---
library_name: diffusers
pipeline_tag: text-to-image
base_model:
- black-forest-labs/FLUX.2-klein-9B
base_model_relation: quantized
tags:
- text-to-image
- image-editing
- diffusion
- quantized
- quantfunc
- flux
language:
- en
license: other
license_name: flux-non-commercial-license
license_link: LICENSE
---

## ⚠️ License: Non-Commercial Use Only

These are **quantized derivative weights** of [`black-forest-labs/FLUX.2-klein-9B`](https://huggingface.co/black-forest-labs/FLUX.2-klein-9B) (**FLUX.2 [klein] 9B**), which is licensed under the **FLUX Non-Commercial License v2.1** by Black Forest Labs.

> This FLUX Model is licensed by Black Forest Labs Inc. under the FLUX Non-Commercial License.

- **Non-commercial use only.** These weights may **not** be used for any commercial or revenue-generating purpose. Commercial use requires a separate license from Black Forest Labs; see https://bfl.ai/licensing.
- **Full license:** included as [`LICENSE`](./LICENSE) (FLUX Non-Commercial License v2.1).
- **Modifications:** quantized from FLUX.2 [klein] 9B by the QuantFunc inference engine.
- This is **not** an official Black Forest Labs product and is not endorsed by BFL.

> **Disclaimer:** Derived from FLUX.2 [klein] by Black Forest Labs. This is **not an official Black Forest Labs product** and is not endorsed by or affiliated with BFL. "FLUX" is a trademark of Black Forest Labs.

# QuantFunc

🤗 Hugging Face  |  🤖 ModelScope  |  💻 GitHub  |  💬 WeChat (微信)  |  🎮 Discord

> ⚡ **FLUX.2 Klein 9B: the highest-quality Klein tier, pre-quantized.** Text-to-image and reference-based editing with a **2x–11x** speedup through the QuantFunc plugin. The larger **9B** Klein model for maximum fidelity, shipped as **distilled (4-step)** + **base (28-step)** transformers across **three GPU tiers** (`50x` FP4 · `40x` INT4+FP8 · `30x-below` INT4+INT8).

**Powered by the [QuantFunc ComfyUI plugin](https://github.com/RealJonathanYip/ComfyUI-QuantFunc), the fastest diffusion inference engine:**

- 🚀 **2x–11x speedup** over standard BF16/FP16 Python pipelines (pre-exported → even faster loading).
- ⚙️ **Native C++/CUDA** (`libquantfunc.so` / `quantfunc.dll`) with **zero Python model dependencies**.
- 🧩 **Dual engine** (SVDQ offline + Lighting runtime 4-bit), **zero-cost LoRA stacking**, reference-image editing & inpainting.
- 🟢 **Full GPU coverage**: RTX 20/30/40/50 · A100/H100/H200/B100/B200/GB300 · RTX 6000 Ada / PRO Blackwell (CUDA 12 & 13); native **FP4** on Blackwell.

👉 **Install the plugin:** **https://github.com/RealJonathanYip/ComfyUI-QuantFunc**

# Klein-9B-Series

Pre-quantized **FLUX.2 Klein 9B** model series by [QuantFunc](https://github.com/RealJonathanYip), built for the Lighting backend. Supports text-to-image and reference-based image editing.

> ✨ **Both the distilled AND the non-distilled (base) model are supported**, and the series ships **three GPU tiers** so every card gets the best path it can run:
> **`50x`** (Blackwell, FP4) · **`40x`** (RTX 40 / Ada & Hopper, INT4 + FP8) · **`30x-below`** (RTX 30 and below, INT4 + INT8).

## Overview

FLUX.2 Klein belongs to Black Forest Labs' FLUX.2 family. This repository covers the **9B** variant, the larger and higher-quality of the Klein models (transformer K=4096). QuantFunc ships two pre-quantized transformers:

- **Distilled** transformer: 4-step, fastest few-step generation/editing.
- **Base / non-distilled** transformer: the full 28-step model with classical CFG (`--guidance-scale 4.0`), highest quality.

Each ships in three hardware tiers (below).
The distilled and base transformers **share the same base-model** directory (text encoder, VAE, tokenizer, scheduler); only the transformer file differs.

## Hardware tiers (pick by GPU)

FP4 needs Blackwell (SM120); FP8 needs Ada (SM89) or Hopper (SM90), e.g. RTX 40 / L40 / H100 / H200; INT4/INT8 run everywhere (Ampere/Turing, e.g. RTX 30/20, A100). So:

| Tier | GPUs | attention + FFN | modulation/embedders/head | base-model |
|------|------|-----------------|---------------------------|-----------|
| **`50x`** | **Blackwell (SM120+)**: RTX 50 series, B100/B200/GB200, RTX PRO Blackwell | **FP4** | **FP8** | `klein-9b-series-50x-above-base-model` (FP4 text encoder) |
| **`40x`** | **RTX 40 / Ada (SM89) & Hopper (SM90)**: RTX 40 series, L40/L40S, **H100, H200** | **INT4** | **FP8** | `klein-9b-series-50x-below-base-model` (INT4 text encoder) |
| **`30x-below`** | **RTX 30 and below (pre-FP8)**: RTX 30/20, A100, A40, T4, down to RTX 2080 | **INT4** | **INT8** | `klein-9b-series-50x-below-base-model` (INT4 text encoder) |

> `40x` and `30x-below` **share** the same INT4 base-model; they differ only in the transformer's 8-bit precision (FP8 vs INT8). `50x` uses the FP4 base-model.
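
The tier rules above can be sketched as a small lookup keyed on CUDA compute capability. This is only an illustrative sketch, not part of QuantFunc: `Tier`, `pick_tier`, and `transformer_file` are hypothetical names. It also assumes that every Blackwell part takes the `50x` path, including B100/B200, which report compute capability 10.0 rather than 12.0. The transformer paths follow the file naming used in this repository's `transformer/` directory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str            # tier label used in this repository
    base_model_dir: str  # text encoder + VAE + tokenizer + scheduler
    attn_ffn: str        # precision of attention + FFN layers
    islands: str         # precision of modulation/embedders/head


TIERS = {
    "50x": Tier("50x", "klein-9b-series-50x-above-base-model", "FP4", "FP8"),
    "40x": Tier("40x", "klein-9b-series-50x-below-base-model", "INT4", "FP8"),
    "30x-below": Tier("30x-below", "klein-9b-series-50x-below-base-model", "INT4", "INT8"),
}


def pick_tier(major: int, minor: int) -> Tier:
    """Map a CUDA compute capability to a tier (hypothetical helper)."""
    cc = (major, minor)
    if cc >= (10, 0):  # assumption: all Blackwell parts (SM100 and SM120) use FP4
        return TIERS["50x"]
    if cc >= (8, 9):   # Ada (SM89) and Hopper (SM90) support FP8
        return TIERS["40x"]
    return TIERS["30x-below"]  # Ampere, Turing: INT4 + INT8


def transformer_file(tier: Tier, distilled: bool = True) -> str:
    """Build the transformer path from the repository's naming pattern."""
    variant = "" if distilled else "base-"
    return f"transformer/klein-9b-{variant}{tier.name}-lighting.safetensors"


if __name__ == "__main__":
    tier = pick_tier(8, 9)  # compute capability 8.9, e.g. an RTX 40 series card
    print(tier.name, tier.base_model_dir, transformer_file(tier, distilled=False))
```

With PyTorch installed, `pick_tier(*torch.cuda.get_device_capability())` would feed the current GPU into the helper. The ComfyUI plugin performs its own tier selection, so a lookup like this is only useful when scripting the CLI by hand.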
## Directory Structure

```
Klein-9B-Series/
├── klein-9b-series-50x-above-base-model/   # FP4 text encoder + VAE(enc+dec) + tokenizer + scheduler (50x)
├── klein-9b-series-50x-below-base-model/   # INT4 text encoder + VAE(enc+dec) + tokenizer + scheduler (40x & 30x-below)
├── transformer/
│   ├── config.json
│   ├── klein-9b-50x-lighting.safetensors              # distilled, FP4 (50x)
│   ├── klein-9b-base-50x-lighting.safetensors         # base 28-step, FP4 (50x)
│   ├── klein-9b-40x-lighting.safetensors              # distilled, INT4 + FP8 (40x)
│   ├── klein-9b-base-40x-lighting.safetensors         # base 28-step, INT4 + FP8 (40x)
│   ├── klein-9b-30x-below-lighting.safetensors        # distilled, INT4 + INT8 (30x-below)
│   └── klein-9b-base-30x-below-lighting.safetensors   # base 28-step, INT4 + INT8 (30x-below)
└── precision-config/
    ├── 50x-fp4-f8-sample.json
    ├── 40x-int4-f8-sample.json
    └── 30x-below-int4-i8-sample.json
```

> **Status:** ✓ All weights uploaded; the VAE includes **both encoder and decoder**. Every tier × {distilled, base} combination has been visually validated to generate correctly.

## Distilled (4-step) vs Base (28-step)

| Transformer | Source | Steps | Guidance | Best for |
|---|---|---|---|---|
| `klein-9b-<tier>-lighting.safetensors` | Klein **distilled** | 4 | none (guidance-distilled) | Fastest |
| `klein-9b-base-<tier>-lighting.safetensors` | Klein **base** | 28 | `--guidance-scale 4.0` (classical CFG) | Highest quality |

## Inference

```bash
# 50x: Blackwell (RTX 50 / B-series). Distilled, 4-step:
quantfunc --model-dir klein-9b-series-50x-above-base-model \
  --transformer transformer/klein-9b-50x-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 4 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png

# 40x: RTX 40 / Ada or Hopper (H100/H200). Base 28-step (classical CFG):
quantfunc --model-dir klein-9b-series-50x-below-base-model \
  --transformer transformer/klein-9b-base-40x-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 28 --guidance-scale 4.0 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png

# 30x-below: RTX 30 and below. Distilled, 4-step:
quantfunc --model-dir klein-9b-series-50x-below-base-model \
  --transformer transformer/klein-9b-30x-below-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 4 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png
```

`--auto-optimize` picks the VRAM/attention/compression strategy for your GPU. The ComfyUI Lighting plugin auto-selects the matching tier and precision-config.

## Precision Config (precision-config/)

| File | Tier / GPU | attention+FFN | islands (modulation/embedders/head) |
|------|-----------|---------------|---------|
| `50x-fp4-f8-sample.json` | 50x: Blackwell (SM120+) | FP4 | FP8 |
| `40x-int4-f8-sample.json` | 40x: Ada (SM89) & Hopper (SM90): RTX 40, L40, H100, H200 | INT4 | FP8 |
| `30x-below-int4-i8-sample.json` | 30x-below: RTX 30/20, A100 (pre-FP8) | INT4 | INT8 |

These per-layer configs control the Lighting backend's quantization precision; customize them for your own speed/quality trade-off.

## Related Repositories

- [QuantFunc/Klein-4B-Series](https://huggingface.co/QuantFunc/Klein-4B-Series): FLUX.2 Klein 4B
- [QuantFunc/Qwen-Image-Series](https://huggingface.co/QuantFunc/Qwen-Image-Series) · [QuantFunc/Qwen-Image-Edit-Series](https://huggingface.co/QuantFunc/Qwen-Image-Edit-Series) · [QuantFunc/Z-Image-Series](https://huggingface.co/QuantFunc/Z-Image-Series)

## License

The pre-quantized weights are derived from FLUX.2 Klein. Users must comply with the original Black Forest Labs license (FLUX Non-Commercial License v2.1, included as `LICENSE`). The QuantFunc inference engine and plugins are licensed separately.
## Community

Join our community for support, updates, and discussions:

- 🎮 [Discord server](https://discord.gg/jCp9TpFWcn)
- 💬 Scan the QR code below to join our WeChat group:
WeChat Group