---
library_name: diffusers
pipeline_tag: text-to-image
base_model:
- black-forest-labs/FLUX.2-klein-9B
base_model_relation: quantized
tags:
  - text-to-image
  - image-editing
  - diffusion
  - quantized
  - quantfunc
  - flux
language:
  - en
license: other
license_name: flux-non-commercial-license
license_link: LICENSE
---

<!-- QF-LICENSE-BLOCK:START -->
## ⚠️ License — Non-Commercial Use Only

These are **quantized derivative weights** of [`black-forest-labs/FLUX.2-klein-9B`](https://huggingface.co/black-forest-labs/FLUX.2-klein-9B) (**FLUX.2 [klein] 9B**), which is
licensed under the **FLUX Non-Commercial License v2.1** by Black Forest Labs.

> This FLUX Model is licensed by Black Forest Labs Inc. under the FLUX Non-Commercial License.

- **Non-commercial use only.** These weights may **not** be used for any commercial or
  revenue-generating purpose. Commercial use requires a separate license from Black Forest
  Labs — see https://bfl.ai/licensing .
- **Full license:** included as [`LICENSE`](./LICENSE) (FLUX Non-Commercial License v2.1).
- **Modifications:** quantized from FLUX.2 [klein] 9B by the QuantFunc inference engine.
- This is **not** an official Black Forest Labs product and is not endorsed by BFL.

> **Disclaimer:** Derived from FLUX.2 [klein] by Black Forest Labs. This is **not an official Black Forest Labs product** and is not endorsed by or affiliated with BFL. "FLUX" is a trademark of Black Forest Labs.
<!-- QF-LICENSE-BLOCK:END -->

# QuantFunc

<div align="center" style="margin-top: 50px;">
  <img src="assets/logo.webp" width="300" alt="Logo">
</div>

<p align="center">
  🤗 <a href="https://huggingface.co/QuantFunc">Hugging Face</a> &nbsp;|&nbsp;
  🤖 <a href="https://www.modelscope.cn/profile/QuantFunc">ModelScope</a> &nbsp;|&nbsp;
  💻 <a href="https://github.com/RealJonathanYip/ComfyUI-QuantFunc">GitHub</a> &nbsp;|&nbsp;
  💬 <a href="#wechat">WeChat (微信)</a> &nbsp;|&nbsp;
  🎮 <a href="https://discord.gg/jCp9TpFWcn">Discord</a>
</p>

> ⚡ **FLUX.2 Klein 9B: the highest-quality Klein tier, pre-quantized.** Text-to-image and reference-based editing at a **2x–11x speedup** with the QuantFunc plugin.

The larger **9B** Klein model for maximum fidelity, shipped as **distilled (4-step)** + **base (28-step)** transformers across **three GPU tiers** (`50x` FP4 · `40x` INT4+FP8 · `30x-below` INT4+INT8).

**Powered by the [QuantFunc ComfyUI plugin](https://github.com/RealJonathanYip/ComfyUI-QuantFunc) — the fastest diffusion inference engine:**

- 🚀 **2x–11x speedup** over standard BF16/FP16 Python pipelines (pre-exported → even faster loading).
- ⚙️ **Native C++/CUDA** (`libquantfunc.so` / `quantfunc.dll`) with **zero Python model dependencies**.
- 🧩 **Dual engine** (SVDQ offline + Lighting runtime 4-bit), **zero-cost LoRA stacking**, reference-image editing & inpainting.
- 🟢 **Full GPU coverage** — RTX 20/30/40/50 · A100/H100/H200/B100/B200/GB300 · RTX 6000 Ada / PRO Blackwell (CUDA 12 & 13); native **FP4** on Blackwell.

👉 **Install the plugin:** **https://github.com/RealJonathanYip/ComfyUI-QuantFunc**

# Klein-9B-Series

Pre-quantized **FLUX.2 Klein 9B** model series by [QuantFunc](https://github.com/RealJonathanYip) for the Lighting backend, covering text-to-image and reference-based image editing.

> ✨ **Both the distilled AND the non-distilled (base) model are supported**, and the series ships **three GPU tiers** so every card gets the best path it can run:
> **`50x`** (Blackwell, FP4) · **`40x`** (RTX 40 / Ada & Hopper, INT4 + FP8) · **`30x-below`** (RTX 30 and below, INT4 + INT8).

## Overview

FLUX.2 Klein is part of Black Forest Labs' FLUX.2 family. The **9B** variant is the larger, higher-quality model (transformer K=4096). QuantFunc ships two pre-quantized transformers:

- **Distilled** transformer — 4-step, fastest few-step generation/editing.
- **Base / non-distilled** transformer — the full 28-step model with classical CFG (`--guidance-scale 4.0`), highest quality.

Each transformer ships in three hardware tiers (see below). Within a tier, the distilled and base variants **share the same base-model** directory; only the transformer file differs.

## Hardware tiers (pick by GPU)

FP4 needs Blackwell (SM120+). FP8 needs Ada (SM89) or Hopper (SM90), e.g. RTX 40 / L40 / H100 / H200. INT4 and INT8 run on every supported GPU, including Ampere and Turing (e.g. RTX 30/20, A100). So:

| Tier | GPUs | attention + FFN | modulation/embedders/head | base-model |
|------|------|-----------------|---------------------------|-----------|
| **`50x`** | **Blackwell (SM120+)** — RTX 50 series, B100/B200/GB200, RTX PRO Blackwell | **FP4** | **FP8** | `klein-9b-series-50x-above-base-model` (FP4 text encoder) |
| **`40x`** | **RTX 40 / Ada (SM89) & Hopper (SM90)** — RTX 40 series, L40/L40S, **H100, H200** | **INT4** | **FP8** | `klein-9b-series-50x-below-base-model` (INT4 text encoder) |
| **`30x-below`** | **RTX 30 and below (pre-FP8)** — RTX 30/20, A100, A40, T4, down to RTX 2080 | **INT4** | **INT8** | `klein-9b-series-50x-below-base-model` (INT4 text encoder) |

> `40x` and `30x-below` **share** the same INT4 base-model — they differ only in the transformer's 8-bit precision (FP8 vs INT8). `50x` uses the FP4 base-model.
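
If you want to script the tier choice, the table above reduces to a comparison on the CUDA compute capability. The sketch below is illustrative only: `pick_tier` is a hypothetical helper, not part of QuantFunc, and it assumes that every Blackwell part (compute capability 10.x for B100/B200, 12.x for RTX 50) takes the `50x` tier.

```bash
# Map a CUDA compute capability (major, minor) to a tier name from the table.
# Assumption: all Blackwell parts (10.x datacenter, 12.x consumer) use the FP4 tier.
pick_tier() {
  local major=$1
  local minor=$2
  local cc=$(( major * 10 + minor ))   # e.g. 8.9 -> 89, 12.0 -> 120
  if [ "$cc" -ge 100 ]; then
    echo "50x"          # Blackwell: FP4 + FP8
  elif [ "$cc" -ge 89 ]; then
    echo "40x"          # Ada (8.9) and Hopper (9.0): INT4 + FP8
  else
    echo "30x-below"    # Ampere, Turing: INT4 + INT8
  fi
}

pick_tier 8 6   # RTX 30 series (Ampere), prints 30x-below
```

On recent NVIDIA drivers, `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` prints a value such as `8.9`, which can be split on the dot and passed to the helper.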

## Directory Structure

```
Klein-9B-Series/
├── klein-9b-series-50x-above-base-model/             # FP4 text encoder + VAE (enc+dec) + tokenizer + scheduler   (50x)
├── klein-9b-series-50x-below-base-model/             # INT4 text encoder + VAE (enc+dec) + tokenizer + scheduler  (40x & 30x-below)
├── transformer/
│   ├── config.json
│   ├── klein-9b-50x-lighting.safetensors             # distilled, FP4            (50x)
│   ├── klein-9b-base-50x-lighting.safetensors        # base 28-step, FP4         (50x)
│   ├── klein-9b-40x-lighting.safetensors             # distilled, INT4 + FP8     (40x)
│   ├── klein-9b-base-40x-lighting.safetensors        # base 28-step, INT4 + FP8  (40x)
│   ├── klein-9b-30x-below-lighting.safetensors       # distilled, INT4 + INT8    (30x-below)
│   └── klein-9b-base-30x-below-lighting.safetensors  # base 28-step, INT4 + INT8 (30x-below)
└── precision-config/
    ├── 50x-fp4-f8-sample.json
    ├── 40x-int4-f8-sample.json
    └── 30x-below-int4-i8-sample.json
```

> **Status:** ✓ All weights uploaded; the VAE includes **both encoder and decoder**. Every tier × {distilled, base} is visually validated to generate correctly.
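
Because each tier needs only one base-model directory and its two transformer files, a partial download is usually enough. The sketch below prints (it does not run) a `huggingface-cli download` command for one tier. It assumes the `huggingface-cli` tool from `huggingface_hub` is installed and that the repository id is `QuantFunc/Klein-9B-Series`, which is a guess based on the sibling repositories; `download_cmd` is a hypothetical helper.

```bash
# Print a huggingface-cli command that fetches only the files one tier needs.
# REPO is an assumed repository id; replace it with this repository's actual id.
REPO="QuantFunc/Klein-9B-Series"

download_cmd() {
  local tier=$1
  local base
  # 50x uses the FP4 base-model; 40x and 30x-below share the INT4 one.
  if [ "$tier" = "50x" ]; then
    base="klein-9b-series-50x-above-base-model"
  else
    base="klein-9b-series-50x-below-base-model"
  fi
  printf 'huggingface-cli download %s --local-dir Klein-9B-Series' "$REPO"
  printf ' --include "%s/*"' "$base"
  printf ' "transformer/config.json"'
  printf ' "transformer/klein-9b-%s-lighting.safetensors"' "$tier"
  printf ' "transformer/klein-9b-base-%s-lighting.safetensors"' "$tier"
  printf ' "precision-config/*"\n'
}

download_cmd 40x
```

Review the printed command before running it; drop one of the two transformer patterns if you only need the distilled or the base variant.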

## Distilled (4-step) vs Base (28-step)

| Transformer | Source | Steps | Guidance | Best for |
|---|---|---|---|---|
| `klein-9b-<tier>-lighting.safetensors` | Klein **distilled** | 4 | none (guidance-distilled) | Fastest |
| `klein-9b-base-<tier>-lighting.safetensors` | Klein **base** | 28 | `--guidance-scale 4.0` (classical CFG) | Highest quality |

## Inference

```bash
# 50x — Blackwell (RTX 50 / B-series). Distilled, 4-step:
quantfunc --model-dir klein-9b-series-50x-above-base-model \
  --transformer transformer/klein-9b-50x-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 4 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png

# 40x — RTX 40 / Ada or Hopper (H100/H200). Base 28-step (classical CFG):
quantfunc --model-dir klein-9b-series-50x-below-base-model \
  --transformer transformer/klein-9b-base-40x-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 28 --guidance-scale 4.0 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png

# 30x-below — RTX 30 and below. Distilled, 4-step:
quantfunc --model-dir klein-9b-series-50x-below-base-model \
  --transformer transformer/klein-9b-30x-below-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 4 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png
```

`--auto-optimize` picks the VRAM/attention/compression strategy for your GPU. The ComfyUI Lighting plugin auto-selects the matching tier + precision-config.
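
The three commands above cover only some of the six tier and variant combinations. As a rough sketch, the helper below composes the command line for any combination and prints it without executing anything. `quantfunc_cmd` is a hypothetical wrapper, and it uses only the flags that appear in the examples above.

```bash
# Print the quantfunc command line for a tier and variant (dry run only).
# Usage: quantfunc_cmd <50x|40x|30x-below> <distilled|base> <prompt>
quantfunc_cmd() {
  local tier=$1
  local variant=$2
  local prompt=$3
  local base name extra
  case $tier in
    50x)           base="klein-9b-series-50x-above-base-model" ;;  # FP4 text encoder
    40x|30x-below) base="klein-9b-series-50x-below-base-model" ;;  # INT4 text encoder
    *) echo "unknown tier: $tier" >&2; return 1 ;;
  esac
  case $variant in
    distilled) name="klein-9b-$tier";      extra="--steps 4" ;;
    base)      name="klein-9b-base-$tier"; extra="--steps 28 --guidance-scale 4.0" ;;
    *) echo "unknown variant: $variant" >&2; return 1 ;;
  esac
  printf 'quantfunc --model-dir %s --transformer transformer/%s-lighting.safetensors' "$base" "$name"
  printf ' --model-backend lighting --auto-optimize %s --prompt "%s" --output out.png\n' "$extra" "$prompt"
}

quantfunc_cmd 30x-below base "a cute cat on a windowsill, watercolor style"
```

Printing instead of executing makes it easy to check the base-model and transformer pairing before a long run.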

## Precision Config (precision-config/)

| File | Tier / GPU | attention + FFN | modulation/embedders/head ("islands") |
|------|-----------|---------------|---------|
| `50x-fp4-f8-sample.json` | 50x — Blackwell (SM120+) | FP4 | FP8 |
| `40x-int4-f8-sample.json` | 40x — Ada (SM89) & Hopper (SM90): RTX 40, L40, H100, H200 | INT4 | FP8 |
| `30x-below-int4-i8-sample.json` | 30x-below — RTX 30/20, A100 (pre-FP8) | INT4 | INT8 |

These per-layer configs control the Lighting backend's quantization precision — customize for your own speed/quality trade-off.
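
A cautious way to start customizing is to work on a copy of the sample that matches your tier. The sketch below maps a tier to its sample file; `precision_sample` and the file name `my-precision.json` are hypothetical, and how a custom config is passed to the engine is not covered here.

```bash
# Print the sample precision config that matches a tier.
precision_sample() {
  case $1 in
    50x)       echo "precision-config/50x-fp4-f8-sample.json" ;;
    40x)       echo "precision-config/40x-int4-f8-sample.json" ;;
    30x-below) echo "precision-config/30x-below-int4-i8-sample.json" ;;
    *) echo "unknown tier: $1" >&2; return 1 ;;
  esac
}

# Copy the sample (if present) so the shipped file stays intact.
sample=$(precision_sample 40x)
if [ -f "$sample" ]; then
  cp "$sample" my-precision.json
fi
echo "$sample"
```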

## Related Repositories

- [QuantFunc/Klein-4B-Series](https://huggingface.co/QuantFunc/Klein-4B-Series) — FLUX.2 Klein 4B
- [QuantFunc/Qwen-Image-Series](https://huggingface.co/QuantFunc/Qwen-Image-Series) · [QuantFunc/Qwen-Image-Edit-Series](https://huggingface.co/QuantFunc/Qwen-Image-Edit-Series) · [QuantFunc/Z-Image-Series](https://huggingface.co/QuantFunc/Z-Image-Series)

## License

The pre-quantized weights are derived from FLUX.2 Klein. Users must comply with the original Black Forest Labs license, the FLUX Non-Commercial License v2.1 (see [`LICENSE`](./LICENSE)). The QuantFunc inference engine and plugins are licensed separately.

## Community

Join our community for support, updates, and discussions:

- 🎮 [Discord server](https://discord.gg/jCp9TpFWcn)
- 💬 Scan the QR code below to join our WeChat group:

<div align="center" id="wechat">
  <img src="https://raw.githubusercontent.com/RealJonathanYip/ComfyUI-QuantFunc/main/assets/WeChat.jpg" alt="WeChat Group" width="300">
</div>