---
library_name: diffusers
pipeline_tag: text-to-image
base_model:
- black-forest-labs/FLUX.2-klein-9B
base_model_relation: quantized
tags:
- text-to-image
- image-editing
- diffusion
- quantized
- quantfunc
- flux
language:
- en
license: other
license_name: flux-non-commercial-license
license_link: LICENSE
---
<!-- QF-LICENSE-BLOCK:START -->
## ⚠️ License: Non-Commercial Use Only
These are **quantized derivative weights** of [`black-forest-labs/FLUX.2-klein-9B`](https://huggingface.co/black-forest-labs/FLUX.2-klein-9B) (**FLUX.2 [klein] 9B**), which is
licensed under the **FLUX Non-Commercial License v2.1** by Black Forest Labs.
> This FLUX Model is licensed by Black Forest Labs Inc. under the FLUX Non-Commercial License.
- **Non-commercial use only.** These weights may **not** be used for any commercial or
  revenue-generating purpose. Commercial use requires a separate license from Black Forest
  Labs; see https://bfl.ai/licensing.
- **Full license:** included as [`LICENSE`](./LICENSE) (FLUX Non-Commercial License v2.1).
- **Modifications:** quantized from FLUX.2 [klein] 9B by the QuantFunc inference engine.
- This is **not** an official Black Forest Labs product and is not endorsed by BFL.
> **Disclaimer:** Derived from FLUX.2 [klein] by Black Forest Labs. This is **not an official Black Forest Labs product** and is not endorsed by or affiliated with BFL. "FLUX" is a trademark of Black Forest Labs.
<!-- QF-LICENSE-BLOCK:END -->
# QuantFunc
<div align="center" style="margin-top: 50px;">
<img src="assets/logo.webp" width="300" alt="Logo">
</div>
<p align="center">
🤗 <a href="https://huggingface.co/QuantFunc">Hugging Face</a> &nbsp;|&nbsp;
🤖 <a href="https://www.modelscope.cn/profile/QuantFunc">ModelScope</a> &nbsp;|&nbsp;
💻 <a href="https://github.com/RealJonathanYip/ComfyUI-QuantFunc">GitHub</a> &nbsp;|&nbsp;
💬 <a href="#wechat">WeChat (微信)</a> &nbsp;|&nbsp;
🎮 <a href="https://discord.gg/jCp9TpFWcn">Discord</a>
</p>
> ⚡ **FLUX.2 Klein 9B: the highest-quality Klein tier, pre-quantized.** Text-to-image and reference-based editing at a **2x–11x** speedup with the QuantFunc plugin.

The larger **9B** Klein model for maximum fidelity, shipped as **distilled (4-step)** and **base (28-step)** transformers across **three GPU tiers** (`50x` FP4 · `40x` INT4+FP8 · `30x-below` INT4+INT8).
**Powered by the [QuantFunc ComfyUI plugin](https://github.com/RealJonathanYip/ComfyUI-QuantFunc), the fastest diffusion inference engine:**
- 🚀 **2x–11x speedup** over standard BF16/FP16 Python pipelines (pre-exported weights load even faster).
- ⚙️ **Native C++/CUDA** (`libquantfunc.so` / `quantfunc.dll`) with **zero Python model dependencies**.
- 🧩 **Dual engine** (SVDQ offline + Lighting runtime 4-bit), **zero-cost LoRA stacking**, reference-image editing & inpainting.
- 🟢 **Full GPU coverage**: RTX 20/30/40/50 · A100/H100/H200/B100/B200/GB300 · RTX 6000 Ada / PRO Blackwell (CUDA 12 & 13); native **FP4** on Blackwell.

👉 **Install the plugin:** **https://github.com/RealJonathanYip/ComfyUI-QuantFunc**
# Klein-9B-Series
Pre-quantized **FLUX.2 Klein 9B** model series by [QuantFunc](https://github.com/RealJonathanYip) for the Lighting backend, covering text-to-image and reference-based image editing.
> ✨ **Both the distilled AND the non-distilled (base) model are supported**, and the series ships **three GPU tiers** so every card gets the best path it can run:
> **`50x`** (Blackwell, FP4) · **`40x`** (RTX 40 / Ada & Hopper, INT4 + FP8) · **`30x-below`** (RTX 30 and below, INT4 + INT8).
## Overview
FLUX.2 Klein is part of Black Forest Labs' FLUX.2 family. This repository covers the **9B** variant, the larger, higher-quality model (transformer K=4096). QuantFunc ships two pre-quantized transformers:
- **Distilled** transformer: 4-step, fastest few-step generation/editing.
- **Base / non-distilled** transformer: the full 28-step model with classical CFG (`--guidance-scale 4.0`), highest quality.

Each comes in three hardware tiers (below). Within a tier, distilled and base **share the same base-model**; only the transformer file differs.
## Hardware tiers (pick by GPU)
FP4 needs Blackwell (SM120). FP8 needs Ada (SM89) or Hopper (SM90), e.g. RTX 40 / L40 / H100 / H200. INT4/INT8 run everywhere, including Ampere and Turing (e.g. RTX 30/20, A100). So:
| Tier | GPUs | attention + FFN | modulation/embedders/head | base-model |
|------|------|-----------------|---------------------------|-----------|
| **`50x`** | **Blackwell (SM120+)**: RTX 50 series, B100/B200/GB200, RTX PRO Blackwell | **FP4** | **FP8** | `klein-9b-series-50x-above-base-model` (FP4 text encoder) |
| **`40x`** | **RTX 40 / Ada (SM89) & Hopper (SM90)**: RTX 40 series, L40/L40S, **H100, H200** | **INT4** | **FP8** | `klein-9b-series-50x-below-base-model` (INT4 text encoder) |
| **`30x-below`** | **RTX 30 and below (pre-FP8)**: RTX 30/20, A100, A40, T4, down to RTX 2080 | **INT4** | **INT8** | `klein-9b-series-50x-below-base-model` (INT4 text encoder) |
> `40x` and `30x-below` **share** the same INT4 base-model; they differ only in the transformer's 8-bit precision (FP8 vs INT8). `50x` uses the FP4 base-model.
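
The tier rule above can be sketched as a small lookup. This is an illustrative helper, not part of QuantFunc: `pick_tier` and `BASE_MODEL` are hypothetical names, and the input is the GPU's compute capability as an integer (89 for SM89). The text gives SM120 as the FP4 requirement but also lists B100/B200 in the `50x` row; those data-center parts report SM100, so the sketch assumes any SM100-or-newer GPU belongs to the `50x` tier.

```python
def pick_tier(sm: int) -> str:
    """Map a CUDA compute capability (e.g. 89 for SM89) to a tier name.

    Hypothetical helper mirroring the tier table; not a QuantFunc API.
    """
    if sm >= 100:         # assumption: every Blackwell part (SM100/SM120) -> FP4 tier
        return "50x"
    if sm in (89, 90):    # Ada (SM89) and Hopper (SM90) support FP8
        return "40x"
    return "30x-below"    # Ampere, Turing and older: INT4 + INT8


# base-model directory used by each tier, copied from the table above
BASE_MODEL = {
    "50x": "klein-9b-series-50x-above-base-model",
    "40x": "klein-9b-series-50x-below-base-model",
    "30x-below": "klein-9b-series-50x-below-base-model",
}

if __name__ == "__main__":
    for sm in (120, 90, 89, 86, 80, 75):
        tier = pick_tier(sm)
        print(f"SM{sm}: tier={tier}, base-model={BASE_MODEL[tier]}")
```

With PyTorch installed, `torch.cuda.get_device_capability()` returns a `(major, minor)` tuple, so `major * 10 + minor` gives the integer this helper expects.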
## Directory Structure
```
Klein-9B-Series/
├── klein-9b-series-50x-above-base-model/   # FP4 text encoder + VAE(enc+dec) + tokenizer + scheduler (50x)
├── klein-9b-series-50x-below-base-model/   # INT4 text encoder + VAE(enc+dec) + tokenizer + scheduler (40x & 30x-below)
├── transformer/
│   ├── config.json
│   ├── klein-9b-50x-lighting.safetensors              # distilled, FP4 (50x)
│   ├── klein-9b-base-50x-lighting.safetensors         # base 28-step, FP4 (50x)
│   ├── klein-9b-40x-lighting.safetensors              # distilled, INT4 + FP8 (40x)
│   ├── klein-9b-base-40x-lighting.safetensors         # base 28-step, INT4 + FP8 (40x)
│   ├── klein-9b-30x-below-lighting.safetensors        # distilled, INT4 + INT8 (30x-below)
│   └── klein-9b-base-30x-below-lighting.safetensors   # base 28-step, INT4 + INT8 (30x-below)
└── precision-config/
    ├── 50x-fp4-f8-sample.json
    ├── 40x-int4-f8-sample.json
    └── 30x-below-int4-i8-sample.json
```
> **Status:** ✓ All weights uploaded; the VAE includes **both encoder and decoder**. Every tier × {distilled, base} combination has been visually validated to generate correctly.
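
Because each tier needs only one base-model directory and one transformer file, a selective download can skip the rest of the tree. The sketch below builds glob patterns from the layout above and passes them to `snapshot_download` from `huggingface_hub`. The repo id `QuantFunc/Klein-9B-Series` is an assumption inferred from the naming of the sibling repositories, and `allow_patterns` / `download` are hypothetical helper names.

```python
def allow_patterns(tier: str, distilled: bool = True) -> list[str]:
    """Glob patterns covering one tier/variant, following the directory tree."""
    if tier not in ("50x", "40x", "30x-below"):
        raise ValueError(f"unknown tier: {tier!r}")
    base_model = (
        "klein-9b-series-50x-above-base-model"
        if tier == "50x"
        else "klein-9b-series-50x-below-base-model"
    )
    variant = "" if distilled else "base-"
    return [
        f"{base_model}/*",                                           # text encoder, VAE, tokenizer, scheduler
        "transformer/config.json",
        f"transformer/klein-9b-{variant}{tier}-lighting.safetensors",
        "precision-config/*",                                        # small JSON samples
    ]


def download(tier: str, distilled: bool = True, local_dir: str = "Klein-9B-Series") -> str:
    """Fetch only the files matching the chosen tier; returns the local path."""
    # Imported lazily so allow_patterns() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="QuantFunc/Klein-9B-Series",  # assumed repo id, adjust if it differs
        allow_patterns=allow_patterns(tier, distilled),
        local_dir=local_dir,
    )
```

For example, `download("40x", distilled=False)` would request the INT4 base-model directory plus `klein-9b-base-40x-lighting.safetensors`.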
## Distilled (4-step) vs Base (28-step)
| Transformer | Source | Steps | Guidance | Best for |
|---|---|---|---|---|
| `klein-9b-<tier>-lighting.safetensors` | Klein **distilled** | 4 | none (guidance-distilled) | Fastest |
| `klein-9b-base-<tier>-lighting.safetensors` | Klein **base** | 28 | `--guidance-scale 4.0` (classical CFG) | Highest quality |
## Inference
```bash
# 50x: Blackwell (RTX 50 / B-series). Distilled, 4-step:
quantfunc --model-dir klein-9b-series-50x-above-base-model \
  --transformer transformer/klein-9b-50x-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 4 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png

# 40x: RTX 40 / Ada or Hopper (H100/H200). Base 28-step (classical CFG):
quantfunc --model-dir klein-9b-series-50x-below-base-model \
  --transformer transformer/klein-9b-base-40x-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 28 --guidance-scale 4.0 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png

# 30x-below: RTX 30 and below. Distilled, 4-step:
quantfunc --model-dir klein-9b-series-50x-below-base-model \
  --transformer transformer/klein-9b-30x-below-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 4 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png
```
`--auto-optimize` picks the VRAM/attention/compression strategy for your GPU. The ComfyUI Lighting plugin auto-selects the matching tier + precision-config.
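
When scripting several runs, the three invocations above reduce to one pattern: the tier picks the base-model directory, and the variant picks the transformer file, step count, and guidance flag. A minimal sketch follows, using only the flags that appear in the commands above; `build_command` is a hypothetical helper, not something shipped with QuantFunc.

```python
import shlex


def build_command(tier: str, prompt: str, distilled: bool = True,
                  output: str = "out.png") -> list[str]:
    """Assemble a quantfunc argument list for one tier/variant combination."""
    if tier not in ("50x", "40x", "30x-below"):
        raise ValueError(f"unknown tier: {tier!r}")
    base_model = (
        "klein-9b-series-50x-above-base-model"
        if tier == "50x"
        else "klein-9b-series-50x-below-base-model"
    )
    variant = "" if distilled else "base-"
    cmd = [
        "quantfunc",
        "--model-dir", base_model,
        "--transformer", f"transformer/klein-9b-{variant}{tier}-lighting.safetensors",
        "--model-backend", "lighting",
        "--auto-optimize",
        "--steps", "4" if distilled else "28",
    ]
    if not distilled:
        # Only the base model uses classical CFG; the distilled one takes no guidance flag.
        cmd += ["--guidance-scale", "4.0"]
    cmd += ["--prompt", prompt, "--output", output]
    return cmd


if __name__ == "__main__":
    cmd = build_command("40x", "a cute cat on a windowsill, watercolor style",
                        distilled=False)
    print(shlex.join(cmd))  # shell-quoted form, for copy and paste
```

The returned list can be passed straight to `subprocess.run(cmd, check=True)`, which avoids shell-quoting problems in prompts.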
## Precision Config (precision-config/)
| File | Tier / GPU | attention + FFN | islands (modulation/embedders/head) |
|------|-----------|---------------|---------|
| `50x-fp4-f8-sample.json` | 50x: Blackwell (SM120+) | FP4 | FP8 |
| `40x-int4-f8-sample.json` | 40x: Ada (SM89) & Hopper (SM90), i.e. RTX 40, L40, H100, H200 | INT4 | FP8 |
| `30x-below-int4-i8-sample.json` | 30x-below: RTX 30/20, A100 (pre-FP8) | INT4 | INT8 |
These per-layer configs control the Lighting backend's quantization precision; customize them for your own speed/quality trade-off.
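
Before editing a sample, it can help to see which values it contains and how often. The schema of these JSON files is not described here, so the sketch below assumes nothing about key names: it walks any nested JSON structure and tallies the leaf values. `leaf_counts` is a hypothetical helper.

```python
import json
import sys
from collections import Counter


def leaf_counts(node, counts=None):
    """Recursively tally every leaf value (as a string) in a parsed JSON tree."""
    counts = Counter() if counts is None else counts
    if isinstance(node, dict):
        for value in node.values():
            leaf_counts(value, counts)
    elif isinstance(node, list):
        for value in node:
            leaf_counts(value, counts)
    else:
        counts[str(node)] += 1  # str() keeps True and 1 as separate entries
    return counts


if __name__ == "__main__":
    # Usage: python inspect_config.py precision-config/40x-int4-f8-sample.json
    with open(sys.argv[1], encoding="utf-8") as fh:
        config = json.load(fh)
    for value, n in leaf_counts(config).most_common():
        print(f"{n:6d}  {value}")
```

Comparing the tallies of two sample files gives a quick view of how the tiers differ before any per-layer value is changed.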
## Related Repositories
- [QuantFunc/Klein-4B-Series](https://huggingface.co/QuantFunc/Klein-4B-Series): FLUX.2 Klein 4B
- [QuantFunc/Qwen-Image-Series](https://huggingface.co/QuantFunc/Qwen-Image-Series) · [QuantFunc/Qwen-Image-Edit-Series](https://huggingface.co/QuantFunc/Qwen-Image-Edit-Series) · [QuantFunc/Z-Image-Series](https://huggingface.co/QuantFunc/Z-Image-Series)
## License
The pre-quantized weights are derived from FLUX.2 Klein. Users must comply with the original Black Forest Labs FLUX.2 license. The QuantFunc inference engine and plugins are licensed separately.
## Community
Join our community for support, updates, and discussions:
- 🎮 [Discord server](https://discord.gg/jCp9TpFWcn)
- 💬 Scan the QR code below to join our WeChat group:
<div align="center" id="wechat">
<img src="https://raw.githubusercontent.com/RealJonathanYip/ComfyUI-QuantFunc/main/assets/WeChat.jpg" alt="WeChat Group" width="300">
</div>