	---
	library_name: diffusers
	pipeline_tag: text-to-image
	base_model:
	- black-forest-labs/FLUX.2-klein-9B
	base_model_relation: quantized
	tags:
	- text-to-image
	- image-editing
	- diffusion
	- quantized
	- quantfunc
	- flux
	language:
	- en
	license: other
	license_name: flux-non-commercial-license
	license_link: LICENSE
	---

	<!-- QF-LICENSE-BLOCK:START -->
	## ⚠️ License — Non-Commercial Use Only

	These are quantized derivative weights of [`black-forest-labs/FLUX.2-klein-9B`](https://huggingface.co/black-forest-labs/FLUX.2-klein-9B) (FLUX.2 [klein] 9B), which is
	licensed under the FLUX Non-Commercial License v2.1 by Black Forest Labs.

	> This FLUX Model is licensed by Black Forest Labs Inc. under the FLUX Non-Commercial License.

	- Non-commercial use only. These weights may not be used for any commercial or
	revenue-generating purpose. Commercial use requires a separate license from Black Forest
	Labs — see https://bfl.ai/licensing .
	- Full license: included as [`LICENSE`](./LICENSE) (FLUX Non-Commercial License v2.1).
	- Modifications: quantized from FLUX.2 [klein] 9B by the QuantFunc inference engine.
	- This is not an official Black Forest Labs product and is not endorsed by BFL.

	> Disclaimer: Derived from FLUX.2 [klein] by Black Forest Labs. This is not an official Black Forest Labs product and is not endorsed by or affiliated with BFL. "FLUX" is a trademark of Black Forest Labs.
	<!-- QF-LICENSE-BLOCK:END -->

	# QuantFunc

	<div align="center" style="margin-top: 50px;">
	<img src="assets/logo.webp" width="300" alt="Logo">
	</div>

	<p align="center">
🤗 <a href="https://huggingface.co/QuantFunc">Hugging Face</a> |
🤖 <a href="https://www.modelscope.cn/profile/QuantFunc">ModelScope</a> |
💻 <a href="https://github.com/RealJonathanYip/ComfyUI-QuantFunc">GitHub</a> |
💬 <a href="#wechat">WeChat (微信)</a> |
	🎮 <a href="https://discord.gg/jCp9TpFWcn">Discord</a>
	</p>

> ⚡ FLUX.2 Klein 9B, the highest-quality Klein variant, pre-quantized. Text-to-image and reference-based editing with a 2x–11x speedup over standard BF16/FP16 Python pipelines via the QuantFunc plugin.

	The larger 9B Klein model for maximum fidelity, shipped as distilled (4-step) + base (28-step) transformers across three GPU tiers (`50x` FP4 · `40x` INT4+FP8 · `30x-below` INT4+INT8).

	Powered by the [QuantFunc ComfyUI plugin](https://github.com/RealJonathanYip/ComfyUI-QuantFunc) — the fastest diffusion inference engine:

	- 🚀 2x–11x speedup over standard BF16/FP16 Python pipelines (pre-exported → even faster loading).
	- ⚙️ Native C++/CUDA (`libquantfunc.so` / `quantfunc.dll`) with zero Python model dependencies.
	- 🧩 Dual engine (SVDQ offline + Lighting runtime 4-bit), zero-cost LoRA stacking, reference-image editing & inpainting.
	- 🟢 Full GPU coverage — RTX 20/30/40/50 · A100/H100/H200/B100/B200/GB300 · RTX 6000 Ada / PRO Blackwell (CUDA 12 & 13); native FP4 on Blackwell.

	👉 Install the plugin: https://github.com/RealJonathanYip/ComfyUI-QuantFunc

	# Klein-9B-Series

	Pre-quantized FLUX.2 Klein 9B model series by [QuantFunc](https://github.com/RealJonathanYip), Lighting backend. Text-to-image and reference-based image editing.

	> ✨ Both the distilled AND the non-distilled (base) model are supported, and the series ships three GPU tiers so every card gets the best path it can run:
	> `50x` (Blackwell, FP4) · `40x` (RTX 40 / Ada & Hopper, INT4 + FP8) · `30x-below` (RTX 30 and below, INT4 + INT8).

	## Overview

FLUX.2 Klein belongs to Black Forest Labs' FLUX.2 family. The 9B variant is the larger, higher-quality model (transformer K=4096). QuantFunc ships two pre-quantized transformers for it:

	- Distilled transformer — 4-step, fastest few-step generation/editing.
	- Base / non-distilled transformer — the full 28-step model with classical CFG (`--guidance-scale 4.0`), highest quality.

Each comes in three hardware tiers (see below). Within a tier, the distilled and base transformers use the same `base-model` directory (text encoder, VAE, tokenizer, scheduler); only the transformer file differs.

	## Hardware tiers (pick by GPU)

FP4 needs Blackwell (SM120). FP8 needs Ada (SM89) or Hopper (SM90), e.g. RTX 40 / L40 / H100 / H200. INT4 and INT8 run everywhere, including Ampere and Turing (e.g. RTX 30/20, A100). So:

| Tier | GPUs | attention + FFN | modulation/embedders/head | base-model |
|------|------|-----------------|---------------------------|------------|
| `50x` | Blackwell (SM120+): RTX 50 series, B100/B200/GB200, RTX PRO Blackwell | FP4 | FP8 | `klein-9b-series-50x-above-base-model` (FP4 text encoder) |
| `40x` | RTX 40 / Ada (SM89) & Hopper (SM90): RTX 40 series, L40/L40S, H100, H200 | INT4 | FP8 | `klein-9b-series-50x-below-base-model` (INT4 text encoder) |
| `30x-below` | RTX 30 and below (pre-FP8): RTX 30/20, A100, A40, T4, down to RTX 2080 | INT4 | INT8 | `klein-9b-series-50x-below-base-model` (INT4 text encoder) |

	> `40x` and `30x-below` share the same INT4 base-model — they differ only in the transformer's 8-bit precision (FP8 vs INT8). `50x` uses the FP4 base-model.
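
The tier rules above can be sketched as a small shell helper. This is an illustrative sketch, not part of QuantFunc: `pick_tier` is a hypothetical name, and it assumes that any compute capability of 10.0 or higher is Blackwell (data-center parts such as B200 report 10.x, while RTX 50 cards report 12.0).

```bash
# Hypothetical helper: map a CUDA compute capability string (e.g. "8.9")
# to the tier names used in the table above.
pick_tier() {
  # Drop the dot and any spaces so "12.0" becomes 120 and "8.9" becomes 89.
  cap=$(printf '%s' "$1" | tr -d '. ')
  if [ "$cap" -ge 100 ]; then
    echo "50x"          # Blackwell: native FP4 (assumption: 10.x and 12.x)
  elif [ "$cap" -ge 89 ]; then
    echo "40x"          # Ada (8.9) and Hopper (9.0): FP8 available
  else
    echo "30x-below"    # Ampere, Turing and older: INT4 + INT8
  fi
}
```

On drivers that support the `compute_cap` query field, `pick_tier "$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n1)"` prints the tier for the first GPU.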

	## Directory Structure

```
Klein-9B-Series/
├── klein-9b-series-50x-above-base-model/   # FP4 text encoder + VAE (enc+dec) + tokenizer + scheduler (50x)
├── klein-9b-series-50x-below-base-model/   # INT4 text encoder + VAE (enc+dec) + tokenizer + scheduler (40x & 30x-below)
├── transformer/
│   ├── config.json
│   ├── klein-9b-50x-lighting.safetensors             # distilled, FP4 (50x)
│   ├── klein-9b-base-50x-lighting.safetensors        # base 28-step, FP4 (50x)
│   ├── klein-9b-40x-lighting.safetensors             # distilled, INT4 + FP8 (40x)
│   ├── klein-9b-base-40x-lighting.safetensors        # base 28-step, INT4 + FP8 (40x)
│   ├── klein-9b-30x-below-lighting.safetensors       # distilled, INT4 + INT8 (30x-below)
│   └── klein-9b-base-30x-below-lighting.safetensors  # base 28-step, INT4 + INT8 (30x-below)
└── precision-config/
    ├── 50x-fp4-f8-sample.json
    ├── 40x-int4-f8-sample.json
    └── 30x-below-int4-i8-sample.json
```

	> Status: ✓ All weights uploaded; the VAE includes both encoder and decoder. Every tier × {distilled, base} is visually validated to generate correctly.
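
After a partial download it can help to check that everything one tier needs is present. The sketch below lists the paths that the layout above associates with a tier and reports any that are missing. The function names `tier_files` and `missing_files` are hypothetical, and treating the precision-config sample as part of a tier's file set is an assumption drawn from the file names.

```bash
# Hypothetical helper: print the paths one tier uses, relative to Klein-9B-Series/.
tier_files() {
  case "$1" in
    50x)       base=klein-9b-series-50x-above-base-model; cfg=50x-fp4-f8-sample.json ;;
    40x)       base=klein-9b-series-50x-below-base-model; cfg=40x-int4-f8-sample.json ;;
    30x-below) base=klein-9b-series-50x-below-base-model; cfg=30x-below-int4-i8-sample.json ;;
    *) echo "unknown tier: $1" >&2; return 1 ;;
  esac
  printf '%s\n' \
    "$base" \
    "transformer/config.json" \
    "transformer/klein-9b-$1-lighting.safetensors" \
    "transformer/klein-9b-base-$1-lighting.safetensors" \
    "precision-config/$cfg"
}

# Print every expected path that does not exist under root ($1) for tier ($2).
missing_files() {
  list=$(tier_files "$2") || return 1
  printf '%s\n' "$list" | while IFS= read -r rel; do
    [ -e "$1/$rel" ] || echo "$rel"
  done
}
```

For example, `missing_files ./Klein-9B-Series 40x` prints nothing when the 40x tier is complete.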

	## Distilled (4-step) vs Base (28-step)

| Transformer | Source | Steps | Guidance | Best for |
|---|---|---|---|---|
| `klein-9b-<tier>-lighting.safetensors` | Klein distilled | 4 | none (guidance-distilled) | Fastest |
| `klein-9b-base-<tier>-lighting.safetensors` | Klein base | 28 | `--guidance-scale 4.0` (classical CFG) | Highest quality |

	## Inference

```bash
# Paths are relative to the Klein-9B-Series/ directory.

# 50x: Blackwell (RTX 50 / B-series). Distilled, 4-step.
quantfunc --model-dir klein-9b-series-50x-above-base-model \
  --transformer transformer/klein-9b-50x-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 4 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png

# 40x: RTX 40 / Ada or Hopper (H100/H200). Base, 28-step (classical CFG).
quantfunc --model-dir klein-9b-series-50x-below-base-model \
  --transformer transformer/klein-9b-base-40x-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 28 --guidance-scale 4.0 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png

# 30x-below: RTX 30 and below. Distilled, 4-step.
quantfunc --model-dir klein-9b-series-50x-below-base-model \
  --transformer transformer/klein-9b-30x-below-lighting.safetensors \
  --model-backend lighting --auto-optimize --steps 4 \
  --prompt "a cute cat on a windowsill, watercolor style" --output out.png
```

	`--auto-optimize` picks the VRAM/attention/compression strategy for your GPU. The ComfyUI Lighting plugin auto-selects the matching tier + precision-config.
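
The three commands above follow one pattern: the tier picks the `base-model` directory, and the variant picks the transformer file and sampler flags. A minimal sketch of that pattern follows. `build_cmd` is a hypothetical helper that only prints the command, and it uses only the flags already shown in this section.

```bash
# Hypothetical helper: print (not run) the quantfunc command prefix.
# $1 = tier (50x | 40x | 30x-below), $2 = variant (distilled | base)
build_cmd() {
  case "$1" in
    50x)           model_dir=klein-9b-series-50x-above-base-model ;;
    40x|30x-below) model_dir=klein-9b-series-50x-below-base-model ;;
    *) echo "unknown tier: $1" >&2; return 1 ;;
  esac
  case "$2" in
    distilled) name="klein-9b-$1";      sampler="--steps 4" ;;
    base)      name="klein-9b-base-$1"; sampler="--steps 28 --guidance-scale 4.0" ;;
    *) echo "unknown variant: $2" >&2; return 1 ;;
  esac
  echo "quantfunc --model-dir $model_dir" \
       "--transformer transformer/$name-lighting.safetensors" \
       "--model-backend lighting --auto-optimize $sampler"
}
```

Because none of the printed arguments contain spaces, the output can serve directly as a command prefix: `$(build_cmd 40x base) --prompt "a cute cat" --output out.png`.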

	## Precision Config (precision-config/)

| File | Tier / GPU | attention + FFN | islands (modulation/embedders/head) |
|------|------------|-----------------|-------------------------------------|
| `50x-fp4-f8-sample.json` | `50x`: Blackwell (SM120+) | FP4 | FP8 |
| `40x-int4-f8-sample.json` | `40x`: Ada (SM89) & Hopper (SM90), e.g. RTX 40, L40, H100, H200 | INT4 | FP8 |
| `30x-below-int4-i8-sample.json` | `30x-below`: RTX 30/20, A100 (pre-FP8) | INT4 | INT8 |

	These per-layer configs control the Lighting backend's quantization precision — customize for your own speed/quality trade-off.

	## Related Repositories

	- [QuantFunc/Klein-4B-Series](https://huggingface.co/QuantFunc/Klein-4B-Series) — FLUX.2 Klein 4B
	- [QuantFunc/Qwen-Image-Series](https://huggingface.co/QuantFunc/Qwen-Image-Series) · [QuantFunc/Qwen-Image-Edit-Series](https://huggingface.co/QuantFunc/Qwen-Image-Edit-Series) · [QuantFunc/Z-Image-Series](https://huggingface.co/QuantFunc/Z-Image-Series)

	## License

The pre-quantized weights are derived from FLUX.2 Klein 9B and remain subject to the FLUX Non-Commercial License v2.1 from Black Forest Labs (see [`LICENSE`](./LICENSE)). The QuantFunc inference engine and plugins are licensed separately.

	## Community

	Join our community for support, updates, and discussions:

	- 🎮 [Discord server](https://discord.gg/jCp9TpFWcn)
	- 💬 Scan the QR code below to join our WeChat group:

	<div align="center" id="wechat">
	<img src="https://raw.githubusercontent.com/RealJonathanYip/ComfyUI-QuantFunc/main/assets/WeChat.jpg" alt="WeChat Group" width="300">
	</div>