---
license: other
license_name: flux-1-dev-non-commercial-license
license_link: >-
  https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev/blob/main/LICENSE.md
language:
- en
base_model:
- black-forest-labs/FLUX.1-Kontext-dev
pipeline_tag: image-to-image
tags:
- gguf-node
- gguf-connector
widget:
- text: the anime girl with massive fennec ears is wearing cargo pants while sitting on a log in the woods biting into a sandwich beside a beautiful alpine lake
  output:
    url: samples/ComfyUI_00001_.png
- src: samples/fennec_girl_sing.png
  prompt: the anime girl with massive fennec ears is wearing cargo pants while sitting on a log in the woods biting into a sandwich beside a beautiful alpine lake
  output:
    url: samples/ComfyUI_00001_.png
- text: the anime girl with massive fennec ears is wearing a maid outfit with a long black gold leaf pattern dress and a white apron mouth open holding a fancy black forest cake with candles on top in the kitchen of an old dark Victorian mansion lit by candlelight with a bright window to the foggy forest and very expensive stuff everywhere
  output:
    url: samples/ComfyUI_00002_.png
- src: samples/fennec_girl_sing.png
  prompt: the anime girl with massive fennec ears is wearing a maid outfit with a long black gold leaf pattern dress and a white apron mouth open holding a fancy black forest cake with candles on top in the kitchen of an old dark Victorian mansion lit by candlelight with a bright window to the foggy forest and very expensive stuff everywhere
  output:
    url: samples/ComfyUI_00002_.png
- text: add a hat to the pig
  output:
    url: samples/hat.webp
- src: samples/pig.png
  prompt: add a hat to the pig
  output:
    url: samples/hat.webp
---
# **gguf quantized version of kontext**

- run it straight with `gguf-connector`
- opt a `gguf` file in the current directory to interact with by:
```
ggc k0
```
> GGUF file(s) available. Select which one to use:
>
> 1. flux-kontext-lite-q2_k.gguf
> 2. flux-kontext-lite-q4_0.gguf
> 3. flux-kontext-lite-q8_0.gguf
>
> Enter your choice (1 to 3): _
- note: try the experimental lite model with its 8-step operation; it saves up to 70% of loading time

## **run it with gguf-node via comfyui**

- drag **kontext** to > `./ComfyUI/models/diffusion_models`
- drag **clip-l, t5xxl** to > `./ComfyUI/models/text_encoders`
- drag **pig** to > `./ComfyUI/models/vae`

![screenshot](https://raw.githubusercontent.com/calcuis/comfy/master/kontext.png)
- no safetensors needed anymore; everything is gguf (model + encoder + vae)
- the full gguf set works on gguf-node (see the last item of the reference section at the very end)
- get more **t5xxl** gguf encoders either [here](https://huggingface.co/calcuis/pig-encoder/tree/main) or [here](https://huggingface.co/chatpig/t5-v1_1-xxl-encoder-fp32-gguf/tree/main)

![screenshot](https://raw.githubusercontent.com/calcuis/comfy/master/kontext-t2i.png)

## **extra: scaled safetensors (alternative 1)**

- get the all-in-one checkpoint [here](https://huggingface.co/convertor/kontext-ckpt-fp8/blob/main/checkpoints/flux1-knotext-dev_fp8_e4m3fn.safetensors) (model, clips and vae embedded)

![screenshot](https://raw.githubusercontent.com/calcuis/comfy/master/kontext-ckpt.png)
- another option: get the multi-matrix scaled fp8 from comfyui [here](https://huggingface.co/Comfy-Org/flux1-kontext-dev_ComfyUI/blob/main/split_files/diffusion_models/flux1-dev-kontext_fp8_scaled.safetensors) or the e4m3fn fp8 [here](https://huggingface.co/convertor/kontext-ckpt-fp8/blob/main/diffusion_models/flux1-dev-kontext_fp8_e4m3fn.safetensors) with separate scaled versions of [clip-l](https://huggingface.co/chatpig/encoder/blob/main/clip_l_fp8_e4m3fn.safetensors), [t5xxl](https://huggingface.co/chatpig/encoder/blob/main/t5xxl_fp8_e4m3fn.safetensors) and [vae](https://huggingface.co/connector/pig-1k/blob/main/vae/pig_flux_vae_fp16.safetensors)

## **run it with diffusers🧨 (alternative 2)**

- might need the latest diffusers (git version) for `FluxKontextPipeline` to work; upgrade it with `pip`
```
pip install git+https://github.com/huggingface/diffusers.git
```
- see the example inference below:
```py
import torch
from transformers import T5EncoderModel
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

text_encoder = T5EncoderModel.from_pretrained(
    "calcuis/kontext-gguf",
    gguf_file="t5xxl_fp16-q4_0.gguf",
    torch_dtype=torch.bfloat16,
)
pipe = FluxKontextPipeline.from_pretrained(
    "calcuis/kontext-gguf",
    text_encoder_2=text_encoder,
    torch_dtype=torch.bfloat16,
).to("cuda")

input_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")

image = pipe(
    image=input_image,
    prompt="Add a hat to the cat",
    guidance_scale=2.5,
).images[0]
image.save("output.png")
```
- tip: if your machine doesn't have enough vram, we suggest running it with gguf-node via comfyui (plan a) instead; otherwise expect a very long wait as it falls back to a slow mode; this is a winner-takes-all game

## **run it with gguf-connector (other alternatives)**

- simply execute the command below in a console/terminal
```
ggc k2
```
![screenshot](https://raw.githubusercontent.com/calcuis/gguf-pack/master/k2.png)
- note: on first launch it automatically pulls the required model file(s) from this repo to the local cache; after that you can opt to run it entirely offline, i.e., from the local URL http://127.0.0.1:7860 with the lazy webui
![screenshot](https://raw.githubusercontent.com/calcuis/gguf-pack/master/k1a.png)
- with the bot lora embedded version
```
ggc k1
```
![screenshot](https://raw.githubusercontent.com/calcuis/gguf-pack/master/k1b.png)
- new plushie style
![screenshot](https://raw.githubusercontent.com/calcuis/gguf-pack/master/k1.png)

## additional chapter for lora conversion via gguf-connector

- convert a lora from base to unet format, e.g., [plushie](https://huggingface.co/fal/Plushie-Kontext-Dev-LoRA/blob/main/plushie-kontext-dev-lora.safetensors), so it can be used in comfyui as well
```
ggc la
```
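At its core, the base↔unet swap is a key-renaming pass over the lora's state dict. The sketch below illustrates only the auto-detection idea; the prefixes `BASE_PREFIX` and `UNET_PREFIX` are assumptions for demonstration, not gguf-connector's actual mapping (which also rewrites separators and handles more key families):

```python
# Hypothetical sketch of prefix-based lora format detection and swapping.
# BASE_PREFIX / UNET_PREFIX are illustrative assumptions, not the real mapping.
BASE_PREFIX = "transformer."   # diffusers-style ("base") keys
UNET_PREFIX = "lora_unet_"     # comfyui-style ("unet") keys

def detect_format(state_dict):
    """Guess the current format by inspecting key prefixes."""
    if any(k.startswith(UNET_PREFIX) for k in state_dict):
        return "unet"
    return "base"

def convert(state_dict):
    """Swap prefixes in whichever direction detect_format() indicates."""
    src = detect_format(state_dict)
    old, new = (BASE_PREFIX, UNET_PREFIX) if src == "base" else (UNET_PREFIX, BASE_PREFIX)
    return {
        (new + k[len(old):]) if k.startswith(old) else k: v
        for k, v in state_dict.items()
    }
```

Because the direction is auto-detected, running the conversion twice round-trips back to the original keys, which matches the swap-back behaviour described below.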
![screenshot](https://raw.githubusercontent.com/calcuis/comfy/master/kontext-lora.png)
- able to swap the lora back (from unet to base; auto-detection logic applied), so it can be used for inference again
```
ggc la
```
![screenshot](https://raw.githubusercontent.com/calcuis/gguf-pack/master/k1d.png)

### update

- [clip-l-v2](https://huggingface.co/calcuis/pig-encoder/blob/main/clip_l_v2_fp32-f16.gguf): missing tensor `text_projection.weight` added
- kontext-v2: `s-quant` and `k-quant`; everything except the single and double blocks stays in `f32`
  - pros: loads faster (no dequant needed for those tensors); also 1) avoids the key-breaking issue, since some inference engines only dequant blocks; 2) compatible with non-cuda machines, as most of them cannot run `bf16` tensors
  - cons: slightly larger file size
- kontext-v3: `i-quant` attempt (upgrade your node to the latest version for full quant support)
- kontext-v4: `t-quant`; runnable (extremely fast); for speed-test/experimental purposes

|rank|quant|s/it|loading speed|
|----|--------|---------|----------------|
| 1 | q2_k | 6.40±.7 |🖐💨💨💨💨💨💨|
| 2 | q4_0 | 8.58±.5 |🖐🖐💨💨💨💨💨|
| 3 | q4_1 | 9.12±.5 |🖐🖐🖐💨💨💨💨|
| 4 | q8_0 | 9.45±.3 |🖐🖐🖐🖐💨💨💨|
| 5 | q3_k | 9.50±.3 |🖐🖐🖐🖐💨💨💨|
| 6 | q5_0 | 10.48±.5|🖐🖐🖐🖐🖐💨💨|
| 7 | iq4_nl | 10.55±.5|🖐🖐🖐🖐🖐💨💨|
| 8 | q5_1 | 10.65±.5|🖐🖐🖐🖐🖐💨💨|
| 9 | iq4_xs | 11.45±.7|🖐🖐🖐🖐🖐🖐💨|
| 10| iq3_s | 11.62±.9|💍💍💍💍💍💍💨|
| 11| iq3_xxs| 12.08±.9|🐢🐢🐢🐢🐢🐢🐢|

- not all quants were included in this initial test (*tested on a beginner laptop gpu only; on a high-end card you might find q8_0 running surprisingly faster than the others); test the rest yourself
- interestingly, the loading time required did not align with file size, due to the differing complexity of each dequant calculation, and it may vary between models
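The ranking above is dominated by dequantization cost rather than file size. For intuition, here is a minimal pure-Python sketch of `q8_0`-style block (de)quantization, following the ggml convention of 32-weight blocks sharing one scale; real kernels store the scale as fp16 and operate on packed buffers, so this is an illustration only:

```python
BLOCK_SIZE = 32  # q8_0 packs weights in blocks of 32

def quantize_q8_0(weights):
    """Quantize one block of 32 floats to (scale, int8 values), q8_0-style."""
    assert len(weights) == BLOCK_SIZE
    amax = max(abs(w) for w in weights)
    scale = amax / 127.0 if amax > 0 else 1.0
    quants = [max(-127, min(127, round(w / scale))) for w in weights]
    return scale, quants

def dequantize_q8_0(scale, quants):
    """Recover approximate floats: x_i = d * q_i."""
    return [scale * q for q in quants]

# Round-trip a toy block and measure the worst-case reconstruction error.
block = [0.5 * ((i % 7) - 3) for i in range(BLOCK_SIZE)]
d, q = quantize_q8_0(block)
restored = dequantize_q8_0(d, q)
err = max(abs(a - b) for a, b in zip(block, restored))
```

The dequant is a single multiply per weight, but the block layouts differ across quant types (k-quants and i-quants use sub-blocks and lookup tables), which is why loading time tracks dequant complexity rather than file size.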
### new memory economy mode

- this option works for machines with low/no vram, or even without a gpu
```
ggc k3
```

### 🐷 Kontext Image Editor (connector mode) 🐷

- opt a `gguf` file straight in the current directory to interact with
```
ggc k6
```
- semi-full quant supported in the k8 connector (uses dequantor instead of diffusers)
```
ggc k8
```

### **reference**

- base model from [black-forest-labs](https://huggingface.co/black-forest-labs)
- comfyui from [comfyanonymous](https://github.com/comfyanonymous/ComfyUI)
- gguf-node ([pypi](https://pypi.org/project/gguf-node)|[repo](https://github.com/calcuis/gguf)|[pack](https://github.com/calcuis/gguf/releases))
- gguf-connector ([pypi](https://pypi.org/project/gguf-connector))
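### appendix: peeking inside a gguf file

- every `.gguf` file above shares the same container layout: a small fixed header holding the magic bytes `GGUF`, a version number, then the tensor count and metadata-kv count as little-endian integers (per the GGUF spec); a minimal stdlib-only sketch of parsing that header, for illustration rather than a full reader:

```python
import struct

def read_gguf_header(blob: bytes):
    """Parse the fixed GGUF header: magic, u32 version, u64 tensor/kv counts."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", blob, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}

# Build a synthetic header for demonstration (version 3, 10 tensors, 2 kv pairs).
demo = struct.pack("<4sIQQ", b"GGUF", 3, 10, 2)
header = read_gguf_header(demo)
```

After the header come the metadata key-value pairs and the tensor infos, which is where quant types like `q8_0` or `iq4_nl` are recorded per tensor.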