---
license: other
license_name: flux-1-dev-non-commercial-license
license_link: >-
  https://huggingface.co/black-forest-labs/FLUX.1-Kontext-dev/blob/main/LICENSE.md
language:
- en
base_model:
- black-forest-labs/FLUX.1-Kontext-dev
pipeline_tag: image-to-image
tags:
- gguf-node
- gguf-connector
widget:
- text: the anime girl with massive fennec ears is wearing cargo pants while sitting on a log in the woods biting into a sandwich beside a beautiful alpine lake
  output:
    url: samples/ComfyUI_00001_.png
- src: samples/fennec_girl_sing.png
  prompt: the anime girl with massive fennec ears is wearing cargo pants while sitting on a log in the woods biting into a sandwich beside a beautiful alpine lake
  output:
    url: samples/ComfyUI_00001_.png
- text: the anime girl with massive fennec ears is wearing a maid outfit with a long black gold leaf pattern dress and a white apron mouth open holding a fancy black forest cake with candles on top in the kitchen of an old dark Victorian mansion lit by candlelight with a bright window to the foggy forest and very expensive stuff everywhere
  output:
    url: samples/ComfyUI_00002_.png
- src: samples/fennec_girl_sing.png
  prompt: the anime girl with massive fennec ears is wearing a maid outfit with a long black gold leaf pattern dress and a white apron mouth open holding a fancy black forest cake with candles on top in the kitchen of an old dark Victorian mansion lit by candlelight with a bright window to the foggy forest and very expensive stuff everywhere
  output:
    url: samples/ComfyUI_00002_.png
- text: add a hat to the pig
  output:
    url: samples/hat.webp
- src: samples/pig.png
  prompt: add a hat to the pig
  output:
    url: samples/hat.webp
---
# **gguf quantized version of kontext**

- run it straight with `gguf-connector`
- opt a `gguf` file in the current directory to interact with by:
```
ggc k0
```
> GGUF file(s) available. Select which one to use:
>
> 1. flux-kontext-lite-q2_k.gguf
> 2. flux-kontext-lite-q4_0.gguf
> 3. flux-kontext-lite-q8_0.gguf
>
> Enter your choice (1 to 3): _
- note: try the experimental lite model with its 8-step operation; it saves up to 70% of loading time

## **run it with gguf-node via comfyui**

- drag **kontext** to > `./ComfyUI/models/diffusion_models`
- drag **clip-l, t5xxl** to > `./ComfyUI/models/text_encoders`
- drag **pig** to > `./ComfyUI/models/vae`

![screenshot](https://raw.githubusercontent.com/calcuis/comfy/master/kontext.png)
- no safetensors needed anymore; everything is gguf (model + encoder + vae)
- the full gguf set works on gguf-node (see the last item of the reference section at the very end)
- get more **t5xxl** gguf encoders either [here](https://huggingface.co/calcuis/pig-encoder/tree/main) or [here](https://huggingface.co/chatpig/t5-v1_1-xxl-encoder-fp32-gguf/tree/main)

![screenshot](https://raw.githubusercontent.com/calcuis/comfy/master/kontext-t2i.png)

## **extra: scaled safetensors (alternative 1)**

- get the all-in-one checkpoint [here](https://huggingface.co/convertor/kontext-ckpt-fp8/blob/main/checkpoints/flux1-knotext-dev_fp8_e4m3fn.safetensors) (model, clips and vae embedded)

![screenshot](https://raw.githubusercontent.com/calcuis/comfy/master/kontext-ckpt.png)
- another option: get the multi-matrix scaled fp8 from comfyui [here](https://huggingface.co/Comfy-Org/flux1-kontext-dev_ComfyUI/blob/main/split_files/diffusion_models/flux1-dev-kontext_fp8_scaled.safetensors) or the e4m3fn fp8 [here](https://huggingface.co/convertor/kontext-ckpt-fp8/blob/main/diffusion_models/flux1-dev-kontext_fp8_e4m3fn.safetensors) with separate scaled versions of [clip-l](https://huggingface.co/chatpig/encoder/blob/main/clip_l_fp8_e4m3fn.safetensors), [t5xxl](https://huggingface.co/chatpig/encoder/blob/main/t5xxl_fp8_e4m3fn.safetensors) and [vae](https://huggingface.co/connector/pig-1k/blob/main/vae/pig_flux_vae_fp16.safetensors)

## **run it with diffusers🧨 (alternative 2)**

- might need the latest diffusers (git version) for `FluxKontextPipeline` to work; upgrade it with `pip`
```
pip install git+https://github.com/huggingface/diffusers.git
```
- see the example inference below:
```py
import torch
from transformers import T5EncoderModel
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

text_encoder = T5EncoderModel.from_pretrained(
    "calcuis/kontext-gguf",
    gguf_file="t5xxl_fp16-q4_0.gguf",
    torch_dtype=torch.bfloat16,
)
pipe = FluxKontextPipeline.from_pretrained(
    "calcuis/kontext-gguf",
    text_encoder_2=text_encoder,
    torch_dtype=torch.bfloat16,
).to("cuda")

input_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/cat.png")

image = pipe(
    image=input_image,
    prompt="Add a hat to the cat",
    guidance_scale=2.5,
).images[0]
image.save("output.png")
```
- tip: if your machine doesn't have enough vram, we suggest running it with gguf-node via comfyui (plan a) instead; otherwise expect a very long wait as it falls back to a slow mode; this is a winner-takes-all game

## **run it with gguf-connector (other alternatives)**

- simply execute the command below in a console/terminal
```
ggc k2
```
![screenshot](https://raw.githubusercontent.com/calcuis/gguf-pack/master/k2.png)
- note: on first launch it automatically pulls the required model file(s) from this repo to the local cache; after that you can opt to run it entirely offline, i.e., from the local URL http://127.0.0.1:7860 with the lazy webui
![screenshot](https://raw.githubusercontent.com/calcuis/gguf-pack/master/k1a.png)
- with the bot lora embedded version
```
ggc k1
```
![screenshot](https://raw.githubusercontent.com/calcuis/gguf-pack/master/k1b.png)
- new plushie style
![screenshot](https://raw.githubusercontent.com/calcuis/gguf-pack/master/k1.png)

## additional chapter for lora conversion via gguf-connector

- convert a lora from base to unet format, e.g., [plushie](https://huggingface.co/fal/Plushie-Kontext-Dev-LoRA/blob/main/plushie-kontext-dev-lora.safetensors), so it can be used in comfyui as well
```
ggc la
```
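At its core, the base↔unet swap is a key-renaming pass over the lora's state dict. The sketch below illustrates only the auto-detection idea; the prefixes `BASE_PREFIX` and `UNET_PREFIX` are assumptions for demonstration, not gguf-connector's actual mapping (which also rewrites separators and handles more key families):

```python
# Hypothetical sketch of prefix-based lora format detection and swapping.
# BASE_PREFIX / UNET_PREFIX are illustrative assumptions, not the real mapping.
BASE_PREFIX = "transformer."   # diffusers-style ("base") keys
UNET_PREFIX = "lora_unet_"     # comfyui-style ("unet") keys

def detect_format(state_dict):
    """Guess the current format by inspecting key prefixes."""
    if any(k.startswith(UNET_PREFIX) for k in state_dict):
        return "unet"
    return "base"

def convert(state_dict):
    """Swap prefixes in whichever direction detect_format() indicates."""
    src = detect_format(state_dict)
    old, new = (BASE_PREFIX, UNET_PREFIX) if src == "base" else (UNET_PREFIX, BASE_PREFIX)
    return {
        (new + k[len(old):]) if k.startswith(old) else k: v
        for k, v in state_dict.items()
    }
```

Because the direction is auto-detected, running the conversion twice round-trips back to the original keys, which matches the swap-back behaviour described below.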
![screenshot](https://raw.githubusercontent.com/calcuis/comfy/master/kontext-lora.png)
- able to swap the lora back (from unet to base; auto-detection logic applied), so it can be used for inference again
```
ggc la
```
![screenshot](https://raw.githubusercontent.com/calcuis/gguf-pack/master/k1d.png)

### update

- [clip-l-v2](https://huggingface.co/calcuis/pig-encoder/blob/main/clip_l_v2_fp32-f16.gguf): missing tensor `text_projection.weight` added
- kontext-v2: `s-quant` and `k-quant`; everything except the single and double blocks stays in `f32`
  - pros: loads faster (no dequant needed for those tensors); also 1) avoids the key-breaking issue, since some inference engines only dequant blocks; 2) compatible with non-cuda machines, as most of them cannot run `bf16` tensors
  - cons: slightly larger file size
- kontext-v3: `i-quant` attempt (upgrade your node to the latest version for full quant support)
- kontext-v4: `t-quant`; runnable (extremely fast); for speed-test/experimental purposes

|rank|quant|s/it|loading speed|
|----|--------|---------|----------------|
| 1 | q2_k | 6.40±.7 |🖐💨💨💨💨💨💨|
| 2 | q4_0 | 8.58±.5 |🖐🖐💨💨💨💨💨|
| 3 | q4_1 | 9.12±.5 |🖐🖐🖐💨💨💨💨|
| 4 | q8_0 | 9.45±.3 |🖐🖐🖐🖐💨💨💨|
| 5 | q3_k | 9.50±.3 |🖐🖐🖐🖐💨💨💨|
| 6 | q5_0 | 10.48±.5|🖐🖐🖐🖐🖐💨💨|
| 7 | iq4_nl | 10.55±.5|🖐🖐🖐🖐🖐💨💨|
| 8 | q5_1 | 10.65±.5|🖐🖐🖐🖐🖐💨💨|
| 9 | iq4_xs | 11.45±.7|🖐🖐🖐🖐🖐🖐💨|
| 10| iq3_s | 11.62±.9|💍💍💍💍💍💍💨|
| 11| iq3_xxs| 12.08±.9|🐢🐢🐢🐢🐢🐢🐢|

- not all quants were included in this initial test (*tested on a beginner laptop gpu only; on a high-end card you might find q8_0 running surprisingly faster than the others); test the rest yourself
- interestingly, the loading time required did not align with file size, due to the differing complexity of each dequant calculation, and it may vary between models
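The ranking above is dominated by dequantization cost rather than file size. For intuition, here is a minimal pure-Python sketch of `q8_0`-style block (de)quantization, following the ggml convention of 32-weight blocks sharing one scale; real kernels store the scale as fp16 and operate on packed buffers, so this is an illustration only:

```python
BLOCK_SIZE = 32  # q8_0 packs weights in blocks of 32

def quantize_q8_0(weights):
    """Quantize one block of 32 floats to (scale, int8 values), q8_0-style."""
    assert len(weights) == BLOCK_SIZE
    amax = max(abs(w) for w in weights)
    scale = amax / 127.0 if amax > 0 else 1.0
    quants = [max(-127, min(127, round(w / scale))) for w in weights]
    return scale, quants

def dequantize_q8_0(scale, quants):
    """Recover approximate floats: x_i = d * q_i."""
    return [scale * q for q in quants]

# Round-trip a toy block and measure the worst-case reconstruction error.
block = [0.5 * ((i % 7) - 3) for i in range(BLOCK_SIZE)]
d, q = quantize_q8_0(block)
restored = dequantize_q8_0(d, q)
err = max(abs(a - b) for a, b in zip(block, restored))
```

The dequant is a single multiply per weight, but the block layouts differ across quant types (k-quants and i-quants use sub-blocks and lookup tables), which is why loading time tracks dequant complexity rather than file size.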
### new memory economy mode

- this option works for machines with low/no vram, or even without a gpu
```
ggc k3
```

### 🐷 Kontext Image Editor (connector mode) 🐷

- opt a `gguf` file straight in the current directory to interact with
```
ggc k6
```
- semi-full quant supported in the k8 connector (uses dequantor instead of diffusers)
```
ggc k8
```

### **reference**

- base model from [black-forest-labs](https://huggingface.co/black-forest-labs)
- comfyui from [comfyanonymous](https://github.com/comfyanonymous/ComfyUI)
- gguf-node ([pypi](https://pypi.org/project/gguf-node)|[repo](https://github.com/calcuis/gguf)|[pack](https://github.com/calcuis/gguf/releases))
- gguf-connector ([pypi](https://pypi.org/project/gguf-connector))
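### appendix: peeking inside a gguf file

- every `.gguf` file above shares the same container layout: a small fixed header holding the magic bytes `GGUF`, a version number, then the tensor count and metadata-kv count as little-endian integers (per the GGUF spec); a minimal stdlib-only sketch of parsing that header, for illustration rather than a full reader:

```python
import struct

def read_gguf_header(blob: bytes):
    """Parse the fixed GGUF header: magic, u32 version, u64 tensor/kv counts."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", blob, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}

# Build a synthetic header for demonstration (version 3, 10 tensors, 2 kv pairs).
demo = struct.pack("<4sIQQ", b"GGUF", 3, 10, 2)
header = read_gguf_header(demo)
```

After the header come the metadata key-value pairs and the tensor infos, which is where quant types like `q8_0` or `iq4_nl` are recorded per tensor.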