OpenFLUX.1 GGUF Model Card

Quantized versions of ostris/OpenFLUX.1 in GGUF format for use with stable-diffusion.cpp.

At the time of publishing, no ready-made GGUF weights of OpenFLUX.1 were available for the sd.cpp runtime, so here we are.

Sample generation: "A lovely cat" · seed 440103671 · Q8_0


Available Quantizations

File                        Type   Description
openflux1-v0.1.0-Q8_0.gguf  Q8_0   Great balance of quality and size ✅ recommended
openflux1-v0.1.0-Q4_0.gguf  Q4_0   Smaller size at some cost in quality

Quick Start

1. Download the model

# Recommended — Q8_0
wget https://huggingface.co/kostakoff/OpenFLUX.1-GGUF/resolve/main/openflux1-v0.1.0-Q8_0.gguf

# Other quantizations:
# wget https://huggingface.co/kostakoff/OpenFLUX.1-GGUF/resolve/main/openflux1-v0.1.0-Q4_0.gguf

2. Build stable-diffusion.cpp

Requirements: CUDA-capable GPU, CMake ≥ 3.18, CUDA Toolkit

git clone https://github.com/leejet/stable-diffusion.cpp
cd stable-diffusion.cpp
git submodule init
git submodule update

mkdir build && cd build
cmake .. -DSD_CUDA=ON
cmake --build . --config Release

Version used for conversion and testing:

stable-diffusion.cpp version master-520-d950627, commit d950627

3. Start the server

export CUDA_VISIBLE_DEVICES=0

./stable-diffusion.cpp/build/bin/sd-server \
  -m ./openflux1-v0.1.0-Q8_0.gguf \
  --vae-on-cpu \
  --listen-ip 0.0.0.0 \
  --listen-port 8081 \
  --seed -1

⚠️ The --vae-on-cpu flag is required! The VAE decoder consumes up to 10 GB of VRAM when decoding the latent representation into the output image. Offloading the VAE to the CPU makes it possible to run the model on most consumer GPUs.
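The model takes a while to load, so it can be handy to wait for the server to come up before sending requests. A minimal sketch using bash's /dev/tcp pseudo-device (host and port taken from the command above; the timeout of 60 seconds is an assumption):

```shell
# Hedged sketch: poll until sd-server accepts TCP connections on port 8081.
# Uses bash's /dev/tcp pseudo-device, so no extra tools are required.
HOST=127.0.0.1
PORT=8081
for i in $(seq 1 60); do
  if (exec 3<>"/dev/tcp/$HOST/$PORT") 2>/dev/null; then
    echo "sd-server is up on $HOST:$PORT"
    break
  fi
  sleep 1
done
```

This only checks that the port accepts connections, not that the model has finished loading; adjust the retry count to taste.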

4. Generate an image

curl -s http://127.0.0.1:8081/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "flux",
    "prompt": "A lovely cat<sd_cpp_extra_args>{\"seed\": 440103671}</sd_cpp_extra_args>",
    "n": 1,
    "size": "",
    "response_format": "b64_json"
  }' | jq -r '.data[0].b64_json' | base64 --decode > out.png

Extra parameters are passed via <sd_cpp_extra_args> as a JSON snippet embedded directly in the prompt field.
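Because the extra arguments ride inside the prompt string as embedded JSON, hand-quoting them in a curl one-liner is error-prone. A minimal sketch that assembles the body with jq (already used above) so the escaping is handled automatically; only the "seed" key is documented in this card, so no other keys are assumed:

```shell
# Hedged sketch: build the request body with jq so the extra-args JSON is
# escaped correctly inside the prompt string.
PROMPT='A lovely cat'
EXTRA='{"seed": 440103671}'
BODY=$(jq -cn --arg p "${PROMPT}<sd_cpp_extra_args>${EXTRA}</sd_cpp_extra_args>" \
  '{model: "flux", prompt: $p, n: 1, size: "", response_format: "b64_json"}')
echo "$BODY"
# Then send it exactly as in the curl example above:
# curl -s http://127.0.0.1:8081/v1/images/generations \
#   -H "Content-Type: application/json" -d "$BODY"
```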


How the weights were created

Converted from the original openflux1-v0.1.0-fp8.safetensors weights using the built-in sd-cli conversion tool:

# Q8_0
./stable-diffusion.cpp/build/bin/sd-cli -M convert \
  -m ~/llm/models/openflux/openflux1-v0.1.0-fp8.safetensors \
  -o ./openflux1-v0.1.0-Q8_0.gguf -v --type q8_0

# Q4_0
./stable-diffusion.cpp/build/bin/sd-cli -M convert \
  -m ~/llm/models/openflux/openflux1-v0.1.0-fp8.safetensors \
  -o ./openflux1-v0.1.0-Q4_0.gguf -v --type q4_0
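The two conversions differ only in the type flag and the output name, so they can be sketched as a loop (same paths and flags as above; requires bash for the ${q^^} uppercase expansion):

```shell
# Hedged sketch: run both conversions in one loop. Paths and flags are the
# same as in the commands above; ${q^^} uppercases q8_0 -> Q8_0 etc.
for q in q8_0 q4_0; do
  ./stable-diffusion.cpp/build/bin/sd-cli -M convert \
    -m ~/llm/models/openflux/openflux1-v0.1.0-fp8.safetensors \
    -o "./openflux1-v0.1.0-${q^^}.gguf" -v --type "$q"
done
```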

License

This model inherits the license of the original ostris/OpenFLUX.1: Apache 2.0.
