FLUX.1 [dev] Grid

This repository provides quantized weights for FLUX.1 [dev], converted to NF4 format with bitsandbytes. Quantization lowers the VRAM needed for GPU inference, so the model can run on the Google Colab free tier or, when the pipeline is run in stages, on GPUs with 8 GB of VRAM.
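To make the NF4 format a little more concrete, here is a minimal pure-Python sketch of blockwise 4-bit quantization. It is an illustration under stated assumptions, not the bitsandbytes implementation: the 16 codebook values are the NF4 levels as commonly listed, the helper names `quantize_block` and `dequantize_block` are hypothetical, and the real library additionally packs two codes per byte and runs fused GPU kernels.

```python
# Illustrative only: a toy NF4-style quantizer, not the bitsandbytes kernels.
# The 16 levels below are the NF4 codebook as commonly listed (assumption).
NF4_LEVELS = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]


def quantize_block(values):
    """Map one block of floats to 4-bit codes plus a per-block absmax scale."""
    absmax = max(abs(v) for v in values) or 1.0  # fall back to 1.0 for an all-zero block
    codes = []
    for v in values:
        normalized = v / absmax  # now in [-1, 1]
        nearest = min(range(16), key=lambda i: abs(NF4_LEVELS[i] - normalized))
        codes.append(nearest)
    return codes, absmax


def dequantize_block(codes, absmax):
    """Look each code up in the codebook and rescale by the block's absmax."""
    return [NF4_LEVELS[c] * absmax for c in codes]


block = [0.02, -0.5, 0.31, 0.0, 0.75, -0.12, 0.05, -0.9]
codes, scale = quantize_block(block)
restored = dequantize_block(codes, scale)
max_error = max(abs(a - b) for a, b in zip(block, restored))
print(codes, scale, round(max_error, 4))
```

Each weight is stored as an index into 16 levels, and one scale is kept per block, which is where the memory saving over 16-bit weights comes from.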

The FLUX.1 [dev] model consists of three main components:

  • Text Encoders: CLIP and T5
  • Flux Transformer
  • VAE

In this repository, only the T5 encoder and the Flux Transformer are quantized. The CLIP encoder and VAE remain in their original precision but are included to ensure a fully functional inference pipeline.
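As a rough sanity check on why quantizing only these two components is enough, the sketch below estimates weight storage per component. The parameter counts are approximate public figures and should be treated as assumptions, as should the 16-bit "original precision" for CLIP and the VAE. The estimate covers weights only: it ignores activations, quantization metadata, and any layers left unquantized, so it will not match the VRAM figures quoted in the Usage section.

```python
# Rough weight-only memory estimate; parameter counts are approximations
# (assumptions), and real usage adds activations and quantization metadata.
COMPONENTS = {
    # name: (approx. parameter count, bits per weight in this repository)
    "transformer": (12_000_000_000, 4),         # quantized to NF4
    "text_encoder_2 (T5)": (4_700_000_000, 4),  # quantized to NF4
    "text_encoder (CLIP)": (120_000_000, 16),   # original precision, assumed 16-bit
    "vae": (80_000_000, 16),                    # original precision, assumed 16-bit
}


def weight_gb(num_params, bits_per_weight):
    """Weight storage in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


for name, (params, bits) in COMPONENTS.items():
    print(f"{name:22s} {weight_gb(params, bits):6.2f} GB at {bits}-bit, "
          f"{weight_gb(params, 16):6.2f} GB at 16-bit")

total_quantized = sum(weight_gb(p, b) for p, b in COMPONENTS.values())
total_16bit = sum(weight_gb(p, 16) for p, _ in COMPONENTS.values())
print(f"total: {total_quantized:.2f} GB vs {total_16bit:.2f} GB")
```

The transformer and the T5 encoder dominate the total, so leaving the much smaller CLIP encoder and VAE unquantized costs little memory.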

Usage

pip install bitsandbytes==0.48.1 diffusers==0.35.1 peft==0.17.1 protobuf==5.29.5 sentencepiece==0.2.1 transformers==4.56.1
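Because the install command pins exact versions, it can help to confirm what is actually installed before debugging a loading error. The following is a small sketch using only the standard library; `check_pins` is a hypothetical helper name, and the pins are copied from the command above.

```python
# Compare installed package versions against the pins from the install command.
from importlib.metadata import PackageNotFoundError, version

PINS = {
    "bitsandbytes": "0.48.1",
    "diffusers": "0.35.1",
    "peft": "0.17.1",
    "protobuf": "5.29.5",
    "sentencepiece": "0.2.1",
    "transformers": "4.56.1",
}


def check_pins(pins):
    """Return {package: (expected, found)} for each mismatch; found is None if missing."""
    mismatches = {}
    for package, expected in pins.items():
        try:
            found = version(package)
        except PackageNotFoundError:
            found = None
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


for package, (expected, found) in check_pins(PINS).items():
    print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

An empty result means the environment matches the pins; other versions may also work, but they are not the ones listed here.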

Full Pipeline Mode (≈ 14.8 GB VRAM)

import torch
from diffusers import FluxPipeline

ckpt_4bit_id = "aniketppanchal/flux.1-dev-nf4-pkg"
prompt = "A cat holding a sign that says hello world"
height = 1024
width = 1024

pipeline = FluxPipeline.from_pretrained(
    ckpt_4bit_id,
    torch_dtype=torch.float16,
    device_map="cuda",
)

image = pipeline(
    prompt=prompt,
    height=height,
    width=width,
    num_inference_steps=28,
    guidance_scale=3.5,
    max_sequence_length=512,
).images[0]
image.save("output.png")
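To check the VRAM figure on your own hardware, one option is to record PyTorch's peak allocation around the pipeline call. This is a sketch, not part of the repository: `track_peak_vram` is a hypothetical helper, and it reports only memory handed out by PyTorch's CUDA allocator, so tools such as nvidia-smi may show a higher number. It degrades to recording `None` when PyTorch or CUDA is unavailable.

```python
# Record peak CUDA memory allocated by PyTorch inside a `with` block.
import contextlib

try:
    import torch
except ImportError:  # lets the helper be imported on machines without PyTorch
    torch = None


@contextlib.contextmanager
def track_peak_vram(label, results):
    """Store the peak allocation in GiB under results[label] (None without CUDA)."""
    use_cuda = torch is not None and torch.cuda.is_available()
    if use_cuda:
        torch.cuda.synchronize()
        torch.cuda.reset_peak_memory_stats()
    try:
        yield
    finally:
        if use_cuda:
            torch.cuda.synchronize()
            results[label] = torch.cuda.max_memory_allocated() / 1024**3
        else:
            results[label] = None


results = {}
with track_peak_vram("demo", results):
    pass  # replace with the pipeline(...) call you want to measure
print(results)
```

Wrapping the `pipeline(...)` call from the example above in `with track_peak_vram("full", results):` gives a number to compare against the heading, bearing in mind that it excludes weights loaded before the block only if they were freed, since peak allocation counts everything resident during the block.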

Split Pipeline Mode (≈ 7.7 GB VRAM)

import gc

import torch
from diffusers import FluxPipeline, FluxTransformer2DModel
from transformers import T5EncoderModel

ckpt_4bit_id = "aniketppanchal/flux.1-dev-nf4-pkg"
prompt = "A cat holding a sign that says hello world"
height = 1024
width = 1024

# ----------Encode Prompt Embeddings----------

text_encoder_2 = T5EncoderModel.from_pretrained(
    ckpt_4bit_id,
    subfolder="text_encoder_2",
    torch_dtype=torch.float16,
    device_map="cuda",
)
pipeline = FluxPipeline.from_pretrained(
    ckpt_4bit_id,
    text_encoder_2=text_encoder_2,
    transformer=None,
    vae=None,
    torch_dtype=torch.float16,
    device_map="cuda",
)

with torch.no_grad():
    prompt_embeds, pooled_prompt_embeds, _ = pipeline.encode_prompt(
        prompt=prompt,
        max_sequence_length=512,
    )

del text_encoder_2, pipeline
gc.collect()
torch.cuda.empty_cache()

# ----------Generate Diffusion Latents----------

transformer = FluxTransformer2DModel.from_pretrained(
    ckpt_4bit_id,
    subfolder="transformer",
    torch_dtype=torch.float16,
    device_map="cuda",
)
pipeline = FluxPipeline.from_pretrained(
    ckpt_4bit_id,
    text_encoder=None,
    text_encoder_2=None,
    tokenizer=None,
    tokenizer_2=None,
    transformer=transformer,
    vae=None,
    torch_dtype=torch.float16,
    device_map="cuda",
)

packed_latents = pipeline(
    height=height,
    width=width,
    num_inference_steps=28,
    guidance_scale=3.5,
    prompt_embeds=prompt_embeds,
    pooled_prompt_embeds=pooled_prompt_embeds,
    output_type="latent",
    max_sequence_length=512,
).images

del prompt_embeds, pooled_prompt_embeds, transformer, pipeline
gc.collect()
torch.cuda.empty_cache()

# ----------Decode Latents to Image----------

pipeline = FluxPipeline.from_pretrained(
    ckpt_4bit_id,
    text_encoder=None,
    text_encoder_2=None,
    tokenizer=None,
    tokenizer_2=None,
    transformer=None,
    torch_dtype=torch.float16,
    device_map="cuda",
)

unpacked_latents = (
    pipeline._unpack_latents(
        packed_latents,
        height=height,
        width=width,
        vae_scale_factor=pipeline.vae_scale_factor,
    )
    / pipeline.vae.config.scaling_factor
    + pipeline.vae.config.shift_factor
)

with torch.no_grad():
    image_tensor = pipeline.vae.decode(unpacked_latents, return_dict=False)[0]

image = pipeline.image_processor.postprocess(image_tensor)[0]
image.save("output.png")

del packed_latents, unpacked_latents, image_tensor, pipeline
gc.collect()
torch.cuda.empty_cache()
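The decode stage above depends on the packed latent layout that the transformer stage returns with `output_type="latent"`. The sketch below reproduces only the shape arithmetic, to show what `_unpack_latents` is expected to do for a given image size. It assumes a VAE scale factor of 8 and 16 latent channels, which is my understanding of FLUX.1 [dev] with the pinned diffusers version; `flux_latent_shapes` is a hypothetical helper, and `_unpack_latents` is a private method whose behavior may change between releases.

```python
# Shape bookkeeping for Flux latent packing; follows the arithmetic used by
# FluxPipeline._unpack_latents in the pinned diffusers version (assumption).
def flux_latent_shapes(height, width, vae_scale_factor=8, latent_channels=16, batch_size=1):
    """Return (packed_shape, unpacked_shape) for an image of the given pixel size."""
    # Latent grid size, forced to be even because latents are packed in 2x2 patches.
    latent_h = 2 * (height // (vae_scale_factor * 2))
    latent_w = 2 * (width // (vae_scale_factor * 2))
    # Packed: one token per 2x2 patch, with the 4 patch positions folded into channels.
    packed = (batch_size, (latent_h // 2) * (latent_w // 2), latent_channels * 4)
    # Unpacked: the usual (batch, channels, height, width) layout the VAE decodes.
    unpacked = (batch_size, latent_channels, latent_h, latent_w)
    return packed, unpacked


packed, unpacked = flux_latent_shapes(1024, 1024)
print(packed)    # (1, 4096, 64)
print(unpacked)  # (1, 16, 128, 128)
```

Comparing these shapes with `packed_latents.shape` and `unpacked_latents.shape` is a quick way to catch a mismatched `height` or `width` between stages.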

License

This repository is released under the FLUX.1 [dev] Non-Commercial License. The included LICENSE.md file is a snapshot of the license in the original repository as of 3 November 2025. For the latest version, see the FLUX.1 [dev] License.
