---
license: other
license_name: flux-1-dev-non-commercial-license
library_name: diffusers
pipeline_tag: text-to-image
tags:
- flux
- text-to-image
- image-generation
- fp16
---
<!-- README Version: v1.4 -->
# FLUX.1-dev FP16
FLUX.1-dev is a text-to-image generation model from Black Forest Labs. This repository contains its weights in FP16 (16-bit floating point) precision, the higher-quality, higher-VRAM option alongside the FP8 variant.
## Model Description
FLUX.1-dev is a text-to-image diffusion model designed for high-fidelity image generation. This FP16 version keeps the weights in 16-bit floating point instead of quantizing them to FP8, trading higher VRAM use for output quality. It is aimed at creative professionals and researchers who prioritize image quality.
**Key Capabilities**:
- High-resolution text-to-image generation
- Advanced prompt understanding with T5-XXL text encoder
- Superior detail and coherence in generated images
- Wide range of artistic styles and subjects
- Multi-text encoder architecture (CLIP + T5)
## Repository Contents
```
flux-dev-fp16/
├── checkpoints/flux/
│   └── flux1-dev-fp16.safetensors    # 23 GB - Complete model checkpoint
├── clip/
│   └── t5xxl_fp16.safetensors        # 9.2 GB - T5-XXL text encoder
├── clip_vision/
│   └── clip_vision_h.safetensors     # CLIP vision encoder
├── diffusion_models/flux/
│   └── flux1-dev-fp16.safetensors    # 23 GB - Diffusion model
├── text_encoders/
│   ├── clip-vit-large.safetensors    # 1.6 GB - CLIP ViT-Large encoder
│   ├── clip_g.safetensors            # 1.3 GB - CLIP-G encoder
│   ├── clip_l.safetensors            # 235 MB - CLIP-L encoder
│   └── t5xxl_fp16.safetensors        # 9.2 GB - T5-XXL encoder
└── vae/flux/
    └── flux-vae-bf16.safetensors     # 160 MB - VAE decoder (BF16)

Total Size: ~72 GB
```
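To check a local download against the listing above, a short script can walk the folder and report each `.safetensors` file with its size. This is a minimal sketch using only the standard library; the helper names `list_safetensors` and `total_size_gb` are made up for illustration, and sizes are reported in decimal gigabytes, which may differ slightly from the rounded figures in the tree.

```python
from pathlib import Path


def list_safetensors(root):
    """Return (relative_path, size_in_GB) for every .safetensors file under root."""
    root = Path(root)
    rows = []
    for path in sorted(root.rglob("*.safetensors")):
        size_gb = path.stat().st_size / 1e9  # decimal gigabytes
        rows.append((path.relative_to(root).as_posix(), size_gb))
    return rows


def total_size_gb(rows):
    """Sum the sizes returned by list_safetensors."""
    return sum(size for _, size in rows)


def format_report(rows):
    """Render one line per file plus a total line."""
    lines = [f"{name:<60} {size:8.2f} GB" for name, size in rows]
    lines.append(f"{'Total':<60} {total_size_gb(rows):8.2f} GB")
    return "\n".join(lines)
```

Calling `print(format_report(list_safetensors("E:/huggingface/flux-dev-fp16")))` prints the inventory for the path used elsewhere in this README; adjust the path to your own checkout.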
## Hardware Requirements
### Minimum Requirements
- **VRAM**: 24 GB (e.g. RTX 3090, RTX 4090, RTX A5000)
- **RAM**: 32 GB system memory
- **Disk Space**: 80 GB free space
- **GPU**: NVIDIA GPU with Compute Capability 7.0+ (Volta or newer)
### Recommended Requirements
- **VRAM**: 32+ GB (RTX 6000 Ada, A6000, H100)
- **RAM**: 64 GB system memory
- **Disk Space**: 100+ GB for workspace and outputs
- **GPU**: NVIDIA professional or data-center GPU with 32+ GB VRAM
### Performance Notes
- Compared with the FP8 version, FP16 gives the best quality at the cost of the highest VRAM usage
- Consider FP8 version if VRAM is limited (see `flux-dev-fp8` directory)
- Generation time: ~30-60 seconds per image at 1024x1024 (depending on GPU)
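The choice between the FP16 and FP8 variants can be expressed as a small lookup on available VRAM. The sketch below is illustrative only: `pick_variant` is a hypothetical helper, and its thresholds simply mirror the 24 GB minimum and 32 GB recommendation stated above rather than any measured limit.

```python
def pick_variant(vram_gb):
    """Suggest a checkpoint variant from the available VRAM in GB.

    Thresholds follow this README: 32 GB recommended and 24 GB minimum
    for FP16; below 24 GB the FP8 build is suggested instead.
    """
    if vram_gb >= 32:
        return "fp16"
    if vram_gb >= 24:
        return "fp16-minimum"  # meets the minimum; memory optimizations may help
    return "fp8"


# With PyTorch installed, the VRAM of the first GPU can be read like this:
#   import torch
#   vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
#   print(pick_variant(vram_gb))
```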
## Usage Examples
### Using with Diffusers Library
```python
import torch
from diffusers import FluxPipeline

# Load the pipeline from local model files. from_pretrained expects a
# Diffusers-format directory (one that contains model_index.json).
pipe = FluxPipeline.from_pretrained(
    "E:/huggingface/flux-dev-fp16",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Generate an image
prompt = "A majestic lion standing on a cliff at sunset, cinematic lighting, photorealistic"
image = pipe(
    prompt=prompt,
    num_inference_steps=50,
    guidance_scale=3.5,  # value used in the upstream FLUX.1-dev example
    height=1024,
    width=1024,
).images[0]
image.save("output.png")
```
### Using with ComfyUI
1. Copy model files to ComfyUI directories:
   - `checkpoints/flux/flux1-dev-fp16.safetensors` → `ComfyUI/models/checkpoints/`
   - `text_encoders/*.safetensors` → `ComfyUI/models/clip/`
   - `vae/flux/flux-vae-bf16.safetensors` → `ComfyUI/models/vae/`
2. In ComfyUI:
   - Load Checkpoint: Select `flux1-dev-fp16`
   - Text Encoder: Automatically loaded
   - VAE: Select `flux-vae-bf16`
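The copy step can also be scripted. The following is a minimal sketch, assuming the repository layout shown under Repository Contents and a standard `ComfyUI/models/` tree; `COPY_PLAN` and `copy_to_comfyui` are illustrative names, not part of ComfyUI. It defaults to a dry run so the planned targets can be reviewed before any file is written.

```python
import shutil
from pathlib import Path

# Source pattern (relative to this repo) -> target folder (relative to ComfyUI).
COPY_PLAN = [
    ("checkpoints/flux/flux1-dev-fp16.safetensors", "models/checkpoints"),
    ("text_encoders/*.safetensors", "models/clip"),
    ("vae/flux/flux-vae-bf16.safetensors", "models/vae"),
]


def copy_to_comfyui(repo_dir, comfyui_dir, dry_run=True):
    """Copy the files listed in COPY_PLAN; return the target paths.

    With dry_run=True nothing is written and only the planned targets
    are returned.
    """
    repo_dir, comfyui_dir = Path(repo_dir), Path(comfyui_dir)
    targets = []
    for pattern, target in COPY_PLAN:
        target_dir = comfyui_dir / target
        for src in sorted(repo_dir.glob(pattern)):
            dst = target_dir / src.name
            if not dry_run:
                target_dir.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dst)
            targets.append(dst)
    return targets
```

For example, `copy_to_comfyui("E:/huggingface/flux-dev-fp16", "C:/ComfyUI")` lists the planned targets (the ComfyUI path here is a placeholder), and passing `dry_run=False` performs the copy.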
### Using Individual Components
```python
import torch
from diffusers import AutoencoderKL
from transformers import CLIPTextModel, T5EncoderModel

# from_pretrained needs a config next to the weights, so it cannot load the bare
# .safetensors files in this repo by filename. This sketch assumes a
# Diffusers-format copy of the model (here the upstream repo, which is gated).
MODEL_ID = "black-forest-labs/FLUX.1-dev"

# Load text encoders
t5_encoder = T5EncoderModel.from_pretrained(
    MODEL_ID, subfolder="text_encoder_2", torch_dtype=torch.float16
)
clip_encoder = CLIPTextModel.from_pretrained(
    MODEL_ID, subfolder="text_encoder", torch_dtype=torch.float16
)

# Load VAE
vae = AutoencoderKL.from_pretrained(
    MODEL_ID, subfolder="vae", torch_dtype=torch.bfloat16
)
```
## Model Specifications
**Architecture**:
- **Type**: Latent Diffusion Transformer
- **Parameters**: ~12B (diffusion model)
- **Text Encoders**:
  - T5-XXL: ~4.7B parameters (FP16)
  - CLIP-G: 1.3 GB checkpoint
  - CLIP-L: 235 MB checkpoint
- **VAE**: BF16 precision (160 MB checkpoint)
**Precision**:
- **Diffusion Model**: FP16 (float16)
- **Text Encoders**: FP16 (float16)
- **VAE**: BF16 (bfloat16)
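
The precision of each component largely determines its size on disk and in VRAM: weight memory is roughly the parameter count times the bytes per parameter. The sketch below works through that arithmetic for the approximate parameter counts listed above. `weight_size_gb` is an illustrative helper, and the results are estimates in decimal gigabytes that ignore activations and file-format overhead.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def weight_size_gb(num_params, precision):
    """Estimate weight storage in decimal GB for a given parameter count."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# ~12B-parameter diffusion model
flux_fp16 = weight_size_gb(12e9, "fp16")  # about 24 GB
flux_fp8 = weight_size_gb(12e9, "fp8")    # about 12 GB

# ~4.7B-parameter T5-XXL text encoder
t5_fp16 = weight_size_gb(4.7e9, "fp16")   # about 9.4 GB
```

These estimates land close to the 23 GB and 9.2 GB file sizes listed under Repository Contents, and they show why the FP8 variant needs roughly half the memory for the diffusion model.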
**Format**:
- `.safetensors` - Secure tensor format with fast loading
**Resolution Support**:
- Native: 1024x1024
- Range: 512x512 to 2048x2048
- Aspect ratios: Supports non-square resolutions
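
When requesting a non-square size, it can help to normalize the dimensions before calling the pipeline. The sketch below clamps each side to the 512-2048 range given above and rounds it down to a multiple of 16. The multiple of 16 is an assumption about what the Diffusers FLUX pipeline expects, not something stated in this README, and `snap_resolution` is a hypothetical helper.

```python
MIN_SIDE, MAX_SIDE = 512, 2048  # range stated in this README
MULTIPLE = 16  # assumption: each side should be divisible by 16


def snap_resolution(width, height):
    """Clamp each side to the supported range and round down to MULTIPLE."""

    def snap(value):
        clamped = max(MIN_SIDE, min(MAX_SIDE, int(value)))
        return (clamped // MULTIPLE) * MULTIPLE

    return snap(width), snap(height)
```

For instance, `snap_resolution(1920, 1080)` returns `(1920, 1072)`, which keeps a near-16:9 aspect ratio with both sides divisible by 16.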
## Performance Tips
### Memory Optimization
```python
# Compute attention in slices to lower peak memory (slower)
pipe.enable_attention_slicing()
# Enable VAE tiling for high resolutions
pipe.enable_vae_tiling()
# Use CPU offloading if VRAM is limited (slower); call this instead of pipe.to("cuda")
pipe.enable_sequential_cpu_offload()
```
### Speed Optimization
```python
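# Use torch.compile for faster inference (PyTorch 2.0+)
# FluxPipeline exposes its denoiser as `transformer`; it has no `unet` attribute
pipe.transformer = torch.compile(pipe.transformer, mode="reduce-overhead", fullgraph=True)
# Reduce inference steps (trade quality for speed)
image = pipe(prompt, num_inference_steps=25).images[0]  # the usage example above uses 50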
```
### Quality Optimization
- Use 50-75 inference steps for best quality
- Guidance scale: start from 3.5, the value used in the upstream FLUX.1-dev example
- Raise guidance gradually for stronger prompt adherence, comparing outputs as you go
- Consider prompt engineering for better results
## License
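The upstream FLUX.1-dev weights are released by Black Forest Labs under the **FLUX.1 [dev] Non-Commercial License**, not Apache 2.0.
**Usage Terms**:
- ⚠️ Commercial use of the model is restricted by the non-commercial license
- ⚠️ Modification and redistribution are subject to the license terms
- ⚠️ Requires attribution to Black Forest Labs
See the LICENSE file and the upstream model card for full terms.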
## Citation
If you use this model in your research or projects, please cite:
```bibtex
@misc{flux-dev,
title={FLUX.1-dev: High-Quality Text-to-Image Generation},
author={Black Forest Labs},
year={2024},
howpublished={\url{https://blackforestlabs.ai/}}
}
```
## Related Resources
- **Official Website**: https://blackforestlabs.ai/
- **Model Card**: https://huggingface.co/black-forest-labs/FLUX.1-dev
- **Documentation**: https://huggingface.co/docs/diffusers/en/api/pipelines/flux
- **Community**: https://huggingface.co/black-forest-labs
## Version Information
- **Model Version**: FLUX.1-dev
- **Precision**: FP16
- **Release**: 2024
- **README Version**: v1.4
---
For the FP8-precision version (lower VRAM usage), see `E:/huggingface/flux-dev-fp8/`.