---
license: other
license_name: flux-1-dev-non-commercial-license
library_name: diffusers
pipeline_tag: text-to-image
tags:
- flux
- text-to-image
- image-generation
- fp16
---

<!-- README Version: v1.4 -->

# FLUX.1-dev FP16

High-quality text-to-image generation model from Black Forest Labs. This repository contains the FLUX.1-dev model in FP16 (16-bit floating point) precision, without further quantization, for GPUs with enough VRAM to hold the weights at that size.

## Model Description

FLUX.1-dev is a state-of-the-art text-to-image diffusion model designed for high-fidelity image generation. This FP16 version stores the weights in half precision rather than a lower-bit format such as FP8, so it avoids quantization loss at the cost of more VRAM. It suits creative professionals and researchers who prioritize image quality.

**Key Capabilities**:
- High-resolution text-to-image generation
- Advanced prompt understanding with T5-XXL text encoder
- Superior detail and coherence in generated images
- Wide range of artistic styles and subjects
- Multi-text encoder architecture (CLIP + T5)

## Repository Contents

```
flux-dev-fp16/
├── checkpoints/flux/
│   └── flux1-dev-fp16.safetensors      # 23 GB - Complete model checkpoint
├── clip/
│   └── t5xxl_fp16.safetensors          # 9.2 GB - T5-XXL text encoder
├── clip_vision/
│   └── clip_vision_h.safetensors       # CLIP vision encoder
├── diffusion_models/flux/
│   └── flux1-dev-fp16.safetensors      # 23 GB - Diffusion model
├── text_encoders/
│   ├── clip-vit-large.safetensors      # 1.6 GB - CLIP ViT-Large encoder
│   ├── clip_g.safetensors              # 1.3 GB - CLIP-G encoder
│   ├── clip_l.safetensors              # 235 MB - CLIP-L encoder
│   └── t5xxl_fp16.safetensors          # 9.2 GB - T5-XXL encoder
└── vae/flux/
    └── flux-vae-bf16.safetensors       # 160 MB - VAE decoder (BF16)

Total Size: ~72 GB
```

## Hardware Requirements

### Minimum Requirements
- **VRAM**: 24 GB (RTX 3090, RTX 4090, A5000); the 23 GB diffusion model and the 9.2 GB T5-XXL encoder do not fit together at this size, so expect to use CPU offloading
- **RAM**: 32 GB system memory
- **Disk Space**: 80 GB free space
- **GPU**: NVIDIA GPU with Compute Capability 7.0+ (Volta or newer)

### Recommended Requirements
- **VRAM**: 32+ GB (RTX 6000 Ada, A6000, H100)
- **RAM**: 64 GB system memory
- **Disk Space**: 100+ GB for workspace and outputs
- **GPU**: Professional or data-center NVIDIA GPU (a 24 GB RTX 4090 meets only the minimum tier)

### Performance Notes
- FP16 gives the best quality of the available precisions but also the highest VRAM usage
- Consider the FP8 version if VRAM is limited (see `flux-dev-fp8` directory)
- Generation time: ~30-60 seconds per image at 1024x1024 (depending on GPU)

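
The VRAM tiers above can be related to the file sizes in the repository listing with a little arithmetic. The sketch below is a rough heuristic, not a measurement: it counts only weight sizes, and the `HEADROOM_GB` allowance and the `suggest_strategy` helper are hypothetical names introduced for illustration.

```python
# Component sizes in GB, copied from the repository listing in this README.
COMPONENT_SIZES_GB = {
    "diffusion_model": 23.0,
    "t5xxl_encoder": 9.2,
    "clip_l_encoder": 0.235,
    "vae": 0.16,
}

# Hypothetical allowance for activations and CUDA overhead; tune for your setup.
HEADROOM_GB = 2.0


def suggest_strategy(vram_gb: float, headroom_gb: float = HEADROOM_GB) -> str:
    """Pick a loading strategy from the weight sizes alone (a rough heuristic)."""
    total = sum(COMPONENT_SIZES_GB.values())
    largest = max(COMPONENT_SIZES_GB.values())
    if total + headroom_gb <= vram_gb:
        # Every component can stay resident on the GPU.
        return "all components on GPU"
    if largest + headroom_gb <= vram_gb:
        # One component at a time fits, so whole-model offloading is enough.
        return "enable_model_cpu_offload"
    # Even the largest component is too tight; offload layer by layer.
    return "enable_sequential_cpu_offload"


if __name__ == "__main__":
    for vram in (24, 32, 48):
        print(f"{vram} GB VRAM -> {suggest_strategy(vram)}")
```

The returned strings name the Diffusers pipeline methods of the same name; actual memory use also depends on resolution, batch size, and driver overhead, so treat the result as a starting point.
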
## Usage Examples

### Using with Diffusers Library

```python
import torch
from diffusers import FluxPipeline

# Load the pipeline with local model files.
# from_pretrained expects a Diffusers-format folder (one containing
# model_index.json) or a Hub repo ID such as "black-forest-labs/FLUX.1-dev".
pipe = FluxPipeline.from_pretrained(
    "E:/huggingface/flux-dev-fp16",
    torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Generate an image
prompt = "A majestic lion standing on a cliff at sunset, cinematic lighting, photorealistic"
image = pipe(
    prompt=prompt,
    num_inference_steps=50,
    guidance_scale=3.5,  # FLUX.1-dev is guidance-distilled; 3.5 is the value in the official example
    height=1024,
    width=1024
).images[0]

image.save("output.png")
```

### Using with ComfyUI

1. Copy model files to ComfyUI directories:
   - `checkpoints/flux/flux1-dev-fp16.safetensors` → `ComfyUI/models/checkpoints/`
   - `text_encoders/*.safetensors` → `ComfyUI/models/clip/`
   - `vae/flux/flux-vae-bf16.safetensors` → `ComfyUI/models/vae/`

2. In ComfyUI:
   - Load Checkpoint: Select `flux1-dev-fp16`
   - Text Encoder: Automatically loaded
   - VAE: Select `flux-vae-bf16`

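
The copy step in item 1 can be scripted. The following is a minimal sketch using only the Python standard library; the `COPY_PLAN` mapping mirrors the three bullets above, while the `copy_models` helper and its `dry_run` flag are hypothetical names chosen for illustration.

```python
import shutil
from pathlib import Path

# Source pattern (relative to this repository) -> ComfyUI model subfolder,
# mirroring the copy list in step 1.
COPY_PLAN = {
    "checkpoints/flux/flux1-dev-fp16.safetensors": "checkpoints",
    "text_encoders/*.safetensors": "clip",
    "vae/flux/flux-vae-bf16.safetensors": "vae",
}


def copy_models(repo_dir, comfyui_dir, dry_run=True):
    """Return the planned (source, destination) pairs; copy them unless dry_run."""
    repo = Path(repo_dir)
    models = Path(comfyui_dir) / "models"
    planned = []
    for pattern, subfolder in COPY_PLAN.items():
        # glob() also matches literal paths, yielding them only if they exist.
        for src in sorted(repo.glob(pattern)):
            dst = models / subfolder / src.name
            planned.append((src, dst))
            if not dry_run:
                dst.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dst)
    return planned
```

Calling `copy_models("E:/huggingface/flux-dev-fp16", "<path to ComfyUI>")` first lists what would be copied; passing `dry_run=False` performs the copy. The files are large, so a dry run is a cheap way to check the paths before committing disk space.
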
### Using Individual Components

```python
import torch
from diffusers import AutoencoderKL
from transformers import T5EncoderModel, CLIPTextModel

# Load text encoders.
# from_pretrained takes no `filename` argument; it needs a folder or Hub repo
# with config.json plus weights. The official repo layout is assumed here.
t5_encoder = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="text_encoder_2",
    torch_dtype=torch.float16
)

clip_encoder = CLIPTextModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="text_encoder",
    torch_dtype=torch.float16
)

# Load VAE from the single-file checkpoint; the config is taken from the
# official repo so the 16-channel FLUX VAE layout is used.
vae = AutoencoderKL.from_single_file(
    "E:/huggingface/flux-dev-fp16/vae/flux/flux-vae-bf16.safetensors",
    config="black-forest-labs/FLUX.1-dev",
    subfolder="vae",
    torch_dtype=torch.bfloat16
)
```

## Model Specifications

**Architecture**:
- **Type**: Latent Diffusion Transformer
- **Parameters**: ~12B (diffusion model)
- **Text Encoders**:
  - T5-XXL: ~4.7B parameters (FP16)
  - CLIP-G: 1.3 GB file, roughly 0.7B parameters at FP16 (bundled here; FLUX.1 itself conditions only on CLIP-L and T5-XXL)
  - CLIP-L: 235 MB file, roughly 120M parameters at FP16
- **VAE**: BF16 precision (160 MB file, roughly 80M parameters)

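
Parameter counts and file sizes are easy to confuse, because a 16-bit checkpoint stores two bytes per parameter. As a rough cross-check, the sketch below derives approximate parameter counts from the file sizes in the repository listing. It assumes the sizes are decimal GB/MB and ignores file metadata; `estimate_params` is a hypothetical helper.

```python
# Bytes used to store one parameter in each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def estimate_params(file_size_bytes: float, dtype: str) -> float:
    """Approximate parameter count: file size divided by bytes per parameter."""
    return file_size_bytes / BYTES_PER_PARAM[dtype]


# (name, file size in bytes, precision), sizes taken from the repository listing.
FILES = [
    ("flux1-dev-fp16", 23e9, "fp16"),
    ("t5xxl_fp16", 9.2e9, "fp16"),
    ("clip_g", 1.3e9, "fp16"),
    ("clip_l", 235e6, "fp16"),
    ("flux-vae-bf16", 160e6, "bf16"),
]

if __name__ == "__main__":
    for name, size, dtype in FILES:
        billions = estimate_params(size, dtype) / 1e9
        print(f"{name}: ~{billions:.2f}B parameters")
```

The estimates land close to the published figures for the diffusion model and T5-XXL, which suggests the smaller numbers in the listing are file sizes rather than parameter counts.
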
**Precision**:
- **Diffusion Model**: FP16 (float16)
- **Text Encoders**: FP16 (float16)
- **VAE**: BF16 (bfloat16)

**Format**:
- `.safetensors` - Secure tensor format with fast loading

**Resolution Support**:
- Native: 1024x1024
- Range: 512x512 to 2048x2048
- Aspect ratios: Supports non-square resolutions

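
When choosing a non-square size, it helps to keep both dimensions inside the supported range and aligned to the model's latent grid. The sketch below clamps to the 512-2048 range listed above and rounds down to a multiple of 16. The multiple of 16 is an assumption based on the 8x VAE downsampling combined with 2x2 latent patching in the Diffusers FLUX pipeline, and `snap_resolution` is a hypothetical helper.

```python
def snap_resolution(width: int, height: int, multiple: int = 16,
                    lo: int = 512, hi: int = 2048) -> tuple:
    """Clamp each side to [lo, hi], then round it down to a multiple of `multiple`."""
    def snap(value: int) -> int:
        clamped = max(lo, min(hi, value))
        return (clamped // multiple) * multiple

    return snap(width), snap(height)


if __name__ == "__main__":
    # A 16:9 request: the width is already aligned, the height is rounded down.
    print(snap_resolution(1920, 1080))
```

The resulting pair can be passed as `width` and `height` to the pipeline call.
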
## Performance Tips

### Memory Optimization
```python
# Offload whole components to the CPU between uses (moderate slowdown).
# Call this instead of pipe.to("cuda").
pipe.enable_model_cpu_offload()

# Enable VAE tiling for high resolutions
pipe.vae.enable_tiling()

# If VRAM is still too limited, offload layer by layer instead (much slower);
# use this in place of enable_model_cpu_offload()
# pipe.enable_sequential_cpu_offload()
```

### Speed Optimization
```python
# Use torch.compile for faster inference (PyTorch 2.0+).
# FluxPipeline exposes its denoiser as `transformer`; it has no `unet`.
pipe.transformer = torch.compile(pipe.transformer, mode="reduce-overhead", fullgraph=True)

# Reduce inference steps (trade quality for speed)
image = pipe(prompt, num_inference_steps=25).images[0]  # The usage example above uses 50
```

### Quality Optimization
- Use 50-75 inference steps for best quality
- Guidance scale: around 3.5 for balanced results (FLUX.1-dev is guidance-distilled, so the 7-9 range typical of Stable Diffusion is too high)
- Raise guidance gradually from there for stronger prompt adherence
- Consider prompt engineering for better results

## License

The FLUX.1-dev weights are released by Black Forest Labs under the **FLUX.1 [dev] Non-Commercial License**.

**Usage Terms**:
- Non-commercial use, modification, and redistribution are allowed under the license terms
- Commercial use of the model requires a separate license from Black Forest Labs
- Requires attribution to Black Forest Labs

See the LICENSE file for full terms.

## Citation

If you use this model in your research or projects, please cite:

```bibtex
@misc{flux-dev,
  title={FLUX.1-dev: High-Quality Text-to-Image Generation},
  author={Black Forest Labs},
  year={2024},
  howpublished={\url{https://blackforestlabs.ai/}}
}
```

## Related Resources

- **Official Website**: https://blackforestlabs.ai/
- **Model Card**: https://huggingface.co/black-forest-labs/FLUX.1-dev
- **Documentation**: https://huggingface.co/docs/diffusers/en/api/pipelines/flux
- **Community**: https://huggingface.co/black-forest-labs

## Version Information

- **Model Version**: FLUX.1-dev
- **Precision**: FP16
- **Release**: 2024
- **README Version**: v1.4

---

For the FP8 precision version (lower VRAM usage), see `E:/huggingface/flux-dev-fp8/`