---
license: apache-2.0
library_name: diffusers
pipeline_tag: text-to-image
tags:
- flux
- text-to-image
- image-generation
- fp16
---
<!-- README Version: v1.4 -->
# FLUX.1-dev FP16
High-quality text-to-image generation model from Black Forest Labs. This repository contains the FLUX.1-dev model in FP16 precision for optimal quality and compatibility with modern GPUs.
## Model Description
FLUX.1-dev is a state-of-the-art text-to-image diffusion model designed for high-fidelity image generation. This FP16 version maintains full precision for maximum quality output, ideal for creative professionals and researchers requiring the highest image quality.
**Key Capabilities**:
- High-resolution text-to-image generation
- Advanced prompt understanding with T5-XXL text encoder
- Superior detail and coherence in generated images
- Wide range of artistic styles and subjects
- Multi-text encoder architecture (CLIP + T5)
## Repository Contents
```
flux-dev-fp16/
├── checkpoints/flux/
│   └── flux1-dev-fp16.safetensors    # 23 GB  - Complete model checkpoint
├── clip/
│   └── t5xxl_fp16.safetensors        # 9.2 GB - T5-XXL text encoder
├── clip_vision/
│   └── clip_vision_h.safetensors     # CLIP vision encoder
├── diffusion_models/flux/
│   └── flux1-dev-fp16.safetensors    # 23 GB  - Diffusion model
├── text_encoders/
│   ├── clip-vit-large.safetensors    # 1.6 GB - CLIP ViT-Large encoder
│   ├── clip_g.safetensors            # 1.3 GB - CLIP-G encoder
│   ├── clip_l.safetensors            # 235 MB - CLIP-L encoder
│   └── t5xxl_fp16.safetensors        # 9.2 GB - T5-XXL encoder
└── vae/flux/
    └── flux-vae-bf16.safetensors     # 160 MB - VAE decoder (BF16)

Total Size: ~72 GB
```
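To sanity-check a download against the listing above, a small helper can tally every `.safetensors` file under the repository root (the root path is wherever you placed the files):

```python
from pathlib import Path


def list_weights(root: str) -> dict[str, float]:
    """Map each .safetensors file (relative path) to its size in GB."""
    return {
        str(p.relative_to(root)): p.stat().st_size / 1024**3
        for p in sorted(Path(root).rglob("*.safetensors"))
    }


# Example: print each file and the grand total.
if __name__ == "__main__":
    sizes = list_weights("E:/huggingface/flux-dev-fp16")
    for name, gb in sizes.items():
        print(f"{name}: {gb:.2f} GB")
    print(f"Total: {sum(sizes.values()):.1f} GB")  # should be ~72 GB
```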
## Hardware Requirements
### Minimum Requirements
- **VRAM**: 24 GB (RTX 3090, RTX 4090, A5000, A6000)
- **RAM**: 32 GB system memory
- **Disk Space**: 80 GB free space
- **GPU**: NVIDIA GPU with Compute Capability 7.0+ (Volta or newer)
### Recommended Requirements
- **VRAM**: 32+ GB (RTX 6000 Ada, A6000, H100)
- **RAM**: 64 GB system memory
- **Disk Space**: 100+ GB for workspace and outputs
- **GPU**: NVIDIA RTX 4090 or professional GPUs
### Performance Notes
- FP16 precision provides best quality but highest VRAM usage
- Consider FP8 version if VRAM is limited (see `flux-dev-fp8` directory)
- Generation time: ~30-60 seconds per image at 1024x1024 (depending on GPU)
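The VRAM figures above follow directly from parameter counts: FP16 stores 2 bytes per parameter, so the weights alone occupy roughly the sizes below (activations, text encoders, and framework overhead add more on top):

```python
def weights_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Rough size of model weights in GB at the given precision
    (2 bytes/param for FP16 or BF16, 1 byte/param for FP8)."""
    return num_params * bytes_per_param / 1024**3


print(round(weights_gb(12e9), 1))   # ~22.4 GB for the 12B diffusion model
print(round(weights_gb(4.7e9), 1))  # ~8.8 GB for the T5-XXL encoder
```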
## Usage Examples
### Using with Diffusers Library
```python
import torch
from diffusers import FluxPipeline
# Load the pipeline with local model files
pipe = FluxPipeline.from_pretrained(
"E:/huggingface/flux-dev-fp16",
torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
# Generate an image
prompt = "A majestic lion standing on a cliff at sunset, cinematic lighting, photorealistic"
image = pipe(
prompt=prompt,
num_inference_steps=50,
guidance_scale=7.5,
height=1024,
width=1024
).images[0]
image.save("output.png")
```
### Using with ComfyUI
1. Copy model files to ComfyUI directories:
   - `checkpoints/flux/flux1-dev-fp16.safetensors` → `ComfyUI/models/checkpoints/`
   - `text_encoders/*.safetensors` → `ComfyUI/models/clip/`
   - `vae/flux/flux-vae-bf16.safetensors` → `ComfyUI/models/vae/`
2. In ComfyUI:
- Load Checkpoint: Select `flux1-dev-fp16`
- Text Encoder: Automatically loaded
- VAE: Select `flux-vae-bf16`
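The copy steps above can also be scripted. This is a sketch: both path arguments are placeholders for your download location and ComfyUI install, and missing source files are simply skipped so the script is safe to re-run:

```python
import shutil
from pathlib import Path


def install_into_comfyui(src, comfy_models):
    """Copy FLUX files from this repo (src) into ComfyUI/models (comfy_models).

    Missing sources are skipped; returns the names of files copied.
    """
    src, comfy_models = Path(src), Path(comfy_models)
    plan = [
        (src / "checkpoints/flux/flux1-dev-fp16.safetensors", "checkpoints"),
        (src / "vae/flux/flux-vae-bf16.safetensors", "vae"),
    ]
    # Every text encoder goes into ComfyUI's clip folder.
    plan += [(p, "clip") for p in sorted(src.glob("text_encoders/*.safetensors"))]
    copied = []
    for source, dest_name in plan:
        if source.exists():
            dest = comfy_models / dest_name
            dest.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, dest / source.name)
            copied.append(source.name)
    return copied
```

Usage: `install_into_comfyui("E:/huggingface/flux-dev-fp16", "C:/ComfyUI/models")`.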
### Using Individual Components
```python
import torch
from diffusers import AutoencoderKL
from transformers import CLIPTextModel, T5EncoderModel

# Load text encoders. `from_pretrained` expects a directory containing the
# model's config.json and weights (it has no `filename` argument), so place
# each encoder's config alongside its checkpoint before loading.
t5_encoder = T5EncoderModel.from_pretrained(
    "E:/huggingface/flux-dev-fp16/text_encoders",
    torch_dtype=torch.float16
)
clip_encoder = CLIPTextModel.from_pretrained(
    "E:/huggingface/flux-dev-fp16/text_encoders",
    torch_dtype=torch.float16
)
# The VAE ships as a single checkpoint file, so load it with
# `from_single_file` instead.
vae = AutoencoderKL.from_single_file(
    "E:/huggingface/flux-dev-fp16/vae/flux/flux-vae-bf16.safetensors",
    torch_dtype=torch.bfloat16
)
```
## Model Specifications
**Architecture**:
- **Type**: Latent Diffusion Transformer
- **Parameters**: ~12B (diffusion model)
- **Text Encoders**:
- T5-XXL: 4.7B parameters (FP16)
- CLIP-G: 1.3B parameters
- CLIP-L: 235M parameters
- **VAE**: BF16 precision (160M parameters)
**Precision**:
- **Diffusion Model**: FP16 (float16)
- **Text Encoders**: FP16 (float16)
- **VAE**: BF16 (bfloat16)
**Format**:
- `.safetensors` - Secure tensor format with fast loading
**Resolution Support**:
- Native: 1024x1024
- Range: 512x512 to 2048x2048
- Aspect ratios: Supports non-square resolutions
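Because the VAE downsamples by 8x and the latents are then patched 2x2, requested dimensions are typically kept at multiples of 16. This small helper (an illustration, not part of the pipeline API) snaps an arbitrary size down to a valid one:

```python
def snap_to_multiple(dim: int, multiple: int = 16) -> int:
    """Round `dim` down to a multiple of `multiple`, with a floor of one
    multiple, so width/height stay compatible with the latent grid."""
    return max(multiple, (dim // multiple) * multiple)


print(snap_to_multiple(1000))  # 992
print(snap_to_multiple(1024))  # 1024
```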
## Performance Tips
### Memory Optimization
```python
# Enable memory efficient attention
pipe.enable_attention_slicing()
# Enable VAE tiling for high resolutions
pipe.enable_vae_tiling()
# Use CPU offloading if VRAM limited (slower)
pipe.enable_sequential_cpu_offload()
```
### Speed Optimization
```python
# Use torch.compile for faster inference (PyTorch 2.0+).
# FLUX uses a transformer backbone, so compile `pipe.transformer`
# (FluxPipeline has no `unet` attribute).
pipe.transformer = torch.compile(pipe.transformer, mode="reduce-overhead", fullgraph=True)
# Reduce inference steps (trade quality for speed)
image = pipe(prompt, num_inference_steps=25) # Default is 50
```
### Quality Optimization
- Use 50-75 inference steps for best quality
- Guidance scale: 7-9 for balanced results
- Higher guidance (10-15) for stronger prompt adherence
- Consider prompt engineering for better results
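The tips above can be condensed into a few reusable presets. The names and exact values here are illustrative, drawn from the step and guidance ranges listed in this section; tune them for your own prompts and hardware:

```python
# Illustrative presets built from the ranges above (25-75 steps,
# guidance 7-9). Unpack one into the pipeline call.
GENERATION_PRESETS = {
    "draft":    {"num_inference_steps": 25, "guidance_scale": 7.0},
    "balanced": {"num_inference_steps": 50, "guidance_scale": 8.0},
    "quality":  {"num_inference_steps": 75, "guidance_scale": 9.0},
}
```

Usage: `image = pipe(prompt, **GENERATION_PRESETS["balanced"]).images[0]`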
## License
This model is released under the **Apache 2.0 License**.
**Usage Terms**:
- ✅ Commercial use allowed
- ✅ Modification and redistribution allowed
- ✅ Patent use allowed
- ⚠️ Requires attribution to Black Forest Labs
See the LICENSE file for full terms.
## Citation
If you use this model in your research or projects, please cite:
```bibtex
@misc{flux-dev,
title={FLUX.1-dev: High-Quality Text-to-Image Generation},
author={Black Forest Labs},
year={2024},
howpublished={\url{https://blackforestlabs.ai/}}
}
```
## Related Resources
- **Official Website**: https://blackforestlabs.ai/
- **Model Card**: https://huggingface.co/black-forest-labs/FLUX.1-dev
- **Documentation**: https://huggingface.co/docs/diffusers/en/api/pipelines/flux
- **Community**: https://huggingface.co/black-forest-labs
## Version Information
- **Model Version**: FLUX.1-dev
- **Precision**: FP16
- **Release**: 2024
- **README Version**: v1.4
---
For FP8 precision version (lower VRAM usage), see `E:/huggingface/flux-dev-fp8/`