---
license: other
license_name: flux-1-dev-non-commercial-license
license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md
library_name: diffusers
pipeline_tag: text-to-image
tags:
  - flux
  - text-to-image
  - image-generation
  - fp16
---

<!-- README Version: v1.4 -->

# FLUX.1-dev FP16

High-quality text-to-image generation model from Black Forest Labs. This repository contains the FLUX.1-dev model in FP16 precision for optimal quality and compatibility with modern GPUs.

## Model Description

FLUX.1-dev is a state-of-the-art text-to-image diffusion model designed for high-fidelity image generation. This FP16 version maintains full precision for maximum quality output, ideal for creative professionals and researchers requiring the highest image quality.

**Key Capabilities**:
- High-resolution text-to-image generation
- Advanced prompt understanding with T5-XXL text encoder
- Superior detail and coherence in generated images
- Wide range of artistic styles and subjects
- Multi-text encoder architecture (CLIP + T5)
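The multi-encoder design means every prompt is encoded twice: CLIP-L contributes a single pooled vector summarizing the prompt, while T5-XXL contributes one embedding per token, which is what drives fine-grained prompt following. The toy sketch below shows only that shape contract; `fake_clip` and `fake_t5` are stand-ins invented for illustration, not the real encoders:

```python
# Toy stand-ins for the two text encoders (NOT the real models):
# the point is the output shapes, not the values.
def fake_clip(tokens, dim=4):
    # CLIP-L style output: one pooled vector for the whole prompt
    return [0.0] * dim

def fake_t5(tokens, dim=4):
    # T5-XXL style output: one embedding per token (fine-grained conditioning)
    return [[0.0] * dim for _ in tokens]

tokens = "a majestic lion at sunset".split()
pooled = fake_clip(tokens)      # length == dim
sequence = fake_t5(tokens)      # length == number of tokens
print(len(pooled), len(sequence))  # -> 4 5
```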

## Repository Contents

```
flux-dev-fp16/
β”œβ”€β”€ checkpoints/flux/
β”‚   └── flux1-dev-fp16.safetensors          # 23 GB - Complete model checkpoint
β”œβ”€β”€ clip/
β”‚   └── t5xxl_fp16.safetensors              # 9.2 GB - T5-XXL text encoder
β”œβ”€β”€ clip_vision/
β”‚   └── clip_vision_h.safetensors           # CLIP vision encoder
β”œβ”€β”€ diffusion_models/flux/
β”‚   └── flux1-dev-fp16.safetensors          # 23 GB - Diffusion model
β”œβ”€β”€ text_encoders/
β”‚   β”œβ”€β”€ clip-vit-large.safetensors          # 1.6 GB - CLIP ViT-Large encoder
β”‚   β”œβ”€β”€ clip_g.safetensors                  # 1.3 GB - CLIP-G encoder
β”‚   β”œβ”€β”€ clip_l.safetensors                  # 235 MB - CLIP-L encoder
β”‚   └── t5xxl_fp16.safetensors              # 9.2 GB - T5-XXL encoder
└── vae/flux/
    └── flux-vae-bf16.safetensors           # 160 MB - VAE decoder (BF16)

Total Size: ~72 GB
```

## Hardware Requirements

### Minimum Requirements
- **VRAM**: 24 GB (RTX 3090, RTX 4090, A5000, A6000)
- **RAM**: 32 GB system memory
- **Disk Space**: 80 GB free space
- **GPU**: NVIDIA GPU with Compute Capability 7.0+ (Volta or newer)

### Recommended Requirements
- **VRAM**: 32+ GB (RTX 6000 Ada, A6000, H100)
- **RAM**: 64 GB system memory
- **Disk Space**: 100+ GB for workspace and outputs
- **GPU**: NVIDIA RTX 4090 or professional GPUs

### Performance Notes
- FP16 precision provides best quality but highest VRAM usage
- Consider FP8 version if VRAM is limited (see `flux-dev-fp8` directory)
- Generation time: ~30-60 seconds per image at 1024x1024 (depending on GPU)

## Usage Examples

### Using with Diffusers Library

```python
import torch
from diffusers import FluxPipeline

# Load the pipeline with local model files
pipe = FluxPipeline.from_pretrained(
    "E:/huggingface/flux-dev-fp16",
    torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Generate an image
prompt = "A majestic lion standing on a cliff at sunset, cinematic lighting, photorealistic"
image = pipe(
    prompt=prompt,
    num_inference_steps=50,
    guidance_scale=7.5,
    height=1024,
    width=1024
).images[0]

image.save("output.png")
```

### Using with ComfyUI

1. Copy model files to ComfyUI directories:
   - `checkpoints/flux/flux1-dev-fp16.safetensors` β†’ `ComfyUI/models/checkpoints/`
   - `text_encoders/*.safetensors` β†’ `ComfyUI/models/clip/`
   - `vae/flux/flux-vae-bf16.safetensors` β†’ `ComfyUI/models/vae/`

2. In ComfyUI:
   - Load Checkpoint: Select `flux1-dev-fp16`
   - Text Encoder: Automatically loaded
   - VAE: Select `flux-vae-bf16`
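The copy steps above can be scripted. The sketch below assumes the directory layout from the Repository Contents section, skips any file that is missing so a dry run is harmless, and leaves both paths for you to adjust:

```python
import shutil
from pathlib import Path

def install_into_comfyui(src: Path, comfy: Path) -> list[Path]:
    """Copy the FLUX checkpoint, text encoders, and VAE into a ComfyUI models tree."""
    jobs = [
        (src / "checkpoints/flux/flux1-dev-fp16.safetensors", comfy / "models/checkpoints"),
        (src / "vae/flux/flux-vae-bf16.safetensors", comfy / "models/vae"),
    ]
    jobs += [(p, comfy / "models/clip")
             for p in sorted((src / "text_encoders").glob("*.safetensors"))]

    copied = []
    for source, dest_dir in jobs:
        if not source.exists():  # skip missing files so a dry run is safe
            continue
        dest_dir.mkdir(parents=True, exist_ok=True)
        copied.append(Path(shutil.copy2(source, dest_dir / source.name)))
    return copied

# Example (paths are assumptions; point them at your own install):
# install_into_comfyui(Path("E:/huggingface/flux-dev-fp16"), Path("ComfyUI"))
```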

### Using Individual Components

```python
import torch
from diffusers import AutoencoderKL
from transformers import CLIPTextModel, T5EncoderModel

# Text encoders: from_pretrained expects a directory containing a config.json
# alongside the weights (transformers does not take a `filename` argument)
t5_encoder = T5EncoderModel.from_pretrained(
    "E:/huggingface/flux-dev-fp16/text_encoders",
    torch_dtype=torch.float16
)

clip_encoder = CLIPTextModel.from_pretrained(
    "E:/huggingface/flux-dev-fp16/text_encoders",
    torch_dtype=torch.float16
)

# VAE: a standalone .safetensors checkpoint loads via from_single_file
vae = AutoencoderKL.from_single_file(
    "E:/huggingface/flux-dev-fp16/vae/flux/flux-vae-bf16.safetensors",
    torch_dtype=torch.bfloat16
)
```

## Model Specifications

**Architecture**:
- **Type**: Latent Diffusion Transformer
- **Parameters**: ~12B (diffusion model)
- **Text Encoders**:
  - T5-XXL: 4.7B parameters (FP16)
  - CLIP-G: 1.3B parameters
  - CLIP-L: 235M parameters
- **VAE**: BF16 precision (~160 MB checkpoint)

**Precision**:
- **Diffusion Model**: FP16 (float16)
- **Text Encoders**: FP16 (float16)
- **VAE**: BF16 (bfloat16)
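A useful rule of thumb for these precisions: weight footprint ≈ parameter count × bytes per element, and FP16 and BF16 both use 2 bytes. That gives a quick, back-of-envelope way to sanity-check the file sizes listed under Repository Contents (the numbers below are estimates, not measurements):

```python
def weight_gb(params: float, bytes_per_param: int = 2) -> float:
    """Rough on-disk / in-VRAM size of the weights in decimal GB."""
    return params * bytes_per_param / 1e9

# FP16 and BF16 are both 2 bytes per parameter
print(weight_gb(12e9))   # diffusion model: 24.0 GB (close to the 23 GB file)
print(weight_gb(4.7e9))  # T5-XXL:          9.4 GB (close to the 9.2 GB file)
```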

**Format**:
- `.safetensors` - Safe serialization format (no pickle code execution) with fast, zero-copy loading

**Resolution Support**:
- Native: 1024x1024
- Range: 512x512 to 2048x2048
- Aspect ratios: Supports non-square resolutions
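Because the VAE downsamples 8× and the transformer patches the latents 2×2, requested dimensions are effectively constrained to multiples of 16 (the diffusers pipeline rounds for you; treating 16 as the alignment here is an assumption based on that architecture). A small helper that makes the snapping explicit:

```python
def snap_to_multiple(value: int, multiple: int = 16) -> int:
    """Round a requested dimension down to the nearest supported multiple."""
    return max(multiple, (value // multiple) * multiple)

print(snap_to_multiple(1024))  # 1024 (already aligned)
print(snap_to_multiple(1000))  # 992
print(snap_to_multiple(720))   # 720
```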

## Performance Tips

### Memory Optimization
```python
# Enable memory efficient attention
pipe.enable_attention_slicing()

# Enable VAE tiling for high resolutions
pipe.enable_vae_tiling()

# Use CPU offloading if VRAM limited (slower)
pipe.enable_sequential_cpu_offload()
```

### Speed Optimization
```python
# Use torch.compile for faster inference (PyTorch 2.0+); FLUX uses a
# transformer backbone, so compile pipe.transformer (there is no pipe.unet)
pipe.transformer = torch.compile(pipe.transformer, mode="reduce-overhead", fullgraph=True)

# Reduce inference steps (trade quality for speed)
image = pipe(prompt, num_inference_steps=25)  # Default is 50
```

### Quality Optimization
- Use 50-75 inference steps for best quality
- Guidance scale: 7-9 for balanced results
- Higher guidance (10-15) for stronger prompt adherence
- Consider prompt engineering for better results
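A practical way to apply the ranges above is a small settings sweep: render the same seed across a grid of step counts and guidance scales, then compare the outputs side by side. A sketch of building that grid (the actual `pipe` call is commented out; `pipe` and `prompt` come from the Diffusers example earlier, and the specific grid values are just one reasonable choice):

```python
from itertools import product

steps_options = [25, 50, 75]          # speed -> quality
guidance_options = [7.0, 9.0, 12.0]   # balanced -> stronger prompt adherence

grid = list(product(steps_options, guidance_options))
print(len(grid))  # 9 combinations

for steps, guidance in grid:
    # image = pipe(prompt, num_inference_steps=steps, guidance_scale=guidance,
    #              generator=torch.Generator("cpu").manual_seed(0)).images[0]
    # image.save(f"out_s{steps}_g{guidance}.png")
    pass
```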

## License

This model is distributed under the **FLUX.1 [dev] Non-Commercial License** from Black Forest Labs.

**Usage Terms**:
- ✅ Non-commercial research and personal use allowed
- ✅ Modification and sharing of derivatives allowed under the same license terms
- ⚠️ Commercial use of the model weights requires a separate agreement with Black Forest Labs
- ⚠️ Requires attribution to Black Forest Labs

See the LICENSE file for full terms.

## Citation

If you use this model in your research or projects, please cite:

```bibtex
@misc{flux-dev,
  title={FLUX.1-dev: High-Quality Text-to-Image Generation},
  author={Black Forest Labs},
  year={2024},
  howpublished={\url{https://blackforestlabs.ai/}}
}
```

## Related Resources

- **Official Website**: https://blackforestlabs.ai/
- **Model Card**: https://huggingface.co/black-forest-labs/FLUX.1-dev
- **Documentation**: https://huggingface.co/docs/diffusers/en/api/pipelines/flux
- **Community**: https://huggingface.co/black-forest-labs

## Version Information

- **Model Version**: FLUX.1-dev
- **Precision**: FP16
- **Release**: 2024
- **README Version**: v1.4

---

For FP8 precision version (lower VRAM usage), see `E:/huggingface/flux-dev-fp8/`