	---
	license: other
	license_name: flux-1-dev-non-commercial-license
	library_name: diffusers
	pipeline_tag: text-to-image
	tags:
	- flux
	- text-to-image
	- image-generation
	- fp16
	---

	<!-- README Version: v1.4 -->

	# FLUX.1-dev FP16

	High-quality text-to-image generation model from Black Forest Labs. This repository contains the FLUX.1-dev model in FP16 precision for optimal quality and compatibility with modern GPUs.

	## Model Description

	FLUX.1-dev is a text-to-image diffusion transformer designed for high-fidelity image generation. This version stores the weights in FP16 (half precision) with no further quantization, trading higher VRAM use for output quality. It is aimed at creative professionals and researchers who need the best image quality.

	Key Capabilities:
	- High-resolution text-to-image generation
	- Advanced prompt understanding with T5-XXL text encoder
	- Superior detail and coherence in generated images
	- Wide range of artistic styles and subjects
	- Multi-text encoder architecture (CLIP + T5)

	## Repository Contents

	```
	flux-dev-fp16/
	├── checkpoints/flux/
	│   └── flux1-dev-fp16.safetensors    # 23 GB - Complete model checkpoint
	├── clip/
	│   └── t5xxl_fp16.safetensors        # 9.2 GB - T5-XXL text encoder
	├── clip_vision/
	│   └── clip_vision_h.safetensors     # CLIP vision encoder
	├── diffusion_models/flux/
	│   └── flux1-dev-fp16.safetensors    # 23 GB - Diffusion model
	├── text_encoders/
	│   ├── clip-vit-large.safetensors    # 1.6 GB - CLIP ViT-Large encoder
	│   ├── clip_g.safetensors            # 1.3 GB - CLIP-G encoder
	│   ├── clip_l.safetensors            # 235 MB - CLIP-L encoder
	│   └── t5xxl_fp16.safetensors        # 9.2 GB - T5-XXL encoder
	└── vae/flux/
	    └── flux-vae-bf16.safetensors     # 160 MB - VAE decoder (BF16)

	Total Size: ~72 GB
	```
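
Before loading anything, it can help to confirm that a download is complete. The sketch below walks a local copy of the repository, reports which of the files listed above are missing, and sums the size of those present. The paths in `EXPECTED_FILES` are copied from the tree above; the helper name `check_repo` and the root directory in the usage block are illustrative assumptions.

```python
from pathlib import Path

# Relative paths copied from the repository tree above.
EXPECTED_FILES = [
    "checkpoints/flux/flux1-dev-fp16.safetensors",
    "clip/t5xxl_fp16.safetensors",
    "clip_vision/clip_vision_h.safetensors",
    "diffusion_models/flux/flux1-dev-fp16.safetensors",
    "text_encoders/clip-vit-large.safetensors",
    "text_encoders/clip_g.safetensors",
    "text_encoders/clip_l.safetensors",
    "text_encoders/t5xxl_fp16.safetensors",
    "vae/flux/flux-vae-bf16.safetensors",
]


def check_repo(root):
    """Return (missing_paths, total_bytes) for a local copy of the repo."""
    root = Path(root)
    missing, total_bytes = [], 0
    for rel in EXPECTED_FILES:
        path = root / rel
        if path.is_file():
            total_bytes += path.stat().st_size
        else:
            missing.append(rel)
    return missing, total_bytes


if __name__ == "__main__":
    # Hypothetical location; point this at your own download directory.
    missing, total_bytes = check_repo("flux-dev-fp16")
    print(f"present: {total_bytes / 1e9:.1f} GB, missing: {len(missing)} file(s)")
    for rel in missing:
        print("  missing:", rel)
```

A total well below the ~72 GB quoted above usually points to an interrupted download rather than a corrupt file.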

	## Hardware Requirements

	### Minimum Requirements
	- VRAM: 24 GB (RTX 3090, RTX 4090, A5000)
	- RAM: 32 GB system memory
	- Disk Space: 80 GB free space
	- GPU: NVIDIA GPU with Compute Capability 7.0+ (Volta or newer)

	### Recommended Requirements
	- VRAM: 32+ GB (RTX 6000 Ada, A6000, H100)
	- RAM: 64 GB system memory
	- Disk Space: 100+ GB for workspace and outputs
	- GPU: NVIDIA professional GPU (the 24 GB RTX 4090 meets only the minimum VRAM figure above)

	### Performance Notes
	- FP16 precision provides best quality but highest VRAM usage
	- Consider FP8 version if VRAM is limited (see `flux-dev-fp8` directory)
	- Generation time: ~30-60 seconds per image at 1024x1024 (depending on GPU)
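
The 24 GB minimum follows from simple arithmetic: FP16 stores each parameter in 2 bytes. Here is a rough sketch using the parameter counts quoted in this README (~12B for the diffusion model, 4.7B for T5-XXL). It counts weights only and ignores activations, the CLIP encoder, and the VAE. `weights_gib` is an illustrative helper, not part of any library.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def weights_gib(num_params, dtype="fp16"):
    """Approximate memory for the weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


transformer_gib = weights_gib(12e9)   # ~12B-parameter diffusion model
t5_gib = weights_gib(4.7e9)           # T5-XXL text encoder

print(f"diffusion model (fp16): {transformer_gib:.1f} GiB")
print(f"T5-XXL (fp16):          {t5_gib:.1f} GiB")
print(f"diffusion model (fp8):  {weights_gib(12e9, 'fp8'):.1f} GiB")
```

Under these assumptions the diffusion model alone takes roughly 22 GiB, so on a 24 GB card the text encoders generally need to be offloaded or run separately.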

	## Usage Examples

	### Using with Diffusers Library

	```python
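	import torch
	from diffusers import FluxPipeline, FluxTransformer2DModel

	# Adjust both values to your setup. The upstream repo is gated and supplies
	# the Diffusers-format configs, text encoders, and VAE; the local file
	# supplies the FP16 transformer weights.
	LOCAL_TRANSFORMER = "E:/huggingface/flux-dev-fp16/diffusion_models/flux/flux1-dev-fp16.safetensors"
	BASE_REPO = "black-forest-labs/FLUX.1-dev"

	# As listed above, this repository holds single-file checkpoints rather than
	# a Diffusers-format folder, so the transformer is loaded from its single file.
	transformer = FluxTransformer2DModel.from_single_file(
	    LOCAL_TRANSFORMER,
	    torch_dtype=torch.float16,
	)
	pipe = FluxPipeline.from_pretrained(
	    BASE_REPO,
	    transformer=transformer,
	    torch_dtype=torch.float16,
	)
	pipe = pipe.to("cuda")

	# Generate an image
	prompt = "A majestic lion standing on a cliff at sunset, cinematic lighting, photorealistic"
	image = pipe(
	    prompt=prompt,
	    num_inference_steps=50,
	    guidance_scale=3.5,  # FLUX.1-dev is guidance-distilled; 3.5 is the upstream example value
	    height=1024,
	    width=1024,
	).images[0]

	image.save("output.png")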
	```

	### Using with ComfyUI

	1. Copy model files to ComfyUI directories:
	   - `checkpoints/flux/flux1-dev-fp16.safetensors` → `ComfyUI/models/checkpoints/`
	   - `text_encoders/*.safetensors` → `ComfyUI/models/clip/`
	   - `vae/flux/flux-vae-bf16.safetensors` → `ComfyUI/models/vae/`

	2. In ComfyUI:
	   - Load Checkpoint: Select `flux1-dev-fp16`
	   - Text Encoder: Automatically loaded
	   - VAE: Select `flux-vae-bf16`
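
The copy step can also be scripted. This is a minimal sketch that mirrors the three mappings from step 1. The helper name `install_into_comfyui`, the `dry_run` flag, and the example paths are assumptions for illustration; the source and destination folders are taken from the list above.

```python
import shutil
from pathlib import Path

# (source pattern relative to this repo, destination relative to ComfyUI)
COPY_PLAN = [
    ("checkpoints/flux/flux1-dev-fp16.safetensors", "models/checkpoints"),
    ("text_encoders/*.safetensors", "models/clip"),
    ("vae/flux/flux-vae-bf16.safetensors", "models/vae"),
]


def install_into_comfyui(repo_root, comfyui_root, dry_run=True):
    """Copy the files from step 1 into a ComfyUI install.

    Returns the list of (source, destination) pairs. With dry_run=True
    nothing is written, so the plan can be reviewed first.
    """
    repo_root, comfyui_root = Path(repo_root), Path(comfyui_root)
    planned = []
    for pattern, dest_rel in COPY_PLAN:
        dest_dir = comfyui_root / dest_rel
        for src in sorted(repo_root.glob(pattern)):
            dest = dest_dir / src.name
            if not dry_run:
                dest_dir.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dest)
            planned.append((src, dest))
    return planned
```

Calling `install_into_comfyui("flux-dev-fp16", "ComfyUI")` returns the planned copies without touching the disk; pass `dry_run=False` once the plan looks right. The full set needs roughly 35 GB of additional space, so symlinks may be preferable where the filesystem supports them.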

	### Using Individual Components

	```python
	import torch
	from diffusers import AutoencoderKL
	from transformers import T5EncoderModel, CLIPTextModel

	# from_pretrained expects a directory (or Hub repo) that contains config.json
	# and takes no `filename` argument, so the bare .safetensors files in this
	# repository cannot be passed to it directly. This version loads the same
	# components from the Diffusers-format subfolders of the upstream repo
	# (gated; the license must be accepted on the Hub first).
	BASE_REPO = "black-forest-labs/FLUX.1-dev"

	# Load text encoders
	t5_encoder = T5EncoderModel.from_pretrained(
	    BASE_REPO,
	    subfolder="text_encoder_2",
	    torch_dtype=torch.float16,
	)

	clip_encoder = CLIPTextModel.from_pretrained(
	    BASE_REPO,
	    subfolder="text_encoder",
	    torch_dtype=torch.float16,
	)

	# Load VAE
	vae = AutoencoderKL.from_pretrained(
	    BASE_REPO,
	    subfolder="vae",
	    torch_dtype=torch.bfloat16,
	)
	```

	## Model Specifications

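	Architecture:
	- Type: Latent Diffusion Transformer
	- Parameters: ~12B (diffusion model)
	- Text Encoders:
	  - T5-XXL: 4.7B parameters (FP16)
	  - CLIP-L: 235 MB file, roughly 120M parameters at 2 bytes each
	  - CLIP-G: 1.3 GB file, roughly 0.65B parameters at 2 bytes each; included in the repository, although FLUX.1 itself conditions only on CLIP-L and T5-XXL
	- VAE: BF16 precision (160 MB file, roughly 80M parameters)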

	Precision:
	- Diffusion Model: FP16 (float16)
	- Text Encoders: FP16 (float16)
	- VAE: BF16 (bfloat16)

	Format:
	- `.safetensors` - Secure tensor format with fast loading
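
Because a `.safetensors` file starts with a plain JSON header, the precision and parameter counts listed above can be checked without loading any tensors. The sketch below relies on the published file layout: an 8-byte little-endian header length followed by a JSON object that maps tensor names to `dtype`, `shape`, and `data_offsets`. The helper names are illustrative and not part of the `safetensors` package.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensors."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian uint64
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header


def summarize(header):
    """Count parameters per dtype string (e.g. "F16", "BF16") in a header."""
    counts = {}
    for info in header.values():
        n = 1
        for dim in info["shape"]:
            n *= dim
        counts[info["dtype"]] = counts.get(info["dtype"], 0) + n
    return counts
```

For example, `summarize(read_safetensors_header("vae/flux/flux-vae-bf16.safetensors"))` should report its parameters under `BF16` if the file matches the precision table above.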

	Resolution Support:
	- Native: 1024x1024
	- Range: 512x512 to 2048x2048
	- Aspect ratios: Supports non-square resolutions
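
When choosing a non-square size, it is safest to keep both sides divisible by 16. That figure is an assumption based on the usual 8x VAE downsampling followed by 2x2 latent patching; it is not stated in this README. A minimal sketch that clamps a requested size to the supported range above and rounds each side down (`snap_resolution` is a hypothetical helper):

```python
def snap_resolution(width, height, multiple=16, lo=512, hi=2048):
    """Clamp each side to [lo, hi], then round down to a multiple."""
    def snap(value):
        value = max(lo, min(hi, value))
        return (value // multiple) * multiple
    return snap(width), snap(height)


# A 16:9 request: the height drops to the nearest lower multiple of 16.
width, height = snap_resolution(1920, 1080)
print(width, height)
```

The returned pair can be passed as `width` and `height` to the pipeline call.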

	## Performance Tips

	### Memory Optimization
	```python
	# Compute attention in slices to lower peak memory (slower)
	pipe.enable_attention_slicing()

	# Enable VAE tiling for high resolutions
	pipe.enable_vae_tiling()

	# Use CPU offloading if VRAM limited (slower)
	pipe.enable_sequential_cpu_offload()
	```

	### Speed Optimization
	```python
	# Use torch.compile for faster inference (PyTorch 2.0+).
	# FluxPipeline has no `unet`; its denoiser is `pipe.transformer`.
	pipe.transformer = torch.compile(pipe.transformer, mode="reduce-overhead", fullgraph=True)

	# Reduce inference steps (trade quality for speed)
	image = pipe(prompt, num_inference_steps=25).images[0]  # the usage example runs 50 steps
	```

	### Quality Optimization
	- Use 50-75 inference steps for best quality
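	- Guidance scale: start near 3.5, the value used in the upstream FLUX.1-dev example
	- FLUX.1-dev is guidance-distilled, so Stable Diffusion-style values (7 and above) are usually too high; raise guidance in small steps if prompt adherence is weak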
	- Consider prompt engineering for better results

	## License

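	The FLUX.1-dev weights are released by Black Forest Labs under the FLUX.1 [dev] Non-Commercial License, not Apache 2.0 (Apache 2.0 applies to FLUX.1-schnell).

	Usage Terms:
	- ⚠️ Non-commercial use of the model only; commercial use requires a separate license from Black Forest Labs
	- ⚠️ Modification and redistribution are subject to the terms of that license
	- ⚠️ Requires attribution to Black Forest Labs

	See the LICENSE file for full terms.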

	## Citation

	If you use this model in your research or projects, please cite:

	```bibtex
	@misc{flux-dev,
	  title={FLUX.1-dev: High-Quality Text-to-Image Generation},
	  author={Black Forest Labs},
	  year={2024},
	  howpublished={\url{https://blackforestlabs.ai/}}
	}
	```

	## Related Resources

	- Official Website: https://blackforestlabs.ai/
	- Model Card: https://huggingface.co/black-forest-labs/FLUX.1-dev
	- Documentation: https://huggingface.co/docs/diffusers/en/api/pipelines/flux
	- Community: https://huggingface.co/black-forest-labs

	## Version Information

	- Model Version: FLUX.1-dev
	- Precision: FP16
	- Release: 2024
	- README Version: v1.4

	---

	For the FP8 version (lower VRAM usage), see the `flux-dev-fp8` directory.