---
license: mit
language:
- en
tags:
- diffusion
- flow-matching
- flux
- text-to-image
- image-generation
- tiny
- experimental
library_name: pytorch
pipeline_tag: text-to-image
base_model:
- black-forest-labs/FLUX.1-schnell
datasets:
- AbstractPhil/flux-schnell-teacher-latents
---
# TinyFlux
A **/12-scaled** Flux architecture for experimentation and research. TinyFlux keeps the core MMDiT (Multimodal Diffusion Transformer) design of Flux while cutting the parameter count from ~12B to ~8M, which allows faster iteration and much lower resource requirements.
## Model Description
TinyFlux is a miniaturized version of [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) that preserves the essential architectural components:
- **Double-stream blocks** (MMDiT style) - separate text/image pathways with joint attention
- **Single-stream blocks** - concatenated text+image with shared weights
- **AdaLN-Zero modulation** - adaptive layer norm with gating
- **3D RoPE** - rotary position embeddings for temporal + spatial positions
- **Flow matching** - rectified flow training objective
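To make the AdaLN-Zero item concrete, here is a minimal, hypothetical sketch of the pattern (it is not the code in `model.py`). A conditioning vector (in Flux, built from the timestep, guidance, and pooled CLIP embeddings) is projected to a per-sample shift, scale, and gate. The projection is zero-initialized, so every block starts out as an identity mapping. Real Flux blocks produce separate modulation sets for attention and MLP; this sketch wraps a single sublayer for brevity, and the class name `AdaLNZero` is illustrative.

```python
import torch
import torch.nn as nn


class AdaLNZero(nn.Module):
    """Hypothetical AdaLN-Zero wrapper around one sublayer (attention or MLP)."""

    def __init__(self, hidden: int, cond_dim: int):
        super().__init__()
        # LayerNorm without learned affine params: the conditioning supplies them
        self.norm = nn.LayerNorm(hidden, elementwise_affine=False, eps=1e-6)
        self.mod = nn.Linear(cond_dim, 3 * hidden)  # -> shift, scale, gate
        # Zero init: gate starts at 0, so the residual branch is initially switched off
        nn.init.zeros_(self.mod.weight)
        nn.init.zeros_(self.mod.bias)

    def forward(self, x: torch.Tensor, cond: torch.Tensor, sublayer: nn.Module) -> torch.Tensor:
        # x: (B, N, hidden) tokens; cond: (B, cond_dim) conditioning vector
        shift, scale, gate = self.mod(nn.functional.silu(cond)).unsqueeze(1).chunk(3, dim=-1)
        h = self.norm(x) * (1 + scale) + shift  # adaptive layer norm
        return x + gate * sublayer(h)           # gated residual update
```

Because the gate is zero at initialization, `AdaLNZero(256, 256)(x, cond, mlp)` returns `x` unchanged until training moves the modulation weights.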
### Architecture Comparison
| Component | Flux | TinyFlux | Scale |
|-----------|------|----------|-------|
| Hidden size | 3072 | 256 | /12 |
| Attention heads | 24 | 2 | /12 |
| Head dimension | 128 | 128 | preserved |
| Double-stream layers | 19 | 3 | /6 |
| Single-stream layers | 38 | 3 | /12 |
| VAE channels | 16 | 16 | preserved |
| **Total params** | ~12B | ~8M | /1500 |
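The table's numbers are linked: the attention heads split the hidden size evenly, so heads × head dimension must equal hidden size (24 × 128 = 3072 for Flux, 2 × 128 = 256 for TinyFlux). A small sketch that encodes this constraint and prints the exact scale factors. The class and field names are hypothetical and not necessarily those of `TinyFluxConfig`. The output shows that the depth ratios in the table are rounded (19/3 ≈ 6.3 and 38/3 ≈ 12.7).

```python
from dataclasses import dataclass


@dataclass
class TinyFluxDims:
    """Hypothetical summary of the architecture table (illustrative field names)."""
    hidden_size: int = 256
    num_heads: int = 2
    head_dim: int = 128
    double_layers: int = 3
    single_layers: int = 3
    in_channels: int = 16  # VAE latent channels

    def __post_init__(self):
        # Multi-head attention splits hidden_size evenly across heads
        if self.num_heads * self.head_dim != self.hidden_size:
            raise ValueError("num_heads * head_dim must equal hidden_size")


FLUX = TinyFluxDims(hidden_size=3072, num_heads=24, head_dim=128, double_layers=19, single_layers=38)
TINY = TinyFluxDims()

for name in ("hidden_size", "num_heads", "double_layers", "single_layers"):
    ratio = getattr(FLUX, name) / getattr(TINY, name)
    print(f"{name}: {getattr(FLUX, name)} -> {getattr(TINY, name)} (/{ratio:.1f})")
```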
### Text Encoders
TinyFlux replaces the T5-XXL sequence encoder with the much smaller flan-t5-base and keeps CLIP-L for the pooled embedding:
| Role | Flux | TinyFlux |
|------|------|----------|
| Sequence encoder | T5-XXL (4096 dim) | flan-t5-base (768 dim) |
| Pooled encoder | CLIP-L (768 dim) | CLIP-L (768 dim) |
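Swapping T5-XXL for flan-t5-base mainly changes the input width of the text projection (768 instead of 4096). The shape sketch below shows how both encoder outputs might be mapped to the 256-dim model width. The layer names loosely follow Flux's context and vector embedders but are hypothetical here, and random tensors stand in for real encoder outputs.

```python
import torch
import torch.nn as nn

HIDDEN = 256    # TinyFlux hidden size (from the architecture table)
T5_DIM = 768    # flan-t5-base output width
CLIP_DIM = 768  # CLIP-L pooled output width

# Hypothetical projections
context_embedder = nn.Linear(T5_DIM, HIDDEN)  # per-token T5 states -> text stream
pooled_embedder = nn.Sequential(              # pooled CLIP vector -> conditioning vector
    nn.Linear(CLIP_DIM, HIDDEN), nn.SiLU(), nn.Linear(HIDDEN, HIDDEN)
)

t5_states = torch.randn(1, 128, T5_DIM)  # stand-in for T5EncoderModel(...).last_hidden_state
clip_pooled = torch.randn(1, CLIP_DIM)   # stand-in for CLIPTextModel(...).pooler_output

txt_tokens = context_embedder(t5_states)  # (1, 128, 256): text tokens for joint attention
cond_vec = pooled_embedder(clip_pooled)   # (1, 256): feeds the AdaLN modulation
print(txt_tokens.shape, cond_vec.shape)
```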
## Training
### Dataset
Trained on [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents):
- 10,000 samples
- Pre-computed VAE latents (16, 64, 64) from 512×512 images
- Diverse prompts covering people, objects, scenes, styles
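Each (16, 64, 64) latent becomes a sequence of tokens before it enters the transformer. Full Flux packs 2×2 latent patches into 64-channel tokens; the inference example in this card instead uses one 16-channel token per latent position, i.e. 64 × 64 = 4096 tokens. A minimal sketch of that conversion and its inverse (the function names are hypothetical), matching the `reshape`/`permute` used in the inference example:

```python
import torch


def latents_to_tokens(latents: torch.Tensor) -> torch.Tensor:
    """(B, 16, 64, 64) VAE latents -> (B, 64*64, 16) tokens, one per latent pixel."""
    b, c, h, w = latents.shape
    return latents.permute(0, 2, 3, 1).reshape(b, h * w, c)


def tokens_to_latents(tokens: torch.Tensor, h: int = 64, w: int = 64) -> torch.Tensor:
    """Inverse of latents_to_tokens: (B, h*w, C) -> (B, C, h, w)."""
    b, n, c = tokens.shape
    return tokens.reshape(b, h, w, c).permute(0, 3, 1, 2)


sample = torch.randn(2, 16, 64, 64)  # stand-in for a batch of pre-computed latents
tokens = latents_to_tokens(sample)
print(tokens.shape)  # torch.Size([2, 4096, 16])
assert torch.equal(tokens_to_latents(tokens), sample)
```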
### Training Details
- **Objective**: Flow matching (rectified flow)
- **Timestep sampling**: Logit-normal with Flux shift (s=3.0)
- **Loss weighting**: Min-SNR-γ (γ=5.0)
- **Optimizer**: AdamW (lr=1e-4, β=(0.9, 0.99), wd=0.01)
- **Schedule**: Cosine with warmup
- **Precision**: bfloat16
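The optimizer settings above map directly onto standard PyTorch components. This is a minimal sketch that assumes linear warmup followed by cosine decay to zero. The warmup length and total step count are hypothetical because the card does not state them, and a `Linear` layer stands in for the model.

```python
import math

import torch


def cosine_with_warmup(step: int, warmup_steps: int, total_steps: int) -> float:
    """LR multiplier: linear warmup to 1.0, then cosine decay to 0.0."""
    if step < warmup_steps:
        return (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))


model = torch.nn.Linear(16, 16)  # placeholder for TinyFlux
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, betas=(0.9, 0.99), weight_decay=0.01)

# Hypothetical values; not given in this card
WARMUP_STEPS, TOTAL_STEPS = 500, 10_000
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda s: cosine_with_warmup(s, WARMUP_STEPS, TOTAL_STEPS)
)
# In the training loop: loss.backward(); optimizer.step(); scheduler.step(); optimizer.zero_grad()
```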
### Flow Matching Formulation
```
Interpolation:   x_t = (1 - t) * noise + t * data
Target velocity: v   = dx_t/dt = data - noise
Loss:            MSE(v_pred, v) * min_snr_weight(t)
```
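The formulation above, combined with the timestep sampling and loss weighting listed under Training Details, can be sketched as one loss function. This is a minimal sketch, not the code in `train_colab.py`. It assumes the SD3-style shift σ' = sσ / (1 + (s − 1)σ) applied to the noise level σ = 1 − t, and it uses the ε-form of Min-SNR-γ, min(SNR, γ) / SNR with SNR = (t / (1 − t))². The exact weighting used for TinyFlux may differ, and `model_fn` is a hypothetical stand-in for the model call.

```python
import torch
import torch.nn.functional as F


def sample_timesteps(batch: int, shift: float = 3.0) -> torch.Tensor:
    """Logit-normal t, shifted toward noise; convention: t=0 is noise, t=1 is data."""
    sigma = torch.sigmoid(torch.randn(batch))          # logit-normal noise level in (0, 1)
    sigma = shift * sigma / (1 + (shift - 1) * sigma)  # shift toward higher noise
    return 1.0 - sigma                                 # noise level -> t


def min_snr_weight(t: torch.Tensor, gamma: float = 5.0) -> torch.Tensor:
    """min(SNR, gamma) / SNR, with SNR = (t / (1 - t))**2 for this interpolation."""
    snr = (t / (1 - t)) ** 2
    return torch.clamp(gamma / snr, max=1.0)  # same value, and no NaN at t=0 or t=1


def flow_matching_loss(model_fn, data: torch.Tensor, gamma: float = 5.0, shift: float = 3.0) -> torch.Tensor:
    b = data.shape[0]
    t = sample_timesteps(b, shift).to(data.device)
    noise = torch.randn_like(data)
    t_ = t.view(b, *([1] * (data.dim() - 1)))  # broadcast t over token/channel dims
    x_t = (1 - t_) * noise + t_ * data          # interpolation
    v_target = data - noise                     # target velocity
    v_pred = model_fn(x_t, t)
    per_sample = F.mse_loss(v_pred, v_target, reduction="none").flatten(1).mean(1)
    return (per_sample * min_snr_weight(t, gamma)).mean()
```

With γ = 5, samples with SNR below 5 keep full weight, while nearly clean samples (t close to 1) are down-weighted.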
## Usage
### Installation
```bash
pip install torch transformers diffusers safetensors huggingface_hub sentencepiece
```
### Inference
```python
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from transformers import T5EncoderModel, T5Tokenizer, CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL

torch.set_grad_enabled(False)  # inference only; skip autograd bookkeeping

# Load model (TinyFlux and TinyFluxConfig are defined in model.py; copy or import them first)
config = TinyFluxConfig()
model = TinyFlux(config).to("cuda").to(torch.bfloat16)
weights = load_file(hf_hub_download("AbstractPhil/tiny-flux", "model.safetensors"))
model.load_state_dict(weights)
model.eval()

# Load encoders
t5_tok = T5Tokenizer.from_pretrained("google/flan-t5-base")
t5_enc = T5EncoderModel.from_pretrained("google/flan-t5-base", torch_dtype=torch.bfloat16).to("cuda")
clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
clip_enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=torch.bfloat16).to("cuda")
vae = AutoencoderKL.from_pretrained("black-forest-labs/FLUX.1-schnell", subfolder="vae", torch_dtype=torch.bfloat16).to("cuda")

# Encode prompt
prompt = "a photo of a cat"
t5_in = t5_tok(prompt, max_length=128, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
t5_out = t5_enc(**t5_in).last_hidden_state  # (1, 128, 768)
clip_in = clip_tok(prompt, max_length=77, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
clip_out = clip_enc(**clip_in).pooler_output  # (1, 768)

# Euler sampling (t: 0→1, noise→data)
x = torch.randn(1, 64*64, 16, device="cuda", dtype=torch.bfloat16)  # 64×64 latent grid as 4096 tokens of 16 channels
img_ids = TinyFlux.create_img_ids(1, 64, 64, "cuda")
timesteps = torch.linspace(0, 1, 21, device="cuda")  # 20 steps
guidance = torch.tensor([3.5], device="cuda", dtype=torch.bfloat16)
for i in range(20):
    t = timesteps[i].unsqueeze(0)
    dt = timesteps[i + 1] - timesteps[i]
    v = model(
        hidden_states=x,
        encoder_hidden_states=t5_out,
        pooled_projections=clip_out,
        timestep=t,
        img_ids=img_ids,
        guidance=guidance,
    )
    x = x + v * dt  # Euler step along the predicted velocity

# Decode: tokens (1, 4096, 16) -> latents (1, 16, 64, 64)
latents = x.reshape(1, 64, 64, 16).permute(0, 3, 1, 2)
latents = latents / vae.config.scaling_factor
# diffusers' FluxPipeline also adds vae.config.shift_factor here; do the same if the training latents were shifted
image = vae.decode(latents.to(vae.dtype)).sample  # match the VAE's bf16 weights
image = (image / 2 + 0.5).clamp(0, 1)  # [-1, 1] -> [0, 1]
```
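The example above stops at an `image` tensor of shape (1, 3, 512, 512) with values in [0, 1]. A minimal sketch for turning such a tensor into PNG files with Pillow; the helper name `to_pil` and the output filename are hypothetical. The `.float()` call matters because NumPy has no bfloat16 type.

```python
import torch
from PIL import Image


def to_pil(images: torch.Tensor) -> list:
    """Convert a (B, 3, H, W) tensor in [0, 1] into a list of RGB PIL images."""
    arr = (images.float().clamp(0, 1) * 255).round().to(torch.uint8)
    arr = arr.permute(0, 2, 3, 1).cpu().numpy()  # -> (B, H, W, 3)
    return [Image.fromarray(a) for a in arr]


fake = torch.rand(1, 3, 512, 512)  # stand-in for the decoded `image` tensor
pil_images = to_pil(fake)
pil_images[0].save("tinyflux_sample.png")
```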
### Full Inference Script
See [inference_colab.py](https://huggingface.co/AbstractPhil/tiny-flux/blob/main/inference_colab.py) for a complete generation pipeline with:
- Classifier-free guidance
- Batch generation
- Image saving
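The internals of `inference_colab.py` are not reproduced here. As a generic sketch, classifier-free guidance in a velocity-prediction sampler combines a prompt-conditioned prediction with an unconditioned one (for example, from an empty prompt) at every Euler step. `model_fn` and the default scale are hypothetical, not the script's actual API.

```python
import torch


def cfg_velocity(v_cond: torch.Tensor, v_uncond: torch.Tensor, scale: float) -> torch.Tensor:
    """Classifier-free guidance: extrapolate away from the unconditional prediction."""
    return v_uncond + scale * (v_cond - v_uncond)


def guided_step(model_fn, x, t, dt, cond, uncond, scale: float = 4.0):
    """One Euler step with CFG; model_fn(x, t, text_cond) is a hypothetical model wrapper."""
    v_cond = model_fn(x, t, cond)      # prediction with the prompt embeddings
    v_uncond = model_fn(x, t, uncond)  # prediction with empty-prompt embeddings
    return x + cfg_velocity(v_cond, v_uncond, scale) * dt
```

A scale of 1.0 reduces to plain conditional sampling; larger scales trade diversity for prompt adherence. The two calls can also be batched into one forward pass to save time.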
## Files
```
AbstractPhil/tiny-flux/
├── model.safetensors      # Model weights (~32MB)
├── config.json            # Model configuration
├── README.md              # This file
├── model.py               # Model architecture definition
├── inference_colab.py     # Inference script
├── train_colab.py         # Training script
├── checkpoints/           # Training checkpoints
│   └── step_*.safetensors
├── logs/                  # TensorBoard logs
└── samples/               # Generated samples during training
```
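To load an intermediate checkpoint rather than the final weights, one option is to pick the highest step from the repository's file list. A minimal sketch, assuming the `*` in `step_*.safetensors` is an integer step number (the layout above does not spell this out). The helper name `latest_checkpoint` is hypothetical.

```python
import re


def latest_checkpoint(filenames):
    """Return the checkpoints/step_<N>.safetensors path with the largest N, or None."""
    pattern = re.compile(r"^checkpoints/step_(\d+)\.safetensors$")
    steps = [(int(m.group(1)), name) for name in filenames if (m := pattern.match(name))]
    return max(steps)[1] if steps else None
```

Sorting on the parsed integer avoids the lexical trap where `step_500` would sort after `step_10000`. In practice the file list can come from `huggingface_hub.list_repo_files("AbstractPhil/tiny-flux")`, and the chosen path can be passed to `hf_hub_download` and `load_file` as in the inference example.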
## Limitations
- **Resolution**: Trained on 512×512 only
- **Quality**: Significantly lower than full Flux due to reduced capacity
- **Text understanding**: Limited by smaller T5 encoder (768 vs 4096 dim)
- **Fine details**: May struggle with complex scenes or fine-grained details
- **Experimental**: Intended for research and learning, not production use
## Intended Use
- Understanding Flux/MMDiT architecture
- Rapid prototyping and experimentation
- Educational purposes
- Resource-constrained environments
- Baseline for architecture modifications
## Citation
If you use TinyFlux in your research, please cite:
```bibtex
@misc{tinyflux2025,
title={TinyFlux: A Miniaturized Flux Architecture for Experimentation},
author={AbstractPhil},
year={2025},
url={https://huggingface.co/AbstractPhil/tiny-flux}
}
```
## Acknowledgments
- [Black Forest Labs](https://blackforestlabs.ai/) for the original Flux architecture
- [Hugging Face](https://huggingface.co/) for diffusers and transformers libraries
## License
MIT License - See LICENSE file for details.
---
**Note**: This is an experimental research model. For high-quality image generation, use the full [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) or [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) models.