---
license: mit
language:
- en
tags:
- diffusion
- flow-matching
- flux
- text-to-image
- image-generation
- tiny
- experimental
library_name: pytorch
pipeline_tag: text-to-image
base_model:
- black-forest-labs/FLUX.1-schnell
datasets:
- AbstractPhil/flux-schnell-teacher-latents
---

# TinyFlux

A **1/12-scale** Flux architecture for experimentation and research. TinyFlux keeps the core MMDiT (Multimodal Diffusion Transformer) design of Flux while dramatically reducing parameter count for faster iteration and lower resource requirements.

## Model Description

TinyFlux is a miniaturized version of [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) that preserves the essential architectural components:

- **Double-stream blocks** (MMDiT style) - separate text/image pathways with joint attention
- **Single-stream blocks** - concatenated text+image with shared weights  
- **AdaLN-Zero modulation** - adaptive layer norm with gating
- **3D RoPE** - rotary position embeddings for temporal + spatial positions
- **Flow matching** - rectified flow training objective
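
To make the AdaLN-Zero bullet concrete, here is a minimal sketch of that modulation pattern: the conditioning vector produces a shift, scale, and gate, and the projection is zero-initialized so each block starts as an identity map. Class and field names are illustrative, not taken from this repo's `model.py`.

```python
import torch
import torch.nn as nn

class AdaLNZero(nn.Module):
    """AdaLN-Zero modulation (sketch): conditioning -> shift, scale, gate."""

    def __init__(self, dim: int):
        super().__init__()
        # No learned affine in the norm; all modulation comes from the conditioning.
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.proj = nn.Linear(dim, 3 * dim)
        # Zero-init is the "-Zero" part: blocks contribute nothing at init.
        nn.init.zeros_(self.proj.weight)
        nn.init.zeros_(self.proj.bias)

    def forward(self, x: torch.Tensor, cond: torch.Tensor):
        # x: (batch, seq, dim), cond: (batch, dim)
        shift, scale, gate = self.proj(nn.functional.silu(cond)).chunk(3, dim=-1)
        out = self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
        return out, gate.unsqueeze(1)  # gate multiplies the block's residual output
```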

### Architecture Comparison

| Component | Flux | TinyFlux | Scale |
|-----------|------|----------|-------|
| Hidden size | 3072 | 256 | /12 |
| Attention heads | 24 | 2 | /12 |
| Head dimension | 128 | 128 | preserved |
| Double-stream layers | 19 | 3 | /6 |
| Single-stream layers | 38 | 3 | /12 |
| VAE channels | 16 | 16 | preserved |
| **Total params** | ~12B | ~8M | /1500 |
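
The table above maps directly onto a handful of configuration fields. A minimal sketch of what such a config might look like (field names are hypothetical; `model.py` in this repo is authoritative):

```python
from dataclasses import dataclass

@dataclass
class TinyFluxConfigSketch:
    # Hypothetical field names -- see model.py in the repo for the real TinyFluxConfig.
    hidden_size: int = 256        # Flux: 3072
    num_heads: int = 2            # Flux: 24
    head_dim: int = 128           # preserved from Flux
    num_double_layers: int = 3    # Flux: 19
    num_single_layers: int = 3    # Flux: 38
    vae_channels: int = 16        # preserved from Flux

    def __post_init__(self):
        # Hidden size must factor as heads x head_dim for attention to work.
        assert self.hidden_size == self.num_heads * self.head_dim
```

Note that keeping `head_dim = 128` while dividing `hidden_size` by 12 is what forces the head count down to 2.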

### Text Encoders

TinyFlux uses smaller text encoders than standard Flux:

| Role | Flux | TinyFlux |
|------|------|----------|
| Sequence encoder | T5-XXL (4096 dim) | flan-t5-base (768 dim) |
| Pooled encoder | CLIP-L (768 dim) | CLIP-L (768 dim) |

## Training

### Dataset

Trained on [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents):
- 10,000 samples
- Pre-computed VAE latents (16, 64, 64) from 512×512 images
- Diverse prompts covering people, objects, scenes, styles

### Training Details

- **Objective**: Flow matching (rectified flow)
- **Timestep sampling**: Logit-normal with Flux shift (s=3.0)
- **Loss weighting**: Min-SNR-γ (γ=5.0)
- **Optimizer**: AdamW (lr=1e-4, β=(0.9, 0.99), wd=0.01)
- **Schedule**: Cosine with warmup
- **Precision**: bfloat16

### Flow Matching Formulation

```
Interpolation: x_t = (1 - t) * noise + t * data
Target velocity: v = data - noise
Loss: MSE(predicted_v, target_v) * min_snr_weight(t)
```
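
In code, the three lines above translate to something like this sketch. The min-SNR weight assumes SNR(t) = (t / (1 - t))² under this interpolation; function and variable names are illustrative:

```python
import torch

def flow_matching_loss(pred_v, data, noise, t, gamma: float = 5.0):
    """Rectified-flow MSE loss with Min-SNR-gamma weighting (sketch)."""
    # For x_t = (1 - t) * noise + t * data, the target velocity is constant:
    target_v = data - noise
    # SNR of the interpolant: signal scale t, noise scale (1 - t).
    snr = (t / (1 - t).clamp(min=1e-4)) ** 2
    # Min-SNR-gamma: cap the weight so easy (high-SNR) timesteps don't dominate.
    weight = torch.clamp(snr, max=gamma) / snr.clamp(min=1e-4)
    per_sample = ((pred_v - target_v) ** 2).flatten(1).mean(dim=1)
    return (weight * per_sample).mean()
```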

## Usage

### Installation

```bash
pip install torch transformers diffusers safetensors huggingface_hub
```

### Inference

```python
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from transformers import T5EncoderModel, T5Tokenizer, CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL

# Load model (requires the TinyFlux class definition from model.py in this repo)
config = TinyFluxConfig()
model = TinyFlux(config).to("cuda").to(torch.bfloat16)

weights = load_file(hf_hub_download("AbstractPhil/tiny-flux", "model.safetensors"))
model.load_state_dict(weights)
model.eval()

# Load encoders
t5_tok = T5Tokenizer.from_pretrained("google/flan-t5-base")
t5_enc = T5EncoderModel.from_pretrained("google/flan-t5-base", torch_dtype=torch.bfloat16).to("cuda")
clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
clip_enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=torch.bfloat16).to("cuda")
vae = AutoencoderKL.from_pretrained("black-forest-labs/FLUX.1-schnell", subfolder="vae", torch_dtype=torch.bfloat16).to("cuda")

# Encode prompt
prompt = "a photo of a cat"
t5_in = t5_tok(prompt, max_length=128, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
t5_out = t5_enc(**t5_in).last_hidden_state
clip_in = clip_tok(prompt, max_length=77, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
clip_out = clip_enc(**clip_in).pooler_output

# Euler sampling (t: 0→1, noise→data)
x = torch.randn(1, 64*64, 16, device="cuda", dtype=torch.bfloat16)
img_ids = TinyFlux.create_img_ids(1, 64, 64, "cuda")
timesteps = torch.linspace(0, 1, 21, device="cuda")

guidance = torch.tensor([3.5], device="cuda", dtype=torch.bfloat16)

for i in range(20):
    t = timesteps[i].unsqueeze(0)
    dt = timesteps[i+1] - timesteps[i]

    v = model(
        hidden_states=x,
        encoder_hidden_states=t5_out,
        pooled_projections=clip_out,
        timestep=t,
        img_ids=img_ids,
        guidance=guidance,
    )
    x = x + v * dt

# Decode
latents = x.reshape(1, 64, 64, 16).permute(0, 3, 1, 2)
latents = latents / vae.config.scaling_factor + vae.config.shift_factor  # Flux VAE uses both scale and shift
image = vae.decode(latents.float()).sample
image = (image / 2 + 0.5).clamp(0, 1)
```
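
To turn the clamped tensor into a saved file, a small helper along these lines works (assumes Pillow is installed; the helper name is illustrative):

```python
import torch
from PIL import Image

def tensor_to_pil(image: torch.Tensor) -> Image.Image:
    """Convert a (1, 3, H, W) float tensor in [0, 1] to a PIL image."""
    arr = (image[0].permute(1, 2, 0).cpu().float().numpy() * 255).round().astype("uint8")
    return Image.fromarray(arr)

# e.g. tensor_to_pil(image).save("sample.png")
```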

### Full Inference Script

See the [inference_colab.py](https://huggingface.co/AbstractPhil/tiny-flux/blob/main/inference_colab.py) for a complete generation pipeline with:
- Classifier-free guidance
- Batch generation
- Image saving

## Files

```
AbstractPhil/tiny-flux/
├── model.safetensors      # Model weights (~32MB)
├── config.json            # Model configuration
├── README.md              # This file
├── model.py               # Model architecture definition
├── inference_colab.py     # Inference script
├── train_colab.py         # Training script
├── checkpoints/           # Training checkpoints
│   └── step_*.safetensors
├── logs/                  # Tensorboard logs
└── samples/               # Generated samples during training
```

## Limitations

- **Resolution**: Trained on 512×512 only
- **Quality**: Significantly lower than full Flux due to reduced capacity
- **Text understanding**: Limited by smaller T5 encoder (768 vs 4096 dim)
- **Fine details**: May struggle with complex scenes or fine-grained details
- **Experimental**: Intended for research and learning, not production use

## Intended Use

- Understanding Flux/MMDiT architecture
- Rapid prototyping and experimentation
- Educational purposes
- Resource-constrained environments
- Baseline for architecture modifications

## Citation

If you use TinyFlux in your research, please cite:

```bibtex
@misc{tinyflux2025,
  title={TinyFlux: A Miniaturized Flux Architecture for Experimentation},
  author={AbstractPhil},
  year={2025},
  url={https://huggingface.co/AbstractPhil/tiny-flux}
}
```

## Acknowledgments

- [Black Forest Labs](https://blackforestlabs.ai/) for the original Flux architecture
- [Hugging Face](https://huggingface.co/) for diffusers and transformers libraries

## License

MIT License - See LICENSE file for details.

---

**Note**: This is an experimental research model. For high-quality image generation, use the full [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) or [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) models.