---
license: mit
language:
- en
tags:
- diffusion
- flow-matching
- flux
- text-to-image
- image-generation
- tiny
- experimental
library_name: pytorch
pipeline_tag: text-to-image
base_model:
- black-forest-labs/FLUX.1-schnell
datasets:
- AbstractPhil/flux-schnell-teacher-latents
---

# TinyFlux

A **/12-scaled** Flux architecture for experimentation and research. TinyFlux keeps the core MMDiT (Multimodal Diffusion Transformer) design of Flux. It cuts the parameter count from ~12B to ~8M, which allows much faster iteration on far less hardware.

## Model Description

TinyFlux is a miniaturized version of [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) that preserves the essential architectural components:

- **Double-stream blocks** (MMDiT style) - separate text/image pathways with joint attention
- **Single-stream blocks** - concatenated text+image tokens processed with shared weights
- **AdaLN-Zero modulation** - adaptive layer norm with zero-initialized gating
- **3D RoPE** - rotary position embeddings over three position axes (temporal + spatial)
- **Flow matching** - rectified flow training objective

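The double-stream and AdaLN-Zero ideas can be illustrated with a deliberately simplified block. This is a sketch under stated assumptions, not the TinyFlux implementation in model.py. The names `StreamWeights` and `DoubleStreamBlockSketch` are hypothetical. So are the 4× MLP ratio and the omission of Flux's RoPE and QK normalization.

In the sketch:

- Each stream owns its LayerNorms, QKV and output projections, and MLP.
- Text and image tokens meet only inside one joint attention call.
- A zero-initialized modulation layer predicts shift/scale/gate values from the conditioning vector. Every residual branch therefore starts as the identity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StreamWeights(nn.Module):
    """Weights owned by one stream (text or image); the two streams never share these."""

    def __init__(self, hidden: int, mlp_ratio: int = 4):  # mlp_ratio is an assumption
        super().__init__()
        self.norm1 = nn.LayerNorm(hidden, elementwise_affine=False, eps=1e-6)
        self.norm2 = nn.LayerNorm(hidden, elementwise_affine=False, eps=1e-6)
        self.qkv = nn.Linear(hidden, 3 * hidden)
        self.proj = nn.Linear(hidden, hidden)
        self.mlp = nn.Sequential(
            nn.Linear(hidden, mlp_ratio * hidden),
            nn.GELU(approximate="tanh"),
            nn.Linear(mlp_ratio * hidden, hidden),
        )
        # AdaLN-Zero: conditioning -> shift/scale/gate for attention and MLP (6 chunks),
        # zero-initialized so every residual branch starts as the identity.
        self.modulation = nn.Linear(hidden, 6 * hidden)
        nn.init.zeros_(self.modulation.weight)
        nn.init.zeros_(self.modulation.bias)


class DoubleStreamBlockSketch(nn.Module):
    def __init__(self, hidden: int = 256, num_heads: int = 2):
        super().__init__()
        self.num_heads = num_heads
        self.txt = StreamWeights(hidden)
        self.img = StreamWeights(hidden)

    def _split_heads(self, x: torch.Tensor) -> torch.Tensor:
        b, n, d = x.shape  # (B, N, hidden) -> (B, heads, N, head_dim)
        return x.reshape(b, n, self.num_heads, d // self.num_heads).transpose(1, 2)

    def forward(self, img: torch.Tensor, txt: torch.Tensor, cond: torch.Tensor):
        # cond: (B, hidden) conditioning vector (timestep + pooled text in Flux)
        streams = ((txt, self.txt), (img, self.img))
        mods, qs, ks, vs = [], [], [], []
        for x, w in streams:
            mod = w.modulation(F.silu(cond)).unsqueeze(1).chunk(6, dim=-1)
            shift1, scale1 = mod[0], mod[1]
            q, k, v = w.qkv(w.norm1(x) * (1 + scale1) + shift1).chunk(3, dim=-1)
            mods.append(mod)
            qs.append(self._split_heads(q))
            ks.append(self._split_heads(k))
            vs.append(self._split_heads(v))

        # Joint attention: every token attends over the concatenated [text; image] sequence
        attn = F.scaled_dot_product_attention(
            torch.cat(qs, dim=2), torch.cat(ks, dim=2), torch.cat(vs, dim=2)
        )
        attn = attn.transpose(1, 2).flatten(2)  # (B, N_txt + N_img, hidden)
        n_txt = txt.shape[1]
        splits = (attn[:, :n_txt], attn[:, n_txt:])

        # Back to separate pathways: gated residual attention output, then gated MLP
        outs = []
        for (x, w), mod, a in zip(streams, mods, splits):
            _, _, gate1, shift2, scale2, gate2 = mod
            x = x + gate1 * w.proj(a)
            x = x + gate2 * w.mlp(w.norm2(x) * (1 + scale2) + shift2)
            outs.append(x)
        txt_out, img_out = outs
        return img_out, txt_out


torch.manual_seed(0)
block = DoubleStreamBlockSketch()  # hidden 256 = 2 heads x 128 head dim
img, txt, cond = torch.randn(2, 16, 256), torch.randn(2, 8, 256), torch.randn(2, 256)
img_out, txt_out = block(img, txt, cond)
print(torch.equal(img_out, img), torch.equal(txt_out, txt))  # True True at initialization
```

Because the gates start at exactly zero, a freshly initialized block passes both streams through unchanged. Information only starts flowing between text and image tokens once training moves the modulation weights away from zero.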
### Architecture Comparison

| Component | Flux | TinyFlux | Scale |
|-----------|------|----------|-------|
| Hidden size | 3072 | 256 | /12 |
| Attention heads | 24 | 2 | /12 |
| Head dimension | 128 | 128 | preserved |
| Double-stream layers | 19 | 3 | ~/6 |
| Single-stream layers | 38 | 3 | ~/13 |
| VAE channels | 16 | 16 | preserved |
| **Total params** | ~12B | ~8M | ~/1500 |

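The widths in the table are tied together. Attention width is heads × head dim, so keeping 128-dim heads while going from 24 to 2 heads is exactly what yields the 3072 → 256 hidden size. The sketch below derives the scale ratios from the table values. The `ArchSize` container is hypothetical and is not the repo's `TinyFluxConfig`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchSize:
    """Hypothetical container for the width/depth settings in the comparison table."""
    num_heads: int
    head_dim: int
    double_layers: int
    single_layers: int

    @property
    def hidden_size(self) -> int:
        # Attention width is heads x head_dim; with head_dim fixed at 128,
        # the hidden size shrinks exactly with the head count.
        return self.num_heads * self.head_dim


flux = ArchSize(num_heads=24, head_dim=128, double_layers=19, single_layers=38)
tiny = ArchSize(num_heads=2, head_dim=128, double_layers=3, single_layers=3)

for name in ("hidden_size", "num_heads", "double_layers", "single_layers"):
    big, small = getattr(flux, name), getattr(tiny, name)
    print(f"{name:14s} {big:5d} -> {small:4d}  (/{big / small:.2f})")
```

The width ratios are exact (/12.00). The depth ratios are only approximate: 19/3 ≈ 6.33 and 38/3 ≈ 12.67.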
### Text Encoders

TinyFlux replaces Flux's T5-XXL sequence encoder with the much smaller flan-t5-base. The pooled CLIP-L encoder is unchanged:

| Role | Flux | TinyFlux |
|------|------|----------|
| Sequence encoder | T5-XXL (4096 dim) | flan-t5-base (768 dim) |
| Pooled encoder | CLIP-L (768 dim) | CLIP-L (768 dim) |

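Both encoder outputs must be mapped to TinyFlux's 256-dim hidden size before they reach the transformer blocks:

- the flan-t5-base token features (768-dim)
- the CLIP-L pooled vector (768-dim)

A minimal sketch follows, assuming Flux-style input projections: a linear layer for the token sequence and a small MLP for the pooled vector. The layer names `txt_in` and `vector_in` are borrowed from Flux and are hypothetical for model.py. Random tensors stand in for the real encoder outputs.

```python
import torch
import torch.nn as nn

HIDDEN = 256     # TinyFlux hidden size
T5_DIM = 768     # flan-t5-base d_model
CLIP_DIM = 768   # CLIP-L pooled output size

# Flux-style input projections (names borrowed from Flux; hypothetical for TinyFlux)
txt_in = nn.Linear(T5_DIM, HIDDEN)            # per-token text features -> text stream
vector_in = nn.Sequential(                    # pooled text vector -> conditioning vector
    nn.Linear(CLIP_DIM, HIDDEN), nn.SiLU(), nn.Linear(HIDDEN, HIDDEN)
)

t5_out = torch.randn(1, 128, T5_DIM)   # stands in for T5EncoderModel(...).last_hidden_state
clip_out = torch.randn(1, CLIP_DIM)    # stands in for CLIPTextModel(...).pooler_output

txt_tokens = txt_in(t5_out)            # (1, 128, 256): text stream of the double-stream blocks
cond = vector_in(clip_out)             # (1, 256): summed with timestep/guidance embeddings in Flux
print(txt_tokens.shape, cond.shape)
```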
## Training

### Dataset

Trained on [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents):

- 10,000 samples
- Pre-computed VAE latents of shape (16, 64, 64), corresponding to 512×512 images (the Flux VAE downsamples by 8×)
- Diverse prompts covering people, objects, scenes, and styles

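Each (16, 64, 64) latent is fed to the transformer as a sequence of 64 × 64 = 4096 tokens with 16 channels each. That is the `(1, 64*64, 16)` shape the inference example starts from.

The sketch below shows the packing and its inverse. It assumes row-major token order, the same order that the inference example's `reshape(1, 64, 64, 16)` undoes. The helper names are hypothetical. Full Flux additionally folds 2 × 2 latent patches into 64-channel tokens; the shapes used in this card imply one token per latent pixel instead.

```python
import torch


def pack_latents(latents: torch.Tensor) -> torch.Tensor:
    """(B, C, H, W) VAE latents -> (B, H*W, C) token sequence, row-major over (H, W)."""
    b, c, h, w = latents.shape
    return latents.permute(0, 2, 3, 1).reshape(b, h * w, c)


def unpack_latents(tokens: torch.Tensor, h: int, w: int) -> torch.Tensor:
    """(B, H*W, C) tokens -> (B, C, H, W) latents; inverse of pack_latents."""
    b, n, c = tokens.shape
    assert n == h * w, "token count must match the latent grid"
    return tokens.reshape(b, h, w, c).permute(0, 3, 1, 2)


latents = torch.randn(2, 16, 64, 64)    # a batch of two dataset samples
tokens = pack_latents(latents)
print(tokens.shape)                      # torch.Size([2, 4096, 16])
restored = unpack_latents(tokens, 64, 64)
print(torch.equal(restored, latents))    # True
```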
### Training Details

- **Objective**: Flow matching (rectified flow)
- **Timestep sampling**: Logit-normal with Flux shift (s=3.0)
- **Loss weighting**: Min-SNR-γ (γ=5.0)
- **Optimizer**: AdamW (lr=1e-4, β=(0.9, 0.99), weight decay=0.01)
- **Schedule**: Cosine with warmup
- **Precision**: bfloat16

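The timestep sampler and optimizer settings above can be sketched as follows. Several details are assumptions, not taken from train_colab.py:

- The logit-normal uses mean 0 and std 1.
- The Flux/SD3-style shift σ' = sσ / (1 + (s - 1)σ) is applied to the noise level σ = 1 - t, because this card puts data at t = 1. With s = 3 this biases training toward noisier timesteps.
- The warmup length and total step count are placeholders.

```python
import math

import torch


def sample_timesteps(batch_size: int, shift: float = 3.0, generator=None) -> torch.Tensor:
    """Logit-normal timesteps with a Flux/SD3-style shift (assumed form), in this
    card's convention: t = 0 is pure noise, t = 1 is data."""
    u = torch.randn(batch_size, generator=generator)
    sigma = torch.sigmoid(u)                             # noise level, logit-normal(0, 1)
    sigma = shift * sigma / (1 + (shift - 1) * sigma)    # shift > 1 pushes toward more noise
    return 1.0 - sigma                                   # convert noise level to t


def make_optimizer(model: torch.nn.Module, total_steps: int, warmup_steps: int):
    """AdamW + linear warmup, then cosine decay to zero (placeholder step counts)."""
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4, betas=(0.9, 0.99), weight_decay=0.01)

    def lr_lambda(step: int) -> float:
        if step < warmup_steps:
            return (step + 1) / warmup_steps
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))

    sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
    return opt, sched


t = sample_timesteps(8, generator=torch.Generator().manual_seed(0))
opt, sched = make_optimizer(torch.nn.Linear(4, 4), total_steps=10_000, warmup_steps=500)
```

Call `sched.step()` once per optimizer step so that the warmup and cosine phases follow the training step count.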
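### Flow Matching Formulation

TinyFlux uses t = 0 for pure noise and t = 1 for data, which is the reverse of the original Flux convention:

```
Interpolation:   x_t = (1 - t) * noise + t * data
Target velocity: v = data - noise
Loss:            MSE(predicted_v, target_v) * min_snr_weight(t)
```
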
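A minimal sketch of the loss under this formulation follows. The per-timestep weight is an assumption, derived as follows:

- Min-SNR-γ caps the weight of a data-space (x0) loss at γ.
- With x_t = (1 - t) * noise + t * data, the data estimate is x0_hat = x_t + (1 - t) * v_hat. The x0 error is therefore (1 - t) times the velocity error.
- Re-expressing the capped weight for the velocity loss gives w(t) = min(SNR(t), γ) * (1 - t)^2, with SNR(t) = t^2 / (1 - t)^2.

train_colab.py may implement a different Min-SNR variant. `model_fn` is a hypothetical stand-in for the TinyFlux forward pass.

```python
import torch


def min_snr_weight(t: torch.Tensor, gamma: float = 5.0) -> torch.Tensor:
    """Min-SNR-gamma weight re-expressed for a velocity loss (assumed variant).

    SNR(t) = t^2 / (1 - t)^2 for x_t = (1 - t) * noise + t * data, and
    min(SNR, gamma) * (1 - t)^2 simplifies to min(t^2, gamma * (1 - t)^2),
    which avoids dividing by (1 - t) near t = 1.
    """
    return torch.minimum(t**2, gamma * (1 - t) ** 2)


def flow_matching_loss(model_fn, data: torch.Tensor, t: torch.Tensor, gamma: float = 5.0) -> torch.Tensor:
    """data: (B, N, C) packed latents; t: (B,) timesteps in [0, 1]."""
    noise = torch.randn_like(data)
    t_ = t.view(-1, 1, 1)                         # broadcast over tokens and channels
    x_t = (1 - t_) * noise + t_ * data            # interpolation
    target = data - noise                         # target velocity
    pred = model_fn(x_t, t)                       # predicted velocity
    per_sample = ((pred - target) ** 2).mean(dim=(1, 2))
    return (per_sample * min_snr_weight(t, gamma)).mean()
```

At t = 0.5 the SNR is 1, so the weight is 0.25. Closer to t = 1 the cap γ takes over, and the weight falls off as γ(1 - t)^2.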
## Usage

### Installation

```bash
pip install torch transformers diffusers safetensors huggingface_hub sentencepiece
```

### Inference

```python
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from transformers import T5EncoderModel, T5Tokenizer, CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL

device, dtype = "cuda", torch.bfloat16

# Load model. TinyFlux and TinyFluxConfig are defined in model.py in this repo;
# copy the class definitions into your script (or import them) first.
config = TinyFluxConfig()
model = TinyFlux(config).to(device=device, dtype=dtype)
weights = load_file(hf_hub_download("AbstractPhil/tiny-flux", "model.safetensors"))
model.load_state_dict(weights)
model.eval()

# Load text encoders and the Flux VAE
t5_tok = T5Tokenizer.from_pretrained("google/flan-t5-base")
t5_enc = T5EncoderModel.from_pretrained("google/flan-t5-base", torch_dtype=dtype).to(device)
clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
clip_enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=dtype).to(device)
vae = AutoencoderKL.from_pretrained("black-forest-labs/FLUX.1-schnell", subfolder="vae", torch_dtype=dtype).to(device)

with torch.inference_mode():  # no autograd graph during sampling
    # Encode prompt: T5 gives per-token features, CLIP gives a pooled vector
    prompt = "a photo of a cat"
    t5_in = t5_tok(prompt, max_length=128, padding="max_length", truncation=True, return_tensors="pt").to(device)
    t5_out = t5_enc(**t5_in).last_hidden_state      # (1, 128, 768)
    clip_in = clip_tok(prompt, max_length=77, padding="max_length", truncation=True, return_tensors="pt").to(device)
    clip_out = clip_enc(**clip_in).pooler_output    # (1, 768)

    # Euler sampling (t: 0 -> 1, noise -> data) over 64x64 latent tokens with 16 channels
    num_steps = 20
    x = torch.randn(1, 64 * 64, 16, device=device, dtype=dtype)
    img_ids = TinyFlux.create_img_ids(1, 64, 64, device)
    timesteps = torch.linspace(0, 1, num_steps + 1, device=device)
    guidance = torch.tensor([3.5], device=device, dtype=dtype)
    for i in range(num_steps):
        t = timesteps[i].unsqueeze(0)
        dt = timesteps[i + 1] - timesteps[i]
        v = model(
            hidden_states=x,
            encoder_hidden_states=t5_out,
            pooled_projections=clip_out,
            timestep=t,
            img_ids=img_ids,
            guidance=guidance,
        )
        x = x + v * dt

    # Decode: unpack tokens to (B, C, H, W) and undo the VAE scaling.
    # If the training latents were also shifted by vae.config.shift_factor,
    # add it back here: latents / scaling_factor + shift_factor.
    latents = x.reshape(1, 64, 64, 16).permute(0, 3, 1, 2)
    latents = latents / vae.config.scaling_factor
    image = vae.decode(latents.to(vae.dtype)).sample  # input dtype must match the bf16 VAE
    image = (image.float() / 2 + 0.5).clamp(0, 1)
```

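`TinyFlux.create_img_ids` is defined in model.py, and its exact output is not documented here. The sketch below assumes it follows Flux's convention: channel 0 unused and left at zero, channels 1 and 2 holding each latent token's row and column. Under that assumption, it shows the ids and how a 3-axis RoPE consumes them.

The axis split (16, 56, 56) is Flux's and is only an assumption for TinyFlux, although it does sum to the preserved 128-dim head. `make_img_ids`, `rope_angles`, and `apply_rope` are hypothetical names.

```python
import torch


def make_img_ids(height: int, width: int) -> torch.Tensor:
    """(H*W, 3) position ids in row-major token order: (0, row, col)."""
    ids = torch.zeros(height, width, 3)
    ids[..., 1] = torch.arange(height).unsqueeze(1)   # row index
    ids[..., 2] = torch.arange(width).unsqueeze(0)    # column index
    return ids.reshape(height * width, 3)


def rope_angles(ids: torch.Tensor, axes_dim=(16, 56, 56), theta: float = 10_000.0) -> torch.Tensor:
    """Rotation angles (N, sum(axes_dim) // 2): each axis rotates its own slice of the head."""
    angles = []
    for axis, dim in enumerate(axes_dim):
        freqs = theta ** (-torch.arange(0, dim, 2, dtype=torch.float64) / dim)  # (dim/2,)
        angles.append(ids[:, axis, None].double() * freqs)                       # (N, dim/2)
    return torch.cat(angles, dim=-1)


def apply_rope(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    """Rotate consecutive channel pairs of x (..., N, head_dim) by the given angles."""
    pairs = x.float().reshape(*x.shape[:-1], -1, 2)
    cos, sin = angles.cos().float(), angles.sin().float()
    out = torch.stack(
        (pairs[..., 0] * cos - pairs[..., 1] * sin,
         pairs[..., 0] * sin + pairs[..., 1] * cos),
        dim=-1,
    )
    return out.flatten(-2).type_as(x)


ids = make_img_ids(64, 64)          # matches the 64 x 64 latent grid
angles = rope_angles(ids)           # (4096, 64): one angle per channel pair
q = torch.randn(1, 2, 4096, 128)    # (batch, heads, tokens, head_dim)
q_rot = apply_rope(q, angles)
```

Each axis rotates its own channel slice, so the attention score between two rotated tokens depends only on their row and column offsets. In Flux, text tokens get all-zero ids, which leaves them unrotated.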
### Full Inference Script

See [inference_colab.py](https://huggingface.co/AbstractPhil/tiny-flux/blob/main/inference_colab.py) for a complete generation pipeline with:

- Classifier-free guidance
- Batch generation
- Image saving

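Classifier-free guidance runs the model twice per step: once with the prompt and once with an unconditional (for example, empty-prompt) embedding. It then extrapolates between the two velocity predictions: v = v_uncond + s * (v_cond - v_uncond).

The sketch below shows this standard formulation inside the same Euler loop as the example above. It is not taken from inference_colab.py. `velocity_fn` is a hypothetical wrapper around the TinyFlux call, and the default guidance scale is a placeholder.

```python
import torch


def euler_sample_cfg(velocity_fn, x, cond, uncond, num_steps: int = 20, cfg_scale: float = 4.0):
    """Euler integration from t = 0 (noise) to t = 1 (data) with classifier-free guidance.

    velocity_fn(x, t, c) -> predicted velocity shaped like x (hypothetical wrapper).
    """
    timesteps = torch.linspace(0, 1, num_steps + 1)
    for i in range(num_steps):
        t = timesteps[i].expand(x.shape[0])             # one timestep per batch element
        dt = timesteps[i + 1] - timesteps[i]
        v_cond = velocity_fn(x, t, cond)
        v_uncond = velocity_fn(x, t, uncond)
        v = v_uncond + cfg_scale * (v_cond - v_uncond)  # guided velocity
        x = x + v * dt
    return x
```

With `cfg_scale=1.0` this reduces to plain conditional sampling. In practice the two passes are often combined into one batched forward pass by concatenating the conditional and unconditional inputs along the batch axis.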
## Files

```
AbstractPhil/tiny-flux/
├── model.safetensors      # Model weights (~32 MB)
├── config.json            # Model configuration
├── README.md              # This file
├── model.py               # Model architecture definition
├── inference_colab.py     # Inference script
├── train_colab.py         # Training script
├── checkpoints/           # Training checkpoints
│   └── step_*.safetensors
├── logs/                  # TensorBoard logs
└── samples/               # Samples generated during training
```

## Limitations

- **Resolution**: Trained at 512×512 only
- **Quality**: Significantly lower than full Flux due to the reduced capacity
- **Text understanding**: Limited by the much smaller sequence encoder (flan-t5-base, 768 dim, vs. T5-XXL, 4096 dim)
- **Fine details**: May struggle with complex scenes and fine-grained details
- **Experimental**: Intended for research and learning, not production use

## Intended Use

- Understanding Flux/MMDiT architecture
- Rapid prototyping and experimentation
- Educational purposes
- Resource-constrained environments
- Baseline for architecture modifications

## Citation

If you use TinyFlux in your research, please cite:

```bibtex
@misc{tinyflux2025,
  title={TinyFlux: A Miniaturized Flux Architecture for Experimentation},
  author={AbstractPhil},
  year={2025},
  url={https://huggingface.co/AbstractPhil/tiny-flux}
}
```

## Acknowledgments

- [Black Forest Labs](https://blackforestlabs.ai/) for the original Flux architecture
- [Hugging Face](https://huggingface.co/) for the diffusers and transformers libraries

## License

MIT License. See the LICENSE file for details.

---

**Note**: This is an experimental research model. For high-quality image generation, use the full [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) or [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) models.