---
license: mit
language:
- en
tags:
- diffusion
- flow-matching
- flux
- text-to-image
- image-generation
- tinyflux
- lailah
- experimental
library_name: pytorch
pipeline_tag: text-to-image
base_model:
- AbstractPhil/tiny-flux
- black-forest-labs/FLUX.1-schnell
datasets:
- AbstractPhil/flux-schnell-teacher-latents
- AbstractPhil/imagenet-synthetic
---
# TinyFlux-Deep (Lailah)
**TinyFlux-Lailah** is an expanded TinyFlux architecture with increased depth and width. It was originally ported from [TinyFlux](https://huggingface.co/AbstractPhil/tiny-flux) via strategic layer expansion and doubled attention heads, and now trains end-to-end on teacher latents.
## Quick Start (Colab)
The easiest way to test Lailah:
1. Open [Google Colab](https://colab.research.google.com/)
2. Copy the contents of [`inference_v3.py`](./inference_v3.py) and [`model_v3.py`](./model_v3.py)
3. Run the cells
```python
# Or fetch directly:
!wget https://huggingface.co/AbstractPhil/tiny-flux-deep/raw/main/inference_v3.py
%run inference_v3.py
```
## Fair Weights
### ImageNet Synthetic step_346875
* Handles multiple animal-combination variants with high fidelity
* Dataset: [AbstractPhil/imagenet-synthetic](https://huggingface.co/datasets/AbstractPhil/imagenet-synthetic)
"subject, animal, cat, photograph of a tiger, natural habitat"
![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/uJ9Ffh780iLgEIJhmafod.png)
"subject, bird, blue beak, red eyes, green claws"
![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/GRS5tyaFFa0HV2xSJCsin.png)
"subject, bird, red haired bird in a tree"
![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/rGourHokJsPtYNnoFi3Eq.png)
![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/O_z6DLc32HDNBq3ZwEjqf.png)
## Architecture
| Component | TinyFlux | TinyFlux-Lailah | Flux |
|-----------|----------|-----------------|------|
| Hidden size | 256 | **512** | 3072 |
| Attention heads | 2 | **4** | 24 |
| Head dimension | 128 | 128 | 128 |
| Double-stream layers | 3 | **15** | 19 |
| Single-stream layers | 3 | **25** | 38 |
| VAE channels | 16 | 16 | 16 |
| **Total params** | ~10.7M | **~241.8M** | ~12B |
### Text Encoders
| Role | Model | Dimension |
|------|-------|-----------|
| Sequence encoder | flan-t5-base | 768 |
| Pooled encoder | CLIP-L | 768 |
## Training
### Current Approach
All parameters are trainable. The model was initially ported from TinyFlux with frozen anchor layers, but current training runs with everything unfrozen for maximum flexibility.
### Dataset
Training on [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents):
- Pre-computed VAE latents from Flux-Schnell generations
- 512Γ—512 resolution (64Γ—64 latent space)
- Diverse prompts covering people, objects, scenes, styles
### Training Details
- **Objective**: Flow matching (rectified flow)
- **Timestep sampling**: Logit-normal with Flux shift (s=3.0)
- **Loss weighting**: Min-SNR-Ξ³ (Ξ³=5.0)
- **Optimizer**: AdamW (lr=3e-4, Ξ²=(0.9, 0.99), wd=0.01)
- **Schedule**: Cosine with warmup
- **Precision**: bfloat16
- **Batch size**: 32 (16 Γ— 2 gradient accumulation)
- **EMA decay**: 0.9999
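The objective and weighting above can be sketched as follows. This is a hedged reconstruction, assuming the rectified-flow interpolant `x_t = (1 - t) * x + t * noise` (so the target velocity is `noise - x`) and `SNR(t) = ((1 - t) / t)^2`; the repo's `train_tinyflux_deep.py` is authoritative.

```python
import torch

def flux_shift(t, s=3.0):
    # Flux timestep shift: biases samples toward higher t
    return s * t / (1 + (s - 1) * t)

def sample_timesteps(batch_size, s=3.0):
    # Logit-normal sampling: sigmoid of a standard normal, then the Flux shift
    u = torch.sigmoid(torch.randn(batch_size))
    return flux_shift(u, s)

def min_snr_weight(t, gamma=5.0, eps=1e-8):
    # Min-SNR-gamma weighting; SNR(t) = ((1 - t) / t)^2 is an assumption
    # for the rectified-flow interpolant x_t = (1 - t) * x + t * noise
    snr = ((1 - t) / (t + eps)) ** 2
    return torch.clamp(snr, max=gamma) / (snr + eps)

def flow_matching_loss(pred_v, latents, noise, t):
    # Rectified-flow target velocity: noise - data (per the assumed convention)
    target = noise - latents
    w = min_snr_weight(t).view(-1, 1, 1, 1)
    return (w * (pred_v - target) ** 2).mean()
```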
### Checkpoints
Checkpoints are saved roughly once per epoch, with both main and EMA weights:
- `checkpoints/step_XXXXX.safetensors` - Training weights
- `checkpoints/step_XXXXX_ema.safetensors` - EMA weights (currently broken and being retrained; use the standard-step weights for inference)
## Usage
### Dependencies
```bash
pip install torch transformers diffusers safetensors huggingface_hub
```
### Basic Inference
```python
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# TinyFluxDeep and TinyFluxDeepConfig come from tinyflux_deep.py in this repo
from tinyflux_deep import TinyFluxDeep, TinyFluxDeepConfig

config = TinyFluxDeepConfig()
model = TinyFluxDeep(config).to("cuda", torch.bfloat16)

# Load the main training weights; the matching EMA checkpoint
# (step_286250_ema.safetensors) is currently broken, so avoid it for now
weights = load_file(hf_hub_download(
    "AbstractPhil/tiny-flux-deep",
    "checkpoints/step_286250.safetensors",
))
model.load_state_dict(weights, strict=False)
model.eval()
```
### Sampling
Lailah uses Euler discrete sampling with Flux timestep shift:
```python
def flux_shift(t, s=3.0):
    """Bias timesteps toward data (higher t)."""
    return s * t / (1 + (s - 1) * t)

# 20-50 steps recommended
timesteps = flux_shift(torch.linspace(0, 1, num_steps + 1))

for i in range(num_steps):
    t_curr, t_next = timesteps[i], timesteps[i + 1]
    dt = t_next - t_curr
    v = model(hidden_states=x, encoder_hidden_states=t5_out, ...)
    x = x + v * dt  # Euler step
```
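Wrapped into a self-contained function, the loop above can be sanity-checked with a stub velocity function standing in for the model:

```python
import torch

def flux_shift(t, s=3.0):
    """Bias timesteps toward data (higher t)."""
    return s * t / (1 + (s - 1) * t)

@torch.no_grad()
def euler_sample(velocity_fn, x, num_steps=20, s=3.0):
    # Same scheme as above: integrate x along shifted timesteps from t=0 to t=1
    timesteps = flux_shift(torch.linspace(0, 1, num_steps + 1), s)
    for i in range(num_steps):
        dt = timesteps[i + 1] - timesteps[i]
        x = x + velocity_fn(x, timesteps[i]) * dt
    return x

# Toy check: a constant velocity field pointing at a target lands on the target,
# since the shifted timesteps still span exactly [0, 1]
target = torch.ones(1, 16, 64, 64)
x0 = torch.zeros_like(target)
out = euler_sample(lambda x, t: target - x0, x0)
```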
### Configuration
```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class TinyFluxDeepConfig:
hidden_size: int = 512
num_attention_heads: int = 4
attention_head_dim: int = 128
in_channels: int = 16
joint_attention_dim: int = 768
pooled_projection_dim: int = 768
num_double_layers: int = 15
num_single_layers: int = 25
mlp_ratio: float = 4.0
axes_dims_rope: Tuple[int, int, int] = (16, 56, 56)
guidance_embeds: bool = True
```
## Files
```
AbstractPhil/tiny-flux-deep/
β”œβ”€β”€ model.safetensors # Latest best weights
β”œβ”€β”€ tinyflux_deep.py # Model architecture
β”œβ”€β”€ colab_inference_lailah_early.py # Ready-to-run Colab inference
β”œβ”€β”€ inference_tinyflux_deep.py # Standalone inference script
β”œβ”€β”€ train_tinyflux_deep.py # Training script
β”œβ”€β”€ checkpoints/
β”‚ β”œβ”€β”€ step_286250.safetensors # Training weights
β”‚ └── step_286250_ema.safetensors # EMA weights (currently broken)
β”œβ”€β”€ samples/ # Generated samples during training
└── README.md
```
## Origin: Porting from TinyFlux
Lailah was initialized by porting TinyFlux weights:
1. **Attention head expansion** (2 β†’ 4): Original heads copied to positions 0-1, new heads 2-3 Xavier initialized
2. **Hidden dimension expansion** (256 β†’ 512): Weights tiled and scaled
3. **Layer distribution**: Original 3 layers distributed across 15/25 positions as initialization anchors
The initial port used selective freezing of anchor layers, but current training leaves all parameters unfrozen.
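As an illustration of step 1, head expansion for a single attention projection might look roughly like this (the hidden-dimension tiling/scaling from step 2 is omitted; treat this as a sketch, not the actual port code):

```python
import torch
import torch.nn as nn

def expand_heads(w_old, new_heads=4, head_dim=128):
    # w_old: (old_heads * head_dim, in_dim) projection weight from TinyFlux
    in_dim = w_old.shape[1]
    w_new = torch.empty(new_heads * head_dim, in_dim)
    nn.init.xavier_uniform_(w_new)     # new heads (2-3) start Xavier-initialized
    w_new[: w_old.shape[0]] = w_old    # original heads (0-1) copied verbatim
    return w_new
```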
## Comparison
| Aspect | TinyFlux | Lailah | Full Flux |
|--------|----------|--------|-----------|
| Parameters | 10.7M | 241.8M | 12B |
| Memory (bf16) | ~22MB | ~484MB | ~24GB |
| Quality | Limited | Moderate | High |
| Speed (A100) | ~10ms | ~40ms | ~200ms |
## Limitations
- **Resolution**: 512Γ—512 only (64Γ—64 latent)
- **Early training**: Quality improving but not production-ready
- **Text capacity**: Limited by flan-t5-base (768 dim vs Flux's 4096)
- **Experimental**: Research model, expect artifacts
## Intended Use
- Rapid prototyping and iteration
- Studying flow matching at moderate scale
- Architecture experiments
- Educational purposes
- Baseline comparisons
## Name
**Lailah** (ΧœΧ™ΧœΧ”) - Angel of the night in Jewish tradition, said to guard souls. Chosen for this model's role as a smaller guardian exploring the same space as larger models.
## Citation
```bibtex
@misc{tinyfluxlailah2026,
title={TinyFlux-Lailah: Compact Flow Matching for Text-to-Image},
author={AbstractPhil},
year={2026},
url={https://huggingface.co/AbstractPhil/tiny-flux-deep}
}
```
## Related
- [AbstractPhil/tiny-flux](https://huggingface.co/AbstractPhil/tiny-flux) - Base model (10.7M)
- [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents) - Training data
- [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) - Teacher model
## License
MIT License
---
**Status**: Active training. Checkpoints updated regularly. Use standard weights for best results.