---
license: mit
language:
- en
tags:
- diffusion
- flow-matching
- flux
- text-to-image
- image-generation
- tinyflux
- lailah
- experimental
library_name: pytorch
pipeline_tag: text-to-image
base_model:
- AbstractPhil/tiny-flux
- black-forest-labs/FLUX.1-schnell
datasets:
- AbstractPhil/flux-schnell-teacher-latents
---

# TinyFlux-Deep (Lailah)

**TinyFlux-Lailah** is an expanded TinyFlux architecture with increased depth and width. It was initialized by porting [TinyFlux](https://huggingface.co/AbstractPhil/tiny-flux) weights with strategic layer expansion and a doubling of attention heads (2 → 4), and is now training end-to-end on teacher latents.

> **Current checkpoint:** `step_286250` | **Status:** Active training

## Quick Start (Colab)

The easiest way to test Lailah:

1. Open [Google Colab](https://colab.research.google.com/)
2. Copy the contents of [`colab_inference_lailah_early.py`](./colab_inference_lailah_early.py) 
3. Run the cells

```python
# Or fetch directly:
!wget https://huggingface.co/AbstractPhil/tiny-flux-deep/raw/main/colab_inference_lailah_early.py
%run colab_inference_lailah_early.py
```


![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/O_z6DLc32HDNBq3ZwEjqf.png)

## Architecture

| Component | TinyFlux | TinyFlux-Lailah | Flux |
|-----------|----------|-----------------|------|
| Hidden size | 256 | **512** | 3072 |
| Attention heads | 2 | **4** | 24 |
| Head dimension | 128 | 128 | 128 |
| Double-stream layers | 3 | **15** | 19 |
| Single-stream layers | 3 | **25** | 38 |
| VAE channels | 16 | 16 | 16 |
| **Total params** | ~10.7M | **~241.8M** | ~12B |

### Text Encoders

| Role | Model | Dimension |
|------|-------|-----------|
| Sequence encoder | flan-t5-base | 768 |
| Pooled encoder | CLIP-L | 768 |

## Training

### Current Approach

All parameters are trainable. The initial port from TinyFlux froze the anchor layers, but current training runs with every layer unfrozen.

### Dataset

Training on [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents):
- Pre-computed VAE latents from Flux-Schnell generations
- 512×512 resolution (64×64 latent space)
- Diverse prompts covering people, objects, scenes, styles
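
The resolution arithmetic above can be checked in a few lines. This is a minimal sketch that assumes an 8× spatial downsampling in the VAE (consistent with 512 → 64 above) and the 16 latent channels listed in the architecture table. `latent_shape` is a hypothetical helper, not part of this repository.

```python
from typing import Tuple

VAE_SCALE = 8          # assumed spatial downsampling factor (512 / 64)
LATENT_CHANNELS = 16   # "VAE channels" in the architecture table

def latent_shape(height: int, width: int) -> Tuple[int, int, int]:
    """Return (channels, latent_h, latent_w) for an image of the given size."""
    if height % VAE_SCALE or width % VAE_SCALE:
        raise ValueError("image size must be divisible by the VAE scale")
    return (LATENT_CHANNELS, height // VAE_SCALE, width // VAE_SCALE)

print(latent_shape(512, 512))  # (16, 64, 64)
```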

### Training Details

- **Objective**: Flow matching (rectified flow)
- **Timestep sampling**: Logit-normal with Flux shift (s=3.0)
- **Loss weighting**: Min-SNR-γ (γ=5.0)
- **Optimizer**: AdamW (lr=3e-4, β=(0.9, 0.99), wd=0.01)
- **Schedule**: Cosine with warmup
- **Precision**: bfloat16
- **Batch size**: 32 (16 × 2 gradient accumulation)
- **EMA decay**: 0.9999
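
The objective and timestep sampling listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the training script: it assumes a standard-normal logit (mean 0, std 1), that the shift is applied after the logit-normal draw, and that t=0 is noise and t=1 is data. All function names other than `flux_shift` are hypothetical, and the Min-SNR weighting is not covered here.

```python
import math
import random

def flux_shift(t: float, s: float = 3.0) -> float:
    """Shift a timestep in [0, 1] toward 1 (the data end, by assumption)."""
    return s * t / (1 + (s - 1) * t)

def sample_timestep(rng: random.Random, s: float = 3.0) -> float:
    """Logit-normal draw in (0, 1), then shifted (the ordering is an assumption)."""
    u = rng.gauss(0.0, 1.0)            # assumed mean 0, std 1
    t = 1.0 / (1.0 + math.exp(-u))     # sigmoid of a normal = logit-normal sample
    return flux_shift(t, s)

def interpolate(noise, data, t):
    """Point on the straight path from noise (t=0) to data (t=1)."""
    return [(1 - t) * n + t * d for n, d in zip(noise, data)]

def velocity_target(noise, data):
    """Rectified-flow regression target: the constant velocity data - noise."""
    return [d - n for n, d in zip(noise, data)]

rng = random.Random(0)
t = sample_timestep(rng)
x_t = interpolate([0.0, 0.0], [2.0, 4.0], t)
print(t, x_t, velocity_target([0.0, 0.0], [2.0, 4.0]))
```

In a real training loop the same operations would run on batched `torch` tensors, with the model regressing `velocity_target` from `x_t` and `t`.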

### Checkpoints

Checkpoints are saved every 625 steps with both main and EMA weights:
- `checkpoints/step_XXXXX.safetensors` - Training weights
- `checkpoints/step_XXXXX_ema.safetensors` - EMA weights (recommended for inference)
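
The naming scheme above can be turned into a small helper for picking a checkpoint file. This is a sketch: `checkpoint_filename` is a hypothetical function, and it assumes step numbers are written without zero padding, as in `step_286250`.

```python
SAVE_EVERY = 625  # checkpoint interval stated above

def checkpoint_filename(step: int, ema: bool = True) -> str:
    """Build the repo-relative path of a checkpoint (EMA by default)."""
    if step <= 0 or step % SAVE_EVERY != 0:
        raise ValueError(f"step must be a positive multiple of {SAVE_EVERY}")
    suffix = "_ema" if ema else ""
    return f"checkpoints/step_{step}{suffix}.safetensors"

print(checkpoint_filename(286250))             # checkpoints/step_286250_ema.safetensors
print(checkpoint_filename(286250, ema=False))  # checkpoints/step_286250.safetensors
```

The returned string can be passed as the filename argument to `hf_hub_download`. Whether a given step is still hosted in the repository is not guaranteed.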

## Usage

### Dependencies

```bash
pip install torch transformers diffusers safetensors huggingface_hub
```

### Basic Inference

```python
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

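# Model classes come from tinyflux_deep.py in this repo (download it next to
# your script); the module is assumed to export both names used below
from tinyflux_deep import TinyFluxDeep, TinyFluxDeepConfig

config = TinyFluxDeepConfig()
model = TinyFluxDeep(config).to("cuda", torch.bfloat16)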

# Load EMA weights (recommended) or main weights
weights = load_file(hf_hub_download(
    "AbstractPhil/tiny-flux-deep", 
    "checkpoints/step_286250_ema.safetensors"  # Use _ema for best quality
))
model.load_state_dict(weights, strict=False)
model.eval()
```

### Sampling

Lailah uses Euler discrete sampling with Flux timestep shift:

```python
def flux_shift(t, s=3.0):
    """Bias timesteps toward data (higher t)."""
    return s * t / (1 + (s - 1) * t)

num_steps = 30  # 20-50 steps recommended
timesteps = flux_shift(torch.linspace(0, 1, num_steps + 1))

for i in range(num_steps):
    t_curr, t_next = timesteps[i], timesteps[i + 1]
    dt = t_next - t_curr
    
    # Remaining model inputs are elided here; see inference_tinyflux_deep.py
    v = model(hidden_states=x, encoder_hidden_states=t5_out, ...)
    x = x + v * dt  # Euler step
```
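
To see the loop above run end to end without the model, here is a self-contained sketch in plain Python. The model call is replaced by a toy velocity field (`toy_velocity`, hypothetical) that always returns `data - noise`, the exact rectified-flow velocity for a straight path, so the Euler integration should land on `data`. `make_timesteps` and `euler_sample` are illustrative names, not functions from this repository.

```python
def flux_shift(t, s=3.0):
    """Shift a timestep in [0, 1] toward 1."""
    return s * t / (1 + (s - 1) * t)

def make_timesteps(num_steps, s=3.0):
    """Shifted schedule from 0 to 1 with num_steps intervals."""
    return [flux_shift(i / num_steps, s) for i in range(num_steps + 1)]

def euler_sample(velocity_fn, x, num_steps=20, s=3.0):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with explicit Euler steps."""
    ts = make_timesteps(num_steps, s)
    for t_curr, t_next in zip(ts[:-1], ts[1:]):
        v = velocity_fn(x, t_curr)
        x = [xi + vi * (t_next - t_curr) for xi, vi in zip(x, v)]
    return x

noise = [0.5, -1.0]
data = [2.0, 3.0]

def toy_velocity(x, t):
    """Stand-in for the model: constant velocity of the straight noise->data path."""
    return [d - n for n, d in zip(noise, data)]

print(euler_sample(toy_velocity, noise))  # approximately [2.0, 3.0]
```

Because the shifted schedule still starts at 0 and ends at 1, the step sizes sum to 1. The shift only changes where along the path the steps are concentrated.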

### Configuration

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class TinyFluxDeepConfig:
    hidden_size: int = 512
    num_attention_heads: int = 4
    attention_head_dim: int = 128
    in_channels: int = 16
    joint_attention_dim: int = 768
    pooled_projection_dim: int = 768
    num_double_layers: int = 15
    num_single_layers: int = 25
    mlp_ratio: float = 4.0
    axes_dims_rope: Tuple[int, int, int] = (16, 56, 56)
    guidance_embeds: bool = True
```
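
The defaults above satisfy two consistency relations that Flux-style models typically rely on: `hidden_size` equals heads × head dimension, and the RoPE axis dimensions sum to the head dimension. The sketch below checks both on a stand-in dataclass. `DemoConfig` and `check_config` are hypothetical, and whether `TinyFluxDeep` itself enforces these relations is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class DemoConfig:
    """Hypothetical mirror of four TinyFluxDeepConfig fields."""
    hidden_size: int = 512
    num_attention_heads: int = 4
    attention_head_dim: int = 128
    axes_dims_rope: Tuple[int, int, int] = (16, 56, 56)

def check_config(cfg: DemoConfig) -> List[str]:
    """Return a list of human-readable problems (empty if consistent)."""
    problems = []
    expected_hidden = cfg.num_attention_heads * cfg.attention_head_dim
    if cfg.hidden_size != expected_hidden:
        problems.append(
            f"hidden_size {cfg.hidden_size} != heads * head_dim {expected_hidden}"
        )
    if sum(cfg.axes_dims_rope) != cfg.attention_head_dim:
        problems.append(
            f"sum(axes_dims_rope) {sum(cfg.axes_dims_rope)} != "
            f"attention_head_dim {cfg.attention_head_dim}"
        )
    return problems

print(check_config(DemoConfig()))  # []
```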

## Files

```
AbstractPhil/tiny-flux-deep/
├── model.safetensors              # Latest best weights
├── tinyflux_deep.py               # Model architecture
├── colab_inference_lailah_early.py # Ready-to-run Colab inference
├── inference_tinyflux_deep.py     # Standalone inference script
├── train_tinyflux_deep.py         # Training script
├── checkpoints/
│   ├── step_286250.safetensors    # Training weights
│   └── step_286250_ema.safetensors # EMA weights (use this)
├── samples/                        # Generated samples during training
└── README.md
```

## Origin: Porting from TinyFlux

Lailah was initialized by porting TinyFlux weights:

1. **Attention head expansion** (2 → 4): original heads copied to positions 0-1; new heads 2-3 Xavier-initialized
2. **Hidden dimension expansion** (256 → 512): weights tiled and scaled
3. **Layer distribution**: the original 3 double-stream and 3 single-stream layers placed among the 15 and 25 new positions, respectively, as initialization anchors

The initial port used selective freezing of anchor layers, but current training leaves all parameters unfrozen.
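
Steps 2 and 3 of the port can be illustrated on toy data. This is a sketch under assumptions, not the actual porting code: it assumes a scale of 1/factor when tiling (which preserves the layer's output when the input is tiled the same way) and evenly spaced anchor positions. The real scale and positions are not stated here. `tile_and_scale`, `anchor_positions`, and `matvec` are hypothetical helpers.

```python
def tile_and_scale(weight, factor=2):
    """Tile a matrix factor x factor times and divide by factor (assumed scale)."""
    rows, cols = len(weight), len(weight[0])
    return [
        [weight[i % rows][j % cols] / factor for j in range(cols * factor)]
        for i in range(rows * factor)
    ]

def anchor_positions(num_src, num_dst):
    """Evenly spaced slots for num_src source layers among num_dst (assumed layout)."""
    if num_src == 1:
        return [0]
    return [round(i * (num_dst - 1) / (num_src - 1)) for i in range(num_src)]

def matvec(w, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]

small = [[1.0, 2.0], [3.0, 4.0]]
big = tile_and_scale(small)
print(matvec(small, [1.0, 1.0]))            # [3.0, 7.0]
print(matvec(big, [1.0, 1.0, 1.0, 1.0]))    # [3.0, 7.0, 3.0, 7.0]
print(anchor_positions(3, 15), anchor_positions(3, 25))  # [0, 7, 14] [0, 12, 24]
```

With this choice of scale, the widened layer reproduces the original outputs on tiled inputs, which is one plausible reason to tile and scale rather than initialize the new width randomly.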

## Comparison

| Aspect | TinyFlux | Lailah | Full Flux |
|--------|----------|--------|-----------|
| Parameters | 10.7M | 241.8M | 12B |
| Memory (bf16) | ~22MB | ~484MB | ~24GB |
| Quality | Limited | Moderate | High |
| Speed (A100) | ~10ms | ~40ms | ~200ms |
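
The memory row follows from the parameter counts at 2 bytes per bfloat16 value. The sketch below reproduces that arithmetic; it counts model weights only and ignores activations, the text encoders, and the VAE. `bf16_memory_mb` is a hypothetical helper.

```python
BYTES_PER_BF16 = 2

def bf16_memory_mb(num_params: float) -> float:
    """Weight memory in megabytes (10^6 bytes) for bfloat16 parameters."""
    return num_params * BYTES_PER_BF16 / 1e6

for name, params in [("TinyFlux", 10.7e6), ("Lailah", 241.8e6), ("Full Flux", 12e9)]:
    print(f"{name}: {bf16_memory_mb(params):.1f} MB")
```

This gives about 21.4 MB, 483.6 MB, and 24,000 MB (24 GB), matching the rounded figures in the table.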

## Limitations

- **Resolution**: 512×512 only (64×64 latent)
- **Early training**: Quality improving but not production-ready
- **Text capacity**: Limited by flan-t5-base (768 dim vs Flux's 4096)
- **Experimental**: Research model, expect artifacts

## Intended Use

- Rapid prototyping and iteration
- Studying flow matching at moderate scale
- Architecture experiments
- Educational purposes
- Baseline comparisons

## Name

**Lailah** (לילה) - Angel of the night in Jewish tradition, said to guard souls. Chosen for this model's role as a smaller guardian exploring the same space as larger models.

## Citation

```bibtex
@misc{tinyfluxlailah2026,
  title={TinyFlux-Lailah: Compact Flow Matching for Text-to-Image},
  author={AbstractPhil},
  year={2026},
  url={https://huggingface.co/AbstractPhil/tiny-flux-deep}
}
```

## Related

- [AbstractPhil/tiny-flux](https://huggingface.co/AbstractPhil/tiny-flux) - Base model (10.7M)
- [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents) - Training data
- [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) - Teacher model

## License

MIT License

---

**Status**: Active training. Checkpoints updated regularly. Use EMA weights for best results.