---
license: mit
language:
- en
tags:
- diffusion
- flow-matching
- flux
- text-to-image
- image-generation
- tinyflux
- lailah
- experimental
library_name: pytorch
pipeline_tag: text-to-image
base_model:
- AbstractPhil/tiny-flux
- black-forest-labs/FLUX.1-schnell
datasets:
- AbstractPhil/flux-schnell-teacher-latents
- AbstractPhil/imagenet-synthetic
---

# TinyFlux-Deep (Lailah)

**TinyFlux-Lailah** is an expanded TinyFlux architecture with increased depth and width. It was originally ported from [TinyFlux](https://huggingface.co/AbstractPhil/tiny-flux) with strategic layer expansion and attention head doubling, and it is now training end-to-end on teacher latents.


## Quick Start (Colab)

The easiest way to test Lailah:

1. Open [Google Colab](https://colab.research.google.com/)
2. Copy the contents of [`inference_v3.py`](./inference_v3.py) and [`model_v3.py`](./model_v3.py) 
3. Run the cells

```python
# Or fetch both scripts directly:
!wget https://huggingface.co/AbstractPhil/tiny-flux-deep/raw/main/model_v3.py
!wget https://huggingface.co/AbstractPhil/tiny-flux-deep/raw/main/inference_v3.py
%run inference_v3.py
```

## Fair Weights

### ImageNet Synthetic step_346875
* Handles multiple animal combination variants with high fidelity
* Dataset: [AbstractPhil/imagenet-synthetic](https://huggingface.co/datasets/AbstractPhil/imagenet-synthetic)

"subject, animal, cat, photograph of a tiger, natural habitat"

![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/uJ9Ffh780iLgEIJhmafod.png)

"subject, bird, blue beak, red eyes, green claws"
![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/GRS5tyaFFa0HV2xSJCsin.png)

"subject, bird, red haired bird in a tree"
![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/rGourHokJsPtYNnoFi3Eq.png)


![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/O_z6DLc32HDNBq3ZwEjqf.png)

## Architecture

| Component | TinyFlux | TinyFlux-Lailah | Flux |
|-----------|----------|-----------------|------|
| Hidden size | 256 | **512** | 3072 |
| Attention heads | 2 | **4** | 24 |
| Head dimension | 128 | 128 | 128 |
| Double-stream layers | 3 | **15** | 19 |
| Single-stream layers | 3 | **25** | 38 |
| VAE channels | 16 | 16 | 16 |
| **Total params** | ~10.7M | **~241.8M** | ~12B |

### Text Encoders

| Role | Model | Dimension |
|------|-------|-----------|
| Sequence encoder | flan-t5-base | 768 |
| Pooled encoder | CLIP-L | 768 |

## Training

### Current Approach

All parameters are trainable. The model was initially ported from TinyFlux with frozen anchor layers, but current training runs with everything unfrozen for maximum flexibility.

### Dataset

Training on [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents):
- Pre-computed VAE latents from Flux-Schnell generations
- 512×512 resolution (64×64 latent space)
- Diverse prompts covering people, objects, scenes, styles

### Training Details

- **Objective**: Flow matching (rectified flow)
- **Timestep sampling**: Logit-normal with Flux shift (s=3.0)
- **Loss weighting**: Min-SNR-γ (γ=5.0)
- **Optimizer**: AdamW (lr=3e-4, β=(0.9, 0.99), wd=0.01)
- **Schedule**: Cosine with warmup
- **Precision**: bfloat16
- **Batch size**: 32 (16 × 2 gradient accumulation)
- **EMA decay**: 0.9999
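
To make the timestep sampling and loss weighting settings above more concrete, here is a minimal sketch in plain Python. It is not taken from the training script. It assumes that "logit-normal" means the sigmoid of a standard normal draw, that the Flux shift is applied afterwards, that `t = 1` corresponds to clean data so that SNR = (t / (1 - t))², and that the weight takes the `min(SNR, γ) / SNR` form. The function names are hypothetical.

```python
import math
import random


def flux_shift(t: float, s: float = 3.0) -> float:
    """Flux-style timestep shift; maps [0, 1] onto [0, 1] and pushes values up."""
    return s * t / (1 + (s - 1) * t)


def sample_timestep(rng: random.Random, s: float = 3.0) -> float:
    """Draw a logit-normal timestep (sigmoid of N(0, 1)), then apply the shift."""
    u = 1.0 / (1.0 + math.exp(-rng.gauss(0.0, 1.0)))
    return flux_shift(u, s)


def min_snr_weight(t: float, gamma: float = 5.0, eps: float = 1e-6) -> float:
    """min(SNR, gamma) / SNR, assuming SNR = (t / (1 - t))^2 with t = 1 as data."""
    t = min(max(t, eps), 1.0 - eps)  # keep the SNR finite and non-zero
    snr = (t / (1.0 - t)) ** 2
    return min(snr, gamma) / snr


rng = random.Random(0)
timesteps = [sample_timestep(rng) for _ in range(10_000)]
mean_t = sum(timesteps) / len(timesteps)
```

Under these assumptions the sampled timesteps concentrate above 0.5, and the weight stays at 1 until the SNR exceeds γ, after which it decays so that nearly clean samples contribute less to the loss.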

### Checkpoints

Checkpoints are saved roughly once per epoch, with both main and EMA weights:
- `checkpoints/step_XXXXX.safetensors` - Training weights
- `checkpoints/step_XXXXX_ema.safetensors` - EMA weights (currently broken and being retrained; use the standard step weights for inference)

## Usage

### Dependencies

```bash
pip install torch transformers diffusers safetensors huggingface_hub
```

### Basic Inference

```python
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# TinyFluxDeep is defined in tinyflux_deep.py (see Files). This import assumes
# that file sits next to your script and also exports TinyFluxDeepConfig.
from tinyflux_deep import TinyFluxDeep, TinyFluxDeepConfig

config = TinyFluxDeepConfig()
model = TinyFluxDeep(config).to("cuda", torch.bfloat16)

# Load the main training weights. The EMA checkpoint
# (checkpoints/step_286250_ema.safetensors) is currently broken.
weights = load_file(hf_hub_download(
    "AbstractPhil/tiny-flux-deep",
    "checkpoints/step_286250.safetensors",
))
model.load_state_dict(weights, strict=False)
model.eval()
```

### Sampling

Lailah uses Euler discrete sampling with Flux timestep shift:

```python
def flux_shift(t, s=3.0):
    """Bias timesteps toward data (higher t)."""
    return s * t / (1 + (s - 1) * t)

num_steps = 30  # 20-50 steps recommended
timesteps = flux_shift(torch.linspace(0, 1, num_steps + 1))

for i in range(num_steps):
    t_curr, t_next = timesteps[i], timesteps[i + 1]
    dt = t_next - t_curr
    
    v = model(hidden_states=x, encoder_hidden_states=t5_out, ...)
    x = x + v * dt  # Euler step
```
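
The loop above can be tried without the model. The sketch below is a toy illustration, not the repository's sampler: `euler_sample` and the constant velocity function are hypothetical stand-ins, and a scalar replaces the latent tensor. It shows that the shifted grid still runs from 0 to 1, so a straight-line flow lands on its target regardless of how the steps are spaced.

```python
def flux_shift(t: float, s: float = 3.0) -> float:
    """Same shift as above, written for plain floats."""
    return s * t / (1 + (s - 1) * t)


def euler_sample(velocity_fn, x0: float, num_steps: int = 20, s: float = 3.0) -> float:
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 on a shifted grid."""
    timesteps = [flux_shift(i / num_steps, s) for i in range(num_steps + 1)]
    x = x0
    for t_curr, t_next in zip(timesteps[:-1], timesteps[1:]):
        x = x + velocity_fn(x, t_curr) * (t_next - t_curr)  # Euler step
    return x


noise, data = -1.5, 2.0
# Toy stand-in for the model: the constant velocity of a straight noise-to-data path.
result = euler_sample(lambda x, t: data - noise, noise)
```

With a real model the velocity changes along the path, so the step count and the shift both affect the result.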

### Configuration

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class TinyFluxDeepConfig:
    hidden_size: int = 512
    num_attention_heads: int = 4
    attention_head_dim: int = 128
    in_channels: int = 16
    joint_attention_dim: int = 768
    pooled_projection_dim: int = 768
    num_double_layers: int = 15
    num_single_layers: int = 25
    mlp_ratio: float = 4.0
    axes_dims_rope: Tuple[int, int, int] = (16, 56, 56)
    guidance_embeds: bool = True
```
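
The defaults above are mutually consistent: 4 heads × 128 dimensions per head gives the hidden size of 512, and the RoPE axes (16 + 56 + 56) sum to the head dimension of 128. The sketch below checks both relations on a trimmed, illustrative copy of the config. `AttentionDims` and `find_dim_problems` are hypothetical names, and whether the model itself enforces these relations is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AttentionDims:
    """Trimmed, illustrative copy of the attention-related config fields."""
    hidden_size: int = 512
    num_attention_heads: int = 4
    attention_head_dim: int = 128
    axes_dims_rope: Tuple[int, int, int] = (16, 56, 56)


def find_dim_problems(cfg: AttentionDims) -> List[str]:
    """Return a description of every dimension relation that does not hold."""
    problems = []
    expected_hidden = cfg.num_attention_heads * cfg.attention_head_dim
    if cfg.hidden_size != expected_hidden:
        problems.append(
            f"hidden_size {cfg.hidden_size} != heads * head_dim {expected_hidden}"
        )
    rope_total = sum(cfg.axes_dims_rope)
    if rope_total != cfg.attention_head_dim:
        problems.append(
            f"sum(axes_dims_rope) {rope_total} != head_dim {cfg.attention_head_dim}"
        )
    return problems


default_problems = find_dim_problems(AttentionDims())
```

Running a check like this before changing `hidden_size` or the head count can catch mismatched shapes early.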

## Files

```
AbstractPhil/tiny-flux-deep/
├── model.safetensors              # Latest best weights
├── tinyflux_deep.py               # Model architecture
├── colab_inference_lailah_early.py # Ready-to-run Colab inference
├── inference_tinyflux_deep.py     # Standalone inference script
├── train_tinyflux_deep.py         # Training script
├── checkpoints/
│   ├── step_286250.safetensors    # Training weights
│   └── step_286250_ema.safetensors # EMA weights (currently broken)
├── samples/                        # Generated samples during training
└── README.md
```

## Origin: Porting from TinyFlux

Lailah was initialized by porting TinyFlux weights:

1. **Attention head expansion** (2 → 4): Original heads copied to positions 0-1, new heads 2-3 Xavier initialized
2. **Hidden dimension expansion** (256 → 512): Weights tiled and scaled
3. **Layer distribution**: Original 3 layers distributed across 15/25 positions as initialization anchors

The initial port used selective freezing of anchor layers, but current training leaves all parameters unfrozen.
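
The first two porting steps can be sketched for a single projection matrix. This is an illustration under assumptions, not the actual porting script: it assumes the projection weight has shape `(heads * head_dim, in_features)`, that "tiled and scaled" means repeating the weight along the input dimension and dividing by the repeat factor, and the function names `expand_heads` and `tile_inputs` are hypothetical.

```python
import torch


def expand_heads(old_w: torch.Tensor, old_heads: int, new_heads: int,
                 head_dim: int) -> torch.Tensor:
    """Copy the original heads into the first rows; Xavier-initialize the rest."""
    assert old_w.shape[0] == old_heads * head_dim
    new_w = torch.empty(new_heads * head_dim, old_w.shape[1])
    torch.nn.init.xavier_uniform_(new_w)     # fills every row, new heads included
    new_w[: old_heads * head_dim] = old_w    # heads 0..old_heads-1 keep old weights
    return new_w


def tile_inputs(w: torch.Tensor, factor: int = 2) -> torch.Tensor:
    """Repeat along the input dimension and rescale so tiled inputs give the same output."""
    return w.repeat(1, factor) / factor


torch.manual_seed(0)
old = torch.randn(256, 256)                  # 2 heads x 128 dims, hidden size 256
expanded = tile_inputs(expand_heads(old, old_heads=2, new_heads=4, head_dim=128))
```

With this choice of scaling, feeding a duplicated 256-dimensional input through the first two heads of `expanded` reproduces the original projection, which is one way a port could preserve the small model's behavior at initialization.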

## Comparison

| Aspect | TinyFlux | Lailah | Full Flux |
|--------|----------|--------|-----------|
| Parameters | 10.7M | 241.8M | 12B |
| Memory (bf16) | ~22MB | ~484MB | ~24GB |
| Quality | Limited | Moderate | High |
| Speed (A100) | ~10ms | ~40ms | ~200ms |
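
The memory row follows from the parameter counts: bfloat16 stores each parameter in 2 bytes. The short sketch below reproduces that arithmetic. It counts transformer weights only, not activations, text encoders, or the VAE, and `bf16_weight_megabytes` is a hypothetical helper name.

```python
BYTES_PER_BF16_PARAM = 2


def bf16_weight_megabytes(num_params: float) -> float:
    """Weight storage in megabytes (10^6 bytes) at 2 bytes per parameter."""
    return num_params * BYTES_PER_BF16_PARAM / 1e6


param_counts = {"TinyFlux": 10.7e6, "Lailah": 241.8e6, "Full Flux": 12e9}
weight_mb = {name: bf16_weight_megabytes(n) for name, n in param_counts.items()}
```

The results are about 21.4 MB, 483.6 MB, and 24,000 MB, which match the rounded figures in the table.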

## Limitations

- **Resolution**: 512×512 only (64×64 latent)
- **Early training**: Quality improving but not production-ready
- **Text capacity**: Limited by flan-t5-base (768 dim vs Flux's 4096)
- **Experimental**: Research model, expect artifacts

## Intended Use

- Rapid prototyping and iteration
- Studying flow matching at moderate scale
- Architecture experiments
- Educational purposes
- Baseline comparisons

## Name

**Lailah** (לילה) - Angel of the night in Jewish tradition, said to guard souls. Chosen for this model's role as a smaller guardian exploring the same space as larger models.

## Citation

```bibtex
@misc{tinyfluxlailah2026,
  title={TinyFlux-Lailah: Compact Flow Matching for Text-to-Image},
  author={AbstractPhil},
  year={2026},
  url={https://huggingface.co/AbstractPhil/tiny-flux-deep}
}
```

## Related

- [AbstractPhil/tiny-flux](https://huggingface.co/AbstractPhil/tiny-flux) - Base model (10.7M)
- [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents) - Training data
- [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) - Teacher model

## License

MIT License

---

**Status**: Active training. Checkpoints are updated regularly. Use the standard (non-EMA) weights for best results.