---
license: mit
language:
- en
tags:
- diffusion
- flow-matching
- flux
- text-to-image
- image-generation
- tinyflux
- lailah
- experimental
library_name: pytorch
pipeline_tag: text-to-image
base_model:
- AbstractPhil/tiny-flux
- black-forest-labs/FLUX.1-schnell
datasets:
- AbstractPhil/flux-schnell-teacher-latents
- AbstractPhil/imagenet-synthetic
---

# TinyFlux-Deep (Lailah)

**TinyFlux-Lailah** is an expanded TinyFlux architecture with increased depth and width. It was originally ported from [TinyFlux](https://huggingface.co/AbstractPhil/tiny-flux) with strategic layer expansion and attention-head doubling, and is now being trained end-to-end on teacher latents.


## Quick Start (Colab)

The easiest way to test Lailah:

1. Open [Google Colab](https://colab.research.google.com/)
2. Copy the contents of [`inference_v3.py`](./inference_v3.py) and [`model_v3.py`](./model_v3.py) 
3. Run the cells

```python
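# Or fetch both files directly (the model_v3.py URL assumes it sits next to inference_v3.py in the repo):
!wget https://huggingface.co/AbstractPhil/tiny-flux-deep/raw/main/model_v3.py
!wget https://huggingface.co/AbstractPhil/tiny-flux-deep/raw/main/inference_v3.py
%run inference_v3.py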
```

## Fair Weights

### ImageNet Synthetic step_346875
* Handles multiple animal combination variants with high fidelity
* Dataset: [AbstractPhil/imagenet-synthetic](https://huggingface.co/datasets/AbstractPhil/imagenet-synthetic)

"subject, animal, cat, photograph of a tiger, natural habitat"

![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/uJ9Ffh780iLgEIJhmafod.png)

"subject, bird, blue beak, red eyes, green claws"

![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/GRS5tyaFFa0HV2xSJCsin.png)

"subject, bird, red haired bird in a tree"

![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/rGourHokJsPtYNnoFi3Eq.png)


![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/O_z6DLc32HDNBq3ZwEjqf.png)

## Architecture

| Component | TinyFlux | TinyFlux-Lailah | Flux |
|-----------|----------|-----------------|------|
| Hidden size | 256 | **512** | 3072 |
| Attention heads | 2 | **4** | 24 |
| Head dimension | 128 | 128 | 128 |
| Double-stream layers | 3 | **15** | 19 |
| Single-stream layers | 3 | **25** | 38 |
| VAE channels | 16 | 16 | 16 |
| **Total params** | ~10.7M | **~241.8M** | ~12B |

### Text Encoders

| Role | Model | Dimension |
|------|-------|-----------|
| Sequence encoder | flan-t5-base | 768 |
| Pooled encoder | CLIP-L | 768 |

## Training

### Current Approach

All parameters are trainable. The model was initially ported from TinyFlux with frozen anchor layers, but current training runs with everything unfrozen for maximum flexibility.

### Dataset

Training on [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents):
- Pre-computed VAE latents from Flux-Schnell generations
- 512×512 resolution (64×64 latent space)
- Diverse prompts covering people, objects, scenes, styles

### Training Details

- **Objective**: Flow matching (rectified flow)
- **Timestep sampling**: Logit-normal with Flux shift (s=3.0)
- **Loss weighting**: Min-SNR-γ (γ=5.0)
- **Optimizer**: AdamW (lr=3e-4, β=(0.9, 0.99), wd=0.01)
- **Schedule**: Cosine with warmup
- **Precision**: bfloat16
- **Batch size**: 32 (16 × 2 gradient accumulation)
- **EMA decay**: 0.9999
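
The timestep sampling listed above can be sketched in plain Python. This is a minimal sketch, not the training script's code: it assumes a standard-normal logit (mean 0, std 1, which this card does not specify) and that the shift is applied after the sigmoid, reusing the `flux_shift` formula from the Sampling section. The helper name `sample_timestep` is hypothetical, and the Min-SNR weighting is not shown.

```python
import math
import random

def flux_shift(t, s=3.0):
    """Shift a timestep in [0, 1] toward higher t (same formula as in Sampling)."""
    return s * t / (1 + (s - 1) * t)

def sample_timestep(rng, s=3.0, mean=0.0, std=1.0):
    """Draw one training timestep: logit-normal, then Flux shift.

    mean and std are assumed defaults, not values taken from the training script.
    """
    z = rng.gauss(mean, std)        # normal sample in logit space
    t = 1.0 / (1.0 + math.exp(-z))  # sigmoid maps it into (0, 1)
    return flux_shift(t, s)

rng = random.Random(0)
samples = [sample_timestep(rng) for _ in range(10_000)]
mean_t = sum(samples) / len(samples)
print(f"mean shifted timestep: {mean_t:.3f}")
```

With `s=3.0` the unshifted midpoint `t=0.5` maps to `0.75`, so most sampled timesteps land in the upper half of the interval.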

### Checkpoints

Checkpoints are saved roughly once per epoch, with both main and EMA weights:
- `checkpoints/step_XXXXX.safetensors` - Training weights
- `checkpoints/step_XXXXX_ema.safetensors` - EMA weights (currently broken and being retrained; use the standard step checkpoint for inference)

## Usage

### Dependencies

```bash
pip install torch transformers diffusers safetensors huggingface_hub
```

### Basic Inference

```python
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# TinyFluxDeep and TinyFluxDeepConfig are defined in tinyflux_deep.py from this repo;
# this import assumes that file is on your Python path.
from tinyflux_deep import TinyFluxDeep, TinyFluxDeepConfig

config = TinyFluxDeepConfig()
model = TinyFluxDeep(config).to("cuda", torch.bfloat16)

# Load the main training weights. The EMA checkpoint
# (checkpoints/step_286250_ema.safetensors) is currently broken.
weights = load_file(hf_hub_download(
    "AbstractPhil/tiny-flux-deep",
    "checkpoints/step_286250.safetensors"
))
model.load_state_dict(weights, strict=False)
model.eval()
```

### Sampling

Lailah uses Euler discrete sampling with Flux timestep shift:

```python
def flux_shift(t, s=3.0):
    """Bias timesteps toward data (higher t)."""
    return s * t / (1 + (s - 1) * t)

num_steps = 30  # example value; 20-50 steps recommended
timesteps = flux_shift(torch.linspace(0, 1, num_steps + 1))

for i in range(num_steps):
    t_curr, t_next = timesteps[i], timesteps[i + 1]
    dt = t_next - t_curr
    
    v = model(hidden_states=x, encoder_hidden_states=t5_out, ...)
    x = x + v * dt  # Euler step
```
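
To see the schedule and the Euler update in isolation, the loop can be run against a toy velocity field. This sketch is purely illustrative: `toy_velocity` is a hypothetical stand-in for the model that returns the constant velocity of a straight path from `x0` to `x1`, scalars are used in place of latents, and `shifted_timesteps` and `euler_sample` are names introduced here.

```python
def flux_shift(t, s=3.0):
    """Same shift as above: maps [0, 1] onto [0, 1], biased toward higher t."""
    return s * t / (1 + (s - 1) * t)

def shifted_timesteps(num_steps, s=3.0):
    """Return num_steps + 1 shifted timesteps running from 0.0 to 1.0."""
    return [flux_shift(i / num_steps, s) for i in range(num_steps + 1)]

def euler_sample(x0, velocity_fn, num_steps=30, s=3.0):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    ts = shifted_timesteps(num_steps, s)
    x = x0
    for t_curr, t_next in zip(ts[:-1], ts[1:]):
        x = x + velocity_fn(x, t_curr) * (t_next - t_curr)  # Euler step
    return x

x0, x1 = -2.0, 5.0  # toy "noise" and "data" values

def toy_velocity(x, t):
    """Constant velocity of the straight line from x0 to x1 (stand-in for the model)."""
    return x1 - x0

timesteps = shifted_timesteps(30)
result = euler_sample(x0, toy_velocity)
print(f"first dt={timesteps[1] - timesteps[0]:.4f}, "
      f"last dt={timesteps[-1] - timesteps[-2]:.4f}, result={result:.4f}")
```

Because the step sizes sum to 1, a constant velocity carries `x0` to `x1` up to floating-point error. The shifted schedule takes larger steps near `t=0` and finer steps near `t=1`.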

### Configuration

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class TinyFluxDeepConfig:
    hidden_size: int = 512
    num_attention_heads: int = 4
    attention_head_dim: int = 128
    in_channels: int = 16
    joint_attention_dim: int = 768
    pooled_projection_dim: int = 768
    num_double_layers: int = 15
    num_single_layers: int = 25
    mlp_ratio: float = 4.0
    axes_dims_rope: Tuple[int, int, int] = (16, 56, 56)
    guidance_embeds: bool = True
```
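
The defaults above satisfy two relationships that can be checked directly: the hidden size equals heads × head dimension (as in the Architecture table), and the three RoPE axis dimensions sum to the head dimension. The second is the usual convention in Flux-style models and holds for these values; whether `tinyflux_deep.py` enforces either check is an assumption. `check_config` is a hypothetical helper, and the dataclass below is a trimmed copy kept only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TinyFluxDeepConfig:
    # Trimmed copy of the config above: only the fields the checks need.
    hidden_size: int = 512
    num_attention_heads: int = 4
    attention_head_dim: int = 128
    axes_dims_rope: Tuple[int, int, int] = (16, 56, 56)

def check_config(cfg: TinyFluxDeepConfig) -> List[str]:
    """Return a list of human-readable problems (empty if the config is consistent)."""
    problems = []
    expected_hidden = cfg.num_attention_heads * cfg.attention_head_dim
    if cfg.hidden_size != expected_hidden:
        problems.append(
            f"hidden_size {cfg.hidden_size} != heads * head_dim = {expected_hidden}"
        )
    rope_total = sum(cfg.axes_dims_rope)
    if rope_total != cfg.attention_head_dim:
        problems.append(
            f"sum(axes_dims_rope) {rope_total} != attention_head_dim {cfg.attention_head_dim}"
        )
    return problems

problems = check_config(TinyFluxDeepConfig())
bad = check_config(TinyFluxDeepConfig(num_attention_heads=2))
print(problems, bad)
```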

## Files

```
AbstractPhil/tiny-flux-deep/
├── model.safetensors              # Latest best weights
├── tinyflux_deep.py               # Model architecture
├── colab_inference_lailah_early.py # Ready-to-run Colab inference
├── inference_tinyflux_deep.py     # Standalone inference script
├── train_tinyflux_deep.py         # Training script
├── checkpoints/
│   ├── step_286250.safetensors    # Training weights
│   └── step_286250_ema.safetensors # EMA weights (currently broken)
├── samples/                        # Generated samples during training
└── README.md
```

## Origin: Porting from TinyFlux

Lailah was initialized by porting TinyFlux weights:

1. **Attention head expansion** (2 → 4): Original heads copied to positions 0-1, new heads 2-3 Xavier initialized
2. **Hidden dimension expansion** (256 → 512): Weights tiled and scaled
3. **Layer distribution**: Original 3 layers distributed across 15/25 positions as initialization anchors

The initial port used selective freezing of anchor layers, but current training leaves all parameters unfrozen.
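
Step 1 can be illustrated on a single projection matrix. This is a sketch under stated assumptions, not the actual porting code: it assumes heads occupy contiguous row blocks of a `(heads * head_dim, in_features)` weight, uses `xavier_uniform_` (this card says only "Xavier"), and keeps the input width fixed to isolate the head expansion from the hidden-dimension tiling in step 2. `expand_heads` is a hypothetical helper.

```python
import torch

def expand_heads(old_weight, old_heads, new_heads, head_dim):
    """Copy existing heads into the first row blocks; Xavier-initialize the rest."""
    in_features = old_weight.shape[1]
    assert old_weight.shape[0] == old_heads * head_dim
    new_weight = torch.empty(new_heads * head_dim, in_features, dtype=old_weight.dtype)
    torch.nn.init.xavier_uniform_(new_weight)        # fills every row, including new heads
    new_weight[: old_heads * head_dim] = old_weight  # overwrite heads 0..old_heads-1
    return new_weight

torch.manual_seed(0)
old = torch.randn(2 * 128, 256)  # 2 heads x 128 dims, input width 256
new = expand_heads(old, old_heads=2, new_heads=4, head_dim=128)
print(new.shape)
```

The first 256 rows reproduce the original two heads exactly, while rows 256-511 hold freshly initialized weights for heads 2-3.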

## Comparison

| Aspect | TinyFlux | Lailah | Full Flux |
|--------|----------|--------|-----------|
| Parameters | 10.7M | 241.8M | 12B |
| Memory (bf16) | ~22MB | ~484MB | ~24GB |
| Quality | Limited | Moderate | High |
| Speed (A100) | ~10ms | ~40ms | ~200ms |
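
The memory row follows from the parameter counts at 2 bytes per bfloat16 parameter. The quick check below counts weights only (activations, text encoders, and the VAE are not included) and uses decimal megabytes; `weight_memory_mb` is a name introduced here.

```python
BYTES_PER_BF16 = 2

def weight_memory_mb(num_params):
    """Weight storage in decimal megabytes for bfloat16 parameters."""
    return num_params * BYTES_PER_BF16 / 1e6

# Approximate parameter counts from the table above.
params = {
    "TinyFlux": 10_700_000,
    "Lailah": 241_800_000,
    "Full Flux": 12_000_000_000,
}
sizes = {name: weight_memory_mb(n) for name, n in params.items()}
for name, mb in sizes.items():
    print(f"{name}: {mb:,.1f} MB")
```

This gives about 21.4 MB, 483.6 MB, and 24 GB, in line with the approximate figures in the table.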

## Limitations

- **Resolution**: 512×512 only (64×64 latent)
- **Early training**: Quality improving but not production-ready
- **Text capacity**: Limited by flan-t5-base (768 dim vs Flux's 4096)
- **Experimental**: Research model, expect artifacts

## Intended Use

- Rapid prototyping and iteration
- Studying flow matching at moderate scale
- Architecture experiments
- Educational purposes
- Baseline comparisons

## Name

**Lailah** (לילה) - Angel of the night in Jewish tradition, said to guard souls. Chosen for this model's role as a smaller guardian exploring the same space as larger models.

## Citation

```bibtex
@misc{tinyfluxlailah2026,
  title={TinyFlux-Lailah: Compact Flow Matching for Text-to-Image},
  author={AbstractPhil},
  year={2026},
  url={https://huggingface.co/AbstractPhil/tiny-flux-deep}
}
```

## Related

- [AbstractPhil/tiny-flux](https://huggingface.co/AbstractPhil/tiny-flux) - Base model (10.7M)
- [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents) - Training data
- [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) - Teacher model

## License

MIT License

---

**Status**: Active training. Checkpoints are updated regularly. Use the standard (non-EMA) weights for best results.