---
license: mit
language:
- en
tags:
- diffusion
- flow-matching
- flux
- text-to-image
- image-generation
- tinyflux
- lailah
- experimental
library_name: pytorch
pipeline_tag: text-to-image
base_model:
- AbstractPhil/tiny-flux
- black-forest-labs/FLUX.1-schnell
datasets:
- AbstractPhil/flux-schnell-teacher-latents
- AbstractPhil/imagenet-synthetic
---

# TinyFlux-Deep (Lailah)

**TinyFlux-Lailah** is an expanded TinyFlux architecture with increased depth and width. It was originally ported from [TinyFlux](https://huggingface.co/AbstractPhil/tiny-flux) with strategic layer expansion and attention head doubling, and it is now training end-to-end on teacher latents.

## Quick Start (Colab)

The easiest way to test Lailah:

1. Open [Google Colab](https://colab.research.google.com/)
2. Copy the contents of [`inference_v3.py`](./inference_v3.py) and [`model_v3.py`](./model_v3.py)
3. Run the cells

```python
# Or fetch directly:
!wget https://huggingface.co/AbstractPhil/tiny-flux-deep/raw/main/inference_v3.py
%run inference_v3.py
```

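## Fair Weights

### ImageNet Synthetic step_346875
* Handles multiple animal combination variants with high fidelity
https://huggingface.co/datasets/AbstractPhil/imagenet-synthetic

"subject, animal, cat, photograph of a tiger, natural habitat"

![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/L5-ucg7MVMP8-r-gdQv7d.png)

"subject, bird, blue beak, red eyes, green claws"

![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/hNPAJCSx5OdMm1BCYLD-V.png)

"subject, bird, red haired bird in a tree"

![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/tcLOwQoLqw6QacQhZ92if.png)

![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/7SF2o3YXvUbRBXOVSHHbX.png)
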
## Architecture

| Component | TinyFlux | TinyFlux-Lailah | Flux |
|-----------|----------|-----------------|------|
| Hidden size | 256 | **512** | 3072 |
| Attention heads | 2 | **4** | 24 |
| Head dimension | 128 | 128 | 128 |
| Double-stream layers | 3 | **15** | 19 |
| Single-stream layers | 3 | **25** | 38 |
| VAE channels | 16 | 16 | 16 |
| **Total params** | ~10.7M | **~241.8M** | ~12B |

### Text Encoders

| Role | Model | Dimension |
|------|-------|-----------|
| Sequence encoder | flan-t5-base | 768 |
| Pooled encoder | CLIP-L | 768 |

## Training

### Current Approach

All parameters are trainable. The model was initially ported from TinyFlux with frozen anchor layers, but current training runs with everything unfrozen for maximum flexibility.

### Dataset

Training on [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents):
- Pre-computed VAE latents from Flux-Schnell generations
- 512×512 resolution (64×64 latent space)
- Diverse prompts covering people, objects, scenes, styles

### Training Details

- **Objective**: Flow matching (rectified flow)
- **Timestep sampling**: Logit-normal with Flux shift (s=3.0)
- **Loss weighting**: Min-SNR-γ (γ=5.0)
- **Optimizer**: AdamW (lr=3e-4, β=(0.9, 0.99), wd=0.01)
- **Schedule**: Cosine with warmup
- **Precision**: bfloat16
- **Batch size**: 32 (16 × 2 gradient accumulation)
- **EMA decay**: 0.9999
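
The timestep sampling and the rectified-flow objective listed above can be sketched in plain Python. This is a minimal illustration, not the training script. It assumes a standard normal before the sigmoid (the logit-normal mean and scale are not stated here) and the convention that t=0 is noise and t=1 is data, matching the sampling loop in the Usage section. The helper names `sample_timestep` and `flow_matching_pair` are hypothetical, and Min-SNR-γ weighting is left out.

```python
import math
import random

def sample_timestep(rng, s=3.0):
    """Draw a logit-normal timestep in (0, 1), then apply the Flux shift."""
    u = rng.gauss(0.0, 1.0)              # assumption: mean 0, std 1
    t = 1.0 / (1.0 + math.exp(-u))       # sigmoid of a normal sample -> logit-normal
    return s * t / (1 + (s - 1) * t)     # Flux shift with s=3.0 biases t upward

def flow_matching_pair(x_data, noise, t):
    """Linear path from noise (t=0) to data (t=1) and its constant velocity target."""
    x_t = [(1.0 - t) * n + t * d for n, d in zip(noise, x_data)]
    v_target = [d - n for n, d in zip(noise, x_data)]
    return x_t, v_target

rng = random.Random(0)
timesteps = [sample_timestep(rng) for _ in range(1000)]
x_t, v_target = flow_matching_pair([1.0, 2.0], [0.0, 0.0], t=0.5)
```

A model trained this way regresses its output onto `v_target` at the point `x_t`, typically with a mean squared error.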

### Checkpoints

Checkpoints are saved roughly once per epoch, with both main and EMA weights:
- `checkpoints/step_XXXXX.safetensors` - Training weights
- `checkpoints/step_XXXXX_ema.safetensors` - EMA weights (currently broken and being retrained; use the standard step checkpoint for inference)
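
For reference, an exponential moving average with the decay listed under Training Details (0.9999) keeps a shadow copy of each parameter and nudges it toward the live value after every optimizer step. Below is a minimal sketch on plain floats. The `ema_update` helper is hypothetical, and the training script's actual implementation may differ.

```python
def ema_update(ema_params, model_params, decay=0.9999):
    """In-place EMA step: ema = decay * ema + (1 - decay) * current value."""
    for name, value in model_params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params

ema = {"w": 0.0}
ema_update(ema, {"w": 1.0}, decay=0.9)  # larger step than 0.9999, for visibility
```

With a decay of 0.9999 the average spans roughly 1 / (1 - 0.9999) = 10,000 steps, so EMA weights trail the training weights noticeably early in a run.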

## Usage

### Dependencies

```bash
pip install torch transformers diffusers safetensors huggingface_hub
```

### Basic Inference

```python
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Assumes tinyflux_deep.py from this repo is on the import path
from tinyflux_deep import TinyFluxDeep, TinyFluxDeepConfig

config = TinyFluxDeepConfig()
model = TinyFluxDeep(config).to("cuda", torch.bfloat16)

# Load the main training weights. The EMA checkpoint
# (step_286250_ema.safetensors) is currently broken.
weights = load_file(hf_hub_download(
    "AbstractPhil/tiny-flux-deep",
    "checkpoints/step_286250.safetensors",
))
# strict=False tolerates missing or unexpected keys
model.load_state_dict(weights, strict=False)
model.eval()
```

### Sampling

Lailah uses Euler discrete sampling with Flux timestep shift:

```python
import torch

def flux_shift(t, s=3.0):
    """Bias timesteps toward data (higher t)."""
    return s * t / (1 + (s - 1) * t)

num_steps = 20  # 20-50 steps recommended
timesteps = flux_shift(torch.linspace(0, 1, num_steps + 1))

# x starts as noise in latent space; t5_out is the flan-t5-base sequence embedding.
# The remaining model arguments are elided here.
for i in range(num_steps):
    t_curr, t_next = timesteps[i], timesteps[i + 1]
    dt = t_next - t_curr

    v = model(hidden_states=x, encoder_hidden_states=t5_out, ...)
    x = x + v * dt  # Euler step
```
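
To see the shifted schedule and the Euler update in isolation, here is a self-contained sketch in plain Python that swaps the model for a toy velocity function. `euler_sample` and `toy_velocity` are hypothetical names used only for illustration; the real model call takes latents and text embeddings rather than a scalar.

```python
def flux_shift(t, s=3.0):
    """Flux timestep shift: maps [0, 1] onto itself, pushing values toward 1."""
    return s * t / (1 + (s - 1) * t)

def euler_sample(velocity_fn, x0, num_steps=20, s=3.0):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 on the shifted grid."""
    timesteps = [flux_shift(i / num_steps, s) for i in range(num_steps + 1)]
    x = x0
    for t_curr, t_next in zip(timesteps[:-1], timesteps[1:]):
        x = x + velocity_fn(x, t_curr) * (t_next - t_curr)  # Euler step
    return x, timesteps

def toy_velocity(x, t):
    """Stand-in for the model: a constant velocity field."""
    return 3.0

x_final, timesteps = euler_sample(toy_velocity, x0=-1.0, num_steps=20)
```

Because the shift compresses the grid near t=1, the first steps are larger than the last ones. The step sizes still sum to 1, so a constant velocity of 3.0 moves `x0 = -1.0` to approximately 2.0.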

### Configuration

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class TinyFluxDeepConfig:
    hidden_size: int = 512               # num_attention_heads * attention_head_dim
    num_attention_heads: int = 4
    attention_head_dim: int = 128
    in_channels: int = 16                # VAE latent channels
    joint_attention_dim: int = 768       # flan-t5-base sequence dimension
    pooled_projection_dim: int = 768     # CLIP-L pooled dimension
    num_double_layers: int = 15
    num_single_layers: int = 25
    mlp_ratio: float = 4.0
    axes_dims_rope: Tuple[int, int, int] = (16, 56, 56)
    guidance_embeds: bool = True
```
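
When editing these values, two relationships are worth checking: the hidden size equals heads times head dimension (512 = 4 × 128), and the RoPE axis dimensions add up to the head dimension (16 + 56 + 56 = 128), as in Flux-style rotary embeddings. The sketch below uses a hypothetical `ConfigSubset` stand-in and a hypothetical `check_config` helper. Whether `TinyFluxDeep` itself enforces these constraints is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ConfigSubset:
    """Hypothetical stand-in holding only the fields checked below."""
    hidden_size: int = 512
    num_attention_heads: int = 4
    attention_head_dim: int = 128
    axes_dims_rope: Tuple[int, int, int] = (16, 56, 56)

def check_config(cfg: ConfigSubset) -> List[str]:
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    expected_hidden = cfg.num_attention_heads * cfg.attention_head_dim
    if cfg.hidden_size != expected_hidden:
        problems.append(
            f"hidden_size={cfg.hidden_size}, expected heads * head_dim = {expected_hidden}"
        )
    rope_total = sum(cfg.axes_dims_rope)
    if rope_total != cfg.attention_head_dim:
        problems.append(
            f"sum(axes_dims_rope)={rope_total}, expected head_dim = {cfg.attention_head_dim}"
        )
    return problems

default_problems = check_config(ConfigSubset())
mismatch_problems = check_config(ConfigSubset(hidden_size=256))
```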

## Files

```
AbstractPhil/tiny-flux-deep/
├── model.safetensors                # Latest best weights
├── tinyflux_deep.py                 # Model architecture
├── colab_inference_lailah_early.py  # Ready-to-run Colab inference
├── inference_tinyflux_deep.py       # Standalone inference script
├── train_tinyflux_deep.py           # Training script
├── checkpoints/
│   ├── step_286250.safetensors      # Training weights
│   └── step_286250_ema.safetensors  # EMA weights (currently broken)
├── samples/                         # Generated samples during training
└── README.md
```

## Origin: Porting from TinyFlux

Lailah was initialized by porting TinyFlux weights:

1. **Attention head expansion** (2 → 4): Original heads copied to positions 0-1, new heads 2-3 Xavier initialized
2. **Hidden dimension expansion** (256 → 512): Weights tiled and scaled
3. **Layer distribution**: The original 3 double-stream and 3 single-stream layers were distributed across the 15 and 25 new positions as initialization anchors

The initial port used selective freezing of anchor layers, but current training leaves all parameters unfrozen.
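
The head expansion in step 1 can be illustrated on toy per-head weight blocks. This is a sketch under stated assumptions, not the porting script: `expand_heads` and `xavier_uniform` are hypothetical helpers, the Xavier bound is computed per head block, and the hidden-dimension tiling from step 2 is ignored.

```python
import math
import random

def xavier_uniform(rows, cols, rng):
    """Xavier/Glorot uniform init: U(-b, b) with b = sqrt(6 / (fan_in + fan_out))."""
    bound = math.sqrt(6.0 / (rows + cols))
    return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]

def expand_heads(old_heads, new_num_heads, rng):
    """Copy existing per-head weight blocks, then append freshly initialized heads."""
    rows, cols = len(old_heads[0]), len(old_heads[0][0])
    new_heads = [[row[:] for row in head] for head in old_heads]  # positions 0..k-1
    while len(new_heads) < new_num_heads:
        new_heads.append(xavier_uniform(rows, cols, rng))
    return new_heads

rng = random.Random(0)
old_heads = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]  # 2 toy heads, 2x2 each
new_heads = expand_heads(old_heads, new_num_heads=4, rng=rng)
```

Heads 0-1 of `new_heads` are copies of the originals, while heads 2-3 start from small random values and have to be learned during training.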

## Comparison

| Aspect | TinyFlux | Lailah | Full Flux |
|--------|----------|--------|-----------|
| Parameters | 10.7M | 241.8M | 12B |
| Memory (bf16) | ~22MB | ~484MB | ~24GB |
| Quality | Limited | Moderate | High |
| Speed (A100) | ~10ms | ~40ms | ~200ms |
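
The bf16 memory row follows from the parameter counts at 2 bytes per parameter. The quick check below covers weights only (activations, text encoders, and the VAE are not included) and uses a hypothetical `bf16_weight_megabytes` helper.

```python
def bf16_weight_megabytes(num_params):
    """Weight storage in MB (10^6 bytes) at 2 bytes per bfloat16 parameter."""
    return num_params * 2 / 1e6

sizes = {
    name: bf16_weight_megabytes(params)
    for name, params in [("TinyFlux", 10.7e6), ("Lailah", 241.8e6), ("Full Flux", 12e9)]
}
for name, megabytes in sizes.items():
    print(f"{name}: {megabytes:,.1f} MB")
```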

## Limitations

- **Resolution**: 512×512 only (64×64 latent)
- **Early training**: Quality improving but not production-ready
- **Text capacity**: Limited by flan-t5-base (768 dim vs Flux's 4096)
- **Experimental**: Research model, expect artifacts

## Intended Use

- Rapid prototyping and iteration
- Studying flow matching at moderate scale
- Architecture experiments
- Educational purposes
- Baseline comparisons

## Name

**Lailah** (לילה) - Angel of the night in Jewish tradition, said to guard souls. Chosen for this model's role as a smaller guardian exploring the same space as larger models.

## Citation

```bibtex
@misc{tinyfluxlailah2026,
  title={TinyFlux-Lailah: Compact Flow Matching for Text-to-Image},
  author={AbstractPhil},
  year={2026},
  url={https://huggingface.co/AbstractPhil/tiny-flux-deep}
}
```

## Related

- [AbstractPhil/tiny-flux](https://huggingface.co/AbstractPhil/tiny-flux) - Base model (10.7M)
- [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents) - Training data
- [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) - Teacher model

## License

MIT License

---

**Status**: Active training. Checkpoints updated regularly. Use standard weights for best results.