---
license: mit
language:
- en
tags:
- diffusion
- flow-matching
- flux
- text-to-image
- image-generation
- tinyflux
- lailah
- experimental
library_name: pytorch
pipeline_tag: text-to-image
base_model:
- AbstractPhil/tiny-flux
- black-forest-labs/FLUX.1-schnell
datasets:
- AbstractPhil/flux-schnell-teacher-latents
- AbstractPhil/imagenet-synthetic
---

# TinyFlux-Deep (Lailah)

**TinyFlux-Lailah** is an expanded TinyFlux architecture with increased depth and width. It was ported from [TinyFlux](https://huggingface.co/AbstractPhil/tiny-flux) with strategic layer expansion and attention-head doubling, and is now training end-to-end on teacher latents.

## Quick Start (Colab)

The easiest way to test Lailah:

1. Open [Google Colab](https://colab.research.google.com/)
2. Copy the contents of [`inference_v3.py`](./inference_v3.py) and [`model_v3.py`](./model_v3.py)
3. Run the cells

```python
# Or fetch directly:
!wget https://huggingface.co/AbstractPhil/tiny-flux-deep/raw/main/inference_v3.py
%run inference_v3.py
```

## Fair Weights

### ImageNet Synthetic step_346875

* Handles multiple animal combination variants with high fidelity

Dataset: https://huggingface.co/datasets/AbstractPhil/imagenet-synthetic

"subject, animal, cat, photograph of a tiger, natural habitat"

![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/uJ9Ffh780iLgEIJhmafod.png)

"subject, bird, blue beak, red eyes, green claws"

![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/GRS5tyaFFa0HV2xSJCsin.png)

"subject, bird, red haired bird in a tree"

![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/rGourHokJsPtYNnoFi3Eq.png)

![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/O_z6DLc32HDNBq3ZwEjqf.png)

## Architecture

| Component | TinyFlux | TinyFlux-Lailah | Flux |
|-----------|----------|-----------------|------|
| Hidden size | 256 | **512** | 3072 |
| Attention heads | 2 | **4** | 24 |
| Head dimension | 128 | 128 | 128 |
| Double-stream layers | 3 | **15** | 19 |
| Single-stream layers | 3 | **25** | 38 |
| VAE channels | 16 | 16 | 16 |
| **Total params** | ~10.7M | **~241.8M** | ~12B |

### Text Encoders

| Role | Model | Dimension |
|------|-------|-----------|
| Sequence encoder | flan-t5-base | 768 |
| Pooled encoder | CLIP-L | 768 |

## Training

### Current Approach

All parameters are trainable. The model was initially ported from TinyFlux with frozen anchor layers, but current training runs with everything unfrozen for maximum flexibility.

### Dataset

Training on [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents):

- Pre-computed VAE latents from Flux-Schnell generations
- 512×512 resolution (64×64 latent space)
- Diverse prompts covering people, objects, scenes, and styles

### Training Details

- **Objective**: Flow matching (rectified flow)
- **Timestep sampling**: Logit-normal with Flux shift (s=3.0)
- **Loss weighting**: Min-SNR-γ (γ=5.0)
- **Optimizer**: AdamW (lr=3e-4, β=(0.9, 0.99), wd=0.01)
- **Schedule**: Cosine with warmup
- **Precision**: bfloat16
- **Batch size**: 32 (16 × 2 gradient accumulation)
- **EMA decay**: 0.9999

### Checkpoints

Checkpoints are saved roughly every epoch, with both main and EMA weights:

- `checkpoints/step_XXXXX.safetensors` - Training weights
- `checkpoints/step_XXXXX_ema.safetensors` - EMA weights (currently broken and being retrained; use the standard checkpoint for inference)

## Usage

### Dependencies

```bash
pip install torch transformers diffusers safetensors huggingface_hub
```

### Basic Inference

```python
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Load model (requires the TinyFluxDeepConfig and TinyFluxDeep
# classes from tinyflux_deep.py in this repo)
config = TinyFluxDeepConfig()
model = TinyFluxDeep(config).to("cuda", torch.bfloat16)

# Load EMA weights (currently broken) or main weights
weights = load_file(hf_hub_download(
    "AbstractPhil/tiny-flux-deep",
    "checkpoints/step_286250_ema.safetensors"  # EMA will improve later; for now it is broken
))
model.load_state_dict(weights, strict=False)
model.eval()
```

### Sampling

Lailah uses Euler discrete sampling with Flux timestep shift:

```python
def flux_shift(t, s=3.0):
    """Bias timesteps toward data (higher t)."""
    return s * t / (1 + (s - 1) * t)

# 20-50 steps recommended
timesteps = flux_shift(torch.linspace(0, 1, num_steps + 1))

for i in range(num_steps):
    t_curr, t_next = timesteps[i], timesteps[i + 1]
    dt = t_next - t_curr
    v = model(hidden_states=x, encoder_hidden_states=t5_out, ...)
    x = x + v * dt  # Euler step
```

### Configuration

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class TinyFluxDeepConfig:
    hidden_size: int = 512
    num_attention_heads: int = 4
    attention_head_dim: int = 128
    in_channels: int = 16
    joint_attention_dim: int = 768
    pooled_projection_dim: int = 768
    num_double_layers: int = 15
    num_single_layers: int = 25
    mlp_ratio: float = 4.0
    axes_dims_rope: Tuple[int, int, int] = (16, 56, 56)
    guidance_embeds: bool = True
```

## Files

```
AbstractPhil/tiny-flux-deep/
├── model.safetensors                 # Latest best weights
├── tinyflux_deep.py                  # Model architecture
├── colab_inference_lailah_early.py   # Ready-to-run Colab inference
├── inference_tinyflux_deep.py        # Standalone inference script
├── train_tinyflux_deep.py            # Training script
├── checkpoints/
│   ├── step_286250.safetensors       # Training weights
│   └── step_286250_ema.safetensors   # EMA weights (currently broken)
├── samples/                          # Generated samples during training
└── README.md
```

## Origin: Porting from TinyFlux

Lailah was initialized by porting TinyFlux weights:

1. **Attention head expansion** (2 → 4): Original heads copied to positions 0-1, new heads 2-3 Xavier-initialized
2. **Hidden dimension expansion** (256 → 512): Weights tiled and scaled
3. **Layer distribution**: Original 3 layers distributed across the 15/25 positions as initialization anchors

The initial port used selective freezing of the anchor layers, but current training leaves all parameters unfrozen.

## Comparison

| Aspect | TinyFlux | Lailah | Full Flux |
|--------|----------|--------|-----------|
| Parameters | 10.7M | 241.8M | 12B |
| Memory (bf16) | ~22MB | ~484MB | ~24GB |
| Quality | Limited | Moderate | High |
| Speed (A100) | ~10ms | ~40ms | ~200ms |

## Limitations

- **Resolution**: 512×512 only (64×64 latent)
- **Early training**: Quality is improving but not production-ready
- **Text capacity**: Limited by flan-t5-base (768 dim vs Flux's 4096)
- **Experimental**: Research model; expect artifacts

## Intended Use

- Rapid prototyping and iteration
- Studying flow matching at moderate scale
- Architecture experiments
- Educational purposes
- Baseline comparisons

## Name

**Lailah** (לילה) - angel of the night in Jewish tradition, said to guard souls. Chosen for this model's role as a smaller guardian exploring the same space as larger models.

## Citation

```bibtex
@misc{tinyfluxlailah2026,
  title={TinyFlux-Lailah: Compact Flow Matching for Text-to-Image},
  author={AbstractPhil},
  year={2026},
  url={https://huggingface.co/AbstractPhil/tiny-flux-deep}
}
```

## Related

- [AbstractPhil/tiny-flux](https://huggingface.co/AbstractPhil/tiny-flux) - Base model (10.7M)
- [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents) - Training data
- [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) - Teacher model

## License

MIT License

---

**Status**: Active training. Checkpoints updated regularly. Use the standard (non-EMA) weights for best results.
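As a rough sanity check on the ~241.8M total quoted in the tables, the bulk of the parameter count can be reproduced from the configuration alone. The block composition below is an assumption (standard Flux-style layout: adaLN modulation, fused QKV, 4× MLP); the exact layers in `tinyflux_deep.py` may differ, so treat this as a back-of-the-envelope estimate rather than a definitive count:

```python
# Rough parameter estimate for the transformer core (weights only;
# biases, norms, and embedders such as the T5/CLIP projections,
# time/guidance MLPs, and final layer are omitted).
# Block layout is assumed Flux-style, not read from tinyflux_deep.py.
d = 512              # hidden_size (= 4 heads x 128 head_dim)
mlp = int(4.0 * d)   # mlp_ratio * hidden_size

# Double-stream block: image and text streams each carry
# modulation (d -> 6d), QKV (d -> 3d), out-proj (d -> d),
# and a two-layer MLP (d -> 4d -> d).
per_stream = d * 6 * d + d * 3 * d + d * d + 2 * d * mlp
double_block = 2 * per_stream

# Single-stream block: modulation (d -> 3d), fused QKV + MLP-in
# (d -> 3d + 4d), fused out-proj (d + 4d -> d).
single_block = d * 3 * d + d * (3 * d + mlp) + (d + mlp) * d

total = 15 * double_block + 25 * single_block
print(f"~{total / 1e6:.1f}M core parameters")  # ~239.9M
```

The remaining ~2M (embedders, norms, biases) closes the gap to the reported ~241.8M, which also lines up with the ~484MB bf16 footprint in the comparison table.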