@@ -8,7 +8,8 @@ tags:
 - flux
 - text-to-image
 - image-generation
-- deep
 - experimental
 library_name: pytorch
 pipeline_tag: text-to-image
@@ -19,81 +20,56 @@ datasets:
 - AbstractPhil/flux-schnell-teacher-latents
 ---
-# TinyFlux-Deep
-An **expanded** TinyFlux architecture that increases depth and width while preserving learned representations. TinyFlux-Deep is ported from [TinyFlux](https://huggingface.co/AbstractPhil/tiny-flux) with strategic layer expansion and attention head doubling.
-## Model Description
-TinyFlux-Deep extends the base TinyFlux model by:
-- **Doubling attention heads** (2 → 4) with expanded hidden dimension (256 → 512)
-- **5× more double-stream layers** (3 → 15)
-- **8× more single-stream layers** (3 → 25)
-- **Preserving learned weights** from TinyFlux in frozen anchor positions
-### Architecture Comparison
-| Component | TinyFlux | TinyFlux-Deep | Flux |
-|-----------|----------|---------------|------|
 | Hidden size | 256 | **512** | 3072 |
 | Attention heads | 2 | **4** | 24 |
 | Head dimension | 128 | 128 | 128 |
 | Double-stream layers | 3 | **15** | 19 |
 | Single-stream layers | 3 | **25** | 38 |
 | VAE channels | 16 | 16 | 16 |
-| **Total params** | ~8M | **~85M** | ~12B |
-### Layer Mapping (Ported from TinyFlux)
-The original TinyFlux weights are strategically distributed and frozen:
-**Single blocks (3 → 25):**
-| TinyFlux Layer | TinyFlux-Deep Position | Status |
-|----------------|------------------------|--------|
-| 0 | 0 | Frozen |
-| 1 | 8, 12, 16 | Frozen (3 copies) |
-| 2 | 24 | Frozen |
-| — | 1-7, 9-11, 13-15, 17-23 | Trainable |
-**Double blocks (3 → 15):**
-| TinyFlux Layer | TinyFlux-Deep Position | Status |
-|----------------|------------------------|--------|
-| 0 | 0 | Frozen |
-| 1 | 4, 7, 10 | Frozen (3 copies) |
-| 2 | 14 | Frozen |
-| — | 1-3, 5-6, 8-9, 11-13 | Trainable |
-**Trainable ratio:** ~70% of parameters
-### Attention Head Expansion
-Original 2 heads are copied to new positions, with 2 new heads randomly initialized:
-- Old head 0 → New head 0
-- Old head 1 → New head 1
-- Heads 2-3 → Xavier initialized (scaled 0.02×)
 ### Text Encoders
-Same as TinyFlux:
-| Role | Model |
-|------|-------|
-| Sequence encoder | flan-t5-base (768 dim) |
-| Pooled encoder | CLIP-L (768 dim) |
 ## Training
-### Strategy
-1. **Port** TinyFlux weights with dimension expansion
-2. **Freeze** ported layers as "anchor" knowledge
-3. **Train** new layers to interpolate between anchors
-4. **Optional:** Unfreeze all and fine-tune at lower LR
 ### Dataset
-Trained on [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents):
-- 10,000 samples
-- Pre-computed VAE latents (16, 64, 64) from 512×512 images
 - Diverse prompts covering people, objects, scenes, styles
 ### Training Details
@@ -101,80 +77,64 @@ Trained on [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/da
 - **Objective**: Flow matching (rectified flow)
 - **Timestep sampling**: Logit-normal with Flux shift (s=3.0)
 - **Loss weighting**: Min-SNR-γ (γ=5.0)
-- **Optimizer**: AdamW (lr=5e-5, β=(0.9, 0.99), wd=0.01)
 - **Schedule**: Cosine with warmup
 - **Precision**: bfloat16
 - **Batch size**: 32 (16 × 2 gradient accumulation)
 ## Usage
-### Installation
 ```bash
 pip install torch transformers diffusers safetensors huggingface_hub
 ```
-### Inference
 ```python
 import torch
 from huggingface_hub import hf_hub_download
 from safetensors.torch import load_file
-from transformers import T5EncoderModel, T5Tokenizer, CLIPTextModel, CLIPTokenizer
-from diffusers import AutoencoderKL
-# Load model (copy TinyFlux class definition first, use TinyFluxDeepConfig)
 config = TinyFluxDeepConfig()
-model = TinyFlux(config).to("cuda").to(torch.bfloat16)
-weights = load_file(hf_hub_download("AbstractPhil/tiny-flux-deep", "model.safetensors"))
-model.load_state_dict(weights, strict=False)  # strict=False for precomputed buffers
 model.eval()
-# Load encoders
-t5_tok = T5Tokenizer.from_pretrained("google/flan-t5-base")
-t5_enc = T5EncoderModel.from_pretrained("google/flan-t5-base", torch_dtype=torch.bfloat16).to("cuda")
-clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
-clip_enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=torch.bfloat16).to("cuda")
-vae = AutoencoderKL.from_pretrained("black-forest-labs/FLUX.1-schnell", subfolder="vae", torch_dtype=torch.bfloat16).to("cuda")
-# Encode prompt
-prompt = "a photo of a cat sitting on a windowsill"
-t5_in = t5_tok(prompt, max_length=128, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
-t5_out = t5_enc(**t5_in).last_hidden_state
-clip_in = clip_tok(prompt, max_length=77, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
-clip_out = clip_enc(**clip_in).pooler_output
-# Euler sampling with Flux shift
 def flux_shift(t, s=3.0):
     return s * t / (1 + (s - 1) * t)
-x = torch.randn(1, 64*64, 16, device="cuda", dtype=torch.bfloat16)
-img_ids = TinyFlux.create_img_ids(1, 64, 64, "cuda")
-t_linear = torch.linspace(0, 1, 21, device="cuda")
-timesteps = flux_shift(t_linear)
-for i in range(20):
-    t = timesteps[i].unsqueeze(0)
-    dt = timesteps[i+1] - timesteps[i]
-    guidance = torch.tensor([3.5], device="cuda", dtype=torch.bfloat16)
-    v = model(
-        hidden_states=x,
-        encoder_hidden_states=t5_out,
-        pooled_projections=clip_out,
-        timestep=t,
-        img_ids=img_ids,
-        guidance=guidance,
-    )
-    x = x + v * dt
-# Decode
-latents = x.reshape(1, 64, 64, 16).permute(0, 3, 1, 2)
-latents = latents / vae.config.scaling_factor
-image = vae.decode(latents.float()).sample
-image = (image / 2 + 0.5).clamp(0, 1)
 ```
 ### Configuration
@@ -199,84 +159,77 @@ class TinyFluxDeepConfig:
 ```
 AbstractPhil/tiny-flux-deep/
-├── model.safetensors      # Model weights (~340MB)
-├── config.json            # Model configuration
-├── frozen_params.json     # List of frozen parameter names
-├── README.md              # This file
-├── model.py               # Model architecture (includes TinyFluxDeepConfig)
-├── inference_colab.py     # Inference script
-├── train_deep_colab.py    # Training script with layer freezing
-├── port_to_deep.py        # Porting script from TinyFlux
-├── checkpoints/           # Training checkpoints
-│   └── step_*.safetensors
-├── logs/                  # Tensorboard logs
-└── samples/               # Generated samples during training
 ```
-## Porting from TinyFlux
-To create a new TinyFlux-Deep from scratch:
-```python
-# Run port_to_deep.py
-# 1. Downloads AbstractPhil/tiny-flux weights
-# 2. Creates TinyFlux-Deep model (512 hidden, 4 heads, 25 single, 15 double)
-# 3. Expands attention heads (2→4) and hidden dimension (256→512)
-# 4. Distributes layers to anchor positions
-# 5. Saves to AbstractPhil/tiny-flux-deep
-```
-## Comparison with TinyFlux
-| Aspect | TinyFlux | TinyFlux-Deep |
-|--------|----------|---------------|
-| Parameters | ~8M | ~85M |
-| Memory (bf16) | ~16MB | ~170MB |
-| Forward pass | ~15ms | ~60ms |
-| Capacity | Limited | Moderate |
-| Training | From scratch | Ported + fine-tuned |
 ## Limitations
-- **Resolution**: Trained on 512×512 only
-- **Quality**: Better than TinyFlux, still below full Flux
-- **Text understanding**: Limited by smaller T5 encoder (768 vs 4096 dim)
-- **Early training**: Model is actively being trained
-- **Experimental**: Intended for research, not production
 ## Intended Use
-- Studying model scaling and expansion techniques
-- Testing layer freezing and knowledge transfer
-- Rapid prototyping with moderate capacity
 - Educational purposes
-- Baseline for architecture experiments
 ## Citation
 ```bibtex
-@misc{tinyfluxdeep2026,
-  title={TinyFlux-Deep: Expanded Flux Architecture with Knowledge Preservation},
   author={AbstractPhil},
   year={2026},
   url={https://huggingface.co/AbstractPhil/tiny-flux-deep}
 }
 ```
-## Related Models
-- [AbstractPhil/tiny-flux](https://huggingface.co/AbstractPhil/tiny-flux) - Base model (8M params)
-- [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) - Original Flux
-## Acknowledgments
-- [Black Forest Labs](https://blackforestlabs.ai/) for the original Flux architecture
-- [Hugging Face](https://huggingface.co/) for diffusers and transformers libraries
 ## License
-MIT License - See LICENSE file for details.
 ---
-**Note**: This is an experimental research model under active development. Training is ongoing and weights may be updated frequently.

 - flux
 - text-to-image
 - image-generation
+- tinyflux
+- lailah
 - experimental
 library_name: pytorch
 pipeline_tag: text-to-image
 - AbstractPhil/flux-schnell-teacher-latents
 ---
+# TinyFlux-Deep (Lailah)
+**TinyFlux-Lailah** is an expanded TinyFlux architecture with greater depth and width. It was originally ported from [TinyFlux](https://huggingface.co/AbstractPhil/tiny-flux) with strategic layer expansion and attention-head doubling, and is now being trained end-to-end on teacher latents.
+> **Current checkpoint:** `step_286250` | **Status:** Active training
+## Quick Start (Colab)
+The easiest way to test Lailah:
+1. Open [Google Colab](https://colab.research.google.com/)
+2. Copy the contents of [`colab_inference_lailah_early.py`](./colab_inference_lailah_early.py)
+3. Run the cells
+```python
+# Or fetch directly:
+!wget https://huggingface.co/AbstractPhil/tiny-flux-deep/raw/main/colab_inference_lailah_early.py
+%run colab_inference_lailah_early.py
+```
+## Architecture
+| Component | TinyFlux | TinyFlux-Lailah | Flux |
+|-----------|----------|-----------------|------|
 | Hidden size | 256 | **512** | 3072 |
 | Attention heads | 2 | **4** | 24 |
 | Head dimension | 128 | 128 | 128 |
 | Double-stream layers | 3 | **15** | 19 |
 | Single-stream layers | 3 | **25** | 38 |
 | VAE channels | 16 | 16 | 16 |
+| **Total params** | ~10.7M | **~241.8M** | ~12B |
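As a rough cross-check on the ~241.8M figure, the sketch below counts only the weight matrices inside the transformer blocks. It assumes the standard Flux block layout (fused QKV, MLP ratio 4, six adaLN modulation vectors per stream in double-stream blocks and three in single-stream blocks) and ignores biases, norms, embedders and the final layer. None of these layout details are stated in this card, so treat the result as an estimate.

```python
def block_weight_params(hidden: int, n_double: int, n_single: int, mlp_ratio: int = 4) -> int:
    """Approximate weight-matrix parameters in Flux-style transformer blocks.

    Assumed layout (not confirmed by this card):
      double-stream block, per stream: QKV (3d^2) + out proj (d^2)
        + MLP (2 * mlp_ratio * d^2) + adaLN modulation (6d^2)
      single-stream block: fused QKV + MLP-in ((3 + mlp_ratio) d^2)
        + fused out ((1 + mlp_ratio) d^2) + modulation (3d^2)
    Biases, norms, embedders and the final layer are ignored.
    """
    d2 = hidden * hidden
    per_stream = (3 + 1 + 2 * mlp_ratio + 6) * d2
    double = 2 * per_stream  # image stream + text stream
    single = ((3 + mlp_ratio) + (1 + mlp_ratio) + 3) * d2
    return n_double * double + n_single * single


tiny = block_weight_params(hidden=256, n_double=3, n_single=3)
lailah = block_weight_params(hidden=512, n_double=15, n_single=25)
print(f"TinyFlux blocks: ~{tiny / 1e6:.1f}M")    # ~10.0M
print(f"Lailah blocks:   ~{lailah / 1e6:.1f}M")  # ~239.9M
```

Doubling the width quadruples the cost of every block (it scales with d²), and the block count grows from 6 to 40, which is why the total grows by more than 20×. Under these assumptions, the remaining ~2M in the table would come from the embedders and the output layer.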
 ### Text Encoders
+| Role | Model | Dimension |
+|------|-------|-----------|
+| Sequence encoder | flan-t5-base | 768 |
+| Pooled encoder | CLIP-L | 768 |
 ## Training
+### Current Approach
+All parameters are trainable. The model was initially ported from TinyFlux with frozen anchor layers, but current training runs with everything unfrozen for maximum flexibility.
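A minimal sketch of unfreezing everything and confirming the trainable fraction, using a hypothetical two-layer toy model in which the first layer plays the role of a frozen anchor from the initial port. This is illustrative and not taken from `train_tinyflux_deep.py`.

```python
import torch.nn as nn


def unfreeze_all(model: nn.Module) -> float:
    """Mark every parameter trainable and return the trainable fraction (should be 1.0)."""
    for p in model.parameters():
        p.requires_grad_(True)
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return trainable / total


# Toy stand-in: layer 0 acts as a frozen anchor left over from the initial port.
toy = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4))
toy[0].requires_grad_(False)
before = sum(p.numel() for p in toy.parameters() if p.requires_grad) / sum(
    p.numel() for p in toy.parameters()
)
after = unfreeze_all(toy)
print(before, after)  # 0.5 1.0
```

If you resume a run that started with frozen anchors, an optimizer that was built only over the previously trainable parameters would need to be rebuilt so it also covers the newly unfrozen ones.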
 ### Dataset
+Training on [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents):
+- Pre-computed VAE latents from Flux-Schnell generations
+- 512×512 resolution (64×64 latent space)
 - Diverse prompts covering people, objects, scenes, styles
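At 512×512 the VAE produces a 16-channel 64×64 latent. The inference example removed in this diff treated that grid as 4,096 tokens of 16 channels (one token per latent pixel, with no 2×2 patching) and reversed the packing with `reshape`/`permute` before decoding. The helpers below sketch that round trip. The function names are hypothetical.

```python
import torch


def pack_latents(latents: torch.Tensor) -> torch.Tensor:
    """(B, C, H, W) VAE latents -> (B, H*W, C) token sequence, one token per latent pixel."""
    b, c, h, w = latents.shape
    return latents.permute(0, 2, 3, 1).reshape(b, h * w, c)


def unpack_latents(tokens: torch.Tensor, h: int = 64, w: int = 64) -> torch.Tensor:
    """(B, H*W, C) tokens -> (B, C, H, W), mirroring the reshape/permute used before decoding."""
    b, n, c = tokens.shape
    assert n == h * w, "token count must match the latent grid"
    return tokens.reshape(b, h, w, c).permute(0, 3, 1, 2)


latents = torch.randn(2, 16, 64, 64)  # 512x512 images -> 64x64 latents, 16 channels
tokens = pack_latents(latents)
print(tokens.shape)                    # torch.Size([2, 4096, 16])
restored = unpack_latents(tokens)
print(torch.equal(restored, latents))  # True
```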
 ### Training Details
 - **Objective**: Flow matching (rectified flow)
 - **Timestep sampling**: Logit-normal with Flux shift (s=3.0)
 - **Loss weighting**: Min-SNR-γ (γ=5.0)
+- **Optimizer**: AdamW (lr=3e-4, β=(0.9, 0.99), wd=0.01)
 - **Schedule**: Cosine with warmup
 - **Precision**: bfloat16
 - **Batch size**: 32 (16 × 2 gradient accumulation)
+- **EMA decay**: 0.9999
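The objective, timestep sampling and loss weighting listed above can be combined as in the following sketch. It uses the same convention as this card's sampling code (pure noise at t=0, data at t=1, so the target velocity is data minus noise). It draws t from a logit-normal distribution, applies the same `flux_shift`, and weights each sample's MSE by min(SNR, γ)/SNR with SNR = (t / (1 − t))². That SNR definition and this particular Min-SNR variant are assumptions, and the actual training script may differ. `model_fn` is a hypothetical stand-in for the model call.

```python
import torch
import torch.nn.functional as F


def flux_shift(t: torch.Tensor, s: float = 3.0) -> torch.Tensor:
    """Same shift as the sampling code: pushes t toward 1 (data)."""
    return s * t / (1 + (s - 1) * t)


def flow_matching_loss(model_fn, x_data: torch.Tensor, gamma: float = 5.0, shift: float = 3.0) -> torch.Tensor:
    """Rectified-flow loss with logit-normal + shifted timesteps and Min-SNR weighting (sketch).

    x_data: (B, N, C) packed latents. model_fn(x_t, t) returns a velocity of the same shape.
    """
    b = x_data.shape[0]
    noise = torch.randn_like(x_data)
    # Logit-normal sampling: sigmoid of a standard normal, then the Flux shift.
    t = flux_shift(torch.sigmoid(torch.randn(b, device=x_data.device)), shift)
    t = t.clamp(1e-5, 1 - 1e-5)
    t_ = t.view(b, 1, 1)
    # Straight path from noise (t=0) to data (t=1); its velocity is constant.
    x_t = t_ * x_data + (1 - t_) * noise
    target = x_data - noise
    pred = model_fn(x_t, t)
    per_sample = F.mse_loss(pred, target, reduction="none").mean(dim=(1, 2))
    # Assumed Min-SNR form: min(SNR, gamma) / SNR == min(1, gamma / SNR).
    snr = (t / (1 - t)) ** 2
    weight = (gamma / snr).clamp(max=1.0)
    return (weight * per_sample).mean()


# Toy check with a dummy "model" that always predicts zero velocity.
x = torch.randn(4, 4096, 16)
loss = flow_matching_loss(lambda x_t, t: torch.zeros_like(x_t), x)
print(loss.item() > 0)  # True
```

With γ=5, samples close to the data end (high SNR) are down-weighted, while noisier samples keep weight 1.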
+### Checkpoints
+Checkpoints are saved every 625 steps with both main and EMA weights:
+- `checkpoints/step_XXXXX.safetensors` - Training weights
+- `checkpoints/step_XXXXX_ema.safetensors` - EMA weights (recommended for inference)
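EMA weights are a running average of the training weights. The sketch below shows the per-step update with decay 0.9999, together with the every-625-steps save condition and the filename pattern listed above. `ema_update` and `checkpoint_names` are hypothetical helpers and are not functions from `train_tinyflux_deep.py`.

```python
import copy

import torch
import torch.nn as nn


@torch.no_grad()
def ema_update(ema_model: nn.Module, model: nn.Module, decay: float = 0.9999) -> None:
    """ema = decay * ema + (1 - decay) * current, applied parameter by parameter."""
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p.detach(), alpha=1 - decay)


def checkpoint_names(step: int, every: int = 625):
    """Return (main, ema) filenames if this step is a save step, else None."""
    if step % every != 0:
        return None
    return (f"checkpoints/step_{step}.safetensors",
            f"checkpoints/step_{step}_ema.safetensors")


model = nn.Linear(4, 4)
ema_model = copy.deepcopy(model)
with torch.no_grad():
    model.weight.add_(1.0)  # pretend an optimizer step moved the weights
ema_update(ema_model, model)
print(checkpoint_names(286250))  # 286250 = 458 * 625, so this is a save step
```

With decay 0.9999, the EMA averages over roughly 1 / (1 − 0.9999) = 10,000 recent steps. This smoothing is why the EMA files are the ones recommended for inference.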
 ## Usage
+### Dependencies
 ```bash
 pip install torch transformers diffusers safetensors huggingface_hub
 ```
+### Basic Inference
 ```python
 import torch
 from huggingface_hub import hf_hub_download
 from safetensors.torch import load_file
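+# Load model (requires TinyFluxDeep and TinyFluxDeepConfig from tinyflux_deep.py in this repo)
+from tinyflux_deep import TinyFluxDeep, TinyFluxDeepConfig  # assumes both classes are defined there
 config = TinyFluxDeepConfig()
+model = TinyFluxDeep(config).to("cuda", torch.bfloat16)
+# Load EMA weights (recommended) or main weights
+weights = load_file(hf_hub_download(
+    "AbstractPhil/tiny-flux-deep",
+    "checkpoints/step_286250_ema.safetensors"  # Use _ema for best quality
+))
+# strict=False skips precomputed buffers that are not stored in the checkpoint
+model.load_state_dict(weights, strict=False)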
 model.eval()
+```
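`strict=False` can also hide genuine mismatches, so it may help to inspect what `load_state_dict` skipped. PyTorch returns the skipped names as `missing_keys` and `unexpected_keys`. The sketch below checks that only buffers are missing, never learnable parameters. An earlier version of this card cited precomputed buffers as the reason for `strict=False`. `Toy` and its `rope_cache` buffer are hypothetical stand-ins for `TinyFluxDeep`.

```python
import torch
import torch.nn as nn


def check_partial_load(model: nn.Module, state_dict: dict):
    """Load non-strictly, then fail loudly if any learnable parameter was skipped."""
    result = model.load_state_dict(state_dict, strict=False)
    param_names = {name for name, _ in model.named_parameters()}
    skipped_params = sorted(set(result.missing_keys) & param_names)
    if skipped_params:
        raise RuntimeError(f"checkpoint is missing parameters: {skipped_params}")
    return result.missing_keys, result.unexpected_keys


class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)
        self.register_buffer("rope_cache", torch.zeros(8))  # hypothetical precomputed buffer


toy = Toy()
weights = {k: v for k, v in toy.state_dict().items() if k != "rope_cache"}
missing, unexpected = check_partial_load(toy, weights)
print(missing, unexpected)  # ['rope_cache'] []
```

In the inference snippet, `check_partial_load(model, weights)` could replace the plain `load_state_dict` call.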
+### Sampling
+Lailah uses Euler discrete sampling with Flux timestep shift:
+```python
 def flux_shift(t, s=3.0):
+    """Bias timesteps toward data (higher t)."""
     return s * t / (1 + (s - 1) * t)
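+num_steps = 20  # 20-50 steps recommended
+# Start from Gaussian noise at t=0: (batch, 64*64 latent tokens, 16 channels)
+x = torch.randn(1, 64 * 64, 16, device="cuda", dtype=torch.bfloat16)
+timesteps = flux_shift(torch.linspace(0, 1, num_steps + 1, device="cuda"))
+for i in range(num_steps):
+    t_curr, t_next = timesteps[i], timesteps[i + 1]
+    dt = t_next - t_curr
+    v = model(hidden_states=x, encoder_hidden_states=t5_out, ...)
+    x = x + v * dt  # Euler step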
 ```
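Because `flux_shift` maps 0 to 0 and 1 to 1, the Euler step sizes always sum to 1. The shift therefore changes where steps are spent, not the total integration interval. The toy check below replaces the model with a hypothetical constant velocity field, the exact rectified-flow velocity for a single noise/data pair. Under that field, Euler integration lands on the data point for any step count.

```python
import torch


def flux_shift(t: torch.Tensor, s: float = 3.0) -> torch.Tensor:
    return s * t / (1 + (s - 1) * t)


def euler_sample(velocity_fn, x: torch.Tensor, num_steps: int = 20, shift: float = 3.0) -> torch.Tensor:
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data) on a shifted grid."""
    timesteps = flux_shift(torch.linspace(0, 1, num_steps + 1), shift)
    for i in range(num_steps):
        t_curr, t_next = timesteps[i], timesteps[i + 1]
        x = x + velocity_fn(x, t_curr) * (t_next - t_curr)
    return x


noise = torch.randn(1, 4096, 16)
data = torch.randn(1, 4096, 16)
# Hypothetical stand-in for the model: the straight-line velocity from noise to data.
out = euler_sample(lambda x, t: data - noise, noise, num_steps=20)
print(torch.allclose(out, data, atol=1e-4))  # True
```

A trained model's velocity varies along the path, so in practice more steps reduce the Euler discretization error. The shift controls which part of the trajectory gets the finer steps.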
 ### Configuration
 ```
 AbstractPhil/tiny-flux-deep/
+├── model.safetensors              # Latest best weights
+├── tinyflux_deep.py               # Model architecture
+├── colab_inference_lailah_early.py # Ready-to-run Colab inference
+├── inference_tinyflux_deep.py     # Standalone inference script
+├── train_tinyflux_deep.py         # Training script
+├── checkpoints/
+│   ├── step_286250.safetensors    # Training weights
+│   └── step_286250_ema.safetensors # EMA weights (use this)
+├── samples/                        # Generated samples during training
+└── README.md
 ```
+## Origin: Porting from TinyFlux
+Lailah was initialized by porting TinyFlux weights:
+1. **Attention head expansion** (2 → 4): Original heads copied to positions 0-1, new heads 2-3 Xavier initialized
+2. **Hidden dimension expansion** (256 → 512): Weights tiled and scaled
+3. **Layer distribution**: The 3 original double-stream and 3 single-stream layers were distributed across the 15 double-stream and 25 single-stream positions as initialization anchors
+The initial port used selective freezing of anchor layers, but current training leaves all parameters unfrozen.
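The card does not spell out how the porting script tiles and scales weights. One function-preserving reading is sketched below for a single square projection (for example a query projection going from 256 to 512). The old weight is tiled across the doubled input and halved, so an input made of two copies of the old hidden state reproduces old heads 0-1 exactly. The output rows for new heads 2-3 get Xavier-initialized values scaled by 0.02, the factor stated in the earlier version of this card. `expand_projection` is hypothetical and is not taken from the actual porting script. Fused QKV weights would need the same treatment per Q/K/V block.

```python
import torch
import torch.nn as nn


def expand_projection(old: nn.Linear, new_dim: int = 512, new_head_scale: float = 0.02) -> nn.Linear:
    """Hypothetical width/head expansion of a square projection (old_dim -> new_dim).

    Rows [0, old_dim) hold the old heads (0-1 with head_dim 128); rows [old_dim, new_dim)
    are new heads. Columns hold the old weight tiled and scaled, so a tiled input [h, h]
    reproduces the old output for the copied heads.
    """
    old_dim = old.in_features
    reps = new_dim // old_dim
    new = nn.Linear(new_dim, new_dim, bias=old.bias is not None)
    with torch.no_grad():
        nn.init.xavier_uniform_(new.weight)
        new.weight.mul_(new_head_scale)                           # new heads: small Xavier init
        new.weight[:old_dim] = old.weight.repeat(1, reps) / reps  # old heads: tiled + scaled
        if new.bias is not None:
            new.bias.zero_()
            new.bias[:old_dim] = old.bias
    return new


old = nn.Linear(256, 256)
new = expand_projection(old)
h = torch.randn(3, 256)
h_tiled = torch.cat([h, h], dim=-1)  # assumed embedding of the old hidden state into 512 dims
with torch.no_grad():
    same = torch.allclose(new(h_tiled)[:, :256], old(h), atol=1e-5)
print(same)  # True
```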
+## Comparison
+| Aspect | TinyFlux | Lailah | Full Flux |
+|--------|----------|--------|-----------|
+| Parameters | 10.7M | 241.8M | 12B |
+| Weight memory (bf16) | ~22MB | ~484MB | ~24GB |
+| Quality | Limited | Moderate | High |
+| Speed (A100) | ~10ms | ~40ms | ~200ms |
 ## Limitations
+- **Resolution**: 512×512 only (64×64 latent)
+- **Early training**: Quality improving but not production-ready
+- **Text capacity**: Limited by flan-t5-base (768 dim vs Flux's 4096)
+- **Experimental**: Research model, expect artifacts
 ## Intended Use
+- Rapid prototyping and iteration
+- Studying flow matching at moderate scale
+- Architecture experiments
 - Educational purposes
+- Baseline comparisons
+## Name
+**Lailah** (לילה, Hebrew for "night") - In Jewish tradition, the angel of the night, said to guard souls. The name was chosen for this model's role as a smaller guardian exploring the same space as larger models.
 ## Citation
 ```bibtex
+@misc{tinyfluxlailah2026,
+  title={TinyFlux-Lailah: Compact Flow Matching for Text-to-Image},
   author={AbstractPhil},
   year={2026},
   url={https://huggingface.co/AbstractPhil/tiny-flux-deep}
 }
 ```
+## Related
+- [AbstractPhil/tiny-flux](https://huggingface.co/AbstractPhil/tiny-flux) - Base model (10.7M)
+- [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents) - Training data
+- [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) - Teacher model
 ## License
+MIT License
 ---
+**Status**: Active training. Checkpoints updated regularly. Use EMA weights for best results.