---
license: mit
language:
- en
tags:
- diffusion
- flow-matching
- flux
- text-to-image
- image-generation
- deep
- experimental
library_name: pytorch
pipeline_tag: text-to-image
base_model:
- AbstractPhil/tiny-flux
- black-forest-labs/FLUX.1-schnell
datasets:
- AbstractPhil/flux-schnell-teacher-latents
---

# TinyFlux-Deep

An **expanded** TinyFlux architecture that increases depth and width while preserving learned representations. TinyFlux-Deep is ported from [TinyFlux](https://huggingface.co/AbstractPhil/tiny-flux) with strategic layer expansion and attention head doubling.

## Model Description

TinyFlux-Deep extends the base TinyFlux model by:

- **Doubling attention heads** (2 → 4) with an expanded hidden dimension (256 → 512)
- **5× more double-stream layers** (3 → 15)
- **~8× more single-stream layers** (3 → 25)
- **Preserving learned weights** from TinyFlux in frozen anchor positions

### Architecture Comparison

| Component | TinyFlux | TinyFlux-Deep | Flux |
|-----------|----------|---------------|------|
| Hidden size | 256 | **512** | 3072 |
| Attention heads | 2 | **4** | 24 |
| Head dimension | 128 | 128 | 128 |
| Double-stream layers | 3 | **15** | 19 |
| Single-stream layers | 3 | **25** | 38 |
| VAE channels | 16 | 16 | 16 |
| **Total params** | ~8M | **~85M** | ~12B |

### Layer Mapping (Ported from TinyFlux)

The original TinyFlux weights are strategically distributed and frozen:

**Single blocks (3 → 25):**

| TinyFlux Layer | TinyFlux-Deep Position | Status |
|----------------|------------------------|--------|
| 0 | 0 | Frozen |
| 1 | 8, 12, 16 | Frozen (3 copies) |
| 2 | 24 | Frozen |
| — | 1-7, 9-11, 13-15, 17-23 | Trainable |

**Double blocks (3 → 15):**

| TinyFlux Layer | TinyFlux-Deep Position | Status |
|----------------|------------------------|--------|
| 0 | 0 | Frozen |
| 1 | 4, 7, 10 | Frozen (3 copies) |
| 2 | 14 | Frozen |
| — | 1-3, 5-6, 8-9, 11-13 | Trainable |

**Trainable ratio:** ~70% of parameters
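The anchor placement in the tables above can be sketched as a small helper. This is a hypothetical illustration, not the repo's actual `port_to_deep.py`; the mapping dicts mirror the tables, the `place_and_freeze` name is made up, and the sketch assumes the TinyFlux blocks have already been width-expanded to 512 so their state dicts match the new blocks.

```python
import torch
import torch.nn as nn

# Anchor positions from the tables above: TinyFlux layer -> TinyFlux-Deep position(s).
SINGLE_ANCHORS = {0: [0], 1: [8, 12, 16], 2: [24]}  # 3 -> 25 single blocks
DOUBLE_ANCHORS = {0: [0], 1: [4, 7, 10], 2: [14]}   # 3 -> 15 double blocks

def place_and_freeze(new_blocks: nn.ModuleList, old_blocks: nn.ModuleList, anchors: dict):
    """Copy each old block into its anchor position(s) and freeze those copies."""
    frozen = []
    for old_idx, positions in anchors.items():
        state = old_blocks[old_idx].state_dict()
        for pos in positions:
            new_blocks[pos].load_state_dict(state)  # duplicate the ported weights
            new_blocks[pos].requires_grad_(False)    # frozen anchor
            frozen.append(pos)
    return sorted(frozen)
```

Every position not listed as an anchor keeps its random initialization and stays trainable, which is where the ~70% trainable-parameter ratio comes from.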
### Attention Head Expansion

The original 2 heads are copied to their corresponding positions, and 2 new heads are randomly initialized:

- Old head 0 → New head 0
- Old head 1 → New head 1
- Heads 2-3 → Xavier initialized (scaled 0.02×)

### Text Encoders

Same as TinyFlux:

| Role | Model |
|------|-------|
| Sequence encoder | flan-t5-base (768 dim) |
| Pooled encoder | CLIP-L (768 dim) |

## Training

### Strategy

1. **Port** TinyFlux weights with dimension expansion
2. **Freeze** ported layers as "anchor" knowledge
3. **Train** new layers to interpolate between anchors
4. **Optional:** Unfreeze all layers and fine-tune at a lower learning rate

### Dataset

Trained on [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents):

- 10,000 samples
- Pre-computed VAE latents (16, 64, 64) from 512×512 images
- Diverse prompts covering people, objects, scenes, and styles

### Training Details

- **Objective**: Flow matching (rectified flow)
- **Timestep sampling**: Logit-normal with Flux shift (s=3.0)
- **Loss weighting**: Min-SNR-γ (γ=5.0)
- **Optimizer**: AdamW (lr=5e-5, β=(0.9, 0.99), wd=0.01)
- **Schedule**: Cosine with warmup
- **Precision**: bfloat16
- **Batch size**: 32 (16 × 2 gradient accumulation)

## Usage

### Installation

```bash
pip install torch transformers diffusers safetensors huggingface_hub
```

### Inference

```python
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from transformers import T5EncoderModel, T5Tokenizer, CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL

# Load model (copy the TinyFlux class definition first, use TinyFluxDeepConfig)
config = TinyFluxDeepConfig()
model = TinyFlux(config).to("cuda").to(torch.bfloat16)
weights = load_file(hf_hub_download("AbstractPhil/tiny-flux-deep", "model.safetensors"))
model.load_state_dict(weights, strict=False)  # strict=False for precomputed buffers
model.eval()

# Load encoders
t5_tok = T5Tokenizer.from_pretrained("google/flan-t5-base")
t5_enc = \
T5EncoderModel.from_pretrained("google/flan-t5-base", torch_dtype=torch.bfloat16).to("cuda")
clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
clip_enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=torch.bfloat16).to("cuda")
vae = AutoencoderKL.from_pretrained("black-forest-labs/FLUX.1-schnell", subfolder="vae", torch_dtype=torch.bfloat16).to("cuda")

# Encode prompt
prompt = "a photo of a cat sitting on a windowsill"
t5_in = t5_tok(prompt, max_length=128, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
t5_out = t5_enc(**t5_in).last_hidden_state
clip_in = clip_tok(prompt, max_length=77, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
clip_out = clip_enc(**clip_in).pooler_output

# Euler sampling with Flux shift
def flux_shift(t, s=3.0):
    return s * t / (1 + (s - 1) * t)

x = torch.randn(1, 64 * 64, 16, device="cuda", dtype=torch.bfloat16)
img_ids = TinyFlux.create_img_ids(1, 64, 64, "cuda")
t_linear = torch.linspace(0, 1, 21, device="cuda")
timesteps = flux_shift(t_linear)

for i in range(20):
    t = timesteps[i].unsqueeze(0)
    dt = timesteps[i + 1] - timesteps[i]
    guidance = torch.tensor([3.5], device="cuda", dtype=torch.bfloat16)
    v = model(
        hidden_states=x,
        encoder_hidden_states=t5_out,
        pooled_projections=clip_out,
        timestep=t,
        img_ids=img_ids,
        guidance=guidance,
    )
    x = x + v * dt

# Decode
latents = x.reshape(1, 64, 64, 16).permute(0, 3, 1, 2)
latents = latents / vae.config.scaling_factor
image = vae.decode(latents.float()).sample
image = (image / 2 + 0.5).clamp(0, 1)
```

### Configuration

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class TinyFluxDeepConfig:
    hidden_size: int = 512
    num_attention_heads: int = 4
    attention_head_dim: int = 128
    in_channels: int = 16
    joint_attention_dim: int = 768
    pooled_projection_dim: int = 768
    num_double_layers: int = 15
    num_single_layers: int = 25
    mlp_ratio: float = 4.0
    axes_dims_rope: Tuple[int, int, int] = (16, 56, 56)
    guidance_embeds: bool = True
```
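As a quick sanity check on the configuration, the numbers in this card are internally consistent: the hidden size factors into heads × head_dim, the Flux-style 3-axis RoPE dims sum to the head dimension, and the ~85M parameter count lines up with the reported memory and file sizes (these figures are taken straight from the tables above, not computed from the repo).

```python
# Width = heads x head_dim: 4 x 128 = 512.
hidden_size, num_heads, head_dim = 512, 4, 128
assert hidden_size == num_heads * head_dim

# Flux-style RoPE splits each head's dims across (t, h, w) axes: 16 + 56 + 56 = 128.
axes_dims_rope = (16, 56, 56)
assert sum(axes_dims_rope) == head_dim

# ~85M params: ~170 MB in bf16 (2 bytes/param), consistent with the comparison
# table, and ~340 MB at fp32 (4 bytes/param), consistent with model.safetensors.
params = 85e6
assert round(params * 2 / 1e6) == 170
assert round(params * 4 / 1e6) == 340
```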
## Files

```
AbstractPhil/tiny-flux-deep/
├── model.safetensors       # Model weights (~340MB)
├── config.json             # Model configuration
├── frozen_params.json      # List of frozen parameter names
├── README.md               # This file
├── model.py                # Model architecture (includes TinyFluxDeepConfig)
├── inference_colab.py      # Inference script
├── train_deep_colab.py     # Training script with layer freezing
├── port_to_deep.py         # Porting script from TinyFlux
├── checkpoints/            # Training checkpoints
│   └── step_*.safetensors
├── logs/                   # TensorBoard logs
└── samples/                # Generated samples during training
```

## Porting from TinyFlux

To create a new TinyFlux-Deep from scratch:

```python
# Run port_to_deep.py
# 1. Downloads AbstractPhil/tiny-flux weights
# 2. Creates the TinyFlux-Deep model (512 hidden, 4 heads, 25 single, 15 double)
# 3. Expands attention heads (2 → 4) and hidden dimension (256 → 512)
# 4. Distributes layers to anchor positions
# 5. Saves to AbstractPhil/tiny-flux-deep
```

## Comparison with TinyFlux

| Aspect | TinyFlux | TinyFlux-Deep |
|--------|----------|---------------|
| Parameters | ~8M | ~85M |
| Memory (bf16) | ~16MB | ~170MB |
| Forward pass | ~15ms | ~60ms |
| Capacity | Limited | Moderate |
| Training | From scratch | Ported + fine-tuned |

## Limitations

- **Resolution**: Trained on 512×512 only
- **Quality**: Better than TinyFlux, still below full Flux
- **Text understanding**: Limited by the smaller T5 encoder (768 vs 4096 dim)
- **Early training**: The model is actively being trained
- **Experimental**: Intended for research, not production

## Intended Use

- Studying model scaling and expansion techniques
- Testing layer freezing and knowledge transfer
- Rapid prototyping with moderate capacity
- Educational purposes
- Baseline for architecture experiments

## Citation

```bibtex
@misc{tinyfluxdeep2026,
  title={TinyFlux-Deep: Expanded Flux Architecture with Knowledge Preservation},
  author={AbstractPhil},
  year={2026},
  url={https://huggingface.co/AbstractPhil/tiny-flux-deep}
}
```

## Related Models
- [AbstractPhil/tiny-flux](https://huggingface.co/AbstractPhil/tiny-flux) - Base model (8M params)
- [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) - Original Flux

## Acknowledgments

- [Black Forest Labs](https://blackforestlabs.ai/) for the original Flux architecture
- [Hugging Face](https://huggingface.co/) for the diffusers and transformers libraries

## License

MIT License - see the LICENSE file for details.

---

**Note**: This is an experimental research model under active development. Training is ongoing and weights may be updated frequently.