AbstractPhil/tiny-flux

@@ -8,86 +8,52 @@ tags:
 - flux
 - text-to-image
 - image-generation
-- deep
 - experimental
 library_name: pytorch
 pipeline_tag: text-to-image
 base_model:
-- AbstractPhil/tiny-flux
 - black-forest-labs/FLUX.1-schnell
 datasets:
 - AbstractPhil/flux-schnell-teacher-latents
 ---
-# TinyFlux-Deep
-An **expanded** TinyFlux architecture that increases depth and width while preserving learned representations. TinyFlux-Deep is ported from [TinyFlux](https://huggingface.co/AbstractPhil/tiny-flux) with strategic layer expansion and attention head doubling.
 ## Model Description
-TinyFlux-Deep extends the base TinyFlux model by:
-- **Doubling attention heads** (2 → 4) with expanded hidden dimension (256 → 512)
-- **5× as many double-stream layers** (3 → 15)
-- **~8× as many single-stream layers** (3 → 25)
-- **Preserving learned weights** from TinyFlux in frozen anchor positions
 ### Architecture Comparison
-| Component | TinyFlux | TinyFlux-Deep | Flux |
-|-----------|----------|---------------|------|
-| Hidden size | 256 | **512** | 3072 |
-| Attention heads | 2 | **4** | 24 |
-| Head dimension | 128 | 128 | 128 |
-| Double-stream layers | 3 | **15** | 19 |
-| Single-stream layers | 3 | **25** | 38 |
-| VAE channels | 16 | 16 | 16 |
-| **Total params** | ~8M | **~85M** | ~12B |
-### Layer Mapping (Ported from TinyFlux)
-The original TinyFlux weights are strategically distributed and frozen:
-**Single blocks (3 → 25):**
-| TinyFlux Layer | TinyFlux-Deep Position | Status |
-|----------------|------------------------|--------|
-| 0 | 0 | Frozen |
-| 1 | 8, 12, 16 | Frozen (3 copies) |
-| 2 | 24 | Frozen |
-| — | 1-7, 9-11, 13-15, 17-23 | Trainable |
-**Double blocks (3 → 15):**
-| TinyFlux Layer | TinyFlux-Deep Position | Status |
-|----------------|------------------------|--------|
-| 0 | 0 | Frozen |
-| 1 | 4, 7, 10 | Frozen (3 copies) |
-| 2 | 14 | Frozen |
-| — | 1-3, 5-6, 8-9, 11-13 | Trainable |
-**Trainable ratio:** ~70% of parameters
-### Attention Head Expansion
-The 2 original heads keep their positions, and 2 new heads are randomly initialized:
-- Old head 0 → New head 0
-- Old head 1 → New head 1
-- Heads 2-3 → Xavier initialized (scaled 0.02×)
 ### Text Encoders
-Same as TinyFlux:
-| Role | Model |
-|------|-------|
-| Sequence encoder | flan-t5-base (768 dim) |
-| Pooled encoder | CLIP-L (768 dim) |
-## Training
-### Strategy
-1. **Port** TinyFlux weights with dimension expansion
-2. **Freeze** ported layers as "anchor" knowledge
-3. **Train** new layers to interpolate between anchors
-4. **Optional:** Unfreeze all and fine-tune at lower LR
 ### Dataset
@@ -101,10 +67,17 @@ Trained on [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/da
 - **Objective**: Flow matching (rectified flow)
 - **Timestep sampling**: Logit-normal with Flux shift (s=3.0)
 - **Loss weighting**: Min-SNR-γ (γ=5.0)
-- **Optimizer**: AdamW (lr=5e-5, β=(0.9, 0.99), wd=0.01)
 - **Schedule**: Cosine with warmup
 - **Precision**: bfloat16
-- **Batch size**: 32 (16 × 2 gradient accumulation)
 ## Usage
@@ -123,12 +96,12 @@ from safetensors.torch import load_file
 from transformers import T5EncoderModel, T5Tokenizer, CLIPTextModel, CLIPTokenizer
 from diffusers import AutoencoderKL
-# Load model (copy TinyFlux class definition first, use TinyFluxDeepConfig)
-config = TinyFluxDeepConfig()
 model = TinyFlux(config).to("cuda").to(torch.bfloat16)
-weights = load_file(hf_hub_download("AbstractPhil/tiny-flux-deep", "model.safetensors"))
-model.load_state_dict(weights, strict=False)  # strict=False for precomputed buffers
 model.eval()
 # Load encoders
@@ -139,21 +112,16 @@ clip_enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14", torch_
 vae = AutoencoderKL.from_pretrained("black-forest-labs/FLUX.1-schnell", subfolder="vae", torch_dtype=torch.bfloat16).to("cuda")
 # Encode prompt
-prompt = "a photo of a cat sitting on a windowsill"
 t5_in = t5_tok(prompt, max_length=128, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
 t5_out = t5_enc(**t5_in).last_hidden_state
 clip_in = clip_tok(prompt, max_length=77, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
 clip_out = clip_enc(**clip_in).pooler_output
-# Euler sampling with Flux shift
-def flux_shift(t, s=3.0):
-    return s * t / (1 + (s - 1) * t)
 x = torch.randn(1, 64*64, 16, device="cuda", dtype=torch.bfloat16)
 img_ids = TinyFlux.create_img_ids(1, 64, 64, "cuda")
-t_linear = torch.linspace(0, 1, 21, device="cuda")
-timesteps = flux_shift(t_linear)
 for i in range(20):
     t = timesteps[i].unsqueeze(0)
@@ -177,97 +145,58 @@ image = vae.decode(latents.float()).sample
 image = (image / 2 + 0.5).clamp(0, 1)
 ```
-### Configuration
-```python
-@dataclass
-class TinyFluxDeepConfig:
-    hidden_size: int = 512
-    num_attention_heads: int = 4
-    attention_head_dim: int = 128
-    in_channels: int = 16
-    joint_attention_dim: int = 768
-    pooled_projection_dim: int = 768
-    num_double_layers: int = 15
-    num_single_layers: int = 25
-    mlp_ratio: float = 4.0
-    axes_dims_rope: Tuple[int, int, int] = (16, 56, 56)
-    guidance_embeds: bool = True
-```
 ## Files
 ```
-AbstractPhil/tiny-flux-deep/
-├── model.safetensors      # Model weights (~340MB)
 ├── config.json            # Model configuration
-├── frozen_params.json     # List of frozen parameter names
 ├── README.md              # This file
-├── model.py               # Model architecture (includes TinyFluxDeepConfig)
 ├── inference_colab.py     # Inference script
-├── train_deep_colab.py    # Training script with layer freezing
-├── port_to_deep.py        # Porting script from TinyFlux
 ├── checkpoints/           # Training checkpoints
 │   └── step_*.safetensors
 ├── logs/                  # Tensorboard logs
 └── samples/               # Generated samples during training
 ```
-## Porting from TinyFlux
-To create a new TinyFlux-Deep from scratch:
-```python
-# Run port_to_deep.py
-# 1. Downloads AbstractPhil/tiny-flux weights
-# 2. Creates TinyFlux-Deep model (512 hidden, 4 heads, 25 single, 15 double)
-# 3. Expands attention heads (2→4) and hidden dimension (256→512)
-# 4. Distributes layers to anchor positions
-# 5. Saves to AbstractPhil/tiny-flux-deep
-```
-## Comparison with TinyFlux
-| Aspect | TinyFlux | TinyFlux-Deep |
-|--------|----------|---------------|
-| Parameters | ~8M | ~85M |
-| Memory (bf16) | ~16MB | ~170MB |
-| Forward pass | ~15ms | ~60ms |
-| Capacity | Limited | Moderate |
-| Training | From scratch | Ported + fine-tuned |
 ## Limitations
 - **Resolution**: Trained on 512×512 only
-- **Quality**: Better than TinyFlux, still below full Flux
 - **Text understanding**: Limited by smaller T5 encoder (768 vs 4096 dim)
-- **Early training**: Model is actively being trained
-- **Experimental**: Intended for research, not production
 ## Intended Use
-- Studying model scaling and expansion techniques
-- Testing layer freezing and knowledge transfer
-- Rapid prototyping with moderate capacity
 - Educational purposes
-- Baseline for architecture experiments
 ## Citation
 ```bibtex
-@misc{tinyfluxdeep2026,
-  title={TinyFlux-Deep: Expanded Flux Architecture with Knowledge Preservation},
   author={AbstractPhil},
-  year={2026},
-  url={https://huggingface.co/AbstractPhil/tiny-flux-deep}
 }
 ```
-## Related Models
-- [AbstractPhil/tiny-flux](https://huggingface.co/AbstractPhil/tiny-flux) - Base model (8M params)
-- [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) - Original Flux
 ## Acknowledgments
 - [Black Forest Labs](https://blackforestlabs.ai/) for the original Flux architecture
@@ -279,4 +208,4 @@ MIT License - See LICENSE file for details.
 ---
-**Note**: This is an experimental research model under active development. Training is ongoing and weights may be updated frequently.

 - flux
 - text-to-image
 - image-generation
+- tiny
 - experimental
 library_name: pytorch
 pipeline_tag: text-to-image
 base_model:
 - black-forest-labs/FLUX.1-schnell
 datasets:
 - AbstractPhil/flux-schnell-teacher-latents
 ---
+# TinyFlux
+A **1/12-scale** Flux architecture for experimentation and research. TinyFlux keeps the core MMDiT (Multimodal Diffusion Transformer) design of Flux while cutting the parameter count from ~12B to ~8M for faster iteration and lower resource requirements.
 ## Model Description
+TinyFlux is a miniaturized version of [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) that preserves the essential architectural components:
+- **Double-stream blocks** (MMDiT style) - separate text/image pathways with joint attention
+- **Single-stream blocks** - concatenated text+image with shared weights
+- **AdaLN-Zero modulation** - adaptive layer norm with gating
+- **3D RoPE** - rotary position embeddings for temporal + spatial positions
+- **Flow matching** - rectified flow training objective
 ### Architecture Comparison
+| Component | Flux | TinyFlux | Scale |
+|-----------|------|----------|-------|
+| Hidden size | 3072 | 256 | /12 |
+| Attention heads | 24 | 2 | /12 |
+| Head dimension | 128 | 128 | preserved |
+| Double-stream layers | 19 | 3 | ~/6 |
+| Single-stream layers | 38 | 3 | ~/13 |
+| VAE channels | 16 | 16 | preserved |
+| **Total params** | ~12B | ~8M | /1500 |
 ### Text Encoders
+TinyFlux uses smaller text encoders than standard Flux:
+| Role | Flux | TinyFlux |
+|------|------|----------|
+| Sequence encoder | T5-XXL (4096 dim) | flan-t5-base (768 dim) |
+| Pooled encoder | CLIP-L (768 dim) | CLIP-L (768 dim) |
+## Training
 ### Dataset
 - **Objective**: Flow matching (rectified flow)
 - **Timestep sampling**: Logit-normal with Flux shift (s=3.0)
 - **Loss weighting**: Min-SNR-γ (γ=5.0)
+- **Optimizer**: AdamW (lr=1e-4, β=(0.9, 0.99), wd=0.01)
 - **Schedule**: Cosine with warmup
 - **Precision**: bfloat16
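The timestep sampling listed above could look roughly like the following sketch. It assumes the logit-normal draw is the sigmoid of a standard normal sample, and it reuses the shift formula `s * t / (1 + (s - 1) * t)` from the `flux_shift` helper in the TinyFlux-Deep usage snippet. The name `sample_timestep` and the `mean`/`std` defaults are illustrative assumptions, not taken from the training script.

```python
import math
import random


def flux_shift(t, s=3.0):
    """Shift a timestep in [0, 1]; for s > 1 every value moves toward 1."""
    return s * t / (1 + (s - 1) * t)


def sample_timestep(rng, s=3.0, mean=0.0, std=1.0):
    """Draw one timestep: logit-normal sample followed by the shift.

    Assumption: mean=0 and std=1 for the underlying normal.
    """
    u = rng.gauss(mean, std)           # normal sample in logit space
    t = 1.0 / (1.0 + math.exp(-u))     # sigmoid maps it into (0, 1)
    return flux_shift(t, s)


rng = random.Random(0)
ts = [sample_timestep(rng) for _ in range(10_000)]
print(f"mean shifted timestep: {sum(ts) / len(ts):.3f}")
```

With `s=3.0` the midpoint `t=0.5` maps to `0.75`, so the sampled timesteps concentrate in the upper half of the interval.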
+### Flow Matching Formulation
+```
+Interpolation: x_t = (1 - t) * noise + t * data
+Target velocity: v = data - noise
+Loss: MSE(predicted_v, target_v) * min_snr_weight(t)
+```
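The formulation above can be sketched in plain Python for a single flattened sample. The model card does not state which Min-SNR variant is used, so the weight below is an assumption: it takes SNR(t) = (t / (1 - t))² for this interpolation and applies min(SNR, γ) / SNR. The names `min_snr_weight`, `flow_matching_loss`, and `predict_v` are hypothetical.

```python
def min_snr_weight(t, gamma=5.0):
    """Assumed Min-SNR weight: min(SNR, gamma) / SNR with SNR = (t / (1 - t))^2."""
    if t >= 1.0:
        return 0.0                      # SNR is unbounded at t = 1
    snr = (t / (1.0 - t)) ** 2
    return 1.0 if snr <= gamma else gamma / snr


def flow_matching_loss(data, noise, t, predict_v, gamma=5.0):
    """Weighted MSE between predicted and target velocity for one sample."""
    # Interpolation: x_t = (1 - t) * noise + t * data
    x_t = [(1.0 - t) * n + t * d for d, n in zip(data, noise)]
    # Target velocity: v = data - noise
    target_v = [d - n for d, n in zip(data, noise)]
    pred_v = predict_v(x_t, t)
    mse = sum((p - v) ** 2 for p, v in zip(pred_v, target_v)) / len(target_v)
    return mse * min_snr_weight(t, gamma)


data, noise = [1.0, 2.0], [0.0, 0.0]

# A predictor that always outputs zero velocity, standing in for an untrained model.
zero_loss = flow_matching_loss(data, noise, 0.5, lambda x, t: [0.0] * len(x))

# A predictor that returns the exact target velocity gives zero loss.
oracle_loss = flow_matching_loss(data, noise, 0.5, lambda x, t: [1.0, 2.0])
print(zero_loss, oracle_loss)
```

In a real training loop the same arithmetic would run on batched tensors, with `predict_v` replaced by the model's forward pass.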
 ## Usage
 from transformers import T5EncoderModel, T5Tokenizer, CLIPTextModel, CLIPTokenizer
 from diffusers import AutoencoderKL
+# Load model (copy TinyFlux class definition first)
+config = TinyFluxConfig()
 model = TinyFlux(config).to("cuda").to(torch.bfloat16)
+weights = load_file(hf_hub_download("AbstractPhil/tiny-flux", "model.safetensors"))
+model.load_state_dict(weights)
 model.eval()
 # Load encoders
 vae = AutoencoderKL.from_pretrained("black-forest-labs/FLUX.1-schnell", subfolder="vae", torch_dtype=torch.bfloat16).to("cuda")
 # Encode prompt
+prompt = "a photo of a cat"
 t5_in = t5_tok(prompt, max_length=128, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
 t5_out = t5_enc(**t5_in).last_hidden_state
 clip_in = clip_tok(prompt, max_length=77, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
 clip_out = clip_enc(**clip_in).pooler_output
+# Euler sampling (t: 0→1, noise→data)
 x = torch.randn(1, 64*64, 16, device="cuda", dtype=torch.bfloat16)
 img_ids = TinyFlux.create_img_ids(1, 64, 64, "cuda")
+timesteps = torch.linspace(0, 1, 21, device="cuda")
 for i in range(20):
     t = timesteps[i].unsqueeze(0)
 image = (image / 2 + 0.5).clamp(0, 1)
 ```
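The sampling loop in the snippet above is cut off by the diff, so the model call is not visible here. As a self-contained illustration of the Euler update it performs (t: 0→1, noise→data), the sketch below integrates a stand-in velocity function on a toy vector. `euler_sample` and `oracle_velocity` are hypothetical names; the oracle is the exact velocity for a single known data point and is not the TinyFlux forward pass.

```python
def euler_sample(velocity_fn, x, timesteps):
    """Integrate dx/dt = velocity_fn(x, t) with explicit Euler steps."""
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        v = velocity_fn(x, t)
        x = [xi + (t_next - t) * vi for xi, vi in zip(x, v)]
    return x


data = [1.0, -2.0, 0.5]                 # toy "clean" target


def oracle_velocity(x, t):
    """Exact velocity when the data point is known.

    Since x_t = (1 - t) * noise + t * data, (data - x_t) / (1 - t) = data - noise.
    """
    return [(d - xi) / (1.0 - t) for d, xi in zip(data, x)]


timesteps = [i / 20 for i in range(21)]  # 21 points -> 20 Euler steps
noise = [0.3, 0.7, -1.1]
sample = euler_sample(oracle_velocity, noise, timesteps)
print(sample)
```

Because the velocity is only evaluated at the left end of each step, `t` never reaches 1 inside `oracle_velocity`, which avoids the division by zero.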
+### Full Inference Script
+See [inference_colab.py](https://huggingface.co/AbstractPhil/tiny-flux/blob/main/inference_colab.py) for a complete generation pipeline with:
+- Classifier-free guidance
+- Batch generation
+- Image saving
 ## Files
 ```
+AbstractPhil/tiny-flux/
+├── model.safetensors      # Model weights (~32MB)
 ├── config.json            # Model configuration
 ├── README.md              # This file
+├── model.py               # Model architecture definition
 ├── inference_colab.py     # Inference script
+├── train_colab.py         # Training script
 ├── checkpoints/           # Training checkpoints
 │   └── step_*.safetensors
 ├── logs/                  # Tensorboard logs
 └── samples/               # Generated samples during training
 ```
 ## Limitations
 - **Resolution**: Trained on 512×512 only
+- **Quality**: Significantly lower than full Flux due to reduced capacity
 - **Text understanding**: Limited by smaller T5 encoder (768 vs 4096 dim)
+- **Fine details**: May struggle with complex scenes or fine-grained details
+- **Experimental**: Intended for research and learning, not production use
 ## Intended Use
+- Understanding Flux/MMDiT architecture
+- Rapid prototyping and experimentation
 - Educational purposes
+- Resource-constrained environments
+- Baseline for architecture modifications
 ## Citation
+If you use TinyFlux in your research, please cite:
 ```bibtex
+@misc{tinyflux2025,
+  title={TinyFlux: A Miniaturized Flux Architecture for Experimentation},
   author={AbstractPhil},
+  year={2025},
+  url={https://huggingface.co/AbstractPhil/tiny-flux}
 }
 ```
 ## Acknowledgments
 - [Black Forest Labs](https://blackforestlabs.ai/) for the original Flux architecture
 ---
+**Note**: This is an experimental research model. For high-quality image generation, use the full [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) or [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) models.