File size: 8,943 Bytes

82ba681

---
license: mit
language:
- en
tags:
- diffusion
- flow-matching
- flux
- text-to-image
- image-generation
- deep
- experimental
library_name: pytorch
pipeline_tag: text-to-image
base_model:
- AbstractPhil/tiny-flux
- black-forest-labs/FLUX.1-schnell
datasets:
- AbstractPhil/flux-schnell-teacher-latents
---

# TinyFlux-Deep

An **expanded** TinyFlux architecture that increases depth and width while preserving learned representations. TinyFlux-Deep is ported from [TinyFlux](https://huggingface.co/AbstractPhil/tiny-flux) with strategic layer expansion and attention head doubling.

## Model Description

TinyFlux-Deep extends the base TinyFlux model by:
- **Doubling attention heads** (2 → 4) with expanded hidden dimension (256 → 512)
- **5× more double-stream layers** (3 → 15)
- **8× more single-stream layers** (3 → 25)
- **Preserving learned weights** from TinyFlux in frozen anchor positions

### Architecture Comparison

| Component | TinyFlux | TinyFlux-Deep | Flux |
|-----------|----------|---------------|------|
| Hidden size | 256 | **512** | 3072 |
| Attention heads | 2 | **4** | 24 |
| Head dimension | 128 | 128 | 128 |
| Double-stream layers | 3 | **15** | 19 |
| Single-stream layers | 3 | **25** | 38 |
| VAE channels | 16 | 16 | 16 |
| **Total params** | ~8M | **~85M** | ~12B |

### Layer Mapping (Ported from TinyFlux)

The original TinyFlux weights are strategically distributed and frozen:

**Single blocks (3 → 25):**
| TinyFlux Layer | TinyFlux-Deep Position | Status |
|----------------|------------------------|--------|
| 0 | 0 | Frozen |
| 1 | 8, 12, 16 | Frozen (3 copies) |
| 2 | 24 | Frozen |
| — | 1-7, 9-11, 13-15, 17-23 | Trainable |

**Double blocks (3 → 15):**
| TinyFlux Layer | TinyFlux-Deep Position | Status |
|----------------|------------------------|--------|
| 0 | 0 | Frozen |
| 1 | 4, 7, 10 | Frozen (3 copies) |
| 2 | 14 | Frozen |
| — | 1-3, 5-6, 8-9, 11-13 | Trainable |

**Trainable ratio:** ~70% of parameters

### Attention Head Expansion

Original 2 heads are copied to new positions, with 2 new heads randomly initialized:
- Old head 0 → New head 0
- Old head 1 → New head 1
- Heads 2-3 → Xavier initialized (scaled 0.02×)

### Text Encoders

Same as TinyFlux:
| Role | Model |
|------|-------|
| Sequence encoder | flan-t5-base (768 dim) |
| Pooled encoder | CLIP-L (768 dim) |

## Training

### Strategy

1. **Port** TinyFlux weights with dimension expansion
2. **Freeze** ported layers as "anchor" knowledge
3. **Train** new layers to interpolate between anchors
4. **Optional:** Unfreeze all and fine-tune at lower LR

### Dataset

Trained on [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents):
- 10,000 samples
- Pre-computed VAE latents (16, 64, 64) from 512×512 images
- Diverse prompts covering people, objects, scenes, styles

### Training Details

- **Objective**: Flow matching (rectified flow)
- **Timestep sampling**: Logit-normal with Flux shift (s=3.0)
- **Loss weighting**: Min-SNR-γ (γ=5.0)
- **Optimizer**: AdamW (lr=5e-5, β=(0.9, 0.99), wd=0.01)
- **Schedule**: Cosine with warmup
- **Precision**: bfloat16
- **Batch size**: 32 (16 × 2 gradient accumulation)

## Usage

### Installation

```bash
pip install torch transformers diffusers safetensors huggingface_hub
```

### Inference

```python
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from transformers import T5EncoderModel, T5Tokenizer, CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL

# Load model (copy TinyFlux class definition first, use TinyFluxDeepConfig)
config = TinyFluxDeepConfig()
model = TinyFlux(config).to("cuda").to(torch.bfloat16)

weights = load_file(hf_hub_download("AbstractPhil/tiny-flux-deep", "model.safetensors"))
model.load_state_dict(weights, strict=False)  # strict=False for precomputed buffers
model.eval()

# Load encoders
t5_tok = T5Tokenizer.from_pretrained("google/flan-t5-base")
t5_enc = T5EncoderModel.from_pretrained("google/flan-t5-base", torch_dtype=torch.bfloat16).to("cuda")
clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
clip_enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=torch.bfloat16).to("cuda")
vae = AutoencoderKL.from_pretrained("black-forest-labs/FLUX.1-schnell", subfolder="vae", torch_dtype=torch.bfloat16).to("cuda")

# Encode prompt
prompt = "a photo of a cat sitting on a windowsill"
t5_in = t5_tok(prompt, max_length=128, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
t5_out = t5_enc(**t5_in).last_hidden_state
clip_in = clip_tok(prompt, max_length=77, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
clip_out = clip_enc(**clip_in).pooler_output

# Euler sampling with Flux shift
def flux_shift(t, s=3.0):
    return s * t / (1 + (s - 1) * t)

x = torch.randn(1, 64*64, 16, device="cuda", dtype=torch.bfloat16)
img_ids = TinyFlux.create_img_ids(1, 64, 64, "cuda")

t_linear = torch.linspace(0, 1, 21, device="cuda")
timesteps = flux_shift(t_linear)

for i in range(20):
    t = timesteps[i].unsqueeze(0)
    dt = timesteps[i+1] - timesteps[i]
    guidance = torch.tensor([3.5], device="cuda", dtype=torch.bfloat16)
    
    v = model(
        hidden_states=x,
        encoder_hidden_states=t5_out,
        pooled_projections=clip_out,
        timestep=t,
        img_ids=img_ids,
        guidance=guidance,
    )
    x = x + v * dt

# Decode
latents = x.reshape(1, 64, 64, 16).permute(0, 3, 1, 2)
latents = latents / vae.config.scaling_factor
image = vae.decode(latents.float()).sample
image = (image / 2 + 0.5).clamp(0, 1)
```

### Configuration

```python
@dataclass
class TinyFluxDeepConfig:
    hidden_size: int = 512
    num_attention_heads: int = 4
    attention_head_dim: int = 128
    in_channels: int = 16
    joint_attention_dim: int = 768
    pooled_projection_dim: int = 768
    num_double_layers: int = 15
    num_single_layers: int = 25
    mlp_ratio: float = 4.0
    axes_dims_rope: Tuple[int, int, int] = (16, 56, 56)
    guidance_embeds: bool = True
```

## Files

```
AbstractPhil/tiny-flux-deep/
├── model.safetensors      # Model weights (~340MB)
├── config.json            # Model configuration
├── frozen_params.json     # List of frozen parameter names
├── README.md              # This file
├── model.py               # Model architecture (includes TinyFluxDeepConfig)
├── inference_colab.py     # Inference script
├── train_deep_colab.py    # Training script with layer freezing
├── port_to_deep.py        # Porting script from TinyFlux
├── checkpoints/           # Training checkpoints
│   └── step_*.safetensors
├── logs/                  # Tensorboard logs
└── samples/               # Generated samples during training
```

## Porting from TinyFlux

To create a new TinyFlux-Deep from scratch:

```python
# Run port_to_deep.py
# 1. Downloads AbstractPhil/tiny-flux weights
# 2. Creates TinyFlux-Deep model (512 hidden, 4 heads, 25 single, 15 double)
# 3. Expands attention heads (2→4) and hidden dimension (256→512)
# 4. Distributes layers to anchor positions
# 5. Saves to AbstractPhil/tiny-flux-deep
```

## Comparison with TinyFlux

| Aspect | TinyFlux | TinyFlux-Deep |
|--------|----------|---------------|
| Parameters | ~8M | ~85M |
| Memory (bf16) | ~16MB | ~170MB |
| Forward pass | ~15ms | ~60ms |
| Capacity | Limited | Moderate |
| Training | From scratch | Ported + fine-tuned |

## Limitations

- **Resolution**: Trained on 512×512 only
- **Quality**: Better than TinyFlux, still below full Flux
- **Text understanding**: Limited by smaller T5 encoder (768 vs 4096 dim)
- **Early training**: Model is actively being trained
- **Experimental**: Intended for research, not production

## Intended Use

- Studying model scaling and expansion techniques
- Testing layer freezing and knowledge transfer
- Rapid prototyping with moderate capacity
- Educational purposes
- Baseline for architecture experiments

## Citation

```bibtex
@misc{tinyfluxdeep2026,
  title={TinyFlux-Deep: Expanded Flux Architecture with Knowledge Preservation},
  author={AbstractPhil},
  year={2026},
  url={https://huggingface.co/AbstractPhil/tiny-flux-deep}
}
```

## Related Models

- [AbstractPhil/tiny-flux](https://huggingface.co/AbstractPhil/tiny-flux) - Base model (8M params)
- [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) - Original Flux

## Acknowledgments

- [Black Forest Labs](https://blackforestlabs.ai/) for the original Flux architecture
- [Hugging Face](https://huggingface.co/) for diffusers and transformers libraries

## License

MIT License - See LICENSE file for details.

---

**Note**: This is an experimental research model under active development. Training is ongoing and weights may be updated frequently.