Update README.md

82ba681 verified about 2 months ago

8.94 kB

	---
	license: mit
	language:
	- en
	tags:
	- diffusion
	- flow-matching
	- flux
	- text-to-image
	- image-generation
	- deep
	- experimental
	library_name: pytorch
	pipeline_tag: text-to-image
	base_model:
	- AbstractPhil/tiny-flux
	- black-forest-labs/FLUX.1-schnell
	datasets:
	- AbstractPhil/flux-schnell-teacher-latents
	---

	# TinyFlux-Deep

	An expanded TinyFlux architecture that increases depth and width while preserving learned representations. TinyFlux-Deep is ported from [TinyFlux](https://huggingface.co/AbstractPhil/tiny-flux) with strategic layer expansion and attention head doubling.

	## Model Description

	TinyFlux-Deep extends the base TinyFlux model by:
	- Doubling attention heads (2 → 4) with expanded hidden dimension (256 → 512)
	- 5× more double-stream layers (3 → 15)
	- 8× more single-stream layers (3 → 25)
	- Preserving learned weights from TinyFlux in frozen anchor positions

	### Architecture Comparison

	\| Component \| TinyFlux \| TinyFlux-Deep \| Flux \|
	\|-----------\|----------\|---------------\|------\|
	\| Hidden size \| 256 \| 512 \| 3072 \|
	\| Attention heads \| 2 \| 4 \| 24 \|
	\| Head dimension \| 128 \| 128 \| 128 \|
	\| Double-stream layers \| 3 \| 15 \| 19 \|
	\| Single-stream layers \| 3 \| 25 \| 38 \|
	\| VAE channels \| 16 \| 16 \| 16 \|
	\| Total params \| ~8M \| ~85M \| ~12B \|

	### Layer Mapping (Ported from TinyFlux)

	The original TinyFlux weights are strategically distributed and frozen:

	Single blocks (3 → 25):
	\| TinyFlux Layer \| TinyFlux-Deep Position \| Status \|
	\|----------------\|------------------------\|--------\|
	\| 0 \| 0 \| Frozen \|
	\| 1 \| 8, 12, 16 \| Frozen (3 copies) \|
	\| 2 \| 24 \| Frozen \|
	\| — \| 1-7, 9-11, 13-15, 17-23 \| Trainable \|

	Double blocks (3 → 15):
	\| TinyFlux Layer \| TinyFlux-Deep Position \| Status \|
	\|----------------\|------------------------\|--------\|
	\| 0 \| 0 \| Frozen \|
	\| 1 \| 4, 7, 10 \| Frozen (3 copies) \|
	\| 2 \| 14 \| Frozen \|
	\| — \| 1-3, 5-6, 8-9, 11-13 \| Trainable \|

	Trainable ratio: ~70% of parameters

	### Attention Head Expansion

	Original 2 heads are copied to new positions, with 2 new heads randomly initialized:
	- Old head 0 → New head 0
	- Old head 1 → New head 1
	- Heads 2-3 → Xavier initialized (scaled 0.02×)

	### Text Encoders

	Same as TinyFlux:
	\| Role \| Model \|
	\|------\|-------\|
	\| Sequence encoder \| flan-t5-base (768 dim) \|
	\| Pooled encoder \| CLIP-L (768 dim) \|

	## Training

	### Strategy

	1. Port TinyFlux weights with dimension expansion
	2. Freeze ported layers as "anchor" knowledge
	3. Train new layers to interpolate between anchors
	4. Optional: Unfreeze all and fine-tune at lower LR

	### Dataset

	Trained on [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents):
	- 10,000 samples
	- Pre-computed VAE latents (16, 64, 64) from 512×512 images
	- Diverse prompts covering people, objects, scenes, styles

	### Training Details

	- Objective: Flow matching (rectified flow)
	- Timestep sampling: Logit-normal with Flux shift (s=3.0)
	- Loss weighting: Min-SNR-γ (γ=5.0)
	- Optimizer: AdamW (lr=5e-5, β=(0.9, 0.99), wd=0.01)
	- Schedule: Cosine with warmup
	- Precision: bfloat16
	- Batch size: 32 (16 × 2 gradient accumulation)

	## Usage

	### Installation

	```bash
	pip install torch transformers diffusers safetensors huggingface_hub
	```

	### Inference

	```python
	import torch
	from huggingface_hub import hf_hub_download
	from safetensors.torch import load_file
	from transformers import T5EncoderModel, T5Tokenizer, CLIPTextModel, CLIPTokenizer
	from diffusers import AutoencoderKL

	# Load model (copy TinyFlux class definition first, use TinyFluxDeepConfig)
	config = TinyFluxDeepConfig()
	model = TinyFlux(config).to("cuda").to(torch.bfloat16)

	weights = load_file(hf_hub_download("AbstractPhil/tiny-flux-deep", "model.safetensors"))
	model.load_state_dict(weights, strict=False) # strict=False for precomputed buffers
	model.eval()

	# Load encoders
	t5_tok = T5Tokenizer.from_pretrained("google/flan-t5-base")
	t5_enc = T5EncoderModel.from_pretrained("google/flan-t5-base", torch_dtype=torch.bfloat16).to("cuda")
	clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
	clip_enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=torch.bfloat16).to("cuda")
	vae = AutoencoderKL.from_pretrained("black-forest-labs/FLUX.1-schnell", subfolder="vae", torch_dtype=torch.bfloat16).to("cuda")

	# Encode prompt
	prompt = "a photo of a cat sitting on a windowsill"
	t5_in = t5_tok(prompt, max_length=128, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
	t5_out = t5_enc(**t5_in).last_hidden_state
	clip_in = clip_tok(prompt, max_length=77, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
	clip_out = clip_enc(**clip_in).pooler_output

	# Euler sampling with Flux shift
	def flux_shift(t, s=3.0):
	return s * t / (1 + (s - 1) * t)

	x = torch.randn(1, 64*64, 16, device="cuda", dtype=torch.bfloat16)
	img_ids = TinyFlux.create_img_ids(1, 64, 64, "cuda")

	t_linear = torch.linspace(0, 1, 21, device="cuda")
	timesteps = flux_shift(t_linear)

	for i in range(20):
	t = timesteps[i].unsqueeze(0)
	dt = timesteps[i+1] - timesteps[i]
	guidance = torch.tensor([3.5], device="cuda", dtype=torch.bfloat16)

	v = model(
	hidden_states=x,
	encoder_hidden_states=t5_out,
	pooled_projections=clip_out,
	timestep=t,
	img_ids=img_ids,
	guidance=guidance,
	)
	x = x + v * dt

	# Decode
	latents = x.reshape(1, 64, 64, 16).permute(0, 3, 1, 2)
	latents = latents / vae.config.scaling_factor
	image = vae.decode(latents.float()).sample
	image = (image / 2 + 0.5).clamp(0, 1)
	```

	### Configuration

	```python
	@dataclass
	class TinyFluxDeepConfig:
	hidden_size: int = 512
	num_attention_heads: int = 4
	attention_head_dim: int = 128
	in_channels: int = 16
	joint_attention_dim: int = 768
	pooled_projection_dim: int = 768
	num_double_layers: int = 15
	num_single_layers: int = 25
	mlp_ratio: float = 4.0
	axes_dims_rope: Tuple[int, int, int] = (16, 56, 56)
	guidance_embeds: bool = True
	```

	## Files

	```
	AbstractPhil/tiny-flux-deep/
	├── model.safetensors # Model weights (~340MB)
	├── config.json # Model configuration
	├── frozen_params.json # List of frozen parameter names
	├── README.md # This file
	├── model.py # Model architecture (includes TinyFluxDeepConfig)
	├── inference_colab.py # Inference script
	├── train_deep_colab.py # Training script with layer freezing
	├── port_to_deep.py # Porting script from TinyFlux
	├── checkpoints/ # Training checkpoints
	│ └── step_*.safetensors
	├── logs/ # Tensorboard logs
	└── samples/ # Generated samples during training
	```

	## Porting from TinyFlux

	To create a new TinyFlux-Deep from scratch:

	```python
	# Run port_to_deep.py
	# 1. Downloads AbstractPhil/tiny-flux weights
	# 2. Creates TinyFlux-Deep model (512 hidden, 4 heads, 25 single, 15 double)
	# 3. Expands attention heads (2→4) and hidden dimension (256→512)
	# 4. Distributes layers to anchor positions
	# 5. Saves to AbstractPhil/tiny-flux-deep
	```

	## Comparison with TinyFlux

	\| Aspect \| TinyFlux \| TinyFlux-Deep \|
	\|--------\|----------\|---------------\|
	\| Parameters \| ~8M \| ~85M \|
	\| Memory (bf16) \| ~16MB \| ~170MB \|
	\| Forward pass \| ~15ms \| ~60ms \|
	\| Capacity \| Limited \| Moderate \|
	\| Training \| From scratch \| Ported + fine-tuned \|

	## Limitations

	- Resolution: Trained on 512×512 only
	- Quality: Better than TinyFlux, still below full Flux
	- Text understanding: Limited by smaller T5 encoder (768 vs 4096 dim)
	- Early training: Model is actively being trained
	- Experimental: Intended for research, not production

	## Intended Use

	- Studying model scaling and expansion techniques
	- Testing layer freezing and knowledge transfer
	- Rapid prototyping with moderate capacity
	- Educational purposes
	- Baseline for architecture experiments

	## Citation

	```bibtex
	@misc{tinyfluxdeep2026,
	title={TinyFlux-Deep: Expanded Flux Architecture with Knowledge Preservation},
	author={AbstractPhil},
	year={2026},
	url={https://huggingface.co/AbstractPhil/tiny-flux-deep}
	}
	```

	## Related Models

	- [AbstractPhil/tiny-flux](https://huggingface.co/AbstractPhil/tiny-flux) - Base model (8M params)
	- [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) - Original Flux

	## Acknowledgments

	- [Black Forest Labs](https://blackforestlabs.ai/) for the original Flux architecture
	- [Hugging Face](https://huggingface.co/) for diffusers and transformers libraries

	## License

	MIT License - See LICENSE file for details.

	---

	Note: This is an experimental research model under active development. Training is ongoing and weights may be updated frequently.