	---
	license: mit
	language:
	- en
	tags:
	- diffusion
	- flow-matching
	- flux
	- text-to-image
	- image-generation
	- tinyflux
	- lailah
	- experimental
	library_name: pytorch
	pipeline_tag: text-to-image
	base_model:
	- AbstractPhil/tiny-flux
	- black-forest-labs/FLUX.1-schnell
	datasets:
	- AbstractPhil/flux-schnell-teacher-latents
	- AbstractPhil/imagenet-synthetic
	---

	# TinyFlux-Deep (Lailah)

	TinyFlux-Lailah is an expanded TinyFlux architecture with increased depth and width. It was originally ported from [TinyFlux](https://huggingface.co/AbstractPhil/tiny-flux) using strategic layer expansion and attention head doubling, and it is now training end-to-end on teacher latents.


	## Quick Start (Colab)

	The easiest way to test Lailah:

	1. Open [Google Colab](https://colab.research.google.com/)
	2. Copy the contents of [`inference_v3.py`](./inference_v3.py) and [`model_v3.py`](./model_v3.py) into notebook cells
	3. Run the cells

	```python
	# Or fetch directly:
	!wget https://huggingface.co/AbstractPhil/tiny-flux-deep/raw/main/inference_v3.py
	%run inference_v3.py
	```

	## Fair Weights

	### ImageNet Synthetic step_346875
	* Handles multiple animal combination variants with high fidelity
	* Dataset: https://huggingface.co/datasets/AbstractPhil/imagenet-synthetic

	"subject, animal, cat, photograph of a tiger, natural habitat"

	![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/uJ9Ffh780iLgEIJhmafod.png)

	"subject, bird, blue beak, red eyes, green claws"
	![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/GRS5tyaFFa0HV2xSJCsin.png)

	"subject, bird, red haired bird in a tree"
	![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/rGourHokJsPtYNnoFi3Eq.png)


	![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/O_z6DLc32HDNBq3ZwEjqf.png)

	## Architecture

	| Component | TinyFlux | TinyFlux-Lailah | Flux |
	|-----------|----------|-----------------|------|
	| Hidden size | 256 | 512 | 3072 |
	| Attention heads | 2 | 4 | 24 |
	| Head dimension | 128 | 128 | 128 |
	| Double-stream layers | 3 | 15 | 19 |
	| Single-stream layers | 3 | 25 | 38 |
	| VAE channels | 16 | 16 | 16 |
	| Total params | ~10.7M | ~241.8M | ~12B |

	### Text Encoders

	| Role | Model | Dimension |
	|------|-------|-----------|
	| Sequence encoder | flan-t5-base | 768 |
	| Pooled encoder | CLIP-L | 768 |

	## Training

	### Current Approach

	All parameters are trainable. The model was initially ported from TinyFlux with frozen anchor layers, but current training runs with everything unfrozen for maximum flexibility.

	### Dataset

	Training on [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents):
	- Pre-computed VAE latents from Flux-Schnell generations
	- 512×512 resolution (64×64 latent space)
	- Diverse prompts covering people, objects, scenes, styles
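
The resolution figures follow from the VAE's spatial compression. A minimal sketch of the arithmetic, assuming the 8× downsampling factor of the Flux VAE (the factor is not stated in this README, and `latent_shape` is a hypothetical helper):

```python
def latent_shape(height, width, channels=16, downsample=8):
    """Return (channels, latent_height, latent_width) for an image size.

    downsample=8 is an assumption based on the Flux VAE; channels=16
    matches the "VAE channels" row of the architecture table.
    """
    if height % downsample or width % downsample:
        raise ValueError("image size must be divisible by the downsample factor")
    return (channels, height // downsample, width // downsample)


shape = latent_shape(512, 512)
print(shape)  # (16, 64, 64)
```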

	### Training Details

	- Objective: Flow matching (rectified flow)
	- Timestep sampling: Logit-normal with Flux shift (s=3.0)
	- Loss weighting: Min-SNR-γ (γ=5.0)
	- Optimizer: AdamW (lr=3e-4, β=(0.9, 0.99), wd=0.01)
	- Schedule: Cosine with warmup
	- Precision: bfloat16
	- Batch size: 32 (16 × 2 gradient accumulation)
	- EMA decay: 0.9999
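
The objective and timestep sampling listed above can be sketched in plain Python for a single scalar sample. This is an illustration under stated assumptions, not the training script: the logit-normal parameters (mean 0, std 1), the linear interpolation path, and the helper names are assumptions. Only the shift formula and `s=3.0` come from this README. Min-SNR-γ weighting is left out because its exact form for this model is not documented here.

```python
import math
import random


def flux_shift(t, s=3.0):
    """Shift a timestep in [0, 1] toward 1 (the data end)."""
    return s * t / (1 + (s - 1) * t)


def sample_timestep(rng, mean=0.0, std=1.0, s=3.0):
    """Logit-normal sample followed by the Flux shift.

    mean/std are assumed defaults, not values taken from the training script.
    """
    u = rng.gauss(mean, std)
    t = 1.0 / (1.0 + math.exp(-u))  # sigmoid maps a normal sample into (0, 1)
    return flux_shift(t, s)


def rectified_flow_pair(noise, data, t):
    """Return (x_t, target_velocity) on the straight path noise -> data.

    Convention assumed here: t=0 is pure noise, t=1 is data.
    """
    x_t = (1.0 - t) * noise + t * data
    velocity = data - noise
    return x_t, velocity


rng = random.Random(0)
t = sample_timestep(rng)
x_t, v_target = rectified_flow_pair(noise=rng.gauss(0, 1), data=0.5, t=t)
# A model prediction would be regressed onto v_target with an MSE loss.
loss = (0.0 - v_target) ** 2  # loss for a dummy prediction of 0.0
print(f"t={t:.3f} x_t={x_t:.3f} v_target={v_target:.3f} loss={loss:.3f}")
```

Because the target velocity is `data - noise`, integrating it from t=0 to t=1 moves a noise sample onto the data sample, which matches the sign convention of the Euler update in the Sampling section.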

	### Checkpoints

	Checkpoints are saved roughly once per epoch, with both main and EMA weights:
	- `checkpoints/step_XXXXX.safetensors` - Training weights
	- `checkpoints/step_XXXXX_ema.safetensors` - EMA weights (currently broken and being retrained; use the standard step weights for inference)
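
For reference, an exponential moving average with the decay listed under Training Details is typically maintained as follows. This is a generic sketch over a plain dict of floats. `ema_update` is a hypothetical helper and does not reflect how the training script stores its weights.

```python
def ema_update(ema, params, decay=0.9999):
    """In-place EMA update: ema <- decay * ema + (1 - decay) * params."""
    for name, value in params.items():
        ema[name] = decay * ema[name] + (1.0 - decay) * value
    return ema


params = {"w": 1.0}  # stand-in for the current training weights
ema = {"w": 0.0}     # stand-in for the averaged weights
for _ in range(10_000):
    ema_update(ema, params)
print(round(ema["w"], 3))
```

With a decay of 0.9999 the average has a time constant of about 10,000 updates, so EMA weights lag the training weights noticeably early in a run.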

	## Usage

	### Dependencies

	```bash
	pip install torch transformers diffusers safetensors huggingface_hub
	```

	### Basic Inference

	```python
	import torch
	from huggingface_hub import hf_hub_download
	from safetensors.torch import load_file

	# TinyFluxDeep lives in tinyflux_deep.py (download it from this repo first);
	# this assumes TinyFluxDeepConfig is defined in the same file.
	from tinyflux_deep import TinyFluxDeep, TinyFluxDeepConfig

	config = TinyFluxDeepConfig()
	model = TinyFluxDeep(config).to("cuda", torch.bfloat16)

	# Load the main (non-EMA) weights; the EMA checkpoints are currently broken.
	weights = load_file(hf_hub_download(
	    "AbstractPhil/tiny-flux-deep",
	    "checkpoints/step_286250.safetensors",
	))
	# strict=False tolerates missing or unexpected keys in the checkpoint
	model.load_state_dict(weights, strict=False)
	model.eval()
	```

	### Sampling

	Lailah uses Euler discrete sampling with Flux timestep shift:

	```python
	import torch

	def flux_shift(t, s=3.0):
	    """Bias timesteps toward data (higher t)."""
	    return s * t / (1 + (s - 1) * t)

	num_steps = 30  # 20-50 steps recommended
	timesteps = flux_shift(torch.linspace(0, 1, num_steps + 1))

	# x starts as noise at t=0 and is integrated toward data at t=1
	for i in range(num_steps):
	    t_curr, t_next = timesteps[i], timesteps[i + 1]
	    dt = t_next - t_curr

	    v = model(hidden_states=x, encoder_hidden_states=t5_out, ...)
	    x = x + v * dt  # Euler step
	```
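
To see the schedule and the Euler update in isolation, here is a self-contained toy that runs without the model. It assumes a constant velocity field `v = data - noise`, which is what an ideal rectified-flow model would predict on a straight path. `toy_velocity`, `shifted_timesteps`, `euler_sample`, and the scalar state are stand-ins, not part of the Lailah code.

```python
def flux_shift(t, s=3.0):
    """Same shift as above, applied to a plain float."""
    return s * t / (1 + (s - 1) * t)


def shifted_timesteps(num_steps, s=3.0):
    """num_steps + 1 shifted timesteps running from 0 (noise) to 1 (data)."""
    return [flux_shift(i / num_steps, s) for i in range(num_steps + 1)]


def euler_sample(x, velocity_fn, timesteps):
    """Integrate dx/dt = velocity_fn(x, t) with explicit Euler steps."""
    for t_curr, t_next in zip(timesteps[:-1], timesteps[1:]):
        x = x + velocity_fn(x, t_curr) * (t_next - t_curr)
    return x


noise, data = -1.0, 2.0


def toy_velocity(x, t):
    # Ideal straight-path velocity; a real model would predict this from x and t.
    return data - noise


ts = shifted_timesteps(20)
result = euler_sample(noise, toy_velocity, ts)
print([round(t, 3) for t in ts[:4]])
print(round(result, 6))
```

Since the shifted timesteps still start at 0 and end at 1, the step sizes sum to 1 and the toy sample lands on `data`. With `s=3.0` the steps are largest near the noise end and smallest near the data end, so more of the step budget is spent refining detail.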

	### Configuration

	```python
	from dataclasses import dataclass
	from typing import Tuple

	@dataclass
	class TinyFluxDeepConfig:
	    hidden_size: int = 512
	    num_attention_heads: int = 4
	    attention_head_dim: int = 128
	    in_channels: int = 16
	    joint_attention_dim: int = 768
	    pooled_projection_dim: int = 768
	    num_double_layers: int = 15
	    num_single_layers: int = 25
	    mlp_ratio: float = 4.0
	    axes_dims_rope: Tuple[int, int, int] = (16, 56, 56)
	    guidance_embeds: bool = True
	```
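
The defaults above are mutually consistent: 4 heads × 128 dims equals the hidden size, and the three RoPE axis sizes sum to the head dimension, as they do in Flux. A hedged sketch of a sanity check, using a hypothetical `validate_config` helper over a stand-in dataclass (whether the real `TinyFluxDeepConfig` performs such checks is not stated):

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ConfigSketch:
    """Stand-in mirroring a subset of the fields shown above."""
    hidden_size: int = 512
    num_attention_heads: int = 4
    attention_head_dim: int = 128
    axes_dims_rope: Tuple[int, int, int] = (16, 56, 56)


def validate_config(cfg):
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    if cfg.num_attention_heads * cfg.attention_head_dim != cfg.hidden_size:
        problems.append("heads * head_dim != hidden_size")
    if sum(cfg.axes_dims_rope) != cfg.attention_head_dim:
        problems.append("sum(axes_dims_rope) != attention_head_dim")
    return problems


print(validate_config(ConfigSketch()))                 # []
print(validate_config(ConfigSketch(hidden_size=768)))  # one problem reported
```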

	## Files

	```
	AbstractPhil/tiny-flux-deep/
	├── model.safetensors                  # Latest best weights
	├── tinyflux_deep.py                   # Model architecture
	├── colab_inference_lailah_early.py    # Ready-to-run Colab inference
	├── inference_tinyflux_deep.py         # Standalone inference script
	├── train_tinyflux_deep.py             # Training script
	├── checkpoints/
	│   ├── step_286250.safetensors        # Training weights
	│   └── step_286250_ema.safetensors    # EMA weights (currently broken)
	├── samples/                           # Generated samples during training
	└── README.md
	```

	## Origin: Porting from TinyFlux

	Lailah was initialized by porting TinyFlux weights:

	1. Attention head expansion (2 → 4): the original heads are copied to positions 0-1, and the new heads 2-3 are Xavier-initialized
	2. Hidden dimension expansion (256 → 512): weights are tiled and scaled
	3. Layer distribution: the original 3 double-stream and 3 single-stream layers are distributed across the 15 and 25 positions as initialization anchors

	The initial port used selective freezing of anchor layers, but current training leaves all parameters unfrozen.
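
The exact porting code is not reproduced here, so the following only illustrates two of the ideas under stated assumptions: anchors are spread evenly across the deeper stack, and a weight matrix is tiled 2×2 and scaled by 0.5 so that a duplicated input reproduces the original output in each half. The spacing rule, the scale factor, and the helper names are all assumptions, not the author's method.

```python
def anchor_positions(num_original, num_target):
    """Evenly spaced target indices for the original layers (assumed rule)."""
    if num_original == 1:
        return [0]
    step = (num_target - 1) / (num_original - 1)
    return [round(i * step) for i in range(num_original)]


def tile_and_scale(weight, scale=0.5):
    """Tile a square matrix 2x2 and scale it (nested lists, no dependencies)."""
    doubled_rows = [row + row for row in weight]
    tiled = doubled_rows + doubled_rows
    return [[scale * v for v in row] for row in tiled]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


print(anchor_positions(3, 15))  # [0, 7, 14]
print(anchor_positions(3, 25))  # [0, 12, 24]

w = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, -1.0]
print(matvec(w, x))                      # [-1.0, -1.0]
print(matvec(tile_and_scale(w), x + x))  # [-1.0, -1.0, -1.0, -1.0]
```

The 0.5 factor compensates for each output now summing over two copies of the input. A different scale would change the activation magnitude at initialization.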

	## Comparison

	| Aspect | TinyFlux | Lailah | Full Flux |
	|--------|----------|--------|-----------|
	| Parameters | 10.7M | 241.8M | 12B |
	| Memory (bf16) | ~22MB | ~484MB | ~24GB |
	| Quality | Limited | Moderate | High |
	| Speed (A100) | ~10ms | ~40ms | ~200ms |
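
The memory row is the parameter count times two bytes, since bfloat16 stores each weight in 16 bits. A quick check of that arithmetic (weights only; activations and the text encoders are not included, and `bf16_megabytes` is a hypothetical helper):

```python
def bf16_megabytes(num_params):
    """Weight storage in MB (10**6 bytes) at 2 bytes per parameter."""
    return num_params * 2 / 1e6


for name, params in [("TinyFlux", 10.7e6), ("Lailah", 241.8e6), ("Flux", 12e9)]:
    print(f"{name}: {bf16_megabytes(params):,.1f} MB")
```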

	## Limitations

	- Resolution: 512×512 only (64×64 latent)
	- Early training: Quality improving but not production-ready
	- Text capacity: Limited by flan-t5-base (768 dim vs Flux's 4096)
	- Experimental: Research model, expect artifacts

	## Intended Use

	- Rapid prototyping and iteration
	- Studying flow matching at moderate scale
	- Architecture experiments
	- Educational purposes
	- Baseline comparisons

	## Name

	Lailah (לילה) - Angel of the night in Jewish tradition, said to guard souls. Chosen for this model's role as a smaller guardian exploring the same space as larger models.

	## Citation

	```bibtex
	@misc{tinyfluxlailah2026,
	title={TinyFlux-Lailah: Compact Flow Matching for Text-to-Image},
	author={AbstractPhil},
	year={2026},
	url={https://huggingface.co/AbstractPhil/tiny-flux-deep}
	}
	```

	## Related

	- [AbstractPhil/tiny-flux](https://huggingface.co/AbstractPhil/tiny-flux) - Base model (10.7M)
	- [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents) - Training data
	- [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) - Teacher model

	## License

	MIT License

	---

	Status: Active training. Checkpoints are updated regularly. Use the standard (non-EMA) weights for best results.