AbstractPhil committed on
Commit 24bf751 · verified · 1 Parent(s): 14c3b94

Update README.md

Files changed (1)
  1. README.md +211 -3
README.md CHANGED
@@ -1,3 +1,211 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ language:
+ - en
+ tags:
+ - diffusion
+ - flow-matching
+ - flux
+ - text-to-image
+ - image-generation
+ - tiny
+ - experimental
+ library_name: pytorch
+ pipeline_tag: text-to-image
+ base_model:
+ - black-forest-labs/FLUX.1-schnell
+ datasets:
+ - AbstractPhil/flux-schnell-teacher-latents
+ ---
+
+ # TinyFlux
+
+ A **1/12-scale** Flux architecture for experimentation and research (hidden size 3072 → 256). TinyFlux keeps the core MMDiT (Multimodal Diffusion Transformer) design of Flux while cutting the parameter count from ~12B to ~8M for faster iteration and lower resource requirements.
+
+ ## Model Description
+
+ TinyFlux is a miniaturized version of [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) that preserves the essential architectural components:
+
+ - **Double-stream blocks** (MMDiT style) - separate text/image pathways with joint attention
+ - **Single-stream blocks** - concatenated text+image with shared weights
+ - **AdaLN-Zero modulation** - adaptive layer norm with gating
+ - **3D RoPE** - rotary position embeddings for temporal + spatial positions
+ - **Flow matching** - rectified flow training objective
+
+ ### Architecture Comparison
+
+ | Component | Flux | TinyFlux | Scale |
+ |-----------|------|----------|-------|
+ | Hidden size | 3072 | 256 | /12 |
+ | Attention heads | 24 | 2 | /12 |
+ | Head dimension | 128 | 128 | preserved |
+ | Double-stream layers | 19 | 3 | ~/6 |
+ | Single-stream layers | 38 | 3 | ~/13 |
+ | VAE channels | 16 | 16 | preserved |
+ | **Total params** | ~12B | ~8M | ~/1500 |
+
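The scale column can be sanity-checked with a few lines of arithmetic. The sketch below only restates numbers from the table above; the dictionary keys and the helper name `hidden_matches_heads` are made up for illustration, and the parameter counts are the table's approximate figures.

```python
# Values copied from the Architecture Comparison table (parameter counts are approximate).
flux = {"hidden": 3072, "heads": 24, "head_dim": 128, "double": 19, "single": 38, "params": 12e9}
tiny = {"hidden": 256, "heads": 2, "head_dim": 128, "double": 3, "single": 3, "params": 8e6}


def hidden_matches_heads(cfg):
    """Hidden size should equal attention heads times head dimension."""
    return cfg["heads"] * cfg["head_dim"] == cfg["hidden"]


# Flux-to-TinyFlux ratio for every row of the table.
ratios = {key: flux[key] / tiny[key] for key in flux}

for key, value in ratios.items():
    print(f"{key:>8}: /{value:.2f}")
```

Keeping the head dimension at 128 means the hidden size shrinks only through the head count (24 to 2), which is where the /12 factor comes from; the layer counts do not divide evenly, so their ratios are approximate.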
+ ### Text Encoders
+
+ TinyFlux uses smaller text encoders than standard Flux:
+
+ | Role | Flux | TinyFlux |
+ |------|------|----------|
+ | Sequence encoder | T5-XXL (4096 dim) | flan-t5-base (768 dim) |
+ | Pooled encoder | CLIP-L (768 dim) | CLIP-L (768 dim) |
+
+ ## Training
+
+ ### Dataset
+
+ Trained on [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents):
+ - 10,000 samples
+ - Pre-computed VAE latents (16, 64, 64) from 512×512 images
+ - Diverse prompts covering people, objects, scenes, styles
+
+ ### Training Details
+
+ - **Objective**: Flow matching (rectified flow)
+ - **Timestep sampling**: Logit-normal with Flux shift (s=3.0)
+ - **Loss weighting**: Min-SNR-γ (γ=5.0)
+ - **Optimizer**: AdamW (lr=1e-4, β=(0.9, 0.99), wd=0.01)
+ - **Schedule**: Cosine with warmup
+ - **Precision**: bfloat16
+
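As a rough illustration of the "cosine with warmup" schedule listed above, here is a minimal sketch in plain Python. Only the base learning rate of 1e-4 comes from the README; the linear warmup shape, `warmup_steps`, `total_steps`, and `min_lr` are assumptions chosen for the example, and the actual training script may differ.

```python
import math


def lr_at(step, base_lr=1e-4, warmup_steps=500, total_steps=10_000, min_lr=0.0):
    """Learning rate at a given optimizer step (hypothetical hyperparameters)."""
    if step < warmup_steps:
        # Linear warmup: ramp from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Half-cosine decay from base_lr down to min_lr.
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine


for step in (0, 499, 500, 5250, 10_000):
    print(step, lr_at(step))
```

In a PyTorch training loop the same function could be wrapped in `torch.optim.lr_scheduler.LambdaLR` by returning `lr_at(step) / base_lr` as the multiplier.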
+ ### Flow Matching Formulation
+
+ ```
+ Interpolation: x_t = (1 - t) * noise + t * data
+ Target velocity: v = data - noise
+ Loss: MSE(predicted_v, target_v) * min_snr_weight(t)
+ ```
+
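The formulation above can be exercised on toy vectors without a trained model. The following sketch uses plain Python lists in place of tensors; the function names are illustrative, the "oracle" velocity stands in for the network's prediction, and the Min-SNR weight is left out because the README does not spell out its formula.

```python
import random


def interpolate(noise, data, t):
    """x_t = (1 - t) * noise + t * data, so t=0 is pure noise and t=1 is data."""
    return [(1.0 - t) * n + t * d for n, d in zip(noise, data)]


def target_velocity(noise, data):
    """v = data - noise, constant along the straight path from noise to data."""
    return [d - n for n, d in zip(noise, data)]


def mse(pred, target):
    """Unweighted mean squared error between two velocity vectors."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


def euler_sample(noise, velocity_fn, num_steps=20):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    x = list(noise)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + vi * dt for xi, vi in zip(x, v)]
    return x


random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(8)]
noise = [random.gauss(0.0, 1.0) for _ in range(8)]

v_true = target_velocity(noise, data)
# With the exact velocity, Euler integration lands on the data sample.
recovered = euler_sample(noise, lambda x, t: v_true)
max_err = max(abs(r - d) for r, d in zip(recovered, data))
print("max reconstruction error:", max_err)
```

Because the target velocity is constant in t, a perfect model would need only one Euler step; a real model's prediction varies with x_t and t, which is why the sampler takes many small steps.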
+ ## Usage
+
+ ### Installation
+
+ ```bash
+ pip install torch transformers diffusers safetensors huggingface_hub
+ ```
+
+ ### Inference
+
+ ```python
+ import torch
+ from huggingface_hub import hf_hub_download
+ from safetensors.torch import load_file
+ from transformers import T5EncoderModel, T5Tokenizer, CLIPTextModel, CLIPTokenizer
+ from diffusers import AutoencoderKL
+
+ torch.set_grad_enabled(False)  # inference only, no autograd graph needed
+
+ # Load model. TinyFlux and TinyFluxConfig must be defined first
+ # (copy the class definitions, e.g. from model.py in this repo).
+ config = TinyFluxConfig()
+ model = TinyFlux(config).to("cuda").to(torch.bfloat16)
+
+ weights = load_file(hf_hub_download("AbstractPhil/tiny-flux", "model.safetensors"))
+ model.load_state_dict(weights)
+ model.eval()
+
+ # Load encoders
+ t5_tok = T5Tokenizer.from_pretrained("google/flan-t5-base")
+ t5_enc = T5EncoderModel.from_pretrained("google/flan-t5-base", torch_dtype=torch.bfloat16).to("cuda")
+ clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
+ clip_enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=torch.bfloat16).to("cuda")
+ vae = AutoencoderKL.from_pretrained("black-forest-labs/FLUX.1-schnell", subfolder="vae", torch_dtype=torch.bfloat16).to("cuda")
+
+ # Encode prompt: T5 gives the per-token sequence, CLIP gives the pooled vector
+ prompt = "a photo of a cat"
+ t5_in = t5_tok(prompt, max_length=128, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
+ t5_out = t5_enc(**t5_in).last_hidden_state
+ clip_in = clip_tok(prompt, max_length=77, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
+ clip_out = clip_enc(**clip_in).pooler_output
+
+ # Euler sampling (t: 0→1, noise→data)
+ x = torch.randn(1, 64*64, 16, device="cuda", dtype=torch.bfloat16)  # 64x64 latent tokens, 16 channels
+ img_ids = TinyFlux.create_img_ids(1, 64, 64, "cuda")
+ timesteps = torch.linspace(0, 1, 21, device="cuda")  # 21 points give 20 steps
+ guidance = torch.tensor([3.5], device="cuda", dtype=torch.bfloat16)  # constant across steps
+
+ for i in range(20):
+     t = timesteps[i].unsqueeze(0)
+     dt = timesteps[i+1] - timesteps[i]
+
+     v = model(
+         hidden_states=x,
+         encoder_hidden_states=t5_out,
+         pooled_projections=clip_out,
+         timestep=t,
+         img_ids=img_ids,
+         guidance=guidance,
+     )
+     x = x + v * dt
+
+ # Decode: unpack tokens to (B, C, H, W), then run the VAE decoder
+ latents = x.reshape(1, 64, 64, 16).permute(0, 3, 1, 2)
+ latents = latents / vae.config.scaling_factor
+ image = vae.decode(latents.to(vae.dtype)).sample  # match the VAE's bfloat16 weights
+ image = (image / 2 + 0.5).clamp(0, 1)  # map from [-1, 1] to [0, 1]
+ ```
+
+ ### Full Inference Script
+
+ See [inference_colab.py](https://huggingface.co/AbstractPhil/tiny-flux/blob/main/inference_colab.py) for a complete generation pipeline with:
+ - Classifier-free guidance
+ - Batch generation
+ - Image saving
+
+ ## Files
+
+ ```
+ AbstractPhil/tiny-flux/
+ ├── model.safetensors      # Model weights (~32MB)
+ ├── config.json            # Model configuration
+ ├── README.md              # This file
+ ├── model.py               # Model architecture definition
+ ├── inference_colab.py     # Inference script
+ ├── train_colab.py         # Training script
+ ├── checkpoints/           # Training checkpoints
+ │   └── step_*.safetensors
+ ├── logs/                  # TensorBoard logs
+ └── samples/               # Generated samples during training
+ ```
+
+ ## Limitations
+
+ - **Resolution**: Trained on 512×512 only
+ - **Quality**: Significantly lower than full Flux due to reduced capacity
+ - **Text understanding**: Limited by smaller T5 encoder (768 vs 4096 dim)
+ - **Fine details**: May struggle with complex scenes or fine-grained details
+ - **Experimental**: Intended for research and learning, not production use
+
+ ## Intended Use
+
+ - Understanding Flux/MMDiT architecture
+ - Rapid prototyping and experimentation
+ - Educational purposes
+ - Resource-constrained environments
+ - Baseline for architecture modifications
+
+ ## Citation
+
+ If you use TinyFlux in your research, please cite:
+
+ ```bibtex
+ @misc{tinyflux2025,
+   title={TinyFlux: A Miniaturized Flux Architecture for Experimentation},
+   author={AbstractPhil},
+   year={2025},
+   url={https://huggingface.co/AbstractPhil/tiny-flux}
+ }
+ ```
+
+ ## Acknowledgments
+
+ - [Black Forest Labs](https://blackforestlabs.ai/) for the original Flux architecture
+ - [Hugging Face](https://huggingface.co/) for the diffusers and transformers libraries
+
+ ## License
+
+ MIT License - See LICENSE file for details.
+
+ ---
+
+ **Note**: This is an experimental research model. For high-quality image generation, use the full [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) or [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) models.