AbstractPhil committed on
Commit f6ce539 · verified · 1 Parent(s): beb7553

Update README.md

Files changed (1)
  1. README.md +64 -135
README.md CHANGED
@@ -8,86 +8,52 @@ tags:
8
  - flux
9
  - text-to-image
10
  - image-generation
11
- - deep
12
  - experimental
13
  library_name: pytorch
14
  pipeline_tag: text-to-image
15
  base_model:
16
- - AbstractPhil/tiny-flux
17
  - black-forest-labs/FLUX.1-schnell
18
  datasets:
19
  - AbstractPhil/flux-schnell-teacher-latents
20
  ---
21
 
22
- # TinyFlux-Deep
23
 
24
- An **expanded** TinyFlux architecture that increases depth and width while preserving learned representations. TinyFlux-Deep is ported from [TinyFlux](https://huggingface.co/AbstractPhil/tiny-flux) with strategic layer expansion and attention head doubling.
25
 
26
  ## Model Description
27
 
28
- TinyFlux-Deep extends the base TinyFlux model by:
29
- - **Doubling attention heads** (2 → 4) with expanded hidden dimension (256 → 512)
30
- - **5× more double-stream layers** (3 → 15)
31
- - **8× more single-stream layers** (3 → 25)
32
- - **Preserving learned weights** from TinyFlux in frozen anchor positions
 
 
33
 
34
  ### Architecture Comparison
35
 
36
- | Component | TinyFlux | TinyFlux-Deep | Flux |
37
- |-----------|----------|---------------|------|
38
- | Hidden size | 256 | **512** | 3072 |
39
- | Attention heads | 2 | **4** | 24 |
40
- | Head dimension | 128 | 128 | 128 |
41
- | Double-stream layers | 3 | **15** | 19 |
42
- | Single-stream layers | 3 | **25** | 38 |
43
- | VAE channels | 16 | 16 | 16 |
44
- | **Total params** | ~8M | **~85M** | ~12B |
45
-
46
- ### Layer Mapping (Ported from TinyFlux)
47
-
48
- The original TinyFlux weights are strategically distributed and frozen:
49
-
50
- **Single blocks (3 → 25):**
51
- | TinyFlux Layer | TinyFlux-Deep Position | Status |
52
- |----------------|------------------------|--------|
53
- | 0 | 0 | Frozen |
54
- | 1 | 8, 12, 16 | Frozen (3 copies) |
55
- | 2 | 24 | Frozen |
56
- | - | 1-7, 9-11, 13-15, 17-23 | Trainable |
57
-
58
- **Double blocks (3 → 15):**
59
- | TinyFlux Layer | TinyFlux-Deep Position | Status |
60
- |----------------|------------------------|--------|
61
- | 0 | 0 | Frozen |
62
- | 1 | 4, 7, 10 | Frozen (3 copies) |
63
- | 2 | 14 | Frozen |
64
- | - | 1-3, 5-6, 8-9, 11-13 | Trainable |
65
-
66
- **Trainable ratio:** ~70% of parameters
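
The layer mapping tables can be checked with a few lines of plain Python. This is only a sketch: the anchor positions are copied from the tables, while `split_layers` and the dictionary names are hypothetical helpers, not code from `port_to_deep.py`.

```python
# Anchor positions copied from the Layer Mapping tables:
# source TinyFlux layer index -> positions in TinyFlux-Deep.
SINGLE_ANCHORS = {0: [0], 1: [8, 12, 16], 2: [24]}
DOUBLE_ANCHORS = {0: [0], 1: [4, 7, 10], 2: [14]}


def split_layers(anchors, num_layers):
    """Return (frozen, trainable) position lists for one block type."""
    frozen = sorted(p for positions in anchors.values() for p in positions)
    trainable = [i for i in range(num_layers) if i not in frozen]
    return frozen, trainable


single_frozen, single_trainable = split_layers(SINGLE_ANCHORS, 25)
double_frozen, double_trainable = split_layers(DOUBLE_ANCHORS, 15)

print(len(single_frozen), len(single_trainable))
print(len(double_frozen), len(double_trainable))
```

Note that these are layer counts. The "~70% of parameters" figure in the README is a parameter ratio and is not derived here.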
67
-
68
- ### Attention Head Expansion
69
-
70
- Original 2 heads are copied to new positions, with 2 new heads randomly initialized:
71
- - Old head 0 → New head 0
72
- - Old head 1 → New head 1
73
- - Heads 2-3 → Xavier initialized (scaled 0.02×)
74
 
75
  ### Text Encoders
76
 
77
- Same as TinyFlux:
78
- | Role | Model |
79
- |------|-------|
80
- | Sequence encoder | flan-t5-base (768 dim) |
81
- | Pooled encoder | CLIP-L (768 dim) |
82
-
83
- ## Training
84
 
85
- ### Strategy
 
 
 
86
 
87
- 1. **Port** TinyFlux weights with dimension expansion
88
- 2. **Freeze** ported layers as "anchor" knowledge
89
- 3. **Train** new layers to interpolate between anchors
90
- 4. **Optional:** Unfreeze all and fine-tune at lower LR
91
 
92
  ### Dataset
93
 
@@ -101,10 +67,17 @@ Trained on [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/da
101
  - **Objective**: Flow matching (rectified flow)
102
  - **Timestep sampling**: Logit-normal with Flux shift (s=3.0)
103
  - **Loss weighting**: Min-SNR-γ (γ=5.0)
104
- - **Optimizer**: AdamW (lr=5e-5, β=(0.9, 0.99), wd=0.01)
105
  - **Schedule**: Cosine with warmup
106
  - **Precision**: bfloat16
107
- - **Batch size**: 32 (16 × 2 gradient accumulation)
108
 
109
  ## Usage
110
 
@@ -123,12 +96,12 @@ from safetensors.torch import load_file
123
  from transformers import T5EncoderModel, T5Tokenizer, CLIPTextModel, CLIPTokenizer
124
  from diffusers import AutoencoderKL
125
 
126
- # Load model (copy TinyFlux class definition first, use TinyFluxDeepConfig)
127
- config = TinyFluxDeepConfig()
128
  model = TinyFlux(config).to("cuda").to(torch.bfloat16)
129
 
130
- weights = load_file(hf_hub_download("AbstractPhil/tiny-flux-deep", "model.safetensors"))
131
- model.load_state_dict(weights, strict=False) # strict=False for precomputed buffers
132
  model.eval()
133
 
134
  # Load encoders
@@ -139,21 +112,16 @@ clip_enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14", torch_
139
  vae = AutoencoderKL.from_pretrained("black-forest-labs/FLUX.1-schnell", subfolder="vae", torch_dtype=torch.bfloat16).to("cuda")
140
 
141
  # Encode prompt
142
- prompt = "a photo of a cat sitting on a windowsill"
143
  t5_in = t5_tok(prompt, max_length=128, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
144
  t5_out = t5_enc(**t5_in).last_hidden_state
145
  clip_in = clip_tok(prompt, max_length=77, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
146
  clip_out = clip_enc(**clip_in).pooler_output
147
 
148
- # Euler sampling with Flux shift
149
- def flux_shift(t, s=3.0):
150
-     return s * t / (1 + (s - 1) * t)
151
-
152
  x = torch.randn(1, 64*64, 16, device="cuda", dtype=torch.bfloat16)
153
  img_ids = TinyFlux.create_img_ids(1, 64, 64, "cuda")
154
-
155
- t_linear = torch.linspace(0, 1, 21, device="cuda")
156
- timesteps = flux_shift(t_linear)
157
 
158
  for i in range(20):
159
      t = timesteps[i].unsqueeze(0)
@@ -177,97 +145,58 @@ image = vae.decode(latents.float()).sample
177
  image = (image / 2 + 0.5).clamp(0, 1)
178
  ```
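
The `flux_shift` helper on the removed lines warps a uniform timestep grid before Euler sampling. Below is a minimal standalone sketch in plain Python (no torch), using the same formula and `s=3.0`. The 21-point list mirrors `torch.linspace(0, 1, 21)` from the snippet and is an illustration only.

```python
def flux_shift(t, s=3.0):
    """Shift formula from the README: s * t / (1 + (s - 1) * t)."""
    return s * t / (1 + (s - 1) * t)


# Uniform grid of 21 points on [0, 1], mirroring torch.linspace(0, 1, 21).
t_linear = [i / 20 for i in range(21)]
timesteps = [flux_shift(t) for t in t_linear]

# The endpoints 0 and 1 are preserved; interior points move toward 1.
print(timesteps[0], timesteps[10], timesteps[20])
```

With `s > 1` the early steps become larger and the late steps smaller, so more of the 20 steps are spent near `t = 1`, the data end of the trajectory.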
179
 
180
- ### Configuration
181
 
182
- ```python
183
- @dataclass
184
- class TinyFluxDeepConfig:
185
-     hidden_size: int = 512
186
-     num_attention_heads: int = 4
187
-     attention_head_dim: int = 128
188
-     in_channels: int = 16
189
-     joint_attention_dim: int = 768
190
-     pooled_projection_dim: int = 768
191
-     num_double_layers: int = 15
192
-     num_single_layers: int = 25
193
-     mlp_ratio: float = 4.0
194
-     axes_dims_rope: Tuple[int, int, int] = (16, 56, 56)
195
-     guidance_embeds: bool = True
196
- ```
197
 
198
  ## Files
199
 
200
  ```
201
- AbstractPhil/tiny-flux-deep/
202
- ├── model.safetensors # Model weights (~340MB)
203
  ├── config.json # Model configuration
204
- ├── frozen_params.json # List of frozen parameter names
205
  ├── README.md # This file
206
- ├── model.py # Model architecture (includes TinyFluxDeepConfig)
207
  ├── inference_colab.py # Inference script
208
- ├── train_deep_colab.py # Training script with layer freezing
209
- ├── port_to_deep.py # Porting script from TinyFlux
210
  ├── checkpoints/ # Training checkpoints
211
  │   └── step_*.safetensors
212
  ├── logs/ # Tensorboard logs
213
  └── samples/ # Generated samples during training
214
  ```
215
 
216
- ## Porting from TinyFlux
217
-
218
- To create a new TinyFlux-Deep from scratch:
219
-
220
- ```python
221
- # Run port_to_deep.py
222
- # 1. Downloads AbstractPhil/tiny-flux weights
223
- # 2. Creates TinyFlux-Deep model (512 hidden, 4 heads, 25 single, 15 double)
224
- # 3. Expands attention heads (2→4) and hidden dimension (256→512)
225
- # 4. Distributes layers to anchor positions
226
- # 5. Saves to AbstractPhil/tiny-flux-deep
227
- ```
228
-
229
- ## Comparison with TinyFlux
230
-
231
- | Aspect | TinyFlux | TinyFlux-Deep |
232
- |--------|----------|---------------|
233
- | Parameters | ~8M | ~85M |
234
- | Memory (bf16) | ~16MB | ~170MB |
235
- | Forward pass | ~15ms | ~60ms |
236
- | Capacity | Limited | Moderate |
237
- | Training | From scratch | Ported + fine-tuned |
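
The memory row follows from parameter count times bytes per parameter. A small sketch of that arithmetic, assuming the approximate parameter counts from the table, 2 bytes per bf16 value and 4 bytes per fp32 value; `weight_megabytes` is a hypothetical helper name:

```python
def weight_megabytes(num_params, bytes_per_param):
    """Approximate weight storage in MB (1 MB = 1e6 bytes)."""
    return num_params * bytes_per_param / 1e6


TINYFLUX_PARAMS = 8_000_000        # ~8M, from the table
TINYFLUX_DEEP_PARAMS = 85_000_000  # ~85M, from the table

print(weight_megabytes(TINYFLUX_PARAMS, 2))       # bf16
print(weight_megabytes(TINYFLUX_DEEP_PARAMS, 2))  # bf16
print(weight_megabytes(TINYFLUX_DEEP_PARAMS, 4))  # fp32
```

The fp32 figure lines up with the ~340MB listed for `model.safetensors` in the Files section. These numbers cover weights only, not activations or optimizer state.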
238
-
239
  ## Limitations
240
 
241
  - **Resolution**: Trained on 512×512 only
242
- - **Quality**: Better than TinyFlux, still below full Flux
243
  - **Text understanding**: Limited by smaller T5 encoder (768 vs 4096 dim)
244
- - **Early training**: Model is actively being trained
245
- - **Experimental**: Intended for research, not production
246
 
247
  ## Intended Use
248
 
249
- - Studying model scaling and expansion techniques
250
- - Testing layer freezing and knowledge transfer
251
- - Rapid prototyping with moderate capacity
252
  - Educational purposes
253
- - Baseline for architecture experiments
 
254
 
255
  ## Citation
256
 
 
 
257
  ```bibtex
258
- @misc{tinyfluxdeep2026,
259
- title={TinyFlux-Deep: Expanded Flux Architecture with Knowledge Preservation},
260
  author={AbstractPhil},
261
- year={2026},
262
- url={https://huggingface.co/AbstractPhil/tiny-flux-deep}
263
  }
264
  ```
265
 
266
- ## Related Models
267
-
268
- - [AbstractPhil/tiny-flux](https://huggingface.co/AbstractPhil/tiny-flux) - Base model (8M params)
269
- - [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) - Original Flux
270
-
271
  ## Acknowledgments
272
 
273
  - [Black Forest Labs](https://blackforestlabs.ai/) for the original Flux architecture
@@ -279,4 +208,4 @@ MIT License - See LICENSE file for details.
279
 
280
  ---
281
 
282
- **Note**: This is an experimental research model under active development. Training is ongoing and weights may be updated frequently.
 
8
  - flux
9
  - text-to-image
10
  - image-generation
11
+ - tiny
12
  - experimental
13
  library_name: pytorch
14
  pipeline_tag: text-to-image
15
  base_model:
 
16
  - black-forest-labs/FLUX.1-schnell
17
  datasets:
18
  - AbstractPhil/flux-schnell-teacher-latents
19
  ---
20
 
21
+ # TinyFlux
22
 
23
+ A **1/12-scale** Flux architecture for experimentation and research. TinyFlux maintains the core MMDiT (Multimodal Diffusion Transformer) design of Flux while dramatically reducing parameter count for faster iteration and lower resource requirements.
24
 
25
  ## Model Description
26
 
27
+ TinyFlux is a miniaturized version of [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) that preserves the essential architectural components:
28
+
29
+ - **Double-stream blocks** (MMDiT style) - separate text/image pathways with joint attention
30
+ - **Single-stream blocks** - concatenated text+image with shared weights
31
+ - **AdaLN-Zero modulation** - adaptive layer norm with gating
32
+ - **3D RoPE** - rotary position embeddings for temporal + spatial positions
33
+ - **Flow matching** - rectified flow training objective
34
 
35
  ### Architecture Comparison
36
 
37
+ | Component | Flux | TinyFlux | Scale |
38
+ |-----------|------|----------|-------|
39
+ | Hidden size | 3072 | 256 | /12 |
40
+ | Attention heads | 24 | 2 | /12 |
41
+ | Head dimension | 128 | 128 | preserved |
42
+ | Double-stream layers | 19 | 3 | /6 |
43
+ | Single-stream layers | 38 | 3 | /12 |
44
+ | VAE channels | 16 | 16 | preserved |
45
+ | **Total params** | ~12B | ~8M | /1500 |
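
The Scale column can be reproduced from the other two columns. A short sketch using only the numbers in the table; the dictionary names are illustrative, not identifiers from `model.py`:

```python
# Values copied from the Architecture Comparison table.
FLUX = {"hidden_size": 3072, "attention_heads": 24, "head_dim": 128,
        "double_layers": 19, "single_layers": 38}
TINY = {"hidden_size": 256, "attention_heads": 2, "head_dim": 128,
        "double_layers": 3, "single_layers": 3}

# Reduction factor per component (Flux value divided by TinyFlux value).
ratios = {name: FLUX[name] / TINY[name] for name in FLUX}

for name, ratio in ratios.items():
    print(f"{name}: /{ratio:.2f}")

# In both models the hidden size equals heads * head dimension.
assert FLUX["attention_heads"] * FLUX["head_dim"] == FLUX["hidden_size"]
assert TINY["attention_heads"] * TINY["head_dim"] == TINY["hidden_size"]
```

The width entries are exactly /12. The layer entries in the table are rounded: 19/3 is about 6.3 and 38/3 is about 12.7.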
46
 
47
  ### Text Encoders
48
 
49
+ TinyFlux uses smaller text encoders than standard Flux:
50
 
51
+ | Role | Flux | TinyFlux |
52
+ |------|------|----------|
53
+ | Sequence encoder | T5-XXL (4096 dim) | flan-t5-base (768 dim) |
54
+ | Pooled encoder | CLIP-L (768 dim) | CLIP-L (768 dim) |
55
 
56
+ ## Training
 
 
 
57
 
58
  ### Dataset
59
 
 
67
  - **Objective**: Flow matching (rectified flow)
68
  - **Timestep sampling**: Logit-normal with Flux shift (s=3.0)
69
  - **Loss weighting**: Min-SNR-γ (γ=5.0)
70
+ - **Optimizer**: AdamW (lr=1e-4, β=(0.9, 0.99), wd=0.01)
71
  - **Schedule**: Cosine with warmup
72
  - **Precision**: bfloat16
73
+
74
+ ### Flow Matching Formulation
75
+
76
+ ```
77
+ Interpolation: x_t = (1 - t) * noise + t * data
78
+ Target velocity: v = data - noise
79
+ Loss: MSE(predicted_v, target_v) * min_snr_weight(t)
80
+ ```
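
The three formulas can be written out in plain Python on short lists of floats. This is a sketch, not the code from `train_colab.py`: the function names are hypothetical, and the Min-SNR weight is passed in as a plain number because the README does not spell out how `min_snr_weight(t)` is computed.

```python
def interpolate(noise, data, t):
    """x_t = (1 - t) * noise + t * data, elementwise."""
    return [(1.0 - t) * n + t * d for n, d in zip(noise, data)]


def target_velocity(noise, data):
    """v = data - noise, elementwise."""
    return [d - n for n, d in zip(noise, data)]


def flow_matching_loss(predicted_v, target_v, weight=1.0):
    """Mean squared error between velocities, scaled by a per-sample weight."""
    mse = sum((p - q) ** 2 for p, q in zip(predicted_v, target_v)) / len(target_v)
    return weight * mse


noise = [0.5, -1.0, 2.0]
data = [1.5, 0.0, -2.0]

x_t = interpolate(noise, data, 0.25)
v = target_velocity(noise, data)
print(x_t, v, flow_matching_loss(v, v))
```

At `t = 0` the interpolant is pure noise and at `t = 1` it is the data, which matches the `t: 0→1, noise→data` direction used by the sampling loop.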
81
 
82
  ## Usage
83
 
 
96
  from transformers import T5EncoderModel, T5Tokenizer, CLIPTextModel, CLIPTokenizer
97
  from diffusers import AutoencoderKL
98
 
99
+ # Load model (copy TinyFlux class definition first)
100
+ config = TinyFluxConfig()
101
  model = TinyFlux(config).to("cuda").to(torch.bfloat16)
102
 
103
+ weights = load_file(hf_hub_download("AbstractPhil/tiny-flux", "model.safetensors"))
104
+ model.load_state_dict(weights)
105
  model.eval()
106
 
107
  # Load encoders
 
112
  vae = AutoencoderKL.from_pretrained("black-forest-labs/FLUX.1-schnell", subfolder="vae", torch_dtype=torch.bfloat16).to("cuda")
113
 
114
  # Encode prompt
115
+ prompt = "a photo of a cat"
116
  t5_in = t5_tok(prompt, max_length=128, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
117
  t5_out = t5_enc(**t5_in).last_hidden_state
118
  clip_in = clip_tok(prompt, max_length=77, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
119
  clip_out = clip_enc(**clip_in).pooler_output
120
 
121
+ # Euler sampling (t: 0→1, noise→data)
 
 
 
122
  x = torch.randn(1, 64*64, 16, device="cuda", dtype=torch.bfloat16)
123
  img_ids = TinyFlux.create_img_ids(1, 64, 64, "cuda")
124
+ timesteps = torch.linspace(0, 1, 21, device="cuda")
 
 
125
 
126
  for i in range(20):
127
      t = timesteps[i].unsqueeze(0)
 
145
  image = (image / 2 + 0.5).clamp(0, 1)
146
  ```
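
The body of the sampling loop is not visible in this diff, so the following is only a sketch of a generic Euler integrator over the same 21-point grid, in plain Python. The update `x = x + dt * v` and the toy constant velocity field are assumptions for illustration; in the real pipeline the velocity would come from the TinyFlux model.

```python
def euler_sample(velocity_fn, x, num_steps=20):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    timesteps = [i / num_steps for i in range(num_steps + 1)]
    for i in range(num_steps):
        t = timesteps[i]
        dt = timesteps[i + 1] - timesteps[i]
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy check: with the exact constant velocity v = data - noise,
# Euler integration carries the noise sample onto the data sample.
noise = [0.0, 1.0]
data = [1.0, -1.0]


def exact_velocity(x, t):
    return [d - n for n, d in zip(noise, data)]


result = euler_sample(exact_velocity, list(noise))
print(result)
```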
147
 
148
+ ### Full Inference Script
149
 
150
+ See [inference_colab.py](https://huggingface.co/AbstractPhil/tiny-flux/blob/main/inference_colab.py) for a complete generation pipeline with:
151
+ - Classifier-free guidance
152
+ - Batch generation
153
+ - Image saving
154
 
155
  ## Files
156
 
157
  ```
158
+ AbstractPhil/tiny-flux/
159
+ ├── model.safetensors # Model weights (~32MB)
160
  ├── config.json # Model configuration
161
  ├── README.md # This file
162
+ ├── model.py # Model architecture definition
163
  ├── inference_colab.py # Inference script
164
+ ├── train_colab.py # Training script
165
  ├── checkpoints/ # Training checkpoints
166
  │   └── step_*.safetensors
167
  ├── logs/ # Tensorboard logs
168
  └── samples/ # Generated samples during training
169
  ```
170
 
171
  ## Limitations
172
 
173
  - **Resolution**: Trained on 512×512 only
174
+ - **Quality**: Significantly lower than full Flux due to reduced capacity
175
  - **Text understanding**: Limited by smaller T5 encoder (768 vs 4096 dim)
176
+ - **Fine details**: May struggle with complex scenes or fine-grained details
177
+ - **Experimental**: Intended for research and learning, not production use
178
 
179
  ## Intended Use
180
 
181
+ - Understanding Flux/MMDiT architecture
182
+ - Rapid prototyping and experimentation
 
183
  - Educational purposes
184
+ - Resource-constrained environments
185
+ - Baseline for architecture modifications
186
 
187
  ## Citation
188
 
189
+ If you use TinyFlux in your research, please cite:
190
+
191
  ```bibtex
192
+ @misc{tinyflux2025,
193
+ title={TinyFlux: A Miniaturized Flux Architecture for Experimentation},
194
  author={AbstractPhil},
195
+ year={2025},
196
+ url={https://huggingface.co/AbstractPhil/tiny-flux}
197
  }
198
  ```
199
 
 
 
 
 
 
200
  ## Acknowledgments
201
 
202
  - [Black Forest Labs](https://blackforestlabs.ai/) for the original Flux architecture
 
208
 
209
  ---
210
 
211
+ **Note**: This is an experimental research model. For high-quality image generation, use the full [FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) or [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) models.