---
license: mit
language:
- en
tags:
- diffusion
- flow-matching
- flux
- text-to-image
- image-generation
- deep
- experimental
library_name: pytorch
pipeline_tag: text-to-image
base_model:
- AbstractPhil/tiny-flux
- black-forest-labs/FLUX.1-schnell
datasets:
- AbstractPhil/flux-schnell-teacher-latents
---

# TinyFlux-Deep

An **expanded** TinyFlux architecture that increases depth and width while preserving learned representations. TinyFlux-Deep is ported from [TinyFlux](https://huggingface.co/AbstractPhil/tiny-flux) with strategic layer expansion and attention head doubling.

## Model Description

TinyFlux-Deep extends the base TinyFlux model by:
- **Doubling attention heads** (2 → 4) with an expanded hidden dimension (256 → 512)
- **5× more double-stream layers** (3 → 15)
- **8× more single-stream layers** (3 → 25)
- **Preserving learned weights** from TinyFlux in frozen anchor positions

### Architecture Comparison

| Component | TinyFlux | TinyFlux-Deep | Flux |
|-----------|----------|---------------|------|
| Hidden size | 256 | **512** | 3072 |
| Attention heads | 2 | **4** | 24 |
| Head dimension | 128 | 128 | 128 |
| Double-stream layers | 3 | **15** | 19 |
| Single-stream layers | 3 | **25** | 38 |
| VAE channels | 16 | 16 | 16 |
| **Total params** | ~8M | **~85M** | ~12B |

### Layer Mapping (Ported from TinyFlux)

The original TinyFlux weights are strategically distributed across the deeper stack and frozen:

**Single blocks (3 → 25):**

| TinyFlux Layer | TinyFlux-Deep Position | Status |
|----------------|------------------------|--------|
| 0 | 0 | Frozen |
| 1 | 8, 12, 16 | Frozen (3 copies) |
| 2 | 24 | Frozen |
| – | 1-7, 9-11, 13-15, 17-23 | Trainable |

**Double blocks (3 → 15):**

| TinyFlux Layer | TinyFlux-Deep Position | Status |
|----------------|------------------------|--------|
| 0 | 0 | Frozen |
| 1 | 4, 7, 10 | Frozen (3 copies) |
| 2 | 14 | Frozen |
| – | 1-3, 5-6, 8-9, 11-13 | Trainable |

**Trainable ratio:** ~70% of parameters
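
The anchor layout above can be sketched in a few lines. This is an illustrative reconstruction, not the repo's actual porting code; the `freeze_anchors` helper and the toy blocks are hypothetical:

```python
import torch.nn as nn

# Anchor positions from the tables above: {source TinyFlux layer: target positions}.
SINGLE_ANCHORS = {0: [0], 1: [8, 12, 16], 2: [24]}
DOUBLE_ANCHORS = {0: [0], 1: [4, 7, 10], 2: [14]}

def freeze_anchors(blocks: nn.ModuleList, anchors: dict) -> list:
    """Freeze every block sitting at an anchor position; leave the rest trainable."""
    frozen = {pos for positions in anchors.values() for pos in positions}
    for idx, block in enumerate(blocks):
        for p in block.parameters():
            p.requires_grad = idx not in frozen
    return sorted(frozen)

# Toy stand-in blocks, just to show the bookkeeping.
single_blocks = nn.ModuleList(nn.Linear(4, 4) for _ in range(25))
frozen = freeze_anchors(single_blocks, SINGLE_ANCHORS)
trainable = sum(1 for b in single_blocks if next(b.parameters()).requires_grad)
# frozen == [0, 8, 12, 16, 24]; 20 of the 25 blocks remain trainable
```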

### Attention Head Expansion

The original 2 heads keep their positions, and 2 new heads are randomly initialized:
- Old head 0 → New head 0
- Old head 1 → New head 1
- Heads 2-3 → Xavier initialized (scaled 0.02×)
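
As a concrete sketch of that rule (the real port also widens the input dimension 256 → 512; this toy version shows only the output head axis, and `expand_heads` is a hypothetical helper):

```python
import torch
import torch.nn as nn

HEAD_DIM, OLD_HEADS, NEW_HEADS = 128, 2, 4

def expand_heads(w_old: torch.Tensor) -> torch.Tensor:
    """(old_heads*head_dim, in_dim) -> (new_heads*head_dim, in_dim)."""
    in_dim = w_old.shape[1]
    w_new = torch.empty(NEW_HEADS * HEAD_DIM, in_dim)
    nn.init.xavier_uniform_(w_new)           # new heads: Xavier init...
    w_new *= 0.02                            # ...scaled down by 0.02
    w_new[: OLD_HEADS * HEAD_DIM] = w_old    # old heads 0-1 -> new heads 0-1
    return w_new

w_old = torch.randn(OLD_HEADS * HEAD_DIM, 256)
w_new = expand_heads(w_old)                  # shape (512, 256)
```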

### Text Encoders

Same as TinyFlux:

| Role | Model |
|------|-------|
| Sequence encoder | flan-t5-base (768 dim) |
| Pooled encoder | CLIP-L (768 dim) |

## Training

### Strategy

1. **Port** TinyFlux weights with dimension expansion
2. **Freeze** ported layers as "anchor" knowledge
3. **Train** new layers to interpolate between anchors
4. **Optional:** Unfreeze all layers and fine-tune at a lower learning rate
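
In optimizer terms, the staged schedule above looks roughly like this (a minimal illustration; the phase-2 learning rate here is an assumption, not the repo's actual value):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8))

# Steps 1-2: pretend block 0 holds ported "anchor" weights and freeze it.
for p in model[0].parameters():
    p.requires_grad = False

# Step 3: the optimizer only sees the new, trainable parameters.
opt = torch.optim.AdamW([p for p in model.parameters() if p.requires_grad], lr=5e-5)
n_phase1 = sum(len(g["params"]) for g in opt.param_groups)  # 2 tensors (block 1)

# Step 4 (optional): unfreeze everything and fine-tune at a lower LR.
for p in model.parameters():
    p.requires_grad = True
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)  # lower LR is an assumption
n_phase2 = sum(len(g["params"]) for g in opt.param_groups)  # all 4 tensors
```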

### Dataset

Trained on [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents):
- 10,000 samples
- Pre-computed VAE latents (16, 64, 64) from 512×512 images
- Diverse prompts covering people, objects, scenes, and styles

### Training Details

- **Objective**: Flow matching (rectified flow)
- **Timestep sampling**: Logit-normal with Flux shift (s=3.0)
- **Loss weighting**: Min-SNR-γ (γ=5.0)
- **Optimizer**: AdamW (lr=5e-5, β=(0.9, 0.99), wd=0.01)
- **Schedule**: Cosine with warmup
- **Precision**: bfloat16
- **Batch size**: 32 (16 × 2 gradient accumulation)
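
A single training step under these settings might look roughly like the following. This is a hedged sketch: the velocity sign convention (noise minus data) and the SNR formula used for the min-SNR-γ weight are assumptions, not the repo's exact loss code:

```python
import torch

def flux_shift(t, s=3.0):
    # Flux timestep shift
    return s * t / (1 + (s - 1) * t)

def training_step(model, x0, gamma=5.0):
    """One flow-matching step on clean latents x0."""
    b = x0.shape[0]
    t = torch.sigmoid(torch.randn(b))            # logit-normal timestep sample
    t = flux_shift(t, s=3.0)
    t_ = t.view(b, *([1] * (x0.dim() - 1)))
    noise = torch.randn_like(x0)
    x_t = (1 - t_) * x0 + t_ * noise             # rectified-flow interpolation
    target = noise - x0                          # velocity target (assumed sign)
    v = model(x_t, t)
    snr = ((1 - t_) / t_.clamp(min=1e-4)) ** 2   # SNR of the interpolant (assumed)
    w = torch.minimum(snr, torch.full_like(snr, gamma)) / snr.clamp(min=1e-4)
    return (w * (v - target) ** 2).mean()

# Dummy model and latents, just to exercise the step.
loss = training_step(lambda x, t: torch.zeros_like(x), torch.randn(4, 16, 8, 8))
```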

## Usage

### Installation

```bash
pip install torch transformers diffusers safetensors huggingface_hub
```

### Inference

```python
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from transformers import T5EncoderModel, T5Tokenizer, CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL

# Load model (TinyFlux and TinyFluxDeepConfig come from model.py in this repo)
config = TinyFluxDeepConfig()
model = TinyFlux(config).to("cuda").to(torch.bfloat16)

weights = load_file(hf_hub_download("AbstractPhil/tiny-flux-deep", "model.safetensors"))
model.load_state_dict(weights, strict=False)  # strict=False for precomputed buffers
model.eval()

# Load encoders
t5_tok = T5Tokenizer.from_pretrained("google/flan-t5-base")
t5_enc = T5EncoderModel.from_pretrained("google/flan-t5-base", torch_dtype=torch.bfloat16).to("cuda")
clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
clip_enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=torch.bfloat16).to("cuda")
vae = AutoencoderKL.from_pretrained("black-forest-labs/FLUX.1-schnell", subfolder="vae", torch_dtype=torch.bfloat16).to("cuda")

# Encode prompt
prompt = "a photo of a cat sitting on a windowsill"
t5_in = t5_tok(prompt, max_length=128, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
t5_out = t5_enc(**t5_in).last_hidden_state
clip_in = clip_tok(prompt, max_length=77, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
clip_out = clip_enc(**clip_in).pooler_output

# Euler sampling with the Flux timestep shift
def flux_shift(t, s=3.0):
    return s * t / (1 + (s - 1) * t)

x = torch.randn(1, 64 * 64, 16, device="cuda", dtype=torch.bfloat16)
img_ids = TinyFlux.create_img_ids(1, 64, 64, "cuda")

t_linear = torch.linspace(0, 1, 21, device="cuda")
timesteps = flux_shift(t_linear)
guidance = torch.tensor([3.5], device="cuda", dtype=torch.bfloat16)

with torch.no_grad():
    for i in range(20):
        t = timesteps[i].unsqueeze(0)
        dt = timesteps[i + 1] - timesteps[i]
        v = model(
            hidden_states=x,
            encoder_hidden_states=t5_out,
            pooled_projections=clip_out,
            timestep=t,
            img_ids=img_ids,
            guidance=guidance,
        )
        x = x + v * dt

# Decode
latents = x.reshape(1, 64, 64, 16).permute(0, 3, 1, 2)
latents = latents / vae.config.scaling_factor
image = vae.decode(latents.float()).sample
image = (image / 2 + 0.5).clamp(0, 1)
```

### Configuration

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class TinyFluxDeepConfig:
    hidden_size: int = 512
    num_attention_heads: int = 4
    attention_head_dim: int = 128
    in_channels: int = 16
    joint_attention_dim: int = 768
    pooled_projection_dim: int = 768
    num_double_layers: int = 15
    num_single_layers: int = 25
    mlp_ratio: float = 4.0
    axes_dims_rope: Tuple[int, int, int] = (16, 56, 56)
    guidance_embeds: bool = True
```

## Files

```
AbstractPhil/tiny-flux-deep/
├── model.safetensors       # Model weights (~340MB)
├── config.json             # Model configuration
├── frozen_params.json      # List of frozen parameter names
├── README.md               # This file
├── model.py                # Model architecture (includes TinyFluxDeepConfig)
├── inference_colab.py      # Inference script
├── train_deep_colab.py     # Training script with layer freezing
├── port_to_deep.py         # Porting script from TinyFlux
├── checkpoints/            # Training checkpoints
│   └── step_*.safetensors
├── logs/                   # TensorBoard logs
└── samples/                # Generated samples during training
```

## Porting from TinyFlux

To create a new TinyFlux-Deep from scratch, run `port_to_deep.py`, which:

1. Downloads the AbstractPhil/tiny-flux weights
2. Creates the TinyFlux-Deep model (512 hidden, 4 heads, 25 single, 15 double)
3. Expands the attention heads (2 → 4) and hidden dimension (256 → 512)
4. Distributes layers to anchor positions
5. Saves the result to AbstractPhil/tiny-flux-deep
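
The hidden-size expansion in step 3 can be illustrated with a toy helper (hypothetical; the actual `port_to_deep.py` may handle biases, norms, and the head layout differently):

```python
import torch

def expand_linear(w_old: torch.Tensor, out_new: int, in_new: int) -> torch.Tensor:
    """Grow a weight matrix, keeping the learned block in the top-left corner."""
    out_old, in_old = w_old.shape
    w_new = torch.randn(out_new, in_new) * 0.02  # small random init for new dims
    w_new[:out_old, :in_old] = w_old             # preserve TinyFlux weights
    return w_new

w_old = torch.randn(256, 256)
w_new = expand_linear(w_old, 512, 512)           # 256 -> 512 expansion
```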

## Comparison with TinyFlux

| Aspect | TinyFlux | TinyFlux-Deep |
|--------|----------|---------------|
| Parameters | ~8M | ~85M |
| Memory (bf16) | ~16MB | ~170MB |
| Forward pass | ~15ms | ~60ms |
| Capacity | Limited | Moderate |
| Training | From scratch | Ported + fine-tuned |

## Limitations

- **Resolution**: Trained on 512×512 only
- **Quality**: Better than TinyFlux, but still well below full Flux
- **Text understanding**: Limited by the smaller T5 encoder (768 vs 4096 dim)
- **Early training**: The model is actively being trained
- **Experimental**: Intended for research, not production

## Intended Use

- Studying model scaling and expansion techniques
- Testing layer freezing and knowledge transfer
- Rapid prototyping with moderate capacity
- Educational purposes
- Baseline for architecture experiments

## Citation

```bibtex
@misc{tinyfluxdeep2026,
  title={TinyFlux-Deep: Expanded Flux Architecture with Knowledge Preservation},
  author={AbstractPhil},
  year={2026},
  url={https://huggingface.co/AbstractPhil/tiny-flux-deep}
}
```

## Related Models

- [AbstractPhil/tiny-flux](https://huggingface.co/AbstractPhil/tiny-flux) - Base model (8M params)
- [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) - Original Flux

## Acknowledgments

- [Black Forest Labs](https://blackforestlabs.ai/) for the original Flux architecture
- [Hugging Face](https://huggingface.co/) for the diffusers and transformers libraries

## License

MIT License - see the LICENSE file for details.

---

**Note**: This is an experimental research model under active development. Training is ongoing and weights may be updated frequently.