AbstractPhil committed on
Commit
418bd36
·
verified ·
1 Parent(s): ff984ad

Update README.md

Files changed (1)
  1. README.md +108 -155
README.md CHANGED
@@ -8,7 +8,8 @@ tags:
8
  - flux
9
  - text-to-image
10
  - image-generation
11
- - deep
 
12
  - experimental
13
  library_name: pytorch
14
  pipeline_tag: text-to-image
@@ -19,81 +20,56 @@ datasets:
19
  - AbstractPhil/flux-schnell-teacher-latents
20
  ---
21
 
22
- # TinyFlux-Deep
23
 
24
- An **expanded** TinyFlux architecture that increases depth and width while preserving learned representations. TinyFlux-Deep is ported from [TinyFlux](https://huggingface.co/AbstractPhil/tiny-flux) with strategic layer expansion and attention head doubling.
25
 
26
- ## Model Description
27
 
28
- TinyFlux-Deep extends the base TinyFlux model by:
29
- - **Doubling attention heads** (2 → 4) with expanded hidden dimension (256 → 512)
30
- - **5× more double-stream layers** (3 → 15)
31
- - **8× more single-stream layers** (3 → 25)
32
- - **Preserving learned weights** from TinyFlux in frozen anchor positions
33
 
34
- ### Architecture Comparison
35
 
36
- | Component | TinyFlux | TinyFlux-Deep | Flux |
37
- |-----------|----------|---------------|------|
 
 
 
 
 
 
 
 
 
 
 
 
38
  | Hidden size | 256 | **512** | 3072 |
39
  | Attention heads | 2 | **4** | 24 |
40
  | Head dimension | 128 | 128 | 128 |
41
  | Double-stream layers | 3 | **15** | 19 |
42
  | Single-stream layers | 3 | **25** | 38 |
43
  | VAE channels | 16 | 16 | 16 |
44
- | **Total params** | ~8M | **~85M** | ~12B |
45
-
46
- ### Layer Mapping (Ported from TinyFlux)
47
-
48
- The original TinyFlux weights are strategically distributed and frozen:
49
-
50
- **Single blocks (3 → 25):**
51
- | TinyFlux Layer | TinyFlux-Deep Position | Status |
52
- |----------------|------------------------|--------|
53
- | 0 | 0 | Frozen |
54
- | 1 | 8, 12, 16 | Frozen (3 copies) |
55
- | 2 | 24 | Frozen |
56
- | - | 1-7, 9-11, 13-15, 17-23 | Trainable |
57
-
58
- **Double blocks (3 → 15):**
59
- | TinyFlux Layer | TinyFlux-Deep Position | Status |
60
- |----------------|------------------------|--------|
61
- | 0 | 0 | Frozen |
62
- | 1 | 4, 7, 10 | Frozen (3 copies) |
63
- | 2 | 14 | Frozen |
64
- | - | 1-3, 5-6, 8-9, 11-13 | Trainable |
65
-
66
- **Trainable ratio:** ~70% of parameters
67
-
68
- ### Attention Head Expansion
69
-
70
- Original 2 heads are copied to new positions, with 2 new heads randomly initialized:
71
- - Old head 0 → New head 0
72
- - Old head 1 → New head 1
73
- - Heads 2-3 → Xavier initialized (scaled 0.02×)
74
 
75
  ### Text Encoders
76
 
77
- Same as TinyFlux:
78
- | Role | Model |
79
- |------|-------|
80
- | Sequence encoder | flan-t5-base (768 dim) |
81
- | Pooled encoder | CLIP-L (768 dim) |
82
 
83
  ## Training
84
 
85
- ### Strategy
86
 
87
- 1. **Port** TinyFlux weights with dimension expansion
88
- 2. **Freeze** ported layers as "anchor" knowledge
89
- 3. **Train** new layers to interpolate between anchors
90
- 4. **Optional:** Unfreeze all and fine-tune at lower LR
91
 
92
  ### Dataset
93
 
94
- Trained on [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents):
95
- - 10,000 samples
96
- - Pre-computed VAE latents (16, 64, 64) from 512×512 images
97
  - Diverse prompts covering people, objects, scenes, styles
98
 
99
  ### Training Details
@@ -101,80 +77,64 @@ Trained on [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/da
101
  - **Objective**: Flow matching (rectified flow)
102
  - **Timestep sampling**: Logit-normal with Flux shift (s=3.0)
103
  - **Loss weighting**: Min-SNR-γ (γ=5.0)
104
- - **Optimizer**: AdamW (lr=5e-5, β=(0.9, 0.99), wd=0.01)
105
  - **Schedule**: Cosine with warmup
106
  - **Precision**: bfloat16
107
  - **Batch size**: 32 (16 × 2 gradient accumulation)
 
 
 
 
 
 
 
108
 
109
  ## Usage
110
 
111
- ### Installation
112
 
113
  ```bash
114
  pip install torch transformers diffusers safetensors huggingface_hub
115
  ```
116
 
117
- ### Inference
118
 
119
  ```python
120
  import torch
121
  from huggingface_hub import hf_hub_download
122
  from safetensors.torch import load_file
123
- from transformers import T5EncoderModel, T5Tokenizer, CLIPTextModel, CLIPTokenizer
124
- from diffusers import AutoencoderKL
125
 
126
- # Load model (copy TinyFlux class definition first, use TinyFluxDeepConfig)
127
  config = TinyFluxDeepConfig()
128
- model = TinyFlux(config).to("cuda").to(torch.bfloat16)
129
-
130
- weights = load_file(hf_hub_download("AbstractPhil/tiny-flux-deep", "model.safetensors"))
131
- model.load_state_dict(weights, strict=False) # strict=False for precomputed buffers
 
 
 
 
132
  model.eval()
 
 
 
 
 
133
 
134
- # Load encoders
135
- t5_tok = T5Tokenizer.from_pretrained("google/flan-t5-base")
136
- t5_enc = T5EncoderModel.from_pretrained("google/flan-t5-base", torch_dtype=torch.bfloat16).to("cuda")
137
- clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
138
- clip_enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=torch.bfloat16).to("cuda")
139
- vae = AutoencoderKL.from_pretrained("black-forest-labs/FLUX.1-schnell", subfolder="vae", torch_dtype=torch.bfloat16).to("cuda")
140
-
141
- # Encode prompt
142
- prompt = "a photo of a cat sitting on a windowsill"
143
- t5_in = t5_tok(prompt, max_length=128, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
144
- t5_out = t5_enc(**t5_in).last_hidden_state
145
- clip_in = clip_tok(prompt, max_length=77, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
146
- clip_out = clip_enc(**clip_in).pooler_output
147
-
148
- # Euler sampling with Flux shift
149
  def flux_shift(t, s=3.0):
 
150
      return s * t / (1 + (s - 1) * t)
151
 
152
- x = torch.randn(1, 64*64, 16, device="cuda", dtype=torch.bfloat16)
153
- img_ids = TinyFlux.create_img_ids(1, 64, 64, "cuda")
154
-
155
- t_linear = torch.linspace(0, 1, 21, device="cuda")
156
- timesteps = flux_shift(t_linear)
157
 
158
- for i in range(20):
159
-     t = timesteps[i].unsqueeze(0)
160
-     dt = timesteps[i+1] - timesteps[i]
161
-     guidance = torch.tensor([3.5], device="cuda", dtype=torch.bfloat16)
162
 
163
-     v = model(
164
-         hidden_states=x,
165
-         encoder_hidden_states=t5_out,
166
-         pooled_projections=clip_out,
167
-         timestep=t,
168
-         img_ids=img_ids,
169
-         guidance=guidance,
170
-     )
171
-     x = x + v * dt
172
-
173
- # Decode
174
- latents = x.reshape(1, 64, 64, 16).permute(0, 3, 1, 2)
175
- latents = latents / vae.config.scaling_factor
176
- image = vae.decode(latents.float()).sample
177
- image = (image / 2 + 0.5).clamp(0, 1)
178
  ```
179
 
180
  ### Configuration
@@ -199,84 +159,77 @@ class TinyFluxDeepConfig:
199
 
200
  ```
201
  AbstractPhil/tiny-flux-deep/
202
- ├── model.safetensors # Model weights (~340MB)
203
- ├── config.json # Model configuration
204
- ├── frozen_params.json # List of frozen parameter names
205
- ├── README.md # This file
206
- ├── model.py # Model architecture (includes TinyFluxDeepConfig)
207
- ├── inference_colab.py # Inference script
208
- ├── train_deep_colab.py # Training script with layer freezing
209
- ├── port_to_deep.py # Porting script from TinyFlux
210
- ├── checkpoints/ # Training checkpoints
211
- │   └── step_*.safetensors
212
- ├── logs/ # Tensorboard logs
213
- └── samples/ # Generated samples during training
214
  ```
215
 
216
- ## Porting from TinyFlux
217
 
218
- To create a new TinyFlux-Deep from scratch:
219
 
220
- ```python
221
- # Run port_to_deep.py
222
- # 1. Downloads AbstractPhil/tiny-flux weights
223
- # 2. Creates TinyFlux-Deep model (512 hidden, 4 heads, 25 single, 15 double)
224
- # 3. Expands attention heads (2 → 4) and hidden dimension (256 → 512)
225
- # 4. Distributes layers to anchor positions
226
- # 5. Saves to AbstractPhil/tiny-flux-deep
227
- ```
228
 
229
- ## Comparison with TinyFlux
230
 
231
- | Aspect | TinyFlux | TinyFlux-Deep |
232
- |--------|----------|---------------|
233
- | Parameters | ~8M | ~85M |
234
- | Memory (bf16) | ~16MB | ~170MB |
235
- | Forward pass | ~15ms | ~60ms |
236
- | Capacity | Limited | Moderate |
237
- | Training | From scratch | Ported + fine-tuned |
238
 
239
  ## Limitations
240
 
241
- - **Resolution**: Trained on 512×512 only
242
- - **Quality**: Better than TinyFlux, still below full Flux
243
- - **Text understanding**: Limited by smaller T5 encoder (768 vs 4096 dim)
244
- - **Early training**: Model is actively being trained
245
- - **Experimental**: Intended for research, not production
246
 
247
  ## Intended Use
248
 
249
- - Studying model scaling and expansion techniques
250
- - Testing layer freezing and knowledge transfer
251
- - Rapid prototyping with moderate capacity
252
  - Educational purposes
253
- - Baseline for architecture experiments
 
 
 
 
254
 
255
  ## Citation
256
 
257
  ```bibtex
258
- @misc{tinyfluxdeep2026,
259
- title={TinyFlux-Deep: Expanded Flux Architecture with Knowledge Preservation},
260
  author={AbstractPhil},
261
  year={2026},
262
  url={https://huggingface.co/AbstractPhil/tiny-flux-deep}
263
  }
264
  ```
265
 
266
- ## Related Models
267
-
268
- - [AbstractPhil/tiny-flux](https://huggingface.co/AbstractPhil/tiny-flux) - Base model (8M params)
269
- - [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) - Original Flux
270
-
271
- ## Acknowledgments
272
 
273
- - [Black Forest Labs](https://blackforestlabs.ai/) for the original Flux architecture
274
- - [Hugging Face](https://huggingface.co/) for diffusers and transformers libraries
 
275
 
276
  ## License
277
 
278
- MIT License - See LICENSE file for details.
279
 
280
  ---
281
 
282
- **Note**: This is an experimental research model under active development. Training is ongoing and weights may be updated frequently.
 
8
  - flux
9
  - text-to-image
10
  - image-generation
11
+ - tinyflux
12
+ - lailah
13
  - experimental
14
  library_name: pytorch
15
  pipeline_tag: text-to-image
 
20
  - AbstractPhil/flux-schnell-teacher-latents
21
  ---
22
 
23
+ # TinyFlux-Deep (Lailah)
24
 
25
+ **TinyFlux-Lailah** is an expanded TinyFlux architecture with increased depth and width. Originally ported from [TinyFlux](https://huggingface.co/AbstractPhil/tiny-flux) with strategic layer expansion and attention head doubling, now training end-to-end on teacher latents.
26
 
27
+ > **Current checkpoint:** `step_286250` | **Status:** Active training
28
 
29
+ ## Quick Start (Colab)
 
 
 
 
30
 
31
+ The easiest way to test Lailah:
32
 
33
+ 1. Open [Google Colab](https://colab.research.google.com/)
34
+ 2. Copy the contents of [`colab_inference_lailah_early.py`](./colab_inference_lailah_early.py)
35
+ 3. Run the cells
36
+
37
+ ```python
38
+ # Or fetch directly:
39
+ !wget https://huggingface.co/AbstractPhil/tiny-flux-deep/raw/main/colab_inference_lailah_early.py
40
+ %run colab_inference_lailah_early.py
41
+ ```
42
+
43
+ ## Architecture
44
+
45
+ | Component | TinyFlux | TinyFlux-Lailah | Flux |
46
+ |-----------|----------|-----------------|------|
47
  | Hidden size | 256 | **512** | 3072 |
48
  | Attention heads | 2 | **4** | 24 |
49
  | Head dimension | 128 | 128 | 128 |
50
  | Double-stream layers | 3 | **15** | 19 |
51
  | Single-stream layers | 3 | **25** | 38 |
52
  | VAE channels | 16 | 16 | 16 |
53
+ | **Total params** | ~10.7M | **~241.8M** | ~12B |
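
The hidden sizes in the table are consistent with equal-width attention heads, i.e. hidden size = heads × head dimension. The short check below uses only the values from the table; the dictionary layout and the helper name `hidden_from_heads` are illustrative and not part of the repository.

```python
# Values copied from the architecture table; the dict layout is illustrative only.
ARCH = {
    "TinyFlux": {"heads": 2, "head_dim": 128, "hidden": 256},
    "TinyFlux-Lailah": {"heads": 4, "head_dim": 128, "hidden": 512},
    "Flux": {"heads": 24, "head_dim": 128, "hidden": 3072},
}


def hidden_from_heads(heads: int, head_dim: int) -> int:
    """Hidden size implied by multi-head attention with equal-width heads."""
    return heads * head_dim


for name, cfg in ARCH.items():
    implied = hidden_from_heads(cfg["heads"], cfg["head_dim"])
    print(f"{name}: {cfg['heads']} x {cfg['head_dim']} = {implied}")
```

Doubling the head count at a fixed head dimension of 128 is what moves the hidden size from 256 to 512.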
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
54
 
55
  ### Text Encoders
56
 
57
+ | Role | Model | Dimension |
58
+ |------|-------|-----------|
59
+ | Sequence encoder | flan-t5-base | 768 |
60
+ | Pooled encoder | CLIP-L | 768 |
 
61
 
62
  ## Training
63
 
64
+ ### Current Approach
65
 
66
+ All parameters are trainable. The model was initially ported from TinyFlux with frozen anchor layers, but current training runs with everything unfrozen for maximum flexibility.
 
 
 
67
 
68
  ### Dataset
69
 
70
+ Training on [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents):
71
+ - Pre-computed VAE latents from Flux-Schnell generations
72
+ - 512×512 resolution (64×64 latent space)
73
  - Diverse prompts covering people, objects, scenes, styles
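
As a rough sketch of the shapes involved: a 512×512 image maps to a 64×64 latent, which implies 8× spatial downsampling, with 16 VAE channels per the architecture table. The earlier README revision sampled noise as `(1, 64*64, 16)`, so one token per latent position is assumed here. The helper names are hypothetical.

```python
def latent_shape(height, width, downsample=8, channels=16):
    """(channels, h, w) of the VAE latent, assuming 8x spatial downsampling."""
    if height % downsample or width % downsample:
        raise ValueError("image size must be divisible by the downsample factor")
    return channels, height // downsample, width // downsample


def token_sequence_shape(height, width, downsample=8, channels=16):
    """(sequence_length, channels) when each latent position becomes one token."""
    c, h, w = latent_shape(height, width, downsample, channels)
    return h * w, c


print(latent_shape(512, 512))
print(token_sequence_shape(512, 512))
```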
74
 
75
  ### Training Details
 
77
  - **Objective**: Flow matching (rectified flow)
78
  - **Timestep sampling**: Logit-normal with Flux shift (s=3.0)
79
  - **Loss weighting**: Min-SNR-γ (γ=5.0)
80
+ - **Optimizer**: AdamW (lr=3e-4, β=(0.9, 0.99), wd=0.01)
81
  - **Schedule**: Cosine with warmup
82
  - **Precision**: bfloat16
83
  - **Batch size**: 32 (16 × 2 gradient accumulation)
84
+ - **EMA decay**: 0.9999
85
+
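
To make the timestep sampling line concrete, here is a minimal standard-library sketch of logit-normal sampling followed by the Flux shift. The location and scale of the underlying normal (0 and 1) and the order of operations (sample first, then shift) are assumptions; the training script may differ. `sample_timestep` is a hypothetical name.

```python
import math
import random


def flux_shift(t, s=3.0):
    """Flux timestep shift, same formula as in the Sampling section."""
    return s * t / (1 + (s - 1) * t)


def sample_timestep(rng, mean=0.0, std=1.0, shift=3.0):
    """Logit-normal sample in (0, 1), then shifted. mean/std are assumed values."""
    z = rng.gauss(mean, std)
    t = 1.0 / (1.0 + math.exp(-z))  # sigmoid of a normal draw = logit-normal
    return flux_shift(t, shift)


rng = random.Random(0)
samples = [sample_timestep(rng) for _ in range(10_000)]
print(f"mean shifted t = {sum(samples) / len(samples):.3f}")
```

With s=3.0 the shift maps t=0.5 to 0.75, so the sampled timesteps concentrate in the upper half of the interval.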
86
+ ### Checkpoints
87
+
88
+ Checkpoints are saved every 625 steps with both main and EMA weights:
89
+ - `checkpoints/step_XXXXX.safetensors` - Training weights
90
+ - `checkpoints/step_XXXXX_ema.safetensors` - EMA weights (recommended for inference)
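
The EMA weights are a running average of the training weights. A minimal sketch of the usual update rule with the stated decay of 0.9999 follows; it uses plain floats instead of tensors, and starting the average at zero is purely for illustration (an EMA is normally initialized from the model weights). `ema_update` is a hypothetical helper, not the repository's implementation.

```python
def ema_update(ema_params, params, decay=0.9999):
    """Update in place: ema = decay * ema + (1 - decay) * param."""
    for name, value in params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
    return ema_params


# Toy example: a single scalar "weight" held at 1.0 while the EMA starts at 0.0.
ema = {"w": 0.0}
for _ in range(625):  # one checkpoint interval
    ema_update(ema, {"w": 1.0})
print(f"EMA after 625 steps: {ema['w']:.4f}")
```

With this decay the average moves only about 6% of the way toward a new value over one 625-step checkpoint interval, which is why EMA weights tend to be smoother than the raw training weights.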
91
 
92
  ## Usage
93
 
94
+ ### Dependencies
95
 
96
  ```bash
97
  pip install torch transformers diffusers safetensors huggingface_hub
98
  ```
99
 
100
+ ### Basic Inference
101
 
102
  ```python
103
  import torch
104
  from huggingface_hub import hf_hub_download
105
  from safetensors.torch import load_file
 
 
106
 
107
+ # Load model (requires TinyFluxDeep class from tinyflux_deep.py)
108
  config = TinyFluxDeepConfig()
109
+ model = TinyFluxDeep(config).to("cuda", torch.bfloat16)
110
+
111
+ # Load EMA weights (recommended) or main weights
112
+ weights = load_file(hf_hub_download(
113
+ "AbstractPhil/tiny-flux-deep",
114
+ "checkpoints/step_286250_ema.safetensors" # Use _ema for best quality
115
+ ))
116
+ model.load_state_dict(weights, strict=False)
117
  model.eval()
118
+ ```
119
+
120
+ ### Sampling
121
+
122
+ Lailah uses Euler discrete sampling with Flux timestep shift:
123
 
124
+ ```python
 
 
 
 
 
 
 
 
 
 
 
 
 
 
125
  def flux_shift(t, s=3.0):
126
+     """Bias timesteps toward data (higher t)."""
127
      return s * t / (1 + (s - 1) * t)
128
 
129
+ # 20-50 steps recommended
130
+ timesteps = flux_shift(torch.linspace(0, 1, num_steps + 1))
 
 
 
131
 
132
+ for i in range(num_steps):
133
+     t_curr, t_next = timesteps[i], timesteps[i + 1]
134
+     dt = t_next - t_curr
 
135
 
136
+     v = model(hidden_states=x, encoder_hidden_states=t5_out, ...)
137
+     x = x + v * dt  # Euler step
 
 
 
 
 
 
 
 
 
 
 
 
 
138
  ```
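
The sampling snippet above leaves `num_steps`, `x`, and the model call open. The self-contained sketch below runs the same Euler loop on a scalar toy problem, with the model replaced by a stand-in velocity function. The toy velocity (a constant `data - noise`, i.e. a straight path from noise at t=0 to data at t=1) and the helper names are assumptions made for illustration only.

```python
def flux_shift(t, s=3.0):
    """Flux timestep shift: moves interior timesteps toward t=1."""
    return s * t / (1 + (s - 1) * t)


def shifted_timesteps(num_steps, s=3.0):
    """num_steps + 1 shifted timesteps running from 0.0 to 1.0."""
    return [flux_shift(i / num_steps, s) for i in range(num_steps + 1)]


def euler_sample(velocity_fn, x, num_steps=20, s=3.0):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    ts = shifted_timesteps(num_steps, s)
    for t_curr, t_next in zip(ts[:-1], ts[1:]):
        x = x + velocity_fn(x, t_curr) * (t_next - t_curr)
    return x


# Toy stand-in for the model: straight path from noise (t=0) to data (t=1).
noise, data = -2.0, 1.5


def toy_velocity(x, t):
    return data - noise


result = euler_sample(toy_velocity, noise, num_steps=20)
print(result)
```

Because the shifted schedule still starts at 0 and ends at 1, the step sizes sum to 1 and the toy trajectory lands on `data`; with a learned velocity field the result depends on the number of steps.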
139
 
140
  ### Configuration
 
159
 
160
  ```
161
  AbstractPhil/tiny-flux-deep/
162
+ ├── model.safetensors # Latest best weights
163
+ ├── tinyflux_deep.py # Model architecture
164
+ ├── colab_inference_lailah_early.py # Ready-to-run Colab inference
165
+ ├── inference_tinyflux_deep.py # Standalone inference script
166
+ ├── train_tinyflux_deep.py # Training script
167
+ ├── checkpoints/
168
+ │   ├── step_286250.safetensors # Training weights
169
+ │   └── step_286250_ema.safetensors # EMA weights (use this)
170
+ ├── samples/ # Generated samples during training
171
+ └── README.md
 
 
172
  ```
173
 
174
+ ## Origin: Porting from TinyFlux
175
 
176
+ Lailah was initialized by porting TinyFlux weights:
177
 
178
+ 1. **Attention head expansion** (2 → 4): Original heads copied to positions 0-1, new heads 2-3 Xavier initialized
179
+ 2. **Hidden dimension expansion** (256 → 512): Weights tiled and scaled
180
+ 3. **Layer distribution**: Original 3 layers distributed across 15/25 positions as initialization anchors
181
+
182
+ The initial port used selective freezing of anchor layers, but current training leaves all parameters unfrozen.
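
The layer distribution can be sketched as a lookup from target position to source layer. The anchor positions below are taken from the Layer Mapping tables in the earlier README revision (the removed lines of this diff); the function name `split_positions` is hypothetical and this is not the porting script itself.

```python
# Source TinyFlux layer -> positions it was copied to (from the earlier README).
SINGLE_ANCHORS = {0: [0], 1: [8, 12, 16], 2: [24]}   # 3 -> 25 single-stream blocks
DOUBLE_ANCHORS = {0: [0], 1: [4, 7, 10], 2: [14]}    # 3 -> 15 double-stream blocks


def split_positions(anchors, depth):
    """Return ({position: source_layer}, [positions without a ported source])."""
    source_of = {pos: src for src, positions in anchors.items() for pos in positions}
    fresh = [pos for pos in range(depth) if pos not in source_of]
    return source_of, fresh


single_sources, single_fresh = split_positions(SINGLE_ANCHORS, depth=25)
double_sources, double_fresh = split_positions(DOUBLE_ANCHORS, depth=15)
print("single-stream fresh layers:", single_fresh)
print("double-stream fresh layers:", double_fresh)
```

In the initial port the anchor positions were frozen and the remaining positions were trained; in the current run every position is trainable, so the mapping only describes initialization.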
 
 
 
183
 
184
+ ## Comparison
185
 
186
+ | Aspect | TinyFlux | Lailah | Full Flux |
187
+ |--------|----------|--------|-----------|
188
+ | Parameters | 10.7M | 241.8M | 12B |
189
+ | Memory (bf16) | ~22MB | ~484MB | ~24GB |
190
+ | Quality | Limited | Moderate | High |
191
+ | Speed (A100) | ~10ms | ~40ms | ~200ms |
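
The memory row follows from two bytes per parameter in bfloat16. A small check using the parameter counts from the table; it counts weights only (no buffers, activations, or optimizer state), and the helper name is illustrative.

```python
BYTES_PER_BF16 = 2  # bfloat16 stores each parameter in 16 bits


def weight_memory_mb(num_params, bytes_per_param=BYTES_PER_BF16):
    """Approximate weight storage in megabytes (1 MB = 1e6 bytes)."""
    return num_params * bytes_per_param / 1e6


for name, num_params in [("TinyFlux", 10.7e6), ("Lailah", 241.8e6), ("Full Flux", 12e9)]:
    print(f"{name}: ~{weight_memory_mb(num_params):,.1f} MB")
```

The Lailah and Full Flux results match the table after rounding; the TinyFlux result comes out slightly under the table's approximate figure.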
 
192
 
193
  ## Limitations
194
 
195
+ - **Resolution**: 512×512 only (64×64 latent)
196
+ - **Early training**: Quality improving but not production-ready
197
+ - **Text capacity**: Limited by flan-t5-base (768 dim vs Flux's 4096)
198
+ - **Experimental**: Research model, expect artifacts
 
199
 
200
  ## Intended Use
201
 
202
+ - Rapid prototyping and iteration
203
+ - Studying flow matching at moderate scale
204
+ - Architecture experiments
205
  - Educational purposes
206
+ - Baseline comparisons
207
+
208
+ ## Name
209
+
210
+ **Lailah** (לילה) - Angel of the night in Jewish tradition, said to guard souls. Chosen for this model's role as a smaller guardian exploring the same space as larger models.
211
 
212
  ## Citation
213
 
214
  ```bibtex
215
+ @misc{tinyfluxlailah2026,
216
+ title={TinyFlux-Lailah: Compact Flow Matching for Text-to-Image},
217
  author={AbstractPhil},
218
  year={2026},
219
  url={https://huggingface.co/AbstractPhil/tiny-flux-deep}
220
  }
221
  ```
222
 
223
+ ## Related
 
 
 
 
 
224
 
225
+ - [AbstractPhil/tiny-flux](https://huggingface.co/AbstractPhil/tiny-flux) - Base model (10.7M)
226
+ - [AbstractPhil/flux-schnell-teacher-latents](https://huggingface.co/datasets/AbstractPhil/flux-schnell-teacher-latents) - Training data
227
+ - [black-forest-labs/FLUX.1-schnell](https://huggingface.co/black-forest-labs/FLUX.1-schnell) - Teacher model
228
 
229
  ## License
230
 
231
+ MIT License
232
 
233
  ---
234
 
235
+ **Status**: Active training. Checkpoints updated regularly. Use EMA weights for best results.