AbstractPhil committed · Commit 49a6bd4 (verified) · 1 Parent(s): 8135143

Update README.md

  1. README.md +318 -3
README.md CHANGED
@@ -1,3 +1,318 @@
1
- ---
2
- license: mit
3
- ---
1
+
2
+
3
+ ---
4
+ license: mit
5
+ base_model: runwayml/stable-diffusion-v1-5
6
+ tags:
7
+ - stable-diffusion
8
+ - diffusion
9
+ - distillation
10
+ - flow-matching
11
+ - geometric-deep-learning
12
+ - research
13
+ library_name: diffusers
14
+ pipeline_tag: text-to-image
15
+ ---
16
+
17
+ # SD1.5 Flow-Matching Distillation with Geometric Guidance (EXPERIMENTAL)
18
+
19
+ ## ⚠️ Experimental Research
20
+
21
+ **Status:** Training in progress | No guarantees of convergence or quality
22
+
23
+ This is an experimental approach to distilling Stable Diffusion 1.5 using flow matching with geometric guidance from [GeoDavidCollective](https://huggingface.co/AbstractPhil/geo-david-collective-sd15-base-e40). Results are not yet validated.
24
+
25
+ ## Overview
26
+
27
+ This trainer attempts to distill Stable Diffusion 1.5 using **v-prediction flow matching** with **adaptive per-block weighting** based on geometric quality assessment. Unlike traditional distillation that treats all UNet blocks equally, this approach uses a pre-trained geometric model (David) to evaluate student features and dynamically adjust training emphasis per block.
28
+
29
+ **Hypothesis:** Geometric guidance may help the student learn SD1.5's internal structure more effectively by:
30
+ - Identifying which blocks are learning poorly
31
+ - Applying stronger supervision where needed
32
+ - Maintaining geometric stability during training
33
+
34
+ **Status:** Hypothesis untested. Requires ablation study comparing David-guided vs. vanilla flow matching.
35
+
36
+ ## Architecture
37
+
38
+ ### Three-Component System
39
+
40
+ ```
41
+ Teacher (SD1.5 UNet, frozen, FP16)
42
+ ↓ provides ε* → v* targets + features
43
+
44
+ Student (Trainable UNet, FP16)
45
+ ↓ predicts v̂ + features
46
+
47
+ Flow Matching Loss: MSE(v̂, v*)
48
+
49
+ +
50
+
51
+ David Assessor (GeoDavidCollective, frozen, 872M params)
52
+ ↓ evaluates student features per block
53
+ ↓ outputs: e_t (timestep error), e_p (pattern entropy), coh (coherence)
54
+
55
+ Fusion System: λ_b = w_b · (1 + α·e_t + β·e_p + δ·(1-coh))
56
+ ↓ converts metrics to per-block penalties
57
+
58
+ Block Losses: Σ λ_b · (KD loss per block)
59
+
60
+ Total: L_flow + block_weight · L_blocks
61
+ ```
62
+
63
+ ### Components
64
+
65
+ **Teacher**: SD1.5 UNet (frozen, FP16)
66
+ - Provides ground truth for flow matching
67
+ - Extracts spatial features per block
68
+
69
+ **Student**: Trainable UNet (FP16)
70
+ - Initialized from teacher weights
71
+ - Learns v-prediction objective
72
+ - Features assessed by David
73
+
74
+ **David**: GeoDavidCollective (frozen)
75
+ - Pre-trained geometric model
76
+ - Evaluates feature quality per block
77
+ - Provides adaptive weighting signals
78
+
79
+ **Fusion**: Dynamic penalty calculator
80
+ - `λ_b = w_b · (1 + α·e_t + β·e_p + δ·(1-coh))`
81
+ - Bounded: `[0.5, 3.0]`
82
+ - Higher λ = more training emphasis
83
+
84
+ ## Training Configuration
85
+
86
+ ### Dataset
87
+ ```yaml
88
+ Source: SymbolicPromptDataset (synthetic prompts)
89
+ Samples: 200,000
90
+ Batch Size: 64
91
+ Epochs: 10
92
+ Workers: 2
93
+ ```
94
+
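For orientation, the dataset settings above imply a fixed number of optimizer steps. The sketch below works that out in plain Python. It assumes one iteration equals one batch of 64 with no gradient accumulation, and it borrows the ~1.5 it/s figure reported under Current Metrics, so the wall-clock estimate is only indicative.

```python
import math

SAMPLES = 200_000      # from the Dataset config
BATCH_SIZE = 64
EPOCHS = 10
ITERS_PER_SEC = 1.5    # approximate A100 speed reported under Current Metrics

# Assumption: one iteration == one batch, and a partial last batch is kept.
steps_per_epoch = math.ceil(SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
hours = total_steps / ITERS_PER_SEC / 3600

print(steps_per_epoch, total_steps, round(hours, 1))  # 3125 31250 5.8
```

Under these assumptions a full 10-epoch run is on the order of six hours of pure training time, before checkpointing or data-loading stalls.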
95
+ ### Optimization
96
+ ```yaml
97
+ Optimizer: AdamW
98
+ Learning Rate: 1e-4
99
+ Weight Decay: 1e-3
100
+ Scheduler: CosineAnnealingLR
101
+ Gradient Clipping: 1.0
102
+ Mixed Precision: Enabled (FP16)
103
+ ```
104
+
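The optimization settings above could be wired up roughly as follows. This is a minimal sketch, not the trainer's actual code: `student` is a hypothetical stand-in module so the snippet runs on CPU, `TOTAL_STEPS` assumes the scheduler is stepped once per batch over 10 epochs of 3125 batches, clipping is assumed to be by gradient norm, and FP16 autocast with loss scaling is left out for brevity.

```python
import torch
from torch import nn

# Hypothetical stand-in for the student UNet so the snippet runs anywhere.
student = nn.Linear(8, 8)

# Assumption: scheduler stepped per batch, 10 epochs x 3125 batches.
TOTAL_STEPS = 31_250

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4, weight_decay=1e-3)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=TOTAL_STEPS)


def train_step(x: torch.Tensor, target: torch.Tensor) -> float:
    """Run one optimizer step with gradient-norm clipping at 1.0."""
    optimizer.zero_grad(set_to_none=True)
    loss = nn.functional.mse_loss(student(x), target)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(student.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()  # cosine decay of the learning rate
    return loss.item()


loss_value = train_step(torch.randn(4, 8), torch.randn(4, 8))
```

If the real trainer steps the scheduler once per epoch instead, `T_max` would be the epoch count rather than the batch count.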
105
+ ### Loss Weights
106
+ ```yaml
107
+ Global Flow Weight: 1.0
108
+ Block Penalty Weight: 0.05 # Critical hyperparameter!
109
+ KD Weight: 0.25  # cosine similarity on pooled features
110
+ Local Flow Heads: Disabled
111
+ ```
112
+
113
+ ### David Fusion
114
+ ```yaml
115
+ Base Block Weights:
116
+ down_0: 0.7, down_1: 0.9, down_2: 1.0, down_3: 1.1
117
+ mid: 1.2, up_0: 1.1, up_1: 1.0, up_2: 0.9, up_3: 0.7
118
+
119
+ Fusion Coefficients:
120
+ alpha (timestep): 0.5
121
+ beta (pattern): 0.25
122
+ delta (incoherence): 0.25
123
+
124
+ Lambda Bounds: [0.5, 3.0]
125
+ ```
126
+
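To make the fusion rule concrete, here is a small sketch that plugs the configuration above into `λ_b = w_b · (1 + α·e_t + β·e_p + δ·(1-coh))` and clamps the result to the lambda bounds. The function name `fuse` and the metric values in the example calls are hypothetical, and whether the trainer normalizes David's metrics before fusion is not stated here.

```python
# Values copied from the David Fusion config above.
BASE_WEIGHTS = {
    "down_0": 0.7, "down_1": 0.9, "down_2": 1.0, "down_3": 1.1,
    "mid": 1.2,
    "up_0": 1.1, "up_1": 1.0, "up_2": 0.9, "up_3": 0.7,
}
ALPHA, BETA, DELTA = 0.5, 0.25, 0.25
LAMBDA_MIN, LAMBDA_MAX = 0.5, 3.0


def fuse(block: str, e_t: float, e_p: float, coh: float) -> float:
    """Return the clamped per-block penalty lambda_b."""
    raw = BASE_WEIGHTS[block] * (
        1.0 + ALPHA * e_t + BETA * e_p + DELTA * (1.0 - coh)
    )
    return min(max(raw, LAMBDA_MIN), LAMBDA_MAX)


# Hypothetical metric values, chosen only to exercise the formula.
moderate = fuse("mid", e_t=2.0, e_p=1.0, coh=0.5)    # about 2.85
saturated = fuse("mid", e_t=10.0, e_p=0.0, coh=1.0)  # raw 7.2, clamped to 3.0
print(moderate, saturated)
```

With non-negative errors and `coh` at most 1, the lower bound of 0.5 is never reached for these base weights, so in practice only the upper clamp limits how much a single block can dominate.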
127
+ ## Training Progress (Epoch 1/10)
128
+
129
+ ### Current Metrics
130
+ ```
131
+ L_total: 0.24
132
+ L_flow: 0.23
133
+ L_blocks: 0.07
134
+ Speed: ~1.5 it/s (A100)
135
+ ```
136
+
137
+ **Interpretation:**
138
+ - Block losses balanced after fixing `block_penalty_weight`
139
+ - Flow loss converging as expected
140
+ - No evidence of collapse or divergence yet
141
+
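As a quick arithmetic check, the reported components can be recombined with the documented weights (`Global Flow Weight: 1.0`, `Block Penalty Weight: 0.05`). This sketch assumes `L_total` is exactly `L_flow + block_weight · L_blocks`, as in the architecture diagram, and that the logged values are rounded to two decimals.

```python
def total_loss(l_flow: float, l_blocks: float,
               flow_weight: float = 1.0, block_weight: float = 0.05) -> float:
    """L_total = flow_weight * L_flow + block_weight * L_blocks."""
    return flow_weight * l_flow + block_weight * l_blocks


# Recombine the logged values as printed.
reported = total_loss(l_flow=0.23, l_blocks=0.07)
# Largest total still compatible with both inputs being rounded to 2 decimals.
upper = total_loss(l_flow=0.235, l_blocks=0.075)
print(reported, upper)
```

The recombined value is roughly 0.2335, and at most about 0.239 once rounding of the inputs is allowed for. The logged `L_total` of 0.24 is therefore plausible only if the unrounded losses sit near the top of their rounding intervals, or if the total includes a term not listed here.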
142
+ ### Expected Timeline (Unvalidated)
143
+ ```
144
+ Epoch 1-2: Loss stabilization
145
+ Epoch 3-5: Feature structure learning (images may be blurry)
146
+ Epoch 8-10: Potential convergence (quality unknown)
147
+ ```
148
+
149
+ **Note:** No baseline comparison yet. Cannot claim faster/better convergence without ablation study.
150
+
151
+ ## Model Files
152
+
153
+ Training saves checkpoints as:
154
+ ```
155
+ checkpoints/
156
+ ├── checkpoint_epoch_002.safetensors
157
+ ├── checkpoint_epoch_004.safetensors
158
+ └── final.safetensors
159
+ ```
160
+
161
+ Each checkpoint contains student UNet weights only.
162
+
163
+ ## Inference
164
+
165
+ The model can be sampled with standard diffusion samplers (DDPM, DDIM) configured for v-prediction:
166
+
167
+ ```python
168
+ # Pseudocode - implementation details TBD
169
+ x_t = noise
170
+ for t in reversed(timesteps):
171
+ v = student_unet(x_t, t, text_embeddings)
172
+ x_t = step(x_t, v, t) # v-prediction update
173
+ image = vae.decode(x_t)
174
+ ```
175
+
176
+ Requires SD1.5 VAE and text encoder (not included in checkpoint).
177
+
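One way the pseudocode above might be realized with `diffusers` is sketched below. It is untested against these checkpoints. `load_student_pipeline` is a hypothetical helper, and it assumes the checkpoint keys match the state dict of diffusers' `UNet2DConditionModel`, which may require key remapping in practice. The VAE, text encoder and tokenizer are taken from the base SD1.5 repository.

```python
def load_student_pipeline(
    checkpoint_path: str = "checkpoints/final.safetensors",
    base_model: str = "runwayml/stable-diffusion-v1-5",
    device: str = "cuda",
):
    """Build an SD1.5 pipeline whose UNet weights come from a student checkpoint."""
    # Imported lazily so the helper can be defined without the GPU stack installed.
    import torch
    from diffusers import DDIMScheduler, StableDiffusionPipeline
    from safetensors.torch import load_file

    # VAE, text encoder and tokenizer come from the base SD1.5 repository.
    pipe = StableDiffusionPipeline.from_pretrained(
        base_model, torch_dtype=torch.float16
    )

    # Assumption: checkpoint keys match the diffusers UNet state dict layout.
    pipe.unet.load_state_dict(load_file(checkpoint_path))

    # The student predicts v, so the sampler must interpret the output as v.
    pipe.scheduler = DDIMScheduler.from_config(
        pipe.scheduler.config, prediction_type="v_prediction"
    )
    return pipe.to(device)


# Example usage (hypothetical prompt; needs a GPU and downloaded weights):
#   pipe = load_student_pipeline()
#   image = pipe("a lighthouse at dusk", num_inference_steps=30).images[0]
#   image.save("sample.png")
```

Leaving the scheduler at its default ε-prediction setting would make the sampler misread the student's output, so the `prediction_type` override is the essential step.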
178
+ ## Known Issues
179
+
180
+ - ❓ No proof this approach works better than vanilla distillation
181
+ - ❓ Optimal `block_penalty_weight` unknown (currently 0.05)
182
+ - ❓ May require tuning lambda bounds for different datasets
183
+ - ❓ Inference quality unvalidated
184
+ - ❌ Not compatible with ComfyUI without conversion (details TBD)
185
+ - ❌ No SD1.5 components included (VAE, text encoder)
186
+
187
+ ## Future Work
188
+
189
+ ### Required Validation
190
+ 1. **Ablation Study**: Train identical model WITHOUT David guidance
191
+ 2. **Quality Metrics**: FID, CLIP score vs. SD1.5 baseline
192
+ 3. **Convergence Analysis**: Compare learning curves
193
+ 4. **Inference Testing**: Visual quality assessment
194
+
195
+ ### Potential Improvements
196
+ - Adaptive `block_penalty_weight` scheduling
197
+ - Per-block learning rates
198
+ - David warmup strategy
199
+ - Better fusion formulas
200
+
201
+ ## Experimental Design
202
+
203
+ ### Hypothesis
204
+ Geometric guidance from David will improve distillation by:
205
+ 1. Identifying poorly-learning blocks
206
+ 2. Applying adaptive supervision
207
+ 3. Maintaining feature geometry
208
+
209
+ ### Test Plan
210
+ ```
211
+ Control: SD1.5 flow matching (no David)
212
+ Treatment: SD1.5 flow matching + David guidance
213
+ Metrics: Loss curves, FID, CLIP score, visual quality
214
+ ```
215
+
216
+ ### Success Criteria
217
+ - Faster convergence (fewer epochs to target loss)
218
+ - Better final quality (lower FID)
219
+ - More stable training (less variance)
220
+
221
+ **Status:** Experiment in progress, no results yet.
222
+
223
+ ## Technical Details
224
+
225
+ ### David Assessment
226
+ Per block, David outputs:
227
+ - `e_t`: Cross-entropy on timestep classification (proxy for temporal understanding)
228
+ - `e_p`: Entropy on pattern classification (proxy for feature diversity)
229
+ - `coh`: Cantor alpha (geometric coherence metric)
230
+
231
+ These are converted to per-block penalty multipliers λ_b via the fusion formula.
232
+
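The two classification-based signals can be illustrated with a framework-free sketch. It assumes David exposes raw logits for a timestep-bin head and a pattern head. The function names, the number of classes and the use of natural logarithms are all assumptions for illustration, not David's actual interface.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def timestep_error(timestep_logits, true_bin):
    """e_t: cross-entropy of the true timestep bin (in nats)."""
    return -math.log(softmax(timestep_logits)[true_bin])


def pattern_entropy(pattern_logits):
    """e_p: Shannon entropy of the predicted pattern distribution (in nats)."""
    probs = softmax(pattern_logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# A maximally uncertain 4-way prediction gives ln(4) for both quantities.
uniform = [0.0, 0.0, 0.0, 0.0]
e_t = timestep_error(uniform, true_bin=2)
e_p = pattern_entropy(uniform)
```

Both quantities are non-negative and grow with uncertainty, which is why they enter the fusion formula as additive penalties.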
233
+ ### Flow Matching
234
+ v-prediction objective:
235
+ ```
236
+ v* = α · ε - σ · x₀ (target)
237
+ v̂ = student(x_t, t, c)         (prediction, c = text embeddings)
238
+ L_flow = MSE(v̂, v*)
239
+ ```
240
+
241
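+ Here α and σ are the signal and noise scales of the noise schedule at timestep t, so that x_t = α · x₀ + σ · ε. This α is unrelated to the fusion coefficient α (alpha).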
242
+
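The relationships between `x₀`, `ε`, `x_t` and `v` can be checked numerically. The sketch below uses plain Python lists in place of latent tensors and assumes a variance-preserving schedule (`α² + σ² = 1`), as in SD1.5's DDPM schedule. The helper names are hypothetical. `v_from_eps` shows one way an ε-prediction from the teacher could be converted into a v target, though whether the trainer does exactly this is an assumption.

```python
import math


def noisy_sample(x0, eps, alpha, sigma):
    """x_t = alpha * x0 + sigma * eps."""
    return [alpha * x + sigma * e for x, e in zip(x0, eps)]


def v_target(x0, eps, alpha, sigma):
    """v* = alpha * eps - sigma * x0."""
    return [alpha * e - sigma * x for x, e in zip(x0, eps)]


def v_from_eps(x_t, eps, alpha, sigma):
    """Same v, computed from x_t and eps only (requires alpha**2 + sigma**2 == 1)."""
    return [(e - sigma * xt) / alpha for xt, e in zip(x_t, eps)]


def recover(x_t, v, alpha, sigma):
    """Invert the parameterization (requires alpha**2 + sigma**2 == 1)."""
    x0 = [alpha * xt - sigma * vi for xt, vi in zip(x_t, v)]
    eps = [sigma * xt + alpha * vi for xt, vi in zip(x_t, v)]
    return x0, eps


# Arbitrary toy values standing in for latents and noise.
alpha, sigma = math.cos(0.3), math.sin(0.3)
x0, eps = [0.5, -1.0], [0.2, 0.7]
x_t = noisy_sample(x0, eps, alpha, sigma)
v = v_target(x0, eps, alpha, sigma)
x0_rec, eps_rec = recover(x_t, v, alpha, sigma)
```

Because `x₀` and `ε` are both recoverable from `(x_t, v)`, a v-predicting student can be dropped into samplers that internally work with either quantity.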
243
+ ### Per-Block KD
244
+ Cosine similarity on spatial-pooled features:
245
+ ```
246
+ L_kd = 1 - cosine_sim(
247
+ student_features.mean(spatial),
248
+ teacher_features.mean(spatial)
249
+ )
250
+ ```
251
+
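A framework-free sketch of this loss is shown below, with each feature map represented as a list of channels holding flattened spatial activations. The helper names and the `eps` guard against zero norms are assumptions. With `[B, C, H, W]` torch tensors the same quantity would presumably be `1 - F.cosine_similarity(s.mean(dim=(2, 3)), t.mean(dim=(2, 3)), dim=1)`.

```python
import math


def spatial_mean(features):
    """Average each channel over its flattened H*W activations."""
    return [sum(channel) / len(channel) for channel in features]


def cosine_sim(a, b, eps=1e-8):
    """Cosine similarity of two vectors, guarded against zero norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm, eps)


def kd_loss(student_features, teacher_features):
    """L_kd = 1 - cos(pooled student, pooled teacher); ranges over [0, 2]."""
    return 1.0 - cosine_sim(
        spatial_mean(student_features), spatial_mean(teacher_features)
    )


teacher = [[1.0, 3.0], [2.0, 2.0]]         # pooled to [2, 2]
scaled_student = [[2.0, 6.0], [4.0, 4.0]]  # pooled to [4, 4], same direction
aligned = kd_loss(scaled_student, teacher)
```

The loss is invariant to feature scale and, because of the pooling, blind to spatial layout: it only constrains the direction of the per-channel mean activations.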
252
+ ## Dependencies
253
+
254
+ ```
255
+ torch >= 2.0
256
+ diffusers >= 0.21
257
+ transformers >= 4.30
258
+ safetensors >= 0.3
259
+ huggingface_hub >= 0.16
260
+ ```
261
+
262
+ Plus custom repo: `geovocab2` (for David model and data synthesis)
263
+
264
+ ## Hardware Requirements
265
+
266
+ - **Training**: A100 40GB (FP16 mixed precision)
267
+ - **Inference**: RTX 3090 / A6000 (24GB)
268
+ - **Storage**: ~10GB for checkpoints + logs
269
+
270
+ ## Reproducibility
271
+
272
+ Training uses a fixed seed (42), but exact reproduction is not guaranteed:
273
+ - Depends on David checkpoint version
274
+ - May be sensitive to hardware (GPU type)
275
+ - Synthetic data generation has randomness
276
+
277
+ ## Limitations
278
+
279
+ 1. **Untested**: No validation that this works
280
+ 2. **SD1.5 Only**: Hardcoded for SD1.5 architecture
281
+ 3. **David Dependency**: Requires specific pre-trained model
282
+ 4. **Synthetic Data**: Trained on generated prompts, not real captions
283
+ 5. **No Safety**: Inherits SD1.5 biases, no content filtering
284
+
285
+ ## Ethical Considerations
286
+
287
+ - Inherits biases from SD1.5 training data
288
+ - No additional safety measures implemented
289
+ - Should not be deployed without content filtering
290
+ - Research purposes only
291
+
292
+ ## Citation
293
+
294
+ ```bibtex
295
+ @software{sd15flowmatch2024,
296
+ author = {AbstractPhil},
297
+ title = {SD1.5 Flow-Matching with Geometric Guidance (Experimental)},
298
+ year = {2024},
299
+ url = {https://huggingface.co/AbstractPhil/[model-name]},
300
+ note = {Experimental distillation approach, results unvalidated}
301
+ }
302
+ ```
303
+
304
+ ## License
305
+
306
+ MIT License
307
+
308
+ ## Related Work
309
+
310
+ - [GeoDavidCollective](https://huggingface.co/AbstractPhil/geo-david-collective-sd15-base-e40): Geometric assessor model
311
+ - [Stable Diffusion 1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5): Teacher model
312
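+ - Flow Matching: Training objective that regresses a velocity field (the v-prediction target used here originates in the progressive distillation paper by Salimans and Ho)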
313
+
314
+ ---
315
+
316
+ **Current Status:** 🧪 Experimental training in progress
317
+
318
+ **Do not use for production** - validation pending