---
license: mit
base_model: runwayml/stable-diffusion-v1-5
tags:
  - stable-diffusion
  - diffusion
  - distillation
  - flow-matching
  - geometric-deep-learning
  - research
library_name: diffusers
pipeline_tag: text-to-image
---

# SD1.5 Flow-Matching Distillation with Geometric Guidance (EXPERIMENTAL)

## ⚠️ Experimental Research

**Status:** Training in progress | No guarantees of convergence or quality

This is an experimental approach to distilling Stable Diffusion 1.5 using flow matching with geometric guidance from [GeoDavidCollective](https://huggingface.co/AbstractPhil/geo-david-collective-sd15-base-e40). Results are not yet validated.

## Overview

This trainer attempts to distill Stable Diffusion 1.5 using **v-prediction flow matching** with **adaptive per-block weighting** based on geometric quality assessment. Unlike traditional distillation, which treats all UNet blocks equally, this approach uses a pre-trained geometric model (David) to evaluate student features and dynamically adjust training emphasis per block.

**Hypothesis:** Geometric guidance may help the student learn SD1.5's internal structure more effectively by:

- Identifying which blocks are learning poorly
- Applying stronger supervision where needed
- Maintaining geometric stability during training

**Status:** Hypothesis untested. Requires an ablation study comparing David-guided vs. vanilla flow matching.

## Architecture

### Three-Component System

```
Teacher (SD1.5 UNet, frozen, FP16)
  ↓ provides ε* → v* targets + features
Student (Trainable UNet, FP16)
  ↓ predicts v̂ + features
Flow Matching Loss: MSE(v̂, v*)
+
David Assessor (GeoDavidCollective, frozen, 872M params)
  ↓ evaluates student features per block
  ↓ outputs: e_t (timestep error), e_p (pattern entropy), coh (coherence)
Fusion System: λ_b = w_b · (1 + α·e_t + β·e_p + δ·(1-coh))
  ↓ converts metrics to per-block penalties
Block Losses: Σ λ_b · (KD loss per block)
Total: L_flow + block_penalty_weight · L_blocks
```

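A minimal sketch of how the last two lines of the diagram combine, assuming scalar per-block KD losses and that `L_blocks` is the plain λ-weighted sum shown (not a mean). Where the separate `KD Weight: 0.25` from the loss config enters is not specified, so the sketch leaves it out; `total_loss` and its arguments are hypothetical names.

```python
def total_loss(l_flow, kd_per_block, lambdas, block_penalty_weight=0.05):
    """Total = L_flow + block_penalty_weight * L_blocks (hypothetical helper).

    l_flow:        scalar flow-matching loss MSE(v̂, v*)
    kd_per_block:  block name -> KD loss for that block
    lambdas:       block name -> fused penalty λ_b from David
    """
    # L_blocks = Σ_b λ_b · KD_b, summed over blocks as in the diagram (assumption: not averaged)
    l_blocks = sum(lambdas[name] * kd for name, kd in kd_per_block.items())
    return l_flow + block_penalty_weight * l_blocks


# Usage with made-up per-block values, only to show the call shape
loss = total_loss(0.23, {"mid": 0.03, "up_0": 0.04}, {"mid": 1.2, "up_0": 1.1})
```

Because `block_penalty_weight` multiplies the whole block sum, it sets how strongly David's per-block weighting can pull against the global flow objective.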
### Components

**Teacher**: SD1.5 UNet (frozen, FP16)

- Provides the v* targets (via ε*) for flow matching
- Extracts spatial features per block

**Student**: Trainable UNet (FP16)

- Initialized from teacher weights
- Learns the v-prediction objective
- Features assessed by David

**David**: GeoDavidCollective (frozen)

- Pre-trained geometric model
- Evaluates feature quality per block
- Provides adaptive weighting signals

**Fusion**: Dynamic penalty calculator

- `λ_b = w_b · (1 + α·e_t + β·e_p + δ·(1-coh))`
- Clamped to `[0.5, 3.0]`
- Higher λ = more training emphasis on that block

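A minimal sketch of the fusion step, assuming David's metrics arrive as plain floats for one block at a time; the real trainer may compute this over tensors. `BASE_WEIGHTS` and `fuse_lambda` are hypothetical names, and the defaults are copied from the David Fusion config under Training Configuration.

```python
# Base block weights w_b, copied from the David Fusion config
BASE_WEIGHTS = {
    "down_0": 0.7, "down_1": 0.9, "down_2": 1.0, "down_3": 1.1,
    "mid": 1.2, "up_0": 1.1, "up_1": 1.0, "up_2": 0.9, "up_3": 0.7,
}


def fuse_lambda(block, e_t, e_p, coh, alpha=0.5, beta=0.25, delta=0.25,
                lam_min=0.5, lam_max=3.0):
    """λ_b = w_b · (1 + α·e_t + β·e_p + δ·(1 - coh)), clamped to [lam_min, lam_max]."""
    raw = BASE_WEIGHTS[block] * (1.0 + alpha * e_t + beta * e_p + delta * (1.0 - coh))
    # Clamp so a single assessment cannot zero out or blow up a block's weight
    return min(max(raw, lam_min), lam_max)


# Usage with hypothetical metric values for the mid block
lam_mid = fuse_lambda("mid", e_t=0.8, e_p=0.4, coh=0.6)
```

With zero timestep and pattern error and `coh = 1`, λ_b reduces to the base weight w_b, so the base weights set each block's default emphasis and David's metrics only scale it from there.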
## Training Configuration

### Dataset

```yaml
Source: SymbolicPromptDataset (synthetic prompts)
Samples: 200,000
Batch Size: 64
Epochs: 10
Workers: 2
```

### Optimization

```yaml
Optimizer: AdamW
Learning Rate: 1e-4
Weight Decay: 1e-3
Scheduler: CosineAnnealingLR
Gradient Clipping: 1.0
Mixed Precision: Enabled (FP16)
```

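`CosineAnnealingLR` follows a closed-form cosine curve (as given in the PyTorch docs) when nothing else changes the learning rate. This sketch evaluates it for the config above, assuming `eta_min = 0` (PyTorch's default) and that the scheduler is stepped once per optimizer step with `T_max` equal to the total step count; the card does not say whether it steps per batch or per epoch. The function name is hypothetical.

```python
import math


def cosine_annealing_lr(step, total_steps, base_lr=1e-4, eta_min=0.0):
    """Closed form of CosineAnnealingLR with T_max = total_steps:

    lr(t) = eta_min + (base_lr - eta_min) * (1 + cos(pi * t / T_max)) / 2
    """
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * step / total_steps)) / 2


# Assumption: one scheduler step per optimizer step over the whole run
steps_per_epoch = 200_000 // 64       # 3,125
total_steps = steps_per_epoch * 10    # 31,250
for step in (0, total_steps // 2, total_steps):
    print(step, cosine_annealing_lr(step, total_steps))
```

The learning rate starts at 1e-4, passes 5e-5 at the midpoint, and reaches `eta_min` at the final step.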
### Loss Weights

```yaml
Global Flow Weight: 1.0
Block Penalty Weight: 0.05  # Critical hyperparameter!
KD Weight: 0.25 (cosine similarity on pooled features)
Local Flow Heads: Disabled
```

### David Fusion

```yaml
Base Block Weights:
  down_0: 0.7
  down_1: 0.9
  down_2: 1.0
  down_3: 1.1
  mid: 1.2
  up_0: 1.1
  up_1: 1.0
  up_2: 0.9
  up_3: 0.7
Fusion Coefficients:
  alpha (timestep): 0.5
  beta (pattern): 0.25
  delta (incoherence): 0.25
Lambda Bounds: [0.5, 3.0]
```

## Training Progress (Epoch 1/10)

### Current Metrics

```
L_total: 0.24
L_flow: 0.23
L_blocks: 0.07
Speed: ~1.5 it/s (A100)
```

**Interpretation:**

- Block losses are balanced after adjusting `block_penalty_weight` (now 0.05)
- Flow loss is converging as expected
- No evidence of collapse or divergence yet

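A back-of-envelope estimate of run length from the dataset config and the measured speed, assuming one "it" is one optimizer step over a batch of 64 and ignoring checkpointing and evaluation overhead. At ~1.5 it/s this works out to roughly 35 minutes per epoch and just under 6 hours for all 10 epochs.

```python
# Back-of-envelope run length; assumes one "it" = one optimizer step over a batch of 64
samples, batch_size, epochs = 200_000, 64, 10
its_per_second = 1.5  # measured speed from Current Metrics (A100)

steps_per_epoch = samples // batch_size               # 3,125 steps
seconds_per_epoch = steps_per_epoch / its_per_second  # ~2,083 s
hours_per_epoch = seconds_per_epoch / 3600
total_hours = hours_per_epoch * epochs
print(f"{steps_per_epoch} steps/epoch, {seconds_per_epoch / 60:.1f} min/epoch, {total_hours:.1f} h total")
```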
### Expected Timeline (Unvalidated)

```
Epoch 1-2: Loss stabilization
Epoch 3-5: Feature structure learning (images may be blurry)
Epoch 8-10: Potential convergence (quality unknown)
```

**Note:** No baseline comparison yet. Cannot claim faster/better convergence without ablation study.

## Model Files

Training saves checkpoints as:

```
checkpoints/
├── checkpoint_epoch_002.safetensors
├── checkpoint_epoch_004.safetensors
└── final.safetensors
```

Each checkpoint contains student UNet weights only.

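## Inference

The student can be sampled with standard diffusion samplers (DDPM, DDIM) configured for v-prediction. A sketch using `diffusers`, untested against these checkpoints:

```python
import torch
from diffusers import DDIMScheduler, StableDiffusionPipeline
from safetensors.torch import load_file

# VAE, text encoder and tokenizer come from the base SD1.5 repo
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
# Swap in the student UNet (assumes the checkpoint uses diffusers UNet2DConditionModel key names)
pipe.unet.load_state_dict(load_file("checkpoints/final.safetensors"))
# The student predicts v, so the sampler must be configured for v-prediction
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config, prediction_type="v_prediction")
image = pipe("a red cube on a blue sphere", num_inference_steps=50).images[0]
image.save("sample.png")
```

Requires the SD1.5 VAE and text encoder (not included in the checkpoint).
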
## Known Issues

- ❓ No proof this approach works better than vanilla distillation
- ❓ Optimal `block_penalty_weight` unknown (currently 0.05)
- ❓ May require tuning lambda bounds for different datasets
- ❓ Inference quality unvalidated
- ❌ Not compatible with ComfyUI without conversion (details TBD)
- ❌ No SD1.5 components included (VAE, text encoder)

## Future Work

### Required Validation

1. **Ablation Study**: Train an identical model WITHOUT David guidance
2. **Quality Metrics**: FID, CLIP score vs. SD1.5 baseline
3. **Convergence Analysis**: Compare learning curves
4. **Inference Testing**: Visual quality assessment

### Potential Improvements

- Adaptive `block_penalty_weight` scheduling
- Per-block learning rates
- David warmup strategy
- Better fusion formulas

## Experimental Design

### Hypothesis

Geometric guidance from David will improve distillation by:

1. Identifying poorly learning blocks
2. Applying adaptive supervision
3. Maintaining feature geometry

### Test Plan

```
Control: SD1.5 flow matching (no David)
Treatment: SD1.5 flow matching + David guidance
Metrics: Loss curves, FID, CLIP score, visual quality
```

### Success Criteria

- Faster convergence (fewer epochs to target loss)
- Better final quality (lower FID)
- More stable training (less variance)

**Status:** Experiment in progress, no results yet.

## Technical Details

### David Assessment

Per block, David outputs:

- `e_t`: Cross-entropy on timestep classification (proxy for temporal understanding)
- `e_p`: Entropy on pattern classification (proxy for feature diversity)
- `coh`: Cantor alpha (geometric coherence metric)

These metrics are converted to per-block penalty multipliers `λ_b` by the fusion formula.

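A minimal sketch of the first two metrics, assuming David exposes raw logits from a timestep classifier and a pattern classifier at each block. The number of timestep bins and pattern classes, and all function names here, are hypothetical; `coh` (Cantor alpha) is specific to the David model and is not sketched. Pure Python keeps the math visible; a tensor version would use `torch.nn.functional.log_softmax` and `torch.nn.functional.cross_entropy`.

```python
import math


def log_softmax(logits):
    # Subtract the max before exponentiating for numerical stability
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def timestep_error(timestep_logits, true_bin):
    """e_t: cross-entropy of the predicted timestep distribution against the true bin."""
    return -log_softmax(timestep_logits)[true_bin]


def pattern_entropy(pattern_logits):
    """e_p: Shannon entropy (in nats) of the predicted pattern distribution."""
    log_p = log_softmax(pattern_logits)
    return -sum(math.exp(lp) * lp for lp in log_p)


# Hypothetical logits for one block: 4 timestep bins, 3 pattern classes
e_t = timestep_error([2.0, 0.5, -1.0, 0.0], true_bin=0)
e_p = pattern_entropy([0.2, 0.1, 0.3])
```

Under the fusion formula, a higher `e_t` (David cannot tell which timestep the features came from) or a higher `e_p` raises that block's λ_b.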
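### Flow Matching

v-prediction objective:

```
x_t = α · x₀ + σ · ε  (noised latent)
v* = α · ε - σ · x₀  (target)
v̂ = student(x_t, t)  (prediction)
L_flow = MSE(v̂, v*)
```

Where α = √ᾱ_t and σ = √(1 - ᾱ_t) come from the noise schedule (this α is unrelated to the fusion coefficient α).
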
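A pure-Python sketch of the target construction for a single latent, assuming the trainer keeps SD1.5's `scaled_linear` DDPM schedule (β from 0.00085 to 0.012 over 1000 steps). It pairs one x₀ with one ε; in the trainer the ε side comes from the teacher (ε* → v* in the diagram), which this sketch does not model. Helper names are hypothetical. The round trip through `x0_from_v` checks the algebra: since α² + σ² = 1, α·x_t - σ·v returns x₀.

```python
import math

# Assumed SD1.5 DDPM schedule: scaled_linear betas from 0.00085 to 0.012 over 1000 steps
T, BETA_START, BETA_END = 1000, 0.00085, 0.012
_lo, _hi = math.sqrt(BETA_START), math.sqrt(BETA_END)
betas = [(_lo + (_hi - _lo) * i / (T - 1)) ** 2 for i in range(T)]

# ᾱ_t = Π_{s<=t} (1 - β_s)
alphas_cumprod = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alphas_cumprod.append(running)


def alpha_sigma(t):
    """α = √ᾱ_t and σ = √(1 - ᾱ_t) for integer timestep t."""
    abar = alphas_cumprod[t]
    return math.sqrt(abar), math.sqrt(1.0 - abar)


def add_noise(x0, eps, t):
    a, s = alpha_sigma(t)
    return [a * x + s * e for x, e in zip(x0, eps)]


def v_target(x0, eps, t):
    a, s = alpha_sigma(t)
    return [a * e - s * x for x, e in zip(x0, eps)]


def x0_from_v(x_t, v, t):
    # α·x_t - σ·v = (α² + σ²)·x₀ = x₀
    a, s = alpha_sigma(t)
    return [a * xt - s * vv for xt, vv in zip(x_t, v)]


x0, eps, t = [0.5, -1.0, 2.0], [0.1, 0.3, -0.2], 500
x_t = add_noise(x0, eps, t)
v = v_target(x0, eps, t)
print(x0_from_v(x_t, v, t))  # recovers x0 up to float rounding
```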
### Per-Block KD

Cosine similarity on spatially pooled features:

```
L_kd = 1 - cosine_sim(
    student_features.mean(spatial),
    teacher_features.mean(spatial)
)
```

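A pure-Python sketch of this loss for one block and one sample, assuming features arrive as a list of channels, each flattened over its spatial positions (`[C][H*W]`); the function names are hypothetical. For `[B, C, H, W]` tensors the same quantity in PyTorch is `1 - F.cosine_similarity(s.mean(dim=(2, 3)), t.mean(dim=(2, 3)), dim=1)`.

```python
import math


def spatial_mean(features):
    """Average each channel over its spatial positions: [C][H*W] -> [C]."""
    return [sum(channel) / len(channel) for channel in features]


def kd_loss(student_features, teacher_features, eps=1e-8):
    """L_kd = 1 - cos(mean_spatial(student), mean_spatial(teacher))."""
    s = spatial_mean(student_features)
    t = spatial_mean(teacher_features)
    dot = sum(a * b for a, b in zip(s, t))
    norm_s = math.sqrt(sum(a * a for a in s))
    norm_t = math.sqrt(sum(b * b for b in t))
    # eps guards against division by zero for all-zero features
    return 1.0 - dot / max(norm_s * norm_t, eps)
```

The loss ranges from 0 (pooled vectors point the same way) to 2 (opposite directions). Because pooling discards spatial layout, it only aligns per-channel averages, not where in the feature map activations occur.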
## Dependencies

```
torch >= 2.0
diffusers >= 0.21
transformers >= 4.30
safetensors >= 0.3
huggingface_hub >= 0.16
```

Plus custom repo: `geovocab2` (for David model and data synthesis)

## Hardware Requirements

- **Training**: A100 40GB (FP16 mixed precision)
- **Inference**: RTX 3090 / A6000 (24GB)
- **Storage**: ~10GB for checkpoints + logs

## Reproducibility

Training uses a fixed seed (42), but results may still vary:

- Depends on David checkpoint version
- May be sensitive to hardware (GPU type)
- Synthetic data generation has randomness

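The trainer's seeding code is not shown here; a common pattern for a PyTorch trainer looks like this sketch (`seed_everything` is a hypothetical name). Even with every RNG seeded, some CUDA kernels remain nondeterministic and results can differ across GPU types, consistent with the caveats above.

```python
import random


def seed_everything(seed=42):
    """Seed the Python, NumPy and PyTorch RNGs a typical trainer touches (hypothetical helper)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)           # seeds the CPU and all CUDA device generators
        torch.cuda.manual_seed_all(seed)  # explicit; silently ignored without a GPU
        # Trade speed for repeatable cuDNN kernel selection
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass


seed_everything(42)
```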
## Limitations

1. **Untested**: No validation that this works
2. **SD1.5 Only**: Hardcoded for the SD1.5 architecture
3. **David Dependency**: Requires a specific pre-trained model
4. **Synthetic Data**: Trained on generated prompts, not real captions
5. **No Safety**: Inherits SD1.5 biases, no content filtering

## Ethical Considerations

- Inherits biases from SD1.5 training data
- No additional safety measures implemented
- Should not be deployed without content filtering
- Research purposes only

## Citation

```bibtex
@software{sd15flowmatch2024,
  author = {AbstractPhil},
  title = {SD1.5 Flow-Matching with Geometric Guidance (Experimental)},
  year = {2024},
  url = {https://huggingface.co/AbstractPhil/[model-name]},
  note = {Experimental distillation approach, results unvalidated}
}
```

## License

MIT License

## Related Work

- [GeoDavidCollective](https://huggingface.co/AbstractPhil/geo-david-collective-sd15-base-e40): Geometric assessor model
- [Stable Diffusion 1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5): Teacher model
- Flow Matching: Training objective that regresses a velocity field (a generative modeling framework, not a distillation technique)

---

**Current Status:** 🧪 Experimental training in progress

**Do not use for production** - validation pending