AbstractPhil/tinyflux-experts

AbstractPhil committed 16 days ago

Commit 9e7c0a4 (verified) · 1 parent: f924f65

Update README.md

README.md +426 -3

@@ -1,3 +1,426 @@
----
-license: mit
----

+---
+license: mit
+---
+# Flow Matching & Diffusion Prediction Types
+## A Practical Guide to Sol, Lune, and Epsilon Prediction
+---
+## Overview
+This document covers three distinct prediction paradigms used in diffusion and flow-matching models. Each was designed for different purposes and requires specific sampling procedures.
+| Model | Prediction Type | What It Learned | Output Character |
+|-------|----------------|-----------------|------------------|
+| **Standard SD1.5** | ε (epsilon/noise) | Remove noise | General purpose |
+| **Sol** | v (velocity) via DDPM | Geometric structure | Flat silhouettes, mass placement |
+| **Lune** | v (velocity) via flow | Texture and detail | Rich, detailed images |
+---
+SD15-Flow-Sol (velocity prediction, converted to epsilon for sampling):
+![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/FeF5L08KaozTq8X4TXaTU.png)
+SD15-Flow-Lune (rectified flow shift=3):
+![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/a33DpYjD_cwdfXm43SlS8.png)
+TinyFlux-Lailah:
+![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/9Ek_vTrMDQUA1id37Lwys.png)
+## 1. Epsilon (ε) Prediction — Standard Diffusion
+### Core Concept
+> **"Predict the noise that was added"**
+
+The model learns to identify and remove noise from corrupted images.
+### The Formula (Simplified)
+```
+TRAINING:
+    x_noisy = √(α) * x_clean + √(1-α) * noise
+    ↓
+    Model predicts: ε̂ = "what noise was added?"
+    ↓
+    Loss = ||ε̂ - noise||²
+SAMPLING:
+    Start with pure noise
+    Repeatedly ask: "what noise is in this?"
+    Subtract a fraction of predicted noise
+    Repeat until clean
+```
+### Reading the Math
+- **α (alpha)**: The cumulative signal level `alphas_cumprod[t]`, i.e. "how much original image remains" (1 = all original, 0 = all noise); the image itself is scaled by √α
+- **√(1-α)**: The scale of the noise that was mixed in
+- **ε**: The actual noise that was added
+- **ε̂**: Model's guess of what noise was added
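As a concrete reading of the formula above, the following minimal sketch corrupts a single scalar "pixel" and then inverts the corruption using the true noise. It uses plain Python instead of tensors, and the value `alpha = 0.36` is an arbitrary illustrative choice, not a value taken from any real scheduler.

```python
import math

def add_noise(x_clean, noise, alpha):
    """Forward corruption: x_noisy = sqrt(alpha) * x_clean + sqrt(1 - alpha) * noise."""
    return math.sqrt(alpha) * x_clean + math.sqrt(1.0 - alpha) * noise

def remove_noise(x_noisy, noise_pred, alpha):
    """Invert the corruption given a noise estimate (exact only if the estimate is exact)."""
    return (x_noisy - math.sqrt(1.0 - alpha) * noise_pred) / math.sqrt(alpha)

alpha = 0.36  # illustrative cumulative alpha: sqrt(alpha) = 0.6, sqrt(1 - alpha) = 0.8
x_clean, noise = 2.0, -1.0
x_noisy = add_noise(x_clean, noise, alpha)         # 0.6 * 2.0 + 0.8 * (-1.0) = 0.4
x_recovered = remove_noise(x_noisy, noise, alpha)  # back to 2.0, up to rounding
```

A trained model only approximates the noise, so in practice the recovered value is an estimate that sampling refines over many steps.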
+### Training Process
+```python
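+import torch
+import torch.nn.functional as F
+
+# Forward diffusion (corruption); t is a LongTensor of timesteps, shape (batch,)
+noise = torch.randn_like(x_clean)
+α = scheduler.alphas_cumprod.to(x_clean.device)[t].view(-1, 1, 1, 1)  # broadcast over (C, H, W)
+x_noisy = torch.sqrt(α) * x_clean + torch.sqrt(1 - α) * noise
+# Model predicts noise
+ε_pred = model(x_noisy, t)
+# Loss: "Did you correctly identify the noise?"
+loss = F.mse_loss(ε_pred, noise)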
+```
+### Sampling Process
+```python
+# DDPM/DDIM sampling (diffusers-style scheduler)
+for t in scheduler.timesteps:  # descending, e.g. 999 → 0
+    ε_pred = model(x, t)
+    x = scheduler.step(ε_pred, t, x).prev_sample  # Removes predicted noise
+```
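To make "subtract a fraction of predicted noise" concrete, here is a minimal sketch of a deterministic DDIM-style update on a single scalar, in plain Python. The three-entry `alphas_cumprod` table and the `oracle_model` (which returns the true noise rather than a learned estimate) are illustrative assumptions. A real pipeline would use a trained network and a library scheduler, and would start from pure noise rather than from a known corrupted sample.

```python
import math

def ddim_step(x, eps_pred, alpha_t, alpha_prev):
    """One deterministic DDIM update (eta = 0) from noise level alpha_t to alpha_prev."""
    # Estimate the clean sample implied by the predicted noise
    x0_hat = (x - math.sqrt(1.0 - alpha_t) * eps_pred) / math.sqrt(alpha_t)
    # Re-noise that estimate to the next (less noisy) level
    return math.sqrt(alpha_prev) * x0_hat + math.sqrt(1.0 - alpha_prev) * eps_pred

# Toy schedule of cumulative alphas, ordered from most to least noisy (assumed values)
alphas_cumprod = [0.04, 0.36, 0.81]
x_clean, true_noise = 1.5, 0.7

def oracle_model(x, alpha_t):
    """Stand-in for a trained network: returns the exact noise that was added."""
    return true_noise

# Start from the most corrupted sample so the oracle's answer is consistent
x = math.sqrt(alphas_cumprod[0]) * x_clean + math.sqrt(1.0 - alphas_cumprod[0]) * true_noise
for i, alpha_t in enumerate(alphas_cumprod):
    alpha_prev = alphas_cumprod[i + 1] if i + 1 < len(alphas_cumprod) else 1.0
    x = ddim_step(x, oracle_model(x, alpha_t), alpha_t, alpha_prev)
# With a perfect noise estimate, x ends at x_clean (up to floating-point error)
```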
+### Utility & Behavior
+- **Strength**: General-purpose image generation
+- **Weakness**: No explicit understanding of image structure
+- **Use case**: Standard text-to-image generation
+---
+## 2. Velocity (v) Prediction — Sol (DDPM Framework)
+### Core Concept
+> **"Predict the direction from noise to data"**
+
+Sol predicts velocity but operates within the DDPM scheduler framework, requiring conversion from velocity to epsilon for sampling.
+### The Formula (Simplified)
+```
+TRAINING:
+    x_t = α * x_clean + σ * noise       (same as DDPM)
+    v = α * noise - σ * x_clean         (velocity target)
+    ↓
+    Model predicts: v̂ = "which way is the image?"
+    ↓
+    Loss = ||v̂ - v||²
+SAMPLING:
+    Convert velocity → epsilon
+    Use standard DDPM scheduler stepping
+```
+### Reading the Math
+- **v (velocity)**: Direction vector in latent space
+- **α (alpha)**: √(α_cumprod) — signal strength
+- **σ (sigma)**: √(1 - α_cumprod) — noise strength
+- **The velocity formula**: `v = α * ε - σ * x₀`
+  - "Velocity is the signal-weighted noise minus noise-weighted data"
+### Why Velocity in DDPM?
+Sol was trained with David (the geometric assessor) providing loss weighting. This setup used:
+- DDPM noise schedule for interpolation
+- Velocity prediction for training target
+- Knowledge distillation from a teacher
+The result: Sol learned **geometric structure** rather than textures.
+### Training Process (David-Weighted)
+```python
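+import torch
+
+# DDPM-style corruption
+noise = torch.randn_like(latents)
+batch = latents.shape[0]
+t = torch.randint(0, 1000, (batch,), device=latents.device)
+alphas_cumprod = scheduler.alphas_cumprod.to(latents.device)[t].view(-1, 1, 1, 1)
+α = torch.sqrt(alphas_cumprod)      # signal strength
+σ = torch.sqrt(1 - alphas_cumprod)  # noise strength
+x_t = α * latents + σ * noise
+# Velocity target (NOT epsilon!)
+v_target = α * noise - σ * latents
+# Model predicts velocity
+v_pred = model(x_t, t)
+# David assesses geometric quality → adjusts loss weights
+# (david_assessor, features, and weighted_MSE come from the training setup and are not defined here)
+loss_weights = david_assessor(features, t)
+loss = weighted_MSE(v_pred, v_target, loss_weights)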
+```
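The `weighted_MSE` helper is not defined in this guide. One plausible reading, shown here only as an assumption, is a per-sample mean squared error that is scaled by one weight per sample and then averaged over the batch. The sketch uses plain Python lists in place of tensors; the name `weighted_mse` and the weight values are hypothetical and do not describe the actual David-weighted loss.

```python
def weighted_mse(pred, target, weights):
    """Hypothetical per-sample weighted MSE.

    pred, target: list of samples, each a flat list of floats.
    weights: one non-negative float per sample.
    """
    per_sample = [
        sum((p - t) ** 2 for p, t in zip(p_row, t_row)) / len(p_row)
        for p_row, t_row in zip(pred, target)
    ]
    return sum(w * e for w, e in zip(weights, per_sample)) / len(per_sample)

v_pred = [[1.0, 2.0], [0.0, 0.0]]
v_target = [[1.0, 0.0], [1.0, 1.0]]
loss_uniform = weighted_mse(v_pred, v_target, [1.0, 1.0])   # plain MSE: (2.0 + 1.0) / 2
loss_weighted = weighted_mse(v_pred, v_target, [0.5, 1.5])  # second sample counts more
```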
+### Sampling Process (CRITICAL: v → ε conversion)
+```python
+import torch
+from diffusers import DDPMScheduler
+
+# Must convert velocity to epsilon for DDPM scheduler
+scheduler = DDPMScheduler(num_train_timesteps=1000)
+scheduler.set_timesteps(steps)  # steps: number of sampling steps, chosen by the caller
+for t in scheduler.timesteps:  # descending toward 0; exact values depend on steps and timestep spacing
+    v_pred = model(x, t)
+    # Convert velocity → epsilon
+    α = torch.sqrt(scheduler.alphas_cumprod[t])
+    σ = torch.sqrt(1 - scheduler.alphas_cumprod[t])
+    # Solve: v = α*ε - σ*x₀ and x_t = α*x₀ + σ*ε
+    # Result: x₀ = (α*x_t - σ*v) / (α² + σ²)   (α² + σ² = 1 under DDPM)
+    #         ε = (x_t - α*x₀) / σ
+    x0_hat = (α * x - σ * v_pred) / (α**2 + σ**2)
+    ε_hat = (x - α * x0_hat) / σ
+    x = scheduler.step(ε_hat, t, x).prev_sample  # Standard DDPM step with epsilon
+```
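Because α² + σ² = 1 for a DDPM schedule, the two-equation solve above reduces to closed forms: `x₀ = α·x_t - σ·v` and `ε = σ·x_t + α·v`. The following minimal sketch checks this round trip on scalars in plain Python. The function name `v_to_x0_eps` and the value `alphas_cumprod_t = 0.64` are illustrative assumptions.

```python
import math

def v_to_x0_eps(x_t, v, alphas_cumprod_t):
    """Convert a velocity prediction into (x0, eps) under a DDPM schedule."""
    alpha = math.sqrt(alphas_cumprod_t)        # signal strength
    sigma = math.sqrt(1.0 - alphas_cumprod_t)  # noise strength
    x0 = alpha * x_t - sigma * v   # equals (alpha*x_t - sigma*v) / (alpha**2 + sigma**2)
    eps = sigma * x_t + alpha * v  # equals (x_t - alpha*x0) / sigma, without the division
    return x0, eps

# Build x_t and the velocity target from known data and noise
alphas_cumprod_t = 0.64
alpha, sigma = math.sqrt(alphas_cumprod_t), math.sqrt(1.0 - alphas_cumprod_t)
data, noise = 0.5, -2.0
x_t = alpha * data + sigma * noise
v = alpha * noise - sigma * data

x0_hat, eps_hat = v_to_x0_eps(x_t, v, alphas_cumprod_t)  # recovers (data, noise)
```

The direct form for ε avoids dividing by σ, which becomes small at the low-noise end of the schedule.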
+### Utility & Behavior
+- **What Sol learned**: Platonic forms, silhouettes, mass distribution
+- **Visual output**: Flat geometric shapes, correct spatial layout, no texture
+- **Why this happened**: David rewarded geometric coherence, so Sol optimized for clean David classification
+- **Use case**: Structural guidance, composition anchoring, "what goes where"
+### Sol's Unique Property
+Sol never "collapsed" — it learned the **skeleton** of images:
+- Castle prompt → Castle silhouette, horizon line, sky gradient
+- Portrait prompt → Head oval, shoulder mass, figure-ground separation
+- City prompt → Building masses, street perspective, light positions
+This is the "WHAT before HOW" that most diffusion models skip.
+
+---
+## 3. Velocity (v) Prediction — Lune (Rectified Flow)
+### Core Concept
+> **"Predict the straight-line direction from noise to data"**
+
+Lune uses true rectified flow matching where data travels in straight lines through latent space.
+### The Formula (Simplified)
+```
+TRAINING:
+    x_t = σ * noise + (1-σ) * data      (linear interpolation)
+    v = noise - data                     (constant velocity)
+    ↓
+    Model predicts: v̂ = "straight line to noise"
+    ↓
+    Loss = ||v̂ - v||²
+SAMPLING:
+    Start at σ=1 (noise)
+    Walk OPPOSITE to velocity (toward data)
+    End at σ=0 (clean image)
+```
+### Reading the Math
+- **σ (sigma)**: Interpolation parameter (1 = noise, 0 = data)
+- **x_t = σ·noise + (1-σ)·data**: Linear blend between noise and data
+- **v = noise - data**: The velocity is CONSTANT along the path
+- **Shift function**: `σ' = shift·σ / (1 + (shift-1)·σ)`
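+  - For shift > 1 this pushes σ toward 1 (with shift=3, σ=0.5 becomes 0.75), so a uniform σ grid is concentrated at high noise levels: more, smaller steps go to the noisy end of the path and fewer, larger steps to the clean end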
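A quick numerical check helps when reading the shift formula. The minimal sketch below applies it to a uniform 4-step schedule in plain Python. The helper body is an assumption based directly on the formula above, and the step count is arbitrary.

```python
def shift_sigma(sigma, shift):
    """Timestep shift: sigma' = shift * sigma / (1 + (shift - 1) * sigma)."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)

steps = 4
uniform = [1.0 - i / steps for i in range(steps + 1)]   # 1.0, 0.75, 0.5, 0.25, 0.0
shifted = [shift_sigma(s, shift=3.0) for s in uniform]  # 1.0, 0.9, 0.75, 0.5, 0.0
# Differences between consecutive sigmas: about 0.1, 0.15, 0.25, 0.5
step_sizes = [a - b for a, b in zip(shifted[:-1], shifted[1:])]
```

The endpoints stay fixed at 1 and 0, while the interior values move toward 1, so the first steps are the smallest and the last step is the largest.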
+### Key Difference from Sol
+| Aspect | Sol | Lune |
+|--------|-----|------|
+| Interpolation | DDPM (α, σ from scheduler) | Linear (σ, 1-σ) |
+| Velocity meaning | Complex (α·ε - σ·x₀) | Simple (noise - data) |
+| Sampling | Convert v→ε, use scheduler | Direct Euler integration |
+| Output | Geometric skeletons | Detailed images |
+### Training Process
+```python
+import torch
+import torch.nn.functional as F
+
+# Linear interpolation (NOT DDPM schedule!)
+noise = torch.randn_like(latents)
+σ = torch.rand(latents.shape[0], device=latents.device)  # Random sigma in [0, 1)
+# Apply shift during training (shift is a hyperparameter, e.g. 3.0)
+σ_shifted = (shift * σ) / (1 + (shift - 1) * σ)
+σ = σ_shifted.view(-1, 1, 1, 1)
+x_t = σ * noise + (1 - σ) * latents
+# Velocity target: direction FROM data TO noise
+v_target = noise - latents
+# Model predicts velocity
+v_pred = model(x_t, σ_shifted * 1000)  # Timestep = σ * 1000, shape (batch,)
+loss = F.mse_loss(v_pred, v_target)
+```
+### Sampling Process (Direct Euler)
+```python
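+import torch
+
+def shift_sigma(sigmas, shift):
+    # σ' = shift·σ / (1 + (shift-1)·σ)
+    return shift * sigmas / (1 + (shift - 1) * sigmas)
+
+# Start from pure noise (σ = 1)
+x = torch.randn(1, 4, 64, 64)
+# Sigma schedule: 1 → 0 with shift (steps = number of Euler steps, chosen by the caller)
+sigmas = torch.linspace(1, 0, steps + 1)
+sigmas = shift_sigma(sigmas, shift=3.0)
+for i in range(steps):
+    σ = sigmas[i]
+    σ_next = sigmas[i + 1]
+    dt = σ - σ_next  # Positive (going from 1 toward 0)
+    timestep = (σ * 1000).expand(x.shape[0])  # one timestep per batch element
+    v_pred = model(x, timestep)
+    # SUBTRACT velocity (v points toward noise, we go toward data)
+    x = x - v_pred * dt
+# x is now clean image latent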
+```
+### Why SUBTRACT the Velocity?
+```
+v = noise - data  (points FROM data TO noise)
+We want to go FROM noise TO data (opposite direction!)
+So: x_new = x_current - v * dt
+         = x_current - (noise - data) * dt
+         = x_current + (data - noise) * dt  ← Moving toward data ✓
+```
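The sign convention can be checked with a toy integration. This minimal sketch runs the Euler update on a single scalar in plain Python, using an `oracle_velocity` that returns the exact constant velocity in place of a trained model. The function names and the numeric values are illustrative assumptions.

```python
def euler_sample(x, velocity_fn, sigmas):
    """Integrate from sigmas[0] down to sigmas[-1] with Euler steps, subtracting v * dt."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        dt = sigma - sigma_next  # positive, since sigma decreases
        x = x - velocity_fn(x, sigma) * dt
    return x

data, noise = 3.0, -1.0

def oracle_velocity(x, sigma):
    """Stand-in for a trained model: the exact constant velocity noise - data."""
    return noise - data

sigmas = [1.0, 0.75, 0.5, 0.25, 0.0]
x_final = euler_sample(noise, oracle_velocity, sigmas)  # starts at x = noise (sigma = 1)
```

With the exact velocity, the path is a straight line and `x_final` lands on `data` for any number of steps. Adding the velocity instead would move away from the data.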
+### Utility & Behavior
+- **What Lune learned**: Rich textures, fine details, realistic rendering
+- **Visual output**: Full detailed images with lighting, materials, depth
+- **Training focus**: Portrait/pose data with caption augmentation
+- **Use case**: High-quality image generation, detail refinement
+---
+## Comparison Summary
+### Training Targets
+```
+EPSILON (ε):     target = noise
+                 "What random noise was added?"
+VELOCITY (Sol):  target = α·noise - σ·data
+                 "What's the DDPM-weighted direction?"
+VELOCITY (Lune): target = noise - data
+                 "What's the straight-line direction?"
+```
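The three targets can be compared side by side for the same data and noise. The sketch below does this for one scalar sample in plain Python. The function name `training_targets` and the value `alphas_cumprod_t = 0.36` are illustrative assumptions, and the Lune target uses no DDPM schedule value at all.

```python
import math

def training_targets(data, noise, alphas_cumprod_t):
    """Return the regression target for each prediction type, for one scalar sample."""
    alpha = math.sqrt(alphas_cumprod_t)        # DDPM signal strength
    sigma = math.sqrt(1.0 - alphas_cumprod_t)  # DDPM noise strength
    return {
        "epsilon": noise,                              # standard noise prediction
        "velocity_sol": alpha * noise - sigma * data,  # DDPM-weighted velocity
        "velocity_lune": noise - data,                 # rectified-flow velocity
    }

targets = training_targets(data=1.0, noise=2.0, alphas_cumprod_t=0.36)
# alpha = 0.6, sigma = 0.8, so:
#   epsilon = 2.0, velocity_sol = 0.6 * 2.0 - 0.8 * 1.0 = 0.4, velocity_lune = 1.0
```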
+### Sampling Directions
+```
+EPSILON:         x_new = scheduler.step(ε_pred, t, x)
+                 Scheduler handles noise removal internally
+VELOCITY (Sol):  Convert v → ε, then scheduler.step(ε, t, x)
+                 Must translate to epsilon for DDPM math
+VELOCITY (Lune): x_new = x - v_pred * dt
+                 Direct Euler integration, subtract velocity
+```
+### Visual Intuition
+```
+EPSILON:
+    "There's noise hiding the image"
+    "I'll predict and remove the noise layer by layer"
+    → General-purpose denoising
+VELOCITY (Sol):
+    "I know which direction the image is"
+    "But I speak through DDPM's noise schedule"
+    → Learned structure, outputs skeletons
+VELOCITY (Lune):
+    "Straight line from noise to image"
+    "I'll walk that line step by step"
+    → Learned detail, outputs rich images
+```
+---
+## Practical Implementation Checklist
+### For Epsilon Models (Standard SD1.5)
+- [ ] Use DDPM/DDIM/Euler scheduler
+- [ ] Pass timestep as integer [0, 999]
+- [ ] Scheduler handles everything
+### For Sol (Velocity + DDPM)
+- [ ] Use DDPMScheduler
+- [ ] Model outputs velocity, NOT epsilon
+- [ ] Convert: `x0 = (α·x - σ·v) / (α² + σ²)`, then `ε = (x - α·x0) / σ`
+- [ ] Call `scheduler.step(ε, t, x)`
+- [ ] Expect geometric/structural output
+### For Lune (Velocity + Flow)
+- [ ] NO scheduler needed — direct Euler
+- [ ] Sigma goes 1 → 0 (not 0 → 1!)
+- [ ] Apply shift: `σ' = shift·σ / (1 + (shift-1)·σ)`
+- [ ] Timestep to model: `σ * 1000`
+- [ ] SUBTRACT velocity: `x = x - v * dt`
+- [ ] Expect detailed textured output
+---
+## Why This Matters for TinyFlux
+TinyFlux can leverage both experts:
+1. **Sol (early sampling steps, high noise)**: Provides geometric anchoring
+   - "Where should the castle be?"
+   - "What's the horizon line?"
+   - "How is mass distributed?"
+2. **Lune (mid/late sampling steps, lower noise)**: Provides detail refinement
+   - "What texture is the stone?"
+   - "How does light fall?"
+   - "What color is the sky?"
+By combining geometric structure (Sol) with textural detail (Lune), TinyFlux can achieve better composition AND quality than either alone.
+
+---
+## Quick Reference Card
+```
+┌─────────────────────────────────────────────────────────────┐
+│                    PREDICTION TYPES                         │
+├─────────────────────────────────────────────────────────────┤
+│ EPSILON (ε)                                                 │
+│   Train: target = noise                                     │
+│   Sample: scheduler.step(ε_pred, t, x)                      │
+│   Output: General images                                    │
+├─────────────────────────────────────────────────────────────┤
+│ VELOCITY - SOL (DDPM framework)                             │
+│   Train: target = α·ε - σ·x₀                                │
+│   Sample: v→ε conversion, then scheduler.step(ε, t, x)      │
+│   Output: Geometric skeletons                               │
+├─────────────────────────────────────────────────────────────┤
+│ VELOCITY - LUNE (Rectified Flow)                            │
+│   Train: target = noise - data                              │
+│   Sample: x = x - v·dt  (Euler, σ: 1→0)                     │
+│   Output: Detailed textured images                          │
+└─────────────────────────────────────────────────────────────┘
+```
+---
+*Document Version: 1.0*
+*Last Updated: January 2026*
+*Authors: AbstractPhil & Claude OPUS 4.5*
+License: MIT