---
license: mit
---

# Flow Matching & Diffusion Prediction Types

## A Practical Guide to Sol, Lune, and Epsilon Prediction

---

## Overview

This document covers three distinct prediction paradigms used in diffusion and flow-matching models. Each was designed for a different purpose and requires its own sampling procedure.

| Model | Prediction Type | What It Learned | Output Character |
|-------|-----------------|-----------------|------------------|
| **Standard SD1.5** | ε (epsilon/noise) | Remove noise | General purpose |
| **Sol** | v (velocity) via DDPM | Geometric structure | Flat silhouettes, mass placement |
| **Lune** | v (velocity) via flow | Texture and detail | Rich, detailed images |

---

SD15-Flow-Sol (velocity prediction, converted to epsilon for sampling):

https://huggingface.co/AbstractPhil/tinyflux-experts/resolve/main/inference_sd15_flow_sol.py



SD15-Flow-Lune (rectified flow, shift=2):

https://huggingface.co/AbstractPhil/tinyflux-experts/resolve/main/inference_sd15_flow_lune.py



TinyFlux-Lailah

TinyFlux is currently in training and planning, and is not yet ready for production use.

https://huggingface.co/AbstractPhil/tiny-flux-deep



## 1. Epsilon (ε) Prediction: Standard Diffusion

### Core Concept

> **"Predict the noise that was added"**

The model learns to identify and remove noise from corrupted images.

### The Formula (Simplified)

```
TRAINING:
  x_noisy = √(α) * x_clean + √(1-α) * noise
      ↓
  Model predicts: ε̂ = "what noise was added?"
      ↓
  Loss = ||ε̂ - noise||²

SAMPLING:
  Start with pure noise
  Repeatedly ask: "what noise is in this?"
  Subtract a fraction of the predicted noise
  Repeat until clean
```

### Reading the Math

- **α (alpha)**: How much of the original image remains (1 = all original, 0 = all noise)
- **√(1-α)**: How much noise was mixed in
- **ε**: The actual noise that was added
- **ε̂**: The model's guess of what noise was added

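The corruption formula can be sanity-checked with scalar values. This is a toy sketch (the numbers are arbitrary, not from any real schedule): if the model's ε̂ were exactly right, the clean value could be recovered in one step.

```python
import math

# Toy scalar sketch (arbitrary values, not a real noise schedule)
alpha = 0.7                      # how much original image remains
x_clean, noise = 2.0, -0.5

# Forward diffusion: mix signal and noise
x_noisy = math.sqrt(alpha) * x_clean + math.sqrt(1 - alpha) * noise

# A perfect noise prediction (ε̂ = noise) inverts the corruption exactly
x_recovered = (x_noisy - math.sqrt(1 - alpha) * noise) / math.sqrt(alpha)
print(round(x_recovered, 6))  # 2.0
```

In practice the prediction is imperfect at every step, which is why sampling subtracts only a fraction of the predicted noise at a time.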
### Training Process

```python
# Forward diffusion (corruption)
noise = torch.randn_like(x_clean)
α = scheduler.alphas_cumprod[t]
x_noisy = α.sqrt() * x_clean + (1 - α).sqrt() * noise

# Model predicts the noise
ε_pred = model(x_noisy, t)

# Loss: "did you correctly identify the noise?"
loss = F.mse_loss(ε_pred, noise)
```

### Sampling Process

```python
# DDPM/DDIM sampling
for t in reversed(timesteps):  # 999 → 0
    ε_pred = model(x, t)
    x = scheduler.step(ε_pred, t, x).prev_sample  # removes the predicted noise
```

### Utility & Behavior

- **Strength**: General-purpose image generation
- **Weakness**: No explicit understanding of image structure
- **Use case**: Standard text-to-image generation

---

## 2. Velocity (v) Prediction: Sol (DDPM Framework)

### Core Concept

> **"Predict the direction from noise to data"**

Sol predicts velocity but operates within the DDPM scheduler framework, requiring conversion from velocity to epsilon for sampling.

### The Formula (Simplified)

```
TRAINING:
  x_t = α * x_clean + σ * noise   (same as DDPM)
  v   = α * noise - σ * x_clean   (velocity target)
      ↓
  Model predicts: v̂ = "which way is the image?"
      ↓
  Loss = ||v̂ - v||²

SAMPLING:
  Convert velocity → epsilon
  Use standard DDPM scheduler stepping
```

### Reading the Math

- **v (velocity)**: A direction vector in latent space
- **α (alpha)**: √(α_cumprod) → signal strength
- **σ (sigma)**: √(1 - α_cumprod) → noise strength
- **The velocity formula**: `v = α * ε - σ * x₀`
  - "Velocity is the signal-weighted noise minus the noise-weighted data"

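These identities can be checked with scalars. A small sketch (arbitrary values): build x_t and v from a known (x₀, ε) pair, then invert with the same formulas; since α² + σ² = 1 in DDPM, the recovery is exact.

```python
import math

# Scalar sketch of the velocity identities (arbitrary values)
alpha_bar = 0.6
α = math.sqrt(alpha_bar)        # signal strength
σ = math.sqrt(1 - alpha_bar)    # noise strength
x0, ε = 1.5, -0.3

x_t = α * x0 + σ * ε            # DDPM interpolation
v = α * ε - σ * x0              # velocity target

# Invert: x₀ = (α·x_t - σ·v) / (α² + σ²), then ε = (x_t - α·x₀) / σ
x0_rec = (α * x_t - σ * v) / (α**2 + σ**2)
ε_rec = (x_t - α * x0_rec) / σ
print(round(x0_rec, 6), round(ε_rec, 6))  # 1.5 -0.3
```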
### Why Velocity in DDPM?

Sol was trained with David (the geometric assessor) providing loss weighting. This setup used:

- The DDPM noise schedule for interpolation
- Velocity prediction as the training target
- Knowledge distillation from a teacher

The result: Sol learned **geometric structure** rather than textures.

### Training Process (David-Weighted)

```python
# DDPM-style corruption
noise = torch.randn_like(latents)
t = torch.randint(0, 1000, (batch,))
α = scheduler.alphas_cumprod[t].sqrt().view(-1, 1, 1, 1)
σ = (1 - scheduler.alphas_cumprod[t]).sqrt().view(-1, 1, 1, 1)

x_t = α * latents + σ * noise

# Velocity target (NOT epsilon!)
v_target = α * noise - σ * latents

# Model predicts velocity
v_pred = model(x_t, t)

# David assesses geometric quality → adjusts loss weights
loss_weights = david_assessor(features, t)
loss = weighted_MSE(v_pred, v_target, loss_weights)
```

### Sampling Process (CRITICAL: v → ε conversion)

```python
# Must convert velocity to epsilon for the DDPM scheduler
scheduler = DDPMScheduler(num_train_timesteps=1000)

for t in scheduler.timesteps:  # 999, 966, 933, ... → 0
    v_pred = model(x, t)

    # Convert velocity → epsilon
    α = scheduler.alphas_cumprod[t].sqrt()
    σ = (1 - scheduler.alphas_cumprod[t]).sqrt()

    # Solve: v = α*ε - σ*x₀ and x_t = α*x₀ + σ*ε
    # Result: x₀ = (α*x_t - σ*v) / (α² + σ²)
    #         ε  = (x_t - α*x₀) / σ
    x0_hat = (α * x - σ * v_pred) / (α**2 + σ**2)
    ε_hat = (x - α * x0_hat) / σ

    x = scheduler.step(ε_hat, t, x).prev_sample  # standard DDPM step with epsilon
```

### Utility & Behavior

- **What Sol learned**: Platonic forms, silhouettes, mass distribution
- **Visual output**: Flat geometric shapes, correct spatial layout, no texture
- **Why this happened**: David rewarded geometric coherence, so Sol optimized for clean David classification
- **Use case**: Structural guidance, composition anchoring, "what goes where"

### Sol's Unique Property

Sol never "collapsed": it learned the **skeleton** of images.

- Castle prompt → castle silhouette, horizon line, sky gradient
- Portrait prompt → head oval, shoulder mass, figure-ground separation
- City prompt → building masses, street perspective, light positions

This is the "WHAT before HOW" that most diffusion models skip.

---

## 3. Velocity (v) Prediction: Lune (Rectified Flow)

### Core Concept

> **"Predict the straight-line direction from noise to data"**

Lune uses true rectified flow matching, in which data travels in straight lines through latent space.

### The Formula (Simplified)

```
TRAINING:
  x_t = σ * noise + (1-σ) * data   (linear interpolation)
  v   = noise - data               (constant velocity)
      ↓
  Model predicts: v̂ = "straight line to noise"
      ↓
  Loss = ||v̂ - v||²

SAMPLING:
  Start at σ=1 (noise)
  Walk OPPOSITE to the velocity (toward data)
  End at σ=0 (clean image)
```

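That the velocity is constant along the path can be verified numerically. A scalar sketch (arbitrary values): the slope of x_t with respect to σ is the same everywhere, and equals noise - data.

```python
# Scalar sketch: the slope of x_t along σ is constant (arbitrary values)
data, noise = 1.0, -0.5

def x_t(σ):
    # Linear interpolation between data (σ=0) and noise (σ=1)
    return σ * noise + (1 - σ) * data

dσ = 1e-3
slopes = [(x_t(σ + dσ) - x_t(σ)) / dσ for σ in (0.1, 0.5, 0.9)]
print([round(s, 6) for s in slopes])  # [-1.5, -1.5, -1.5]
```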
### Reading the Math

- **σ (sigma)**: Interpolation parameter (1 = noise, 0 = data)
- **x_t = σ·noise + (1-σ)·data**: Linear blend between noise and data
- **v = noise - data**: The velocity is CONSTANT along the path
- **Shift function**: `σ' = shift·σ / (1 + (shift-1)·σ)`
  - Biases sampling toward cleaner images (spends more steps refining)

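The shift function can be probed directly. A small sketch: the endpoints σ=0 and σ=1 are fixed, while mid-range values are pushed toward the noisy end, which reshapes how a uniform step grid allocates its steps.

```python
def shift_sigma(σ, shift=3.0):
    # Rectified-flow shift: σ' = shift·σ / (1 + (shift-1)·σ)
    return (shift * σ) / (1 + (shift - 1) * σ)

# Endpoints are fixed; mid-range σ moves toward 1
print(shift_sigma(0.0), shift_sigma(1.0))  # 0.0 1.0
print(shift_sigma(0.5))                    # 0.75
print(shift_sigma(0.5, shift=2.0))         # ~0.667
```

A larger `shift` bends the schedule more aggressively; `shift=1` is the identity (no reshaping).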
### Key Difference from Sol

| Aspect | Sol | Lune |
|--------|-----|------|
| Interpolation | DDPM (α, σ from scheduler) | Linear (σ, 1-σ) |
| Velocity meaning | Complex (α·ε - σ·x₀) | Simple (noise - data) |
| Sampling | Convert v→ε, use scheduler | Direct Euler integration |
| Output | Geometric skeletons | Detailed images |

### Training Process

```python
# Linear interpolation (NOT the DDPM schedule!)
noise = torch.randn_like(latents)
σ = torch.rand(batch)  # random sigma in [0, 1]

# Apply the shift during training
σ = (shift * σ) / (1 + (shift - 1) * σ)
timesteps = σ * 1000       # timestep fed to the model
σ = σ.view(-1, 1, 1, 1)    # broadcast over latent dims

x_t = σ * noise + (1 - σ) * latents

# Velocity target: direction FROM data TO noise
v_target = noise - latents

# Model predicts velocity
v_pred = model(x_t, timesteps)

loss = F.mse_loss(v_pred, v_target)
```

### Sampling Process (Direct Euler)

```python
def shift_sigma(σ, shift):
    return (shift * σ) / (1 + (shift - 1) * σ)

# Start from pure noise (σ = 1)
x = torch.randn(1, 4, 64, 64)

# Sigma schedule: 1 → 0, with shift applied
sigmas = torch.linspace(1, 0, steps + 1)
sigmas = shift_sigma(sigmas, shift=3.0)

for i in range(steps):
    σ = sigmas[i]
    σ_next = sigmas[i + 1]
    dt = σ - σ_next  # positive (going from 1 toward 0)

    timestep = σ * 1000
    v_pred = model(x, timestep)

    # SUBTRACT the velocity (v points toward noise; we go toward data)
    x = x - v_pred * dt

# x is now a clean image latent
```

### Why SUBTRACT the Velocity?

```
v = noise - data   (points FROM data TO noise)

We want to go FROM noise TO data (the opposite direction!)

So: x_new = x_current - v * dt
          = x_current - (noise - data) * dt
          = x_current + (data - noise) * dt   ← moving toward data ✓
```

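This can be confirmed end to end with scalars. A toy sketch (arbitrary values): starting at pure noise and Euler-stepping with the exact constant velocity lands exactly on the data, regardless of the step count.

```python
# Toy sketch: Euler integration with the exact velocity recovers the data
data, noise = 0.8, -1.2
v = noise - data       # constant true velocity
x = noise              # start at σ = 1

steps = 4
dt = 1.0 / steps       # uniform steps from σ=1 down to σ=0
for _ in range(steps):
    x = x - v * dt     # subtract: move toward the data

print(round(x, 6))  # 0.8
```

With a learned (imperfect) velocity the path is only approximately straight, which is why real sampling still uses many small steps and a shifted schedule.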
### Utility & Behavior

- **What Lune learned**: Rich textures, fine details, realistic rendering
- **Visual output**: Fully detailed images with lighting, materials, depth
- **Training focus**: Portrait/pose data with caption augmentation
- **Use case**: High-quality image generation, detail refinement

---

## Comparison Summary

### Training Targets

```
EPSILON (ε):       target = noise
                   "What random noise was added?"

VELOCITY (Sol):    target = α·noise - σ·data
                   "What's the DDPM-weighted direction?"

VELOCITY (Lune):   target = noise - data
                   "What's the straight-line direction?"
```

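For one (data, noise) pair the three targets can be computed side by side. A scalar sketch with arbitrary values (ᾱ = 0.5, so α = σ = √0.5):

```python
import math

# One (data, noise) pair, three training targets (arbitrary scalar values)
data, noise = 1.0, 0.4
alpha_bar = 0.5
α = math.sqrt(alpha_bar)        # signal strength
σ = math.sqrt(1 - alpha_bar)    # noise strength

target_eps = noise                   # epsilon prediction
target_sol = α * noise - σ * data    # DDPM velocity (Sol)
target_lune = noise - data           # rectified-flow velocity (Lune)

print(target_eps, round(target_sol, 4), target_lune)  # 0.4 -0.4243 -0.6
```

Same pair, three different regression targets: this is the whole difference between the paradigms at training time.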
### Sampling Directions

```
EPSILON:          x_new = scheduler.step(ε_pred, t, x)
                  Scheduler handles noise removal internally

VELOCITY (Sol):   Convert v → ε, then scheduler.step(ε, t, x)
                  Must translate to epsilon for the DDPM math

VELOCITY (Lune):  x_new = x - v_pred * dt
                  Direct Euler integration, subtract the velocity
```

### Visual Intuition

```
EPSILON:
  "There's noise hiding the image"
  "I'll predict and remove the noise layer by layer"
  → General-purpose denoising

VELOCITY (Sol):
  "I know which direction the image is"
  "But I speak through DDPM's noise schedule"
  → Learned structure, outputs skeletons

VELOCITY (Lune):
  "Straight line from noise to image"
  "I'll walk that line step by step"
  → Learned detail, outputs rich images
```

---

## Practical Implementation Checklist

### For Epsilon Models (Standard SD1.5)

- [ ] Use a DDPM/DDIM/Euler scheduler
- [ ] Pass the timestep as an integer in [0, 999]
- [ ] The scheduler handles everything else

### For Sol (Velocity + DDPM)

- [ ] Use DDPMScheduler
- [ ] The model outputs velocity, NOT epsilon
- [ ] Convert: `x0 = (α·x - σ·v) / (α² + σ²)`, then `ε = (x - α·x0) / σ`
- [ ] Call `scheduler.step(ε, t, x)`
- [ ] Expect geometric/structural output

### For Lune (Velocity + Flow)

- [ ] NO scheduler needed: direct Euler
- [ ] Sigma goes 1 → 0 (not 0 → 1!)
- [ ] Apply the shift: `σ' = shift·σ / (1 + (shift-1)·σ)`
- [ ] Timestep to the model: `σ * 1000`
- [ ] SUBTRACT the velocity: `x = x - v * dt`
- [ ] Expect detailed, textured output

---

## Why This Matters for TinyFlux

TinyFlux can leverage both experts:

1. **Sol (early timesteps)**: Provides geometric anchoring
   - "Where should the castle be?"
   - "What's the horizon line?"
   - "How is mass distributed?"

2. **Lune (mid/late timesteps)**: Provides detail refinement
   - "What texture is the stone?"
   - "How does light fall?"
   - "What color is the sky?"

By combining geometric structure (Sol) with textural detail (Lune), TinyFlux can achieve better composition AND quality than either alone.

---

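One way such a handoff could look, purely as a hypothetical sketch (this is NOT the TinyFlux implementation; `pick_expert` and the `handoff` threshold are invented for illustration): route each sampling step to Sol while σ is high, and to Lune once σ drops.

```python
# Hypothetical routing sketch, NOT the TinyFlux implementation.
# `pick_expert` and `handoff` are invented for illustration.
def pick_expert(σ, handoff=0.6):
    """Sol anchors structure at high noise; Lune refines detail at low noise."""
    return "sol" if σ > handoff else "lune"

schedule = [1.0, 0.9, 0.75, 0.5, 0.25, 0.0]
print([pick_expert(σ) for σ in schedule])
# ['sol', 'sol', 'sol', 'lune', 'lune', 'lune']
```

A real router could also blend the two predictions near the threshold rather than switching hard.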
## Quick Reference Card

```
┌────────────────────────────────────────────────────────┐
│ PREDICTION TYPES                                       │
├────────────────────────────────────────────────────────┤
│ EPSILON (ε)                                            │
│   Train:  target = noise                               │
│   Sample: scheduler.step(ε_pred, t, x)                 │
│   Output: General images                               │
├────────────────────────────────────────────────────────┤
│ VELOCITY - SOL (DDPM framework)                        │
│   Train:  target = α·ε - σ·x₀                          │
│   Sample: v→ε conversion, then scheduler.step(ε, t, x) │
│   Output: Geometric skeletons                          │
├────────────────────────────────────────────────────────┤
│ VELOCITY - LUNE (Rectified Flow)                       │
│   Train:  target = noise - data                        │
│   Sample: x = x - v·dt (Euler, σ: 1→0)                 │
│   Output: Detailed textured images                     │
└────────────────────────────────────────────────────────┘
```

---

*Document Version: 1.0*
*Last Updated: January 2026*
*Authors: AbstractPhil & Claude OPUS 4.5*

License: MIT