---
license: mit
---
# Flow Matching & Diffusion Prediction Types
## A Practical Guide to Sol, Lune, and Epsilon Prediction

---

## Overview

This document covers three distinct prediction paradigms used in diffusion and flow-matching models. Each was designed for a different purpose and requires its own sampling procedure.

| Model | Prediction Type | What It Learned | Output Character |
|-------|----------------|-----------------|------------------|
| **Standard SD1.5** | ε (epsilon/noise) | Remove noise | General purpose |
| **Sol** | v (velocity) via DDPM | Geometric structure | Flat silhouettes, mass placement |
| **Lune** | v (velocity) via flow | Texture and detail | Rich, detailed images |

---

SD15-Flow-Sol (velocity prediction, converted to epsilon for sampling):

https://huggingface.co/AbstractPhil/tinyflux-experts/resolve/main/inference_sd15_flow_sol.py

![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/FeF5L08KaozTq8X4TXaTU.png)


SD15-Flow-Lune (rectified flow shift=2):

https://huggingface.co/AbstractPhil/tinyflux-experts/resolve/main/inference_sd15_flow_lune.py

![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/a33DpYjD_cwdfXm43SlS8.png)


TinyFlux-Lailah

TinyFlux is still in training and planning and is not yet ready for production use.

https://huggingface.co/AbstractPhil/tiny-flux-deep 

![image](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/9Ek_vTrMDQUA1id37Lwys.png)


## 1. Epsilon (ε) Prediction — Standard Diffusion

### Core Concept
> **"Predict the noise that was added"**

The model learns to identify and remove noise from corrupted images.

### The Formula (Simplified)

```
TRAINING:
    x_noisy = √(α) * x_clean + √(1-α) * noise
    ↓
    Model predicts: ε̂ = "what noise was added?"
    ↓
    Loss = ||ε̂ - noise||²

SAMPLING:
    Start with pure noise
    Repeatedly ask: "what noise is in this?"
    Subtract a fraction of predicted noise
    Repeat until clean
```

### Reading the Math

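- **α (alpha)**: "How much original image remains" (1 = all original, 0 = all noise). In this section α is the scheduler's cumulative value `alphas_cumprod[t]`; Section 2 reuses the symbol for its square root.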
- **√(1-α)**: "How much noise was mixed in"
- **ε**: The actual noise that was added
- **ε̂**: Model's guess of what noise was added
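To make the symbols concrete, the sketch below noises a tiny list of values and then inverts the mixture using the true noise. It uses plain Python scalars instead of tensors, and the names `add_noise`, `recover_clean`, and the value `alpha_bar = 0.36` are illustrative assumptions, not taken from any real schedule or from the models described here.

```python
import math
import random

def add_noise(x_clean, noise, alpha_bar):
    """Forward diffusion: mix signal and noise with weights sqrt(a) and sqrt(1 - a)."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + s * n for x, n in zip(x_clean, noise)]

def recover_clean(x_noisy, eps, alpha_bar):
    """Invert the mixture given a noise estimate: x0 = (x_t - sqrt(1 - a) * eps) / sqrt(a)."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - s * e) / a for x, e in zip(x_noisy, eps)]

rng = random.Random(0)
x_clean = [0.5, -1.0, 2.0]
noise = [rng.gauss(0.0, 1.0) for _ in x_clean]
alpha_bar = 0.36  # illustrative: signal weight sqrt(0.36), noise weight sqrt(0.64)

x_noisy = add_noise(x_clean, noise, alpha_bar)
# With a perfect epsilon prediction (here: the true noise) the clean values come back.
x_back = recover_clean(x_noisy, noise, alpha_bar)
```

This is why an accurate ε̂ is enough to denoise: once the noise is known, the clean signal follows from simple algebra.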

### Training Process

```python
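import torch
import torch.nn.functional as F

# Forward diffusion (corruption)
noise = torch.randn_like(x_clean)
α = scheduler.alphas_cumprod[t].view(-1, 1, 1, 1)  # cumulative alpha, broadcast over (B, C, H, W)
x_noisy = α.sqrt() * x_clean + (1 - α).sqrt() * noise

# Model predicts noise
ε_pred = model(x_noisy, t)

# Loss: "Did you correctly identify the noise?"
loss = F.mse_loss(ε_pred, noise)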
```

### Sampling Process

```python
# DDPM/DDIM sampling
for t in scheduler.timesteps:  # descending: 999 → 0
    ε_pred = model(x, t)
    x = scheduler.step(ε_pred, t, x).prev_sample  # Removes predicted noise
```
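The call to `scheduler.step` hides the update rule. As a rough illustration, the sketch below implements the deterministic DDIM update (eta = 0) on scalars: estimate the clean value from the predicted noise, then re-mix it at the next, lower noise level. The function name `ddim_step` and the `alpha_bar` values are hypothetical; real schedulers add details such as clipping, and DDPM also injects fresh noise at each step.

```python
import math

def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0) on a scalar sample."""
    # Estimate the clean sample implied by the noise prediction.
    x0_hat = (x_t - math.sqrt(1.0 - alpha_bar_t) * eps_pred) / math.sqrt(alpha_bar_t)
    # Re-mix that estimate with the same predicted noise at the lower noise level.
    return math.sqrt(alpha_bar_prev) * x0_hat + math.sqrt(1.0 - alpha_bar_prev) * eps_pred

# Toy check with a known clean value and known noise (illustrative numbers).
x0, eps = 1.5, -0.7
alpha_bar_t, alpha_bar_prev = 0.25, 0.81
x_t = math.sqrt(alpha_bar_t) * x0 + math.sqrt(1.0 - alpha_bar_t) * eps

x_prev = ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev)
# With a perfect noise prediction, x_prev is the same sample at the lower noise level.
expected = math.sqrt(alpha_bar_prev) * x0 + math.sqrt(1.0 - alpha_bar_prev) * eps
```

Stepping all the way to `alpha_bar_prev = 1.0` returns the clean estimate itself, which is the "repeat until clean" endpoint of the loop.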

### Utility & Behavior

- **Strength**: General-purpose image generation
- **Weakness**: No explicit understanding of image structure
- **Use case**: Standard text-to-image generation

---

## 2. Velocity (v) Prediction — Sol (DDPM Framework)

### Core Concept
> **"Predict the direction from noise to data"**

Sol predicts velocity but operates within the DDPM scheduler framework, requiring conversion from velocity to epsilon for sampling.

### The Formula (Simplified)

```
TRAINING:
    x_t = α * x_clean + σ * noise       (same as DDPM)
    v = α * noise - σ * x_clean         (velocity target)
    ↓
    Model predicts: v̂ = "which way is the image?"
    ↓
    Loss = ||v̂ - v||²

SAMPLING:
    Convert velocity → epsilon
    Use standard DDPM scheduler stepping
```

### Reading the Math

- **v (velocity)**: Direction vector in latent space
- **α (alpha)**: √(α_cumprod) — signal strength
- **σ (sigma)**: √(1 - α_cumprod) — noise strength
- **The velocity formula**: `v = α * ε - σ * x₀`
  - "Velocity is the signal-weighted noise minus noise-weighted data"
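One way to read the velocity formula, assuming α² + σ² = 1 (which follows from the definitions above): write α = cos φ and σ = sin φ. The noisy sample then moves along an arc as φ grows, and `v = α·ε - σ·x₀` is its derivative with respect to φ. The sketch below checks this numerically on scalars; the function names and numbers are illustrative only.

```python
import math

def x_at(phi, x0, eps):
    """Noisy sample on the variance-preserving path with alpha = cos(phi), sigma = sin(phi)."""
    return math.cos(phi) * x0 + math.sin(phi) * eps

def velocity(phi, x0, eps):
    """Velocity target v = alpha * eps - sigma * x0."""
    return math.cos(phi) * eps - math.sin(phi) * x0

x0, eps, phi, h = 0.8, -1.3, 0.6, 1e-6

# Central finite difference of x_t with respect to the angle phi.
finite_diff = (x_at(phi + h, x0, eps) - x_at(phi - h, x0, eps)) / (2 * h)
v = velocity(phi, x0, eps)
```

The two values agree up to finite-difference error, which is the sense in which the target is a "velocity" even inside the DDPM framework.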

### Why Velocity in DDPM?

Sol was trained with David (the geometric assessor) providing loss weighting. This setup used:
- DDPM noise schedule for interpolation
- Velocity prediction for training target
- Knowledge distillation from a teacher

The result: Sol learned **geometric structure** rather than textures.

### Training Process (David-Weighted)

```python
# DDPM-style corruption
noise = torch.randn_like(latents)
t = torch.randint(0, 1000, (batch,))
α = scheduler.alphas_cumprod[t].sqrt().view(-1, 1, 1, 1)        # signal strength
σ = (1 - scheduler.alphas_cumprod[t]).sqrt().view(-1, 1, 1, 1)  # noise strength

x_t = α * latents + σ * noise

# Velocity target (NOT epsilon!)
v_target = α * noise - σ * latents

# Model predicts velocity
v_pred = model(x_t, t)

# David assesses geometric quality → adjusts loss weights
loss_weights = david_assessor(features, t)
loss = weighted_MSE(v_pred, v_target, loss_weights)
```
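`weighted_MSE` and `david_assessor` are not defined in this document. As one plausible reading, the sketch below treats the weights as one scalar per sample and scales each sample's mean squared error before averaging over the batch. This is an assumption made purely for illustration; the actual weighting used to train Sol may differ.

```python
def weighted_mse(pred, target, weights):
    """Per-sample MSE scaled by a per-sample weight, then averaged over the batch.

    pred, target: list of samples, each a flat list of floats.
    weights: one float per sample (hypothetical stand-in for the assessor's output).
    """
    per_sample = [
        sum((p - t) ** 2 for p, t in zip(ps, ts)) / len(ps)
        for ps, ts in zip(pred, target)
    ]
    return sum(w * l for w, l in zip(weights, per_sample)) / len(per_sample)

pred = [[1.0, 2.0], [0.0, 0.0]]
target = [[1.0, 0.0], [1.0, 1.0]]

loss_uniform = weighted_mse(pred, target, [1.0, 1.0])   # reduces to plain MSE
loss_weighted = weighted_mse(pred, target, [2.0, 0.5])  # first sample counts more
```

With uniform weights the function reduces to ordinary MSE, so any change in behavior comes entirely from how the assessor distributes the weights.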

### Sampling Process (CRITICAL: v → ε conversion)

```python
# Must convert velocity to epsilon for DDPM scheduler
scheduler = DDPMScheduler(num_train_timesteps=1000)

for t in scheduler.timesteps:  # 999, 966, 933, ... → 0
    v_pred = model(x, t)
    
    # Convert velocity → epsilon
    α = scheduler.alphas_cumprod[t].sqrt()
    σ = (1 - scheduler.alphas_cumprod[t]).sqrt()

    # Solve: v = α*ε - σ*x₀ and x_t = α*x₀ + σ*ε
    # Result: x₀ = (α*x_t - σ*v) / (α² + σ²)   (the denominator equals 1 here)
    #         ε = (x_t - α*x₀) / σ

    x0_hat = (α * x - σ * v_pred) / (α**2 + σ**2)
    ε_hat = (x - α * x0_hat) / σ

    x = scheduler.step(ε_hat, t, x).prev_sample  # Standard DDPM step with epsilon
```
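Because α² + σ² = 1 under the definitions above, the two-step conversion collapses to `ε = σ·x_t + α·v`, which avoids dividing by σ. The scalar sketch below compares both forms; the function names are hypothetical and the numbers are arbitrary.

```python
import math

def v_to_eps_two_step(x_t, v, alpha, sigma):
    """Conversion as written above: recover x0 first, then solve for epsilon."""
    x0_hat = (alpha * x_t - sigma * v) / (alpha**2 + sigma**2)
    return (x_t - alpha * x0_hat) / sigma

def v_to_eps_closed_form(x_t, v, alpha, sigma):
    """Equivalent form when alpha^2 + sigma^2 = 1; no division by sigma."""
    return sigma * x_t + alpha * v

alpha_bar = 0.49
alpha, sigma = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)

# Build a consistent (x_t, v) pair from a known clean value and known noise.
x0, eps = 0.3, 1.1
x_t = alpha * x0 + sigma * eps
v = alpha * eps - sigma * x0

eps_a = v_to_eps_two_step(x_t, v, alpha, sigma)
eps_b = v_to_eps_closed_form(x_t, v, alpha, sigma)
```

Both functions return the original noise. The closed form may be preferable near the end of sampling, where σ becomes small and the division is poorly conditioned.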

### Utility & Behavior

- **What Sol learned**: Platonic forms, silhouettes, mass distribution
- **Visual output**: Flat geometric shapes, correct spatial layout, no texture
- **Why this happened**: David rewarded geometric coherence, Sol optimized for clean David classification
- **Use case**: Structural guidance, composition anchoring, "what goes where"

### Sol's Unique Property

Sol never "collapsed" — it learned the **skeleton** of images:
- Castle prompt → Castle silhouette, horizon line, sky gradient
- Portrait prompt → Head oval, shoulder mass, figure-ground separation
- City prompt → Building masses, street perspective, light positions

This is the "WHAT before HOW" that most diffusion models skip.

---

## 3. Velocity (v) Prediction — Lune (Rectified Flow)

### Core Concept
> **"Predict the straight-line direction from noise to data"**

Lune uses true rectified flow matching where data travels in straight lines through latent space.

### The Formula (Simplified)

```
TRAINING:
    x_t = σ * noise + (1-σ) * data      (linear interpolation)
    v = noise - data                     (constant velocity)
    ↓
    Model predicts: v̂ = "straight line to noise"
    ↓
    Loss = ||v̂ - v||²

SAMPLING:
    Start at σ=1 (noise)
    Walk OPPOSITE to velocity (toward data)
    End at σ=0 (clean image)
```

### Reading the Math

- **σ (sigma)**: Interpolation parameter (1 = noise, 0 = data)
- **x_t = σ·noise + (1-σ)·data**: Linear blend between noise and data
- **v = noise - data**: The velocity is CONSTANT along the path
- **Shift function**: `σ' = shift·σ / (1 + (shift-1)·σ)`
  - For shift > 1 this gives σ' ≥ σ, so the schedule spends more of its steps at high noise levels (for example, shift = 3 maps σ = 0.5 to σ' = 0.75)
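The shift function is easy to inspect on a handful of values. The sketch below is a scalar version written for illustration; the sampling code in this document calls a `shift_sigma` helper on a tensor, and the exact signature of that helper is not shown here, so the one below is an assumption.

```python
def shift_sigma(sigma, shift):
    """Shifted sigma: shift * sigma / (1 + (shift - 1) * sigma)."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)

uniform = [i / 4 for i in range(5)]               # 0.0, 0.25, 0.5, 0.75, 1.0
shifted = [shift_sigma(s, 3.0) for s in uniform]  # same grid after shift = 3
identity = [shift_sigma(s, 1.0) for s in uniform] # shift = 1 leaves sigma unchanged
```

With shift = 3 the endpoints 0 and 1 stay fixed, while 0.25 maps to 0.5 and 0.5 maps to 0.75.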

### Key Difference from Sol

| Aspect | Sol | Lune |
|--------|-----|------|
| Interpolation | DDPM (α, σ from scheduler) | Linear (σ, 1-σ) |
| Velocity meaning | Complex (α·ε - σ·x₀) | Simple (noise - data) |
| Sampling | Convert v→ε, use scheduler | Direct Euler integration |
| Output | Geometric skeletons | Detailed images |

### Training Process

```python
# Linear interpolation (NOT DDPM schedule!)
noise = torch.randn_like(latents)
σ = torch.rand(batch)  # Random sigma in [0, 1]

# Apply shift during training
σ_shifted = (shift * σ) / (1 + (shift - 1) * σ)
σ = σ_shifted.view(-1, 1, 1, 1)

x_t = σ * noise + (1 - σ) * latents

# Velocity target: direction FROM data TO noise
v_target = noise - latents

# Model predicts velocity
v_pred = model(x_t, σ_shifted * 1000)  # Timestep = σ * 1000, kept 1-D with shape (batch,)

loss = MSE(v_pred, v_target)
```

### Sampling Process (Direct Euler)

```python
# Start from pure noise (σ = 1)
x = torch.randn(1, 4, 64, 64)

# Sigma schedule: 1 → 0 with shift
sigmas = torch.linspace(1, 0, steps + 1)
sigmas = shift_sigma(sigmas, shift=3.0)

for i in range(steps):
    σ = sigmas[i]
    σ_next = sigmas[i + 1]
    dt = σ - σ_next  # Positive (going from 1 toward 0)
    
    timestep = σ * 1000
    v_pred = model(x, timestep)
    
    # SUBTRACT velocity (v points toward noise, we go toward data)
    x = x - v_pred * dt

# x is now clean image latent
```

### Why SUBTRACT the Velocity?

```
v = noise - data  (points FROM data TO noise)

We want to go FROM noise TO data (opposite direction!)

So: x_new = x_current - v * dt
         = x_current - (noise - data) * dt
         = x_current + (data - noise) * dt  ← Moving toward data ✓
```
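The sign convention can be checked on a toy problem. The sketch below runs the Euler update on a three-element list with a hypothetical "oracle" velocity for a single known data point, so no trained model is involved; `euler_sample` and `oracle_velocity` are illustrative names, and the schedule is unshifted for simplicity.

```python
def euler_sample(x_start, velocity_fn, sigmas):
    """Integrate x <- x - v * dt along a decreasing sigma schedule."""
    x = list(x_start)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        dt = sigma - sigma_next  # positive, since sigma decreases
        v = velocity_fn(x, sigma)
        x = [xi - vi * dt for xi, vi in zip(x, v)]
    return x

data = [2.0, -1.0, 0.5]
noise = [0.3, 1.2, -0.8]

def oracle_velocity(x, sigma):
    """Hypothetical perfect model for a single data point: v = noise - data."""
    return [n - d for n, d in zip(noise, data)]

steps = 10
sigmas = [1.0 - i / steps for i in range(steps + 1)]  # 1.0 down to 0.0, no shift

# Start at the noise sample (sigma = 1) and walk against the velocity.
result = euler_sample(noise, oracle_velocity, sigmas)
```

Because the oracle velocity is constant, the steps sum to a total displacement of `data - noise`, and `result` lands on `data` up to floating-point error. Flipping the sign of the update would move the sample away from the data instead.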

### Utility & Behavior

- **What Lune learned**: Rich textures, fine details, realistic rendering
- **Visual output**: Full detailed images with lighting, materials, depth
- **Training focus**: Portrait/pose data with caption augmentation
- **Use case**: High-quality image generation, detail refinement

---

## Comparison Summary

### Training Targets

```
EPSILON (ε):     target = noise
                 "What random noise was added?"

VELOCITY (Sol):  target = α·noise - σ·data  
                 "What's the DDPM-weighted direction?"

VELOCITY (Lune): target = noise - data
                 "What's the straight-line direction?"
```
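The three targets can be placed side by side for the same data and noise pair. The sketch below works on scalars; the function name `training_target`, the `kind` labels, and the sample values are assumptions chosen for illustration.

```python
import math

def training_target(kind, data, noise, alpha_bar=None):
    """Return the regression target for one scalar (data, noise) pair.

    kind: "epsilon", "v_ddpm" (Sol-style), or "v_flow" (Lune-style).
    alpha_bar: cumulative alpha, only needed for "v_ddpm".
    """
    if kind == "epsilon":
        return noise
    if kind == "v_ddpm":
        alpha, sigma = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
        return alpha * noise - sigma * data
    if kind == "v_flow":
        return noise - data
    raise ValueError(f"unknown prediction type: {kind}")

data, noise = 2.0, 0.5
targets = {
    "epsilon": training_target("epsilon", data, noise),
    "v_ddpm": training_target("v_ddpm", data, noise, alpha_bar=0.64),
    "v_flow": training_target("v_flow", data, noise),
}
```

Only the DDPM-style velocity depends on the noise level: as `alpha_bar` approaches 1 it tends to the noise, and as `alpha_bar` approaches 0 it tends to the negated data. The other two targets are the same at every timestep.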

### Sampling Directions

```
EPSILON:         x_new = scheduler.step(ε_pred, t, x)
                 Scheduler handles noise removal internally

VELOCITY (Sol):  Convert v → ε, then scheduler.step(ε, t, x)
                 Must translate to epsilon for DDPM math

VELOCITY (Lune): x_new = x - v_pred * dt
                 Direct Euler integration, subtract velocity
```

### Visual Intuition

```
EPSILON:
    "There's noise hiding the image"
    "I'll predict and remove the noise layer by layer"
    → General-purpose denoising

VELOCITY (Sol):  
    "I know which direction the image is"
    "But I speak through DDPM's noise schedule"
    → Learned structure, outputs skeletons

VELOCITY (Lune):
    "Straight line from noise to image"
    "I'll walk that line step by step"  
    → Learned detail, outputs rich images
```

---

## Practical Implementation Checklist

### For Epsilon Models (Standard SD1.5)
- [ ] Use DDPM/DDIM/Euler scheduler
- [ ] Pass timestep as integer [0, 999]
- [ ] Scheduler handles everything

### For Sol (Velocity + DDPM)
- [ ] Use DDPMScheduler
- [ ] Model outputs velocity, NOT epsilon
- [ ] Convert: `x0 = (α·x - σ·v) / (α² + σ²)`, then `ε = (x - α·x0) / σ`
- [ ] Call `scheduler.step(ε, t, x)`
- [ ] Expect geometric/structural output

### For Lune (Velocity + Flow)
- [ ] NO scheduler needed — direct Euler
- [ ] Sigma goes 1 → 0 (not 0 → 1!)
- [ ] Apply shift: `σ' = shift·σ / (1 + (shift-1)·σ)`
- [ ] Timestep to model: `σ * 1000`
- [ ] SUBTRACT velocity: `x = x - v * dt`
- [ ] Expect detailed textured output

---

## Why This Matters for TinyFlux

TinyFlux can leverage both experts:

1. **Sol (early sampling steps, high noise)**: Provides geometric anchoring
   - "Where should the castle be?"
   - "What's the horizon line?"
   - "How is mass distributed?"

2. **Lune (mid/late sampling steps, lower noise)**: Provides detail refinement
   - "What texture is the stone?"
   - "How does light fall?"
   - "What color is the sky?"

By combining geometric structure (Sol) with textural detail (Lune), TinyFlux aims for better composition and better rendering quality than either expert provides alone.

---

## Quick Reference Card

```
┌─────────────────────────────────────────────────────────────┐
│                    PREDICTION TYPES                         │
├─────────────────────────────────────────────────────────────┤
│ EPSILON (ε)                                                 │
│   Train: target = noise                                     │
│   Sample: scheduler.step(ε_pred, t, x)                      │
│   Output: General images                                    │
├─────────────────────────────────────────────────────────────┤
│ VELOCITY - SOL (DDPM framework)                             │
│   Train: target = α·ε - σ·x₀                                │
│   Sample: v→ε conversion, then scheduler.step(ε, t, x)      │
│   Output: Geometric skeletons                               │
├─────────────────────────────────────────────────────────────┤
│ VELOCITY - LUNE (Rectified Flow)                            │
│   Train: target = noise - data                              │
│   Sample: x = x - v·dt  (Euler, σ: 1→0)                     │
│   Output: Detailed textured images                          │
└─────────────────────────────────────────────────────────────┘
```

---

*Document Version: 1.0*
*Last Updated: January 2026*
*Authors: AbstractPhil & Claude OPUS 4.5*

License: MIT