# Advanced CFG Optimizations
## Overview
This document describes three advanced optimizations for Classifier-Free Guidance (CFG) that improve both quality and performance in LightDiffusion-Next:
1. **Batched CFG Computation** - Speed optimization
2. **Dynamic CFG Rescaling** - Quality optimization
3. **Adaptive Noise Scheduling** - Quality & speed optimization
## 1. Batched CFG Computation
### What It Does
Instead of running two separate forward passes for conditional and unconditional predictions, this optimization can combine them into a single batched forward pass.
**Before:**
```python
# Two separate forward passes
cond_pred = model(x, timestep, cond) # Pass 1
uncond_pred = model(x, timestep, uncond) # Pass 2
result = uncond_pred + cfg_scale * (cond_pred - uncond_pred)
```
**After:**
```python
# Single batched forward pass
both_preds = model(x, timestep, [cond, uncond]) # Single pass
cond_pred, uncond_pred = both_preds[0], both_preds[1]
result = uncond_pred + cfg_scale * (cond_pred - uncond_pred)
```
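The two patterns above can be made concrete with a toy stand-in for the model. Everything here (`toy_model`, the tensor shapes) is an illustrative assumption rather than the real UNet, but the batching mechanics are the same: concatenate cond and uncond along the batch dimension, run one pass, split the output.

```python
import numpy as np

def toy_model(x, timestep, cond):
    # Stand-in for the diffusion model: any function applied
    # independently along the batch dimension behaves like this.
    return x * 0.9 + cond * 0.1 - timestep * 0.001

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 4, 8, 8))      # latent, batch size 1
cond = rng.standard_normal((1, 4, 8, 8))   # conditional embedding (toy)
uncond = np.zeros_like(cond)               # unconditional embedding (toy)
cfg_scale, timestep = 7.5, 500

# Baseline: two separate forward passes.
cond_pred = toy_model(x, timestep, cond)
uncond_pred = toy_model(x, timestep, uncond)
baseline = uncond_pred + cfg_scale * (cond_pred - uncond_pred)

# Batched: stack along dim 0, run once, split the output.
both = toy_model(np.concatenate([x, x]), timestep,
                 np.concatenate([cond, uncond]))
cond_b, uncond_b = both[:1], both[1:]
batched = uncond_b + cfg_scale * (cond_b - uncond_b)

assert np.allclose(baseline, batched)  # same result, one pass
```

Because the model is applied independently per batch item, the batched result matches the baseline exactly; the real-world speedup comes from amortizing kernel launches and weight reads over the doubled batch.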
### Performance Impact
- **Speed**: ~1.8-2x faster CFG computation
- **Memory**: Higher peak usage, since cond and uncond run as a doubled batch; disable if VRAM is tight (see Troubleshooting)
- **Quality**: Identical to baseline
### Usage
```python
from src.sample import sampling
samples = sampling.sample1(
    model=model,
    noise=noise,
    steps=20,
    cfg=7.5,
    # ... other params ...
    batched_cfg=True,  # Joint cond/uncond batching (default: True)
)
```
In the current implementation, the heavy lifting happens in the central conditioning-packing path. `batched_cfg` controls whether the conditional and unconditional branches are packed into the same forward pass when possible; conditioning chunks within each branch are always packed by the shared batching logic.
### When to Use
- **Usually recommended** - This reduces duplicate cond/uncond forward passes when memory allows
- Particularly beneficial for high-resolution images or batch generation
- Compatible with all samplers and schedulers
---
## 2. Dynamic CFG Rescaling
### What It Does
Dynamically adjusts the CFG scale based on prediction statistics to prevent over-saturation while maintaining prompt adherence.
### The Problem
High CFG values (7-12) improve prompt following but can cause:
- Over-saturated colors
- Over-sharpened edges ("halo effect")
- Loss of fine details
- Unnatural, "CG-like" appearance
### The Solution
Dynamic CFG rescaling analyzes the guidance vector (difference between conditional and unconditional predictions) and adjusts the CFG scale to keep it within an optimal range.
**Two Methods:**
#### Variance Method (Recommended)
```python
guidance_std = std(cond_pred - uncond_pred)
adjusted_cfg = cfg_scale * (target_scale / (1 + guidance_std))
```
Best for: General use, prevents over-saturation
#### Range Method
```python
guidance_range = percentile(guidance, 95) - percentile(guidance, 5)
adjusted_cfg = cfg_scale * (target_scale / guidance_range)
```
Best for: Extreme cases, outlier filtering
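The two formulas above can be sketched as one function. This is a minimal illustration of the rescaling idea, assuming the exact normalization shown in the pseudocode; `rescale_cfg` is a hypothetical name, not LightDiffusion-Next's internal API.

```python
import numpy as np

def rescale_cfg(cond_pred, uncond_pred, cfg_scale,
                method="variance", target_scale=1.0, percentile=95):
    # Guidance vector: the direction CFG pushes the prediction.
    guidance = cond_pred - uncond_pred
    if method == "variance":
        # Large guidance spread -> scale down to avoid over-saturation.
        guidance_std = float(np.std(guidance))
        adjusted = cfg_scale * (target_scale / (1.0 + guidance_std))
    elif method == "range":
        # Percentile range ignores outliers at both tails.
        lo, hi = np.percentile(guidance, [100 - percentile, percentile])
        adjusted = cfg_scale * (target_scale / max(hi - lo, 1e-8))
    else:
        raise ValueError(f"unknown method: {method}")
    return uncond_pred + adjusted * guidance

rng = np.random.default_rng(0)
cond_pred = rng.standard_normal((1, 4, 8, 8))
uncond_pred = rng.standard_normal((1, 4, 8, 8))

out_var = rescale_cfg(cond_pred, uncond_pred, cfg_scale=7.5, method="variance")
out_rng = rescale_cfg(cond_pred, uncond_pred, cfg_scale=7.5, method="range")
```

Note that for noisy guidance (std well above zero), the variance method yields an effective scale below the nominal `cfg_scale`, which is exactly the anti-saturation behavior described above.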
### Performance Impact
- **Speed**: Minimal overhead (~2-5%)
- **Quality**: Improved color balance, reduced artifacts
- **Prompt Adherence**: Maintained or improved
### Usage
```python
samples = sampling.sample1(
    model=model,
    # ... other params ...
    dynamic_cfg_rescaling=True,     # Enable dynamic rescaling
    dynamic_cfg_method="variance",  # Method: "variance" or "range"
    dynamic_cfg_percentile=95,      # Percentile for range method
    dynamic_cfg_target_scale=1.0,   # Target normalization scale
)
```
### When to Use
- High CFG values (>7.5)
- Detailed prompts that might cause over-saturation
- Photorealistic generations
- Portraits and faces
### When to Avoid
- Very low CFG (<3.0) - minimal benefit
- Artistic/stylized generations where saturation is desired
- When using CFG-free sampling (already handles this differently)
---
## 3. Adaptive Noise Scheduling
### What It Does
Dynamically adjusts the noise schedule based on content complexity during generation.
### The Problem
Traditional fixed noise schedules apply the same denoising steps to all regions:
- Complex scenes (detailed textures) may need more steps in certain regions
- Simple scenes (smooth gradients) can use fewer steps
- A fixed schedule therefore either wastes computation on simple content or undersamples complex content
### The Solution
Analyzes the complexity of intermediate predictions and adjusts subsequent noise levels accordingly.
**Two Methods:**
#### Complexity Method (Recommended)
```python
complexity = variance(denoised, spatial_dims)
# High variance = complex details = maintain fine noise steps
# Low variance = simple areas = can skip intermediate steps
```
Best for: General content-aware optimization
#### Attention Method
```python
complexity = mean(|gradient(denoised)|)
# High gradients = edges/details = need more precision
# Low gradients = smooth areas = can denoise faster
```
Best for: Edge-focused content (architecture, technical drawings)
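The two complexity metrics above can be sketched directly with NumPy. The function names here are illustrative assumptions, not LightDiffusion-Next internals; the point is only how each metric separates smooth from detailed content.

```python
import numpy as np

def complexity_variance(denoised):
    # Variance over spatial dims: high variance -> detailed region.
    return float(np.var(denoised, axis=(-2, -1)).mean())

def complexity_attention(denoised):
    # Mean absolute gradient magnitude: high gradients -> edges/details.
    gy, gx = np.gradient(denoised, axis=(-2, -1))
    return float(np.mean(np.abs(gy)) + np.mean(np.abs(gx)))

rng = np.random.default_rng(0)
smooth = np.zeros((1, 4, 16, 16))               # flat latent: no detail
detailed = rng.standard_normal((1, 4, 16, 16))  # noisy latent: high detail

# Both metrics rank the detailed latent above the smooth one,
# which is what lets the scheduler skip steps on simple content.
assert complexity_variance(smooth) < complexity_variance(detailed)
assert complexity_attention(smooth) < complexity_attention(detailed)
```

A scheduler built on either metric would compare the score against a threshold at each step and coarsen the remaining noise levels when the score stays low.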
### Performance Impact
- **Speed**: 10-20% faster for simple scenes, same for complex
- **Quality**: Adaptive - maintains quality where needed
- **Prompt Adherence**: Unchanged
### Usage
```python
samples = sampling.sample1(
    model=model,
    # ... other params ...
    adaptive_noise_enabled=True,         # Enable adaptive scheduling
    adaptive_noise_method="complexity",  # Method: "complexity" or "attention"
)
```
### When to Use
- Mixed complexity scenes (e.g., detailed subject + simple background)
- Long sampling runs (50+ steps) - more opportunity to optimize
- Batch generation with varying prompt complexity
### When to Avoid
- Very short sampling runs (<10 steps) - overhead > benefit
- Uniformly complex scenes - no simplification possible
- When exact step-by-step reproducibility is critical
---
## Combining Optimizations
All three optimizations can be used together:
```python
samples = sampling.sample1(
    model=model,
    noise=noise,
    steps=20,
    cfg=7.5,
    sampler_name="dpmpp_sde_cfgpp",
    scheduler="ays",
    positive=positive_cond,
    negative=negative_cond,
    latent_image=latent,
    # All optimizations enabled
    batched_cfg=True,
    dynamic_cfg_rescaling=True,
    dynamic_cfg_method="variance",
    dynamic_cfg_target_scale=1.0,
    adaptive_noise_enabled=True,
    adaptive_noise_method="complexity",
)
```
**Expected Results:**
- Better color balance and detail preservation
- Reduced over-saturation artifacts
- Maintained or improved prompt adherence
## Troubleshooting
### Batched CFG Issues
**Problem**: Memory errors with batched CFG
**Solution**: System may not have enough VRAM for joint cond/uncond batching. Disable it with `batched_cfg=False`, which keeps the conditioning path active but runs the two branches separately.
### Dynamic CFG Issues
**Problem**: Images too flat/desaturated
**Solution**: Increase `dynamic_cfg_target_scale` (try 1.5 or 2.0)
**Problem**: Still over-saturated
**Solution**: Switch to `dynamic_cfg_method="range"` and lower `dynamic_cfg_percentile`
### Adaptive Noise Issues
**Problem**: Inconsistent results
**Solution**: Adaptive scheduling makes slight changes based on content. Disable for exact reproducibility.
**Problem**: No speed improvement
**Solution**: Works best with simple scenes. Complex scenes won't see speedup (but won't be slower either).
---
## Credits
Implemented for LightDiffusion-Next by combining insights from:
- CFG++ dynamic rescaling techniques
- ComfyUI batched computation patterns
- Stable Diffusion WebUI adaptive scheduling