# Advanced CFG Optimizations
## Overview
This document describes three advanced optimizations for Classifier-Free Guidance (CFG) that improve both quality and performance in LightDiffusion-Next:
1. **Batched CFG Computation** - Speed optimization
2. **Dynamic CFG Rescaling** - Quality optimization
3. **Adaptive Noise Scheduling** - Quality & speed optimization
## 1. Batched CFG Computation
### What It Does
Instead of running two separate forward passes for conditional and unconditional predictions, this optimization can combine them into a single batched forward pass.
**Before:**
```python
# Two separate forward passes
cond_pred = model(x, timestep, cond) # Pass 1
uncond_pred = model(x, timestep, uncond) # Pass 2
result = uncond_pred + cfg_scale * (cond_pred - uncond_pred)
```
**After:**
```python
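import torch

# Single batched forward pass (sketch: assumes cond and uncond tensors share a shape)
x_in = torch.cat([x, x])                # duplicate the latent batch
t_in = torch.cat([timestep, timestep])  # duplicate the timesteps
c_in = torch.cat([cond, uncond])        # stack both conditionings along the batch axis
cond_pred, uncond_pred = model(x_in, t_in, c_in).chunk(2)  # split the halves back out
result = uncond_pred + cfg_scale * (cond_pred - uncond_pred)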
```
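The equivalence of the two paths can be checked with a toy example. The sketch below is not the LightDiffusion-Next implementation: `ToyModel`, `cfg_separate`, and `cfg_batched` are hypothetical names, and the "model" is a trivial function that only counts how many forward passes it receives.

```python
from typing import List

Vector = List[float]


class ToyModel:
    """Hypothetical stand-in for a denoiser that counts forward passes."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, xs: List[Vector], conds: List[Vector]) -> List[Vector]:
        # One call corresponds to one forward pass over the whole batch.
        self.calls += 1
        return [[xi + 0.1 * ci for xi, ci in zip(x, c)] for x, c in zip(xs, conds)]


def cfg_combine(cond_pred: Vector, uncond_pred: Vector, cfg_scale: float) -> Vector:
    # Standard CFG mix: uncond + scale * (cond - uncond)
    return [u + cfg_scale * (c - u) for c, u in zip(cond_pred, uncond_pred)]


def cfg_separate(model: ToyModel, x: Vector, cond: Vector, uncond: Vector,
                 cfg_scale: float) -> Vector:
    cond_pred = model([x], [cond])[0]      # pass 1
    uncond_pred = model([x], [uncond])[0]  # pass 2
    return cfg_combine(cond_pred, uncond_pred, cfg_scale)


def cfg_batched(model: ToyModel, x: Vector, cond: Vector, uncond: Vector,
                cfg_scale: float) -> Vector:
    # Duplicate the input so both branches share a single forward pass.
    cond_pred, uncond_pred = model([x, x], [cond, uncond])
    return cfg_combine(cond_pred, uncond_pred, cfg_scale)
```

Both functions return the same guided prediction; only the number of model calls differs (two versus one).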
### Performance Impact
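- **Speed**: Up to ~1.8-2x faster CFG computation, since one forward pass replaces two
- **Memory**: Higher peak VRAM than separate passes, because both branches are held in one batch (see Troubleshooting)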
- **Quality**: Identical to baseline
### Usage
```python
from src.sample import sampling
samples = sampling.sample1(
model=model,
noise=noise,
steps=20,
cfg=7.5,
# ... other params ...
batched_cfg=True, # Joint cond/uncond batching (default: True)
)
```
In the current implementation, the heavy lifting still happens in the central conditioning packing path. `batched_cfg` controls whether conditional and unconditional branches are packed together into the same forward pass when possible. Conditioning chunks within each branch are still packed by the shared batching logic.
### When to Use
- **Usually recommended** - This reduces duplicate cond/uncond forward passes when memory allows
- Particularly beneficial for high-resolution images or batch generation
- Compatible with all samplers and schedulers
---
## 2. Dynamic CFG Rescaling
### What It Does
Dynamically adjusts the CFG scale based on prediction statistics to prevent over-saturation while maintaining prompt adherence.
### The Problem
High CFG values (7-12) improve prompt following but can cause:
- Over-saturated colors
- Over-sharpened edges ("halo effect")
- Loss of fine details
- Unnatural, "CG-like" appearance
### The Solution
Dynamic CFG rescaling analyzes the guidance vector (difference between conditional and unconditional predictions) and adjusts the CFG scale to keep it within an optimal range.
**Two Methods:**
#### Variance Method (Recommended)
```python
guidance_std = std(cond_pred - uncond_pred)
adjusted_cfg = cfg_scale * (target_scale / (1 + guidance_std))
```
Best for: General use, prevents over-saturation
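As a minimal sketch of the variance formula above, the following works on flat lists of floats rather than tensors. The function name `variance_rescaled_cfg` is hypothetical, and the use of the population standard deviation is an assumption; the actual implementation may use a different estimator.

```python
from statistics import pstdev
from typing import Sequence


def variance_rescaled_cfg(
    cond_pred: Sequence[float],
    uncond_pred: Sequence[float],
    cfg_scale: float,
    target_scale: float = 1.0,
) -> float:
    """Return the adjusted CFG scale for one denoising step."""
    # Guidance vector: difference between the two predictions.
    guidance = [c - u for c, u in zip(cond_pred, uncond_pred)]
    # Assumption: population standard deviation over all elements.
    guidance_std = pstdev(guidance)
    # Stronger guidance (larger std) pulls the effective scale down.
    return cfg_scale * (target_scale / (1.0 + guidance_std))
```

With a guidance standard deviation of 1.0, a `cfg_scale` of 7.5 becomes 3.75 at `target_scale=1.0`; raising `target_scale` to 2.0 restores it to 7.5.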
#### Range Method
```python
guidance_range = percentile(guidance, 95) - percentile(guidance, 5)
adjusted_cfg = cfg_scale * (target_scale / guidance_range)
```
Best for: Extreme cases, outlier filtering
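The range formula can be sketched the same way. In the example below, `percentile` and `range_rescaled_cfg` are hypothetical helpers; mirroring the lower percentile as `100 - upper_percentile` and guarding the division with `eps` are assumptions that the formula above does not state.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile, q in [0, 100]."""
    ordered = sorted(values)
    rank = q * (len(ordered) - 1) / 100.0
    lo = math.floor(rank)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (rank - lo) * (ordered[hi] - ordered[lo])


def range_rescaled_cfg(
    cond_pred: Sequence[float],
    uncond_pred: Sequence[float],
    cfg_scale: float,
    target_scale: float = 1.0,
    upper_percentile: float = 95.0,
    eps: float = 1e-8,
) -> float:
    """Return the adjusted CFG scale based on the inter-percentile range."""
    guidance = [c - u for c, u in zip(cond_pred, uncond_pred)]
    # Assumption: the lower bound mirrors the upper one (95 -> 5).
    lower_percentile = 100.0 - upper_percentile
    guidance_range = percentile(guidance, upper_percentile) - percentile(
        guidance, lower_percentile
    )
    # Assumption: eps avoids division by zero for constant guidance.
    return cfg_scale * (target_scale / max(guidance_range, eps))
```

Because only the 5th to 95th percentile span enters the formula, a single extreme value in the guidance vector leaves the adjusted scale unchanged, which is the outlier filtering the range method is meant to provide.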
### Performance Impact
- **Speed**: Minimal overhead (~2-5%)
- **Quality**: Improved color balance, reduced artifacts
- **Prompt Adherence**: Maintained or improved
### Usage
```python
samples = sampling.sample1(
model=model,
# ... other params ...
dynamic_cfg_rescaling=True, # Enable dynamic rescaling
dynamic_cfg_method="variance", # Method: "variance" or "range"
dynamic_cfg_percentile=95, # Percentile for range method
dynamic_cfg_target_scale=1.0, # Target normalization scale
)
```
### When to Use
- High CFG values (>7.5)
- Detailed prompts that might cause over-saturation
- Photorealistic generations
- Portraits and faces
### When to Avoid
- Very low CFG (<3.0) - minimal benefit
- Artistic/stylized generations where saturation is desired
- When using CFG-free sampling (already handles this differently)
---
## 3. Adaptive Noise Scheduling
### What It Does
Dynamically adjusts the noise schedule based on content complexity during generation.
### The Problem
Traditional fixed noise schedules apply the same denoising steps to all regions:
- Complex scenes (detailed textures) may need more steps in certain regions
- Simple scenes (smooth gradients) can use fewer steps
- A fixed schedule therefore either wastes computation on simple content or undersamples complex content
### The Solution
Analyzes the complexity of intermediate predictions and adjusts subsequent noise levels accordingly.
**Two Methods:**
#### Complexity Method (Recommended)
```python
complexity = variance(denoised, spatial_dims)
# High variance = complex details = maintain fine noise steps
# Low variance = simple areas = can skip intermediate steps
```
Best for: General content-aware optimization
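A minimal sketch of the complexity score, assuming the latent is a nested list shaped `[channels][height][width]` and that the variance is taken per channel over the spatial dimensions. `spatial_complexity` is a hypothetical name, and how the score feeds back into the noise schedule is not shown here.

```python
from statistics import pvariance
from typing import List

Latent = List[List[List[float]]]  # [channels][height][width]


def spatial_complexity(denoised: Latent) -> List[float]:
    """Per-channel variance over the spatial dimensions."""
    scores = []
    for channel in denoised:
        # Flatten height x width into one list of values.
        pixels = [value for row in channel for value in row]
        scores.append(pvariance(pixels))
    return scores
```

A flat channel scores 0.0, while a 0/1 checkerboard scores 0.25, so higher values flag content with more fine detail.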
#### Attention Method
```python
complexity = mean(|gradient(denoised)|)
# High gradients = edges/details = need more precision
# Low gradients = smooth areas = can denoise faster
```
Best for: Edge-focused content (architecture, technical drawings)
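The gradient-based score can be sketched for a single 2D channel as below. Using forward differences along both axes and pooling them into one mean is an assumption; `mean_abs_gradient` is a hypothetical name.

```python
from typing import List


def mean_abs_gradient(channel: List[List[float]]) -> float:
    """Mean absolute forward difference along both spatial axes."""
    height, width = len(channel), len(channel[0])
    diffs = []
    for y in range(height):
        for x in range(width):
            if x + 1 < width:   # horizontal neighbour
                diffs.append(abs(channel[y][x + 1] - channel[y][x]))
            if y + 1 < height:  # vertical neighbour
                diffs.append(abs(channel[y + 1][x] - channel[y][x]))
    # A single-pixel channel has no neighbours and therefore no gradient.
    return sum(diffs) / len(diffs) if diffs else 0.0
```

Smooth regions score near zero, while sharp edges raise the score, which matches the intent of spending more precision on edge-heavy content.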
### Performance Impact
- **Speed**: 10-20% faster for simple scenes, same for complex
- **Quality**: Adaptive - maintains quality where needed
- **Prompt Adherence**: Unchanged
### Usage
```python
samples = sampling.sample1(
model=model,
# ... other params ...
adaptive_noise_enabled=True, # Enable adaptive scheduling
adaptive_noise_method="complexity", # Method: "complexity" or "attention"
)
```
### When to Use
- Mixed complexity scenes (e.g., detailed subject + simple background)
- Long sampling runs (50+ steps) - more opportunity to optimize
- Batch generation with varying prompt complexity
### When to Avoid
- Very short sampling runs (<10 steps) - the overhead outweighs the benefit
- Uniformly complex scenes - no simplification possible
- When exact step-by-step reproducibility is critical
---
## Combining Optimizations
All three optimizations can be used together:
```python
samples = sampling.sample1(
model=model,
noise=noise,
steps=20,
cfg=7.5,
sampler_name="dpmpp_sde_cfgpp",
scheduler="ays",
positive=positive_cond,
negative=negative_cond,
latent_image=latent,
# All optimizations enabled
batched_cfg=True,
dynamic_cfg_rescaling=True,
dynamic_cfg_method="variance",
dynamic_cfg_target_scale=1.0,
adaptive_noise_enabled=True,
adaptive_noise_method="complexity",
)
```
**Expected Results:**
- Better color balance and detail preservation
- Reduced over-saturation artifacts
- Maintained or improved prompt adherence
## Troubleshooting
### Batched CFG Issues
**Problem**: Memory errors with batched CFG
**Solution**: The system may not have enough VRAM for joint cond/uncond batching. Disable it with `batched_cfg=False`, which keeps the conditioning path active but runs the two branches separately.
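One way to automate this fallback is sketched below. The wrapper `sample_with_batched_cfg_fallback` is hypothetical and not part of LightDiffusion-Next; it assumes the sampling function accepts a `batched_cfg` keyword as shown in the usage section. The built-in `MemoryError` is only a placeholder default: in a PyTorch setup the exception to catch would presumably be `torch.cuda.OutOfMemoryError`.

```python
from typing import Any, Callable, Tuple, Type


def sample_with_batched_cfg_fallback(
    sample_fn: Callable[..., Any],
    oom_errors: Tuple[Type[BaseException], ...] = (MemoryError,),
    **kwargs: Any,
) -> Any:
    """Try batched CFG first; retry with separate passes on out-of-memory."""
    # The wrapper decides batched_cfg itself, so drop any caller-supplied value.
    kwargs.pop("batched_cfg", None)
    try:
        return sample_fn(batched_cfg=True, **kwargs)
    except oom_errors:
        # Not enough memory for the joint batch: run the branches separately.
        return sample_fn(batched_cfg=False, **kwargs)
```

A call would then look like `sample_with_batched_cfg_fallback(sampling.sample1, model=model, noise=noise, steps=20, cfg=7.5)`, with the remaining parameters passed through unchanged.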
### Dynamic CFG Issues
**Problem**: Images too flat/desaturated
**Solution**: Increase `dynamic_cfg_target_scale` (try 1.5 or 2.0)
**Problem**: Still over-saturated
**Solution**: Switch to `dynamic_cfg_method="range"` and lower `dynamic_cfg_percentile`
### Adaptive Noise Issues
**Problem**: Inconsistent results
**Solution**: Adaptive scheduling makes slight content-dependent changes to the schedule. Disable it when exact reproducibility is required.
**Problem**: No speed improvement
**Solution**: Adaptive scheduling works best with simple scenes. Complex scenes will not see a speedup, but they will not run slower either.
---
## Credits
Implemented for LightDiffusion-Next by combining insights from:
- CFG++ dynamic rescaling techniques
- ComfyUI batched computation patterns
- Stable Diffusion WebUI adaptive scheduling