# Advanced CFG Optimizations
## Overview
This document describes three advanced optimizations for Classifier-Free Guidance (CFG) that improve both quality and performance in LightDiffusion-Next:
1. **Batched CFG Computation** - Speed optimization
2. **Dynamic CFG Rescaling** - Quality optimization
3. **Adaptive Noise Scheduling** - Quality & speed optimization
## 1. Batched CFG Computation
### What It Does
Instead of running two separate forward passes for conditional and unconditional predictions, this optimization can combine them into a single batched forward pass.
**Before:**
```python
# Two separate forward passes
cond_pred = model(x, timestep, cond) # Pass 1
uncond_pred = model(x, timestep, uncond) # Pass 2
result = uncond_pred + cfg_scale * (cond_pred - uncond_pred)
```
**After:**
```python
# Single batched forward pass
both_preds = model(x, timestep, [cond, uncond]) # Single pass
cond_pred, uncond_pred = both_preds[0], both_preds[1]
result = uncond_pred + cfg_scale * (cond_pred - uncond_pred)
```
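The two patterns above can be made concrete with a toy stand-in for the model. Everything here (`toy_model`, the tensor shapes) is an illustrative assumption rather than the real UNet, but the batching mechanics are the same: concatenate cond and uncond along the batch dimension, run one pass, split the output.

```python
import numpy as np

def toy_model(x, timestep, cond):
    # Stand-in for the diffusion model: any function applied
    # independently along the batch dimension behaves like this.
    return x * 0.9 + cond * 0.1 - timestep * 0.001

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 4, 8, 8))      # latent, batch size 1
cond = rng.standard_normal((1, 4, 8, 8))   # conditional embedding (toy)
uncond = np.zeros_like(cond)               # unconditional embedding (toy)
cfg_scale, timestep = 7.5, 500

# Baseline: two separate forward passes.
cond_pred = toy_model(x, timestep, cond)
uncond_pred = toy_model(x, timestep, uncond)
baseline = uncond_pred + cfg_scale * (cond_pred - uncond_pred)

# Batched: stack along dim 0, run once, split the output.
both = toy_model(np.concatenate([x, x]), timestep,
                 np.concatenate([cond, uncond]))
cond_b, uncond_b = both[:1], both[1:]
batched = uncond_b + cfg_scale * (cond_b - uncond_b)

assert np.allclose(baseline, batched)  # same result, one pass
```

Because the model is applied independently per batch item, the batched result matches the baseline exactly; the real-world speedup comes from amortizing kernel launches and weight reads over the doubled batch.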
### Performance Impact
- **Speed**: ~1.8-2x faster CFG computation
- **Memory**: Higher peak usage, since cond and uncond run as a doubled batch; disable if VRAM is tight (see Troubleshooting)
- **Quality**: Identical to baseline
### Usage
```python
from src.sample import sampling
samples = sampling.sample1(
    model=model,
    noise=noise,
    steps=20,
    cfg=7.5,
    # ... other params ...
    batched_cfg=True,  # Joint cond/uncond batching (default: True)
)
```
In the current implementation, the heavy lifting happens in the central conditioning-packing path. `batched_cfg` controls whether the conditional and unconditional branches are packed into the same forward pass when possible; conditioning chunks within each branch are always packed by the shared batching logic.
### When to Use
- **Usually recommended** - This reduces duplicate cond/uncond forward passes when memory allows
- Particularly beneficial for high-resolution images or batch generation
- Compatible with all samplers and schedulers
---
## 2. Dynamic CFG Rescaling
### What It Does
Dynamically adjusts the CFG scale based on prediction statistics to prevent over-saturation while maintaining prompt adherence.
### The Problem
High CFG values (7-12) improve prompt following but can cause:
- Over-saturated colors
- Over-sharpened edges ("halo effect")
- Loss of fine details
- Unnatural, "CG-like" appearance
### The Solution
Dynamic CFG rescaling analyzes the guidance vector (difference between conditional and unconditional predictions) and adjusts the CFG scale to keep it within an optimal range.
**Two Methods:**
#### Variance Method (Recommended)
```python
guidance_std = std(cond_pred - uncond_pred)
adjusted_cfg = cfg_scale * (target_scale / (1 + guidance_std))
```
Best for: General use, prevents over-saturation
#### Range Method
```python
guidance_range = percentile(guidance, 95) - percentile(guidance, 5)
adjusted_cfg = cfg_scale * (target_scale / guidance_range)
```
Best for: Extreme cases, outlier filtering
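The two formulas above can be sketched as one function. This is a minimal illustration of the rescaling idea, assuming the exact normalization shown in the pseudocode; `rescale_cfg` is a hypothetical name, not LightDiffusion-Next's internal API.

```python
import numpy as np

def rescale_cfg(cond_pred, uncond_pred, cfg_scale,
                method="variance", target_scale=1.0, percentile=95):
    # Guidance vector: the direction CFG pushes the prediction.
    guidance = cond_pred - uncond_pred
    if method == "variance":
        # Large guidance spread -> scale down to avoid over-saturation.
        guidance_std = float(np.std(guidance))
        adjusted = cfg_scale * (target_scale / (1.0 + guidance_std))
    elif method == "range":
        # Percentile range ignores outliers at both tails.
        lo, hi = np.percentile(guidance, [100 - percentile, percentile])
        adjusted = cfg_scale * (target_scale / max(hi - lo, 1e-8))
    else:
        raise ValueError(f"unknown method: {method}")
    return uncond_pred + adjusted * guidance

rng = np.random.default_rng(0)
cond_pred = rng.standard_normal((1, 4, 8, 8))
uncond_pred = rng.standard_normal((1, 4, 8, 8))

out_var = rescale_cfg(cond_pred, uncond_pred, cfg_scale=7.5, method="variance")
out_rng = rescale_cfg(cond_pred, uncond_pred, cfg_scale=7.5, method="range")
```

Note that for noisy guidance (std well above zero), the variance method yields an effective scale below the nominal `cfg_scale`, which is exactly the anti-saturation behavior described above.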
### Performance Impact
- **Speed**: Minimal overhead (~2-5%)
- **Quality**: Improved color balance, reduced artifacts
- **Prompt Adherence**: Maintained or improved
### Usage
```python
samples = sampling.sample1(
    model=model,
    # ... other params ...
    dynamic_cfg_rescaling=True,     # Enable dynamic rescaling
    dynamic_cfg_method="variance",  # Method: "variance" or "range"
    dynamic_cfg_percentile=95,      # Percentile for range method
    dynamic_cfg_target_scale=1.0,   # Target normalization scale
)
```
### When to Use
- High CFG values (>7.5)
- Detailed prompts that might cause over-saturation
- Photorealistic generations
- Portraits and faces
### When to Avoid
- Very low CFG (<3.0) - minimal benefit
- Artistic/stylized generations where saturation is desired
- When using CFG-free sampling (already handles this differently)
---
## 3. Adaptive Noise Scheduling
### What It Does
Dynamically adjusts the noise schedule based on content complexity during generation.
### The Problem
Traditional fixed noise schedules apply the same denoising steps to all regions:
- Complex scenes (detailed textures) may need more steps in certain regions
- Simple scenes (smooth gradients) can use fewer steps
- A fixed schedule therefore either wastes computation on simple content or undersamples complex content
### The Solution
Analyzes the complexity of intermediate predictions and adjusts subsequent noise levels accordingly.
**Two Methods:**
#### Complexity Method (Recommended)
```python
complexity = variance(denoised, spatial_dims)
# High variance = complex details = maintain fine noise steps
# Low variance = simple areas = can skip intermediate steps
```
Best for: General content-aware optimization
#### Attention Method
```python
complexity = mean(|gradient(denoised)|)
# High gradients = edges/details = need more precision
# Low gradients = smooth areas = can denoise faster
```
Best for: Edge-focused content (architecture, technical drawings)
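The two complexity metrics above can be sketched directly with NumPy. The function names here are illustrative assumptions, not LightDiffusion-Next internals; the point is only how each metric separates smooth from detailed content.

```python
import numpy as np

def complexity_variance(denoised):
    # Variance over spatial dims: high variance -> detailed region.
    return float(np.var(denoised, axis=(-2, -1)).mean())

def complexity_attention(denoised):
    # Mean absolute gradient magnitude: high gradients -> edges/details.
    gy, gx = np.gradient(denoised, axis=(-2, -1))
    return float(np.mean(np.abs(gy)) + np.mean(np.abs(gx)))

rng = np.random.default_rng(0)
smooth = np.zeros((1, 4, 16, 16))               # flat latent: no detail
detailed = rng.standard_normal((1, 4, 16, 16))  # noisy latent: high detail

# Both metrics rank the detailed latent above the smooth one,
# which is what lets the scheduler skip steps on simple content.
assert complexity_variance(smooth) < complexity_variance(detailed)
assert complexity_attention(smooth) < complexity_attention(detailed)
```

A scheduler built on either metric would compare the score against a threshold at each step and coarsen the remaining noise levels when the score stays low.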
### Performance Impact
- **Speed**: 10-20% faster for simple scenes, same for complex
- **Quality**: Adaptive - maintains quality where needed
- **Prompt Adherence**: Unchanged
### Usage
```python
samples = sampling.sample1(
    model=model,
    # ... other params ...
    adaptive_noise_enabled=True,         # Enable adaptive scheduling
    adaptive_noise_method="complexity",  # Method: "complexity" or "attention"
)
```
### When to Use
- Mixed complexity scenes (e.g., detailed subject + simple background)
- Long sampling runs (50+ steps) - more opportunity to optimize
- Batch generation with varying prompt complexity
### When to Avoid
- Very short sampling runs (<10 steps) - overhead > benefit
- Uniformly complex scenes - no simplification possible
- When exact step-by-step reproducibility is critical
---
## Combining Optimizations
All three optimizations can be used together:
```python
samples = sampling.sample1(
    model=model,
    noise=noise,
    steps=20,
    cfg=7.5,
    sampler_name="dpmpp_sde_cfgpp",
    scheduler="ays",
    positive=positive_cond,
    negative=negative_cond,
    latent_image=latent,
    # All optimizations enabled
    batched_cfg=True,
    dynamic_cfg_rescaling=True,
    dynamic_cfg_method="variance",
    dynamic_cfg_target_scale=1.0,
    adaptive_noise_enabled=True,
    adaptive_noise_method="complexity",
)
```
**Expected Results:**
- Better color balance and detail preservation
- Reduced over-saturation artifacts
- Maintained or improved prompt adherence
## Troubleshooting
### Batched CFG Issues
**Problem**: Memory errors with batched CFG
**Solution**: System may not have enough VRAM for joint cond/uncond batching. Disable it with `batched_cfg=False`, which keeps the conditioning path active but runs the two branches separately.
### Dynamic CFG Issues
**Problem**: Images too flat/desaturated
**Solution**: Increase `dynamic_cfg_target_scale` (try 1.5 or 2.0)
**Problem**: Still over-saturated
**Solution**: Switch to `dynamic_cfg_method="range"` and lower `dynamic_cfg_percentile`
### Adaptive Noise Issues
**Problem**: Inconsistent results
**Solution**: Adaptive scheduling makes slight changes based on content. Disable for exact reproducibility.
**Problem**: No speed improvement
**Solution**: Works best with simple scenes. Complex scenes won't see speedup (but won't be slower either).
---
## Credits
Implemented for LightDiffusion-Next by combining insights from:
- CFG++ dynamic rescaling techniques
- ComfyUI batched computation patterns
- Stable Diffusion WebUI adaptive scheduling