# Advanced CFG Optimizations

## Overview

This document describes three advanced optimizations for Classifier-Free Guidance (CFG) that improve both quality and performance in LightDiffusion-Next:

1. **Batched CFG Computation** - Speed optimization
2. **Dynamic CFG Rescaling** - Quality optimization
3. **Adaptive Noise Scheduling** - Quality & speed optimization
## 1. Batched CFG Computation

### What It Does

Instead of running two separate forward passes for the conditional and unconditional predictions, this optimization combines them into a single batched forward pass.

**Before:**

```python
# Two separate forward passes
cond_pred = model(x, timestep, cond)      # Pass 1
uncond_pred = model(x, timestep, uncond)  # Pass 2
result = uncond_pred + cfg_scale * (cond_pred - uncond_pred)
```

**After:**

```python
# Single batched forward pass
both_preds = model(x, timestep, [cond, uncond])  # Single pass
cond_pred, uncond_pred = both_preds[0], both_preds[1]
result = uncond_pred + cfg_scale * (cond_pred - uncond_pred)
```
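The mechanics can be sketched with a toy stand-in for the model: duplicate the latent along the batch axis, concatenate the two conditionings, run one forward pass, and split the result. The `toy_model` function and all names below are illustrative assumptions for this sketch, not LightDiffusion-Next's actual API:

```python
import numpy as np

def toy_model(x, timestep, cond):
    # Stand-in for a UNet forward pass: any deterministic, elementwise
    # function of latent and conditioning works for this demonstration.
    return x * 0.9 + cond * 0.1 + timestep * 0.01

def cfg_two_pass(x, t, cond, uncond, cfg_scale):
    cond_pred = toy_model(x, t, cond)      # pass 1
    uncond_pred = toy_model(x, t, uncond)  # pass 2
    return uncond_pred + cfg_scale * (cond_pred - uncond_pred)

def cfg_batched(x, t, cond, uncond, cfg_scale):
    # Duplicate the latent along the batch axis and run one forward pass.
    x2 = np.concatenate([x, x], axis=0)
    c2 = np.concatenate([cond, uncond], axis=0)
    both = toy_model(x2, t, c2)            # single pass
    cond_pred, uncond_pred = np.split(both, 2, axis=0)
    return uncond_pred + cfg_scale * (cond_pred - uncond_pred)
```

Because the batched path performs exactly the same arithmetic as the two-pass path, the outputs match to floating-point precision; the saving comes from amortizing per-call overhead and better GPU utilization.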
### Performance Impact

- **Speed**: ~1.8-2x faster CFG computation
- **Memory**: Same or slightly less (batch processing)
- **Quality**: Identical to baseline

### Usage

```python
from src.sample import sampling

samples = sampling.sample1(
    model=model,
    noise=noise,
    steps=20,
    cfg=7.5,
    # ... other params ...
    batched_cfg=True,  # Joint cond/uncond batching (default: True)
)
```
In the current implementation, the heavy lifting still happens in the central conditioning packing path: `batched_cfg` controls whether the conditional and unconditional branches are packed into the same forward pass when possible, while conditioning chunks within each branch are always packed by the shared batching logic.

### When to Use

- **Usually recommended** - Eliminates duplicate cond/uncond forward passes whenever memory allows
- Particularly beneficial for high-resolution images or batch generation
- Compatible with all samplers and schedulers

---
## 2. Dynamic CFG Rescaling

### What It Does

Dynamically adjusts the CFG scale based on prediction statistics to prevent over-saturation while maintaining prompt adherence.

### The Problem

High CFG values (7-12) improve prompt following but can cause:

- Over-saturated colors
- Over-sharpened edges ("halo effect")
- Loss of fine details
- An unnatural, "CG-like" appearance

### The Solution

Dynamic CFG rescaling analyzes the guidance vector (the difference between the conditional and unconditional predictions) and adjusts the CFG scale to keep it within an optimal range.
**Two Methods:**

#### Variance Method (Recommended)

```python
guidance_std = std(cond_pred - uncond_pred)
adjusted_cfg = cfg_scale * (target_scale / (1 + guidance_std))
```

Best for: General use, prevents over-saturation
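As a concrete illustration, the variance rule above can be written as a small standalone function. This is a sketch with an assumed signature; the shipped implementation may differ:

```python
import numpy as np

def rescale_cfg_variance(cond_pred, uncond_pred, cfg_scale, target_scale=1.0):
    # The guidance vector's standard deviation grows when the two
    # predictions diverge strongly, which is when over-saturation appears.
    guidance_std = float(np.std(cond_pred - uncond_pred))
    # Shrink the effective CFG scale as the spread grows; with zero
    # spread the result is simply cfg_scale * target_scale.
    return cfg_scale * (target_scale / (1.0 + guidance_std))
```

When the predictions agree (zero spread), the scale is left at `cfg_scale * target_scale`; the larger the spread, the more the scale is damped.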
#### Range Method

```python
guidance_range = percentile(guidance, 95) - percentile(guidance, 5)
adjusted_cfg = cfg_scale * (target_scale / guidance_range)
```

Best for: Extreme cases, outlier filtering
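A matching sketch of the range rule, using the inter-percentile range so that a handful of outlier pixels cannot dominate the estimate. The `eps` guard is an assumption added here to avoid division by zero; it is not taken from the source:

```python
import numpy as np

def rescale_cfg_range(cond_pred, uncond_pred, cfg_scale,
                      percentile=95, target_scale=1.0, eps=1e-8):
    guidance = cond_pred - uncond_pred
    # The inter-percentile range ignores the outlier tails that would
    # dominate a plain min/max estimate.
    hi = np.percentile(guidance, percentile)
    lo = np.percentile(guidance, 100 - percentile)
    guidance_range = float(hi - lo)
    return cfg_scale * (target_scale / max(guidance_range, eps))
```

Lowering `percentile` tightens the measured range, which in turn damps the scale more aggressively, matching the troubleshooting advice below for persistent over-saturation.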
### Performance Impact

- **Speed**: Minimal overhead (~2-5%)
- **Quality**: Improved color balance, reduced artifacts
- **Prompt Adherence**: Maintained or improved

### Usage

```python
samples = sampling.sample1(
    model=model,
    # ... other params ...
    dynamic_cfg_rescaling=True,     # Enable dynamic rescaling
    dynamic_cfg_method="variance",  # Method: "variance" or "range"
    dynamic_cfg_percentile=95,      # Percentile for range method
    dynamic_cfg_target_scale=1.0,   # Target normalization scale
)
```

### When to Use

- High CFG values (>7.5)
- Detailed prompts that might cause over-saturation
- Photorealistic generations
- Portraits and faces

### When to Avoid

- Very low CFG (<3.0) - minimal benefit
- Artistic/stylized generations where saturation is desired
- CFG-free sampling (it already handles this differently)

---
## 3. Adaptive Noise Scheduling

### What It Does

Dynamically adjusts the noise schedule based on content complexity during generation.

### The Problem

Traditional fixed noise schedules apply the same denoising steps to every image:

- Complex scenes (detailed textures) may need more steps in certain regions
- Simple scenes (smooth gradients) can use fewer steps
- A fixed schedule therefore wastes computation on simple content or undersamples complex content

### The Solution

Analyzes the complexity of intermediate predictions and adjusts subsequent noise levels accordingly.
**Two Methods:**

#### Complexity Method (Recommended)

```python
complexity = variance(denoised, spatial_dims)
# High variance = complex details = maintain fine noise steps
# Low variance  = simple areas   = can skip intermediate steps
```

Best for: General content-aware optimization
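One way to realize this measure is variance over the spatial axes of the intermediate prediction, reduced to a single scalar. This is a hedged sketch assuming NCHW latents; the `can_skip_step` helper and its threshold are illustrative, not part of the real scheduler:

```python
import numpy as np

def spatial_complexity(denoised):
    # Variance over the spatial axes (H, W) of an NCHW tensor,
    # averaged over batch and channels into one scalar.
    return float(np.var(denoised, axis=(-2, -1)).mean())

def can_skip_step(denoised, threshold=0.05):
    # Low spatial variance -> smooth content -> a candidate for
    # skipping an intermediate noise level.
    return spatial_complexity(denoised) < threshold
```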
#### Attention Method

```python
complexity = mean(|gradient(denoised)|)
# High gradients = edges/details = need more precision
# Low gradients  = smooth areas  = can denoise faster
```

Best for: Edge-focused content (architecture, technical drawings)
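The `|gradient(...)|` term above can be approximated with finite differences between neighboring pixels. Again a sketch under assumed conventions, not the shipped code:

```python
import numpy as np

def gradient_complexity(denoised):
    # Finite-difference gradients along the two spatial axes; their
    # mean absolute value is large for edge-heavy content and near
    # zero for smooth gradients or flat regions.
    gy = np.abs(np.diff(denoised, axis=-2))
    gx = np.abs(np.diff(denoised, axis=-1))
    return float(gy.mean() + gx.mean())
```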
### Performance Impact

- **Speed**: 10-20% faster for simple scenes, unchanged for complex ones
- **Quality**: Adaptive - maintains quality where needed
- **Prompt Adherence**: Unchanged

### Usage

```python
samples = sampling.sample1(
    model=model,
    # ... other params ...
    adaptive_noise_enabled=True,         # Enable adaptive scheduling
    adaptive_noise_method="complexity",  # Method: "complexity" or "attention"
)
```

### When to Use

- Mixed-complexity scenes (e.g., a detailed subject against a simple background)
- Long sampling runs (50+ steps) - more opportunity to optimize
- Batch generation with varying prompt complexity

### When to Avoid

- Very short sampling runs (<10 steps) - the overhead outweighs the benefit
- Uniformly complex scenes - no simplification possible
- When exact step-by-step reproducibility is critical

---
## Combining Optimizations

All three optimizations can be used together:

```python
samples = sampling.sample1(
    model=model,
    noise=noise,
    steps=20,
    cfg=7.5,
    sampler_name="dpmpp_sde_cfgpp",
    scheduler="ays",
    positive=positive_cond,
    negative=negative_cond,
    latent_image=latent,
    # All optimizations enabled
    batched_cfg=True,
    dynamic_cfg_rescaling=True,
    dynamic_cfg_method="variance",
    dynamic_cfg_target_scale=1.0,
    adaptive_noise_enabled=True,
    adaptive_noise_method="complexity",
)
```
**Expected Results:**

- Better color balance and detail preservation
- Reduced over-saturation artifacts
- Maintained or improved prompt adherence

## Troubleshooting

### Batched CFG Issues

**Problem**: Memory errors with batched CFG

**Solution**: The system may not have enough VRAM for joint cond/uncond batching. Disable it with `batched_cfg=False`, which keeps the conditioning path active but runs the two branches separately.

### Dynamic CFG Issues

**Problem**: Images look flat or desaturated

**Solution**: Increase `dynamic_cfg_target_scale` (try 1.5 or 2.0)

**Problem**: Images are still over-saturated

**Solution**: Switch to `dynamic_cfg_method="range"` and lower `dynamic_cfg_percentile`

### Adaptive Noise Issues

**Problem**: Inconsistent results

**Solution**: Adaptive scheduling makes small content-dependent changes to the schedule; disable it when exact reproducibility matters.

**Problem**: No speed improvement

**Solution**: The optimization works best on simple scenes; complex scenes see no speedup (but are not slower either).
---

## Credits

Implemented for LightDiffusion-Next by combining insights from:

- CFG++ dynamic rescaling techniques
- ComfyUI batched computation patterns
- Stable Diffusion WebUI adaptive scheduling