# Advanced CFG Optimizations

## Overview

This document describes three advanced optimizations for Classifier-Free Guidance (CFG) that improve quality, performance, or both in LightDiffusion-Next:

1. **Batched CFG Computation** - Speed optimization
2. **Dynamic CFG Rescaling** - Quality optimization
3. **Adaptive Noise Scheduling** - Quality & speed optimization

## 1. Batched CFG Computation

### What It Does

Instead of running two separate forward passes for the conditional and unconditional predictions, this optimization combines them into a single batched forward pass when possible.

**Before:**

```python
# Two separate forward passes
cond_pred = model(x, timestep, cond)      # Pass 1
uncond_pred = model(x, timestep, uncond)  # Pass 2
result = uncond_pred + cfg_scale * (cond_pred - uncond_pred)
```

**After:**

```python
# Single batched forward pass (schematic: both conditionings go through one call)
both_preds = model(x, timestep, [cond, uncond])  # Single pass
cond_pred, uncond_pred = both_preds[0], both_preds[1]
result = uncond_pred + cfg_scale * (cond_pred - uncond_pred)
```

### Performance Impact

- **Speed**: ~1.8-2x faster CFG computation
- **Memory**: Peak usage can be higher, since both branches share one forward pass (see Troubleshooting if you hit memory errors)
- **Quality**: Identical to baseline

### Usage

```python
from src.sample import sampling

samples = sampling.sample1(
    model=model,
    noise=noise,
    steps=20,
    cfg=7.5,
    # ... other params ...
    batched_cfg=True,  # Joint cond/uncond batching (default: True)
)
```

In the current implementation, the heavy lifting still happens in the central conditioning packing path. `batched_cfg` controls whether the conditional and unconditional branches are packed together into the same forward pass when possible. Conditioning chunks within each branch are packed by the shared batching logic either way.

### When to Use

- **Usually recommended** - This removes the duplicate cond/uncond forward pass when memory allows
- Particularly beneficial for high-resolution images or batch generation
- Compatible with all samplers and schedulers

---

## 2. Dynamic CFG Rescaling

### What It Does

Dynamically adjusts the CFG scale based on prediction statistics to prevent over-saturation while maintaining prompt adherence.

### The Problem

High CFG values (7-12) improve prompt following but can cause:

- Over-saturated colors
- Over-sharpened edges ("halo effect")
- Loss of fine details
- Unnatural, "CG-like" appearance

### The Solution

Dynamic CFG rescaling analyzes the guidance vector (the difference between the conditional and unconditional predictions) and adjusts the CFG scale so that the guidance magnitude stays within a target range.

**Two Methods:**

#### Variance Method (Recommended)

```python
guidance = cond_pred - uncond_pred
guidance_std = guidance.std()
adjusted_cfg = cfg_scale * (target_scale / (1 + guidance_std))
```

Best for: General use, prevents over-saturation

#### Range Method

```python
import torch
guidance = cond_pred - uncond_pred
guidance_range = torch.quantile(guidance, 0.95) - torch.quantile(guidance, 0.05)
adjusted_cfg = cfg_scale * (target_scale / guidance_range)
```

Best for: Extreme cases, outlier filtering

### Performance Impact

- **Speed**: Minimal overhead (~2-5%)
- **Quality**: Improved color balance, reduced artifacts
- **Prompt Adherence**: Maintained or improved

### Usage

```python
samples = sampling.sample1(
    model=model,
    # ... other params ...
    dynamic_cfg_rescaling=True,     # Enable dynamic rescaling
    dynamic_cfg_method="variance",  # Method: "variance" or "range"
    dynamic_cfg_percentile=95,      # Percentile for range method
    dynamic_cfg_target_scale=1.0,   # Target normalization scale
)
```

### When to Use

- High CFG values (>7.5)
- Detailed prompts that might cause over-saturation
- Photorealistic generations
- Portraits and faces

### When to Avoid

- Very low CFG (<3.0) - minimal benefit
- Artistic/stylized generations where saturation is desired
- When using CFG-free sampling (already handles this differently)

---

## 3. Adaptive Noise Scheduling

### What It Does

Dynamically adjusts the noise schedule based on content complexity during generation.

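As a rough illustration of what "content complexity" can mean here, the sketch below scores a single 2D channel in two ways: by the variance of its values, and by the mean absolute difference between neighbouring values. It is a dependency-free toy on nested lists, not the LightDiffusion-Next implementation; the function names `variance_complexity` and `gradient_complexity` are hypothetical, and real code would operate on latent tensors.

```python
from statistics import fmean, pvariance


def variance_complexity(channel):
    """Population variance of all values in a 2D grid (list of rows)."""
    return pvariance([v for row in channel for v in row])


def gradient_complexity(channel):
    """Mean absolute difference between horizontally and vertically adjacent values."""
    diffs = []
    for row in channel:
        diffs += [abs(b - a) for a, b in zip(row, row[1:])]  # horizontal neighbours
    for upper, lower in zip(channel, channel[1:]):
        diffs += [abs(b - a) for a, b in zip(upper, lower)]  # vertical neighbours
    return fmean(diffs) if diffs else 0.0


flat = [[0.5] * 4 for _ in range(4)]  # smooth region
checker = [[float((r + c) % 2) for c in range(4)] for r in range(4)]  # busy texture

print(variance_complexity(flat), variance_complexity(checker))
print(gradient_complexity(flat), gradient_complexity(checker))
```

The flat grid scores zero on both measures while the checkerboard scores higher on both. A scheduler could compare such a score against a threshold to decide whether intermediate noise levels can be skipped; any such threshold would need tuning.
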
### The Problem

Traditional fixed noise schedules apply the same denoising steps regardless of content:

- Complex scenes (detailed textures) may need more steps in certain regions
- Simple scenes (smooth gradients) can use fewer steps
- A fixed schedule therefore either wastes computation on simple content or under-samples complex content

### The Solution

Analyzes the complexity of intermediate predictions and adjusts subsequent noise levels accordingly.

**Two Methods:**

#### Complexity Method (Recommended)

```python
complexity = denoised.var(dim=(-2, -1))  # variance over the spatial dims
# High variance = complex details = maintain fine noise steps
# Low variance = simple areas = can skip intermediate steps
```

Best for: General content-aware optimization

#### Attention Method

```python
# Schematic: mean absolute spatial gradient via finite differences
grad_h = denoised.diff(dim=-2).abs().mean()
grad_w = denoised.diff(dim=-1).abs().mean()
complexity = (grad_h + grad_w) / 2
# High gradients = edges/details = need more precision
# Low gradients = smooth areas = can denoise faster
```

Best for: Edge-focused content (architecture, technical drawings)

### Performance Impact

- **Speed**: 10-20% faster for simple scenes, same for complex
- **Quality**: Adaptive - maintains quality where needed
- **Prompt Adherence**: Unchanged

### Usage

```python
samples = sampling.sample1(
    model=model,
    # ... other params ...
    adaptive_noise_enabled=True,         # Enable adaptive scheduling
    adaptive_noise_method="complexity",  # Method: "complexity" or "attention"
)
```

### When to Use

- Mixed complexity scenes (e.g., detailed subject + simple background)
- Long sampling runs (50+ steps) - more opportunity to optimize
- Batch generation with varying prompt complexity

### When to Avoid

- Very short sampling runs (<10 steps) - the overhead outweighs the benefit
- Uniformly complex scenes - no simplification possible
- When exact step-by-step reproducibility is critical

---

## Combining Optimizations

All three optimizations can be used together:

```python
samples = sampling.sample1(
    model=model,
    noise=noise,
    steps=20,
    cfg=7.5,
    sampler_name="dpmpp_sde_cfgpp",
    scheduler="ays",
    positive=positive_cond,
    negative=negative_cond,
    latent_image=latent,
    # All optimizations enabled
    batched_cfg=True,
    dynamic_cfg_rescaling=True,
    dynamic_cfg_method="variance",
    dynamic_cfg_target_scale=1.0,
    adaptive_noise_enabled=True,
    adaptive_noise_method="complexity",
)
```

**Expected Results:**

- Better color balance and detail preservation
- Reduced over-saturation artifacts
- Maintained or improved prompt adherence

## Troubleshooting

### Batched CFG Issues

**Problem**: Memory errors with batched CFG

**Solution**: The system may not have enough VRAM for joint cond/uncond batching. Disable it with `batched_cfg=False`, which keeps the conditioning path active but runs the two branches separately.

### Dynamic CFG Issues

**Problem**: Images too flat/desaturated

**Solution**: Increase `dynamic_cfg_target_scale` (try 1.5 or 2.0)

**Problem**: Still over-saturated

**Solution**: Switch to `dynamic_cfg_method="range"` and lower `dynamic_cfg_percentile`

### Adaptive Noise Issues

**Problem**: Inconsistent results

**Solution**: Adaptive scheduling makes slight changes based on content. Disable it when exact reproducibility is required.

**Problem**: No speed improvement

**Solution**: Adaptive scheduling works best with simple scenes. Complex scenes won't see a speedup (but won't be slower either).

---

## Credits

Implemented for LightDiffusion-Next by combining insights from:

- CFG++ dynamic rescaling techniques
- ComfyUI batched computation patterns
- Stable Diffusion WebUI adaptive scheduling
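
To make the formulas in sections 1 and 2 concrete, here is a minimal, dependency-free sketch of the CFG combination with the variance-method rescaling applied. It works on flat lists of floats in place of model outputs; `cfg_combine` and `variance_rescaled_cfg` are hypothetical helper names, the use of the population standard deviation is an assumption, and the actual LightDiffusion-Next code may differ.

```python
from statistics import pstdev


def cfg_combine(cond_pred, uncond_pred, cfg_scale):
    """Standard CFG: uncond + scale * (cond - uncond), element-wise."""
    return [u + cfg_scale * (c - u) for c, u in zip(cond_pred, uncond_pred)]


def variance_rescaled_cfg(cond_pred, uncond_pred, cfg_scale, target_scale=1.0):
    """Variance method: shrink the scale as the guidance vector spreads out."""
    guidance = [c - u for c, u in zip(cond_pred, uncond_pred)]
    # Population standard deviation (assumption; a tensor library's default
    # may be the sample estimate instead).
    guidance_std = pstdev(guidance)
    return cfg_scale * (target_scale / (1 + guidance_std))


# Stand-ins for the two model predictions
cond = [1.0, -1.0, 1.0, -1.0]
uncond = [0.0, 0.0, 0.0, 0.0]

adjusted = variance_rescaled_cfg(cond, uncond, cfg_scale=8.0)
result = cfg_combine(cond, uncond, adjusted)
print(adjusted, result)
```

When the two predictions agree, the guidance spread is zero and the scale is left unchanged; as the spread grows, the effective scale falls, which is the mechanism that damps over-saturation at high `cfg` values.
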