# Control

Native control module for the SD.Next Diffusers backend.
Can be used for Control generation as well as Image and Text workflows.

For a guide on the options and settings, as well as explanations for the controls themselves, see the [Control Guide](https://github.com/vladmandic/automatic/wiki/Control-Guide) page.
## Supported Control Models

- [lllyasviel ControlNet](https://github.com/lllyasviel/ControlNet) for **SD 1.5** and **SD-XL** models
  Includes ControlNets as well as Reference-only mode and any compatible 3rd party models
  Original ControlNets are 1.4GB each for SD15 and a massive 4.9GB each for SDXL
- [VisLearn ControlNet XS](https://vislearn.github.io/ControlNet-XS/) for **SD-XL** models
  Lightweight ControlNet models for SDXL at only 165MB each, with near-identical results
- [TencentARC T2I-Adapter](https://github.com/TencentARC/T2I-Adapter) for **SD 1.5** and **SD-XL** models
  T2I-Adapters provide similar functionality at much lower resource cost, at only 300MB each
- [Kohya Control LLLite](https://huggingface.co/kohya-ss/controlnet-lllite) for **SD-XL** models
  LLLite models for SDXL provide lightweight image control at only 46MB each
- [Tencent AI Lab IP-Adapter](https://github.com/tencent-ailab/IP-Adapter) for **SD 1.5** and **SD-XL** models
  IP-Adapters provide great style transfer functionality at much lower resource cost: below 100MB for SD15 and 700MB for SDXL
  IP-Adapters can be combined with ControlNets for more stable results, especially when doing batch/video processing
- [CiaraRowles TemporalNet](https://huggingface.co/CiaraRowles/TemporalNet) for **SD 1.5** models
  ControlNet model designed to enhance temporal consistency and reduce flickering for batch/video processing

All built-in models are downloaded upon first use and stored in:
`/models/controlnet`, `/models/adapter`, `/models/xs`, `/models/lite`, `/models/processor`

Listed below are all models that are supported out-of-the-box:
### ControlNet

- **SD15**:
  Canny, Depth, IP2P, LineArt, LineArt Anime, MLSD, NormalBae, OpenPose,
  Scribble, Segment, Shuffle, SoftEdge, TemporalNet, HED, Tile
- **SDXL**:
  Canny Small XL, Canny Mid XL, Canny XL, Depth Zoe XL, Depth Mid XL

*Note*: Only models compatible with the currently loaded base model are listed

Additional ControlNet models in safetensors format can be downloaded manually and placed into the corresponding folder: `/models/control/controlnet`
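SD.Next handles all of this internally; for orientation, here is a minimal diffusers-style sketch of what attaching a ControlNet to an SD15 base model looks like (model IDs and file names are illustrative):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

# load a Canny ControlNet and attach it to an SD15 base model
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# the conditioning image is a preprocessed control map, e.g. a canny edge map
control_image = Image.open("canny_edges.png")
result = pipe("a castle on a hill", image=control_image).images[0]
result.save("controlnet_output.png")
```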
### ControlNet XS

- **SDXL**:
  Canny, Depth

### ControlNet LLLite

- **SDXL**:
  Canny, Canny anime, Depth anime, Blur anime, Pose anime, Replicate anime

*Note*: Control-LLLite support is based on an unofficial implementation and is considered experimental

Additional ControlNet-LLLite models in safetensors format can be downloaded manually and placed into the corresponding folder: `/models/control/lite`
### T2I-Adapter

- **SD15**:
  Segment, Zoe Depth, OpenPose, KeyPose, Color, Depth v1, Depth v2, Canny v1, Canny v2, Sketch v1, Sketch v2
- **SDXL**:
  Canny XL, Depth Zoe XL, Depth Midas XL, LineArt XL, OpenPose XL, Sketch XL

The SD15 adapters map to the following Hugging Face repositories:

- Segment: `TencentARC/t2iadapter_seg_sd14v1`
- Zoe Depth: `TencentARC/t2iadapter_zoedepth_sd15v1`
- OpenPose: `TencentARC/t2iadapter_openpose_sd14v1`
- KeyPose: `TencentARC/t2iadapter_keypose_sd14v1`
- Color: `TencentARC/t2iadapter_color_sd14v1`
- Depth v1: `TencentARC/t2iadapter_depth_sd14v1`
- Depth v2: `TencentARC/t2iadapter_depth_sd15v2`
- Canny v1: `TencentARC/t2iadapter_canny_sd14v1`
- Canny v2: `TencentARC/t2iadapter_canny_sd15v2`
- Sketch v1: `TencentARC/t2iadapter_sketch_sd14v1`
- Sketch v2: `TencentARC/t2iadapter_sketch_sd15v2`

*Note*: Only models compatible with the currently loaded base model are listed
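The adapter path works analogously to the ControlNet sketch above; a minimal diffusers sketch, again with illustrative file names:

```python
import torch
from diffusers import StableDiffusionAdapterPipeline, T2IAdapter
from PIL import Image

# T2I-Adapters attach to the base model like ControlNets, at a fraction of the size
adapter = T2IAdapter.from_pretrained(
    "TencentARC/t2iadapter_canny_sd15v2", torch_dtype=torch.float16
)
pipe = StableDiffusionAdapterPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", adapter=adapter, torch_dtype=torch.float16
).to("cuda")

edges = Image.open("canny_edges.png")
result = pipe("a castle on a hill", image=edges).images[0]
```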
### Processors

- **Pose style**: OpenPose, DWPose, MediaPipe Face
- **Outline style**: Canny, Edge, LineArt Realistic, LineArt Anime, HED, PidiNet
- **Depth style**: Midas Depth Hybrid, Zoe Depth, Leres Depth, Normal Bae
- **Segmentation style**: SegmentAnything
- **Other**: MLSD, Shuffle

*Note*: Processor sizes vary from none (built-in processors) to anywhere between 200MB and 4.2GB (ZoeDepth-Large)
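As an example of what an Outline-style processor produces, a minimal Canny sketch using OpenCV (thresholds and file names are illustrative):

```python
import cv2
import numpy as np
from PIL import Image

# reduce the input image to an edge map that a control model can follow
img = np.array(Image.open("input.png").convert("RGB"))
edges = cv2.Canny(img, 100, 200)          # low/high hysteresis thresholds
edges = np.stack([edges] * 3, axis=-1)    # control models expect a 3-channel map
Image.fromarray(edges).save("canny_edges.png")
```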
### Segmentation Models

There are 8 auto-segmentation models available:

- Facebook SAM ViT Base (357MB)
- Facebook SAM ViT Large (1.16GB)
- Facebook SAM ViT Huge (2.56GB)
- SlimSAM Uniform (106MB)
- SlimSAM Uniform Tiny (37MB)
- Rembg Silueta
- Rembg U2Net
- Rembg ISNet
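For the Rembg family, a minimal sketch of how these models are typically invoked via the `rembg` package (model names follow rembg conventions; file names are illustrative):

```python
from PIL import Image
from rembg import new_session, remove

# segment the foreground subject and drop the background
session = new_session("u2net")            # or "silueta", "isnet-general-use"
img = Image.open("input.png")
cut = remove(img, session=session)        # returns an RGBA image with transparent background
cut.save("foreground.png")
```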
### Reference

Reference mode is its own pipeline, so it cannot have multiple units or processors
## Workflows

### Inputs & Outputs

- Image -> Image
- Batch: list of images -> Gallery and/or Video
- Folder: folder with images -> Gallery and/or Video
- Video -> Gallery and/or Video

*Notes*:
- Input/Output/Preview panels can be minimized by clicking on them
- For video output, make sure to set video options
### Unit

- A unit is: **input** plus **process** plus **control**
- A pipeline consists of any number of configured units
  If a unit uses a control module, all control modules inside the pipeline must be of the same type,
  e.g. **ControlNet**, **ControlNet-XS**, **T2I-Adapter** or **Reference** (see the multi-unit sketch after this list)
- Each unit can use the primary input or its own override input
- A unit can have no processor, in which case control runs on the input directly
  Use this when working with predefined input templates
- A unit can have no control, in which case it runs the processor only
- Any combination of input, processor and control is possible
  For example, two enabled units with process only will produce a compound processed image, but without control
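To illustrate why all control units must share a type, a minimal diffusers sketch of two same-type units stacked as a multi-ControlNet (model IDs, images and scales are illustrative):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

# two "units" of the same control type are stacked as a list of ControlNets,
# each with its own conditioning image and its own strength
cn_canny = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
cn_depth = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=[cn_canny, cn_depth], torch_dtype=torch.float16
).to("cuda")

result = pipe(
    "a castle on a hill",
    image=[Image.open("canny.png"), Image.open("depth.png")],
    controlnet_conditioning_scale=[0.7, 0.5],   # per-unit control strength
).images[0]
```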
### What-if?

- If no input is provided, the pipeline runs in **txt2img** mode
  Can be freely used instead of standard `txt2img`
- If none of the units have a control or adapter, the pipeline runs in **img2img** mode using the input image
  Can be freely used instead of standard `img2img`
- If you have a processor enabled but no controlnet or adapter loaded,
  the pipeline runs in **img2img** mode using the processed input
- If you have multiple processors enabled but no controlnet or adapter loaded,
  the pipeline runs in **img2img** mode on the *blended* processed image (see the blending sketch after this list)
- Output resolution defaults to the input resolution
  Use resize settings to force any resolution
- The resize operation can run before processing (on the input image) or after it (on the output image)
- Using video input runs the pipeline on each frame unless **skip frames** is set
  Video output is a standard list of images (gallery) and can optionally be encoded into a video file
  The video file can be interpolated using **RIFE** for smoother playback
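A minimal sketch of the blending step, assuming an equal-weight blend (the actual weighting used by the module may differ):

```python
from PIL import Image

# with multiple processors and no control model loaded, the processed maps are
# blended into a single image that then serves as the img2img init image
a = Image.open("canny_edges.png").convert("RGB")
b = Image.open("depth_map.png").convert("RGB").resize(a.size)
blended = Image.blend(a, b, alpha=0.5)    # alpha=0.5 -> equal-weight blend
blended.save("blended_init.png")
```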
### Overrides

- Control can be based on the main input, or each individual unit can have its own override input
- By default, control runs in control+txt2img mode
- If an init image is provided, it runs in control+img2img mode
  The init image can be the same as the control image or separate
- IP adapter can be applied to any workflow (see the sketch after this list)
- IP adapter can use the same input as the control input or a separate one
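A minimal diffusers sketch of applying an IP adapter on top of a plain txt2img workflow (repository, weight name and scale are illustrative):

```python
import torch
from diffusers import StableDiffusionPipeline
from PIL import Image

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# IP-Adapter bolts onto the existing pipeline; the reference image steers style/content
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.6)            # how strongly the reference image applies

style_ref = Image.open("style_reference.png")
result = pipe("a castle on a hill", ip_adapter_image=style_ref).images[0]
```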
### Inpaint

- The inpaint workflow is triggered when the input image is provided in **inpaint** mode
- Inpaint mode can be used with image-to-image or controlnet workflows
- Other unit types such as T2I, XS or Lite do not support inpaint mode
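A minimal diffusers sketch of a controlnet inpaint workflow (model IDs and file names are illustrative):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline
from PIL import Image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_inpaint", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# only the white area of the mask is regenerated; the control map guides structure
result = pipe(
    "a castle on a hill",
    image=Image.open("input.png"),
    mask_image=Image.open("mask.png"),
    control_image=Image.open("control.png"),
).images[0]
```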
### Outpaint

- The outpaint workflow is triggered when the input image is provided in **outpaint** mode
- Outpaint mode can be used with image-to-image or controlnet workflows
- Other unit types such as T2I, XS or Lite do not support outpaint mode
- It is recommended to increase denoising strength to at least 0.8, since the outpainted area is blank and needs to be filled from noise
- How closely the outpaint follows the input image is controlled by the overlap setting: the higher the overlap, the more of the original image becomes part of the outpaint process (see the sketch below)
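A minimal sketch of how outpainting prepares its inputs, showing the role of the overlap setting (padding direction and sizes are illustrative):

```python
from PIL import Image

PAD = 256        # blank area added to the right of the canvas
OVERLAP = 64     # band of the original image included in the repainted region

src = Image.open("input.png")
w, h = src.size

# enlarge the canvas; the new area starts out blank
canvas = Image.new("RGB", (w + PAD, h), "black")
canvas.paste(src, (0, 0))

# mask covers the blank area plus the overlap band into the original image;
# a higher overlap repaints more of the original for a smoother seam
mask = Image.new("L", (w + PAD, h), 0)
mask.paste(255, (w - OVERLAP, 0, w + PAD, h))

canvas.save("outpaint_init.png")
mask.save("outpaint_mask.png")
```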
## Logging

To enable extra logging for troubleshooting purposes,
set the following environment variables before running **SD.Next**:

- Linux:
  > export SD_CONTROL_DEBUG=true
  > export SD_PROCESS_DEBUG=true
  > ./webui.sh --debug
- Windows:
  > set SD_CONTROL_DEBUG=true
  > set SD_PROCESS_DEBUG=true
  > webui.bat --debug

*Note*: Starting with debug info enabled also enables **Test** mode in the Control module
## Limitations / TODO

### Known issues

- Using model offload can cause Control models to be on the wrong device at execution time
  Example error message:
  > Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same
  Workaround: Disable **model offload** in settings -> diffusers and use the **move model** option instead (see the sketch after this list)
- DWPose fails to load after its installation did not complete successfully
  Example error message:
  > Control processor DWPose: DLL load failed while importing _ext
  Workaround: Activate the venv and run the following commands to install the DWPose dependencies manually:
  `pip install -U openmim --no-deps`
  `mim install mmengine mmcv mmpose mmdet --no-deps`
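For context on the offload issue above, a minimal diffusers sketch contrasting the two placement strategies (this mirrors, but is only an assumption about, what the SD.Next settings map to):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)

# offload swaps components between CPU and GPU per call; a control model left on
# the CPU while its inputs are on CUDA raises the device-mismatch error above
# pipe.enable_model_cpu_offload()

# keeping the whole pipeline resident on the GPU avoids the mismatch
pipe.to("cuda")
```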
## Future

- Pose editor
- Process multiple images in a batch in parallel
- ControlLora: <https://huggingface.co/stabilityai/control-lora>
- Multi-frame rendering: <https://xanthius.itch.io/multi-frame-rendering-for-stablediffusion>
- Deflickering and deghosting