Multi-GPU question
Is there any point in using multi GPU and if so how is it implemented?
At present, the model isn't large enough to justify splitting it across multiple GPUs, so for normal single-generation tasks there is little benefit to a multi-GPU setup. However, dividing the work by assigning tasks to other GPUs during generation, or processing batches concurrently, is likely already supported in ComfyUI.
You can use Multi-GPU nodes in Comfy to offload the TE onto a second GPU, freeing up a small amount of space that might let you increase your batch size. You can also use DisTorch nodes to use a secondary GPU's VRAM as virtual VRAM to offload into instead of shared CPU memory, which might be slightly faster. So VRAM = yes, compute = no.
Yes, it does work, but not natively in ComfyUI or in most applications. The simplest idea and implementation that works across platforms is CFG parallelism, where one GPU handles the positive prompt and the second GPU handles the negative prompt, and they run in parallel, so technically you can see a 1.9-2x speedup with two of the same GPU model. I've successfully implemented it here inside my Anima Lora Training wrapper
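To make the idea concrete, here's a minimal sketch of CFG parallelism using only the Python standard library. This is not the actual ComfyUI-CFGParallel code: `dummy_model`, the embeddings, and the device labels are all hypothetical stand-ins, and threads simulate the two GPUs. The point is just the structure: each denoising step needs two independent model passes (positive and negative prompt), which can be dispatched concurrently and then combined with the standard CFG formula.

```python
# Sketch of CFG parallelism (illustrative only, not ComfyUI-CFGParallel itself).
# Each sampling step runs the model twice: once conditioned on the positive
# prompt and once on the negative. The two passes are independent, so they
# can go to different GPUs; here two threads simulate "cuda:0" and "cuda:1".

from concurrent.futures import ThreadPoolExecutor
import time

def dummy_model(latent, prompt_embedding, device):
    # Hypothetical stand-in for a UNet/DiT forward pass on `device`.
    time.sleep(0.05)  # simulate compute time
    return [x * 0.9 + prompt_embedding for x in latent]

def cfg_step_parallel(latent, pos_emb, neg_emb, cfg_scale=7.0):
    # Dispatch the conditional and unconditional passes concurrently,
    # as if one went to each GPU.
    with ThreadPoolExecutor(max_workers=2) as pool:
        f_pos = pool.submit(dummy_model, latent, pos_emb, "cuda:0")
        f_neg = pool.submit(dummy_model, latent, neg_emb, "cuda:1")
        eps_pos, eps_neg = f_pos.result(), f_neg.result()
    # Standard classifier-free guidance combination:
    # eps = eps_neg + scale * (eps_pos - eps_neg)
    return [n + cfg_scale * (p - n) for p, n in zip(eps_pos, eps_neg)]

latent = [0.5, -0.2, 1.0]
out = cfg_step_parallel(latent, pos_emb=0.1, neg_emb=0.0)
```

Because the two passes finish at roughly the same time when the GPUs are matched, the wall-clock cost per step approaches that of a single pass, which is where the near-2x figure comes from; mismatched GPUs sync at the speed of the slower card.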
I actually got interested in parallel CFG after looking at your GUI, and I was the one who asked you how to solve the multi-GPU problem when it didn't work :d
Still very experimental but you may try it out here https://github.com/gazingstars123/ComfyUI-CFGParallel
Tested it. In my case, with cards from different architectures, the process got slower: 11s vs 15s for the KSamplers in your workflow with the same seed. Looks like the older 3090 bottlenecked the whole process.
Thanks for testing it out. I have a 5080 and a 3090, so the speedup is about 1.5x over running a single 5080. I wish I even had a single 5090, but you get what you pay for.

