Multi-GPU question
Is there any point in using multi GPU and if so how is it implemented?
At present, the model isn't large enough to justify splitting it across multiple GPUs, so for normal single-generation tasks there is little benefit to a multi-GPU setup. However, dividing the work by assigning tasks to other GPUs during generation, or processing batches concurrently, is likely already supported in ComfyUI.
You can use Multi-GPU nodes in Comfy to offload the TE onto a second GPU, freeing up a small amount of space that might let you increase your batch size. You can also use DisTorch nodes to use a secondary GPU's VRAM as virtual VRAM to offload into instead of shared CPU memory, which might be slightly faster. So VRAM = yes, compute = no.
Yes, it does work, but not natively in ComfyUI or in most applications. The simplest idea and implementation that works across platforms is CFG parallelism, where one GPU handles the positive prompt and the second GPU handles the negative prompt, and they run in parallel, so technically you can see a 1.9-2x speedup with two of the same GPU model. I've successfully implemented it here inside my Anima Lora Training wrapper
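To make the idea concrete, here's a minimal sketch of CFG parallelism using only the Python standard library. This is not the actual ComfyUI-CFGParallel code: `dummy_model`, the embeddings, and the device labels are all hypothetical stand-ins, and threads simulate the two GPUs. The point is just the structure: each denoising step needs two independent model passes (positive and negative prompt), which can be dispatched concurrently and then combined with the standard CFG formula.

```python
# Sketch of CFG parallelism (illustrative only, not ComfyUI-CFGParallel itself).
# Each sampling step runs the model twice: once conditioned on the positive
# prompt and once on the negative. The two passes are independent, so they
# can go to different GPUs; here two threads simulate "cuda:0" and "cuda:1".

from concurrent.futures import ThreadPoolExecutor
import time

def dummy_model(latent, prompt_embedding, device):
    # Hypothetical stand-in for a UNet/DiT forward pass on `device`.
    time.sleep(0.05)  # simulate compute time
    return [x * 0.9 + prompt_embedding for x in latent]

def cfg_step_parallel(latent, pos_emb, neg_emb, cfg_scale=7.0):
    # Dispatch the conditional and unconditional passes concurrently,
    # as if one went to each GPU.
    with ThreadPoolExecutor(max_workers=2) as pool:
        f_pos = pool.submit(dummy_model, latent, pos_emb, "cuda:0")
        f_neg = pool.submit(dummy_model, latent, neg_emb, "cuda:1")
        eps_pos, eps_neg = f_pos.result(), f_neg.result()
    # Standard classifier-free guidance combination:
    # eps = eps_neg + scale * (eps_pos - eps_neg)
    return [n + cfg_scale * (p - n) for p, n in zip(eps_pos, eps_neg)]

latent = [0.5, -0.2, 1.0]
out = cfg_step_parallel(latent, pos_emb=0.1, neg_emb=0.0)
```

Because the two passes finish at roughly the same time when the GPUs are matched, the wall-clock cost per step approaches that of a single pass, which is where the near-2x figure comes from; mismatched GPUs sync at the speed of the slower card.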
I actually got interested in parallel CFG after looking at your GUI, and I was the one who asked you how to solve the multi-GPU problem when it didn't work :d
Still very experimental but you may try it out here https://github.com/gazingstars123/ComfyUI-CFGParallel
Tested it. In my case, with cards from different architectures, the process got slower: 11s vs 15s for the KSamplers in your workflow with the same seed. Looks like the older 3090 bottlenecked the whole process.
Thanks for testing it out. I have a 5080 and a 3090, so the speedup is about 1.5x over running a single 5080. I wish I even had a single 5090, but you get what you pay for.

