LTX 2.3 and Crowd Generation V2V

#125
Opened by PeZiK

Hi RuneXX,

What would be the best way to generate a massive crowd to fill empty bleachers at a sports event using video-to-video editing? I tried the Edit Anything workflow, but I didn't get good results. Any pointers? :)

Yes, according to the creator of that LoRA, it was trained on typical appliances and objects, e.g. removing a laptop or adding a lamp. And it seems to work great for that.
But it has general capabilities too. I removed a sea monster in the workflow post just as a demo, and I'm pretty sure there was no monster in the training data ;-)
People are also in the training data as a subject to add/remove, so it might work with a few attempts (but I haven't tried that myself).

Another option is regular inpainting. I haven't added a workflow for that yet, but will do ASAP.
Then you just mask the area you want to change, and it should hopefully do the job.

For even better results, you could export the first frame of your video as an image, then ask Nano Banana or another image-editing model to add people to that area. They are great at editing images.
You can also do this perfectly well inside ComfyUI with Qwen Image Edit, Flux Klein, etc. Then use the edited image as the first-frame guide in LTX. Should work well, in theory ;-) (combined with masking and inpainting)

I've been meaning to make a workflow for regular inpainting, but was waiting for SAM 3 support in Comfy (super easy masking). That's now finally been added ;-)
So it might be time to make an inpaint workflow (or two... I also want to make a motion transfer one, where you can swap characters in videos).

Will try soon ;-)

Thank you, RuneXX. I'm looking forward to your inpaint workflow. In my case, I need to keep the original video intact while adding a crowd. I would love a V2V workflow that uses a reference guide image to do this. For example, I could edit the first frame of the empty bleachers (or whichever frame they first become visible in) using Nano Banana or a similar tool you mentioned, and then run it through a workflow that preserves the original video while adding an animated crowd based on the reference. With all your great workflows, I feel like a kid in a candy store :)

Yes, you would use that first frame + a mask, so that LTX only uses the first frame as a reference in that particular area (the added people/crowd area).
But it can work perfectly fine without it. A starting reference image gives you a bit more control to guide the result, but just prompting can work as well (and sometimes even better, letting LTX do it its own way instead of trying to "mimic" an input image it may or may not be good at).
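On keeping the original footage intact: one way to guarantee that pixels outside the masked area stay identical to the source, whatever the model does there, is to composite the generated clip back over the original using the same mask. The sketch below is a standalone NumPy post-processing step, not part of LTX or any ComfyUI node; `composite_masked_frames` is a hypothetical helper, and it assumes the generated clip already has the same frame count and resolution as the original and is aligned frame by frame.

```python
import numpy as np


def composite_masked_frames(original, generated, mask):
    """Hypothetical helper: keep original pixels outside the mask, generated pixels inside.

    original, generated: uint8 arrays of shape (T, H, W, 3), same shape.
    mask: float array of shape (H, W) or (T, H, W) with values in [0, 1];
          1.0 marks the region to replace (e.g. the empty bleachers).
    """
    if original.shape != generated.shape:
        raise ValueError("original and generated clips must have the same shape")
    m = np.asarray(mask, dtype=np.float32)
    if m.ndim == 2:
        # A single static mask is reused for every frame.
        m = np.broadcast_to(m, original.shape[:3])
    # Add a channel axis so the mask applies equally to R, G and B.
    m = m[..., None]
    # Linear blend: mask=1 takes the generated pixel, mask=0 keeps the original.
    out = m * generated.astype(np.float32) + (1.0 - m) * original.astype(np.float32)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


if __name__ == "__main__":
    T, H, W = 4, 8, 8
    original = np.zeros((T, H, W, 3), dtype=np.uint8)
    generated = np.full((T, H, W, 3), 200, dtype=np.uint8)
    mask = np.zeros((H, W), dtype=np.float32)
    mask[:4, :] = 1.0  # hypothetical "bleachers" region: top half of the frame
    result = composite_masked_frames(original, generated, mask)
    print(result[0, 0, 0], result[0, 7, 0])
```

A hard 0/1 mask can leave a visible seam at the edge of the bleachers; blurring (feathering) the mask before compositing, so its values fall off gradually between 0 and 1, is a common way to soften that, and the blend above already handles fractional mask values.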
