Great LoRAs!
Got the inpainting to work; nicely done with that.
The EditAnything LoRA is... wow, well done. It's Qwen Image Edit for video, at least from what I tried.
Is there any way to add a reference image to the EditAnything flow?
It's something I'm working on. I'll paste here a message I sent on CivitAI so you understand the situation.
"Right now, we do not have an established way to provide reference images with LTX-2.3. You can see this kind of capability in several Wan-based releases, such as VACE, BindWeave, Kiwi, Ditto, and others, but for LTX-2.3 we still do not have anything at that level.
At the moment, the only practical way I have to do something similar is through IC LoRA training, where I can build a template around the guide video. Just for context, to train a model like EditAnything, we need paired data: the guide video and the target video. The guide is the original video, and the target is the edited version. On top of that, we also need a prompt to describe the transformation from guide to target.
So in this setup, we already have two video inputs, but we do not have a native way to inject an external image as an additional conditioning signal unless we customize the training scripts.
What I usually do is rely on one of two approaches.
The first approach is to place the reference image as the first frame of the guide video. This means that, in the first frame, the model sees the object I want it to use later inside the scene shown in the following frames. This is the approach I am trying right now. The problem is that this reference is a very weak signal compared to the rest of the video, because it is only a single frame. The model can simply ignore it. Even if it does pay attention to that frame, it may still stop using the reference when something changes in the video, such as a camera cut, camera movement, or when an object appears or disappears from the scene. In practice, the model can become lazy and just reconstruct the original video content instead of applying the intended edit.
The second approach, which I know works much better, is to create a custom guide video template. This is similar to some of the reference-based inpainting models I posted before, or my head-swap setup. In those cases, the guide video contains a green chroma-key region, and the reference image is placed inside that region. This way, the object information is present in every frame of the guide video. The model always has a visible reference to copy from, which solves the problem of the model ignoring it. The downside is that many people do not like this more customized template format. Still, if I cannot find a better solution, I will probably move forward with this custom-template technique.
Dataset creation is another major challenge. Besides the guide video, which is just the original video and is usually the easiest part to obtain, I also need the target video, where the object has already been inserted, and I need a clean image of that inserted object to place into the guide template. That object image is usually the hardest part. Most available datasets do not include it. And when they do, they often extract it directly from the target video itself. That is not ideal, because the extracted image ends up looking too similar to the target video frame. The correct setup would be to use a clean standalone image, not something extracted from the video, since extraction can introduce distortion or leave pieces of the original background visible. If you train the model that way, the quality can collapse and the model can turn out terrible.
So that is basically where I am right now. Training reference-based models is extremely difficult, both because the conditioning mechanism is weak in current LTX-2.3 workflows and because building a proper dataset for this kind of task is very hard."
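To make the difference between the two guide-video layouts described above concrete, here is a minimal sketch in Python. It assumes a video is simply a list of RGB frames held in memory; the function names, the pure-green key colour and the region coordinates are hypothetical choices for illustration, not the actual template used by the inpainting or head-swap LoRAs.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]        # (R, G, B), 0-255
Frame = List[List[Pixel]]           # frame[y][x]
Video = List[Frame]

CHROMA_GREEN: Pixel = (0, 255, 0)   # assumed key colour, chosen for illustration only


def first_frame_guide(reference: Frame, guide: Video) -> Video:
    """Approach 1: prepend the reference image as frame 0 of the guide video.

    The reference appears in a single frame only, which is why the model
    can ignore it or stop using it after a cut or camera move.
    """
    return [reference] + guide


def chroma_key_guide(reference: Frame, guide: Video,
                     top: int, left: int, height: int, width: int) -> Video:
    """Approach 2: paint a green chroma-key box into every frame and paste
    the reference inside it, so the reference is visible in all frames.

    (top, left, height, width) is a hypothetical placement of the key region;
    the reference is centred inside it and must fit.
    """
    if not guide:
        return []
    frame_h, frame_w = len(guide[0]), len(guide[0][0])
    if top < 0 or left < 0 or top + height > frame_h or left + width > frame_w:
        raise ValueError("chroma-key region lies outside the frame")
    ref_h, ref_w = len(reference), len(reference[0])
    if ref_h > height or ref_w > width:
        raise ValueError("reference image does not fit inside the chroma-key region")
    off_y = top + (height - ref_h) // 2
    off_x = left + (width - ref_w) // 2

    out: Video = []
    for frame in guide:
        new_frame = [row[:] for row in frame]        # copy so the source video is untouched
        for y in range(top, top + height):            # fill the whole region with key green
            for x in range(left, left + width):
                new_frame[y][x] = CHROMA_GREEN
        for y in range(ref_h):                        # paste the reference, centred in the region
            for x in range(ref_w):
                new_frame[off_y + y][off_x + x] = reference[y][x]
        out.append(new_frame)
    return out
```

The structural difference is that the first layout adds one extra frame and shows the reference once, while the second keeps the frame count and repeats the reference in every frame, which is the property the message credits for the model no longer ignoring it.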
First, thank you for your work here. I trained some of the only AnimateDiff LoRAs back when no one else seemed able to, but I haven't done any training since.
The EditAnything LoRA works wonders on its own with just a prompt. "Replace the woman's clothing with a red dress" works wonders; "replace the hair with braids" works wonders. It comes close to preserving the original video's subject while doing it, too.
WARNING: NSFW :P This used just your LoRA and your workflow, first try. It's close enough that it looks like the same person with new hair and clothing. It's hit or miss, but it's powerful.
https://civitai.red/images/128253772 (you can use this in your training set if you want :P)
I have my own LTX workflow that I use; I've been building workflows since AnimateDiff first came out. It's a bit complex, but I'm sure you'll understand what it's doing.
https://civitai.red/models/2550125/ltx-23-simple
I use the Union Control LoRA in an unusual way: to transfer the motion of a video onto an image using just the RGB video. Your LoRA actually helps keep the motion in a similar way to Union Control.
Using your LoRA together with Union Control (both at 0.4 strength) can copy the motion, and adding the Cameraman LoRA at 0.3 helps keep the subject faithful to the image.
I have probably 800 videos converted with VACE, both simple and very complex ones: skipping, gymnastics, drums, guitar, dancing, horse riding, etc. Would any of those conversions, paired with the originals, be useful to train with?
This is about as close to VACE as we've gotten in the past year... and damn, VACE was soooo damn good.
:) Thanks again
That's really cool! I haven't had time yet to explore the model the way I wanted because I'm producing datasets to train new versions (based solely on community feedback), but I'll definitely check out everything you said. Also, is your VACE dataset a set of video pairs (guide and target)?
My VACE dataset isn't actually a dataset at the moment, but it could be. I have all the original videos I used for the conversions, in 720p or 1080p (some 1440p), and I have the converted videos. All the conversions have the same aspect ratio as their originals but a different resolution.
Looking at my YouTube backup, I have 12 GB of videos I uploaded. Probably 8-10 GB of that are decent-quality VACE conversions. They all change everything except the movement: the background changes and the subject changes, producing an entirely new video of someone skipping, doing gymnastics, etc. So the description of what changed would be... pretty much everything in the video, lol.
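Since the conversions keep each original's aspect ratio, turning them into the guide/target/prompt pairs described in the quoted message above could look roughly like this minimal Python sketch. Several things here are assumptions, not part of either workflow: that a conversion shares its original's file-name stem, that the aspect-ratio tolerance is 0.01, and that the trainer wants guide and target at one common resolution (here, the smaller of the two). The prompt is left as a placeholder to be written per pair.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Dict, List, Tuple

Resolution = Tuple[int, int]  # (width, height)


@dataclass
class TrainingPair:
    guide: str                      # original video (conditioning input)
    target: str                     # VACE conversion (what the model should produce)
    prompt: str                     # description of the guide -> target change
    common_resolution: Resolution   # size both videos would be scaled to (assumption)


def pair_videos(originals: Dict[str, Resolution],
                conversions: Dict[str, Resolution],
                default_prompt: str,
                tolerance: float = 0.01) -> List[TrainingPair]:
    """Pair each original with the conversion that has the same file stem
    (a hypothetical naming convention), skipping mismatched aspect ratios."""
    by_stem = {PurePath(p).stem: p for p in conversions}
    pairs: List[TrainingPair] = []
    for orig_path, (ow, oh) in sorted(originals.items()):
        conv_path = by_stem.get(PurePath(orig_path).stem)
        if conv_path is None:
            continue                                  # no conversion for this original
        cw, ch = conversions[conv_path]
        if abs(ow / oh - cw / ch) > tolerance:
            continue                                  # aspect ratios differ: skip rather than distort
        # Same aspect ratio, so downscaling the larger video to the smaller size is a pure scale.
        common = (ow, oh) if ow <= cw else (cw, ch)
        pairs.append(TrainingPair(orig_path, conv_path, default_prompt, common))
    return pairs
```

Downscaling to the smaller resolution avoids upsampling the conversions; a real training script may instead expect a fixed resolution, in which case `common_resolution` would be replaced by that size.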
I'm starting to upload my VACE outputs to CivitAI so you can see the quality of the conversions. It's a large set of videos of a kind that many people still can't produce with AI.