Character LoRA severely reduces motion and ignores action prompts (ltx-2.3)

#36

by Spawn - opened Mar 28

Mar 28

Hi everyone,

I recently trained a character LoRA for the ltx-2.3 model, but I'm running into a frustrating issue during inference and was hoping to get some advice.

When I apply the character LoRA at a weight of 1.0, the generated output loses almost all movement, resulting in very static videos. Furthermore, it seems to completely override or ignore most of my text prompts.

I tried lowering the LoRA weight down to 0.5 to see if it would give the base model more room to generate motion. However, even at 0.5, simple action prompts like "walking" are still almost entirely ignored, and the character remains mostly stationary.

My questions:
1. Has anyone else experienced this severe loss of motion and prompt adherence when using character LoRAs with ltx-2.3?
2. Are there specific training parameters (like rank/alpha, learning rate, or dataset captioning methods) that help preserve the base model's motion capabilities?
3. Are there any recommended inference settings or workarounds to fix this?

Any insights, tips, or guidance would be greatly appreciated!
Thank you.

ScarabOfficial

Mar 29

For starters, you should not use the spatial upscaler x2 v1.0, as it is faulty.

An x2 v1.1 has been uploaded, so switch to that, and you may want to delete the v1.0 you have so it never slips back into a workflow accidentally.

Also, your video output dimensions should be multiples of 64, e.g. 1280x704 instead of 1280x720.
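If it helps, here is a minimal sketch of snapping a target resolution to the multiple-of-64 rule above. The helper name `snap_to_multiple` is just for illustration, and rounding down (rather than up) is an assumption chosen to match the 720 -> 704 example.

```python
def snap_to_multiple(value: int, multiple: int = 64) -> int:
    """Round value down to the nearest multiple, but never below one multiple."""
    return max(multiple, (value // multiple) * multiple)


# 1280 is already a multiple of 64; 720 is not and snaps down to 704.
width, height = snap_to_multiple(1280), snap_to_multiple(720)
print(f"{width}x{height}")
```

Rounding down keeps the frame inside the size you originally asked for; if you would rather round up, 720 becomes 768 instead.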

And you probably don't want to go below 0.55-0.6 for the strength of a character LoRA.

Using a GGUF version of 'gemma 3 12b it' can be problematic. Make sure you have ComfyUI and Custom Nodes updated so the GGUF loader node(s) are up to date. You may want to try a different version of the gemma model just to see if it changes things. It needs to be just right to work in tandem with the base model.

When it comes to prompting for LTX2.3, you might want to try breaking the prompt up into:

Scene: [State what happens during the video in a single sentence, mentioning a reasonable number of details about the character (e.g. age, clothing, hair, accessories) and surroundings, so the AI model knows what is in the image that it should be concentrating on.]

Action: [Describe the camera movement (e.g. 'The camera keeps the character centered in the frame, while...'). You might also mention things which make sound (e.g. 'the sound of high heels on a hard surface is heard echoing down the street'), and if the character speaks, give details of what they say and in what type of voice and tone, etc.]

Music: [Add this if there is to be music playing through the video, and state the genre, instrumentation, tempo, etc.]
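As a rough sketch, the Scene / Action / Music layout above can be assembled with a small helper so the sections stay consistent between runs. The function name `build_prompt` and the blank-line separator are assumptions for illustration, not anything LTX requires.

```python
def build_prompt(scene: str, action: str, music: str = "") -> str:
    """Join the labelled sections into one prompt; Music is left out when empty."""
    parts = [f"Scene: {scene.strip()}", f"Action: {action.strip()}"]
    if music.strip():
        parts.append(f"Music: {music.strip()}")
    # Assumption: sections are separated by a blank line.
    return "\n\n".join(parts)


prompt = build_prompt(
    scene="A woman in her 30s with short dark hair and a red coat walks down a rainy street at night.",
    action="The camera keeps the character centered in the frame while she walks forward; "
           "the sound of high heels on a hard surface is heard echoing down the street.",
)
print(prompt)
```

Leaving `music` empty drops the Music section entirely, which matches the advice to add it only when music should play.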

Good luck with your LoRA character.

rocky533

Apr 25

Try these sigmas for your base pass and use the euler_ancestral_cfg sampler:
8 steps - 1, 0.82, 0.76, 0.38, 0.32, 0.26, 0.22, 0.18

I noticed LTX cooks a video in about 2-3 steps. Once it's cooked, any further high-denoise steps reduce motion and stop LoRAs from affecting the output as strongly as they should.
These sigmas run the first 3 steps at high denoise, then drop way down to just fix up what is there. This tends to give more motion and makes LoRAs work slightly better.
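To make the shape of that base schedule easier to see, here is a small sketch that counts the high-noise values and finds where the big drop happens. The 0.5 threshold for "high denoise" and the helper names are assumptions for illustration only; the list is copied as quoted, without adding a trailing 0.

```python
BASE_SIGMAS = [1.0, 0.82, 0.76, 0.38, 0.32, 0.26, 0.22, 0.18]


def high_noise_count(sigmas, threshold=0.5):
    """Count the sigmas at or above the (assumed) high-denoise threshold."""
    return sum(1 for s in sigmas if s >= threshold)


def largest_drop_index(sigmas):
    """Index i of the biggest fall between sigmas[i] and sigmas[i + 1]."""
    drops = [a - b for a, b in zip(sigmas, sigmas[1:])]
    return max(range(len(drops)), key=drops.__getitem__)


def to_sigma_string(sigmas):
    """Format as a comma-separated string for pasting into a sigma input."""
    return ", ".join(f"{s:g}" for s in sigmas)


i = largest_drop_index(BASE_SIGMAS)
print(high_noise_count(BASE_SIGMAS), "high-noise sigmas")
print("largest drop:", BASE_SIGMAS[i], "->", BASE_SIGMAS[i + 1])
print(to_sigma_string(BASE_SIGMAS))
```

With this list, three values sit above the threshold and the largest fall is from 0.76 to 0.38, which is the "drop way down" after the first three steps.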

Upscales can be done with either:
2 steps - 0.77, 0.44, 0
1 step - 0.65, 0
