Improvement suggestion: Better multi-character LoRA support (reduced identity bleed / cross-contamination)

#202

by aanderson78 - opened 6 days ago

Anima Base v1 is excellent for single-character and style consistency, but combining multiple character LoRAs in one scene is currently quite hit or miss due to strong identity bleed, feature blending, or one LoRA overpowering the other.

What I've tried:

Natural language structured prompts with clear positional descriptions ("On the left: Character A, blue hair... On the right: Character B...")
Reduced LoRA weights (0.55–0.8)
Subject count tags (2girls/2boys, 1girl 1boy, etc.)
Various CFG / steps / samplers

It works reasonably for side-by-side non-interacting characters but breaks down quickly with physical contact/interactions.

For future versions (v1.1 / finetunes / Anima-Turbo, etc.), any architectural or training improvements that reduce cross-contamination between multiple character LoRAs while keeping the strong single-subject coherence would be extremely valuable!

goyishsoyish

5 days ago

No single model solves this; it's a flaw of the LoRA technique itself. Solving it would require actual research, which is beyond the scope of a finetune.

Sen-sou

5 days ago

edited 5 days ago

Maybe try implementing https://github.com/yaoliliu/FreeFuse support for Anima. I think this is the solution you are asking for.
Or maybe this: https://blog.comfy.org/p/masking-and-scheduling-lora-and-model-weights

sjmind

5 days ago

If there are only a few characters in your training data, and you happen to have several standing poses of each from different angles,
you can try compositing the characters together on a white background (segment each character out first).

If you have characters A, B, C, and D, you need at least A+B, C+D, and A+B+C (remember to add 2girls/3girls or similar tags); the more combinations you can add, the better.
If you also happen to have images in which some of A/B/C/D interact with each other, that's even better.

No NL captions in the training data; booru-style tags are enough.
Each character's solo images should be in their own subset (different costumes count as different characters), and each combination pattern (like 2girls/3girls) should be its own subset.
Raise the repeat numbers so that (repeat num * number of solo images for a single character) is similar to (repeat num * number of images in a combination subset).
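
For anyone who wants to compute the repeats instead of eyeballing them, here is a minimal sketch of the balancing rule above. It assumes your trainer lets you set a repeat count per subset; the subset names and image counts are made up for illustration, and `balanced_repeats` is a hypothetical helper, not part of any trainer.

```python
def balanced_repeats(subset_sizes, base_repeats=1):
    """Pick a repeat count per subset so that repeats * images is roughly equal.

    The largest subset gets `base_repeats`; smaller subsets get more repeats.
    """
    if any(n <= 0 for n in subset_sizes.values()):
        raise ValueError("every subset needs at least one image")
    # Effective size we want every subset to reach.
    target = max(subset_sizes.values()) * base_repeats
    return {name: max(1, round(target / n)) for name, n in subset_sizes.items()}


# Hypothetical subsets: two solo subsets and one combination subset.
subsets = {"A_solo": 40, "B_solo": 20, "A+B_2girls": 8}
repeats = balanced_repeats(subsets)

for name, n in subsets.items():
    # Print: subset name, repeat count, effective images per epoch.
    print(name, repeats[name], repeats[name] * n)
```

With these made-up counts every subset ends up contributing 40 effective images per epoch, so the small combination subset is not drowned out by the solo images.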

If there are a lot of characters (and costumes), the number of combination patterns grows very quickly, so don't try to cover every pattern. The most I have tried is 9 characters with 27 costumes.
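
To give a rough feel for how fast the patterns pile up, here is a small sketch that counts the possible groupings. It only counts unordered groups of 2 and 3 distinct characters and ignores costumes and poses, so treat the numbers as a lower bound on the real pattern count; `group_pattern_count` is a hypothetical helper name.

```python
from math import comb


def group_pattern_count(num_characters, max_group=3):
    """Count unordered groups of 2..max_group distinct characters."""
    return {k: comb(num_characters, k) for k in range(2, max_group + 1)}


# 4 characters: 6 possible pairs and 4 possible trios.
print(group_pattern_count(4))
# 9 characters: 36 possible pairs and 84 possible trios.
print(group_pattern_count(9))
```

That is why covering only a handful of representative pairs and trios is the practical choice once the cast grows.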
