Is there a consistent way to prompt multiple characters together?

#93

Mar 21

I've been getting very hit-or-miss results when prompting multiple characters together with Preview2. Some combinations work out of the box, some require a bit of tinkering but work great afterward, while others don't work at all no matter what I do.

A crossover example that works great with only character names, even with a particular art style: "2girls, Nicole Demara from Zenless Zone Zero, Aru from Blue Archive, @onono imoko"

A crossover example that didn't work with character names: "2girls, Eishin Flash from Umamusume, Prinz Eugen from Azur Lane"

But I managed to fix this one with descriptions: "2girls, Eishin Flash from Umamusume with black hair and blue eyes, Prinz Eugen from Azur Lane with orange eyes and white hair with a red streak"

The occasional horse girl features on Prinz Eugen don't bother me here, since Illustrious models do that too. What matters is that the main character features are overall consistent with this prompt, even across many seeds.

However, the next crossover combination didn't work at all no matter what I tried: "2girls, Sirius Symboli from Umamusume with brown hair and red eyes, Makima from Chainsaw Man with yellow eyes and red hair"

Individually, each of these characters looks accurate when prompted with just a regular 1girl prompt (with or without the hair/eyes descriptions).

But together, they just refuse to work! I've tried many seeds, different samplers, using layer replay, but nothing helped! I've even tested this on different finetunes (AnimeYume 2.5 and CatTower V3) and the behavior is the same.

At first, I thought this would only happen across different franchises. But then, even this pairing didn't work: "2girls, Hishi Miracle from Umamusume with brown eyes and grey hair, Curren Chan from Umamusume with pink eyes and grey hair,"

Individually, each of them works perfectly fine:

I assume this is partially dictated by character popularity, like overtraining on the characters with the most images in the dataset. Because a more "difficult" combination like this works out of the box: "3girls, Mejiro McQueen from Umamusume, Special Week from Umamusume, Tokai Teio from Umamusume"

This was literally the first image I generated for this prompt and it's already highly accurate. This shows that Anima is actually MORE consistent than Illustrious with multiple characters, but only when the combination of characters "clicks". When the model decides that a combination doesn't work, it's impossible to get even remotely close to the desired output (unless there's a hack I don't know about, hence the thread).

The main reason I find this weird is that Illustrious is generally far less accurate for multiple characters, but if I keep re-rolling seeds, then it can eventually land on an image that's pretty close. Meanwhile, in Anima, it feels "all or nothing". Either the pairing works very consistently, or it doesn't work AT ALL and doesn't ever get close.

Bxmr1

Mar 21

edited Mar 21

i tested it out and it seems like the keyword "umamusume" is so strong that it bleeds into everything in the image. try it this way instead:

"2girls, on the left is sirius symboli from (umamusume:0.5), with brown hair and red eyes, on the right is makima (chainsaw man), with yellow eyes and red hair"

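When testing several weights, the weighting trick above can be scripted. This is a minimal sketch, assuming the frontend parses the `(text:weight)` emphasis syntax used in the prompt above. `weighted` and `two_character_prompt` are hypothetical helper names, not part of any tool.

```python
def weighted(text: str, weight: float) -> str:
    """Wrap text in the (text:weight) emphasis syntax; a weight of 1.0 is left bare."""
    if weight == 1.0:
        return text
    return f"({text}:{weight:g})"


def two_character_prompt(left: str, right: str) -> str:
    """Place one character description on each side of the image."""
    return f"2girls, on the left is {left}, on the right is {right}"


# Only the franchise keyword is down-weighted, not the character name.
left = f"sirius symboli from {weighted('umamusume', 0.5)}, with brown hair and red eyes"
right = "makima (chainsaw man), with yellow eyes and red hair"
prompt = two_character_prompt(left, right)
print(prompt)
```

Changing the `0.5` in one place makes it easy to compare how much of the franchise keyword is needed before the character itself starts to drift.
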
InvictusCreations

Mar 21

Have you ever heard of garbage in, garbage out? Your prompts are waaaaaay too short and not descriptive enough. We're not working with CLIP here, make use of the expensive LLM TE.

You can even mix characters with the same name from different series without many issues, just need to be descriptive. (In this case Prinz Eugen from Azur Lane and Prinz Eugen from Kancolle)

Anyway here is makima and your sirius, quick and dirty (can be improved a lot still):

and the prompt was:

masterpiece, best quality,
official art,
the image is segmented into left and right,
the image depicts two girls both wearing lace underwear.
Left side of the image 1girl, makima_(chainsaw_man), long red hair, single braid, yellow eyes.
Right side of the image 1girl, Sirius_Symboli_(Umamusume), brown hair, red eyes.
They are pressing their large breasts together in the middle of the frame, both are looking at viewer. Face focus, high angle,
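
The layout of the prompt above (quality tags, a scene sentence, one line per side, then a shared closing line) can be templated so that each character's traits stay on its own line. A rough sketch only: `Character`, `side_line` and `segmented_prompt` are hypothetical names, and the scene and closing lines below are placeholders rather than the exact wording used above.

```python
from dataclasses import dataclass, field


@dataclass
class Character:
    tag: str                                         # e.g. "makima_(chainsaw_man)"
    traits: list[str] = field(default_factory=list)  # appearance tags for this character only


def side_line(side: str, character: Character) -> str:
    """One line per character, prefixed with the side of the frame it belongs to."""
    parts = ["1girl", character.tag, *character.traits]
    return f"{side} side of the image " + ", ".join(parts) + "."


def segmented_prompt(quality: list[str], scene: str,
                     left: Character, right: Character, shared: str) -> str:
    """Assemble quality tags, scene sentence, per-side lines and a shared closing line."""
    lines = [
        ", ".join(quality) + ",",
        "the image is segmented into left and right,",
        scene,
        side_line("Left", left),
        side_line("Right", right),
        shared,
    ]
    return "\n".join(lines)


prompt = segmented_prompt(
    quality=["masterpiece", "best quality", "official art"],
    scene="the image depicts two girls standing side by side.",  # placeholder scene
    left=Character("makima_(chainsaw_man)", ["long red hair", "single braid", "yellow eyes"]),
    right=Character("Sirius_Symboli_(Umamusume)", ["brown hair", "red eyes"]),
    shared="Both are looking at viewer. Face focus, high angle,",
)
print(prompt)
```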

synta

Mar 21

edited Mar 21

off topic but does a line break have a function or is it just autism (in a good way)

synta

Mar 21

edited Mar 21

Also, in my experience, removing the underscores is quite important in Anima. It significantly improves character recognition and artist styles.
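
For anyone pasting booru-style tags, the underscore removal can be automated. A small sketch, assuming a comma-separated tag prompt; `strip_underscores` is a hypothetical helper, and the `KEEP_AS_IS` set is only an illustrative allowlist for emoticon-style tags where the underscore is part of the tag itself.

```python
KEEP_AS_IS = {"^_^", "o_o"}  # illustrative: tags whose "_" should not become a space


def strip_underscores(prompt: str) -> str:
    """Replace underscores with spaces in every comma-separated tag, except allowlisted ones."""
    tags = [t.strip() for t in prompt.split(",")]
    cleaned = [t if t in KEEP_AS_IS else t.replace("_", " ") for t in tags]
    return ", ".join(t for t in cleaned if t)  # drop empty entries from trailing commas


print(strip_underscores("1girl, makima_(chainsaw_man), single_braid, ^_^"))
```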

InvictusCreations

Mar 21

I didn't observe line breaks doing anything significant, I just use them to keep my prompts structured for faster iteration. So yes, it's mah 'tism.
I get pretty much the same results whether I put underscores or not; I do that on a whim. (The Prinz Eugen image had underscores for one of the girls but not the other.)
Here's the same image but with underscores removed:

CulturedDiffusion

Mar 21

edited Mar 21

Thanks for the recommendations. Decreasing the weight of the Umamusume tag did get it closer, but it also made Sirius less accurate, so looks like there's a trade-off.

What I discovered is that, for some reason, writing Makima as "makima_(chainsaw_man)" with the underscores actually helped a lot. Just replacing the part of my prompt with "2girls, Sirius_Symboli_(Umamusume) with brown hair and red eyes, makima_(chainsaw_man) with yellow eyes and red hair" suddenly started creating fairly accurate images like this:

For Sirius, writing either with underscore or without didn't seem to change things much. But for Makima, there was a very noticeable effect in this 2girls prompting format.

Anyway, I tried the proposed natural language format of splitting the 2girls prompt into two separate "1girl" prompts. There, writing Makima with or without underscore seems to work fine.

For a moment, I thought this format would be able to solve all my consistency issues. But it still fails the test of drawing Curren Chan and Hishi Miracle together.

I tried something like this:

masterpiece, best quality, highres, absurdres, very aesthetic, anime screenshot, official art.
The image depicts two girls both wearing black bikini at the beach.
Left side of the image 1girl, Curren Chan, Umamusume, pink eyes, grey hair, short hair, seductive smile.
Right side of the image 1girl, Hishi Miracle, Umamusume, grey hair, medium hair, wavy hair, brown eyes, blush, smile.
They are pressing their large breasts together in the middle of the frame, both are looking at viewer. Face focus, high angle.

However, this still practically produces two Currens like this:

Just as a sanity check, if I remove Curren from the prompt like this:

masterpiece, best quality, highres, absurdres, very aesthetic, anime screenshot, official art.
The image depicts two girls both wearing black bikini at the beach.
Left side of the image 1girl, Umamusume, pink eyes, grey hair, short hair, seductive smile.
Right side of the image 1girl, Hishi Miracle, Umamusume, grey hair, medium hair, wavy hair, brown eyes, blush, smile.
They are pressing their large breasts together in the middle of the frame, both are looking at viewer. Face focus, high angle.

Then it does produce two Hishi Miracles as I'd expect:

I tried playing with increasing Miracle's strength, lowering Curren's strength, switching to underscore, etc. But so far, none of these approaches have been working.
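
The variations listed above (name with or without underscores, higher or lower strength) can be enumerated systematically instead of by hand. This is only a sketch of that bookkeeping: the helper names are hypothetical, the weights are arbitrary examples, and the `(text:weight)` syntax is assumed to be supported by the frontend in use. Keeping the seed fixed across variants is one way to isolate the effect of the prompt change.

```python
from itertools import product


def name_forms(name: str) -> list[str]:
    """The spaced form and the underscore form of a character name."""
    return [name, name.replace(" ", "_")]


def weighted(text: str, weight: float) -> str:
    """Wrap text in (text:weight); a weight of 1.0 is left bare."""
    return text if weight == 1.0 else f"({text}:{weight:g})"


def character_variants(name: str, weights: list[float]) -> list[str]:
    """Every combination of name form and weight for one character."""
    return [weighted(form, w) for form, w in product(name_forms(name), weights)]


curren = character_variants("Curren Chan", [0.8, 1.0])     # try lowering the dominant character
miracle = character_variants("Hishi Miracle", [1.0, 1.2])  # try raising the weaker one

# Each pair is one prompt variant to render.
for left, right in product(curren, miracle):
    print(f"Left: {left}, Umamusume | Right: {right}, Umamusume")
```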

Should be noted that replacing Hishi Miracle with a more popular character like McQueen easily works:

Funnily, McQueen and Miracle together almost work, though you can still see how it tries to duplicate McQueen in terms of facial structure:

But hey, at least the hair is fairly accurate!

InvictusCreations

Mar 21

I took a quick look at gelbooru for Umamusume (specifically some of the characters you used) and it's a total crapshoot: tagged characters often don't appear fully rendered in an image. Like this:

This image is tagged with Hishi Miracle (and dantsu flame).
This doesn't seem very rare, especially in the case of Umamusume, which will cause some issues with the reproduction of the character in a 2-girl setting.

Bxmr1

Mar 22

characters are often captioned in training in the same format as their booru tag, but without the underscores. there are exceptions: some characters are so popular or have such a unique name that anything works for them, for example "tsunade". so if you want the best results you should prompt like this:
"1girl, Curren Chan \(umamusume\)"
instead of
"1girl, Curren Chan, Umamusume"
or
"1girl, Curren Chan from Umamusume"

try it this way:

"masterpiece, best quality, highres, absurdres, very aesthetic, anime screenshot, official art.
The image depicts two girls.
Left side of the image is Curren Chan (umamusume), pink eyes, grey hair, short hair, seductive smile.
Right side of the image is hishi miracle (umamusume), grey hair, medium hair, wavy hair, brown eyes, blush, smile."

it helps to put "duplicate", "twins", etc. in the negative prompt as well
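
A short sketch of the two suggestions above: converting a booru tag into the caption-like form, and keeping a negative prompt alongside it. `booru_to_caption` is a hypothetical helper. Whether the parentheses need a backslash escape depends on whether the frontend treats bare parentheses as emphasis syntax, which is an assumption here, so it is left as an option.

```python
def booru_to_caption(tag: str, escape_parens: bool = False) -> str:
    """Turn 'curren_chan_(umamusume)' into 'curren chan (umamusume)'.

    With escape_parens=True the parentheses are backslash-escaped, for frontends
    that would otherwise read them as emphasis.
    """
    text = tag.replace("_", " ")
    if escape_parens:
        text = text.replace("(", r"\(").replace(")", r"\)")
    return text


positive = ", ".join([
    "2girls",
    booru_to_caption("curren_chan_(umamusume)"),
    booru_to_caption("hishi_miracle_(umamusume)"),
])
negative = ", ".join(["duplicate", "twins"])  # discourage the model from cloning one character

print(positive)
print(negative)
```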

synta

Mar 22

Yes, try to avoid underscores. A close-up 1girl titty shot is not the benchmark you want for comparing performance. Try a complex setting and slightly lesser-known characters and you'll see the difference.
