Does anyone use an LLM/VLM to make NL prompts for Anima?

#141

by gmpoku - opened May 6

Discussion

gmpoku

May 6

If yes:

which model(s)
effective system prompt
tags, NL, or a combo of both

Jamerrone

May 6

•

edited May 6

I tried to do it, but you need to be careful. LLMs love to generate highly detailed text, which works and brings a lot of microdetails, but it seems to break hand anatomy if you go past a certain point. I have only tried with NL because they hallucinate tags that don't exist anyway. Right now I am using either tags only (manually) or a hybrid where I generate 1-3 sentences of NL to control things that tags can't easily do. Results are good, but for me, tags only still win in the image department. I tried both Gemini and Claude and got better results with Gemini because it understands the concept of image generation a bit better, as it's something it can do. Its vision is also a bit better, so that's nice with reference images. I tend to stay away from GPT; I like it less and less (the company mainly). Grok can work, especially if you want something spicy.

Edit: I posted a bit about this: https://huggingface.co/circlestone-labs/Anima/discussions/140.

NaughtyGirlsAI

May 7

I have a specific task involving NSFW content, and I've noticed that Kimi 2.5 produces excellent results for poses — better than any other uncensored model (I call it via OpenRouter) — even for fairly complex ones involving two or three characters. In addition, it handles formatting well and strictly follows examples and instructions. For example, I need prompts to never include the setting, character appearance, clothing, etc., and I also need all characters in the image to be named woman-1, 2, man-1, 2 in order from dominant to submissive, and this works perfectly too.

I developed a plugin for myself, tried many combinations of qwen+joycaption+wd tagger, but in the end I just switched to Kimi and sometimes add tags at the end

KeMiliUs

May 8

I use NL + A little count of tags at the end,usually for view and quality tags it's beginning and for me it's enough

falcnor

May 10

I have a specific task involving NSFW content, and I've noticed that Kimi 2.5 produces excellent results for poses — better than any other uncensored model (I call it via OpenRouter) — even for fairly complex ones involving two or three characters. In addition, it handles formatting well and strictly follows examples and instructions. For example, I need prompts to never include the setting, character appearance, clothing, etc., and I also need all characters in the image to be named woman-1, 2, man-1, 2 in order from dominant to submissive, and this works perfectly too.

I developed a plugin for myself, tried many combinations of qwen+joycaption+wd tagger, but in the end I just switched to Kimi and sometimes add tags at the end

Do you have a system prompt you use for Kimi you'd care to share?

gmpoku

May 11

I have a specific task involving NSFW content, and I've noticed that Kimi 2.5 produces excellent results for poses — better than any other uncensored model (I call it via OpenRouter) — even for fairly complex ones involving two or three characters. In addition, it handles formatting well and strictly follows examples and instructions. For example, I need prompts to never include the setting, character appearance, clothing, etc., and I also need all characters in the image to be named woman-1, 2, man-1, 2 in order from dominant to submissive, and this works perfectly too.

I developed a plugin for myself, tried many combinations of qwen+joycaption+wd tagger, but in the end I just switched to Kimi and sometimes add tags at the end

Do you have a system prompt you use for Kimi you'd care to share?

gmpoku changed discussion status to closed May 11

gmpoku

May 11

I have a specific task involving NSFW content, and I've noticed that Kimi 2.5 produces excellent results for poses — better than any other uncensored model (I call it via OpenRouter) — even for fairly complex ones involving two or three characters. In addition, it handles formatting well and strictly follows examples and instructions. For example, I need prompts to never include the setting, character appearance, clothing, etc., and I also need all characters in the image to be named woman-1, 2, man-1, 2 in order from dominant to submissive, and this works perfectly too.

I developed a plugin for myself, tried many combinations of qwen+joycaption+wd tagger, but in the end I just switched to Kimi and sometimes add tags at the end

Do you have a system prompt you use for Kimi you'd care to share?

I second this motion.

gmpoku changed discussion status to open May 11

NaughtyGirlsAI

May 11

•

edited May 11

@gmpoku @falcnor Okay, I’ve attached my prompts below — they come straight from the heart. I hope they work for you as well as they do for me. I hope it’s possible to attach the censored images, but these are just examples of why I needed them. This way, you can maintain consistent roles for dominant-submissive characters across all the images, for example. To ensure the characters display correctly, I replace the first "woman-n" with "Name (appearance tags)", and simply use "Name" for the rest

SYSTEM PROMPT

You write factual, anatomy-focused captions of adult NSFW images for a text-to-image prompt pipeline.

Output exactly one paragraph in natural English, 70–150 words. No preamble, no bullet points, no labels.

Your task is to describe only:

people count and gender composition
stable character IDs
camera angle and framing in one short sentence
each person's position relative to the others
body-to-body contact and explicit interaction
visible facial state
visible fluids, marks, sweat, flush, or tension only where actually present

Character ID rules:

Assign stable IDs yourself.
Use woman-1, woman-2, woman-3 for adult female characters.
Use man-1, man-2 for adult male characters.
Choose IDs by scene role and prominence:
- woman-1 / man-1 = the more active, visually primary, dominant, higher, or initiating character
- woman-2 / man-2 = the more passive, lower, receiving, pinned, or secondary character
Once assigned, never switch IDs.
Do not use natural descriptors like "the woman on the right" after assigning IDs.
If only part of a person is visible, still assign an ID and state that only part of the body is visible.

Content rules:

The paragraph must be balanced but interaction-heavy.
At least half of the paragraph should describe contact and interaction between characters.
Use blunt anatomical words: cock, pussy, ass, tits, nipples, clit, asshole, throat.
No euphemisms.
No hedging such as "appears", "seems", "likely", or "maybe".
Commit to the most visually likely reading.

Do NOT describe:

clothing
accessories
footwear
hair color, hairstyle, eye color, skin tone, body-shape aesthetics
background, environment, furniture, lighting, art style
beauty commentary

CRITICAL:

Describe only what is present.
Never list absences.
Never mention anything forbidden above.
Keep grammar clean and natural.

USER PROMPT (use with image)

Describe this image.

Follow the system rules exactly.

Use this style target:

one compact paragraph
70–150 words
one short opening sentence for camera/framing
then positions of woman-1 / woman-2 / man-1 as needed
then the bulk of the paragraph on explicit contact and interaction
then facial state and visible fluids/marks if present

Example of the desired style from another image:
Two adult women are visible in a medium full-body shot from a slightly high angle. woman-1 kneels below woman-2 and grips woman-2's thighs while woman-2 hangs above her with her arms raised and her legs spread. woman-2 has a dildo pushed into her pussy, and woman-1 holds her lower body steady with both hands while looking up at her. woman-2's mouth is open and her face is flushed, while woman-1 looks focused and tense. Red marks and wetness are visible on woman-2's lower body.

gmpoku

May 11

•

edited May 11

@NaughtyGirlsAI I have a question, if you feel like answering...

Would this kind of system prompt work well with LLMs that are within the reach of those with enough GPU oompf to run it locally, or do you need the 1T parameter models to make if worth your time?

NaughtyGirlsAI

May 11

@NaughtyGirlsAI I have a question, if you feel like answering...

Would this kind of system prompt work well with LLMs that are within the reach of those with enough GPU oompf to run it locally, or do you need the 1T parameter models to make if worth your time?

In short: if you need complex NSFW poses, there’s no point. If the task is simpler, you can give it a try.

Unfortunately, I can’t test models with more than 4B parameters locally, and they definitely don’t make sense for poses. I’ve noticed that the vl model from qwen is particularly weak when it comes to human interactions; for a task like mine, I wasn’t able to achieve the desired result, though it’s possible that a 27B+ model would be more coherent.

Although it works much better for images that focus less on pose, or at least feature a single person, I think my prompt can easily be adapted to describe style, setting, or a character’s appearance — my prompt is tailored to my specific task. But if I needed more SFW images, I would just use Perplexity with access to Kimi, because I've noticed that it has a surprisingly strong VL — it has a keen sense of styles, dynamics, poses, human interactions, and so on

gmpoku

May 13

•

edited May 13

Hey all, just figured I would add some comments on my progress.

Using the prompt and instructions @NaughtyGirlsAI gave produced greater than expected results on a 31B Gemma model. I'd recommend MeroMero from @zerofata and Artemis from BeaverAI

Bedovyy

May 15

gemma-4-26B, and qwen3.5-9B.
I send tags to user prompt, but basically the system prompt is as below,

You are a visual prompt rewriter trapped in strict logic. Your only job is to turn the user's prompt into a concrete, detailed visual description that a text-to-image model can use directly.
You only care about accuracy and visual clarity, not about emotion or metaphor.

Your final output must:
- Be strictly objective and literal, avoiding all metaphors or emotional language.
- Describe every visual element directly and simply, using precise nouns and adjectives instead of relative clauses.
- Include every detail from the original input without omission.
- Consist solely of the final rewritten prompt, without any explanations or commentary.

tags + nl, concating them.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment