Diffusion Single File
comfyui

Does anyone use an LLM/VLM to make NL prompts for Anima?

#141
by gmpoku - opened

If yes:

  • which model(s)
  • effective system prompt
  • tags, NL, or a combo of both

I tried to do it, but you need to be careful. LLMs love to generate highly detailed text, which works and brings a lot of microdetails, but it seems to break hand anatomy if you go past a certain point. I have only tried with NL because they hallucinate tags that don't exist anyway. Right now I am using either tags only (manually) or a hybrid where I generate 1-3 sentences of NL to control things that tags can't easily do. Results are good, but for me, tags only still win in the image department. I tried both Gemini and Claude and got better results with Gemini because it understands the concept of image generation a bit better, as it's something it can do. Its vision is also a bit better, so that's nice with reference images. I tend to stay away from GPT; I like it less and less (the company mainly). Grok can work, especially if you want something spicy.

Edit: I posted a bit about this: https://huggingface.co/circlestone-labs/Anima/discussions/140.

I have a specific task involving NSFW content, and I've noticed that Kimi 2.5 produces excellent results for poses — better than any other uncensored model (I call it via OpenRouter) — even for fairly complex ones involving two or three characters. In addition, it handles formatting well and strictly follows examples and instructions. For example, I need prompts to never include the setting, character appearance, clothing, etc., and I also need all characters in the image to be named woman-1, 2, man-1, 2 in order from dominant to submissive, and this works perfectly too.

I developed a plugin for myself, tried many combinations of qwen+joycaption+wd tagger, but in the end I just switched to Kimi and sometimes add tags at the end

I use NL + A little count of tags at the end,usually for view and quality tags it's beginning and for me it's enough

I have a specific task involving NSFW content, and I've noticed that Kimi 2.5 produces excellent results for poses — better than any other uncensored model (I call it via OpenRouter) — even for fairly complex ones involving two or three characters. In addition, it handles formatting well and strictly follows examples and instructions. For example, I need prompts to never include the setting, character appearance, clothing, etc., and I also need all characters in the image to be named woman-1, 2, man-1, 2 in order from dominant to submissive, and this works perfectly too.

I developed a plugin for myself, tried many combinations of qwen+joycaption+wd tagger, but in the end I just switched to Kimi and sometimes add tags at the end

Do you have a system prompt you use for Kimi you'd care to share?

I have a specific task involving NSFW content, and I've noticed that Kimi 2.5 produces excellent results for poses — better than any other uncensored model (I call it via OpenRouter) — even for fairly complex ones involving two or three characters. In addition, it handles formatting well and strictly follows examples and instructions. For example, I need prompts to never include the setting, character appearance, clothing, etc., and I also need all characters in the image to be named woman-1, 2, man-1, 2 in order from dominant to submissive, and this works perfectly too.

I developed a plugin for myself, tried many combinations of qwen+joycaption+wd tagger, but in the end I just switched to Kimi and sometimes add tags at the end

Do you have a system prompt you use for Kimi you'd care to share?

gmpoku changed discussion status to closed

I have a specific task involving NSFW content, and I've noticed that Kimi 2.5 produces excellent results for poses — better than any other uncensored model (I call it via OpenRouter) — even for fairly complex ones involving two or three characters. In addition, it handles formatting well and strictly follows examples and instructions. For example, I need prompts to never include the setting, character appearance, clothing, etc., and I also need all characters in the image to be named woman-1, 2, man-1, 2 in order from dominant to submissive, and this works perfectly too.

I developed a plugin for myself, tried many combinations of qwen+joycaption+wd tagger, but in the end I just switched to Kimi and sometimes add tags at the end

Do you have a system prompt you use for Kimi you'd care to share?

I second this motion.

gmpoku changed discussion status to open

@gmpoku @falcnor Okay, I’ve attached my prompts below — they come straight from the heart. I hope they work for you as well as they do for me. I hope it’s possible to attach the censored images, but these are just examples of why I needed them. This way, you can maintain consistent roles for dominant-submissive characters across all the images, for example. To ensure the characters display correctly, I replace the first "woman-n" with "Name (appearance tags)", and simply use "Name" for the rest

image-1
image-85
image-97
image-109

SYSTEM PROMPT

You write factual, anatomy-focused captions of adult NSFW images for a text-to-image prompt pipeline.

Output exactly one paragraph in natural English, 70–150 words. No preamble, no bullet points, no labels.

Your task is to describe only:

  • people count and gender composition
  • stable character IDs
  • camera angle and framing in one short sentence
  • each person's position relative to the others
  • body-to-body contact and explicit interaction
  • visible facial state
  • visible fluids, marks, sweat, flush, or tension only where actually present

Character ID rules:

  • Assign stable IDs yourself.
  • Use woman-1, woman-2, woman-3 for adult female characters.
  • Use man-1, man-2 for adult male characters.
  • Choose IDs by scene role and prominence:
    • woman-1 / man-1 = the more active, visually primary, dominant, higher, or initiating character
    • woman-2 / man-2 = the more passive, lower, receiving, pinned, or secondary character
  • Once assigned, never switch IDs.
  • Do not use natural descriptors like "the woman on the right" after assigning IDs.
  • If only part of a person is visible, still assign an ID and state that only part of the body is visible.

Content rules:

  • The paragraph must be balanced but interaction-heavy.
  • At least half of the paragraph should describe contact and interaction between characters.
  • Use blunt anatomical words: cock, pussy, ass, tits, nipples, clit, asshole, throat.
  • No euphemisms.
  • No hedging such as "appears", "seems", "likely", or "maybe".
  • Commit to the most visually likely reading.

Do NOT describe:

  • clothing
  • accessories
  • footwear
  • hair color, hairstyle, eye color, skin tone, body-shape aesthetics
  • background, environment, furniture, lighting, art style
  • beauty commentary

CRITICAL:

  • Describe only what is present.
  • Never list absences.
  • Never mention anything forbidden above.
  • Keep grammar clean and natural.

USER PROMPT (use with image)

Describe this image.

Follow the system rules exactly.

Use this style target:

  • one compact paragraph
  • 70–150 words
  • one short opening sentence for camera/framing
  • then positions of woman-1 / woman-2 / man-1 as needed
  • then the bulk of the paragraph on explicit contact and interaction
  • then facial state and visible fluids/marks if present

Example of the desired style from another image:
Two adult women are visible in a medium full-body shot from a slightly high angle. woman-1 kneels below woman-2 and grips woman-2's thighs while woman-2 hangs above her with her arms raised and her legs spread. woman-2 has a dildo pushed into her pussy, and woman-1 holds her lower body steady with both hands while looking up at her. woman-2's mouth is open and her face is flushed, while woman-1 looks focused and tense. Red marks and wetness are visible on woman-2's lower body.

@NaughtyGirlsAI I have a question, if you feel like answering...

Would this kind of system prompt work well with LLMs that are within the reach of those with enough GPU oompf to run it locally, or do you need the 1T parameter models to make if worth your time?

@NaughtyGirlsAI I have a question, if you feel like answering...

Would this kind of system prompt work well with LLMs that are within the reach of those with enough GPU oompf to run it locally, or do you need the 1T parameter models to make if worth your time?

In short: if you need complex NSFW poses, there’s no point. If the task is simpler, you can give it a try.

Unfortunately, I can’t test models with more than 4B parameters locally, and they definitely don’t make sense for poses. I’ve noticed that the vl model from qwen is particularly weak when it comes to human interactions; for a task like mine, I wasn’t able to achieve the desired result, though it’s possible that a 27B+ model would be more coherent.

Although it works much better for images that focus less on pose, or at least feature a single person, I think my prompt can easily be adapted to describe style, setting, or a character’s appearance — my prompt is tailored to my specific task. But if I needed more SFW images, I would just use Perplexity with access to Kimi, because I've noticed that it has a surprisingly strong VL — it has a keen sense of styles, dynamics, poses, human interactions, and so on

Sign up or log in to comment