Captioning strategy for Anima, and captioning tools
Hi, I currently want to try training Anima, and I have a few questions:
Since Anima supports both tags and natural language, what is the best strategy here?
Should I include both tags and natural language together, describing the same image? Would that cause redundancy or conflicts? And if I train using Danbooru tags but then generate images using natural language (or vice versa), what would happen?
Do I need to include quality tags? The model was trained with random tag dropout, so it should be OK to omit them, right? (e.g. [quality/meta/year/safety tags] [1girl/1boy/1other, etc.] [character] [series] [artist] [general tags])
If I want to train a custom tag that already exists on Danbooru, should I add something like @artist?
If I want to use natural language, which model/tool should I use for captioning?
I have always used natural language captions. For a style LoRA I trained, it didn't seem to matter whether I used natural language or tags for generation. But when training a more complex concept LoRA, I noticed I get much better results if the generation prompt is in natural language as well. For captioning I used Qwen3.5-35B-A3B.
I only trained a couple of LoRAs on Anima though, so I am not sure my experience will match yours. The good thing is that the model learns so fast that trying a couple of times to get things right doesn't hurt as much.
I mostly prompt with Danbooru tags and caption non-Danbooru images with output from autotaggers that I review and add to. Mixing in (manual) natural language mildly improved some LoRAs.
You are not going to hit any big issues as long as you avoid outright broken captions, e.g. captioning a white background as black (autotaggers make mistakes). In general, caption the way the model will be prompted. Danbooru tags, natural language, and mixes of the two all work. I haven't tested it yet, but I expect that duplicating the images and captioning each duplicate in a different style will help; if not, then at the very least natural language is more descriptive, and models work better the more you describe to them.
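If you want to try the duplication idea, it can be done as a small preprocessing step before training. A minimal sketch, assuming each image has sibling `.tags` and `.nl` caption files and that your trainer expects a `.txt` caption next to each image (the layout and suffixes are just my assumptions, adapt to your setup):

```python
from pathlib import Path
import shutil

def duplicate_with_caption_styles(dataset_dir: str, out_dir: str) -> int:
    """For each image, emit one copy paired with its tag caption and one
    paired with its natural-language caption, so the model sees both styles."""
    src, dst = Path(dataset_dir), Path(out_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for img in sorted(src.glob("*.png")):
        for style in ("tags", "nl"):
            cap = img.with_suffix(f".{style}")
            if not cap.exists():  # skip images missing that caption style
                continue
            stem = f"{img.stem}_{style}"
            shutil.copy(img, dst / f"{stem}.png")
            # most trainers pick up a .txt caption with the same stem
            shutil.copy(cap, dst / f"{stem}.txt")
            count += 1
    return count
```

Each duplicate counts as a separate training sample, so halve your repeats/epochs accordingly if you don't want to double the effective dataset size.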
Including style tags (year, artist, aesthetic scores, etc.) will likely help quality and flexibility if you are not training a style LoRA. They are not mandatory though. I train with only artist tags, which should be doing most of the heavy lifting. Not including any style tags will pull the model toward a sludge of all the styles, but since you are presumably only training a few-thousand-step LoRA, you will not destroy the model. With preview 2, tdrussell also did something that helps LoRAs stay a bit more creative.
For a style LoRA, you might actually want to avoid style-modifying captions if the images are all in the same style (other than your made-up artist trigger word(s), ofc) and just let the model learn. I've had cases like this with SDXL, and I expect Anima to be similar.
From my experience: I use tags alone (I'm too lazy to write natural language), and I don't include quality tags. For @artist, I only add it when training a style LoRA. If you want natural language, use a VLM to handle it; in my case Qwen3.5-35B-A3B, the uncensored version. If your machine can't handle it, the Qwen3.5 9B model works fine too. Send a brief, plus the image, plus the tags from a tagger, to help guide the VLM.
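The brief + image + tags pattern can be sketched against an OpenAI-compatible chat payload, which is what most local servers (llama.cpp, vLLM, LM Studio, etc.) accept. The model name, prompt wording, and endpoint are my assumptions, not anything specific to Anima's tooling:

```python
import base64
from pathlib import Path

def build_caption_request(image_path: str, tags: str, brief: str,
                          model: str = "qwen-vl") -> dict:
    """Build an OpenAI-compatible chat payload that sends the brief (as the
    system prompt), the image, and autotagger tags to guide the VLM.
    "qwen-vl" is a placeholder model id; use whatever your server exposes."""
    b64 = base64.b64encode(Path(image_path).read_bytes()).decode()
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": brief},
            {"role": "user", "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
                {"type": "text",
                 "text": f"Autotagger tags for guidance: {tags}\n"
                         "Write one concise natural-language caption."},
            ]},
        ],
        "temperature": 0.3,
    }
```

You would then POST this payload to your server's /v1/chat/completions route and save the reply as the image's caption file; passing the tagger output like this keeps the VLM from missing characters or hallucinating details.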
Another thing from my experience training Anima LoRAs: the model learns pretty fast. Around 500 steps in, it already shows a clearly noticeable difference.