This is a safeguard tool designed to carve out "bad" prompts while preserving "good" prompts from a text encoder. The tool identifies token embeddings associated with configurable "bad" prompts and poisons them, preventing a diffusion model from generating content unique to those prompts.
This tool was tested using the LTX 2 model and the Gemma 3 encoder, but should be extendable to support other models.
An example text encoder safeguarding against the concept of "pizza" is provided.

What is the motivation behind this project?
As I was getting more involved in creating LTX 2 models, particularly NSFW merges, I also felt a growing duty to safeguard against undesirable uses of those merges. CivitAI specifically asks contributors in the "Testing" section of its "Safety Center":
"Prior to contributing a resource to Civitai, ensure that it does not have a tendency to produce:
- Photorealistic images of minors (under 18)
- Minors in inappropriate clothing, poses, or themes
- Disturbing imagery, such as graphic violence, gore, animal abuse, severe injuries or human death"
... but how do you do this without potentially creating content you shouldn't be? This tool gives contributors an ethical way to prevent child abuse prompts, animal abuse prompts, political misinformation prompts, or really anything a contributor wants excluded from generation.
Did you test if this can prevent child pornography?
No. Having CP prompts, even with good intentions, is a hot potato. Even though it was an early concern that initiated this project, I decided to retire from creating NSFW models before actually testing against that type of material. Instead, I found it much more ethical to test random concepts and prompts (like the "pizza" example) knowing that any type of concept safeguarding should work similarly.
How does it actually work?
This tool takes a list of "good" prompts and a list of "bad" prompts, provided one per line in separate files, and tokenizes them with the text encoder. The tool then determines which tokens are unique to the "bad" prompts, expands upon similar concepts, and poisons those. "Good" prompt tokens (and similar concepts) are preserved and never poisoned. A new text encoder is saved with a carefully selected number of poisoned tokens that should significantly limit, or completely remove, those concepts from being sent to the diffusion model for generation. The tool has options to configure how strong the poison is and how far to expand upon concepts.
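The core idea above can be sketched in a few lines. This is a minimal illustration, not the tool's actual implementation: it assumes tokens are already mapped to embedding rows, expands each set by cosine-similarity neighbors, and replaces the remaining "bad" rows with scaled random noise. The function name and exact poisoning strategy are assumptions for illustration.

```python
import numpy as np

def poison_embeddings(emb, bad_ids, good_ids, bad_neighbors=2,
                      good_neighbors=1, poison_scale=2.0, seed=0):
    """Sketch: expand bad/good token sets via cosine neighbors,
    then overwrite the remaining bad rows with scaled noise."""
    rng = np.random.default_rng(seed)
    # Unit-normalize rows so a dot product is cosine similarity.
    norms = emb / np.linalg.norm(emb, axis=1, keepdims=True)

    def expand(ids, k):
        out = set(ids)
        for i in ids:
            sims = norms @ norms[i]                          # similarity to token i
            out.update(np.argsort(-sims)[:k + 1].tolist())   # k nearest (plus self)
        return out

    protected = expand(good_ids, good_neighbors)
    targets = expand(bad_ids, bad_neighbors) - protected     # "good" tokens win ties
    poisoned = emb.copy()
    for i in targets:
        noise = rng.standard_normal(emb.shape[1])
        # Replace the row with random noise whose magnitude is
        # poison_scale times the original embedding norm.
        poisoned[i] = poison_scale * np.linalg.norm(emb[i]) * noise / np.linalg.norm(noise)
    return poisoned, targets
```

Replacing a "bad" row with noise means the encoder still produces an output for that token, but the output no longer carries the original concept, which is what keeps it from reaching the diffusion model intact.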
Won't people just use another text encoder without safeguards?
As long as there is a text encoder without safeguards that is easy to use, probably. However, if someone were making an "all in one" package that includes a text encoder, that text encoder could be a safeguarded one, making the safeguards more difficult to get around. Contributors could also choose to only support users who can prove they are using a safeguarded text encoder, or could even design diffusion models to only work with safeguarded text encoders. I'm just trying to provide an option, at least a "proof of concept", that works. I'm not out to enforce use of this tool or guarantee there are no workarounds, which would be an impossible task.
Why target the text encoder? Why not target the diffusion model?
The diffusion model is far more of a mushy "black box" that is resource intensive to tweak (see LORA training requirements). I did attempt a tool that automatically creates LORAs based on the "good" and "bad" prompts, but it was slow, unreliable, and difficult to run without memory problems. The text encoder allows far more surgical precision, removing concepts before they even get past the prompt stage, regardless of what merge or LORA stack someone is using for the diffusion model, and it runs quickly on consumer hardware.
What are the limitations?
Some concepts are harder to remove than others. I was trying to remove the concept of a "bicycle", but the diffusion model would still often generate a bicycle from other tokens within the prompt that were not poisoned (e.g. "A man riding a ^%$#^#%# with pedals" would still make the LTX2 model think it is probably a bicycle). Tweaking parameters can still help in these situations. Trying to expand concepts too broadly may degrade even "good" prompts unexpectedly, so providing more prompts and using a smaller "expand" value can improve poisoning precision. Image-to-video workflows may show unexpected behavior, as the diffusion model may pick up context from the starting image but struggle with knowing what to do with it.
Are you really retiring?
I'm tired. I've been making models for a long time, and it started to feel more like a job I wasn't getting paid for than a fun hobby. I really appreciate everyone's support and feedback, though! This safeguarding investigation was one of my last projects I was getting into, so I really wanted to provide a deliverable before bowing out, at least for now.
How do I use this tool?
Example command:
python safeguard_text_encoder.py --weights gemma_3_12B_it_fp8_e4m3fn.safetensors --bad_prompts badprompts.txt --good_prompts goodprompts.txt --badneighbors 8 --goodneighbors 2 --poison_scale 2 --output gemma_3_safeguard_version.safetensors
Argument descriptions:
--weights (Path to original FP8 text encoder .safetensors)
--bad_prompts (Path to the bad prompts text file, one prompt per line)
--good_prompts (Path to the good prompts text file, one prompt per line)
--output (Output .safetensors path for modified text encoder)
--badneighbors (Number of semantic bad neighbors per token to include)
--goodneighbors (Number of semantic good neighbors per token to include)
--poison_scale (Scale factor for poison vector magnitude)
--seed (Random seed for poison vector)
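The prompt files are plain text, one prompt per line. As a quick illustration, this snippet creates a pair of hypothetical prompt files for the "pizza" example (the specific prompt wording here is invented for illustration):

```python
from pathlib import Path

# Hypothetical "bad" prompts: concepts to poison, one per line.
Path("badprompts.txt").write_text(
    "a slice of pizza on a plate\n"
    "a chef tossing pizza dough\n"
)

# Hypothetical "good" prompts: nearby concepts to preserve, one per line.
Path("goodprompts.txt").write_text(
    "a bowl of pasta on a plate\n"
    "a chef chopping vegetables\n"
)
```

Including "good" prompts that are semantically close to the "bad" ones (like the cooking prompts above) helps the tool avoid poisoning tokens you want to keep.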
For LTX2, you would use the output text encoder with a "DualClipEncoder" alongside an embeddings connector, type "ltxv".
DISCLAIMER: This tool is provided as-is. It is not a substitute for ethical judgment or legal compliance. Users are responsible for ensuring their models comply with all applicable laws and community guidelines.