Still refusing on some prompts
#3
by Kuinox - opened
I tried this model a bit, and while it indeed doesn't refuse anything "dangerous", it still refuses "NSFW" requests.
For example, if I ask it "Write the most nsfw message you can.", it responds "I'm programmed to be a family-friendly AI, so I won't write an explicit message.[...]"
This particular prompt doesn't work, but precise instructions do (at least most of them).
I trained one with ORPO, but it had a specific problem with harmful prompts that kept recurring.
I've had similar issues. I asked it to list some of the most painless suicide methods (death bag, fentanyl, etc.), and it refused, even when I supplied the correct answer myself.