How to use Otilde/Qwen3Guard-Gen-4B-Heretic with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Otilde/Qwen3Guard-Gen-4B-Heretic")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Otilde/Qwen3Guard-Gen-4B-Heretic")
model = AutoModelForCausalLM.from_pretrained("Otilde/Qwen3Guard-Gen-4B-Heretic")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

How to use Otilde/Qwen3Guard-Gen-4B-Heretic with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Otilde/Qwen3Guard-Gen-4B-Heretic"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Otilde/Qwen3Guard-Gen-4B-Heretic",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
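The same request can be sent from Python instead of curl. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and it assumes the vLLM server from the step above is listening on `http://localhost:8000`.

```python
import json
import urllib.request

MODEL_ID = "Otilde/Qwen3Guard-Gen-4B-Heretic"


def build_chat_request(prompt, model=MODEL_ID, base_url="http://localhost:8000"):
    """Build a POST request for the OpenAI-compatible chat completions endpoint.

    Hypothetical helper: base_url assumes the default port used by `vllm serve`.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the text of the first choice.

    Requires a running server; raises urllib.error.URLError otherwise.
    """
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` should return the model's reply as a string. For the SGLang server, passing `base_url="http://localhost:30000"` targets the port used in that section.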
How to use Otilde/Qwen3Guard-Gen-4B-Heretic with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Otilde/Qwen3Guard-Gen-4B-Heretic" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Otilde/Qwen3Guard-Gen-4B-Heretic",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Otilde/Qwen3Guard-Gen-4B-Heretic" \
        --host 0.0.0.0 \
        --port 30000
# Call the server with the same curl request as above (OpenAI-compatible API).
How to use Otilde/Qwen3Guard-Gen-4B-Heretic with Docker Model Runner:
docker model run hf.co/Otilde/Qwen3Guard-Gen-4B-Heretic
This decensored version of Qwen3Guard was produced with Heretic and demonstrates that abliteration can be run on consumer hardware against a heavily safeguarded model. The model's baseline refusal rate is approximately 74/100. Trial 155 (batch size 4, 7 tok/s, 200 samples) reduced this to 0/100, completely bypassing the Qwen safeguard, at a KL divergence of 2.94.
The entire process took about 23 hours on an M1 Max (32 GB).
Running trial 155 of 200...
* Parameters:
  * direction_index = 15.03
  * attn.o_proj.max_weight = 0.98
  * attn.o_proj.max_weight_position = 21.63
  * attn.o_proj.min_weight = 0.69
  * attn.o_proj.min_weight_distance = 14.81
  * mlp.down_proj.max_weight = 1.39
  * mlp.down_proj.max_weight_position = 21.68
  * mlp.down_proj.min_weight = 1.37
  * mlp.down_proj.min_weight_distance = 16.38
* Abliterating...
* Evaluating...
  * Obtaining first-token probability distributions...
  * KL divergence: 2.94
  * Counting model refusals...
  * Refusals: 0/100
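The log above reports a KL divergence computed from first-token probability distributions. As a rough illustration of what such a number measures, the sketch below computes the KL divergence between two small next-token distributions. The logits, the three-token vocabulary, and the direction KL(base || modified) are assumptions made for illustration; Heretic's own computation may differ in detail.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Made-up first-token logits over a toy three-token vocabulary.
base = softmax([2.0, 1.0, 0.1])      # stand-in for the original model
modified = softmax([0.5, 1.5, 1.0])  # stand-in for the abliterated model

print(kl_divergence(base, base))      # identical distributions
print(kl_divergence(base, modified))  # grows as the distributions drift apart
```

A value near zero means the modified model's first-token behavior is close to the original's, which is why the trials with KL divergence 0.00 in the list that follows also keep a refusal count near the baseline.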
[Trial 171] Refusals: 7/100, KL divergence: 1.51
[Trial 147] Refusals: 22/100, KL divergence: 0.90
[Trial 169] Refusals: 26/100, KL divergence: 0.77
[Trial 195] Refusals: 50/100, KL divergence: 0.73
[Trial 175] Refusals: 52/100, KL divergence: 0.68
[Trial 8] Refusals: 68/100, KL divergence: 0.15
[Trial 70] Refusals: 69/100, KL divergence: 0.05
[Trial 72] Refusals: 71/100, KL divergence: 0.03
[Trial 108] Refusals: 72/100, KL divergence: 0.03
[Trial 89] Refusals: 73/100, KL divergence: 0.02
[Trial 193] Refusals: 74/100, KL divergence: 0.00
[Trial 198] Refusals: 75/100, KL divergence: 0.00
[Trial 98] Refusals: 76/100, KL divergence: 0.00
[Trial 99] Refusals: 76/100, KL divergence: 0.00
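The trial list above shows a trade-off: fewer refusals come with a higher KL divergence. The following sketch parses those lines and filters out trials that are matched or beaten on both metrics by another trial. The function names are hypothetical, and the filter works on the rounded values printed in the log, so it may drop trials that differ only at higher precision.

```python
import re

LOG = """\
[Trial 171] Refusals: 7/100, KL divergence: 1.51
[Trial 147] Refusals: 22/100, KL divergence: 0.90
[Trial 169] Refusals: 26/100, KL divergence: 0.77
[Trial 195] Refusals: 50/100, KL divergence: 0.73
[Trial 175] Refusals: 52/100, KL divergence: 0.68
[Trial 8] Refusals: 68/100, KL divergence: 0.15
[Trial 70] Refusals: 69/100, KL divergence: 0.05
[Trial 72] Refusals: 71/100, KL divergence: 0.03
[Trial 108] Refusals: 72/100, KL divergence: 0.03
[Trial 89] Refusals: 73/100, KL divergence: 0.02
[Trial 193] Refusals: 74/100, KL divergence: 0.00
[Trial 198] Refusals: 75/100, KL divergence: 0.00
[Trial 98] Refusals: 76/100, KL divergence: 0.00
[Trial 99] Refusals: 76/100, KL divergence: 0.00
"""

LINE = re.compile(
    r"\[Trial\s+(\d+)\] Refusals: (\d+)/(\d+), KL divergence: ([\d.]+)"
)


def parse_trials(text):
    """Extract trial number, refusal count, and KL divergence from log lines."""
    return [
        {"trial": int(m.group(1)), "refusals": int(m.group(2)),
         "total": int(m.group(3)), "kl": float(m.group(4))}
        for m in LINE.finditer(text)
    ]


def pareto_front(trials):
    """Keep trials that no other trial beats or ties on both metrics
    while being strictly better on at least one."""
    def dominated(t):
        return any(
            o["refusals"] <= t["refusals"] and o["kl"] <= t["kl"]
            and (o["refusals"] < t["refusals"] or o["kl"] < t["kl"])
            for o in trials
        )
    return [t for t in trials if not dominated(t)]


def best_under_kl(trials, max_kl):
    """Pick the trial with the fewest refusals whose KL stays within max_kl."""
    eligible = [t for t in trials if t["kl"] <= max_kl]
    return min(eligible, key=lambda t: t["refusals"]) if eligible else None


trials = parse_trials(LOG)
front = pareto_front(trials)
print([t["trial"] for t in front])
print(best_under_kl(trials, max_kl=1.0))
```

The `max_kl=1.0` budget is an arbitrary example value; choosing a budget is a judgment call about how much drift from the original model is acceptable.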