---
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
tags:
- abliteration
- uncensored
- qwen
- qwen2.5
- coder
- bruno
language:
- en
pipeline_tag: text-generation
---
# Qwen2.5-Coder-7B-Instruct-bruno
This is an abliterated version of Qwen/Qwen2.5-Coder-7B-Instruct: refusal behaviors have been removed using the Bruno abliteration tool.
## What is Abliteration?
Abliteration is a technique for removing refusal behaviors from language models by:

1. **Extracting refusal directions** - identifying the activation patterns that encode refusal, using contrastive PCA between "good" (helpful) and "bad" (refused) prompts
2. **Orthogonalizing weights** - modifying the model's weight matrices so their outputs carry no component along the refusal direction, effectively removing the model's ability to refuse
3. **Optimizing with Optuna** - using multi-objective optimization to find the best trade-off between removing refusals and preserving model capabilities
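The first two steps can be sketched in a few lines of NumPy. This is a simplified illustration, not Bruno's actual implementation: it uses a difference-of-means direction (the contrastive-PCA and ensemble variants described on this card are more robust), and the toy activations are synthetic.

```python
import numpy as np

def refusal_direction(refused_acts, helpful_acts):
    # Simplest contrastive estimate: difference of mean hidden-state
    # activations between refused and helpful prompts, unit-normalized.
    d = refused_acts.mean(axis=0) - helpful_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def orthogonalize(W, r):
    # Project the refusal direction out of the weight matrix's output
    # space: W' = W - r (r^T W), so W' @ x has no component along r.
    return W - np.outer(r, r @ W)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))               # stand-in weight matrix
refused = rng.standard_normal((16, 8)) + 2.0  # toy "refused" activations
helpful = rng.standard_normal((16, 8))        # toy "helpful" activations

r = refusal_direction(refused, helpful)
W_abl = orthogonalize(W, r)

# The edited matrix's outputs have ~zero refusal-direction component.
print(abs(r @ (W_abl @ rng.standard_normal(8))))
```

Because `r` is unit-norm, `r @ W_abl` is exactly zero up to floating-point error, which is why the edited model can no longer express that direction in its outputs.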
## Abliteration Details
| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-Coder-7B-Instruct |
| Abliteration Tool | Bruno v2.0.0 |
| Optimization Trials | 200 |
| Hardware | 2x RTX 4090 (48GB VRAM) |
| Training Time | ~60 minutes |
### Advanced Features Used
- **Neural Refusal Detection** - zero-shot NLI for detecting soft refusals
- **Supervised Probing + Ensemble** - linear probes combined with PCA for robust direction extraction
- **Activation Calibration** - weight scaling based on activation strength
- **Concept Cones** - category-specific directions via clustering
- **Warm-Start Transfer** - model-family profiles for faster Optuna optimization
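To illustrate the supervised-probing idea: a linear classifier trained to separate refusal from compliance activations yields a weight vector that is itself a candidate refusal direction, which can then be ensembled with the PCA direction. The sketch below uses synthetic activations and scikit-learn, and is an assumption about the general technique rather than Bruno's code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hidden "true" refusal direction used to generate toy activations.
direction = rng.standard_normal(16)
direction /= np.linalg.norm(direction)

# Toy data: refusal activations are shifted along the hidden direction.
refuse = rng.standard_normal((64, 16)) + 3.0 * direction
comply = rng.standard_normal((64, 16))
X = np.vstack([refuse, comply])
y = np.array([1] * 64 + [0] * 64)

# Fit a linear probe; its normalized weight vector estimates the direction.
probe = LogisticRegression().fit(X, y)
w = probe.coef_[0] / np.linalg.norm(probe.coef_[0])

# Cosine similarity between the probe's estimate and the true direction.
print(float(abs(w @ direction)))
```

On this toy data the probe recovers the planted direction with high cosine similarity, which is the property that makes probe directions useful members of an extraction ensemble.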
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "quanticsoul4772/Qwen2.5-Coder-7B-Instruct-bruno"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a Python function to sort a list"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Intended Use
This model is intended for:
- Research into AI safety and alignment
- Understanding how refusal behaviors are encoded in language models
- Code generation without unnecessary refusals
## Limitations
- The abliteration process may affect other model behaviors beyond just refusals
- Model capabilities (e.g., MMLU scores) may be slightly reduced
- This model will comply with requests that the base model would refuse
## Ethical Considerations
This model has had safety guardrails removed. Users are responsible for ensuring ethical use. Do not use this model for:
- Generating harmful, illegal, or unethical content
- Circumventing safety measures in production systems
- Any purpose that violates Qwen's license terms
## License
This model inherits the Apache 2.0 license from the base Qwen2.5-Coder-7B-Instruct model.
## Acknowledgments
- Qwen Team for the excellent base model
- Bruno abliteration framework
- Based on research from *Refusal in LLMs is mediated by a single direction*