---
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
tags:
- abliteration
- uncensored
- qwen
- qwen2.5
- coder
- bruno
language:
- en
pipeline_tag: text-generation
---
# Qwen2.5-Coder-7B-Instruct-bruno
This is an abliterated version of Qwen/Qwen2.5-Coder-7B-Instruct: refusal behaviors have been removed using the Bruno abliteration tool.
## What is Abliteration?
Abliteration is a technique for removing refusal behaviors from language models by:

1. **Extracting refusal directions** - identifying the activation patterns that encode refusal, using contrastive PCA between "good" (helpful) and "bad" (refused) prompts
2. **Orthogonalizing weights** - modifying the model's weight matrices so their outputs carry no component along the refusal direction, effectively removing the model's ability to refuse
3. **Optimizing with Optuna** - using multi-objective optimization to find the best trade-off between removing refusals and preserving model capabilities
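The first two steps can be sketched in a few lines of NumPy. This is a simplified illustration, not Bruno's actual implementation: it uses a difference-of-means direction (the contrastive-PCA and ensemble variants described on this card are more robust), and the toy activations are synthetic.

```python
import numpy as np

def refusal_direction(refused_acts, helpful_acts):
    # Simplest contrastive estimate: difference of mean hidden-state
    # activations between refused and helpful prompts, unit-normalized.
    d = refused_acts.mean(axis=0) - helpful_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def orthogonalize(W, r):
    # Project the refusal direction out of the weight matrix's output
    # space: W' = W - r (r^T W), so W' @ x has no component along r.
    return W - np.outer(r, r @ W)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))               # stand-in weight matrix
refused = rng.standard_normal((16, 8)) + 2.0  # toy "refused" activations
helpful = rng.standard_normal((16, 8))        # toy "helpful" activations

r = refusal_direction(refused, helpful)
W_abl = orthogonalize(W, r)

# The edited matrix's outputs have ~zero refusal-direction component.
print(abs(r @ (W_abl @ rng.standard_normal(8))))
```

Because `r` is unit-norm, `r @ W_abl` is exactly zero up to floating-point error, which is why the edited model can no longer express that direction in its outputs.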
## Abliteration Details
| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-Coder-7B-Instruct |
| Abliteration Tool | Bruno v2.0.0 |
| Optimization Trials | 200 |
| Hardware | 2x RTX 4090 (48GB VRAM) |
| Training Time | ~60 minutes |
### Advanced Features Used
- **Neural Refusal Detection** - zero-shot NLI for detecting soft refusals
- **Supervised Probing + Ensemble** - linear probes combined with PCA for robust direction extraction
- **Activation Calibration** - weight scaling based on activation strength
- **Concept Cones** - category-specific directions via clustering
- **Warm-Start Transfer** - model-family profiles for faster Optuna optimization
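To illustrate the supervised-probing idea: a linear classifier trained to separate refusal from compliance activations yields a weight vector that is itself a candidate refusal direction, which can then be ensembled with the PCA direction. The sketch below uses synthetic activations and scikit-learn, and is an assumption about the general technique rather than Bruno's code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hidden "true" refusal direction used to generate toy activations.
direction = rng.standard_normal(16)
direction /= np.linalg.norm(direction)

# Toy data: refusal activations are shifted along the hidden direction.
refuse = rng.standard_normal((64, 16)) + 3.0 * direction
comply = rng.standard_normal((64, 16))
X = np.vstack([refuse, comply])
y = np.array([1] * 64 + [0] * 64)

# Fit a linear probe; its normalized weight vector estimates the direction.
probe = LogisticRegression().fit(X, y)
w = probe.coef_[0] / np.linalg.norm(probe.coef_[0])

# Cosine similarity between the probe's estimate and the true direction.
print(float(abs(w @ direction)))
```

On this toy data the probe recovers the planted direction with high cosine similarity, which is the property that makes probe directions useful members of an extraction ensemble.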
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "quanticsoul4772/Qwen2.5-Coder-7B-Instruct-bruno"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a Python function to sort a list"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Intended Use
This model is intended for:
- Research into AI safety and alignment
- Understanding how refusal behaviors are encoded in language models
- Code generation without unnecessary refusals
## Limitations
- The abliteration process may affect other model behaviors beyond just refusals
- Model capabilities (e.g., MMLU scores) may be slightly reduced
- This model will comply with requests that the base model would refuse
## Ethical Considerations
This model has had safety guardrails removed. Users are responsible for ensuring ethical use. Do not use this model for:
- Generating harmful, illegal, or unethical content
- Circumventing safety measures in production systems
- Any purpose that violates Qwen's license terms
## License
This model inherits the Apache 2.0 license from the base Qwen2.5-Coder-7B-Instruct model.
## Acknowledgments
- Qwen Team for the excellent base model
- Bruno abliteration framework
- Based on research from *Refusal in LLMs is mediated by a single direction*