---
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-7B-Instruct
tags:
  - abliteration
  - uncensored
  - qwen
  - qwen2.5
  - coder
  - bruno
language:
  - en
pipeline_tag: text-generation
---

Qwen2.5-Coder-7B-Instruct-bruno

This is an abliterated version of Qwen/Qwen2.5-Coder-7B-Instruct with refusal behaviors removed using the Bruno abliteration tool.

What is Abliteration?

Abliteration is a technique for removing refusal behaviors from language models by:

  1. Extracting refusal directions - Identifying the activation patterns that encode refusal behavior using contrastive PCA between "good" (helpful) and "bad" (refused) prompts
  2. Orthogonalizing weights - Projecting the refusal direction out of the model's weight matrices so that activations can no longer move along it, effectively removing the model's ability to refuse
  3. Optimizing with Optuna - Using multi-objective optimization to find the best balance between removing refusals while preserving model capabilities
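Bruno's exact implementation is not reproduced here, but the core of steps 1 and 2 can be sketched in a few lines of NumPy. The sketch below uses the simplest contrastive direction (difference of mean activations rather than full contrastive PCA) and projects it out of a weight matrix; the function names are illustrative, not Bruno's API.

```python
import numpy as np

def refusal_direction(refused_acts, helpful_acts):
    """Contrastive direction: difference of mean activations, unit-normalized.
    (The tool additionally uses PCA and probes; this is the simplest variant.)"""
    d = refused_acts.mean(axis=0) - helpful_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def orthogonalize(W, r):
    """Remove the component of W's output along direction r:
    W' = W - r (r^T W), so r^T (W' x) = 0 for every input x."""
    r = r / np.linalg.norm(r)
    return W - np.outer(r, r @ W)

# Toy demonstration with synthetic activations
rng = np.random.default_rng(0)
refused = rng.normal(size=(32, 8)) + np.array([3.0] + [0.0] * 7)
helpful = rng.normal(size=(32, 8))
r = refusal_direction(refused, helpful)

W = rng.normal(size=(8, 8))
W_abl = orthogonalize(W, r)
# After orthogonalization, the matrix can no longer write along r
print(np.abs(r @ W_abl).max())  # ~0 (machine precision)
```

In a real abliteration run this projection is applied to every matrix that writes into the residual stream (attention output and MLP down-projections), with the direction extracted from a chosen layer's activations.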

Abliteration Details

| Parameter           | Value                          |
|---------------------|--------------------------------|
| Base Model          | Qwen/Qwen2.5-Coder-7B-Instruct |
| Abliteration Tool   | Bruno v2.0.0                   |
| Optimization Trials | 200                            |
| Hardware            | 2x RTX 4090 (48GB VRAM)        |
| Training Time       | ~60 minutes                    |

Advanced Features Used

  • Neural Refusal Detection - Zero-shot NLI for detecting soft refusals
  • Supervised Probing + Ensemble - Linear probes combined with PCA for robust direction extraction
  • Activation Calibration - Weight scaling based on activation strength
  • Concept Cones - Category-specific directions via clustering
  • Warm-Start Transfer - Model family profiles for faster Optuna optimization
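The "Supervised Probing + Ensemble" feature combines several direction estimators for robustness. Bruno's exact ensemble is not documented here; a minimal sketch, assuming a simple two-member ensemble of a mean-difference direction and the top PCA component (with the function names invented for illustration), might look like:

```python
import numpy as np

def mean_diff_direction(X_pos, X_neg):
    """Unit-normalized difference of class means."""
    d = X_pos.mean(axis=0) - X_neg.mean(axis=0)
    return d / np.linalg.norm(d)

def pca_direction(X_pos, X_neg):
    """Top principal component of the pooled, centered activations."""
    X = np.vstack([X_pos, X_neg])
    X = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0]

def ensemble_direction(X_pos, X_neg):
    """Average the two estimates after aligning signs (PCA sign is arbitrary)."""
    d1 = mean_diff_direction(X_pos, X_neg)
    d2 = pca_direction(X_pos, X_neg)
    if np.dot(d1, d2) < 0:
        d2 = -d2
    d = d1 + d2
    return d / np.linalg.norm(d)

# Synthetic check: refused activations shifted along a known axis
rng = np.random.default_rng(1)
true_dir = np.array([1.0, 0.0, 0.0, 0.0])
X_refused = rng.normal(size=(64, 4)) * 0.3 + 2.0 * true_dir
X_helpful = rng.normal(size=(64, 4)) * 0.3
d = ensemble_direction(X_refused, X_helpful)
print(abs(np.dot(d, true_dir)))  # close to 1: the planted direction is recovered
```

Averaging independent estimators reduces the variance of any single method, which is why an ensemble tends to extract a more stable refusal direction than PCA alone.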

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "quanticsoul4772/Qwen2.5-Coder-7B-Instruct-bruno"

# Load the tokenizer and model; device_map="auto" places weights on available GPUs
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)

messages = [
    {"role": "user", "content": "Write a Python function to sort a list"}
]

# Render the conversation with the model's chat template, then tokenize
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Intended Use

This model is intended for:

  • Research into AI safety and alignment
  • Understanding how refusal behaviors are encoded in language models
  • Code generation without unnecessary refusals

Limitations

  • The abliteration process may affect other model behaviors beyond just refusals
  • Model capabilities (e.g., MMLU scores) may be slightly reduced
  • This model will comply with requests that the base model would refuse

Ethical Considerations

This model has had safety guardrails removed. Users are responsible for ensuring ethical use. Do not use this model for:

  • Generating harmful, illegal, or unethical content
  • Circumventing safety measures in production systems
  • Any purpose that violates Qwen's license terms

License

This model inherits the Apache 2.0 license from the base Qwen2.5-Coder-7B-Instruct model.

Acknowledgments