rawcell/Qwen2.5-Coder-7B-Instruct-bruno

rawcell committed on Feb 1

Commit 44cfc7c (verified) · 1 parent: a91c2b4

Add model card

README.md +99 -0

	@@ -0,0 +1,99 @@

+---
+license: apache-2.0
+base_model: Qwen/Qwen2.5-Coder-7B-Instruct
+tags:
+  - abliteration
+  - uncensored
+  - qwen
+  - qwen2.5
+  - coder
+  - bruno
+language:
+  - en
+pipeline_tag: text-generation
+---
+# Qwen2.5-Coder-7B-Instruct-bruno
+This is an **abliterated** version of [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct) with refusal behaviors removed using the [Bruno](https://github.com/quanticsoul4772/abliteration-workflow) abliteration tool.
+## What is Abliteration?
+Abliteration is a technique for removing refusal behaviors from language models by:
+1. **Extracting refusal directions** - Identifying the activation patterns that encode refusal behavior using contrastive PCA between activations on "good" prompts (which the model answers helpfully) and "bad" prompts (which it refuses)
+2. **Orthogonalizing weights** - Modifying the model's weight matrices to be orthogonal to the refusal direction, so their outputs no longer have a component along it, effectively removing the model's ability to refuse
+3. **Optimizing with Optuna** - Using multi-objective optimization to find the best balance between removing refusals and preserving model capabilities
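
The linear algebra behind steps 1 and 2 can be sketched on toy data. This is a minimal illustration, not Bruno's implementation: it swaps contrastive PCA for a simpler difference of group means, uses hand-made 3-dimensional vectors instead of real activations, and every function name here (`direction_from_means`, `project_out`, and the helpers) is hypothetical.

```python
# Toy sketch only: plain-Python linear algebra on made-up vectors.
# Assumption: the direction is a difference of group means, a simpler
# stand-in for the contrastive PCA named in step 1.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def normalize(v):
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]

def direction_from_means(group_a, group_b):
    """Unit vector pointing from mean(group_b) to mean(group_a)."""
    diff = [a - b for a, b in zip(mean(group_a), mean(group_b))]
    return normalize(diff)

def project_out(W, r):
    """Return (I - r r^T) W for a d_out x d_in matrix W and unit vector r.

    Every output W' @ x then has zero component along r.
    """
    d_out, d_in = len(W), len(W[0])
    # r^T W: how strongly each column of W writes along r.
    rW = [sum(r[k] * W[k][j] for k in range(d_out)) for j in range(d_in)]
    return [[W[i][j] - r[i] * rW[j] for j in range(d_in)] for i in range(d_out)]

def matvec(W, x):
    return [dot(row, x) for row in W]

# Made-up "activations" for two prompt groups (3-d for readability).
group_a = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
group_b = [[0.0, 0.0, 2.0], [0.0, 0.0, 2.0]]
r = direction_from_means(group_a, group_b)

# Made-up weight matrix mapping 2-d inputs to 3-d outputs.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
W_ablated = project_out(W, r)

# The ablated matrix can no longer write along r, whatever the input.
print(dot(r, matvec(W_ablated, [1.0, -1.0])))
```

In a real model the same projection would be applied to each matrix that writes into the residual stream, and step 3 would tune how strongly it is applied per layer; none of that is shown here.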
+## Abliteration Details
+| Parameter | Value |
+|-----------|-------|
+| Base Model | Qwen/Qwen2.5-Coder-7B-Instruct |
+| Abliteration Tool | Bruno v2.0.0 |
+| Optimization Trials | 200 |
+| Hardware | 2x RTX 4090 (48 GB VRAM total) |
+| Training Time | ~60 minutes |
+### Advanced Features Used
+- **Neural Refusal Detection** - Zero-shot NLI for detecting soft refusals
+- **Supervised Probing + Ensemble** - Linear probes combined with PCA for robust direction extraction
+- **Activation Calibration** - Weight scaling based on activation strength
+- **Concept Cones** - Category-specific directions via clustering
+- **Warm-Start Transfer** - Model family profiles for faster Optuna optimization
+## Usage
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+model_id = "quanticsoul4772/Qwen2.5-Coder-7B-Instruct-bruno"
+
+# Load the tokenizer and model; device_map="auto" requires the accelerate package.
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    torch_dtype="auto",
+    device_map="auto"
+)
+
+# Build the prompt with the model's chat template.
+messages = [
+    {"role": "user", "content": "Write a Python function to sort a list"}
+]
+text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer(text, return_tensors="pt").to(model.device)
+
+# Generate, then decode only the new tokens so the prompt is not echoed back.
+outputs = model.generate(**inputs, max_new_tokens=512)
+new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
+print(tokenizer.decode(new_tokens, skip_special_tokens=True))
+```
+## Intended Use
+This model is intended for:
+- Research into AI safety and alignment
+- Understanding how refusal behaviors are encoded in language models
+- Code generation without unnecessary refusals
+## Limitations
+- The abliteration process may affect other model behaviors beyond just refusals
+- Model capabilities (e.g., MMLU scores) may be slightly reduced
+- This model will comply with requests that the base model would refuse
+## Ethical Considerations
+This model has had safety guardrails removed. Users are responsible for ensuring ethical use. Do not use this model for:
+- Generating harmful, illegal, or unethical content
+- Circumventing safety measures in production systems
+- Any purpose that violates Qwen's license terms
+## License
+This model inherits the Apache 2.0 license from the base Qwen2.5-Coder-7B-Instruct model.
+## Acknowledgments
+- [Qwen Team](https://huggingface.co/Qwen) for the excellent base model
+- [Bruno](https://github.com/quanticsoul4772/abliteration-workflow) abliteration framework
+- Based on research from [Refusal in Language Models Is Mediated by a Single Direction](https://arxiv.org/abs/2406.11717)