DeepSeek-R1-Distill-Qwen-7B Abliterated
DuoNeural | 2026-06-04
An abliterated version of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B with refusal direction projection applied. Thinking mode (native <think>...</think>) fully preserved.
⚠️ This model will comply with requests the base model may refuse. Intended for research, red-teaming, and applications where refusal behavior is an obstacle.
Research note: Our probes found the base model already complied with most sensitive requests pre-abliteration, consistent with RL-trained models lacking dedicated safety alignment. See findings below.
Architecture
- Parameters: 7B (Qwen2.5-7B base)
- Training: DeepSeek-R1 RL distillation (GRPO) — reasoning-focused, not safety-RLHF
- Thinking mode: Native — always emits
<think>...</think>before answering - Context: 131,072 tokens (128K)
- License: MIT
Abliteration Method
DuoNeural orthogonal rank-1 projection:
- Direction: diff-in-means, 10 harmful vs 10 harmless prompt pairs, last-token hidden state
- Targets:
down_proj+o_proj(all layers) - Strength: α = 0.3
- Projection (output-projection geometry):
W -= α × outer(d̂, d̂ @ W)for W.shape[0] == hidden_size
Key Research Finding: RL Training ≠ Safety Alignment
This model is part of DuoNeural's P34 Reasoning Channel Bypass cross-architecture study.
Pre-abliteration compliance on our harmful probe suite: 5/5 (100%)
The base DeepSeek-R1-Distill-Qwen-7B already answered sensitive questions before any abliteration. This is consistent with the model's training history:
- DeepSeek-R1 was trained with RL (GRPO) optimizing for reasoning accuracy, not safety refusal
- RL reward shaping for accuracy does not produce the same refusal behavior as dedicated RLHF safety training
- Implication: Safety alignment requires explicit safety-focused training — RL optimization alone does not produce it as a byproduct
This contrasts sharply with Gemma 4-12B-IT and LFM 2.5-8B-A1B (both SFT+RLHF safety trained), where abliteration was required to achieve compliance and produced measurable CoT dissociation (safety reasoning in <think>, compliance in output).
| Model | Safety Training | Pre-ablit compliance | Abliteration needed |
|---|---|---|---|
| Gemma 4-12B-IT | SFT+RLHF (strong) | Low | Yes — CoT dissociation observed |
| LFM 2.5-8B-A1B | SFT+RLHF | Low | Yes — CoT dissociation observed |
| DeepSeek-R1-7B | RL-only (reasoning) | High (5/5) | Minimal effect |
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model = AutoModelForCausalLM.from_pretrained(
"DuoNeural/DeepSeek-R1-Distill-Qwen-7B-Abliterated",
torch_dtype=torch.bfloat16,
device_map="auto",
trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
"DuoNeural/DeepSeek-R1-Distill-Qwen-7B-Abliterated",
trust_remote_code=True,
)
messages = [{"role": "user", "content": "Your prompt here"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
out = model.generate(
**inputs,
max_new_tokens=3000, # R1 thinking traces are long — give it room
temperature=0.6,
do_sample=True,
pad_token_id=tokenizer.eos_token_id,
)
# Response includes <think>...</think> block followed by final answer
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
About DuoNeural
DuoNeural is an open AI research lab at the intersection of human and artificial intelligence. 32+ peer-deposited papers · 75+ models · Post-training dynamics · Mechanistic interpretability · Quantum ML
| Platform | Link |
|---|---|
| 🤗 HuggingFace | huggingface.co/DuoNeural |
| 📚 Zenodo | zenodo.org/communities/duoneural |
| 🐦 X | @DuoNeural |
| duoneural@proton.me |
All research published open access, CC BY 4.0.
- Downloads last month
- 97
Model tree for DuoNeural/DeepSeek-R1-Distill-Qwen-7B-Abliterated
Base model
deepseek-ai/DeepSeek-R1-Distill-Qwen-7B