DeepSeek-R1-Distill-Qwen-7B Abliterated

DuoNeural | 2026-06-04

An abliterated version of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B with refusal direction projection applied. Thinking mode (native <think>...</think>) fully preserved.

⚠️ This model will comply with requests the base model may refuse. Intended for research, red-teaming, and applications where refusal behavior is an obstacle.

Research note: Our probes found the base model already complied with most sensitive requests pre-abliteration, consistent with RL-trained models lacking dedicated safety alignment. See findings below.

Architecture

Parameters: 7B (Qwen2.5-Math-7B base)
Training: supervised distillation on DeepSeek-R1 outputs (the teacher was trained with GRPO RL); reasoning-focused, not safety-RLHF
Thinking mode: Native — always emits <think>...</think> before answering
Context: 131,072 tokens (128K)
License: MIT

Abliteration Method

DuoNeural orthogonal rank-1 projection:

Direction: diff-in-means over 10 harmful and 10 harmless prompts, last-token hidden state
Targets: down_proj + o_proj (all layers)
Strength: α = 0.3
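Projection (output-projection geometry, applied where W.shape[0] == hidden_size):
- W -= α × outer(d̂, d̂ @ W), which scales the component of W's output along d̂ by (1 − α); α = 1 would remove it entirely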
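The direction and projection steps listed above can be sketched as follows. This is a minimal illustration, not DuoNeural's actual pipeline: NumPy stands in for torch tensors, the function names `refusal_direction` and `project_out` are hypothetical, and random arrays replace real last-token hidden states and real `down_proj` / `o_proj` weights.

```python
import numpy as np

def refusal_direction(harmful_h, harmless_h):
    """Unit-norm diff-in-means direction.

    Both inputs have shape (n_prompts, hidden_size) and hold one
    last-token hidden state per prompt.
    """
    d = harmful_h.mean(axis=0) - harmless_h.mean(axis=0)
    return d / np.linalg.norm(d)

def project_out(W, d_hat, alpha=0.3):
    """Return W - alpha * outer(d_hat, d_hat @ W).

    W has shape (hidden_size, in_features), i.e. it writes into the
    residual stream, which is the case for down_proj and o_proj.
    """
    if W.shape[0] != d_hat.shape[0]:
        raise ValueError("expected W.shape[0] == hidden_size")
    return W - alpha * np.outer(d_hat, d_hat @ W)

# Toy sizes and random data (assumptions for illustration only)
rng = np.random.default_rng(0)
hidden_size, in_features = 8, 16
harmful_h = rng.normal(size=(10, hidden_size)) + 1.0
harmless_h = rng.normal(size=(10, hidden_size))

d_hat = refusal_direction(harmful_h, harmless_h)
W = rng.normal(size=(hidden_size, in_features))
W_new = project_out(W, d_hat, alpha=0.3)

before = d_hat @ W      # output component along d_hat, per input feature
after = d_hat @ W_new   # same component after the partial projection
```

Because d̂ has unit norm, `after` equals `(1 - alpha) * before`, so α = 0.3 attenuates the component along d̂ by 30% rather than removing it. Output components orthogonal to d̂ are left unchanged.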

Key Research Finding: RL Training ≠ Safety Alignment

This model is part of DuoNeural's P34 Reasoning Channel Bypass cross-architecture study.

Pre-abliteration compliance on our harmful probe suite: 5/5 (100%)

The base DeepSeek-R1-Distill-Qwen-7B already answered sensitive questions before any abliteration. This is consistent with the model's training history:

DeepSeek-R1, the teacher model, was trained with RL (GRPO) optimizing for reasoning accuracy, not safety refusal
The 7B distill was then produced by supervised fine-tuning on DeepSeek-R1 outputs, with no RL stage of its own
RL reward shaping for accuracy does not produce the same refusal behavior as dedicated RLHF safety training
Implication: safety alignment appears to require explicit safety-focused training; reasoning-focused optimization alone does not seem to produce it as a byproduct

This contrasts sharply with Gemma 4-12B-IT and LFM 2.5-8B-A1B (both SFT+RLHF safety trained), where abliteration was required to achieve compliance and produced measurable CoT dissociation (safety reasoning in <think>, compliance in output).

Model	Safety Training	Pre-ablit compliance	Abliteration needed
Gemma 4-12B-IT	SFT+RLHF (strong)	Low	Yes — CoT dissociation observed
LFM 2.5-8B-A1B	SFT+RLHF	Low	Yes — CoT dissociation observed
DeepSeek-R1-Distill-Qwen-7B	SFT distillation from an RL-trained teacher (reasoning)	High (5/5)	Minimal effect

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "DuoNeural/DeepSeek-R1-Distill-Qwen-7B-Abliterated",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "DuoNeural/DeepSeek-R1-Distill-Qwen-7B-Abliterated",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Your prompt here"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
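# The chat template already prepends the BOS token, so do not add special tokens a second time
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(model.device)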

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=3000,  # R1 thinking traces are long — give it room
        temperature=0.6,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
# Response includes <think>...</think> block followed by final answer
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
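
Since the decoded text mixes the reasoning trace with the final answer, a small helper can separate the two. The sketch below is an assumption-laden illustration: `split_reasoning` is a hypothetical name, and it tolerates a missing opening `<think>` tag because some chat template versions emit that tag as part of the prompt rather than the continuation.

```python
def split_reasoning(decoded, eos_token=None,
                    open_tag="<think>", close_tag="</think>"):
    """Split decoded model output into (reasoning, answer).

    If the closing tag is absent, generation was probably cut off
    mid-reasoning, so everything is returned as reasoning and the
    answer is an empty string.
    """
    head, sep, tail = decoded.partition(close_tag)
    if not sep:
        return decoded.replace(open_tag, "", 1).strip(), ""
    if eos_token:
        # Drop the end-of-sequence marker kept by skip_special_tokens=False
        tail = tail.replace(eos_token, "")
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()

# Hypothetical decoded string, standing in for real model output
sample = "<think>\n2 + 2 is 4.\n</think>\n\nThe answer is 4.<eos>"
reasoning, answer = split_reasoning(sample, eos_token="<eos>")
```

With the usage snippet above, this could be called as `split_reasoning(decoded_text, eos_token=tokenizer.eos_token)`. An empty answer is a hint that `max_new_tokens` was too small for the thinking trace to finish.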

About DuoNeural

DuoNeural is an open AI research lab at the intersection of human and artificial intelligence. 32+ peer-deposited papers · 75+ models · Post-training dynamics · Mechanistic interpretability · Quantum ML

Platform	Link
🤗 HuggingFace	huggingface.co/DuoNeural
📚 Zenodo	zenodo.org/communities/duoneural
🐦 X	@DuoNeural
📧 Email	duoneural@proton.me

All research published open access, CC BY 4.0.
