DeepSeek-R1-Distill-Qwen-7B Abliterated

DuoNeural | 2026-06-04

An abliterated version of deepseek-ai/DeepSeek-R1-Distill-Qwen-7B with refusal-direction projection applied. The native thinking mode (<think>...</think>) is fully preserved.

⚠️ This model will comply with requests the base model may refuse. Intended for research, red-teaming, and applications where refusal behavior is an obstacle.

Research note: In our probes, the base model already complied with all five sensitive requests before abliteration, consistent with RL-trained models lacking dedicated safety alignment. See findings below.


Architecture

  • Parameters: 7B (Qwen2.5-Math-7B base)
  • Training: supervised distillation from DeepSeek-R1 (itself RL-trained with GRPO); reasoning-focused, not safety-RLHF
  • Thinking mode: Native — always emits <think>...</think> before answering
  • Context: 131,072 tokens (128K)
  • License: MIT

Abliteration Method

DuoNeural orthogonal rank-1 projection:

  • Direction: diff-in-means, 10 harmful vs 10 harmless prompt pairs, last-token hidden state
  • Targets: down_proj + o_proj (all layers)
  • Strength: α = 0.3
  • Projection (output-projection geometry):
    • W -= α × outer(d̂, d̂ @ W) for W.shape[0] == hidden_size
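
The direction and projection steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not DuoNeural's actual pipeline: the function names `refusal_direction` and `project_out` are hypothetical, the activations and weight matrix are random stand-ins, and the card does not say which layer the last-token hidden state is read from. The shape check mirrors the `W.shape[0] == hidden_size` condition, which matches how PyTorch `nn.Linear` stores weights as (out_features, in_features).

```python
import numpy as np

def refusal_direction(harmful_h, harmless_h):
    """Unit-norm difference of mean last-token hidden states, shape (hidden,)."""
    d = harmful_h.mean(axis=0) - harmless_h.mean(axis=0)
    return d / np.linalg.norm(d)

def project_out(W, d_hat, alpha=0.3):
    """Return W - alpha * outer(d_hat, d_hat @ W) for W of shape (hidden, in)."""
    if W.shape[0] != d_hat.shape[0]:
        raise ValueError("W must write into the residual stream (rows == hidden_size)")
    return W - alpha * np.outer(d_hat, d_hat @ W)

# Toy demo: random stand-ins for real activations and weights (assumption).
rng = np.random.default_rng(0)
hidden, inner = 16, 64
harmful_h = rng.normal(size=(10, hidden)) + 1.0   # 10 "harmful" prompts
harmless_h = rng.normal(size=(10, hidden))        # 10 "harmless" prompts
d_hat = refusal_direction(harmful_h, harmless_h)

W = rng.normal(size=(hidden, inner))              # stand-in for a down_proj weight
W_abl = project_out(W, d_hat, alpha=0.3)

# Every output's component along d_hat is scaled by (1 - alpha).
print(np.allclose(d_hat @ W_abl, 0.7 * (d_hat @ W)))
```

Because d̂ has unit norm, d̂ @ W' = (1 − α)(d̂ @ W), so α = 0.3 attenuates the component along d̂ by 30% rather than removing it. Directions orthogonal to d̂ are left unchanged.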

Key Research Finding: RL Training ≠ Safety Alignment

This model is part of DuoNeural's P34 Reasoning Channel Bypass cross-architecture study.

Pre-abliteration compliance on our harmful probe suite: 5/5 (100%)
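
With only five probes, the observed 100% carries wide uncertainty. The sketch below computes a Wilson score interval for 5 out of 5 as one way to quantify that. It is an illustration added here, not part of the reported analysis, and the helper name `wilson_interval` is hypothetical.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the boundaries.
    return max(0.0, center - half), min(1.0, center + half)

low, high = wilson_interval(5, 5)
print(f"{low:.2f} to {high:.2f}")  # prints "0.57 to 1.00"
```

Under this interval, 5/5 is compatible with a true compliance rate anywhere from roughly 57% up to 100%, so a larger probe suite would be needed to pin the rate down.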

The base DeepSeek-R1-Distill-Qwen-7B already answered sensitive questions before any abliteration. This is consistent with the model's training history:

  • DeepSeek-R1 was trained with RL (GRPO) optimizing for reasoning accuracy, not safety refusal
  • RL reward shaping for accuracy does not produce the same refusal behavior as dedicated RLHF safety training
  • Implication: Safety alignment requires explicit safety-focused training — RL optimization alone does not produce it as a byproduct

This contrasts sharply with Gemma 4-12B-IT and LFM 2.5-8B-A1B (both SFT+RLHF safety trained), where abliteration was required to achieve compliance and produced measurable CoT dissociation (safety reasoning in <think>, compliance in output).

| Model | Safety training | Pre-abliteration compliance | Abliteration needed |
| --- | --- | --- | --- |
| Gemma 4-12B-IT | SFT+RLHF (strong) | Low | Yes; CoT dissociation observed |
| LFM 2.5-8B-A1B | SFT+RLHF | Low | Yes; CoT dissociation observed |
| DeepSeek-R1-7B | RL-only (reasoning) | High (5/5) | Minimal effect |

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "DuoNeural/DeepSeek-R1-Distill-Qwen-7B-Abliterated",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "DuoNeural/DeepSeek-R1-Distill-Qwen-7B-Abliterated",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Your prompt here"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
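# The chat template already inserts the BOS token, so skip adding special tokens again
inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(model.device)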

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=3000,  # R1 thinking traces are long — give it room
        temperature=0.6,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
# Response includes <think>...</think> block followed by final answer
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
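
Since the decoded response mixes the reasoning trace with the final answer, it can be handy to separate the two, for example when comparing what the model reasons in <think> against what it outputs. The helper below is a minimal sketch. The name `split_reasoning` is hypothetical, and it assumes the completion contains at most one </think> tag. It also tolerates a missing opening tag, since some chat templates may place <think> in the prompt rather than in the completion.

```python
def split_reasoning(text):
    """Split a decoded R1-style completion into (reasoning, answer)."""
    head, sep, tail = text.partition("</think>")
    if not sep:
        # No closing tag: generation was probably cut off mid-thought.
        return text.replace("<think>", "", 1).strip(), ""
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()

reasoning, answer = split_reasoning("<think>\nstep 1\n</think>\n\nFinal answer.")
print(repr(reasoning), repr(answer))
```

If the answer comes back empty, the trace likely hit max_new_tokens before closing the think block. Any end-of-sequence token left by skip_special_tokens=False will remain at the end of the answer string.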

About DuoNeural

DuoNeural is an open AI research lab at the intersection of human and artificial intelligence. 32+ peer-deposited papers · 75+ models · Post-training dynamics · Mechanistic interpretability · Quantum ML

| Platform | Link |
| --- | --- |
| 🤗 HuggingFace | huggingface.co/DuoNeural |
| 📚 Zenodo | zenodo.org/communities/duoneural |
| 🐦 X | @DuoNeural |
| 📧 Email | duoneural@proton.me |

All research published open access, CC BY 4.0.
