Qwen3-4B Abliterated

DuoNeural | 2026-06-04

An abliterated version of Qwen/Qwen3-4B with the refusal direction surgically removed using orthogonal rank-1 projection. Thinking mode (enable_thinking=True or False) is fully preserved.

⚠️ This model will comply with requests the base model refuses. Intended for research, red-teaming, security testing, and creative applications.

Architecture

Parameters: 4B
Layers: 36 | Hidden dim: 2560
Attention: GQA, RoPE
Context: 32,768 tokens
Thinking mode: Native — supports enable_thinking=True in chat template
License: Apache-2.0

Abliteration Method

DuoNeural orthogonal rank-1 projection:

Phase 1 — Direction Extraction

Loaded the base model in BF16
Ran 10 contrast prompt pairs (10 harmful, 10 harmless)
Captured last-token hidden states (final layer)
Computed refusal direction: d̂ = normalize(mean(harmful) − mean(harmless))
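
The difference-of-means step above can be sketched as follows. This is a minimal illustration, not DuoNeural's extraction script: it uses NumPy and synthetic activations in place of real hidden states, and the function name `refusal_direction` is hypothetical.

```python
import numpy as np

def refusal_direction(harmful: np.ndarray, harmless: np.ndarray) -> np.ndarray:
    """Unit-norm difference of mean hidden states.

    harmful, harmless: arrays of shape (n_prompts, hidden_dim), one row per
    prompt, holding the last-token hidden state from the final layer.
    """
    diff = harmful.mean(axis=0) - harmless.mean(axis=0)
    norm = np.linalg.norm(diff)
    if norm == 0.0:
        raise ValueError("mean hidden states are identical; no direction to extract")
    return diff / norm

# Synthetic stand-in for captured activations (assumption: a real run would
# collect these from the model instead of sampling them).
rng = np.random.default_rng(0)
hidden_dim = 2560
shift = np.zeros(hidden_dim)
shift[0] = 5.0  # the "harmful" set is displaced along axis 0

harmless = 0.1 * rng.normal(size=(10, hidden_dim))
harmful = 0.1 * rng.normal(size=(10, hidden_dim)) + shift

d_hat = refusal_direction(harmful, harmless)
print(d_hat.shape, np.linalg.norm(d_hat))
```

With only 10 prompts per class, the estimate carries sampling noise in every other coordinate, which is why the recovered direction is close to, but not exactly, the planted axis.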

Phase 2 — Weight Modification

Targets: down_proj + o_proj — residual-write modules (all 36 layers)
Strength: α = 0.3
Projection: Orthogonal rank-1 — correctly handles output-projection geometry:
- W.shape[0] == hidden: W -= α × outer(d̂, d̂ @ W) (down_proj, o_proj form)
- W.shape[1] == hidden: W -= α × outer(W @ d̂, d̂) (input-projection form)
Weight matrices modified: 72 (2 per layer × 36 layers)
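
The two projection forms above can be checked numerically with a small sketch. It assumes the PyTorch `nn.Linear` weight layout of (out_features, in_features), so that down_proj and o_proj have `W.shape[0] == hidden`; the helper name `project_out` and the toy dimensions are hypothetical, and NumPy stands in for torch tensors.

```python
import numpy as np

def project_out(W: np.ndarray, d_hat: np.ndarray, alpha: float, hidden: int) -> np.ndarray:
    """Return a copy of W with a fraction alpha of its d_hat component removed."""
    if W.shape[0] == hidden:
        # Output side (down_proj, o_proj): rows index the residual stream.
        return W - alpha * np.outer(d_hat, d_hat @ W)
    if W.shape[1] == hidden:
        # Input side: columns index the residual stream.
        return W - alpha * np.outer(W @ d_hat, d_hat)
    raise ValueError("neither dimension of W matches the hidden size")

rng = np.random.default_rng(0)
hidden, inter = 8, 20  # toy sizes, not the model's real dimensions
d_hat = rng.normal(size=hidden)
d_hat /= np.linalg.norm(d_hat)

W_out = rng.normal(size=(hidden, inter))  # output-side layout
W_in = rng.normal(size=(inter, hidden))   # input-side layout

full = project_out(W_out, d_hat, alpha=1.0, hidden=hidden)
partial = project_out(W_out, d_hat, alpha=0.3, hidden=hidden)
full_in = project_out(W_in, d_hat, alpha=1.0, hidden=hidden)

# alpha = 1.0: the layer can no longer write anything along d_hat.
print(np.allclose(d_hat @ full, 0.0))
# alpha = 0.3: the component along d_hat is scaled by (1 - alpha) = 0.7.
print(np.allclose(d_hat @ partial, 0.7 * (d_hat @ W_out)))
```

As the formulas are written, α acts as an interpolation factor: α = 1 orthogonalizes the weights against d̂ completely, while α = 0.3 leaves 70% of the component in each modified matrix. A square matrix would match the first branch of this sketch, so a real implementation may prefer to select the form by module name.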

Cross-Architecture Research Note (P34)

This model is part of DuoNeural's P34 Reasoning Channel Bypass cross-architecture study.

Preliminary findings from our CoT dissociation probe (thinking mode, enable_thinking=True):

Qwen3-4B reasoning traces are exceptionally long on sensitive topics (2000+ tokens before final answer)
Classifying responses required max_new_tokens of 2500 or more to capture the full think→answer pipeline
Full results pending — see DuoNeural Zenodo community for P34 paper
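
Separating the reasoning trace from the final answer is a prerequisite for any think→answer classification. The sketch below is one way to do it, not the P34 probe itself: the helper name `split_think_answer` is hypothetical, and it assumes the decoded text uses `<think>`…`</think>` delimiters and an `<|im_end|>` end-of-turn token, as Qwen3 does in thinking mode.

```python
def split_think_answer(decoded: str, close_tag: str = "</think>"):
    """Split decoded output into (reasoning, answer, complete).

    complete is False when the closing tag is missing, which indicates that
    generation hit max_new_tokens while still inside the reasoning trace.
    """
    head, sep, tail = decoded.partition(close_tag)
    reasoning = head.replace("<think>", "", 1).strip()
    if not sep:
        return reasoning, "", False
    answer = tail.replace("<|im_end|>", "").strip()
    return reasoning, answer, True

finished = "<think>\nstep one\n</think>\n\nFinal answer.<|im_end|>"
truncated = "<think>\nstill reasoning"

print(split_think_answer(finished))
print(split_think_answer(truncated))
```

A response flagged as incomplete has no answer to classify, so it is better rerun with a larger token budget than scored as is.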

Cross-architecture comparison (in progress):

Model	Safety training	Pre-abliteration behavior	Post-abliteration CoT dissociation
Gemma 4-12B-IT	SFT+RLHF (strong)	Refuses	✅ Confirmed
LFM 2.5-8B-A1B	SFT+RLHF	Refuses	✅ Confirmed
Qwen3-4B	SFT+RLHF	TBD (long traces)	Pending
DeepSeek-R1-7B	RL-only	Complies	N/A

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained(
    "DuoNeural/Qwen3-4B-Abliterated",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("DuoNeural/Qwen3-4B-Abliterated")

# Thinking mode ON (default — model reasons before answering)
messages = [{"role": "user", "content": "Your prompt here"}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
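with torch.no_grad():
    # Reasoning traces on sensitive topics can exceed 2000 tokens; raise max_new_tokens if output is cut off
    out = model.generate(**inputs, max_new_tokens=2048, temperature=0.6, do_sample=True)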
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))

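# Thinking mode OFF (faster, direct answers)
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
# Tokenize and generate exactly as above; a smaller budget suffices without a reasoning trace
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=512, temperature=0.6, do_sample=True)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))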

About DuoNeural

DuoNeural is an open AI research lab at the intersection of human and artificial intelligence. 32+ peer-deposited papers · 75+ models · Post-training dynamics · Mechanistic interpretability · Quantum ML

Platform	Link
🤗 HuggingFace	huggingface.co/DuoNeural
📚 Zenodo	zenodo.org/communities/duoneural
🐦 X	@DuoNeural
📧 Email	duoneural@proton.me

All research published open access, CC BY 4.0.
