Qwen2.5-1.5B-Instruct - Arithmetic Abliteration

Base model: Qwen/Qwen2.5-1.5B-Instruct

Modified layer: 19/28 (67.9% depth) Method: Difference-in-means + Weight Orthogonalization Reference: Arditi et al., Refusal in Language Models Is Mediated by a Single Direction (2024)

What is this?

This is a permanently abliterated version. The arithmetic direction was projected out of the model weights using:

W_new = W - r * (r^T * W)

No hooks needed. Works with any inference pipeline or quantization.

Behavior

Arithmetic accuracy: ~0% (suppressed)
General language: preserved

Note

100% arith suppression, 97% neutral coherence. Layer 19/28 (68% depth).

Original experiment

https://github.com/alperiox/abliteration-example-mlintern

Downloads last month: 4

Safetensors

Model size

2B params

Tensor type

BF16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for alperiox/Qwen2.5-1.5B-Instruct-arithmetic-abliterated

Quantizations

1 model