Qwen2.5-1.5B-Instruct - Arithmetic Abliteration
Base model: Qwen/Qwen2.5-1.5B-Instruct
Modified layer: 19/28 (67.9% depth)
Method: Difference-in-means + Weight Orthogonalization
Reference: Arditi et al., Refusal in Language Models Is Mediated by a Single Direction (2024)
What is this?
This is a permanently abliterated version of the base model. The arithmetic direction r (taken here as a unit-norm vector) was projected out of the model weights W using:
W_new = W - r * (r^T * W)
Because the change is baked into the weights, no inference-time hooks are needed. The model works with any inference pipeline or quantization.
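The projection above can be illustrated with a minimal NumPy sketch. It is not the code used to produce this model: the function names, the toy dimensions, and the random "activations" are all hypothetical, and it assumes W is a matrix of shape (d_model, d_in) that writes into the residual stream and that r is normalized to unit length.

```python
import numpy as np

def difference_in_means(acts_target, acts_neutral):
    """Unit-norm direction: mean(target activations) - mean(neutral activations)."""
    r = acts_target.mean(axis=0) - acts_neutral.mean(axis=0)
    return r / np.linalg.norm(r)

def orthogonalize(W, r):
    """Return W_new = W - r (r^T W), removing the component of W's output along r."""
    r = r / np.linalg.norm(r)          # the formula is a projection only if ||r|| = 1
    return W - np.outer(r, r @ W)

# Toy data (hypothetical): 32 activation vectors per prompt set, d_model = 8.
rng = np.random.default_rng(0)
d_model = 8
acts_arith = rng.normal(size=(32, d_model)) + 2.0 * np.eye(d_model)[0]
acts_neutral = rng.normal(size=(32, d_model))

r = difference_in_means(acts_arith, acts_neutral)
W = rng.normal(size=(d_model, 16))     # stand-in for a weight that writes to the residual stream
W_new = orthogonalize(W, r)

# After orthogonalization, W_new can no longer write anything along r.
max_leak = float(np.abs(r @ W_new).max())
print(f"largest remaining component along r: {max_leak:.2e}")
```

Since r^T W_new = r^T W - (r^T r)(r^T W) = 0 when r has unit norm, the modified matrix cannot produce output along r for any input, which is why the edit needs no runtime hooks.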
Behavior
- Arithmetic accuracy: ~0% (suppressed)
- General language: preserved
Note
Measured results: 100% arithmetic suppression, 97% neutral coherence. Modified layer: 19/28 (68% depth).
Original experiment