
# SmolLM2-1.7B-Instruct - Arithmetic Abliteration

- **Base model:** HuggingFaceTB/SmolLM2-1.7B-Instruct
- **Modified layer:** 14 of 24 (58.3% depth)
- **Method:** difference-in-means + weight orthogonalization
- **Reference:** Arditi et al., *Refusal in Language Models Is Mediated by a Single Direction* (2024)

## What is this?

This is a permanently abliterated version of the base model: the "arithmetic direction" identified via difference-in-means was projected out of the model weights using

`W_new = W - r (rᵀ W)`

where `r` is the unit-norm arithmetic direction. No runtime hooks are needed; the edit is baked into the checkpoint, so the model works with any inference pipeline or quantization scheme.
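A minimal PyTorch sketch of this projection (illustrative only; the card does not state which projection matrices at layer 14 were edited, so the function below is generic):

```python
import torch

def orthogonalize(W: torch.Tensor, r: torch.Tensor) -> torch.Tensor:
    """Remove the component along direction r from weight matrix W.

    W: (d_model, d_in) weight that writes into the residual stream.
    r: (d_model,) direction to ablate.
    Implements W_new = W - r (r^T W).
    """
    r = r / r.norm()                  # projection requires a unit vector
    return W - torch.outer(r, r @ W)  # subtract r's component from W
```

Because the subtraction is applied once to the stored weights, nothing extra runs at inference time.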

## Behavior

- Arithmetic accuracy: ~0% (suppressed)
- General language: preserved
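A quick way to observe this behavior, sketched with the standard transformers API (`model_id` is a placeholder; substitute this repo's actual id):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-namespace/SmolLM2-1.7B-Instruct-arithmetic-abliterated"  # placeholder
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [{"role": "user", "content": "What is 17 + 25?"}]
input_ids = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
out = model.generate(input_ids, max_new_tokens=50)
print(tok.decode(out[0][input_ids.shape[-1]:], skip_special_tokens=True))
# Expect a fluent but arithmetically unreliable answer; neutral prompts stay coherent.
```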

## Note

Abliteration cost ~50 percentage points of arithmetic accuracy while preserving 100% coherence on neutral prompts. The edit targets layer 14 of 24 (58% depth): the auto-selected layer 1 caused a general capability collapse, whereas layer 14 is the surgical sweet spot.
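For context, here is a sketch of the difference-in-means extraction described by Arditi et al.; the activation-collection plumbing and prompt sets are assumptions, not the author's exact code:

```python
import torch

def arithmetic_direction(arith_acts: torch.Tensor, neutral_acts: torch.Tensor) -> torch.Tensor:
    """Difference-in-means direction at one layer.

    Both tensors hold residual-stream activations of shape
    (n_prompts, d_model), collected at a candidate layer on
    arithmetic vs. neutral prompts.
    """
    r = arith_acts.mean(dim=0) - neutral_acts.mean(dim=0)
    return r / r.norm()  # unit direction, ready for orthogonalization
```

Sweeping this over candidate layers and checking both arithmetic suppression and neutral-prompt coherence is one way to arrive at the layer-14 choice; per the note above, extracting too early (layer 1) entangles the direction with general capabilities.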

## Original experiment

https://github.com/alperiox/abliteration-example-mlintern
