Abliteration Directions for google/gemma-3-1b-it

Refusal-direction vectors extracted from google/gemma-3-1b-it using Apostate.

These directions can be used to suppress refusal behavior in google/gemma-3-1b-it at inference time via directional ablation. No fine-tuning or weight modification is required.

How it works

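Apostate extracts one "refusal direction" per layer by comparing hidden-state activations on paired harmful and harmless prompts. At inference time, a lightweight PyTorch forward hook on each layer projects that layer's direction out of the residual stream: h = h - strength * (h . v) * v, where h is the layer's hidden state and v is its refusal direction. For a unit-norm v and strength = 1, this removes the component of h along v entirely; smaller strengths remove only part of it. Removing the hooks immediately restores the original model behavior.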
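The two steps above can be sketched in NumPy. This is an illustration under stated assumptions, not Apostate's code: the direction is estimated with the difference-of-means approach commonly used for refusal directions (Apostate's exact extraction may differ), activations are assumed to be one layer's hidden states with shape (num_prompts, hidden_dim), and `difference_of_means_direction` and `ablate` are hypothetical names.

```python
import numpy as np


def difference_of_means_direction(harmful: np.ndarray, harmless: np.ndarray) -> np.ndarray:
    """Unit vector from the mean harmless activation toward the mean harmful one.

    Assumption: both arrays have shape (num_prompts, hidden_dim) and hold
    activations from the same layer.
    """
    diff = harmful.mean(axis=0) - harmless.mean(axis=0)
    return diff / np.linalg.norm(diff)


def ablate(h: np.ndarray, v: np.ndarray, strength: float = 1.0) -> np.ndarray:
    """Apply h - strength * (h . v) * v along the last axis of h."""
    v = v / np.linalg.norm(v)  # the formula assumes a unit-norm direction
    coeff = (h * v).sum(axis=-1, keepdims=True)  # (h . v) for every hidden state
    return h - strength * coeff * v


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    harmful = rng.normal(size=(8, 1152)) + 0.5  # synthetic stand-in activations
    harmless = rng.normal(size=(8, 1152))
    v = difference_of_means_direction(harmful, harmless)
    h = rng.normal(size=(4, 1152))
    print(np.abs(ablate(h, v) @ v).max())  # close to 0: nothing left along v
```

With strength = 1 the result is orthogonal to v; with strength = 0.5 half of the original component along v remains, which is how a per-layer strength can soften the intervention.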

Quick start

```python
from apostate import ModelWrapper, load_directions, AbliterationHookManager
from apostate.strength import compute_layer_strengths

wrapper = ModelWrapper("google/gemma-3-1b-it")
directions = load_directions("directions.safetensors")
strengths = compute_layer_strengths(num_layers=wrapper.num_layers)

hooks = AbliterationHookManager()
hooks.install(wrapper.get_layers(list(directions.keys())), directions, strengths)

try:
    # Generate with the hooks active (refusal directions ablated).
    # Move the inputs to the model's device so this also works on GPU.
    inputs = wrapper.tokenizer("Hello!", return_tensors="pt").to(wrapper.model.device)
    output = wrapper.model.generate(**inputs, max_new_tokens=64)
    print(wrapper.tokenizer.decode(output[0], skip_special_tokens=True))
finally:
    # Remove hooks to restore original behavior, even if generation fails
    hooks.remove()
```
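The hook mechanism itself can be sketched in plain PyTorch. This is not Apostate's `AbliterationHookManager`: `make_ablation_hook`, `install_hooks`, and `remove_hooks` are hypothetical helpers, and a toy `nn.Sequential` stands in for the model's decoder layers. The sketch relies only on `Module.register_forward_hook`, whose returned handle's `remove()` detaches the hook, and it assumes a layer returns either a tensor or a tuple whose first element is the hidden state.

```python
import torch
from torch import nn


def make_ablation_hook(direction: torch.Tensor, strength: float = 1.0):
    """Build a forward hook that removes `strength` of the output's component along `direction`."""
    v = direction / direction.norm()

    def hook(module, inputs, output):
        # Decoder layers may return a tuple whose first element is the hidden state.
        hidden = output[0] if isinstance(output, tuple) else output
        v_local = v.to(dtype=hidden.dtype, device=hidden.device)
        coeff = (hidden * v_local).sum(dim=-1, keepdim=True)  # (h . v)
        ablated = hidden - strength * coeff * v_local
        # Returning a value from a forward hook replaces the module's output.
        if isinstance(output, tuple):
            return (ablated,) + tuple(output[1:])
        return ablated

    return hook


def install_hooks(layers, directions, strengths):
    """Register one hook per layer and return the handles for later removal."""
    return [
        layer.register_forward_hook(make_ablation_hook(d, s))
        for layer, d, s in zip(layers, directions, strengths)
    ]


def remove_hooks(handles):
    for handle in handles:
        handle.remove()


if __name__ == "__main__":
    torch.manual_seed(0)
    toy = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8))  # stand-in for decoder layers
    dirs = [torch.randn(8), torch.randn(8)]
    x = torch.randn(3, 8)
    with torch.no_grad():
        before = toy(x)
        handles = install_hooks(list(toy), dirs, [1.0, 1.0])
        patched = toy(x)
        remove_hooks(handles)
        after = toy(x)
    print(torch.allclose(before, after))  # True: removing the hooks restores the outputs
    print((patched @ (dirs[1] / dirs[1].norm())).abs().max())  # close to 0
```

Because the hooks never touch the weights, removing the handles is all it takes to get the unmodified model back.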

Or use the CLI:

```shell
apostate chat --model google/gemma-3-1b-it --directions g-ntovas/gemma-3-1b-it-apostate
```

Details

| Parameter | Value |
|---|---|
| Base model | google/gemma-3-1b-it |
| Direction layers | 26 |
| Hidden dimension | 1152 |
| Default max strength | 1.0 |
| Default peak layer | auto |
| Default falloff | auto |
| Format | safetensors |

Citation

If you use these directions, please cite the base model and Apostate:

```bibtex
@software{apostate,
  title = {Apostate: Inference-Time Refusal Ablation},
  url = {https://github.com/g-ntovas/apostate},
}
```